I was able to attend the Flipkart Slash-N 2013 conference last week (1st Feb 2013). It was an interesting conference, with a lot of emphasis on what Flipkart has learned over the years. The speakers were good overall, and some of the invited panelists were fantastic.
Although it was tough to keep up with the constant flow of information, I was able to jot down notable points from some sessions.
Sharad Sharma – Keynote
1. Organizations are turning Strategic Capabilities into Commoditized offerings
2. Commoditization is a boon to organizations. It helps build an Eco-system
- See Android, Hadoop
3. Two exciting books recommended
- Dealing with Darwin
- The Only Sustainable Edge
4. Search is key to Flipkart and other e-commerce companies
Panel Discussion – Data: Stores, Processing and Trends – Dr. Pramod Varma, Joydeep Sen Sarma, Ashok Banerjee, Utkarsh, Regunath Balasubramanian, Anshum Gupta
1. How do we select the Data Processing and Storage Platform?
2. Selecting HBase for Scale (put it at the edge of the app, near the user)
3. Hive – Built for Analytics and Reliability
4. Selecting NoSQL – depends on how frequently Schema changes happen
5. Selecting between storing Aggregate vs. Exact information
- Sometimes the end user does not need exact information. An Aggregate is good enough.
6. Check out storage companies
- http://www.nimblestorage.com/
7. Data De-duplication is a key concern
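
The Aggregate vs. Exact choice in point 5 above can be illustrated with a small sketch. This is my own illustration, not something shown at the panel: the class name `HourlyAggregate`, the idea of counting product views, and the one-hour bucket size are all assumptions. It keeps one counter per product per hour instead of storing every raw event.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(when: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return when.replace(minute=0, second=0, microsecond=0)


class HourlyAggregate:
    """Hypothetical store that keeps a count per (product, hour).

    Individual events are discarded, so exact timestamps cannot be
    recovered later; only the hourly totals remain.
    """

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record_view(self, product_id: str, when: datetime) -> None:
        # One counter increment replaces one stored event row.
        self.counts[(product_id, hour_bucket(when))] += 1

    def views_in_hour(self, product_id: str, when: datetime) -> int:
        # Counter returns 0 for buckets that were never written.
        return self.counts[(product_id, hour_bucket(when))]
```

Storage grows with the number of products and hours rather than with the number of events, which is the trade: less precision for the end user in exchange for a much smaller data set.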
Flipkart – Lessons learned
1. Prioritize Read vs. Write Queries
- Critical and Non-critical Read and Writes
- Separately scale each type of Data infrastructure (Read scales separately from the Write)
- Find Slow Queries
- Can come from Front-end or Analytics system
- Separate the data infrastructure (separate Read Query infrastructure)
2. Isolate Databases
3. Logging can cause latency – too much instrumentation can be a problem
4. SOA – too many connections on services cause contention and race conditions
5. UI is highly configurable. Configuration is stored in Config DB.
- Config Cache was built
- Agent model that sits on the Web Server and provides a proxy for Cache hits
- This was a bespoke solution
- Did not use Memcache as it would have required PHP to make a separate Network call
6. Deployment used a Ring pattern (stage-wise deployment)
7. Gearman – a framework that allows asynchronous inter-service communication (similar to the Agent model built by Flipkart)
8. Flipkart was caching fully constructed pages for a few minutes
- Only for high-traffic pages – like the Homepage
- But lots of changes happen every few minutes
9. Flipkart cached PHP serialized product pages
10. Found that Caching heavily impacts A/B testing
11. Explored Thrift Serialization
12. Cache modification was done by separate processes that led to cache change notifications. This was built to introduce asynchronous behavior and avoid blocking calls to the Caching system
- The cache change notification is serviced by their front-end layer
13. Cache was sharded
14. Evaluate Membase and Couchbase
15. Cache TTL invalidation – hardest problem to solve
- The TTL was made infinite
- On-demand partial invalidation is done by the Cache modification layer
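
Points 12 to 15 above fit together roughly as in the sketch below. This is my own reconstruction under stated assumptions, not Flipkart's code: the class `ShardedCache`, its method names, the CRC32-based shard selection and the in-process queue are all hypothetical stand-ins. It shows a sharded cache whose entries never expire, with writers posting change notifications to a queue so that they never block on the cache, and a separate step that invalidates only the keys that changed.

```python
import queue
import zlib
from typing import Callable


class ShardedCache:
    """Hypothetical sharded cache with no TTL and queued invalidation."""

    def __init__(self, num_shards: int = 4) -> None:
        self.shards = [dict() for _ in range(num_shards)]
        # Pending change notifications posted by writer processes.
        self.notifications: queue.Queue = queue.Queue()

    def _shard_for(self, key: str) -> dict:
        # A stable hash keeps a given key on the same shard every time.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def get(self, key: str, loader: Callable[[str], str]) -> str:
        shard = self._shard_for(key)
        if key not in shard:
            # No expiry is stored: the entry lives until it is invalidated.
            shard[key] = loader(key)
        return shard[key]

    def notify_change(self, key: str) -> None:
        # Writers only enqueue a notification and return immediately.
        self.notifications.put(key)

    def apply_invalidations(self) -> int:
        """Drop every notified key; return how many entries were removed."""
        removed = 0
        while True:
            try:
                key = self.notifications.get_nowait()
            except queue.Empty:
                return removed
            shard = self._shard_for(key)
            if key in shard:
                del shard[key]
                removed += 1
```

A reader calls `get("home", build_page)` with some page-building function, a writer calls `notify_change("home")` after a product update, and whichever layer services the notifications calls `apply_invalidations()`. Until that last step runs, readers keep seeing the old entry, which is the price of not blocking the writer.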