From Archive – My talk at Business Technology Summit 2010

I dug this video out of my archive. It is a short talk I gave at Business Technology Summit 2010 in Bangalore. The talk describes a project I had completed a short while earlier, in early 2010: building a mixed-breed cloud infrastructure for a large telecom provider. Mixed breed here means a mix of architecture choices, x86 and Sun SPARC in this case.

Scalable Cloud Backend for Mobile Developers

Mobile apps do not work in isolation. They need a scalable backend to deliver a customized experience for each user. Apps that provide storage, personalization, integration with other services, or notifications usually require powerful backend features. Mobile app developers often have little experience with server-side backend implementation, and this becomes a bottleneck when building great mobile apps. Mobile platforms that allow easy integration of a scalable backend with mobile apps are on the rise. In this talk, we will look at how mobile developers can build a strong backend foundation for their next big mobile app. The talk will walk through code examples and patterns that developers can readily apply to mobile apps.

BTW, Parse, as mentioned in the video, has since been acquired by Facebook. It indicates that PaaS has arrived and is going to be the platform of choice.

The presentation is as follows:

Notes from Flipkart Slash-N 2013 Conference

I attended the Flipkart Slash-N 2013 conference last week (1st Feb 2013). It was an interesting conference, with a lot of emphasis on what Flipkart has learned over the years. The speakers were good overall, and some of the invited panelists were fantastic.

Although it was tough to keep up with the constant flow of information, I was able to jot down notable stuff from some sessions.

Sharad Sharma – Keynote

1. Organizations are turning strategic capabilities into commoditized offerings

2. Commoditization is a boon to organizations. It helps build an ecosystem
- See Android, Hadoop

3. Two exciting books recommended
- Dealing with Darwin
- The Only Sustainable Edge

4. Search is key to Flipkart and other ecommerce companies

Panel Discussion – Data: Stores, Processing and Trends – Dr. Pramod Varma, Joydeep Sen Sarma, Ashok Banerjee, Utkarsh, Regunath Balasubramanian, Anshum Gupta

1. How do we select the data processing and storage platform?

2. Selecting HBase for scale (put it at the edge of the app, near the user)

3. Hive – built for analytics and reliability

4. Selecting NoSQL – consider how frequently schema changes happen

5. Selecting between storing Aggregate vs. Exact information
- Sometimes we do not need the accurate information to be available to the end user. Aggregate is good enough.

6. Check out storage companies

7. Data De-duplication a key concern
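
The aggregate vs. exact point from item 5 can be illustrated with a minimal sketch. It assumes a hypothetical product-view counter: only a running total per product is kept, and the user sees a rounded figure. The class name, method names and the bucket size of 100 are illustrative assumptions, not something presented by the panel.

```python
from collections import Counter


class AggregateViewCounter:
    """Keeps one running total per product instead of every raw view event."""

    def __init__(self):
        self.totals = Counter()

    def record_view(self, product_id):
        # Only the aggregate is updated; the individual event is not stored.
        self.totals[product_id] += 1

    def display_count(self, product_id, bucket=100):
        """Return a rounded-down figure that is good enough to show a user."""
        exact = self.totals[product_id]
        if exact < bucket:
            return str(exact)
        return f"{(exact // bucket) * bucket}+"


# Usage: 1234 views are recorded, the user sees a rounded figure.
counter = AggregateViewCounter()
for _ in range(1234):
    counter.record_view("book-42")
shown = counter.display_count("book-42")
```

The trade-off is that the exact per-event history is no longer available for later analysis, so this only fits data where the end user never needs the precise value.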

Flipkart – Lessons learned

1. Prioritize Read vs. Write Queries
- Critical and Non-critical Read and Writes
- Separately scale each type of Data infrastructure (Read scales separately from the Write)
- Find Slow Queries
- Can come from Front-end or Analytics system
- Separate the data infrastructure (separate Read Query infrastructure)

2. Isolate Databases

3. Logging can cause latency – too much instrumentation can be a problem

4. SOA – too many connections on services cause contention and race conditions

5. UI is highly configurable. Configuration is stored in Config DB.
- Config Cache was built
- Agent model that sits on the Web Server and provides a proxy for Cache hits
- This was a bespoke solution
- Did not use Memcache as it would have required PHP to make a separate Network call

6. Deployment used a ring pattern (stage-wise deployment)

7. Gearman – a framework that allows asynchronous inter-service communication (similar to the agent model built by Flipkart)

8. Flipkart was caching fully constructed pages for a few minutes
- Only for high-traffic pages – like the homepage
- But lots of changes happen every few minutes

9. Flipkart cached PHP serialized product pages

10. Found Caching impacts A/B testing heavily

11. Explored Thrift Serialization

12. Cache modification was done by separate processes that led to cache change notifications. This was built to introduce asynchronous behavior and avoid blocking calls to the caching system
- The cache change notification is serviced by their front-end layer

13. Cache was sharded

14. Evaluate Membase and Couchbase

15. Cache TTL invalidation – hardest problem to solve
- made it infinite
- on-demand partial invalidation done by Cache modification layer
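
Items 12, 13 and 15 above can be read together as one pattern: a sharded cache whose entries never expire, with a separate modification layer that invalidates individual keys in response to change notifications. The sketch below is one possible reading of that pattern, using an in-memory stand-in for the cache and a queue for the notifications. All class and function names are hypothetical, and it does not describe Flipkart's actual implementation.

```python
import queue


class ShardedCache:
    """In-memory stand-in for a sharded cache with no TTL (entries never expire)."""

    def __init__(self, shard_count=4):
        self.shards = [{} for _ in range(shard_count)]

    def _shard(self, key):
        # Pick a shard from the key's hash.
        return self.shards[hash(key) % len(self.shards)]

    def get(self, key):
        return self._shard(key).get(key)

    def put(self, key, value):
        self._shard(key)[key] = value

    def invalidate(self, key):
        self._shard(key).pop(key, None)


class CacheModifier:
    """Applies change notifications outside the request path."""

    def __init__(self, cache):
        self.cache = cache
        self.notifications = queue.Queue()

    def notify_change(self, key):
        # The caller only enqueues the key and returns immediately.
        self.notifications.put(key)

    def drain(self):
        """Invalidate every key that has a pending notification."""
        processed = 0
        while True:
            try:
                key = self.notifications.get_nowait()
            except queue.Empty:
                return processed
            self.cache.invalidate(key)
            processed += 1


def load_product(cache, key, loader):
    """Read-through lookup: fall back to the loader on a cache miss."""
    value = cache.get(key)
    if value is None:
        value = loader(key)
        cache.put(key, value)
    return value


# Usage: the cached value stays until the modifier invalidates that one key.
cache = ShardedCache()
modifier = CacheModifier(cache)
db = {"p1": "Book, Rs. 300"}
first = load_product(cache, "p1", db.get)
db["p1"] = "Book, Rs. 250"
stale = load_product(cache, "p1", db.get)  # no TTL, so still the old value
modifier.notify_change("p1")
modifier.drain()
fresh = load_product(cache, "p1", db.get)
```

In a real deployment the drain step would run in its own process or worker, which is what keeps writers from blocking on the caching system.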

IaaS giving way to “Data Science as a Service”

As organisations start embracing public infrastructure clouds for their critical data and application needs, Data Science SaaS will become more practical to use. The biggest challenge for Data Science SaaS providers is getting customers to expose or store their data on a multi-tenant public platform. Increasingly, public cloud services are becoming compelling ground for enterprises. This means that, along with applications, data also becomes a good candidate to be moved to these platforms.
This is due to the low data access latency that apps in the cloud require and to increasing confidence in shared public cloud services.
Once the data is out of the enterprise data center, inter-cloud service integration becomes easier. For example, imagine an AWS EC2 infrastructure running your enterprise application, with data lying on S3 / RDS. To extract business intelligence and inference out of this data, it is more practical to use an existing public SaaS that can work on this data than to use in-house analytics infrastructure. Greenfield apps that have their genesis on the public cloud are already candidates for Data Science services (aka Analytics as a Service).
The future of Data Science SaaS looks promising. Startups like Datameer and ClearStory are doing pioneering work on this.
More on this at :-

Run in Public, Bring it back when it's ready

Setting up your own cloud infrastructure in your backyard is no longer a long shot, thanks to the array of techniques / patterns / solutions in the space. However, more organizations are looking at using public cloud services, especially IaaS and PaaS, to learn what it takes to scale their apps, and then bringing that homework back to their own internal infrastructure, as Zynga did with its Z-Cloud. This is becoming commonplace for organizations these days. And it makes a lot of sense.
Using public cloud services to test the waters on what it takes to sustain and manage application scale is a good pattern. Once the pattern is understood, and the organization understands the nitty-gritty of managing the scale, an internal infrastructure can be set up, thereby taking control back. This allows for efficient capacity planning and avoids the common gotchas of deploying and managing the app at cloud scale.

The rise of Cost aware Architectures

Cost-aware architectures have been on the rise, thanks to the “as a service” phenomenon. A lot has been said about these evolving architecture patterns. The challenge now is to discover patterns that architects can use to build cost-aware architectures. In short, a cost-aware architecture is one that evolves based on how it sustains and grows revenue when using infrastructure / application / platform services with varying prices, a.k.a. cloud services. This means that as the business becomes successful and starts pulling in more revenue from its users, the cost of adding capacity (infrastructure, application and platform) grows less than linearly.
As we see more and more application architectures incorporate “as a service” models, this is going to be the trend for the future. I like what Werner Vogels said about it in an article not long back.
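
To make "less than linear" concrete, here is a small worked sketch under assumed numbers. The volume-discount tiers below are made up for illustration and do not reflect any provider's actual pricing. Prices are in cents per capacity unit so the arithmetic stays in integers.

```python
# Hypothetical volume-discount tiers: (upper bound in units, price in cents per unit).
TIERS = [(100, 100), (1000, 80), (float("inf"), 60)]


def capacity_cost(units):
    """Total cost in cents for the given number of capacity units."""
    cost = 0
    lower = 0
    for upper, price in TIERS:
        if units <= lower:
            break
        # Bill only the units that fall inside this tier.
        billable = min(units, upper) - lower
        cost += billable * price
        lower = upper
    return cost


def cost_per_unit(units):
    """Average cost in cents per unit at this capacity level."""
    return capacity_cost(units) / units


# Usage: capacity grows 10x at each step, total cost grows by less than 10x.
small = capacity_cost(100)
medium = capacity_cost(1000)
large = capacity_cost(10000)
```

Under these assumed tiers, going from 100 to 1,000 units multiplies the cost by 8.2 rather than 10, and the average cost per unit keeps falling as capacity grows.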