Our previous posts provided an overview of the machine learning models that power recommendations at Paytm and the improvement we achieved with a semi-deep learning model. This series of blog posts focuses on how our data and platform engineers built the systems that serve real-time recommendations in production at scale, using a microservices architecture implemented with Akka cluster. We plan to cover the following topics in this series:
- Introduction and Teaser: context, problem definition, data summary, use case, expected added value to all stakeholders including customer, business and machine learning engineers
- Architecture and Technology Stack: System components and interactions, microservices using Akka cluster
- Deployment and Production: production infrastructure, traffic pattern, deployment automation.
- Lessons Learned: this might be the longest post in the series! How we fixed production issues and stabilised the system
- Cool Technologies in Action: Split Brain Resolver, Circuit Breakers, Kamon (Dynamic SLA), Akka Agent
- What’s Next…
It All Starts With a REST API
When a user navigates to the Paytm marketplace, they typically view products grouped by containers, as shown in the example below:
When the page loads, we get an API call with the containerId and, for logged-in customers, the customerId. Within 50 ms, the API has to respond with a sorted list of recommended productIds to display to the user. There are two types of product containers: lists with a limited number of products, typically displayed in a carousel, and grids of products with infinite scrolling.
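As a concrete sketch, this contract could be modeled in Scala as below; the type names and the hard-coded stub response are our illustration, not Paytm's actual code:

```scala
// Illustrative model of the API contract described above. Names and the
// placeholder response are assumptions for this sketch.
final case class RecommendationRequest(containerId: String, customerId: Option[String])
final case class RecommendationResponse(productIds: Seq[Long])

object RecommendationApi {
  // Stub handler: the real service must produce this sorted list within 50 ms.
  def recommend(req: RecommendationRequest): RecommendationResponse =
    RecommendationResponse(productIds = Seq(101L, 42L, 7L)) // placeholder ranking
}
```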
In our first version, the API only supported lists because we store fewer than 100 recommended products for each customer per container. Serving a recommendation request was equivalent to querying Elasticsearch for the list of productIds given the containerId and customerId. However, because we have millions of customers and around 2,000 categories, storing more than 60 recommended products for each customer per container was a real challenge. This limitation also meant we could not support grids with infinite scrolling, given the small pool of recommended products that we store.
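The V1 serving path was therefore a single keyed lookup of a precomputed, pre-sorted list. A minimal sketch, with an in-memory Map standing in for the Elasticsearch index (all names and data are illustrative):

```scala
// V1 sketch: recommendations are precomputed offline and pre-sorted, so
// serving is one key lookup with no online scoring. The Map stands in for
// the Elasticsearch index used in production.
object V1Store {
  private val index: Map[(String, String), Seq[Long]] = Map(
    ("cust-1", "container-a") -> Seq(11L, 42L, 7L) // at most ~60 ids per key
  )

  def recommend(customerId: String, containerId: String): Seq[Long] =
    index.getOrElse((customerId, containerId), Seq.empty)
}
```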
Nevertheless, our first version was a real success and provided all the stakeholders, including business folks at Paytm, with hard evidence of the value of the recommendation engine. Obviously, we did not settle for V1 and immediately started the V2 design, with the primary goal of real-time scoring for recommendations: removing the 60-product limit and adding support for serving recommendations for grids with infinite scrolling (up to the maximum number of available products).
The key motivation for real-time scoring was to support grids, which drive much larger traffic than lists. The API interface for V2 is the same: given a containerId and a customerId as input, return a list of recommended productIds. The major difference is that V2 needs to call 3 different machine learning models in real time, then rank and sort the list of recommended products. We no longer have a pre-sorted list of recommended products that we can query directly from Elasticsearch as in V1.
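A hedged sketch of what such a scoring path could look like: stub futures stand in for the three real-time model calls, which are fanned out in parallel, and a simple additive score merge feeds the final sort. Both the stubs and the merge rule are assumptions on our part, not Paytm's actual logic.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

object V2Scorer {
  type Scores = Map[Long, Double] // productId -> model score

  // Stubs standing in for the three real-time ML model calls.
  private def modelA(customerId: String): Future[Scores] = Future(Map(1L -> 0.9, 2L -> 0.4))
  private def modelB(customerId: String): Future[Scores] = Future(Map(2L -> 0.8, 3L -> 0.5))
  private def modelC(customerId: String): Future[Scores] = Future(Map(1L -> 0.2, 3L -> 0.1))

  def recommend(customerId: String, containerId: String): Future[Seq[Long]] = {
    // Kick off all three model calls before sequencing, so they run in parallel.
    val (fa, fb, fc) = (modelA(customerId), modelB(customerId), modelC(customerId))
    for (a <- fa; b <- fb; c <- fc) yield {
      // Merge: sum the per-model scores for each product, then sort descending.
      val merged = (a.toSeq ++ b.toSeq ++ c.toSeq).groupMapReduce(_._1)(_._2)(_ + _)
      merged.toSeq.sortBy(-_._2).map(_._1)
    }
  }
}
```

The important design point is that the three model latencies overlap instead of adding up, which matters under a 50 ms budget.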
Low latency versus high throughput is a typical trade-off in building scalable systems; for our use case, however, we needed both. The recommendation system had a challenging SLA of 50 ms average latency at 4,000 rps throughput. Each recommendation request has to go through 3 different machine learning models, each of which has its own data storage and computational logic. Moreover, assembling and sorting the final list of recommended products involves non-trivial logic as well. Oh, and we had only 6 weeks to deliver the system in production!
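Those two numbers together pin down the concurrency the system must sustain. A back-of-envelope check using Little's law (our calculation, not from the original post): in-flight requests = throughput × latency.

```scala
// Back-of-envelope sizing via Little's law: L = lambda * W.
object SlaBudget {
  val throughputRps: Double = 4000.0 // target requests per second
  val latencySeconds: Double = 0.05  // 50 ms average latency
  val inFlightRequests: Double = throughputRps * latencySeconds // = 200
}
```

So the system has to keep roughly 200 requests in flight at any instant, each fanning out to three model calls.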
In the following posts, we will go over our 6-week journey of building real-time scoring: from designing the system and selecting the technology stack to running and monitoring our production system!