This is the second post in our blog series explaining how we serve millions of recommendations to Paytm users every day. In this post we walk through the architecture of the personalization engine and share some of the lessons behind our technology choices.

System Components

As you may remember from this post, each recommendation request passes through three machine learning models. Each model is implemented in a dedicated component as follows:

  • Category Affinity (CA) determines the top K categories for a given user
  • Product Pool Selection (PPS) retrieves the top N products in the specified container, filtered by category if one is given in the request
  • Collaborative Filter (CF) calculates a personalization score for each given product for this particular customer

[Figure 1: system components]

The orchestrator component calls all three components (potentially more in the future) and ranks the results into the final recommended list of products. Component failures, timeouts, and the assembly of recommended products are all handled in the orchestrator, as is most of the performance-metrics collection. Clients that use Akka can call the orchestrator directly, but we also provide a thin REST API layer with JSON responses for HTTP clients, which is what the current client systems use in production.

Designing these components and their overall interaction is an interesting challenge because of the low-latency (~30 ms) and high-throughput (6-10K rps) SLA requirements. At the same time, we wanted to adopt a microservices architecture so that each component could be independently developed, deployed, upgraded, and scaled. The initial design was typical HTTP-based microservices, but we soon realised it is very hard to meet the SLA with JSON serialization and HTTP communication between microservices. Our criteria for selecting the tech stack were as follows:

  • Easy way to scale up and out (i.e. vertically and horizontally)
  • Efficient binary serialization for inter-service communication
  • Simple service discovery

We quickly decided to go with Akka cluster as the main stack for our microservices architecture, as it meets all of these criteria.

  • To scale up, we simply add more actors per JVM; to scale out, we run more actors of the same role on different nodes or servers.
  • Messages between actors are serialized objects; we use Kryo instead of the default Java serialization for efficiency. The Twitter chill Akka serializer worked well for us in production and was easy to configure.
  • Akka cluster-aware routers simplify service discovery. Each service has a fixed name (e.g. cf-service), and any service can define a router that points to the name of the target service. Instances of the same service can join and leave the cluster at any time; a router that points to a particular service name automatically updates its routee list.

Technical Implementation

For simplicity, we started with a single git repo for the entire project. To ensure that each microservice can still be developed independently, we use an SBT multi-project build. Each microservice is an independent project with its own dependencies that can be imported into the IDE and developed by its owner. All messages exchanged between services are defined in a dedicated project that every service depends on. This shared project also includes common settings that all services share, such as the Akka cluster name and the Kryo serialization config.

Each service exposes an actor that understands all the messages the service can process; this actor represents the microservice API. Internally, the service can have any number of actors to perform its logic. Services run in their own JVMs and join the Akka cluster as nodes with a role that matches the service name. In production, we typically run multiple nodes of different roles per server. When the service JVM starts, a router backed by the service's main actor is created. The number of instances for the router is configurable, which controls the degree of vertical scalability.

For example, when the Collaborative Filtering (CF) service boots, the service is registered with the following code snippet:
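A minimal sketch of such a boot sequence, assuming a main actor class `CfActor` and a shared `ScoreRequest` message (both names are illustrative, not the actual production code), might look like:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical request message, defined in the shared messages project.
final case class ScoreRequest(userId: String, productIds: Seq[String])

// Main actor of the CF service: the whole microservice API in one place.
class CfActor extends Actor {
  def receive: Receive = {
    case req: ScoreRequest =>
      // ... compute personalization scores and reply to sender() ...
      ()
  }
}

object CfServiceBoot extends App {
  // The node joins the cluster with role "cf-service", set in
  // application.conf: akka.cluster.roles = ["cf-service"]
  val system = ActorSystem("recommendation-cluster")

  // The fixed actor name is the discovery contract: other services
  // reach this actor at the well-known path /user/cf-service.
  system.actorOf(Props[CfActor], name = "cf-service")
}
```

The only piece that truly matters here is the fixed actor name, which becomes the service's address within the cluster.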

The most important part is the name of the service, which must be fixed so that other services can discover and communicate with the CF service. For example, for the orchestrator service to communicate with the cfService, it creates a cluster-aware router that points to the cfService by name. Here is a snippet from the code and configuration of the orchestrator service:
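On the code side, a sketch of the router creation (the router name `cfServiceRouter` is an illustrative assumption; the routing details themselves live in configuration) could look like:

```scala
import akka.actor.{ActorRef, ActorSystem}
import akka.routing.FromConfig

object OrchestratorBoot extends App {
  val system = ActorSystem("recommendation-cluster")

  // FromConfig defers all routing details to application.conf; the
  // resulting router is cluster-aware and its routees are the remote
  // /user/cf-service actors on nodes with the matching role.
  val cfServiceRouter: ActorRef =
    system.actorOf(FromConfig.props(), name = "cfServiceRouter")

  // Sending to the router transparently picks a live CF instance:
  // cfServiceRouter ! scoreRequest
}
```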

The orchestrator service's application.conf has the following snippet:
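A representative cluster-aware group-router deployment for this purpose (assuming a router named `cfServiceRouter`; the path and role follow the cf-service naming convention described above) looks like:

```hocon
akka.actor.deployment {
  /cfServiceRouter {
    # Group router: routees are looked up by path on remote nodes,
    # not created by the orchestrator itself.
    router = round-robin-group
    routees.paths = ["/user/cf-service"]
    cluster {
      enabled = on
      # Route only to cluster nodes that joined with this role.
      use-role = "cf-service"
      allow-local-routees = off
    }
  }
}
```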

This works as a very simple yet powerful service-discovery mechanism. The only thing that has to be known in advance and fixed is the service name. The cluster-aware router automatically updates its routee list as nodes join and leave the cluster. Consumers of the service refer to it only by its logical name, without worrying about the number or the addresses of the actual instances of that particular service.

It is critical to test that the correct serializer is used for all messages, which is why our test cases always include a serializer test similar to the following snippet:
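A round-trip test in that spirit, sketched here with ScalaTest (the `ScoreRequest` message and class names are illustrative), could look like:

```scala
import akka.actor.ActorSystem
import akka.serialization.SerializationExtension
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Hypothetical message from the shared messages project.
final case class ScoreRequest(userId: String, productIds: Seq[String])

class SerializationSpec extends AnyFlatSpec with Matchers {
  private val system        = ActorSystem("serialization-test")
  private val serialization = SerializationExtension(system)

  "ScoreRequest" should "use Kryo and survive a round trip" in {
    val msg        = ScoreRequest("user-42", Seq("p1", "p2"))
    val serializer = serialization.findSerializerFor(msg)

    // Fail fast if the message silently falls back to Java serialization.
    serializer.getClass.getName.toLowerCase should include ("kryo")

    val bytes = serializer.toBinary(msg)
    serializer.fromBinary(bytes, Some(msg.getClass)) shouldBe msg
  }
}
```

Catching a fallback to Java serialization in tests is much cheaper than discovering it as a latency regression in production.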

The orchestrator implementation was an interesting challenge because, for each request, it has to call all the other microservices, wait, and combine the results. We did not want to maintain state for all requests in one place, and we also wanted to avoid the Akka ask pattern for performance reasons. The solution was to use the aggregator pattern and spin up an actor per request to run the orchestration logic and hold the state for that single request. This actor also handles microservice failures and timeouts for that particular request. Circuit breakers are injected into this short-lived actor so that failures and timeouts across multiple requests can be handled properly. This pattern worked very well in production: we experienced no memory leaks or GC-related performance issues, despite the fact that at peak demand we create 3,000+ actors per second across the entire Akka cluster!
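The per-request aggregator can be sketched as follows. The message types, the 30 ms deadline, and the degrade-to-partial-results fallback are illustrative assumptions, and the circuit-breaker wiring is omitted for brevity:

```scala
import akka.actor.{Actor, ActorRef, Props, ReceiveTimeout}
import scala.concurrent.duration._

// Hypothetical partial results and final reply.
final case class Products(ids: Seq[String])
final case class Scores(byProduct: Map[String, Double])
final case class Recommendations(productIds: Seq[String])

// One short-lived actor per request: all request state lives here,
// so there is no shared mutable state and no ask() round trips.
class RequestAggregator(client: ActorRef) extends Actor {
  private var products: Option[Products] = None
  private var scores: Option[Scores]     = None

  // Per-request deadline enforced by the actor itself.
  context.setReceiveTimeout(30.millis)

  def receive: Receive = {
    case p: Products    => products = Some(p); maybeFinish()
    case s: Scores      => scores = Some(s); maybeFinish()
    case ReceiveTimeout =>
      // Degrade gracefully: return whatever results arrived in time.
      client ! Recommendations(products.map(_.ids).getOrElse(Nil))
      context.stop(self)
  }

  private def maybeFinish(): Unit =
    for (p <- products; s <- scores) {
      // Rank the candidate products by their personalization score.
      val ranked = p.ids.sortBy(id => -s.byProduct.getOrElse(id, 0.0))
      client ! Recommendations(ranked)
      context.stop(self) // the actor lives for exactly one request
    }
}

object RequestAggregator {
  def props(client: ActorRef): Props = Props(new RequestAggregator(client))
}
```

Because the actor is stopped as soon as it replies (or times out), its state is reclaimed immediately, which is why the high actor-creation rate does not translate into memory pressure.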

Conclusion

The Akka cluster implementation of our microservices architecture worked very well for us in production and enabled us to meet our challenging low-latency and high-throughput SLA. We open-sourced some Akka cluster related utilities that we use in production, which we now call Akka batteries. The microservices architecture simplified our development process because services were developed in parallel by their owners. However, it adds deployment and operational complexity: unlike deploying a single artifact for a monolithic system, we now have many parts to deploy and operate. The next post will cover how we built our CI/CD pipeline with Ansible to automate the deployment and operation of our system.