Distributed Cache in SaaS


There are two areas where a cache can be very handy for a web application.

The first, obviously, is performance. A distributed cache is generally used to improve performance, which is essential for any application that demands minimal response times even under peak load. The gain comes from the huge difference in latency between fetching data from a CPU register, a CPU cache, RAM, disk, and the network.

Below is a graphic that depicts these latencies.

[Image: distributed-cache-in-saas]

The second area is scalability. Database operations are generally costly, and scaling the database is not as easy as scaling other tiers such as the web or application tier. Placing a caching layer in the middle lessens the load on the database, since results that are already cached can be returned directly from the cache instead of requiring a round trip to the database. Of course, how well this strategy works depends on the nature and volatility of the data.
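The read path described above is often called the cache-aside pattern. The following is a minimal sketch of it, not a definitive implementation: `InMemoryCache` is a hypothetical stand-in for a real distributed cache client, and `get_with_cache`, `load_from_db`, and the TTL value are names and parameters assumed for illustration only.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a distributed cache client (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The entry has expired: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (value, self._clock() + ttl_seconds)


def get_with_cache(
    cache: InMemoryCache,
    key: str,
    load_from_db: Callable[[str], Any],
    ttl_seconds: float = 60.0,
) -> Any:
    """Cache-aside read: try the cache first, go to the database only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)            # the costly round trip
        cache.set(key, value, ttl_seconds)   # later reads are served from the cache
    return value
```

The TTL is where the volatility of the data comes in: slowly changing data can tolerate a long TTL, while volatile data needs a short one or explicit invalidation. Note that this sketch treats a stored `None` as a miss, so it does not cache "not found" results.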

Yet another use of a cache for scalability is storing user session information. This removes the need for sticky sessions or session replication across the nodes of a load-balanced cluster. Some platforms, such as Magento and WordPress, have used this technique to strengthen their offerings.
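A minimal sketch of the session idea follows, under stated assumptions: a plain dict stands in for the shared cache, and `SharedSessionStore`, its methods, and the `session:` key prefix are hypothetical names, not the API of any particular framework. The point is only that any web node can read a session created by any other node.

```python
import json
import secrets
from typing import Dict, Optional


class SharedSessionStore:
    """Keeps sessions in a shared cache instead of in one web node's memory.

    `backend` is a plain dict standing in for the distributed cache
    (assumption); a real deployment would pass a cache client instead.
    """

    def __init__(self, backend: Dict[str, str]) -> None:
        self._backend = backend

    def create(self, user_id: str) -> str:
        session_id = secrets.token_urlsafe(16)
        # Serialize the session so it can live outside the process.
        self._backend["session:" + session_id] = json.dumps({"user_id": user_id})
        return session_id

    def load(self, session_id: str) -> Optional[dict]:
        raw = self._backend.get("session:" + session_id)
        return json.loads(raw) if raw is not None else None


# Two web nodes behind a load balancer share one cache.
shared_cache: Dict[str, str] = {}
node_a = SharedSessionStore(shared_cache)
node_b = SharedSessionStore(shared_cache)

session_id = node_a.create("alice")           # login handled by node A
session_seen_by_b = node_b.load(session_id)   # next request lands on node B
```

Because the session lives in the shared store, the load balancer is free to route each request to any node.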

The SaaS context

One obvious requirement for a SaaS application is higher availability, given the diversity of the tenants and of the situations in which the software is used.
Loads also tend to be higher in a SaaS application than in a single-enterprise solution, because there are multiple tenants.
In a SaaS scenario, a distributed cache becomes important both from an availability perspective and to cope with the volume of data. If one node of the cache cluster goes down, it does not bring down the system. Hence a distributed cache fits the requirements better.
A pertinent question that one might ask is, ‘Wouldn’t the network latency involved in writing to or reading from a remote store cancel out the time saved by using a cache at all? After all, cache is a dish best served local.’ The answer depends on the bandwidth and latency of the network. If the internal network is capable, the network latency is minimal compared to the disk I/O of the database.
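On the availability point above, one way an application can make sure a cache outage does not bring down the system is to treat an unreachable cache as a miss and fall back to the database. This is a minimal sketch; `CacheUnavailable`, `read_through`, and the two callables are hypothetical names introduced for illustration, and real cache clients raise their own exception types.

```python
from typing import Any, Callable, Optional


class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when no node can be reached."""


def read_through(
    cache_get: Callable[[str], Optional[Any]],
    db_get: Callable[[str], Any],
    key: str,
) -> Any:
    """Serve from the cache when possible; never fail just because the cache did."""
    try:
        cached = cache_get(key)
    except CacheUnavailable:
        cached = None  # degrade gracefully: behave as if the key was not cached
    if cached is not None:
        return cached
    return db_get(key)
```

The request becomes slower while the cache is unavailable, but it still succeeds.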

Below are some of the advantages of using a distributed cache:

  • Distributed caches quite often expose a REST API to client applications, so multiple applications can use a centralized cache cluster in a consistent fashion.
  • The cache cluster can be maintained easily, without worrying about app server nodes failing.

Often, for security and reporting reasons, the data stores are modeled on a per-tenant basis, while the web layer is transparent and homogeneous across tenants.

What data can we cache?

Immutable data, i.e. data that does not change, is the best candidate for caching.

  • Master data of tenants is a good candidate to be pushed into a cache.
  • Users most often deal with the same data on a regular basis, so a lot of information has a certain ‘local’-ness about it. Usually it is user-specific or process-specific. For example, a user’s timesheet will usually contain the projects the user is working on, or the set of VMs or container instances the user works with. This type of data is a potential candidate for caching because it changes infrequently.
  • Data related to authorization can be cached, but due attention should be paid to the security aspects. RBAC policies that govern the access of tenants and their users to specific functionality can be cached effectively, as these definitions seldom change once they are defined. Authorization is a frequent and often compute-intensive operation in a SaaS application, so reading these policies from the database on every check can degrade performance through disk I/O.
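The authorization case above can be sketched as follows. This is an illustration under assumptions, not a prescribed design: `tenant_key`, `RolePermissionCache`, the key layout, and the `load_permissions` callable are all hypothetical. Prefixing every key with the tenant ID keeps one tenant's entries from colliding with another's in a shared cache, and `invalidate` covers the rare occasions when a policy does change.

```python
from typing import Callable, Dict, FrozenSet


def tenant_key(tenant_id: str, kind: str, name: str) -> str:
    """Prefix every key with the tenant so entries never collide across tenants."""
    return f"tenant:{tenant_id}:{kind}:{name}"


class RolePermissionCache:
    """Caches the permission set of each (tenant, role) pair.

    A plain dict stands in for the distributed cache (assumption).
    `load_permissions` represents the costly database lookup.
    """

    def __init__(self, load_permissions: Callable[[str, str], FrozenSet[str]]) -> None:
        self._load = load_permissions
        self._cache: Dict[str, FrozenSet[str]] = {}

    def is_allowed(self, tenant_id: str, role: str, permission: str) -> bool:
        key = tenant_key(tenant_id, "role", role)
        if key not in self._cache:
            # First check for this tenant and role: read the policy once.
            self._cache[key] = self._load(tenant_id, role)
        return permission in self._cache[key]

    def invalidate(self, tenant_id: str, role: str) -> None:
        """Call when an administrator edits the role's policy."""
        self._cache.pop(tenant_key(tenant_id, "role", role), None)
```

Every authorization check after the first is served from the cache until the policy is invalidated.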

Distributed Cache

Distributed caching has become feasible because main memory has become very cheap and network cards have become very fast, with 1 Gbit now standard everywhere and 10 Gbit gaining traction. A distributed cache also works well on the lower-cost machines usually employed for web servers, as opposed to database servers, which require expensive hardware. Some of the available distributed caches are:

  • Redis
  • memcached
  • NCache
  • Hazelcast
  • Infinispan
  • Coherence
  • Ehcache

Cache-in the Cloud for Cash in the pocket

There are many reasons to architect a solution as a SaaS offering. The primary among them are:

  • Capex vs Opex
  • Easier maintainability
  • Faster customer on-boarding

The above factors can affect the rate at which customers are acquired for a particular SaaS solution. If the SaaS solution is hosted in a private cloud (the data center), the infrastructure limits may be reached quickly. A public cloud may therefore be the better choice if adoption is rapid and many customers are lined up to get on board. In such a situation, it is better to simplify the SaaS offering by delegating the caching-related nuances to a cache-as-a-service offered by the public cloud providers. AWS provides ElastiCache and Azure offers its Redis Cache in this category. They provide default settings for the cache service and management consoles for managing the distributed cache. It is possible to grow the cache from a few MB to a few TB using these services. As with any public cloud offering, it is possible to deallocate nodes in the cache cluster via their API once a seasonal spike has been successfully navigated, thereby reducing infrastructure costs.

With a Converged Orchestration framework like Corestack it is possible to seamlessly manage the scale-up and scale-down of the nodes in the cache cluster (private or public service) in real-time. Ping us for your orchestration needs and we would be glad to help you with our secret sauce!


Distributed caches are most useful for improving performance when there are a large number of objects, or a large number of concurrent users repeatedly accessing some immutable objects. In a system where only a few types of objects exist in significant numbers (probably tens of thousands), a distributed cache might not improve throughput much.
The single most burdensome overhead of adding a cache layer to any application, SaaS or otherwise, is that the data store is no longer the single source of truth. There is a penalty to be paid in the complexity of the system: the persistent data store and the cache need to be kept in sync. This entails revisiting existing services and modifying their business logic. If the DAO pattern was used, it can help to a large extent by containing the added complexity in one isolated place, so that it does not creep across the system.
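As a rough illustration of how a DAO can keep the synchronization logic in one place, the sketch below hides the cache entirely from callers. The class `ProjectDao`, its methods, and the key format are hypothetical, and plain dicts stand in for both the database and the cache. It uses one common strategy, invalidating the cached entry on write; updating the cache on write is an alternative.

```python
from typing import Dict, Optional


class ProjectDao:
    """DAO that owns both the database access and the cache bookkeeping.

    `database` and `cache` are plain dicts standing in for the real stores
    (assumption). Callers only ever see find() and save().
    """

    def __init__(self, database: Dict[int, str], cache: Dict[str, str]) -> None:
        self._db = database
        self._cache = cache

    @staticmethod
    def _key(project_id: int) -> str:
        return f"project:{project_id}"

    def find(self, project_id: int) -> Optional[str]:
        key = self._key(project_id)
        if key in self._cache:
            return self._cache[key]
        name = self._db.get(project_id)
        if name is not None:
            self._cache[key] = name  # populate the cache for later reads
        return name

    def save(self, project_id: int, name: str) -> None:
        # Write to the system of record first, then drop the cached copy
        # so the next find() reloads the fresh value.
        self._db[project_id] = name
        self._cache.pop(self._key(project_id), None)
```

Services that call `find` and `save` need no changes when the caching strategy changes, which is the isolation the DAO pattern offers here.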