Scaling Ruby on Rails by Caching Your Database Queries
It’s great to use Active Record in Ruby on Rails: relationships, find_by methods, where queries, everything makes your life much simpler as a developer. But sooner or later your application becomes slow, and at that point you need to work on optimizations. There are a lot of ways to optimize, and one of them is caching your queries. The company I work for is StudyPad, and our product SplashMath has over 9 million users across multiple platforms. The amount of traffic we get is pretty huge, and caching is one of the things that helped us solve our scaling problems. Here is a quick idea for caching your queries in Memcached.
Basic requirements
There are a few requirements for this: the dalli gem, and Memcached installed on your machine.
Basic structure
We have a lot of static/seed data that changes very rarely but is queried very often, so the first thing is to handle that static data.
Getting the skills for a grade is pretty simple:
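As a sketch (the model and column names here are illustrative, not the actual SplashMath schema):

```ruby
# Illustrative models: a grade has many skills.
class Grade < ActiveRecord::Base
  has_many :skills
end

class Skill < ActiveRecord::Base
  belongs_to :grade
end

# Fetching the skills for a grade hits the database:
grade = Grade.first
grade.skills
```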
Now, every time we fetch skills for a grade, it's going to hit the database to get the results. So it's time to cache this query and its results.
Assuming you have the dalli gem configured and Memcached up and running, here is a simple method to cache the results.
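A minimal sketch of such a method, assuming dalli is set as the Rails cache store (the method name `cached_skills` matches the one used later in this post; the 240-hour expiry is discussed below):

```ruby
class Grade < ActiveRecord::Base
  has_many :skills

  def cached_skills
    # Fetch from the cache if present; otherwise run the query,
    # store the result, and return it.
    Rails.cache.fetch([self.class.name, id, :skills], expires_in: 240.hours) do
      skills.to_a
    end
  end
end
```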
So instead of fetching skills with `grade.skills`, we are going to fetch them with `grade.cached_skills`.

What does it do? The first time, it fetches the results from the database and stores them in the cache; every time after that, it returns the results straight from the cache. Note a few things here:
- Understand the pattern of the key: `[self.class.name, id, :skills]` is your key here.
- The cache will expire in 240.hours. You can customize this as per your needs; it's better to keep it as a constant somewhere in your application.
- In the cached_skills method we cache the records, not the ActiveRecord relation. That's why we convert the result to an array with to_a; otherwise the lazy relation object would be cached and the database query would still be executed on every call.
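The fetch-or-compute behaviour behind Rails.cache.fetch can be sketched in plain Ruby, with a Hash standing in for Memcached (no Rails needed; expiry is omitted for brevity):

```ruby
# A Hash standing in for Memcached.
CACHE = {}

def fetch(key)
  # Return the cached value if the key exists; otherwise run the
  # block, store its result under the key, and return it.
  return CACHE[key] if CACHE.key?(key)
  CACHE[key] = yield
end

# First call runs the block (the "database query") and stores the result.
first = fetch(["Grade", 1, :skills]) { ["Addition", "Subtraction"] }

# Second call returns the cached value; the block never executes.
second = fetch(["Grade", 1, :skills]) { raise "should not recompute" }
```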
Expiring the cache.
We are caching the query results, but we are not expiring them. What if some skill changes? The grade object doesn't get any notification of that, so the cache goes stale and we need to expire it. We can write an after_commit hook on Skill to expire its grade object's cache:
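A sketch of that hook (the callback method name is illustrative; the key must match the one written by cached_skills):

```ruby
class Skill < ActiveRecord::Base
  belongs_to :grade

  # Runs after every create, update, or destroy commits.
  after_commit :expire_grade_skills_cache

  private

  # Delete the exact key that Grade#cached_skills writes.
  def expire_grade_skills_cache
    Rails.cache.delete(["Grade", grade_id, :skills])
  end
end
```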
This is enough to make sure your cache is never stale. There is another way to handle cache expiry. Let's see that.
Another way
We redefine the models like this:
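A sketch of the redefined models:

```ruby
class Grade < ActiveRecord::Base
  has_many :skills
end

class Skill < ActiveRecord::Base
  # touch: true updates the parent grade's updated_at timestamp
  # whenever a skill is created, updated, or destroyed.
  belongs_to :grade, touch: true
end
```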
Note that we have added touch: true to the belongs_to :grade association in Skill, and now we redefine our cached_skills method again:
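The only change is that updated_at becomes part of the cache key:

```ruby
class Grade < ActiveRecord::Base
  has_many :skills

  def cached_skills
    # Touching the grade changes updated_at, which changes the key,
    # so the stale entry is simply never read again.
    Rails.cache.fetch([self.class.name, id, updated_at, :skills], expires_in: 240.hours) do
      skills.to_a
    end
  end
end
```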
With this caching we don't need to expire the cache manually. Whenever a skill is updated, it touches its parent grade object, which updates its updated_at value, and the old cache entry will simply never be used again, since the key attribute updated_at has changed.
The problem with the second approach
But there is a problem. Assume you have 10 different has_many relationships on Grade and you are caching them all. Now, every time a skill changes, all the other cache keys for the grade's relationships become useless too. For example, Grade has_many :topics:
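A sketch of such a second cached relationship (Topic and cached_topics are illustrative names):

```ruby
class Grade < ActiveRecord::Base
  has_many :skills
  has_many :topics

  def cached_topics
    # Because updated_at is in this key too, a touch triggered by
    # any skill change also invalidates the topics cache.
    Rails.cache.fetch([self.class.name, id, updated_at, :topics], expires_in: 240.hours) do
      topics.to_a
    end
  end
end
```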
In this case, changing any skill will make the topics cache useless as well, which is not the case when you expire it manually. So both approaches have pros and cons: the first asks you to write more code, and the second expires the cache more frequently. You have to make that choice as per your needs.
What else?
Using the same basic principle, you can cache a lot of other queries, for example:
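A couple of hypothetical examples following the same key pattern (method names and queries are illustrative):

```ruby
class Grade < ActiveRecord::Base
  has_many :skills

  # Cache an aggregate instead of a record list.
  def cached_skills_count
    Rails.cache.fetch([self.class.name, id, :skills_count], expires_in: 240.hours) do
      skills.count
    end
  end

  # Cache a single-record lookup.
  def cached_first_skill
    Rails.cache.fetch([self.class.name, id, :first_skill], expires_in: 240.hours) do
      skills.first
    end
  end
end
```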
This approach helped us reduce the load on RDS and made things pretty fast. I hope it helps you too. Let me know your feedback, or share some tips that made your system faster.