Caching: Popular Channels

We keep track of subscriptions to channels; it’s how we tell what channels are popular and what channels are similar. It’s also the largest table in our database, and accessing it has gotten increasingly expensive. To find out what channels are the most popular, we have to count through all those records. Today, I rolled out some caching to fix that.
Each channel has three subscription counts: one for all time, one for this month, and one for today. Each of these values is stored in the cache and retrieved when needed. If a value for a channel isn’t found, it’s calculated and placed into the cache. When someone subscribes to a channel, those values are incremented.
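To make that concrete, here is a minimal sketch of the read-through-and-increment scheme. It is in Python and is not the site's actual code: the `SubscriptionCounts` class, the `count_from_db` callback, and the in-memory dict standing in for the cache server are all assumptions for illustration.

```python
PERIODS = ("all", "month", "today")


class SubscriptionCounts:
    """Read-through cache of per-channel subscription counts (illustrative)."""

    def __init__(self, count_from_db):
        # count_from_db(channel_id, period) is a hypothetical callback that
        # runs the expensive COUNT over the subscriptions table.
        self._count_from_db = count_from_db
        self._cache = {}  # stand-in for a real cache server

    def get(self, channel_id, period):
        key = (channel_id, period)
        if key not in self._cache:
            # Cache miss: compute once and store the result.
            self._cache[key] = self._count_from_db(channel_id, period)
        return self._cache[key]

    def on_subscribe(self, channel_id):
        # Bump only the values already cached. A missing value will be
        # computed from the database on its next read, which already
        # includes the new subscription.
        for period in PERIODS:
            key = (channel_id, period)
            if key in self._cache:
                self._cache[key] += 1
```

Incrementing only the entries that are already cached is one possible choice; it avoids double counting a subscription when a value is later filled from the database.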
The change that’s controversial (at least between BDK and me) is how the today and month values are calculated. They used to be calculated over the past 24 hours and the past 31 days, respectively. But a rolling window moves with every request, so the values need to be recalculated much more frequently. The new code calculates them from the start of today and the start of the month instead. This means that the values will be 0 at the start of the day or month. BDK thinks that this is a big deal; I do not.
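The difference between the two window styles is easiest to see side by side. This is a sketch in Python with made-up function names (`rolling_start`, `calendar_start`), not the production code: the rolling start changes with every call, while the calendar-aligned start stays fixed until midnight or the first of the month, so a cached count plus increments stays valid until then.

```python
from datetime import datetime, timedelta


def rolling_start(now, period):
    """Old behavior: the window slides along with the current time."""
    if period == "today":
        return now - timedelta(hours=24)
    if period == "month":
        return now - timedelta(days=31)
    raise ValueError(f"unknown period: {period}")


def calendar_start(now, period):
    """New behavior: the window is pinned to the start of the day or month."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "today":
        return midnight
    if period == "month":
        return midnight.replace(day=1)
    raise ValueError(f"unknown period: {period}")


morning = datetime(2024, 3, 15, 9, 0)
evening = datetime(2024, 3, 15, 17, 30)

# Calendar-aligned boundaries agree all day long; rolling ones never do.
assert calendar_start(morning, "today") == calendar_start(evening, "today")
assert rolling_start(morning, "today") != rolling_start(evening, "today")
```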
One idea I had: if the top-n results (say, the top 10) are all 0, return the values for the previous time period instead. This would keep the display from ever showing all zeros while keeping the efficiency.
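A rough sketch of that fallback, again in Python and purely illustrative: `top_channels` and its two dict arguments (channel name to count for the current and previous period) are hypothetical, and nothing like this has been rolled out.

```python
def top_channels(current, previous, n=10):
    """Return the top-n (channel, count) pairs, falling back to the
    previous period when every current top-n count is 0."""

    def top(counts):
        # Highest counts first; ties keep their original order.
        return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

    ranked = top(current)
    if all(count == 0 for _, count in ranked):
        # Start of the day or month: nothing counted yet, so show the
        # previous period's ranking instead of a list of zeros.
        return top(previous)
    return ranked
```

As soon as a single channel in the top n picks up a subscription, the current period's ranking is used again.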
Update: The old miss ratio was roughly 80%. Now with this new caching, it’s only 6.5%.
