Tuesday 23 April 2013

How Mysql query cache can kill your perf

query_cache_size is a tricky tricky variable to tune.

Here is how the Query cache works in 4 lines:
  • For every SELECT made MySQL looks into the cache if there is an entry, if an entry is found it is returned immediately
  • If there is no entry for a query the string of the SELECT is store with the list of tables involved and the result
  • For every Update/Insert/Delete on a table MySQL delete from the cache all the queries that use this table
  • When a thread is looking for something in the cache nobody else can

 I have seen that the cost of those search if you have query_cache_size > 0 is around 15% more cpu but it can be a LOT more than that
In my case the query_cache_size was set to 512Mb and was not a problem for a month, then the system reach a critical point where very small queries were spending 20x more time waiting for their turn to look into the cache than executing the query itself even if the query was in the cache.

The system is a Sugarcrm database for a big company, between 9am and 5pm there is around 1M database opp per hour. If we speak in seconds it is around 300 selects/s and  17updates/s. The problem is that with 17 updates per second the cache is invalidated almost immediately from a front end perspective.
I changed this value to 16Mb yesterday and on with more opp/h the server is using 3 times less CPU, 100% to 30% !! The avg response time of the front-end web pages dropped from 1500ms to 300ms the result is people were spending less time waiting and more time working, it is why the number of opp raise on this day.
I will continue to monitor the CPU use and the avg response time and next week I will try to turn the query cache completely of to see if there is any other gains.

Conclusions:
MySQL query cache is really fantastic for server with long selects on a database not often updated, but in this case most of the time it is better to do this caching on an application layer with memcache for example.
MySQL query cache is horrible if you have a high concurrency system with a lot of update, in this case you can test with a small value 16Mb for example or with no cache at all.

1 comment: