Saturday, 17 August 2013

mysql connection cost in a simple python script

Using New Relic to profile a website in Django of my company I found out that a non negligible amount of time was spend on mysqldb:connect the application is not using a connection pool.

On my local computer I tested the cost of mysql connections here is the script

import cProfile, pstats, MySQLdb

def connect():
    return MySQLdb.connect('127.0.0.1', 'root', 'root')
        
def close(connect):
    connect.close()
        
def query(connection, query_string):
    cursor = connection.cursor()
    cursor.execute(query_string)
    res = cursor.fetchall()
    cursor.close()
    return res

def do_something_simple_on_result(res):
    for row in res:
        a = row[0].lower()
        
def test_multi_connect():       
    for i in range(0, 10000): 
        conn = connect()
        res = query(conn, "select SQL_NO_CACHE host from mysql.user where user='root'")
        do_something_simple_on_result(res)
        close(conn)
        
def test_one_connect():       
    conn = connect()
    for i in range(0, 10000): 
        res = query(conn, "select SQL_NO_CACHE host from mysql.user where user='root'")
        do_something_simple_on_result(res)
    close(conn)

if __name__ == '__main__':
    cProfile.run('test_multi_connect()', 'restats')
    p = pstats.Stats('restats')
    p.sort_stats('cumulative').print_stats('profile_db.py')
    
    cProfile.run('test_one_connect()', 'restats')
    p = pstats.Stats('restats')
    p.sort_stats('cumulative').print_stats('profile_db.py')

And the result:
         1369839 function calls (1369578 primitive calls) in 17.390 seconds
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.237    0.237   17.390   17.390 profile_db.py:20(test_multi_connect)
    10000    0.045    0.000   10.750    0.001 profile_db.py:3(connect)
    10000    0.111    0.000    5.517    0.001 profile_db.py:9(query)
    10000    0.029    0.000    0.744    0.000 profile_db.py:6(close)
    10000    0.076    0.000    0.110    0.000 profile_db.py:16(do_something_simple_on_result)

         440096 function calls in 4.263 seconds
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.081    0.081    4.263    4.263 profile_db.py:27(test_one_connect)
    10000    0.076    0.000    4.071    0.000 profile_db.py:9(query)
    10000    0.055    0.000    0.083    0.000 profile_db.py:16(do_something_simple_on_result)
        1    0.000    0.000    0.001    0.001 profile_db.py:3(connect)
        1    0.000    0.000    0.000    0.000 profile_db.py:6(close)

Whoo, 66% of the time is spend in opening and closing connection. In simple term opening+closing a connection with mysqldb is taking the same time than 2 simple queries.
If your application is opening and closing a connection for each query you can are maybe trippeling the time spend in your database.

1ms is the average time to connect while of simple query the avg time is 0.5ms

Even if you open only one connection per page view and you are doing 10 queries per page the cost of the connection is still 20% of your total database time.

DB connection pools is the solution (sqlalchemy pool for example)