Analytics

Monday, January 17, 2011

Using systemtap userspace tracing...

At the 27C3 we were running a GSM network and during the preparation I noticed a strange performance problem coming from the database library we are using running. I filled our database with some dummy data and created a file with the queries we normally run and executed time cat queries | sqlite3 file as a mini benchmark. I also hacked this code into our main routine and ran it with time as well. For some reason the code running through the database library was five times slower.

I was a bit puzzled and I decided to use systemtap to explore this to build a hypothesis and to also have the tools to answer the hypothesis. I wanted to find out if if it is slow because our database library is doing some heavy work in the implementation, or because we execute a lot more queries behind the back. I was creating the below probe:


probe process("/usr/lib/libsqlite3.so.0.8.6").function("sqlite3_get_table")
{
a = user_string($zSql);
printf("sqlite3_get_table called '%s'\n", a);
}


This probe will be executed whenever the sqlite3_get_table function of the mentioned library will be called. The $zSql is a variable passed to the sqlite3_get_table function and contains the query to be executed. I am converting the pointer to a local variable and then can print it. Using this simple probe helped me to see which queries were executed by the database library and helped me to do an easy optimisation.

In general it could be very useful to build a set of probes (I think one calls set a tapset) that check for API misusage, e.g. calling functions with certain parameters where something else might be better. E.g. in Glib use truncate instead of assigning "" to the GString, or check for calls to QString::fromUtf16 coming from Qt code itself. On second thought this might be better as a GCC plugin, or both.

3 comments:

milianw said...

I've always wanted to try systemtap. What I don't get in your example: How do you see that a given function is slow? You only print the query - not how long it took to execute. So what did you do?

zecke said...

@milianw: SystemTap provides a 'lot' (not like GNU R) statistics. What should work is to store the pid+fname in a hash and have a probe called on entry and exit of a function.

In my case I saw that for something like SELECT * from SMS; it executed a PRAGMA TABLE_INFO(field_name) for each field of the SMS table. So I knew that a lot more queries were executed compared to the 'cat queries | sqlite3 db' case.

Omar said...

I agree with Zecke, System Tap gives you tons of stats and it's overhead is very negligible. I have been running it on my Linux to track some DVFS system calls and optimize on thermal management. Now i am just wondering how did you use System Tap with android, did you compile it with the kernel or just ran on android emulator?