One idea keeps coming back to me: how could libmemusage.so, memprof's libmemintercept and memprof itself be merged into one tool that provides more and better results? For example, sometimes you want to trace and capture the profile at the moment the heap reaches its maximum, or you want a histogram of allocations. It is possible to write C code for each of these and integrate it into one tool. On the other hand, with a technology like DTrace one can easily write the histogram generation, profiling, etc. as a trace script.
So what can we do on Linux? The closest thing is SystemTap. You write a trace script, which gets compiled into a kernel module that is then loaded and does the probing. In theory one can even trace userspace with it.
The only problem is that SystemTap is not ready yet. To do really useful things one has to patch the kernel with the utrace patch, and Ubuntu/Debian do not ship a recent enough elfutils. So it will probably take another year until memprof can be a simple GUI around SystemTap and similar tools, and probably two years until most distros ship a ready-to-use SystemTap.
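To give a feel for the "write C code for that" route, here is a minimal sketch of an allocation histogram. It assumes a hypothetical counting_malloc() wrapper that callers use explicitly; a real tool such as libmemusage or libmemintercept would interpose malloc() itself via LD_PRELOAD. Request sizes are bucketed by powers of two, which is only one possible choice of histogram.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Bucket 0 counts requests of 0 or 1 byte; bucket i (i >= 1) counts sizes
 * in [2^i, 2^(i+1)); the last bucket also absorbs everything larger. */
#define HIST_BUCKETS 32

static size_t size_histogram[HIST_BUCKETS];

/* floor(log2(size)), clamped to the last bucket. */
static unsigned bucket_for(size_t size)
{
    unsigned b = 0;
    while (size > 1 && b < HIST_BUCKETS - 1) {
        size >>= 1;
        b++;
    }
    assert(b < HIST_BUCKETS);
    return b;
}

/* Hypothetical wrapper: records the request size, then calls malloc(). */
static void *counting_malloc(size_t size)
{
    size_histogram[bucket_for(size)]++;
    return malloc(size);
}

/* Print every non-empty bucket as "lower bound: count". */
static void print_histogram(FILE *out)
{
    for (unsigned i = 0; i < HIST_BUCKETS; i++) {
        if (size_histogram[i] == 0)
            continue;
        size_t lower = i == 0 ? 0 : (size_t)1 << i;
        fprintf(out, ">= %zu bytes: %zu\n", lower, size_histogram[i]);
    }
}
```

Calling print_histogram(stderr) at exit would then show how many requests fell into each size class. With DTrace or SystemTap the same aggregation would be a few lines of trace script instead of code compiled into the tool.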
Friday, October 30, 2009
Monday, October 26, 2009
memprof 0.6.2 release
Today I released memprof 0.6.2. The most prominent changes are the merge of raster's timegraph for memory allocations and fixes for various stability bugs introduced after 0.6.0. The code is currently hosted on gitorious, the release tarball is here, and the shortlog can be seen below:
Cristi Magherusan (2):
some other minor changes, mostly guint -> gsize's
fixed a typo, bug #51556 in the gnome bugzilla
Holger Hans Peter Freyther (10):
mi-perfctr.c: Remove the O_CREAT (from the openSUSE buildservice)
memprof.glade: Open and save the file
Provide a GtkFileChooseButton to select the executable.
merge rasterman's extra window
.gitignore: Ignore generated files
process_find_line: Clarify who is owning the returned pointer
detailwin.c: Fix possible crash when opening the maps file fails
process_locate_symbol: Make sure a valid string is always returned
add_leaf_to_tree: Avoid running into a crash
memprof release 0.6.2
Stefan Schmidt (2):
configure.in: Use AM_SILENT_RULES if available
stack-frame: Introduce macros for stack pointer regs and use them.
Tomasz Mon (2):
configure.in: Search for bfd.h provided by binutils development package
Integrate the detailwin into the main GtkNotebook
William Pitcock (1):
use elf_demangle() in more places
Monday, October 12, 2009
What is the size of a QList::Data, RenderObject?
We tend to write classes without really caring about what the compiler will do when creating the binary. When looking into performance, and especially into the memory usage of objects that are created thousands of times, it becomes interesting how much memory is wasted on padding for no good reason.
The Linux kernel hackers wrote a tool called pahole that analyzes the DWARF2 debugging information and spits out friendly messages like the one below.
struct Data {
    class QBasicAtomicInt ref; /* 0 4 */
    int alloc; /* 4 4 */
    int begin; /* 8 4 */
    int end; /* 12 4 */
    uint sharable:1; /* 16:31 4 */
    /* XXX 31 bits hole, try to pack */
    void * array[1]; /* 20 4 */
    /* size: 24, cachelines: 1, members: 6 */
    /* bit holes: 1, sum bit holes: 31 bits */
    /* last cacheline: 24 bytes */
};
In this case QList::Data could have used at least three bytes less memory, and changing the definition of sharable and array would have removed a hole in the struct. Maybe that is something for Qt5 to keep in mind.
The research question: can QtWebKit's memory usage be reduced by shrinking some of the Qt structs without losing functionality?
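Where pahole is not at hand, offsetof() and sizeof() give a quick cross-check of such a layout. The sketch below uses a hand-written mirror of the struct, assuming QBasicAtomicInt can be modelled as a plain 4-byte int (pahole reports it as 4 bytes). On 32-bit x86 the numbers should match the pahole output above (size 24); on x86-64 the 8-byte alignment of the pointer adds another 4-byte hole before array, giving 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hand-written mirror of the QList::Data layout reported by pahole.
 * Assumption: QBasicAtomicInt is modelled as a plain 4-byte int. */
struct Data {
    int ref;
    int alloc;
    int begin;
    int end;
    unsigned int sharable : 1;
    void *array[1];
};

/* Bytes of padding between the storage unit holding 'sharable' and
 * 'array'. Assumes, as GCC does, that the bit-field occupies its own
 * unsigned int right after 'end'. */
static size_t padding_before_array(void)
{
    size_t after_bitfield =
        offsetof(struct Data, end) + sizeof(int) + sizeof(unsigned int);
    assert(offsetof(struct Data, array) >= after_bitfield);
    return offsetof(struct Data, array) - after_bitfield;
}

static void print_layout(void)
{
    printf("ref   at %zu\n", offsetof(struct Data, ref));
    printf("alloc at %zu\n", offsetof(struct Data, alloc));
    printf("begin at %zu\n", offsetof(struct Data, begin));
    printf("end   at %zu\n", offsetof(struct Data, end));
    printf("array at %zu (%zu padding bytes before it)\n",
           offsetof(struct Data, array), padding_before_array());
    printf("sizeof(struct Data) = %zu\n", sizeof(struct Data));
}
```

Note that the 31-bit hole pahole reports lives inside the unsigned int holding sharable, so it does not show up in padding_before_array(); only pahole's bit-level view reveals it.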
Friday, October 02, 2009
Memory profiling on GNU systems
This is a small guide on how to observe the memory allocations of a process. When making a change, it is not only of interest whether all test cases still pass and whether the benchmarks got faster; it is also important to find out whether the storage (stack and RAM) requirements changed.
If you are using GNU libc, it is likely that you have /lib/libmemusage.so installed on your system. This library can be preloaded using LD_PRELOAD and intercepts calls to malloc, free, realloc and various other functions. In short, it traces memory allocations for you. The limitation of this tool is that it tells you neither how much memory the kernel actually mapped nor anything about memory fragmentation.
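For the kernel's side of the picture, a process can ask for its own numbers. This is a minimal Linux-specific sketch, assuming you can add a few lines to the program under test: getrusage() reports the peak resident set size (ru_maxrss, in kilobytes on Linux), and /proc/self/status exposes fields such as VmPeak (peak virtual size) and VmHWM (peak resident size). The helper names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Peak resident set size of this process in kilobytes (Linux reports
 * ru_maxrss in KiB), or -1 if getrusage() fails. */
static long peak_rss_kb(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}

/* Value in kB of a field such as "VmPeak" or "VmHWM" from
 * /proc/self/status, or -1 if the file or field is missing. */
static long proc_status_kb(const char *key)
{
    assert(key != NULL);
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    size_t len = strlen(key);
    long value = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "VmPeak:\t  123456 kB". */
        if (strncmp(line, key, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

Printing peak_rss_kb(), proc_status_kb("VmPeak") and proc_status_kb("VmHWM") just before exit, once before and once after a change, gives a rough comparison of what the kernel actually had to provide, which libmemusage alone cannot show.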
To use libmemusage, all you have to do is prepend MEMUSAGE_OUTPUT=mytrace and LD_PRELOAD=/lib/libmemusage.so to the command line of your application. This instructs the library to write a trace to the file mytrace.
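If you want a small program to try this on first, the following sketch (a made-up workload, not part of memprof or glibc) allocates blocks of growing size, frees half of them, recurses with a 1 KiB stack buffer per frame, and then frees the rest, so both the heap and the stack curve should visibly move.

```c
#include <assert.h>
#include <stdlib.h>

/* Recurse 'depth' levels with a 1 KiB buffer per frame so that the stack
 * curve rises; volatile keeps the compiler from dropping the buffers.
 * Returns depth + (depth - 1) + ... + 0. */
static unsigned long use_stack(unsigned depth)
{
    assert(depth <= 255); /* buf stores depth as an unsigned char */
    volatile unsigned char buf[1024];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)depth;
    unsigned long here = buf[0];
    return depth == 0 ? here : here + use_stack(depth - 1);
}

/* Allocate 'count' blocks of growing size, free every other block, touch
 * the stack, then free the rest. Returns how many malloc() calls succeeded. */
static size_t run_workload(size_t count)
{
    void **blocks = calloc(count, sizeof *blocks);
    if (blocks == NULL)
        return 0;

    size_t ok = 0;
    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc((i + 1) * 64);
        if (blocks[i] != NULL)
            ok++;
    }
    /* Free the even-indexed blocks: the heap curve drops partway. */
    for (size_t i = 0; i < count; i += 2) {
        free(blocks[i]);
        blocks[i] = NULL;
    }
    (void)use_stack(64); /* about 64 frames of 1 KiB each */
    /* Free the remaining odd-indexed blocks. */
    for (size_t i = 1; i < count; i += 2)
        free(blocks[i]);
    free(blocks);
    return ok;
}
```

Call run_workload(1000) from main(), build with -O0 so the recursion is not optimized into a loop, and run it as MEMUSAGE_OUTPUT=mytrace LD_PRELOAD=/lib/libmemusage.so ./workload (the library path may differ on your distribution).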
This trace file can be converted to a graph using the memusagestat utility. It is not installed by most GNU distributions and can be built either from the glibc sources or from the QtWebKit performance measurement utilities. Using
memusagestat -o output.png mytrace, an image showing memory allocations and stack usage, like the one at the end of this post, will be created. The red line is the heap usage and the green line is the stack usage of the application. The x-axis is the number of allocations.
NOTES: As of today, creating the graph with time on the x-axis is broken because the generated trace file has some issues. I'm looking into the problem, but it will take a few more days.
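The red heap line can be thought of as a running total over the traced calls. The sketch below uses a hypothetical array of signed byte deltas (positive for an allocation, negative for a free) rather than libmemusage's actual trace format; it computes that curve with the event index as the x-axis and also reports where the heap peaks, which is the moment one would like a profile of.

```c
#include <assert.h>
#include <stddef.h>

/* deltas[i] > 0 models a malloc of that many bytes, < 0 a free.
 * curve[i] receives the heap usage after event i; the peak value is
 * returned and its event index stored in *peak_at. */
static long heap_curve(const long *deltas, size_t n, long *curve,
                       size_t *peak_at)
{
    long current = 0;
    long peak = 0;
    *peak_at = 0;
    for (size_t i = 0; i < n; i++) {
        current += deltas[i];
        assert(current >= 0); /* cannot free more than was allocated */
        curve[i] = current;
        if (current > peak) {
            peak = current;
            *peak_at = i;
        }
    }
    return peak;
}
```

For the events {100, 50, -100, 200, -250} the curve is 100, 150, 50, 250, 0, with the peak of 250 bytes at event 3.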
