Monday, July 27, 2009

python profiling and optimization

python's builtin cProfile/pstats are great, and it's really easy to get started with them (just look at the top of the doc page for cProfile). one problem, though, is that cProfile doesn't do line-by-line profiling. hotshot did, but it's deprecated and arguably harder to work with. a little poking around turned up line_profiler and kernprof, which look like the pet project of one of the enthought people. i haven't tried them yet, but they look pretty easy to use, even through the api as part of a larger development tool.

i'm thinking the best way to use them would be to run cProfile first, take the top 3 time-hog functions (or just 1 or 2 if they account for over 90% of the total time), and automatically run again with line_profiler turned on for those functions (see the sketches below). maybe i could build this into my doctester: first run the small, fast doctests to confirm correctness, then run some __profileTests__ module string that has more realistic usages. while i'm writing code i can interrupt the profile tests to rerun the doctests, and i'll see the profile results any time i happen to let them run all the way through. kernprof is pure python and acts as a helper/wrapper for either or both of cProfile and line_profiler; line_profiler itself uses cython.

speaking of cython: for optimization, cython looks like an impressive way to generate c source and build it to run fast (and to obfuscate, if that matters). i might look at it as a possible alternative to pypy/rpython, if i can't get that to work. one worry is that numpy code might need so many changes and type annotations that i might as well just write the c. i've been very impressed lately with the results i get from scipy.weave, with both the blitz converter and the default c-ish converter: i can narrow the hotspots down to one or two methods, and translating the python to c is not too hard from there (especially with the default converter). scipy.weave.inline is not too shabby either for speeding up numpy expressions. pypy is another very interesting possibility for autogenerating c from python; someone has already shown an example of how to do this, including some initial work with numpy.

EDIT: maybe using cython is not as bad as i had thought. apparently it will compile just about any valid python (with a couple of restrictions on closures) for a modest speedup; all the extra syntax is just for making it run faster, and even those annotations can be done in a way that preserves python compatibility of the original source. maybe i can even look at multiple-dispatch packages to build an auto-type-discovery step, based on doctests and profile tests, that writes my augmenting .pxd files for me. probably the first thing i should do is try to replicate some of the speed-up results from the cython numpy tutorial. there's also another tutorial with a convolution example (similar to chapter 10 in the manual, but slightly updated, including a warning about None args and calling numpy c functions from cython) that might be a better place to start.
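for that first cProfile pass, i'm picturing something like this (a rough sketch: main() stands in for whatever entry point is being profiled, and fcn_list is technically a pstats internal, though it's been there forever):

    import cProfile, pstats

    # profile the whole run and dump the raw stats to a file
    cProfile.run('main()', 'prof.out')

    # report the worst offenders
    p = pstats.Stats('prof.out')
    p.sort_stats('cumulative').print_stats(3)

    # the same data is available programmatically, e.g. to decide
    # which functions to hand off to line_profiler next
    p.sort_stats('time')
    hogs = p.fcn_list[:3]    # (filename, lineno, funcname) tuples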
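then line_profiler looks about this easy to drive, if i'm reading its docs right (kernprof -l injects a 'profile' decorator into builtins at run time, so the script needs no imports for it):

    # myscript.py -- decorate just the hot spots found above
    @profile                  # injected into builtins by kernprof -l
    def hotspot(data):
        total = 0
        for x in data:        # timings get reported per line
            total += x * x
        return total

    if __name__ == '__main__':
        hotspot(range(1000000))

    # run and view the line-by-line results:
    #   kernprof.py -l myscript.py
    #   python -m line_profiler myscript.py.lprof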
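for reference, the kind of scipy.weave.inline use i've been happy with looks roughly like this (with the blitz converter, so a(i)-style indexing works on numpy arrays; the c++ gets compiled on first call and cached after that):

    import numpy as np
    from scipy.weave import inline, converters

    def sum_squares(a):
        # a: 1-d float64 array
        n = int(a.shape[0])
        code = """
        double total = 0.0;
        for (int i = 0; i < n; ++i)
            total += a(i) * a(i);
        return_val = total;
        """
        return inline(code, ['a', 'n'], type_converters=converters.blitz)

    print sum_squares(np.arange(10, dtype=np.float64))   # -> 285.0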
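and as far as i can tell from the convolution tutorial, the cython annotations it's talking about boil down to something like this (a sketch i still need to check against the tutorial's actual numbers):

    # sumsq.pyx -- the untyped version is already valid cython; the
    # cdefs and the buffer declaration are the optional speed knobs
    import numpy as np
    cimport numpy as np
    cimport cython

    @cython.boundscheck(False)
    def sum_squares(np.ndarray[np.float64_t, ndim=1] a not None):
        # 'not None' guards against the None-argument pitfall the
        # updated tutorial warns about
        cdef Py_ssize_t i
        cdef double total = 0.0
        for i in range(a.shape[0]):
            total += a[i] * a[i]
        return total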
chapter 11 in the user manual gives an example of how to profile cython code with python's profiling tools, while the wiki page on profiling shows a (partial) example of low-overhead profiling with valgrind. debugging with ddd and gdb is also discussed in a wiki page at wiki.cython.org.

also, since a discussion on their email list in april/may 2009, an --embed option has emerged that facilitates embedding a python interpreter with a python entry point. i had been looking at pyinstaller, which looks like a very nice cross-platform way to make standalone executables from python; maybe i won't even need it with cython --embed. details on the above, plus more cython tips, are on the wiki. okay, it looks like the readme for cython_freeze says cython --embed can only put one module in with the interpreter, while cython_freeze can put in any number of cython modules. cython_freeze can also include a normal interpreter shell, not just a 'main' module, so you can use it interactively; not sure if cython --embed can do that.

i should also look into using plipy for this sort of thing: it will allow the executable to run on any linux machine, from ancient redhat boxen to the latest ubuntu machine. 'twould be nice to preserve some of python's portability when i freeze it to binary. there's a link there to portable python, which seems to do the same thing for windows. virtualenv is also worth a look; lots of people endorse it.
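the chapter 11 recipe, if i'm reading it right, is mostly just one directive comment at the top of the .pyx plus the usual cProfile driver (this is close to the manual's own pi example, modulo my paraphrasing):

    # cython: profile=True
    # calc.pyx -- the directive above makes cython functions show up
    # in cProfile output

    def recip_square(int i):
        return 1.0 / (i * i)

    def approx_pi(int n=1000000):
        cdef double val = 0.0
        cdef int k
        for k in range(1, n + 1):
            val += recip_square(k)
        return (6 * val) ** 0.5

    # driver.py, after building calc.pyx as usual:
    import cProfile, pstats
    import calc
    cProfile.runctx('calc.approx_pi()', globals(), locals(), 'cy.out')
    pstats.Stats('cy.out').sort_stats('time').print_stats()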
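and the --embed flow, as i understand it from the list discussion (the compile line is illustrative; include/lib paths will vary by platform and python version):

    # hello.py -- any module with a __main__ block can be the entry point
    def main():
        print "hello from a frozen binary"

    if __name__ == '__main__':
        main()

    # then, roughly:
    #   cython --embed hello.py
    #       (emits hello.c, including a C main() that starts the interpreter)
    #   gcc hello.c -I/usr/include/python2.6 -lpython2.6 -o hello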

saw a ref to https://translate.svn.sourceforge.net/svnroot/translate/src/trunk/virtaal/devsupport/profiling.py, which appears to be a way to profile python callables and view the results in KCacheGrind.
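i haven't read that module closely yet, but the general shape of the trick seems to be: run a callable under cProfile, then convert the stats into the callgrind format that KCacheGrind reads. a sketch, assuming the pyprof2calltree package for the conversion step (the virtaal module appears to roll its own converter instead):

    import cProfile
    from pyprof2calltree import convert

    def profile_for_kcachegrind(func, *args, **kwargs):
        # run one callable under the profiler and write a file that
        # kcachegrind can open directly
        prof = cProfile.Profile()
        result = prof.runcall(func, *args, **kwargs)
        convert(prof.getstats(), 'callgrind.out.profile')
        return result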
