Saturday, December 29, 2012

information and the nature of reality: from physics to metaphysics

is god an information-theoretic principle? i'd like to read this book and find out.

Tuesday, December 11, 2012

hft open source group

looks like interesting discussions, maybe links to code and/or references.

http://www.linkedin.com/groups/Open-Source-HighFrequency-Trading-4405119

Saturday, December 1, 2012

bibtex file management

i used pybliographer, and it was fine on linux. but i didn't want to mess around with getting it working on windows or cygwin.

i need to check out citeulike.
http://www.citeulike.org/

http://jabref.sourceforge.net/
foss, java cross-platform. seems to be recommended by a lot of people. integration with http://code.google.com/p/pdfmeat/ which i also need to check out.

bibtool
http://www.ctan.org/tex-archive/biblio/bibtex/utils/bibtool/

tkbibtex?

bibtexgui
http://linux.softpedia.com/downloadTag/bibtex+GUI
don't know much about it.

more refs at
http://liinwww.ira.uka.de/bibliography/Bib.Format.html
http://mactex-wiki.tug.org/wiki/index.php/GUI_Tools
http://www.fauskes.net/nb/bibtools/
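whatever tool wins, .bib files are plain text, so quick cleanups can be scripted directly. a naive sketch of pulling out entry types and cite keys (my own throwaway helper; real tools like jabref handle nested braces, @string, and @comment properly):

```python
import re

def list_bib_entries(bib_text):
    """Return (entry_type, cite_key) pairs from a .bib string.

    Naive: matches '@type{key,' headers only; ignores @string/@comment
    subtleties and brace nesting inside fields.
    """
    return re.findall(r'@(\w+)\s*\{\s*([^,\s]+)\s*,', bib_text)

sample = """
@article{daglish2008default,
  title = {A Stochastic Correlations Model for Default Risk},
}
@book{cohn2006agile,
  title = {Agile Estimating and Planning},
}
"""
print(list_bib_entries(sample))
```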

Wednesday, November 28, 2012

A Stochastic Correlations Model for Default Risk

by Toby Daglish and Shaloub Razak
http://www.nzfc.ac.nz/archives/2008/papers/0820.pdf

interesting model for default probabilities used in a cdo: closed form for single firm, monte carlo for multiple firms. calibrate with credit default swap using an extended kalman filter. explains changing default correlation over time in copula models.
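i haven't implemented the paper's model, but the static baseline it generalizes is easy to sketch: in a one-factor gaussian copula, firm i defaults when sqrt(rho)*M + sqrt(1-rho)*Z_i falls below the p-quantile of the normal. a minimal monte carlo with constant correlation (all names and numbers here are mine, for illustration only):

```python
import math, random

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_defaults(n_firms, p, rho, n_trials, seed=0):
    """Monte carlo estimate of the joint default probability of n_firms,
    each with marginal default probability p, under a one-factor
    gaussian copula with asset correlation rho."""
    rng = random.Random(seed)
    joint = 0
    for _ in range(n_trials):
        m = rng.gauss(0, 1)  # common factor
        xs = [math.sqrt(rho) * m + math.sqrt(1 - rho) * rng.gauss(0, 1)
              for _ in range(n_firms)]
        # default iff the latent variable is below the p-quantile
        if all(phi(x) < p for x in xs):
            joint += 1
    return joint / n_trials

# with rho=0 the joint default of 2 firms is ~p^2; high rho inflates it
print(simulate_defaults(2, 0.10, 0.0, 20000))
print(simulate_defaults(2, 0.10, 0.9, 20000))
```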

empirical mode decomposition

interesting work by danilo mandic at imperial college on emd and related stuff:
http://www.commsp.ee.ic.ac.uk/~mandic/

time-frequency analysis for multichannel and nonstationary data, specifically with emd:
http://www.commsp.ee.ic.ac.uk/~mandic/research/emd.htm

some really impressive stuff with frequency and mode separation, much better than ordinary wavelets.

Multivariate multiscale entropy: A tool for complexity analysis of multichannel data

http://www.commsp.ee.ic.ac.uk/~mandic/research/MUA_DPM_Multivariate_MSE_PRE_2011.pdf

just skimmed an interesting article about estimating multivariate entropy, over multiple time scales. possible applications for mutual information estimation?
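the 'multiscale' part is just coarse-graining: average the signal over non-overlapping windows of length tau, then compute the entropy at each scale. the coarse-graining step by itself:

```python
def coarse_grain(x, tau):
    """Average x over non-overlapping windows of length tau (the multiscale
    entropy coarse-graining step). Trailing samples that don't fill a
    complete window are dropped."""
    n = len(x) // tau
    return [sum(x[i*tau:(i+1)*tau]) / tau for i in range(n)]

print(coarse_grain([1, 2, 3, 4, 5, 6], 2))  # → [1.5, 3.5, 5.5]
```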

Saturday, November 24, 2012

texworks and shell escape

tried to get shell escape working with texworks, so i could do naughty and dangerous things with write18 from latex. there are checkboxes that sound like they are made for this sort of thing in the preferences, but i found them to have no such effect. finally edited the options for the typesetting commands for pdflatex, and added --shell-escape. now it works. unfortunately, i couldn't get it to work for the texify combo (pdflatex, makeindex, bibtex) but whatever.

oddly, the same option works for texworks with either miktex or texlive, even though they are different on the command line: -shell-escape for texlive and -enable-write18 for miktex.

Saturday, November 17, 2012

sobol sequences and python

tried it, seems to work just fine.

https://github.com/naught101/sobol_seq/blob/master/sobol_test_output.txt
http://people.sc.fsu.edu/~jburkardt/py_src/sobol/sobol.html
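for a package-free sanity check of any sobol library: the first dimension of a sobol sequence is the base-2 van der corput sequence (bit-reversed binary fractions), so you can generate it in a few lines and compare against the library's first column:

```python
def van_der_corput(i):
    """Base-2 radical inverse of i: the 1-d sobol / van der corput point."""
    f, base = 0.0, 0.5
    while i:
        f += base * (i & 1)  # mirror the lowest bit across the radix point
        i >>= 1
        base /= 2
    return f

print([van_der_corput(i) for i in range(1, 6)])  # → [0.5, 0.25, 0.75, 0.125, 0.625]
```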

latex building

here are some tools for building documents from latex/tex.

texcaller has interfaces for python and c++, among others:
http://www.profv.de/texcaller/download.html

http://obrecht.fr/texwrapper/
http://latex-mk.sourceforge.net/
http://mirror.ox.ac.uk/sites/ctan.org/support/latexmk/latexmk.pdf
http://tug.ctan.org/pkg/autolatex

python + latex

python.sty looks very good, very small, relatively simple, and almost everything i want for calling python from latex. only problem is that it starts a new interpreter for each python environment block.

https://bitbucket.org/brotchie/python-sty/changesets/tip/branch(%22default%22)

to use python.sty with rubber, i will likely need this:
http://bazaar.launchpad.net/~brotchie/rubber/shell-escape/revision/419?start_revid=419

here's some good info on working with files with tex/latex:
http://stackoverflow.com/questions/2115379/write-and-read-from-a-latex-temporary-file

a latex builder similar to rubber, but maybe simpler and easier. i'd reffed this before:
https://bitbucket.org/nmandery/python-pdflatex/

more info on calling out to python from latex:
http://thewikiblog.appspot.com/blog/python-inside-latex
https://github.com/gpoore/pythontex/tree/master/pythontex

mahotas

http://luispedro.org/software/mahotas

interesting package that's apparently well known in the computer vision world. python, with numerical stuff in c++. algorithms include SURF, watershed, thresholding, convex hull, polygon drawing, haralick textures, local binary patterns, zernike moment, distance transform, and freeimage and imread interface.
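for a flavor of what these algorithms do: a local binary pattern just thresholds the 8 neighbors of a pixel against the center and packs the results into a byte. a toy version of the idea (not mahotas's implementation; bit ordering is my arbitrary choice):

```python
def lbp_code(img, r, c):
    """8-neighbor local binary pattern at (r, c): each neighbor >= center
    contributes one bit, clockwise from the top-left."""
    center = img[r][c]
    neighbors = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                 img[r][c+1],   img[r+1][c+1], img[r+1][c],
                 img[r+1][c-1], img[r][c-1]]
    code = 0
    for bit, v in enumerate(neighbors):
        if v >= center:
            code |= 1 << bit
    return code

img = [[9, 9, 9],
       [1, 5, 9],
       [1, 1, 1]]
print(lbp_code(img, 1, 1))  # neighbors >= 5 set bits 0-3 → 15
```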

modular toolkit for data processing

http://mdp-toolkit.sourceforge.net/

interesting project with a number of capabilities. python code for pca, ica, slow feature analysis, manifold learning methods ([Hessian] local linear embedding), classifiers, factor analysis, rbm, etc.

according to the 'intro to scipy' talk at pydata 2012, it has the fastest pca available in python (even if the interface is more difficult than scipy svd or sklearn.decomposition.PCA).
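the fast implementations differ in how they get there (svd, randomized projections), not in what they compute: the top principal component is just the leading eigenvector of the covariance matrix. a dependency-free power-iteration sketch on 2-d data:

```python
import math

def first_component(data, iters=100):
    """Top principal component of 2-d points via power iteration on the
    (tiny) sample covariance matrix."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # 2x2 covariance entries
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9)]
print(first_component(data))  # close to (0.707, 0.707) for data along y ~ x
```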

numba and cython

interesting comparison between numba and cython (and pure python). both are projects i want to keep an eye on.

http://jakevdp.github.com/blog/2012/08/24/numba-vs-cython/

Friday, October 26, 2012

kanban

seems to be a relatively recent (at least, newer than other agile methods) type of software project management that emerged from a card system at toyota. or, as this decent summary article claims, is more of an overlay on top of other agile methods. focuses on evolutionary improvement with minimal organizational resistance. https://leankitkanban.com/ seems to be one of the prominent sites, not sure if they claim to be originators. but they do offer free trials of online kanban boards if i decide to try it.

'people do not resist change, they resist being changed.'

1. Visualize: Make invisible knowledge work, and make its flow visible.
2. Limit work-in-progress: Implement a kanban system to pull work through from initial idea to finished and delivered.
3. Manage flow: Observe work items to see if they are proceeding at an expected rate.
4. Make management policies explicit: Agree upon and post policies about how work will be handled, including guidelines of risk, governance and priorities.
5. Improve collaboratively using “safe to fail” experiments: Adapt the process experimentally through collaborative agreement on observed problems and the use of models to suggest changes.
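practice 2 (limiting work-in-progress) is mechanical enough to express in code: a column refuses a pull when it's at its limit, and that refusal is what creates the pull system. a toy board (names are mine, not from any kanban tool):

```python
class Column:
    """One kanban board column with a work-in-progress limit."""
    def __init__(self, name, wip_limit):
        self.name, self.wip_limit, self.items = name, wip_limit, []

    def pull(self, item):
        """Accept item only if under the WIP limit; False means
        'help clear this column before starting new work'."""
        if len(self.items) >= self.wip_limit:
            return False
        self.items.append(item)
        return True

doing = Column('doing', 2)
print(doing.pull('feature A'))  # True
print(doing.pull('feature B'))  # True
print(doing.pull('feature C'))  # False: at the limit, finish something first
```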

nuitka

http://www.nuitka.net/pages/overview.html

interesting project that generates c++ from python. not sure if it's ready for me to grab and use, without becoming a dev. but it's one to keep an eye on and file away with numba, shedskin, cython, py2exe, etc.

Tuesday, October 16, 2012

agile software development

by alistair cockburn, 2002, 005.1 COC

much more philosophical than other dev and planning books i've looked at recently, from a higher-level perspective. maybe relevant for a longer term because of this. i actually like a lot of the earlier chapters' discussion of things like how people think, communicate, and learn, and how habits and mental models affect performance, etc. i especially like the way he characterizes three stages of behavior in learning, based on shu-ha-ri (learn, detach, transcend) in aikido: following, detaching, and fluent. (see www.aikidofaq.com/essays/tin/shuhari.html) nice little snippets giving examples of successes and failures.

agile and iterative development: a manager's guide

by craig larman, 2004 (8th printing in 2007) 005.1 LAR

from the series edited by alistair cockburn of 'agile software development' fame. haven't read it, really, though it looks not to be written for in-depth reading. gives a high-level view of a number of agile techniques, including iterative & evolutionary, agile, scrum, extreme programming, unified process, and evo. some evidence for what works, lots of 'harvard business school' plots to give the gist of the ideas. looks like a good reference to cram with when a team needs to stay on course.

refactoring to patterns

by joshua kerievsky, 2005 (8th printing 2009) 005.16 KER

nice view on a more practical side of design patterns. instead of just looking at an ideal abstraction, this looks at how you refactor toward or away from a particular design pattern, or how you can migrate gradually between them. makes a good point that you rarely are making a dp from a blank slate; even if you are the head project architect, you can only do that at the beginning. once things are underway, or if you're working with an existing codebase, you have to refactor rather than design from scratch. handy tables in the inside covers as a quick reference. i read through chapter 4, to the beginning of the catalog of patterns.

i think i looked into buying this directly from the author's site, and i think you get a complimentary ebook when you buy the hard copy.

Friday, September 7, 2012

presenting science: a practical guide to giving a good talk

by cigdem issever (+unicode goodies...) and ken peach

decent book with pointers about what makes a good (and bad) presentation.

some tips:
produce a written set of notes for each slide -- you can give this to someone who couldn't attend, as well as use it for preparation and reference while speaking.
provide a link on the title slide to a paper or papers covering more detail.

the speaker is responsible for everything that appears, and does not appear, on each slide.
the structure and appearance of the presentation, as well as the content, are part of the communication process.

scoping
knowing your audience tells you where you start, and knowing your conclusions tells you where you finish.
find out who the audience is, why they might be interested, and exactly how much time you have.
then write the conclusions slide -- why are you giving the talk, what is the message, and what do you want them to do or change once they've heard you?
now write the contents slide, to map the route that takes the audience from their start to your finish.

writing process
can go beginning to end, or maybe end to beginning.
or, start with the key slide ('money slide') and go forward and backward from there.

like goodlad said, you need to tell them what they need to hear, not what you want to say.

Saturday, September 1, 2012

kindle hacking

nice links for cracking into those spare kindles:

http://www.youtube.com/watch?feature=endscreen&NR=1&v=uWvbaN-3q6I
http://www.mobileread.com/forums/showthread.php?p=1873256
http://www.youtube.com/watch?v=sNFXPDtg0xo
http://www.turnkeylinux.org/blog/kindle-root
http://www.mobileread.com/forums/showthread.php?t=128704
http://www.mobileread.com/forums/showthread.php?t=97745
http://wiki.mobileread.com/wiki/Kindle_Touch_Hacking
http://www.mobileread.mobi/forums/showthread.php?t=185837

Tuesday, July 24, 2012

presentations: creating and presenting

had to give a presentation and listen to a number of other people's presentations recently, and it got me thinking about how to do it better -- more effectively and more efficiently. here are my notes.

getting motivated to document/present/publish: consider the knowledge transfer as a part of the software/system deployment. the job is not finished until the product is in the customers' hands and they're using it.

i have finally decided that giving draft presentations in front of other people is not optional for me. i am just not capable of doing a good job without feedback from an external perspective. i'm not good at anticipating what others will be thinking, what they already know, what will be misleading, or what info is required for them to reach the right conclusion. i need to plan for this, and schedule the time and line up the people to listen as part of my preparation. it took me about a week to prepare for a 15 min presentation. i want to shrink that down, but i've realized that i need to allow some time to pass, doing other things, between iterations to get a fresh look at it.

early iterations should produce some factor (2x?) beyond the length of what will end up in the final version. it's much easier to pick out things to trim from a complete narrative than to figure out what's missing. it also allows me to try different ways to explain the same thing. for getting feedback, it's easier for the audience to say "i didn't understand this part" or "that part was repetitive/redundant/boring" vs. "here's what you should have added to make me understand this thing i don't understand". and i can't really crystallize the single main point and the few underlying points without seeing some content first.

i also realized that it is faster for me to write out a script of exactly what i will say, rather than just trying to wing it. by the time i have practised enough to be able to wing it, i will have come up with and memorized an unwritten script anyway, and it will have taken longer. i do still need to practice saying it, though, because the flow doesn't work the same when it's spoken vs. read. once i have written the script and practised it a few times, i won't need to actually look at it when i'm presenting. and i need to be able to deliver the spoken words and gestures/pointing without thinking about it too much, so that i will be free to observe the audience reaction and respond to it. also helps with getting the timing right.
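since i'm scripting talks word for word now, the script length gives a first-cut timing check: conversational speaking runs somewhere around 130-150 words per minute, so:

```python
def speaking_minutes(script, wpm=140):
    """Rough talk duration in minutes from a written script,
    assuming ~140 spoken words per minute."""
    return len(script.split()) / wpm

# a 15-minute slot needs roughly a 2100-word script
print(speaking_minutes("word " * 2100))  # → 15.0
```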

characteristics of a good presentation and questions
-what do they want to know?
-they will feel good about what they have learned if they have learned something they wanted to know.
-they have to be convinced that they want to know it, or else they won't even pay enough attention to learn it.
-would someone who didn't _need_ to know this material still be interested in its presentation?

try writing conclusions first, then content, then intro and outline last.

plan prep time based on scope of the concepts presented, not on the speaking time allotted.
-related, but not exactly linear relationship because of need to close off a complete idea
-complex ideas will take more prep time because there will be more potential pitfalls and places to get lost, so will take more time to craft carefully and more iterations with feedback from other people. this prep time increase holds true even when the time to present it is held constant at a short duration.
-i think part of the difficulty is that concepts and ideas are connected in complex, nonlinear, and possibly tree-like or digraph patterns, but a coherent presentation of an idea must be a linear narrative (in fact, i want desperately to keep their thoughts from wandering off the linear path).

review presentation slides from the perspective of someone with different types of gaps: gaps in background knowledge, gaps in attention (someone missing the beginning or (more likely) not paying attention for 1 or more slides in the middle).

put in a "credibility" slide every once in a while to show that there is real, complex, impressive work behind the talk. Make all the other slides as simple as possible, to focus attention and ensure basic ideas are communicated.
-one mistake i see people making is jumbled slides with way too much to absorb in the short time they flash it up there. (maybe it comes from trying to condense a longer talk or throw together slides from everyone who works for them.) it makes me think they're trying too hard to impress me, but i'm not impressed when nothing sinks in.
-otoh, i don't want to make it all so simple that someone will miss the relevance or genius of the simplicity. i don't want to leave the impression that i haven't done anything difficult or important.
-maybe make one complex slide at the beginning to establish credibility and gain interest. then immediately drop the complexity very low for pedagogical purposes. then increase complexity gradually as you explain the concepts. then jump complexity up again at the end when you show the application to demonstrate the relevance to and usefulness of the application.

if you have to use a big slide full of text, don't expect people to read it linearly. highlight words and phrases with color cues to guide their eyes to the important parts and things they need to remember.
-i just did a presentation with a number of moving parts that people needed to remember and keep distinct and organized throughout the talk. i tried using a color-coding system where i put every reference to a particular type of variable in a consistent color. i think that helped people follow the references, and it helped me organize things, too.

visuals should support what you want to say, not impede or force you to interrupt yourself to explain it. this is also a good reason to practice saying it aloud, because the speaking pace should determine placement of the visual information.

no eyecharts, and don't put something on there that you then tell me to ignore! it makes me wonder what else i should be ignoring.

sometimes the truth is misleading. leave it out. i won't go so far as to be dishonest, but you need to reduce detail until the main point is learnable within the time you have. the exigencies of communication mean that no one will care about the assumptions and simplifications you need to make in order to clarify the immediate point. they should be detailed only when they are necessary for the communication and relevant to the context.

Thursday, July 5, 2012

agile planning links

http://neilperkin.typepad.com/only_dead_fish/agile-planning.html
http://www.ambysoft.com/essays/agileProjectPlanning.html
http://www.ibm.com/developerworks/linux/library/l-agile-plan/
http://www.agilehelpline.com/2011/04/6-levels-of-agile-planning.html
http://www.projectconnections.com/articles/120506-mcdonald.html

standalone python executables

some good links i've dug up for making standalone programs from python

http://docs.cython.org/src/userguide/extension_types.html
http://hg.cython.org/cython/file/8bff3332e34f/bin/cython_freeze
http://wiki.python.org/moin/Freeze
http://stackoverflow.com/questions/1681021/detailed-explanation-about-pythons-freeze
http://cx_freeze.readthedocs.org/en/latest/index.html
http://www.py2exe.org/index.cgi/Tutorial
http://hg.cython.org/cython-devel/file/tip/Demos/freeze/README.txt#l1
http://wiki.wxpython.org/CreatingStandaloneExecutables
http://stackoverflow.com/tags/cx-freeze/hot

book: agile estimating and planning

by mike cohn, 2006 (11th printing 2011) 005.1 COH

p xxii. can't plan away uncertainty. we reduce uncertainty by gaining knowledge, and we gain knowledge by executing the plan. plan, plan, plan-do vs. plan-do-adapt, plan-do-adapt
anticipation vs. adaptation
estimate size, derive duration
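'estimate size, derive duration' in code: estimate the backlog in story points, measure velocity per iteration, and the schedule falls out as a range rather than a date (numbers invented for illustration):

```python
import math

def duration_range(backlog_points, velocity_low, velocity_high,
                   weeks_per_iteration=2):
    """Derive a (fastest, slowest) schedule range in weeks from estimated
    size and a low/high range of observed velocity per iteration."""
    slow = math.ceil(backlog_points / velocity_low) * weeks_per_iteration
    fast = math.ceil(backlog_points / velocity_high) * weeks_per_iteration
    return fast, slow

# 120 points at 15-20 points per 2-week iteration → 12 to 16 weeks
print(duration_range(120, 15, 20))  # → (12, 16)
```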

ch. 1
p 3. "planning is everything. plans are nothing." -- field marshal helmuth graf von moltke (i always thought eisenhower said something like this. maybe he ripped it off.) also, "no plan survives contact with the enemy."
p ?. planning helps us see risk (uncertainty) and make decisions about it. eg, we can try the riskiest part first to see if it will work.
p 6. the most critical risk facing most projects is the risk of developing the wrong product
p 6. failed project: no one came up with any better ideas than what was on the initial list of requirements
p 9. an agile plan is one that we are not only willing, but also eager to change. we don't want to change the plan just for the sake of changing, but we want to change because change means we've learned something or that we've avoided a mistake.
p 10. spread planning evenly over the duration of the project

two week timebox, with half day each fortnight for planning. don't let external pressures change requirements during the timebox; they have to wait until the next one.

the knowledge and insight we gain from planning persists long after one plan is torn up and a revised one put in its place. agile planning is focused more on the activity of planning than on the plan, and an agile plan is one that is easy to change.

ch. 2
2/3 of projects have significant cost overruns
64% features rarely or never used
100% average schedule estimate overrun

traditional planning concerns activities performed rather than features delivered.
this leads to schedule overruns:
-activities don't finish early
-lateness is passed down
-activities are not independent

p. 12 ...

Friday, June 22, 2012

hudson

i figured out how to use a number of hudson's features for greater awesomeness in keeping my code base under control.

for svn+ssh:
on svn server: (make sure it's openssh)
ssh-keygen -t dsa -f newkey
append newkey.pub to ~/.ssh/authorized_keys
on Hudson: upload newkey as private key file

select 'trigger builds remotely'
put this script into the svn server, in hooks/post-commit
# force a build on the Continuous Integration server
echo "calling url to build r$2 in $1" >> /tmp/svnBuild
/usr/bin/wget -O - "http://hudsonserver:8080/job/JobName/build?token=tokenString&cause=SVN_commit" > /dev/null  # quote the url, or the & backgrounds wget

to do the build, i put everything into a bash script at the root of my svn repo. so in hudson, i put this into the 'execute shell' 'command':
#!C:\cygwin\bin\bash.exe
. hudsonBuild.sh

(i also had to make a windows link from one directory to another at one point -- i didn't know how to do this before, but from cygwin you can use this:)
cmd.exe /c mklink /D testdata "c:\otherTestdata"

i also tried a few other things that i didn't end up using, mostly because their outputs didn't jibe with hudson.
pylint -f parseable python > pylint.txt # i used pep8 instead
sloccount --wide --details python > sloccount.sc # i used pynocle instead

under post-build actions,
'publish junit test result report' with **/nosetests.xml as the filename
'publish cobertura coverage report' with **/coverage.xml
'publish html report' with directory 'complexity' and name metrics.html, directory pynocle and name index.html
'report violations' with **/clonedigger.xml for cpd and **/pep8.txt for pylint

the hudsonBuild.sh script is like this:

echo "PATH="
echo $PATH

echo "building extension"
cd python/module/extension
/cygdrive/c/Python27/python -c 'import __init__'
cd ../../

echo "starting doctest/coverage"
#cd python
which python
echo "PYTHONPATH="
echo $PYTHONPATH
coverage run 'c:\Python27\Scripts\nosetests-script.py' --with-doctest --with-xunit --with-coverage --with-profile --profile-stats-file=nosetests.hotshot --verbose
coverage xml
mv coverage.xml coverage_nopath.xml
sed 's/<source><\/source><\/sources>/<source>D:\\Hudson\\.hudson\\jobs\\JobName\\workspace\\python<\/source><\/sources>/g' coverage_nopath.xml > coverage.xml
# put hotshot output into the html report dir, so it will get saved for each build
mv nosetests.hotshot ../complexity/
cd ..

echo "starting pep8"
pep8 --repeat python | perl -ple 's/:\d+: ([WE]\d+)/: [$1]/' > pep8.txt

echo "starting clonedigger"
clonedigger --cpd-output -o clonedigger.xml python

echo "starting pymetrics"
pymetrics `/usr/bin/find python -iname "*.py"` > $COMPLEXITY_DIR/complexity.txt

echo "starting pycabehtml"
pycabehtml.py -i $COMPLEXITY_DIR/complexity.txt -o $COMPLEXITY_DIR/metrics.html -a $ACC -g $GRAPH

echo "starting pynocle"
cd python
./pynocleGenerate.py
cd ..

echo "build script finished"

continuous integration

recently decided it was time to stop putting off trying continuous integration for software development. (i'm only a decade behind the times; not bad.)

since i mostly use python, i had to look at buildbot. apache gump and cruisecontrol also seemed like possibilities. but in the end i tried hudson since i'd read it was easy to set up and use, and it really was. all i had to do was download the war file and run

java -jar .\hudson.war *>output.txt

(i had to redir output so the blocking to console wouldn't make it wait for me to scroll or press a key.)

here are some motivational/informative quotes on ci:
wikipedia:
continuous integration -- the practice of frequently integrating one's new or changed code with the existing code repository -- should occur frequently enough that no intervening window remains between commit and build, and such that no errors can arise without developers noticing them and correcting them immediately.

martin fowler:
continuous integration doesn't get rid of bugs, but it does make them dramatically easier to find and remove. in this respect it's rather like self-testing code. if you introduce a bug and detect it quickly it's far easier to get rid of. since you've only changed a small bit of the system, you don't have far to look. since that bit of the system is the bit you just worked with, it's fresh in your memory -- again making it easier to find the bug. you can also use diff debugging -- comparing the current version of the system to an earlier one that didn't have the bug.

bugs are also cumulative. the more bugs you have, the harder it is to remove each one. this is partly because you get bug interactions, where failures show as the result of multiple faults -- making each fault harder to find. It's also psychological -- people have less energy to find and get rid of bugs when there are many of them...

if you have continuous integration, it removes one of the biggest barriers to frequent deployment. frequent deployment is valuable because it allows your users to get new features more rapidly, to give more rapid feedback on those features, and generally become more collaborative in the development cycle. this helps break down the barriers between customers and development -- barriers which i believe are the biggest barriers to successful software development.

paul duvall, cto, stelligent incorporated:
6 anti-patterns
infrequent checkins, which lead to delayed integration
broken builds, which prevent teams from moving on to other tasks
minimal feedback, which prevents action from occurring
receiving spam feedback, which causes people to ignore messages
possessing a slow machine, which delays feedback
relying on a bloated build, which reduces rapid feedback

Tuesday, June 12, 2012

finding methods from parent classes in python

instanceName.methodName.im_func.func_code
or
instanceName.methodName.im_func.__code__

i always forget how i can find out where a method is defined if it comes from somewhere up the inheritance tree. maybe there's a better way, but the special attrs above will at least give me the file and line number.
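a worked example (python 3 spells these attributes __func__ and __code__; python 2's im_func.func_code points at the same code object):

```python
import inspect

class Base:
    def method(self):
        return 42

class Child(Base):
    pass

c = Child()
# the code object carries the file and line where the def actually lives
code = c.method.__func__.__code__      # im_func.func_code in python 2
print(code.co_filename, code.co_firstlineno)  # points at Base.method's def
# or let inspect do the MRO walk:
print(inspect.getsourcefile(c.method))
```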

guitar patches

http://www.jameslimborg.com/boss-gt-10-patches-download.html

cool. lots of van halen and boston sounds.

the guy mentions wanting another 64-band spectrum analyzer. i wonder if he is aware of wavelet transforms, or empirical mode decomposition/other hilbert-huang based transforms that could make his life easier by splitting time and frequency more optimally than stft.

Saturday, May 26, 2012

PowerMenu

i love powermenu. always on top, transparency... it's like having double the screen real estate.

Sunday, April 1, 2012

my own doctest runner

#!/cygdrive/c/Python27/python -i 'c:\crunch6SVN\python\pyTest.py'
#!/cygdrive/c/Python26_64/python  'c:\crunch6SVN\python\pyTest.py'
#!/usr/bin/env python
#!/usr/local/python/bin/python

# this works from powershell, but not from xterm or within spyder:
# C:\Python27\python ..\..\pyTest.py .\scanCoverage.py -g
# i think spyder puts in some trace hooks into pdb of its own
import sys
import os
#import ipdb
#dbg = ipdb.set_trace
from pdb import set_trace as dbg

def lineProfile(runStr,runContext={},module=None,moduleOnly=False):
    # with the run string set up, i can use cProfile to find the worst offenders
    import cProfile,pstats
    import line_profiler
    import sys
    prof = cProfile.Profile()
    #r = prof.runctx(runStr,{},{'p':p,'sout':sout})
    r = prof.runctx(runStr,{},runContext)
    # maybe use prof.dump_stats() to spit out to a file
    r = pstats.Stats(prof).strip_dirs().sort_stats('time').print_stats(5)

    #get line profiling on top 3 time hog functions
    ss = pstats.Stats(prof).sort_stats('time')
    def b(fn): return os.path.splitext(fn)[0] # rstrip takes a char set, not a suffix, so use splitext
    if moduleOnly:
        # only show functions in this file
        hogs = [f[2] for f in ss.fcn_list if b(f[0])==b(__file__)][:3]
        ts = [ss.stats[f][2] for f in ss.fcn_list if b(f[0])==b(__file__)][:3]
    else:
        #hogs = [f[2] for f in ss.fcn_list][:3]
        hogs = ss.fcn_list[:3]
        ts = [ss.stats[f][2] for f in ss.fcn_list][:3]
    fts = [t/ss.total_tt for t in ts]
    # ignore any functions beyond what accounts for 80% of the time
    for i in range(len(fts)):
        if sum(fts[:i])>.8: break
    hogs,ts,fts = hogs[:i],ts[:i],fts[:i]
    hogs.reverse();ts.reverse();fts.reverse() # i want longest time last
    # can't line prof builtins, so take them out of the list
    fts = [f for f,h in zip(fts,hogs) if not h[0]=='~']
    ts = [t for t,h in zip(ts,hogs) if not h[0]=='~']
    hogs = [h for h in hogs if not h[0]=='~']
    # this probably won't work in pyTest:
    #fs = [[getattr(x,h) for x in locals().values() if hasattr(x,h)][0]
    #fs = [[getattr(x,h) for x in sys.modules.values() if hasattr(x,h)][0]
    # pstats only saves module filename, so match files and search within them
    # rstrip for .pyc, .pyo
    modules = [x.__file__.rstrip('oc') for x in sys.modules.values() if hasattr(x,'__file__')]
    indices = [modules.index(h[0].rstrip('oc')) for h in hogs]
    modules = [x for x in sys.modules.values() if hasattr(x,'__file__')]
    hogMods = [modules[i] for i in indices]
    # find functions/methods within module
    #     only searches down one level instead of a full tree search, so don't
    #      get too crazy with deeply nested defs
    fs = []
    for ln,h,m in zip(*zip(*hogs)[1:3]+[hogMods]):
        #import pdb;pdb.set_trace()
        if hasattr(m,h) and hasattr(getattr(m,h),'__code__') and getattr(m,h).__code__.co_firstlineno == ln: fs.append(getattr(m,h))
        else:
            for a in [getattr(m,x) for x in dir(m)]:
                if hasattr(a,h) and hasattr(getattr(a,h),'__code__') and getattr(a,h).__code__.co_firstlineno == ln:
                    fs.append(getattr(a,h))
                    break
    #fs = [[getattr(x,h) for x in runContext.values() if hasattr(x,h)][0]
    #      for h in hogs]
    lprof = line_profiler.LineProfiler()
    for f in fs: lprof.add_function(f)
    #stats = lprof.runctx(runStr,{},{'p':p,'sout':sout}).get_stats()
    stats = lprof.runctx(runStr,{},runContext).get_stats()
    for ((fn,lineno,name),timings),ft in zip(sorted(stats.timings.items(),reverse=True),fts):
       line_profiler.show_func(fn,lineno,name,stats.timings[fn,lineno,name],stats.unit)
       print 'this function accounted for \033[0;31m%2.2f%%\033[m of total time'%(ft*100)
    #import pdb;pdb.set_trace()


# monkey patches to allow coverage analysis to work
#     just a little disturbing that (as of 2.4) doctest and trace coverage
#      don't work together...
def monkeypatchDoctest():
    # stolen from http://coltrane.bx.psu.edu:8192/svn/bx-python/trunk/setup.py
    #
    # Doctest and coverage don't get along, so we need to create
    # a monkeypatch that will replace the part of doctest that
    # interferes with coverage reports.
    #
    # The monkeypatch is based on this zope patch:
    # http://svn.zope.org/Zope3/trunk/src/zope/testing/doctest.py?rev=28679&r1=28703&r2=28705
    #
    try:
        import doctest
        _orp = doctest._OutputRedirectingPdb
        class NoseOutputRedirectingPdb(_orp):
            def __init__(self, out):
                self.__debugger_used = False
                _orp.__init__(self, out)

            def set_trace(self):
                self.__debugger_used = True
                #_orp.set_trace(self)
                pdb.Pdb.set_trace(self)

            def set_continue(self):
                # Calling set_continue unconditionally would break unit test coverage
                # reporting, as Bdb.set_continue calls sys.settrace(None).
                if self.__debugger_used:
                    #_orp.set_continue(self)
                    pdb.Pdb.set_continue(self)

        doctest._OutputRedirectingPdb = NoseOutputRedirectingPdb
    except:
        raise #pass
    return doctest

def monkeypatchTrace():
    import trace
    try:
        t = trace.Trace
        class NoDoctestCounts(t):
            def results(self):
                cs = self.counts
                newcs = {}
                # throw away '<doctest ...>' pseudo-files left over from
                # doctest execution, so coverage only reports real source files
                for (f,l),c in cs.items():
                    if not f.startswith('<doctest'):
                        newcs[f,l] = c
                self.counts = newcs
                return t.results(self)
        trace.Trace = NoDoctestCounts
    except:
        raise #pass
    return trace


for n in sys.argv[1:]:
    if n.endswith('.py'):
        # python >= 2.6 will not allow import by filename;
        # i should refactor the whole thing to use the imp module
        sys.path.insert(0,os.path.dirname(n))
        n = os.path.splitext(os.path.basename(n))[0]
    if not n.startswith('-'):
        if True:#try:
            if debug:
                # __import__ needs a non-empty fromlist if it's a submodule
                if '.' in n:
                    try: m = __import__(n,None,None,[True,])
                    except ImportError: # just run doctests for an object
                            modName = '.'.join(n.split('.')[:-1])
                            #objName = n.split('.')[-1]
                            m = __import__(modName,None,None,[True,])
                            #doctest.run_docstring_examples(m.__dict__[objName],m.__dict__,name=objName)
                            doctest.debug(m,n,True)
                            import sys
                            sys.exit()
                else: m = __import__(n)
                for i in m.__dict__.values():
                    import abc
                    # if it's a class (from a metaclass or metametaclass) or function
                    if type(i) == type or type(i) == abc.ABCMeta or \
                       (type(type(i)) == type and hasattr(i,'__name__')) \
                       or type(i) == type(lineProfile):
                        try:
                            print 'Testing',i.__name__
                            doctest.debug(m,n+'.'+i.__name__,True)
                        except ValueError:
                            print 'No doctests for', i.__name__
            else:
                import pdb
                if coverage:
                    #### need a better way to get module filenames without
                    #     importing them. (after initial import, the class and
                    #     def lines will not be executed, so will erroneously
                    #     be flagged as not tested.)
                    #d,name = os.path.split(m.__file__)
                    d,name = '.',n
                    #bn = trace.fullmodname(name)
                    bn = name.split('.')[-1]
                    # ignore all modules except the one being tested
                    ignoremods = []
                    mods = [trace.fullmodname(x) for x in os.listdir(d)]
                    for ignore,mod in zip([bn != x for x in mods], mods):
                        if ignore: ignoremods.append(mod)
                    tracer = trace.Trace(
                        ignoredirs=[sys.prefix, sys.exec_prefix],
                        ignoremods=ignoremods,
                        trace=0,
                        count=1)
                    if '.' in n:
                        tracer.run('m = __import__(n,None,None,[True,])')
                    else: tracer.run('m = __import__(n)')
                    tracer.run('doctest.testmod(m)')
                    r = tracer.results()
                    r.write_results(show_missing=True, coverdir='.')
                else:
                    # __import__ needs a non-empty fromlist if it's a submodule
                    if '.' in n:
                        try: m = __import__(n,None,None,[True,])
                        except ImportError: # just run doctests for an object
                            modName = '.'.join(n.split('.')[:-1])
                            objName = n.split('.')[-1]
                            m = __import__(modName,None,None,[True,])
                            doctest.run_docstring_examples(m.__dict__[objName],m.__dict__,name=objName)
                            import sys
                            sys.exit()
                    else:
                        #import pdb; pdb.set_trace()
                        m = __import__(n)
                    # dangerously convenient deletion of any old coverage files
                    try: os.remove(trace.modname(m.__file__)+'.cover')
                    except OSError: pass
                    # need to call profile function from the doctest
                    # so that it can set up the context and identify the run string, because anything not passed back will get garbage collected
                    # and there's no way to pass anything back
                    # but how can i call something within pyTest from the doctest string? some kind of callback?
                    # i want pyTest to decide if it gets called, so i can switch from the command line

                    doctest.testmod(m)
                    if profile:
                        runStr,runContext = m._profile()
                        lineProfile(runStr,runContext,m)
        else:#except Exception,e:
            print 'Could not test '+n
            print e
            raise e

q = quit
from sys import exit as e

Tuesday, January 31, 2012

Funding beyond discounting: collateral agreements and derivatives pricing

apparently ground-breaking article by vladimir piterbarg, head of barcap quantitative research and author of a well-known three-volume series on interest rate modeling. important for correctly valuing derivatives, and for understanding what the 'risk-free' rate really is.
http://www.scribd.com/doc/34328165/Risk-Magazine-Piterbarg-Funding-Beyond-Discounting-Collateral-Agreements-and-Derivatives-Pricing
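the core point can be sketched numerically: a fully collateralized trade should be discounted at the rate paid on the collateral (e.g. OIS), not at libor or the bank's unsecured funding rate. a toy sketch (all numbers made up, not from the paper):

```python
import math

def discounted_value(expected_payoff, rate, T):
    """present value of a payoff at time T, continuously discounted at `rate`."""
    return expected_payoff * math.exp(-rate * T)

# hypothetical numbers: a derivative paying an expected 100 in 5 years
payoff, T = 100.0, 5.0
ois, libor = 0.02, 0.025   # collateral (OIS) rate vs unsecured (libor) rate

v_collateralized = discounted_value(payoff, ois, T)    # fully CSA-collateralized
v_unsecured      = discounted_value(payoff, libor, T)  # funded unsecured

# the gap between the two is the funding/collateral adjustment
# that the paper formalizes
adjustment = v_collateralized - v_unsecured
```

with these (made-up) rates the collateralized value is higher, since the collateral rate is below the unsecured funding rate; the sign flips if the trade posts rather than receives collateral.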

Wednesday, January 11, 2012

debugging c++ extensions to python

trying to debug a python extension module written in c++ (wrapped with swig). i think this would be much easier if i were using gcc, but python is built with msvc... setup.py wants the debug versions of the python libs, but i don't have them and don't really want to try to build python from scratch right now. these refs seem relevant:
http://www.velocityreviews.com/forums/t677466-please-include-python26_d-lib-in-the-installer.html
http://vtk.org/gitweb?p=VTK.git;a=blob;f=Wrapping/Python/vtkPython.h;h=9d01ac21bafae0a24252398f268b6b3563df62cd

Tuesday, January 10, 2012

design patterns

comment from duffy on most useful from gof (gang of four) book of object-oriented design patterns: "GOF is useful but should not become an objective in itself. The most useful ones in general (dependent on the domain of course) are Visitor, Strategy, Facade and Template Method pattern. Singleton and Observer can best be avoided."
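as a reminder of the shape of one of those, a minimal strategy-pattern sketch in python (class names are illustrative, not from duffy): the context delegates to an interchangeable algorithm object, so the valuation method can be swapped without touching the instrument.

```python
class Pricer(object):
    """strategy interface: price(spot) -> float."""
    def price(self, spot):
        raise NotImplementedError

class IntrinsicPricer(Pricer):
    """concrete strategy: intrinsic value of a call struck at `strike`."""
    def __init__(self, strike):
        self.strike = strike
    def price(self, spot):
        return max(spot - self.strike, 0.0)

class ForwardPricer(Pricer):
    """concrete strategy: trivially, a forward's payoff per unit."""
    def price(self, spot):
        return spot

class Instrument(object):
    """context: delegates valuation to whatever strategy it was given."""
    def __init__(self, pricer):
        self.pricer = pricer
    def value(self, spot):
        return self.pricer.price(spot)

call = Instrument(IntrinsicPricer(strike=100.0))
fwd  = Instrument(ForwardPricer())
```

the payoff of the pattern is that `Instrument` never changes when a new pricing method is added; you just pass in a different `Pricer`.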

most relaxing music

The top 10 most relaxing tunes were:
1. Marconi Union - Weightless
2. Airstream - Electra
3. DJ Shah - Mellomaniac (Chill Out Mix)
4. Enya - Watermark
5. Coldplay - Strawberry Swing
6. Barcelona - Please Don't Go
7. All Saints - Pure Shores
8. Adele - Someone Like You
9. Mozart - Canzonetta Sull'aria
10. Cafe Del Mar - We Can Fly

Stream it from SoundCloud right here

Malliavin calculus

saw a reference to Malliavin calculus, a calculus of variations for stochastic processes that lets you differentiate random functionals (e.g. of brownian motion). used in financial math to compute sensitivities (greeks) via integration by parts. looks interesting, might be useful to learn some day.
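one standard toy application: a black-scholes delta estimated without bumping, using the textbook malliavin weight W_T/(S0*sigma*T) on the discounted payoff. a monte carlo sketch (parameters made up, stdlib only):

```python
import math, random

def malliavin_delta(S0, K, r, sigma, T, n_paths, seed=0):
    """monte carlo delta of a european call using the malliavin
    weight W_T/(S0*sigma*T) instead of finite differences."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        w = rng.gauss(0.0, math.sqrt(T))      # brownian motion at T
        ST = S0 * math.exp((r - 0.5*sigma**2)*T + sigma*w)
        payoff = max(ST - K, 0.0)
        acc += payoff * w / (S0 * sigma * T)  # malliavin weight
    return math.exp(-r*T) * acc / n_paths

def bs_delta(S0, K, r, sigma, T):
    """closed-form black-scholes call delta, N(d1), for comparison."""
    d1 = (math.log(S0/K) + (r + 0.5*sigma**2)*T) / (sigma*math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

est   = malliavin_delta(100.0, 100.0, 0.0, 0.2, 1.0, 100000)
exact = bs_delta(100.0, 100.0, 0.0, 0.2, 1.0)
```

the point of the technique is that the derivative never hits the (possibly discontinuous) payoff; the weight absorbs it, so the same estimator works for digitals where bump-and-revalue is noisy.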