Tuesday, December 22, 2009

inverse problems

this month's issue of inverse problems has a number of articles on imaging and inverse scattering.

math for people

blog post in response to 'math for programmers'. for those of you with an interest in math education.

the pragmatic programmer

book on programming recommended by travis oliphant. promises to take me from journeyman to master, if i decide to become a master someday.

interactive brokers news pages

ib has news pages on interest rates in various countries, fx markets (including an rss feed), and futures and options (also with a commentary rss feed on a few individual tickers). the futures and options page includes a summary at the top explaining what things like implied vs. historical volatility and futures arbitrage can tell you about what derivatives traders believe will happen in the market short-term.

the eagle and the lion

partway through reading a very good (if scholarly and dry) book by james a. bill on us-iran relations covering the installation and deposing of the shah and the aftermath of the islamic revolution. only through chapter 7 so far, and the shah has just been overthrown. given the prominence of iran in world news these days, i think this history is important to understand.

cleaning up the office

found an old paper in my office: 'analytical parametric analysis of the contact problem of human buttocks and negative poisson's ratio foam cushions'. dead serious research, and i'm not sure but i think i did cite it once. it has the only account i could ever find of an analytical solution to hertz-like contact with a finite-thickness plate. 'game theory, maximum entropy, minimum discrepancy and robust bayesian decision theory', peter d. grünwald and a. philip dawid, shows the connection between maximizing entropy and minimizing worst-case expected loss. hefty tome, but relatively readable. the book 'the statistical mechanics of financial markets', j. voit, 2005, looks like an interesting read. goes into some of the real areas of interest in quantitative finance from a physics perspective.

Monday, December 21, 2009

scipy india 2009

some interesting stuff to check out from the scipy conference in india. chandrashekhar kaushik is using cython to do sph simulations, apparently using python down to a fairly low level. senthil kumaran gave a talk explaining how the gil affects code speed. i will probably need to understand this at some point. chris burns talked about the fmri project, nipype. i'm curious to see if they have written some ica code in there. david cournapeau is coming out with a new package distribution system. might look at that as a possible alternative to distutils. akshay srivivasan showed how to use python with an avr microcontroller to make cheap and easy instrumentation. asokan pichai of fossee showed how they use open source software to make web video tutorials. travis oliphant very briefly alluded to ultrasound imaging as a scipy application area. i wonder if i can get more detail on that. stefan van der walt mentioned his github repository with scikit addons, with goodies like image processing and gpu algorithms for python.

bloons

finishing the year on a high note. bloons tower defense 3 is already out, so i can raise the bar for next year.
EDIT: now bloons 4 is out. this is getting to be hard work just keeping up.

Thursday, December 17, 2009

more estimating mutual information

'the mutual information: detecting and evaluating dependencies between variables'
r. steuer, j. kurths, c. o. daub, j. weise, and j. selbig
bootstrap confidence intervals on mutual information; good practical info.
eq 32: what about -1/N \sum_i \log( N \hat{f}(x_i) \hat{f}(y_i) )? no relative scaling in x vs. y; minimum wrt h_x, h_y (?)
fig 8: pearson correlation ~= 0 with mutual information > 0 is the nonlinear region not detectable with a linear test.

'a minimax mutual information scheme for supervised feature extraction and its application to eeg-based brain-computer interfacing'
farid oveisi and abbas erfanian
adaptive histogram pdf; better than fixed histogram and gram-charlier polynomial (?). not very useful to me.

'examining methods for estimating mutual information in spiking neural systems'
christopher j. rozell, don h. johnson
the data processing inequality proves that I(X,\hat{X}) \le I(X,Y) regardless of the estimator used to obtain \hat{X}. useful for checking the validity of a coding model if you know a correct upper bound for I(X,Y).

'fast algorithm for estimating mutual information, entropies and score functions'
dinh tuan pham
bias-canceling estimator, O(N) vs O(N^2); density functional (cubic spline), emphasis on gradients (scores); blind-source separation for linear mix and nonlinear maps.

http://arxiv.org/pdf/comp-gas/9405006.pdf
Prichard and Theiler. Generalized redundancies for time series analysis.
good info on advantages of Renyi MI.

independence with Renyi MI requires that it equal zero for all q > 1 or for all 0 < q < 1. maybe it's more robust for estimation since i could evaluate it for multiple qs. apparently q=2 is best, statistically?

i found that i could reduce a taylor series in q down to a bunch of terms containing expected values of log(p) under the probability measure proportional to p_i^q (the escort distribution, i believe). but i'm not sure where to go from there or whether that's more useful.

http://www.ima.umn.edu/preprints/July90Series/663.pdf
Parzen. Time series, statistics, and information
kind of old, but still looks like good info

Dongxin Xu and Deniz Erdogmuns. Renyi's entropy, divergence and their nonparametric estimators
book chapter with good info on interpreting and approximating the spectrum of renyi entropies, especially quadratic (q=2).
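
while i'm at it, here's a quick, untested sketch of the standard parzen plug-in estimate for the quadratic (q=2) renyi entropy in 1-d. the bandwidth is picked by hand and the function name is mine:

import numpy as np

def quadratic_renyi_entropy(x, sigma=0.25):
    # parzen plug-in for H_2 = -log \int f(x)^2 dx: with gaussian kernels of
    # width sigma, the integral reduces to the mean of gaussian kernels of
    # width sigma*sqrt(2) evaluated at all pairwise differences.
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]
    s2 = 2.0 * sigma ** 2
    k = np.exp(-d ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return -np.log(k.mean())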

Tuesday, December 15, 2009

python documentation generation

a few tools out there for generating docs for/from python code. docutils has a lot of core support and no dependencies, but it seems to be more of a library since the front-end stuff seems a little bare-bones and many others use its ReST parser. sphinx is a very impressive doc generator, with commensurately impressive packages using it. but they say it's not really designed for auto-api docs, more for stand-alone rst files written alongside the code. epydoc, otoh, is an auto-api generator that analyzes .py source to build the docs. it can use its own format, or it can read rst (i think the better choice). can make dependency graphs, etc. i think i'll try this first. happydoc is another auto-api tool. i don't think it's used as much as epydoc.

Monday, December 14, 2009

probability density estimation

'density estimation by dual ascent of the log-likelihood' by tabak and vanden-eijnden shows an interesting method for joint pdf estimation by mapping to gaussians. i wonder if the technique is related to 'whitening as a tool for estimating mutual information in spatiotemporal data sets' by galka, ozaki, bayard, and yamashita. they also cite the fact that innovations that are the sum of gaussian and poisson rvs can represent any continuous-time markov process. for continuous dynamics, only the gaussian noise term will be present.

Thursday, December 10, 2009

new youtube api

apparently youtube has discontinued their old api. i'll need to look here to update things. or i could just use this. it doesn't do the part selection like my script did, but if you can search well enough to get the first hit right, it's easy enough: youtube-dl "ytsearch:hampster dance part 1" -o 1.flv

Wednesday, December 9, 2009

python coverage testing

coverage.py: actively developed, simple command line execution for html output. integration into my test framework would require some work with the api, though it already has a nose plugin i could look at. figleaf is based on coverage.py but runs faster because it ignores python builtins by default. better separation between code analysis and reporting, so you can more easily combine results from multiple runs. maybe not quite as much spit and polish. the coverage langlet module (?) takes a different approach to coverage monitoring by inserting sensor objects into block entry points of the compiled code. the module seems to be more alpha quality than ned batchelder's, but perhaps an interesting alternative if that one doesn't work out for some reason. i think i'll try coverage and/or figleaf to see if they are helpful beyond the builtin trace capability.

Monday, November 30, 2009

printable diff

here are some handy commands for making printable landscape diff outputs with long (120 char) lines.
diff -y --left-column --width=240 version1 version2 > /tmp/t
a2ps --columns=1 -l 240 /tmp/t -o /tmp/t.ps

factorization of matrices with unknown elements

been thinking about a problem that can be represented as a factorization (like svd or ica) of a matrix when only some of the elements are known. clearly not a simple problem, but one that apparently comes up in image recognition. google turned up an interesting report from oxford from 5 years ago that gives a good review of the problem. good descriptions of a few algorithms, observations on optimization methods (that may or may not carry over to other applications), and some synthetic examples that show the effect of the distribution of known elements. the residual function uses a hadamard (elementwise) product with a mask matrix to represent the partial knowledge, although sec 4.3 points out that other forms might better incorporate prior knowledge.
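
for my own reference, a bare-bones, untested sketch of the masked-residual idea (alternating ridge least squares on the hadamard-masked error; this is not one of the algorithms from the report, and the rank, iteration count, and regularization are arbitrary):

import numpy as np

def masked_factorization(M, W, rank, iters=100, lam=1e-3):
    # find A (m x rank) and B (rank x n) minimizing ||W * (M - A B)||_F^2,
    # where W is a 0/1 (or boolean) mask of known entries.
    m, n = M.shape
    rng = np.random.RandomState(0)
    A = rng.randn(m, rank)
    B = rng.randn(rank, n)
    I = lam * np.eye(rank)
    for _ in range(iters):
        # solve for each row of A using only the known columns in that row
        for i in range(m):
            cols = W[i] > 0
            Bi = B[:, cols]
            A[i] = np.linalg.solve(Bi.dot(Bi.T) + I, Bi.dot(M[i, cols]))
        # solve for each column of B using only the known rows in that column
        for j in range(n):
            rows = W[:, j] > 0
            Aj = A[rows]
            B[:, j] = np.linalg.solve(Aj.T.dot(Aj) + I, Aj.T.dot(M[rows, j]))
    return A, B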

more python testing tools

pester looks like something i should try. it uses mutation testing to test your tests, by seeing if there are changes to the code that still pass all tests. unfortunately, it hasn't been updated since 2002, even though jester (the java version) is more recent. mutation testing is sorta related to fuzz testing, which feeds your code modified data until it chokes. fusil and peachfuzzer both look very feature-rich but complicated. peachfuzzer has a gag-me xml interface and fusil provides libs for writing python scripts that test cli programs. svnmock might be a good tool to use if i write code that uses the svn python hooks. (been thinking about making my repository mail itself somewhere as a backup with each commit.) also hasn't been updated in a while, though -- 2006 in this case. pythoscope is a good tool to use if i ever start using unittest instead of just doctests. it will examine your code to build the start of a test file that you can fill in, almost as easily as a doctest. and it's been recently updated. some day i should put a source checker like pylint or pychecker into my stream. maybe catch some problems before they start. clonedigger has more powerful tools than pylint for detecting duplication, for refactoring hints. pytest is a test runner rumored to be buried in logilab-common. sounds similar to the script i wrote to run my doctests and do a little coverage analysis. complexity analyzes cyclomatic complexity of python code, though it's written in perl and it doesn't seem to be actively developed. pymetrics does cyclomatic complexity as well as loc and other metrics. written in python and looks pretty active. worth a try, i think. pysizer does some interesting memory profiling, but it seems to be a little too alpha right now (only built-in types supported; usually these are the least of my memory troubles.) still, some nice features are already there to help trace where objects came from, who's using them, and plotting dep graphs. if only it could handle non built-in instances, this would be great. guppy-pe is a combination of python dev tools. heapy is a memory profiler, and it seems to do basically the same thing as pysizer but is not limited to built-ins. gsl is a specification language, which i think i can skip as another example of how to javaify python. EDIT: hmm, i can't seem to get heapy to account for numpy arrays, although these people apparently did. sys.getsizeof() definitely does not. grrr.

Saturday, November 28, 2009

snaplogic for data integration

snaplogic is a tool for integrating data from disparate sources with a clean and simple interface. maybe i should look into it for dealing with some of my code that interacts with various formats of large datasets.

mock testing

been thinking about mock testing and i finally came to the point that i need it (to paper over some urllib calls). some googling around turned up a number of good links (an excellent list of python testing links, including mocking libs, etc., is at the cheesecake). there are also a couple of controversies roiling out there that i don't really care about, such as the difference between stubs and mocks (i'll use what works for me, whatever it's called) and whether mocks are inherently evil (use them at i/o boundaries with external resources that are expensive, unpredictable, or unavailable). some of the mock frameworks out there seem to be derived from java libs, which is an auto strike against in my book. (java coders seem always to decide beforehand that a simple task needs 10k loc, even if they're writing python.) minimock looks interesting, and the only complaint i see for it is that it only works on doctests. i'm not sure that's even true anymore, and it's actively developed. since i only write doctests (i just can't see why it needs to be more complicated; another java artifact?), maybe it's exactly what i need. the examples on its website are easy to read and understand, and one of them shows how to use/mock smtp. one issue that i might run into later is wrapping vs. complete mocking. wrappers let you use the real object part of the time or for part of its interface. this can avoid some of the evilness criticism that the mocked code could change and break things without breaking the tests. i can imagine that sometimes it's less complicated, sometimes it's more.
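
as a note to self, the no-framework version for my urllib case is just a stub passed in through a default argument -- not minimock, just a hand-rolled fake. the function and the fake page class here are made up for illustration:

import urllib

def fetch_first_word(url, opener=urllib.urlopen):
    """grab a page and return the first word of its body.

    the doctest never touches the network: it passes a stub opener whose
    read() returns canned data.

    >>> class FakePage(object):
    ...     def read(self):
    ...         return 'hello world'
    >>> fetch_first_word('http://example.com/', opener=lambda url: FakePage())
    'hello'
    """
    return opener(url).read().split()[0]

if __name__ == '__main__':
    import doctest
    doctest.testmod()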

Wednesday, November 25, 2009

free linear algebra

here's a chart comparing free linalg code. handy.

pyret

python package for image deblurring. the demo page link is broken. deconvolution with svd/tikhonov regularization.

pyamg

a while ago i messed around with multigrid methods for solving sparse inverse problems. if i do it again, i'll try this: algebraic multigrid solvers in python.

text-to-speech on the iphone

surprising how little there is out there. i'm probably not looking in the right place. a number of people got flite running:
http://artofsystems.blogspot.com/2009/02/speech-synthesis-on-iphone-with-flite.html
http://www.voxtrek.com/
http://www.cmang.org/
http://www.embiggened.info/vocalizer.html
some of them have itunes links to buy the software, but they don't seem to be active. one is a navigation system, another is just flite with a little gui icing.

festival voices

had to try some different voices with festival, since the default one on the acer aspire sounded terrible. kal is the default on gentoo, and is the one i'm most used to. ked sounds pretty good, too; maybe i'll use it when i want to differentiate. both of these are diphone voices. all the arctic voices i listened to sounded pretty bad. to switch between them globally, just put (set! voice_default 'voice_kal_diphone) into /etc/festival/siteinit.scm

Tuesday, November 24, 2009

execnet

execnet is another python remote execution package. looks kind of like a pythonic mpi with minimal setup overhead, since the sends and receives are manual. can give modules to remote instances, but you have to resolve import dependencies manually.
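
a minimal sketch of the manual send/receive style, based on the documented basic api (untested here):

import execnet

gw = execnet.makegateway()   # local python subprocess by default
channel = gw.remote_exec("""
# this source runs in the remote interpreter; 'channel' is provided there
while True:
    item = channel.receive()
    if item is None:
        break
    channel.send(item * 2)
""")
channel.send(21)
print(channel.receive())     # expect 42
channel.send(None)
gw.exit()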

support vector machines with pyml

there are a number of svm codes out there, but the only non-swiged pure python one for which i've seen a third-party recommendation is pyml. decent general svm intro doc, too.

Information Theory, Inference, and Learning Algorithms

another book available online, including latex and octave, perl, etc, source. might be a good ref on IT basics, neural nets, coding, compression, and monte carlo.

data clustering for screen scraping

i just had an idea for an application of unsupervised data clustering. a quick google popped up python-cluster. looks like it's been abandoned for a couple of years, but it has a hierarchical algorithm and (maybe) k-means. i might try it. also, scipy.cluster has kmeans and vector quantization, with self organized feature maps and other methods promised later. the app is screen scraping web pages and trying to get the main content (an article, for example) without the ads, links, and other junk around the edges. i think it might be possible to look at each line (after tossing everything inside script tags) and separate the lines based on their length and the percent of the line inside html markup. the reasoning is that real content usually has long lines in the source and a small fraction of html taggage. i probably want to throw in line index as a third variable, since the lines i want will probably be close together. another thing i could do is grab multiple pages from the same site: either multiple articles that should have the same format or multiple copies from different days for a frequently updated page. that would allow me to do two things. first, i could combine data points from multiple pages to get higher point density for the cluster detection. (might need to test if each individual page is sampled from the same distribution as the others to throw out outliers.) second, i could detect identical lines to throw away as nonunique boilerplate.
the ubuntu python-mvpa package looks like it might fit the bill.
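
and a rough, untested sketch of the length/markup-fraction/line-index idea above using scipy.cluster (the feature choices and k=2 are guesses):

import re
import numpy as np
from scipy.cluster.vq import whiten, kmeans2

def line_features(html):
    # per source line: (length, fraction of the line inside html tags, line index)
    html = re.sub(r'(?is)<script.*?</script>', '', html)   # toss script blocks
    feats = []
    for i, line in enumerate(html.splitlines()):
        tags = ''.join(re.findall(r'<[^>]*>', line))
        n = len(line)
        feats.append((n, len(tags) / float(n) if n else 1.0, i))
    return np.asarray(feats, dtype=float)

def guess_content_lines(html, k=2):
    feats = line_features(html)
    labels = kmeans2(whiten(feats), k, minit='points')[1]
    # guess that the content cluster is the one with the longest lines on average
    best = max(range(k),
               key=lambda c: feats[labels == c, 0].mean() if (labels == c).any() else -1)
    return np.nonzero(labels == best)[0]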

Wednesday, November 18, 2009

Software for Hidden Markov Models and Dynamical Systems

all the source and the whole build system for a book by andrew fraser is online. the code is written in python, and the typesetting uses rubber, the python wrapper for all things tex. not only is the subject matter interesting (an application chapter on obstructive sleep apnea), but i think it has some good examples of how to write a book or long report with python and latex. took a _long_ time to put everything together, but it generates the whole book. if i actually use it, i think i'll have to buy the book in gratitude. (had to comment out a \printnomenclature line to get latex to run. guess i have a bad nomencl.)

Tuesday, November 17, 2009

Flexible Algorithms for Image Registration

book by jan modersitzki that has an associated website with matlab (meh) software and other ref docs.

Scientific Computation

book from cambridge u press by gaston gonnet. i'm not sure the methods are extremely advanced, but it gives an interesting spread of applications, including protein structure and stock price prediction.

Friday, October 30, 2009

outliers

just finished the book, 'outliers' by malcolm gladwell. interesting, but first a disclaimer/critique: he is not a quantitatively oriented person. he makes some claims without backing them up with data, and other assertions are tenuously extrapolated from scant data. it's clear from the definition he gives both in the book and in later interviews that he does not understand the real meaning of the word 'outlier' in the statistical context. it is also clear that he has an agenda: to disprove the 'myth of individualism' in favor of 'the power of community' to explain personal success. it's true that talent and hard work are not enough for success; you also need opportunity. (well, duh.) but i think he tries to replace one silly straw man with another. he strongly emphasizes the opportunities, and how randomly or arbitrarily they occur for people, as the dominant factor for determining success. but most opportunities don't just drop out of the sky into certain people's laps. it's precisely the people who are working hard, paying attention, and looking for opportunities that take advantage of them. maybe the author's mistake is the very common fallacy of confusing correlation with causation and choosing a causal relationship based on a preconceived notion. in that spirit, it is interesting to see the role that these factors can play. people who are looking for those opportunities can increase their awareness and use it to their advantage.

estimating mutual information

just read a very interesting article on estimating mutual information from random variable samples. pretty much everything else i've seen on this subject is based on either histograms or kdes. so improving the algorithms comes down to improving histogram or kernel parameters. 'estimating mutual information' by a. kraskov, h. stögbauer, and p. grassberger takes a unique approach, based on nearest neighbors. seems to do pretty well on few data points and nearly independent sets. the paper also shows an interesting application to ica. certainly worth a try, especially if i'm comparing a number of approaches to mi estimation. it points out that the norms need not be the same or even be in the same space, so i could use the rank or any other transform (log is a popular one) to spread out some data or otherwise emphasize some parts to reduce estimation error without changing the theoretical result. the main results to implement are equations 8 and 9. be careful that the definitions of n_x and n_y are different in 8 and 9, though they could be counted simultaneously (on the same pass). compare fig 4 to 13. the exact value for I in the caption comes from eq 11. ref 34 goes into the uniqueness and robustness of the components that drop out of ica analysis. the paragraph under fig 17 has computed values from web-accessible data that could be used for testing code. it would be interesting to see if component 1 in fig 19 reflects phase differences in component 2 due to propagation delay. the second term of the second line in eq a4 equals 0, which confirms the fact that reparameterization does not change mutual information. using small values for k increases statistical errors while large k increases systematic errors. probably best to try multiple ks and compare trends to fig 4. the digamma function is implemented in scipy.special as psi.
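
here's my first rough cut at eq 8 (the k-nearest-neighbor estimator) to remind myself of the pieces -- untested, 1-d marginals only, assumes no duplicate samples, and the strict-inequality epsilon is a hack:

import numpy as np
from scipy.special import psi          # digamma, as noted above
from scipy.spatial import cKDTree

def ksg_mi(x, y, k=3):
    # x, y: 1-d arrays of N samples each; max-norm in the joint space
    x = x.reshape(-1, 1)
    y = y.reshape(-1, 1)
    n = len(x)
    xy = np.hstack((x, y))
    # distance to the k-th neighbor in the joint space, excluding the point itself
    eps = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, -1]
    tree_x = cKDTree(x)
    tree_y = cKDTree(y)
    # n_x, n_y: points strictly closer than eps in each marginal space
    nx = np.array([len(tree_x.query_ball_point(x[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    return psi(k) + psi(n) - np.mean(psi(nx + 1) + psi(ny + 1))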

Thursday, October 8, 2009

parallel python with ipython.kernel

the kernel module that comes with recent versions of ipython provides for interactive parallel python sessions. might have to check this out some time. this part of ipython is built on foolscap, a secure rpc framework. i wonder if it would make a good replacement for pyro. looks like it takes more up-front work, though.

Tuesday, October 6, 2009

python packages

scikits has a number of packages paralleling the scipy effort, including some optimization and machine learning, audio and signal processing, a matlab wrapper, etc. fwrap looks to be a next-gen of f2py. makes interfaces to cython, c, c++. still alpha. mpi4py provides a c++ like interface via cython. if i'm ever forced at gunpoint to use mpi again, this is what i will use. the mayavi people have made a recorder for use with the traits ui (from enthought) to generate a human-readable python script that can reproduce the gui actions. that sounds like a cheap and easy way to automate journal files without any effort on my part. maybe i could relax my principled stand against making guis in general....

bayesian inference books

travis oliphant (of enthought fame) recommends 'the algebra of probable inference' by richard t. cox and 'probability theory: the logic of science' by edwin t. jaynes as good references on using bayesian inference as a formalization of the scientific method (my interpretation). might have to check them out some time.

Monday, October 5, 2009

psuade

another uncertainty/optimization/sensitivity analysis package, this time from llnl. looks like they want to make their own built in environment (why?! why?!) but at least it's gpl.

Friday, October 2, 2009

automatic debugging

saw some interesting work on automatic debugging. the idea is to use tests (passing tests plus one failing test) to evaluate random code changes and find one that works. i don't really expect my computer to debug my code any time soon. but one interesting thought was that the ast nodes to change were weighted by positive and negative test coverage. maybe i could use coverage in this way to localize a bug. (more likely in places covered by multiple negative tests, less likely in places covered by positive tests.) that would help me find the bug, which is almost always the hardest part. refs to the spike black-box fuzzer from immunitysec.com, strata dynamic binary transformation (from virginia).
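
a tiny sketch of what that coverage-based localization might look like. this is basically the tarantula-style suspiciousness score; the input data structure is whatever my test runner would hand me:

def suspiciousness(coverage, total_passed, total_failed):
    # coverage: {line: (passing tests covering it, failing tests covering it)}
    # lines exercised mostly by failing tests float to the top
    scores = {}
    for line, (p, f) in coverage.items():
        pass_ratio = p / float(total_passed) if total_passed else 0.0
        fail_ratio = f / float(total_failed) if total_failed else 0.0
        denom = pass_ratio + fail_ratio
        scores[line] = fail_ratio / denom if denom else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)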

stable differentiation

found disappointingly little online about good numerical differentiation algorithms. the only thing i can think of is savitzky-golay filtering. scipy has a cookbook recipe, procoders has a page, and there's a krufty package out there i might steal from. i tried filtering a time series with a 3rd order butterworth before doing a simple central difference, and it made some huge errors at the beginning. i hope the s-g filter works better.
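
in the meantime, a crude, untested savitzky-golay-style derivative i could fall back on (window and order picked arbitrarily; it just polyfits a sliding window and differentiates at the center):

import numpy as np

def sg_derivative(y, x, window=11, order=3):
    # least-squares polynomial fit over a sliding window, then evaluate the
    # fitted polynomial's derivative at the center point. window must be odd;
    # the half-window at each end is left as nan.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = window // 2
    dy = np.empty(len(y), dtype=float)
    dy[:] = np.nan
    for i in range(half, len(y) - half):
        sl = slice(i - half, i + half + 1)
        p = np.polyfit(x[sl], y[sl], order)
        dy[i] = np.polyval(np.polyder(p), x[i])
    return dy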

3d pdf objects

so i can now make 3d pdf objects, and i'd like to make it more useful. adobe put out some docs on their support for 3d, as well as javascript for 3d. they claim the possibility of animations, though i seem to recall from reading about this before that that is only via matrix transforms for rigid-body motion and not mesh deformation. i wonder if i can use insdljs.sty for the javascript....

Wednesday, September 30, 2009

flash on firefox 3

finally upgraded to firefox 3, and i had some trouble getting the adobe flash player recognized. i installed it, made some links to places i thought it might be looking, all with apparently no effect. the thing that finally made it work was when i symlinked /home/user/.mozilla/plugins/libflashplayer.so to /opt/netscape/plugins/libflashplayer.so.

Monday, September 28, 2009

amalgam and dream

recently came across some interesting work by jasper vrugt. AMALGAM is a self-adaptive global optimization algorithm. basically you start with a latin hypercube to get some initial sampling and feed those points to generate a couple of new points from each of a set of different global optimizers such as genetic, swarm, etc. after you sample that second generation of points, you can evaluate the quality of those optimizers on how well they pick new points for that type of problem. for each successive generation, you can sample more points from the optimizers that have done well. strikes me as similar to cover's optimal portfolio theory in the way it weights the winners. DREAM (differential evolution adaptive metropolis) is a sampling algorithm that has some advantages over markov chain monte carlo (mcmc) methods while maintaining desirable statistical properties that i don't really understand yet. useful for optimization and parameter fitting for high-dimensional inverse problems.

Thursday, September 24, 2009

open source cad

narocad and solidmodeller are two open source cad codes out there. looks like somebody is finally using opencascade.

software carpentry

i've read that the software carpentry course is a good way for scientific computing people to learn software engineering. audio lectures also available.

quantitative finance firms

jane street looks like an interesting place to work, with offices in new york, london, and tokyo. looks like they have a strong cs background (they use ocaml). they recruit at top schools, including mit and imperial college.

Wednesday, September 23, 2009

packages i need to check out

some packages i've come across recently that look interesting enough for further investigation.
zhu3d
epix
alberta (finite element)
candystore
peercast
edna
grisbi
homebank
demerge
nostaples
tic98
unpaper
cil (c ast generator)
arrayterator
coverage.py
ipdb: very nice ipython debugger. on windows, got it working in powershell but not cygwin xterm or spyder
spyder: dev chat says there are issues, maybe someone's working on it
pudb: borland-esque text debugger; depends on urwid, which requires *nix or cygwin
lightyears
rpyc
snakefood
pstplus

tet mesh improver

the stellar package at berkeley claims to improve tet meshes for finite element simulations. i should check it out some time, if i'm ever lazy enough to use a tet mesh again.

math tutoring

mathnasium looks like an interesting service. maybe some day i could open a franchise. i think the hardest part is finding out how each student learns and then learning how to teach that way.

Tuesday, September 1, 2009

python design optimization

this article looks like it might be interesting. multidisciplinary design optimization, with a significant section on software issues and python.

Thursday, August 13, 2009

chording keyboards

one of these days i'll build myself a chording keyboard, so i can walk around with my netbook in a backpack running knoppix+adriane like a wanna-be cyborg. cursory googling turned up a few good links, foremost among them the spiffchorder for its usbness.
http://chorder.cs.vassar.edu/spiffchorder/forside
http://cuddlepuddle.org/~adam/pickey/index.html
http://www.chordite.com/
http://www.sigchi.org/chi97/proceedings/paper/fkm.htm#U4
wow, there are some scary people out there who take the cyborg thing way too seriously. now i know what my fate would have been had i never married.

Wednesday, August 12, 2009

epdlab

looks like enthought people have come out with their own ide: epdlab. i followed the source build instructions below, and it wasn't too painful. if i had to do it over again, though, i think i would use the easy_install or just grab the whole enthought package. i ended up needing to update wxgtk, wxwindows, ipython, and configobj in addition to the enthought suite stuff.
https://svn.enthought.com/enthought/wiki/Build/ETS_3.0.0b1/Py2.5/Generic_Any_Any
looks like epdlab uses editra for the integrated editor. not bad, though i think i will stick with emacs for now. if i were forced to use windows and had to start from scratch, though, i would cling to editra and epdlab like they were a marine float in a typhoon. and i would be sure to get the python ide and other plugins from the editra site.

enthought webinars

enthought is giving webinars, both free and open to the public and some that are restricted to paying customers. very interesting, very handy, especially the downloadable movies. i gotta keep tabs on this.

Friday, August 7, 2009

pypy woes

i fought mightily to compile some of my existing code with pypy. the docs warn that much of ordinary python is outside of rpython, and they are right. the speed ups i saw with things that pypy could translate to c were indeed impressive, but it was just too hard to shoehorn real-life python into rpython, even for some fairly simple parts of the code. i know this isn't what rpython is for, but i was hoping... maybe i'll try out cython and see if it's better than just writing c for scipy.weave.

Tuesday, August 4, 2009

fuzz testing with python

fusil is a fuzz testing framework for python. supposed to be easy to use and has dug out bugs in a number of codes otherwise assumed to be stable. many more links and info about a number of different types of tests are available on the pttt. minimock, for example, looks like one i could probably use a lot.

python 3d apps

came across some interesting 3d applications in python. free-cad is a multi-platform 3d cad program with extensive use of python. not only does it have an embedded interpreter, all the pieces are python modules that can be imported to an external python. even the qt interface can be modified/controlled from python. these guys really did it right. uses opencascade (with pythonocc), coin/inventor, pivy. it claims some nastran support, but the mesh module seems to be entirely geared toward visualizing exterior triangle surface meshes and not for analysis. more fem support is on the dev roadmap. mgltools looks like a very powerful viz tool similar to the old dx. seems to be geared toward molecular stuff, though.

Tuesday, July 28, 2009

rpython and numpy

looks like at least a few people connected with pypy have an eye on numpy. there's a very small and limited demo in the svn right now. but other devs have said it is hard and not their main interest. i hope it at least remains possible, at some future date. maybe more promising is some work on ufuncs by one of the enthought people. it shows a good example of how to use rpython to compile a function. maybe i can use this to autogenerate c code for scipy.weave.inline. here's the most relevant example:
from pypy.translator.interactive import Translation

class compdec:
    """decorator: translate the wrapped function to c via rpython on first
    call, and recompile whenever the argument types change."""
    def __init__(self, func):
        self.func = func
        self.argtypes = None

    def __call__(self, *args):
        argtypes = tuple(type(arg) for arg in args)
        if argtypes != self.argtypes:
            self.argtypes = argtypes
            t = Translation(self.func)
            t.annotate(argtypes)
            self.cfunc = t.compile_c()
        return self.cfunc(*args)

@compdec
def is_prime(n):
    if n < 2:
        return False
    for i in xrange(2, n):
        if n % i == 0:
            return False
    return True

print sum(is_prime(n) for n in xrange(100000))
i think that it would be easy to change the check/recompile to a dict lookup. that __call__ might also be compiled with rpython since args is always a tuple. that would add only 1 c function call overhead and preserve dynamic typing.
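
a quick sketch of that dict-lookup variant, reusing the Translation pieces from the example above (untested):

class compdec_cached:
    def __init__(self, func):
        self.func = func
        self.cache = {}          # argtypes tuple -> compiled function
    def __call__(self, *args):
        argtypes = tuple(type(arg) for arg in args)
        cfunc = self.cache.get(argtypes)
        if cfunc is None:
            t = Translation(self.func)
            t.annotate(argtypes)
            cfunc = self.cache[argtypes] = t.compile_c()
        return cfunc(*args)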

python guis with avc

avc looks like an interesting concept. it's a pure python library that connects gui toolkit variables to variables in the scope of your python script, with bidirectional communication. looks like a very clean and easy way to separate the view from the logic, and it has a standardized interface for gtk, qt3, qt4, tk, and wxwidgets. tk doesn't look as good as the others, but it's easy to get with python. gtk and wxwidgets look like the most feature-complete. all of the toolkits have their own optional external interface def files, and avc can use those. but you can also set up everything from the comfort of your python script. only problem: it's gpl, so i might need to use something else if i want to sell my stuff. EDIT: i think enthought's Traits might be usable in a similar way. and if i'm not mistaken, they license their open source stuff bsd.

python data storage

after a little bit of optimization, i'm finding the bottleneck now is reading in the data. i think i've found about all the ways to speed up cPickle (most recent protocol, Pickler().fast = True) and the next step will be to a real database. i'm not sure the python builtins will buy me much, though, and i think if i'm going to have to install something it might as well be pytables. pytables is an interface layer on top of hdf5, so it's probably best for large volumes of numerical data. it only requires hdf5 (which built without problems: configure && make install) and numpy. i had to set HDF5_DIR and add paths for LD_LIBRARY_PATH and PYTHONPATH, not being root. but overall a very painless install. the data structure is bound to be more complex than a simple pickle, but there are some good tutorials out there. also, the nmag project has some good real-life experience with using pytables for unstructured grid data. (see hdf5*py in nmag-0.1/nsim/interface/nfem) uiuc and cei (the people who make ensight) also defined an hdf5 mesh api, but it looks pretty krufty now. another plus with pytables is that you can use vitables to interact with the data. that's even easier than a pickle. EDIT: here's a site that covers a lot of the issues with scientific data storage and refers to specific examples, including hdf. one problem that might arise with pytables is that numpy arrays can be memory mapped to a file on disk, but pytables can't do that if i'm not mistaken. am i? according to this email exchange, pytables doesn't do mmap but it can be as fast or faster if used properly. sounds like i can still use pytables without losing performance, but i will need to reread that and some of the refs therein to implement. here's an interesting conversation about large file i/o in python, with specific applications in finance.
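
minimal pytables usage for my own reference (file and node names made up; newer pytables spells these open_file/create_array):

import numpy as np
import tables   # pytables

data = np.random.standard_normal((1000000, 8))

h5 = tables.openFile('samples.h5', mode='w')
h5.createArray(h5.root, 'samples', data, 'raw samples')   # write the whole array
h5.close()

h5 = tables.openFile('samples.h5', mode='r')
chunk = h5.root.samples[10000:20000]   # reads just this slice from disk
h5.close()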

Monday, July 27, 2009

python profiling and optimization

python's builtin cProfile/pstats are great, and it is really easy to get started with them. (just look at the top of the doc page for cProfile.) one problem, though, is that cProfile doesn't do line-by-line profiling. hotshot did, but it's deprecated and arguably harder to work with. a little poking around turned up line_profiler and kernprof. looks like they're the pet project of one of the enthought people. i haven't tried it yet but it looks pretty easy to use, even through the api as a part of a larger development tool. i'm thinking the best way to use it would be to run cProfile first and take the top 3 time hog functions (or 1 or 2 if they get over 90% total time) and auto run it again with line_profiler turned on for those functions. maybe i could build this into my doctester by first running the small and fast doctests to confirm correctness, then run some __profileTests__ module string that has more realistic usages. while i'm writing code, i can interrupt the profile tests to rerun the doctests, and i'll see the profile results any time i happen to let it run all the way through. kernprof is pure python and acts as a helper/wrapper for both/either cProfile and line_profiler. but line_profiler uses cython. speaking of cython, for optimization cython looks like an impressive way to generate c source and build it to run fast (and obfuscate, if that matters). i might look at it as a possible alternative to pypy/rpython, if i can't get that to work. one problem, though, is that it looks like so many changes and type annotations might be needed with numpy that i might as well just write the c. i've been very impressed lately with results i get using scipy.weave, with both the blitz converter and the default c-ish converter. i can narrow the hotspots down to one or two methods, and translating the python to c is not too hard then (especially with the default converter). scipy.weave.inline is not too shabby either, for speeding up numpy expressions. so pypy is another very interesting possibility for autogenerating c from python. someone has already shown an example of how to do this, including some initial work with numpy. EDIT: maybe using cython is not as bad as i had thought. apparently it will compile just about any valid python (with a couple of restrictions on closures) with a modest speed up. all the extra syntax is just for making it run faster, and even those annotations can be done in a way that preserves python compatibility of the original source. maybe i can even look at multiple-dispatch packages to make an auto-type discovery based on doctests and profile tests to make my augmenting .pxd files for me. probably the first thing i should do is try to replicate some of the speed-up results from the cython numpy tutorial. there's also another tutorial with a convolution example (similar to chapter 10 in the manual, but slightly updated including warning about None args and calling numpy c funcs from cython) that might be a better place to start.
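
for reference, the cprofile-then-top-3 part is just a few lines (assuming some main() entry point):

import cProfile
import pstats

cProfile.run('main()', 'prof.out')   # profile the entry point, dump stats to a file
stats = pstats.Stats('prof.out')
stats.strip_dirs().sort_stats('cumulative').print_stats(3)   # the top 3 time hogs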
chapter 11 in the user manual gives an example of how to profile with python's profiling tools, while the wiki page on profiling shows a (partial) example of low-overhead profiling with valgrind. debugging with ddd and gdb is also discussed in a wiki page at wiki.cython.org. also, since a discussion on their email list in april/may 2009, an --embed option has emerged that facilitates embedding a python interpreter with a python entry point. i had been looking at pyinstaller, which looks like a very nice cross-platform way to make standalone executables from python. maybe i won't even need it with cython --embed. details on the above + more cython tips on the wiki. okay, looks like the readme for cython_freeze says cython --embed can only put one module in with the interpreter, while cython_freeze can put any number of cython modules in there. also, it can put a normal interpreter shell in, not just a 'main' module so you can use it interactively. not sure if cython --embed can do that. also, i should look into using plipy for this sort of thing. it will allow the executable to run on any linux machine, from ancient redhat boxen to the latest ubuntu machine. 'twould be nice to preserve some of python's portability when i freeze it to binary. and there's a link there to portable python, which seems to do the same thing for windows. also, virtualenv is worth a look. lots of people endorse it.

https://translate.svn.sourceforge.net/svnroot/translate/src/trunk/virtaal/devsupport/profiling.py
saw a ref to this, which appears to be a way to profile python callables with KCacheGrind.

Saturday, July 25, 2009

python webapp frameworks

been looking at python webapp frameworks lately, for some reason. spyce had caught my eye, and it looks relatively easy to get started. i have dabbled with cherrypy before, but not enough really to know the fundamental differences between them. django is one of the big ones, of course, and integration with google app engine gives it a boost. it seems to be built for a need to integrate with database backends, and generally looks like overkill for anything i would be doing. turbogears is also big and composed of a number of packages. also overkill for me. interestingly, they switched from cherrypy to pylons. i wonder why. pylons itself contains a collection of 3rd party packages, including paste, webob, etc. i generally applaud code reuse and non-wheel-reinvention, but if i just need something small and simple i don't want to have to climb 5 different learning curves. zope is another behemoth that is behemothy in an interesting, oo way: the content is represented as python objects in a persistent database. pyjamas looks interesting as a python to javascript converter, if i ever need to avoid writing javascript. (hmmm. i wonder if i could use this for adding features to embedded 3d objects in pdf files.) web2py is billed as a database-driven framework, but it also is supposed to be easy to learn as it was developed as a teaching tool. getting started looks pretty straightforward, as you can just run it and edit stuff through the web admin interface. it also has a built-in shell (with ajax) and error logs. i think i might need to check this one out more thoroughly, as it might be particularly convenient for remote admining my headless server. quixote and porcupine i only glanced at in passing, failing to see anything really compelling.

Friday, July 10, 2009

mlb gameday audio on linux

well, the free baseball audio i used to use doesn't seem to work anymore, so i've been looking for ways to use the mlb gameday audio service. the problem is, they assume i want to spend the whole day sitting in front of my windoze machine listening to my web browser. so i found the excellent mlbviewer project. the dependencies were mostly easy to resolve: pyxml and simplejson have gentoo packages, and suds just needed an easy_install suds. rtmpdump, however, has been declared an enemy to truth, justice, and the american way. almost every site that once hosted a copy seems to have taken it down after getting peed on by adobe's legal department. (fritz hollings, still keeping the country safe from technology.) i had to get a copy of rtmpdump-v1.5 from a bsd repo, since mlbviewer claims not to work on newer versions (although i seem to recall seeing a changelog entry noting that 1.6 would...). i had to add an include for stdint.h in rtmppacket.h to get it to compile on linux. make rtmpdump, make streams worked fine after that. don't forget to run Patch.sh in rtmpdump-patches/ that comes with mlbviewer. it also needs a very recent svn version of mplayer; the install file shows you how to check it out. ./configure --enable-dynamic-plugins, make, and you're done. i built mlbviewer-0.1alpha12 in its own directory, with a dedicated rtmpdump and mplayer. the install file says to install these to root, so they are found by mlbviewer. but it turns out mplayer and rtmpdump_x86 are only referenced once in the whole mlbviewer package (other than test/), so it's not hard to change the paths.
set PYTHONPATH if you don't install to root
python setup.py --prefix=/path/to/mlbviewer/installation install
mkdir ~/.mlb
cp MediaService.* ~/.mlb/
set user and pass in ~/.mlb/config
i think that the -dumpstream -dumpfile options will work with the mplayer called from mlbviewer, so this could be a nice little audio tivo for yours truly. and mlbviewer comes with other handy tools for finding out which files are available, for some easy cron automation.
EDIT: managed to compile rtmpdump 2.2b, which claims to support everything (including rtmpe). had to point it to some includes with
export XCFLAGS='-I/home/epd-6.0.1-rh5-x86/include'
export XLDFLAGS='-L/home/epd-6.0.1-rh5-x86/lib'
seems to work just fine with mlbviewer, even without any of its patches for 1.5. mlbviewer warns that you must compile an svn version of mplayer from the last 2 weeks (before its release 2009-06-11) for it to play stably. ubuntu's current version (2:1.0~rc3+svn20090426-1ubuntu10.1, SVN-r29237-4.4.1) of mplayer plays the flv (with an mp3 extension!?) file dumped out by test/gamedayaudio.py, though it does get the length wrong which might cause problems with seeking. otherwise, maybe the warnings apply more for video than audio.

Friday, June 26, 2009

scanning with hplip

had to reboot my computer and remember how to make the hp officejet scanner/printer scan for me again. turns out hplip won't run if cupsd and dbus are not started, and they weren't by default. hp-check is pretty handy for finding these kinds of problems. then, i actually use scanimage to do the scanning. i think it's part of the sane-backends package rather than hplip, but it works for me. scanimage --help shows scanner-specific options. this gives a good 150 dpi scan of a letter-size page:
scanimage --resolution 150 -x 216 -y 279 > output.pnm
the easiest way i've found to make a pdf out of the scanned pages is first to convert each one to a png, then to a pdf (both times with convert). then, use
pdftk in*.pdf cat output out.pdf
to assemble the pages. pretty good compression with the png conversion (even without forcing greyscale or b/w) and verified to work on windows.

Tuesday, June 2, 2009

ica and quantitative finance

this site has some interesting papers on quantitative finance. in particular, i think the report on quant education would be an interesting read, even though it is a bit old now. the other paper, 'a first application of independent component analysis to extracting structure from stock returns' is the earliest reference i've seen on ica on financial data. now a number of people have been doing it, with mixed results imho. but there is a good point to be made here in that, if you are assuming independence, why just look at correlation? why choose an orthogonal basis orientation based on reprojection error L_2 minimization? why not look at mutual information or higher order moments and cumulants? if there are components that are uninterpretable, it is self-deceptive to force them to be small artificially and it will probably lead to overly optimistic estimates of risk. truth is, i have two goals for modeling log price relative time series: classification and time-windowed average estimation. for classification i want independence, and for the time averages i want to minimize time-averaged error (not necessarily time-averaged error^2). not only is amplitude significant; autocorrelation of the error time series is, too.

svn and xxdiff

svn and xxdiff don't play nice, so i wrote a little python script to clean up the args svn gives to an external diff. svnxxdiff:
#!/usr/bin/env python
import os
import sys

# svn calls the external diff as: diff -u -L label1 -L label2 file1 file2;
# xxdiff wants --title1/--title2 instead of -L and doesn't take -u
args = sys.argv
i = args.index('-L')
args[i] = '--title1'
args[i+1] = '"' + args[i+1] + '"'
i = args.index('-L')
args[i] = '--title2'
args[i+1] = '"' + args[i+1] + '"'
args[0] = 'xxdiff'
args.remove('-u')
os.system(' '.join(args))
this alias saves some typing:
alias svndiff 'svn diff --diff-cmd ~/local/bin/svnxxdiff'
EDIT: or i could just use tkdiff, part of the tkcvs package. it's smart enough to compare against latest repository version if there's only one file arg.

selling a home

it's a buyer's market, but a couple of websites might help those sellers out there. www.homegain.com has valuation tools and gives tips on staging. http://www.giftnetonline.com is a service that keeps a home in shape to sell while the owner is away.

Friday, May 8, 2009

free online baseball

just found this great site that links to online sports radio stations that broadcast baseball games: http://www.freebaseballradio.com/ and with the magic of mplayer, i can time-shift redsox games (my personal audio recorder) mplayer -dumpstream -dumpfile redsox-2009-05-07.asf mms://wm-live.abacast.com/northeast_broadcasting-wfad-32

Thursday, May 7, 2009

probabilistic planning

apparently some people have addressed the problem of non-deterministic project planning. deltek risk+ is an add-on for ms project that handles pdfs for task duration. liquidplanner is web-based planning + groupware for $25 to $35/mo/user (free for nonprofit/educational). looks like it handles duration intervals (with uniform pdfs?) and computes probabilities for meeting deadlines. maybe i should check out their training/demo videos and 30 day free trial. pertmaster adds monte carlo to the primavera (bought by oracle?) planning software. i've seen a couple of web rants about this subject, and the liquid planner people say this is their reason for existence. i should give them a shot before doing my own. at the least i will learn from it, and maybe i can save myself the trouble (though i still want the probabilistic success/fail with alternative tasks). if i do write my own mc code, hooking into openproj, i think i can use mpxj.sf.net. it's the lib used by openproj to read/write ms project xml files, and it will let me interface (mostly) at the input file level. also, i looked at the files in all the .jars that come with openproj. it looks like what i need to get the computed summary task/project duration will be in openproj-1.4/openproj.jar

recording from the mic

here's a handy one-liner script that lets me record from the mic in.
#!/bin/env bash
rawrec -t 18000 -s 8000 -f s8 -c 1 -v $@
then i can compress it losslessly with this:
flac -V --best --endian=little --sign=signed --channels=1 --bps=8 --sample-rate=8000 --lax out4.raw -o out4.flac
and decode with this:
flac -dec out4.flac -o out4.wav

Thursday, April 23, 2009

probability and project management, continued

one question that immediately arises with probabilistic scheduling: how do i define pdfs/cdfs for individual tasks? network combinations of tasks are straightforward from there, and the averaging that happens probably makes the small-scale choices less important as long as there aren't bias errors. it seems popular to use the beta or triangular distributions with upper and lower bounds and a most likely time. the bounds are important, i think, because no task will have <= 0 time and eventually i will need to quit. to make this work with the idea in the last post about a success/failure + time joint pdf, i would need pdfs for both success and failure and a probability of success. there are constraints, though; the success and failure marginal pdfs should have the same upper bound. (once success is no longer possible, failure is guaranteed.) they could have different lower bounds. (the time required to realize something is impossible could be more or less than the minimum time to succeed.) now here's an interesting question: will my pdfs tell me to try an alternate task before i've failed the first one? in other words, can a marginal pdf conditioned on a minimum time (the time i've already spent on it) tell me that my chance of success is so low that i might as well try the next approach? this could very well happen if the most likely time given success is shorter than the most likely time given failure, and i reach the time in between the two without finishing. hmmm, this could be useful. but it also makes the order optimization more complicated if i need to assume that i will switch to the next alternate before failing. project management people have acknowledged the fact that it's hard for people to estimate probabilities in the absence of data. i could at least evaluate my estimation in hindsight, however, by looking at the distribution of the estimated cdf value at the realized time. under the null hypothesis of always getting the right distribution, this empirical distro should be uniform on [0,1].

graph theory, probability, and project management

many buzzwords and visualization strategies have floated down the project management river over the years: gantt chart, pert, cpm, event chain methodology, etc. one thing they all seem to have in common, though, is the underlying representation is a (possibly hierarchical, as with gantt or event chain) directed graph. so maybe the particular representation is not so important, and there have been few real advances in the way of thinking about project management over the last hundred years. the one rational explanation for the diversity of planning methods i can see is in the application. sometimes, as in a factory/operational/repetitive setting, a process-based approach makes the most sense. one-time projects are fundamentally different, and this is where graph-based methods come in. but many of those, like construction projects, have well-defined, already proven tasks even though the overall project is unique. there may be a little uncertainty in task duration, but there is not much uncertainty about success or failure on the small scale. these are also projects in which resources are easily scaled; pour in more materials, people, and workspace to reduce the project time. that is why so many of these graph methods focus on time-critical paths and slack times. there is an assumption that the completion times are dependency-constrained, and there are people available somewhere to work on any given task as long as you give them the required resources. r&d projects are unique one-timers, which would indicate a graph method. but now there are significant probabilities that any given task on the small scale could fail, so alternate workaround processes are needed to mitigate failure risk on the large scale. also, the mythical man-month principle comes into play: many times these are small projects with a few highly specialized people (maybe just me!) and trying to bring in other people will only slow things down. so concepts like parallel work paths with slack time are irrelevant. i will have to do both parallel tasks even though they don't depend on each other, and the only question is which to do first. methods like pert have always had a concept of task duration as a random variable. maybe i could extend this to a joint pdf with a succeed/fail discrete random variable. then i could have multiple nodes with the same input/output edges that represent alternative approaches to generate the same products. with two possibilities as an example, i would need to choose one to try first and only do the other if the first failed. the time rv given failure of both would not depend on the order; it is the (scaled) convolution of the two failure pdfs. the time rv given success would be the success pdf + the convolution of the failure pdf of the first and the success pdf of the second, so it would depend on the order. in a practical application, i don't think i would bother to compute convolutions. most graphs i'll be dealing with will be small enough that i could just monte carlo the bejesus out of it. with a hierarchical representation, i would only need to consider a part of the total project at a time, just chopping out a chunk for which all possibilities have all the same inputs and outputs. each choice of n possible approaches will have n! orderings, each with a different time pdf given success. but i should be able to throw away many of them automatically if there is another cdf which is greater for every time and therefore indisputably better.
more subjective criteria will be needed to choose between cdfs that cross each other. (maybe which one is greater for >50%, a more-likely-better criterion?[1]) to choose which of the parallel tasks i should try first, i should take the attitude that, if the project is going to fail, i want to know as soon as possible. so i should seek short time rvs given failure. i think it's the complementary case to alternatives; i can think of all alternates as being in the graph in series and, when an attempt succeeds, all the following alternates collapse to zero time. for parallel tasks, the rest of the graph disappears if one necessary task completely fails (with no available alternates). in both cases, i want to hit the collapsing event in the shortest time. for parallel, analogous to complete failure of alternates, the total time given success (and the probability of success) in all of them is the same regardless of order (assuming a one-person project). so order matters only for eventual failure for parallel. the discussion above assumes all individual tasks have independent pdfs. if my joint pdfs are conditional on other events, i think i should take that as a sign that i should split up the tasks rather than actually try to deal with probabilistic dependence. so this approach would tell me what i should be working on at any given time in my individual work, thereby avoiding inefficient personal biases and procrastination, etc. i'd like to try it, but i want to make sure i can use a file format that will allow easy data input with a gui and final gantt-chart-like output. i think i'll check openproj and taskjuggler for this. [1] this will not always produce a unique ranking. for example, 3 cdfs with the three orderings 1>2>3, 3>1>2, 2>3>1, each with 1/3 probability, will give three circular comparisons: 1>2, 2>3, 3>1. but this criterion might at least weed out some that are worse than all others.
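
going back to the two-alternatives example, a quick monte carlo sketch with invented numbers -- it should show that p(success) is the same for both orderings while the time-given-success distribution is not:

import random

def attempt(task):
    # task = (p_success, success_duration, failure_duration), where each
    # duration is a (low, high, mode) triple for random.triangular
    p, succ, fail = task
    if random.random() < p:
        return True, random.triangular(*succ)
    return False, random.triangular(*fail)

def run_alternatives(tasks):
    # try alternatives in order until one succeeds or all fail
    total = 0.0
    for task in tasks:
        ok, t = attempt(task)
        total += t
        if ok:
            return True, total
    return False, total

# two alternative approaches with made-up numbers
a = (0.6, (2.0, 10.0, 5.0), (1.0, 12.0, 8.0))
b = (0.4, (3.0, 15.0, 6.0), (2.0, 9.0, 4.0))

for order, label in [((a, b), 'a then b'), ((b, a), 'b then a')]:
    runs = [run_alternatives(order) for _ in range(100000)]
    wins = [t for ok, t in runs if ok]
    p_win = len(wins) / float(len(runs))
    print('%s: p(success)=%.3f, mean time given success=%.2f'
          % (label, p_win, sum(wins) / len(wins)))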

activity-on-node vs. activity-on-edge

the old pert planning book from the 60s that i'm reading uses an activity-on-edge approach, with an appendix contrasting this approach with activity-on-node. at first i was thinking that aoe had conceptual advantages, but i'm not so sure anymore. and it seems that every project management software out there today that has a network representation uses aon. so what is really the difference? with aoe, the edge represents a work process. people say the nodes represent events, but i tend to think of events as things that are externally imposed. i think of nodes on an aoe graph as collectively defining the project _state_; ie, the nodes that do not depend on any unfinished edges at a given point in time define what work has been completed and what can now begin. with aon, the nodes represent a work process and the edges represent products/resources. each task depends on incoming resources and sends its products out through the graph edge pipelines to other tasks. some dummy nodes might be necessary for interface events, just to make the math work, since true deliverables and external resources are dangling edges.
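
as a concrete (made-up) toy illustration of the two conventions, here is the same trivial three-task project written both ways; the task names and the dict/list encodings are my own, not any tool's format:

# activity-on-node: nodes are work, edges are dependencies (product flows)
aon = {
    'design': [],           # no prerequisites
    'build':  ['design'],   # build consumes the design
    'test':   ['build'],    # test consumes the built article
}

# activity-on-edge: edges are work, nodes label the project state between them;
# each entry is (state_before, state_after, activity)
aoe = [
    ('start',    'designed', 'design'),
    ('designed', 'built',    'build'),
    ('built',    'done',     'test'),
]

the aoe form needs dummy edges as soon as two activities share only some of their prerequisites, which is probably one reason most modern software prefers aon.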

planning quotes

a couple of quotes about planning that i find useful:

"In preparing for battle I have always found that plans are useless, but planning is indispensable." Dwight D. Eisenhower (1890-1969)

i've seen many variations of this, and even attributions to other people. but this one is from the columbia world of quotations, so i think it's more authoritative. it's a pithy summary of how the mental discipline and attention to detail required by planning are usually more important than the plan produced.

"the contemporary advanced-technology project demands advances that cannot depend on accident or chance; breakthroughs are increasingly the result of steady, planned, technological pressure under economical and political conditions which demand advances -- on ever-shortening time cycles. we have arrived at a point where we must _schedule_ creativity and invention, as well as the production which follows right on its heels, and must attempt to predict with some accuracy when operational hardware will be delivered and what it will cost." archibald and villoria, network-based management systems (pert/cpm), p. 77

i like the way this quote rejects the attitude that creative advances are haphazard and serendipitous. if i can plan well enough, i can predict and depend on them.

the success or failure of projects is determined in the first 10% of their lifetimes. fergus o'connell, "how to run successful projects" (1994). very true in the instructor's experience. you have to start the right way and know how you'll get there.

definition: a project is a unique exercise which aims at a defined outcome. it is not part of otherwise routine operation.

don't use microsoft project; it is useless. people are notoriously bad at estimating the time required for tasks; usually it takes about twice as long as you think. visualizing task dependence can help to envision concurrence and keep track of the critical path (longest minimum time, not necessarily most important), which can change.
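
as a reminder of what "critical path" means operationally, here is a minimal python sketch that finds the longest dependency chain; the task names, durations, and the code itself are my own toy example, not taken from any of the tools discussed here:

def critical_path(duration, deps):
    """duration: {task: time}; deps: {task: [prerequisite tasks]}.
    returns (total_time, longest_dependency_chain)."""
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor on the longest chain into each task

    def earliest_finish(task):
        if task not in finish:
            start, pred = 0.0, None
            for p in deps[task]:
                f = earliest_finish(p)
                if f > start:
                    start, pred = f, p
            finish[task] = start + duration[task]
            best_pred[task] = pred
        return finish[task]

    end = max(duration, key=earliest_finish)
    path, t = [], end
    while t is not None:
        path.append(t)
        t = best_pred[t]
    return finish[end], list(reversed(path))

# hypothetical toy project
durations = {'design': 3, 'order parts': 2, 'build': 5, 'test': 1}
deps = {'design': [], 'order parts': ['design'], 'build': ['design'],
        'test': ['build', 'order parts']}
print(critical_path(durations, deps))   # (9, ['design', 'build', 'test'])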

Friday, April 17, 2009

project management software

i've been searching in vain lately for software that will magically make my projects organized. i'm most interested in using network-based methods, though these seem to have fallen out of fashion since their peak in the 60s. the best book i've been able to find so far on this is 'network-based management systems (pert/cpm)' by archibald and villoria (available pretty cheap on amazon). it's so old that most of the computational stuff is probably worthless; just monte carlo the thing and go home. the thing is, i don't care so much about resource leveling, since i'm the only resource. i do care about uncertain time estimates, but i would like to go beyond that to account for uncertain success of task alternatives. i tend to do more r&d-type projects, where there is more than one way to approach a problem but not all approaches will work. i want my project plan to help me decide what to try first, either to eliminate unlikely possibilities or to confirm prerequisites quickly. i think this can fit the old network methodology if i use boolean inputs at event nodes or (probably better) a hierarchical network where multiple approaches are abstracted as a higher-level activity. i can't believe no one has done this before, but i can't find any refs. i guess the typical gantt/work breakdown structure has a hierarchical structure, so maybe i can take advantage of that when interfacing project management software. taskjuggler uses text-format input files for gantt-chart-like dependencies that i might reasonably generate from a network topology. that would allow me to make reports that are intelligible to other people, but i would still have to write all the code for the network operations, especially dealing with probabilistic time and success representations (probably unavoidable anyway). openproj.org is another one worth looking at. relatively feature-rich, including hosting and (i think) some m$ project compatibility. probably this or taskjuggler would be the best to work with. definitely best for simple and/or one-off projects; i've used it this way before, on windows and linux. kplato is less than impressive, and i saw some mention of them wanting to merge with taskjuggler anyway. www.sharedplan.com offers hosted planning. nice for security and collaboration, but i'm not sure exactly what their features are.

Tuesday, March 24, 2009

circuit protection

i've been thinking that if i want to be a real grown-up with probes or circuit-pc interfaces, i'll need to use some protection. ptc resistors are cheap and easy overcurrent protectors. they basically increase in resistance by orders of magnitude when they pass a critical current. digikey has one that trips at 25 mA, about right for mcu io. varistors (eg mov) and avalanche diodes have a complementary role, passing large currents at critical voltages. so maybe a transient voltage suppressor in parallel with the load, combined with a series ptc (and maybe a high-power resistor to absorb the energy), could protect against both esd and stupidity in a cheap and easy way. according to littelfuse, though, the tvs can have too much capacitance to use it on high-bandwidth signal lines. they recommend their powerguard pgb0010603, which mouser carries for ~30 cents each.
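
a back-of-the-envelope sanity check on the series-resistor idea; every number here (fault voltage, clamp voltage, resistor value) is an assumption i made up for illustration, not a recommendation:

v_fault = 24.0     # worst-case dc voltage applied by 'stupidity' (assumed)
v_clamp = 5.6      # tvs clamping voltage (assumed)
r_series = 100.0   # series resistor between the fault and the tvs (assumed)

i_fault = (v_fault - v_clamp) / r_series   # current the tvs has to sink
p_resistor = i_fault**2 * r_series         # steady-state dissipation in the resistor
p_tvs = v_clamp * i_fault                  # steady-state dissipation in the tvs

print('fault current: %.0f mA' % (1e3 * i_fault))    # ~184 mA; a 25 mA ptc upstream trips well below this
print('resistor dissipation: %.2f W' % p_resistor)   # ~3.4 W until the ptc trips
print('tvs dissipation: %.2f W' % p_tvs)             # ~1.0 W until the ptc trips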

avr microcontrollers

some frustration with the lack of good open source high-level language support for pics has led me to investigate the accessibility of the avr and other atmel platforms. apparently isp is at least as easy, judging by this parallel port programmer. other designs leave out the pull-up resistor on reset and drive it directly; some even leave out all the current-limiting resistors, leaving nothing but a homemade cable. can't get any simpler (or less safe) than that. (one note: some people use pin 18 instead of 20 and 21 on the parallel port for gnd.) plenty of designs out there for usb programmers, too, so i could use my netbook to flash an avr. but i think they all use an avr, so the 2-resistor programmer will always have its place. i couldn't find any refs on using the parallel port to power the mcu during programming, but i guess it doesn't matter since it will need power from the circuit to test/use. still need to familiarize myself with all the available options for avrs. here is the atmel product guide with feature tables, etc. apparently atmel is much more stingy with samples than microchip, so i'll just have to order some from digikey or mouser. gcc has an avr backend! it seems almost too good to be true. and there is the pymite project; still looks young but under active development. if i go far into embedded stuff, i'll eventually need to use an rtos, like freertos, to keep my code modular and simple and to give me some platform independence.

Thursday, March 12, 2009

network hard drive vs. vista

managed to get my network hard drive mapped in *blech* vista *blech* *wipes mouth with back of hand* thanks to the hard work of others. this guy told me the right setting to crowbar into the registry as well as the 'net use' command, since the smiley-face, hand-holding "Map Network Drive..." still won't work. ('the thing i like about windows is not having to use that command line thingy...' harumph!) change

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\LMCompatibilityLevel

from a 3 to a 1, then

net use Z: \\I-drive\Share password /USER:username /PERSISTENT:YES

now i just need to figure out whether i should use ftp+fuse or smbmount on my linux box.

Thursday, March 5, 2009

easy svn server

here's a very handy command for using a remote subversion repository without needing an svn daemon running all the time. for example, to check out:

svn co svn+ssh://username@server.name/path/to/repository

ssh starts and stops the server for you. can't beat that!

Tuesday, February 10, 2009

microlending

there are a few microlending efforts out there that i think are worth a closer look. kiva allows lenders to review and select loans to make. microplace looks like more of a fund for investors interested in providing capital for microlending. i should check these out, along with the others reviewed and ranked in the forbes article.

Tuesday, January 27, 2009

sympy

version 0.6.3 of sympy has c and python code generation. now i can do symbolic math and have very fast computation of the resulting expressions. cool!

In [2]: f = sympy.sympify('exp(-x**2)')

In [3]: sympy.printing.ccode(f)
Out[3]: 'exp(-pow(x,2))'

In [4]: sympy.printing.python(f)
Out[4]: "x = Symbol('x')\ne = exp(-x**2)"

also has some built-in plotting, which would be more interesting if it weren't so dangerous (pyglet crashed x twice now). but it does have plotting.textplot, which is a cheap and easy way to plot stuff in a terminal. brings back memories of the ol' hp48g. the mpmath module that comes with sympy also has a plot module that promises convenient expression plotting. too bad it doesn't seem to work yet.
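
a related trick worth remembering: sympy.lambdify can turn the same expression into a fast numpy-vectorized callable. a minimal sketch, assuming lambdify in 0.6.3 accepts the 'numpy' module argument the way later versions do:

import numpy as np
import sympy

x = sympy.Symbol('x')
f = sympy.sympify('exp(-x**2)')

f_num = sympy.lambdify(x, f, 'numpy')   # numpy-aware callable, no per-point sympy overhead
xs = np.linspace(-3.0, 3.0, 7)
print(f_num(xs))                        # vectorized evaluation of exp(-x**2)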

sfepy, symfe

sfepy is a finite element code written (mostly) in python. wow! i gotta check it out some time. produces convenient output formats (eg, vtk, hdf5), input files are python scripts, etc. looks like it will eventually use symfe to do some symbolic computations with sympy. symfe looks like it's been stale for a little over a year, though. i hope these projects don't just die.

py

more python goodies, from the people who are working on pypy. py has a test utility, name export control, greenlets, local/remote/svn filesystem access, etc. it also has some channel-based remote execution capability, with no remote-side server installation/maintenance. might make a good light-weight alternative to pyro. utilities for handling code objects, too. if i ever do an object caching database with distributed execution, i think this is the way to go. auto api docs based on running instances. cool. definitely the simplest greenlet implementation i've ever seen; i might actually be able to understand and use it. i know pypi has a greenlet module for a coroutine paradigm; i wonder if it's the same. better than the enhanced-generator style of coroutine, because you can switch anywhere in the call stack. the tutorials by david beazley are recommended.
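
to remind myself what the explicit-switch style looks like, a tiny producer/consumer sketch using the standalone greenlet package from pypi; i'm assuming its api matches the copy bundled with py (the import line would differ for py's own version):

from greenlet import greenlet

def producer():
    for item in ('a', 'b', 'c'):
        # hand control (and a value) to the consumer, resume when it switches back
        consumer_gr.switch(item)
    consumer_gr.switch(None)   # signal that we're done

def consumer():
    while True:
        item = producer_gr.switch()   # wait for the producer to hand us something
        if item is None:
            break
        print('got', item)

producer_gr = greenlet(producer)
consumer_gr = greenlet(consumer)
consumer_gr.switch()   # start the consumer; it immediately switches to the producer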

Wednesday, January 21, 2009

ksplice: hot-patch your kernel

ksplice is a tool for patching a running kernel without rebooting. i should check it out some time, especially if gentoo makes a package for it.

Wednesday, January 14, 2009

investment optimization functions

i already know that an objective function for an investment optimization needs to have more than just the expected value of the rate of return (though i think this should be the o.f. if constraints are applied as parameter boundaries). otherwise, i would be saying that i would tolerate an unlimited increase in any risk metric for a small increase in expected return, and that's not very smart. so, what else do i throw into the o.f.? the sortino ratio looks interesting. i think it makes a good critique of the sharpe ratio: upside variability should not be punished like downside variability. maybe a variant of it that, like the information ratio, uses a benchmark rather than a risk-free asset for active return. the wikipedia articles on these list others, and the upside potential ratio article refers to an article comparing it to sharpe and sortino ratios. value-at-risk and expected shortfall are both easy to compute, given a pdf on returns. i know value-at-risk has taken a beating. but even though expected shortfall has nice mathematical properties as a coherent, spectral risk measure, it is sensitive to errors in the distribution in the tail. i'm nervous about estimating distributions on something that provides few data by definition. hmmm, that makes me wonder... are there financial risk metrics based on extreme value theory? one thing they all have in common, though, is that they are functionals of the rate of return. the rate of return is a function of the time to reach a limit order, the price relative, and the transaction cost. so, given a pdf for time as a function of price relative, and the function defining rate of return, it should be easy to wrap an optimizer around any of the objective functions above.

EDIT: i think the upside potential ratio is a close cousin of the sortino ratio: same downside-deviation denominator, but with an upside partial moment rather than mean excess return in the numerator. the information ratio is the same as the sharpe ratio, except that it uses a benchmark rather than a risk-free return for comparison. looks like the sharpe ratio, sortino ratio, value-at-risk, and expected shortfall (and probably many/most others) are not only functionals of the rate of return pdf, they are functions of first and second partial moments. if i can boil those partial moments down to a small set, it would be possible to define a multi-objective space in which constant-value contours of the various financial objective functions are simple manifolds. the advantage of this is that i could find a pareto-optimal front in that partial moment space, and it would be easy to see not only the trade-offs but also how sensitive the o.f.s are at any point. otoh, maybe it would be just as easy to look at an overlay of all the o.f.s plotted in the control parameter space, and see values/sensitivities that way. at the very least, splitting up the moments into partials would facilitate simultaneous computation.
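
to keep the definitions straight, a quick numeric sketch computing several of these from one simulated return sample; the normal return model, the risk-free rate, and the 5% tail level are arbitrary assumptions for illustration only:

import numpy as np

np.random.seed(0)
returns = np.random.normal(loc=0.08, scale=0.20, size=100000)  # hypothetical annual returns
r_free = 0.03                                                  # assumed risk-free rate

excess = returns - r_free
sharpe = excess.mean() / returns.std()

downside = np.minimum(returns - r_free, 0.0)                   # only penalize the downside
sortino = excess.mean() / np.sqrt((downside**2).mean())

alpha = 5.0                                                    # tail probability in percent
q = np.percentile(returns, alpha)
var_5 = -q                                                     # value-at-risk as a positive loss
es_5 = -returns[returns <= q].mean()                           # expected shortfall in the 5% tail

print('sharpe %.2f  sortino %.2f  VaR(5%%) %.2f  ES(5%%) %.2f' % (sharpe, sortino, var_5, es_5))

all four come straight out of the same empirical distribution, which is the point about them being functionals of the rate-of-return pdf.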

Friday, January 9, 2009

Elementary Calculus of Financial Mathematics

this might be a book worth reading: Elementary Calculus of Financial Mathematics, by A. J. Roberts. December 2008 / xii + 128 pages / Softcover / ISBN 978-0-898716-67-2. List Price $59.00 / SIAM Member Price $41.30 / Order Code MM15. some of the description sounds like stuff i already know, but the idea of seeing solutions to some practical problems is appealing. www.siam.org/books