Thursday, December 23, 2010

dashboard and screen scraping

one thing that's been on my low-priority radar is a way to scrape through the complex flaming hoops that banks, credit cards, and investment brokerages put up, so i can have an auto dashboard showing me account balances and net worth at a glance. mechanize looks like a nice package for performing many browser functions, including form interaction; probably the best of its kind i've found (and nice faq). however, it does mean writing a browsing session from scratch (read: lots of online debugging), and i'm not sure how well it can handle javascript, frames/windows, and all the other eye-candy screen junk these sites like to throw at you. someone out there recommended pyxpcom (combined with pydom in pythonext) as a way to do anything mozilla can. i think that must be true, since it seems to be just the pieces that mozilla-esque browsers are made of: as powerful and as difficult to use as a build-your-own-ferrari kit. the most promising option seems to be selenium, which is apparently merging with webdriver for version 2.0. it basically drives a real browser, but can record and play back scripts in a variety of languages (including python). the webdriver type of interface seems to be the future of selenium, and it has the advantages of better navigation and less to install. it's written in java, but i think it can do python (though the docs are behind if so). so i'm not sure if i should just wait for an official release of 2.0, but it does look like selenium is what i'm after. here's the doc on using ide.

EDIT: did some more looking around with selenium, and wow! i love the ide/rc combo. i think i need to look at this blog post to get the most out of locators (css vs. xpath). some of the extra plugins for selenium-ide are worth getting, and the selenium.py module can apparently just be copied into the python path to use selenium-rc. 1.0.11 has firefox 4 support in the ide, but it's very recent (2011-04-12).
they have put out a number of rcs for v2; apparently the v2 release is coming summer 2011. no remote control javascript server is necessary for version 2, since it's integrated with webdriver. i need to know if the ide and python export will still work. right now i think python will work, but there's no ide yet (though 2.0 is probably backwards compatible, so it might run the code generated by the version 1 ide). more selenium links: command locators, xpath/css/dom rosetta, css locators are faster than xpath, good info, stay up to date, good example.

managed to get the selenium python bindings installed on a windows machine (not surprisingly, a bit more involved than on linux) with my epd python. had to manually download the tarball, run python setup.py install, and manually create the test dir structure that it would then complain about. maybe there's an option to make it skip tests, but the kludge was faster than looking that up. now i have selenium 2 with the webdriver interface, much better than rc! and btw, my experiments confirm what others have said about locators: css is much faster than xpath, even on firefox. i've also found that, while the selenium ide is really good for getting started with the locators, it's often possible to find shorter, more informative, and likely more stable tags and ids by poking around in the html just a little rather than using the first thing that pops up in the ide table. so i'm not going to try to keep a drop-in interface to call into the ide-generated scripts; cut-n-paste of one-liners will be good enough for both dev and maintenance. still, there is tremendous value in starting with something that works, and that alone makes the ide worth the install. some other things i've learned: the 'andWait' stuff is only relevant from the java interface. in python, there's no way to keep running asynchronously while stuff is still loading: click, get, etc. only return to the python script once the page is fully loaded, so that can be a latency bottleneck.
i did poke around and find a possible place to change that, but i'll see if i really need to.
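a minimal sketch of the dashboard side, assuming the python webdriver bindings. the url, css selectors, and balance formats below are hypothetical placeholders, not anything from a real bank's pages. it passes the plain string "css selector" (the value of selenium's By.CSS_SELECTOR) to find_element, so the helpers can also be exercised with a stand-in driver object.

```python
# Hypothetical dashboard helpers for the selenium python bindings.
# "css selector" is the value of selenium.webdriver.common.by.By.CSS_SELECTOR,
# so find_element(CSS, ...) works on a real WebDriver without importing By here.
CSS = "css selector"


def parse_amount(text):
    """Convert a displayed balance such as '$1,234.56' or '(45.00)' to a float."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]  # accounting-style negative
    return float(cleaned)


def scrape_balances(driver, url, selectors):
    """Load url, then read one balance per (hypothetical) css selector."""
    driver.get(url)  # in the python bindings this returns once the page has loaded
    return {name: parse_amount(driver.find_element(CSS, sel).text)
            for name, sel in selectors.items()}


def net_worth(balances):
    return sum(balances.values())


# With a real browser, something like:
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   balances = scrape_balances(driver, "https://bank.example.com/accounts",
#                              {"checking": "#checking .balance"})
#   print(net_worth(balances))
#   driver.quit()
```

keeping the selectors in a dict fits the cut-n-paste approach: each one-liner found in the ide or the html just becomes another entry.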

Monday, December 13, 2010

scraping off microsoft's look and feel

visually impaired people have at least one thing going for them: the high contrast windows theme is much better than the standard windows theme. and the opticwhite theme for google chrome helps as well. now i don't feel quite so out of place, and my eyeballs won't melt.

Saturday, December 11, 2010

virtualbox

holy grail: windows 7 and ubuntu dual boot, with either also running under a vm in the other. i've been looking around for a good virtual machine; some of the old ones seem kinda dead or are pure emulators (read: way too slow), like bochs and plex86, and others are too slow except on a linux host, or have maybe forked off somewhere else (qemu/kqemu or whatever). xen looks pretty good, especially with its capability to run a guest os off a partition. but it only runs on a linux host, so that's only half the answer, and i'm not sure if anyone has gotten it to use an existing windows install. virtualbox can run on a windows or linux host, and it can run either guest at virtualized native speed. and maybe i can get it to run an existing windows or ubuntu install from the other. one problem with booting the oem windows partition is that setting it up in virtualbox from linux will require bootrec.exe from an install (not recovery) dvd. (colinux might be able to boot an existing install, but colinux is limited to one processor atm.)

Friday, December 10, 2010

getting spyder and python to work on windows 7

had to struggle to get my python install working with numpy and spyder, probably because i copied them over from another install.
with spyder, i had to move the 2.0.0beta5 egg and give the --prefix option to setup.py in order to install and use 2.0.3.
with numpy, i was getting 'ImportError: DLL load failed: The specified module could not be found.' when i tried to import numpy, _unless_ i was in the Python26_64\Scripts dir. i think it's because of the mkl lapack dlls in there. it all worked once i added that dir to my PATH (not PYTHONPATH). cygwin helped with the syntax: after an export in bash, os.environ['PATH'] shows what the value looks like in windowsese, so i could put it into the env editor in spyder. voilà!
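the same fix can be made from inside python before numpy is imported. a minimal sketch; the directory in the usage comment is an assumed full path for the Python26_64\Scripts dir above. on older pythons, changing PATH in os.environ at runtime is picked up by the windows dll search; python 3.8+ on windows no longer searches PATH for extension-module dependencies, and os.add_dll_directory is the supported route there.

```python
import os


def prepend_to_path(directory, env=None):
    """Put directory at the front of PATH (not PYTHONPATH) unless it's already listed."""
    env = os.environ if env is None else env
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]

    def norm(p):
        return os.path.normcase(os.path.normpath(p))

    if norm(directory) not in {norm(p) for p in entries}:
        entries.insert(0, directory)
    env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]


# usage (assumed install location; adjust to wherever the mkl dlls actually live):
#   prepend_to_path(r"C:\Python26_64\Scripts")
#   if hasattr(os, "add_dll_directory"):  # python 3.8+ on windows
#       os.add_dll_directory(r"C:\Python26_64\Scripts")
#   import numpy
```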

sshd on cygwin

got a new machine with windows 7, and i think i'll have to actually use the windows partition. ugh.
fortunately, cygwin comes with a kajillion unix packages that make microsoft bearable. and i just got my openssh server up and running, so i can still log in from home or elsewhere. here's how:
first, i followed the steps here to get rid of any old failed-attempt cruft. then, even though i did 'run as administrator' on the cygwin bash startup bat, it still gave me warnings when i ran ssh-host-config and tried to use my windows user for running sshd. so i went back and did all the editrights lines. rerunning ssh-host-config (probably unnecessary) gave no warnings, so i started it up with 'cygrunsrv -S sshd' as suggested here (also used the 'tty ntsec' for CYGWIN, as he suggests). and it Just Works.

Monday, December 6, 2010

the ascent of money

just watched a documentary on pbs called 'the ascent of money' (google it, easy to find). nice 4-part series on some of the history of finance and how it ties in with the history of the world. should be required viewing. i hear the book is good, too, but the video is easier for lazy people like me.

Saturday, November 27, 2010

potato-powered web server

when i see pages like this, it scares me a little to think how similar i am to this person.

Tuesday, November 23, 2010

15.433

from ocw:
3. harry markowitz, 'portfolio selection', journal of finance, march 1952, u. of chicago (grad student)
   - 'the process of selecting a portfolio may be divided into two stages. the first stage starts with observation and experience and ends with beliefs about the future performances of available securities. the second stage starts with the relevant beliefs about future performances and ends with the choice of portfolio'
   - my code concerns the second part for computation, but could also guide quantitative, rational, unemotional thought and validate assumptions made in the first part.
   - s&p 500 returns look nearly bimodal, with a larger positive mode and a smaller negative mode.
   - 10 year treasury bill returns have small tails.
9. fama french three-factor model
   - connections to other ratio patterns
   - estimates for the factors, 1963-2000
   - momentum (short-term positive correlation, long-term reversals): the most studied anomaly in finance (2000)
10. equity option valuation
   - risk neutral pricing
   - binomial trees
   - put/call parity
   - black scholes formula
   - implied volatility
   - survey: why do institutions use options?
at page 200, printed landscape (should be portrait)
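a small sketch tying together several items from 10 (risk-neutral pricing, binomial trees, put/call parity, black scholes) for a european option on a non-dividend-paying stock with continuous compounding. the parameter values are arbitrary examples, not from the course.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, rate, vol, t):
    """European (call, put) prices, no dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    disc = math.exp(-rate * t)
    call = spot * norm_cdf(d1) - strike * disc * norm_cdf(d2)
    put = strike * disc * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put


def crr_call(spot, strike, rate, vol, t, steps):
    """European call on a cox-ross-rubinstein binomial tree (risk-neutral expectation)."""
    dt = t / steps
    u = math.exp(vol * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(rate * dt) - d) / (u - d)  # risk-neutral probability of an up move
    disc = math.exp(-rate * dt)
    # payoffs at expiry; j = number of up moves
    values = [max(spot * u ** j * d ** (steps - j) - strike, 0.0) for j in range(steps + 1)]
    for n in range(steps, 0, -1):  # roll back one step at a time
        values = [disc * (p * values[j + 1] + (1 - p) * values[j]) for j in range(n)]
    return values[0]


call, put = black_scholes(100.0, 100.0, 0.05, 0.2, 1.0)
# put/call parity: C - P = S - K exp(-rT)
parity_gap = (call - put) - (100.0 - 100.0 * math.exp(-0.05))
tree = crr_call(100.0, 100.0, 0.05, 0.2, 1.0, steps=500)
```

the tree price converges to the black-scholes value as steps grows (with the usual odd/even oscillation), which is a handy sanity check on both.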

Monday, November 22, 2010

thinking in c++

both volumes of the 2nd edition are freely available online, as is a draft version of 'thinking in python'. they're all a bit old, though they look like worthwhile reads.

the office

dwight cracks me up: http://www.noob.us/humor/the-office-fire-drill/ http://www.youtube.com/watch?v=zthtgZNJ5sc&feature=related http://www.youtube.com/watch?v=oLsOS362hkM http://www.youtube.com/watch?v=cbm5EoAm1bI

Wednesday, November 17, 2010

c++ design patterns and derivatives pricing (2nd ed)

just finished a nice quant book by mark joshi. still need to go back through the exercises, but it was a nice taste of some basic but real quantitative finance data structures and their c++ implementation, as well as more general c++ and oo issues. here are a few points that i think i should remember:
- open-closed principle: open for extension, closed for modification (don't require editing files to extend their functionality)
- const: good for discipline, safety, and optimization
- rule of three: with no declared copy constructor, the compiler will do a shallow copy; if the destructor, assignment operator, or copy constructor is defined, define the other two, too
- rule of almost zero: almost never declare any of the destructor, assignment, or copy constructor; use smart pointers instead (shared_ptr, scoped_ptr, or Wrapper) to avoid memory leaks. 'almost': a class with abstract methods is likely to be deleted by a pointer to the base class, so it needs a virtual destructor
- stlport.org provides an stl with range checking, useful if you're not using visual c++ 8.0 (it already has a debug mode)
- boost.org is intended to become part of the c++ standard
- quantlib and xlw
- commands never to use: malloc, free, new [], delete []
- throw exception: ok in a constructor, never in a destructor
- new is slow
- floating point errors do not cause exceptions by default, but they can be made to
- pimpl (private implementation): one class is visible to clients with no data except a pointer to another class, which is defined in the source file in an unnamed namespace
- difference between encapsulation and insulation: private changes do not affect classes protected by encapsulation, but encapsulation fails to insulate, because all clients have to recompile
- lots of good refs, like 7, 11, and 18
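the open-closed point is easy to see in a payoff hierarchy like the ones the book builds in c++. here is a python analogue (not the book's code): mean_payoff never changes when a new payoff type is added. in c++ the base class would also need the virtual destructor from the 'almost' note, since payoffs get deleted through base-class pointers.

```python
from abc import ABC, abstractmethod


class PayOff(ABC):
    """Interface: a payoff maps the terminal spot price to a cash amount."""

    @abstractmethod
    def __call__(self, spot):
        ...


class CallPayOff(PayOff):
    def __init__(self, strike):
        self.strike = strike

    def __call__(self, spot):
        return max(spot - self.strike, 0.0)


class PutPayOff(PayOff):
    def __init__(self, strike):
        self.strike = strike

    def __call__(self, spot):
        return max(self.strike - spot, 0.0)


def mean_payoff(payoff, terminal_spots):
    """Closed for modification: works with any PayOff, including ones written later."""
    return sum(payoff(s) for s in terminal_spots) / len(terminal_spots)


# Open for extension: a new payoff type, no edits to mean_payoff needed.
class DigitalCallPayOff(PayOff):
    def __init__(self, strike):
        self.strike = strike

    def __call__(self, spot):
        return 1.0 if spot > self.strike else 0.0
```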

Tuesday, November 2, 2010

ath5k woes

okay, is everyone else out there as bugged as i am about the ath5k driver? it seems like every ubuntu maintenance update i've done over the last couple of months on my acer aspire one has made it worse. i just downgraded back to 2.6.32-21-generic, which works pretty well (though i did get a bunch of 'ath5k phy0: failed to wakeup the MAC Chip' and 'ath5k phy0: can't reset hardware (-5)' after i let it run for a while). i've been having other hardware problems with the ssd drive and the web cam, but i think this one is a driver issue, given the chatter i've found on mailing lists and bug reports. i know it's oss, so i can't really complain, but i'm really looking forward to this being resolved. i wonder if i should lag my kernel updates and/or switch to madwifi or something....

EDIT: i don't think madwifi is the answer, since it looks like their dev effort got folded back into ath5k. at least, i couldn't find a madwifi version that would both compile with my kernel and work with my pci-e wifi. but i seem to recall looking at the L0s/L1 ASPM stuff before with the unsupported jumbo problems, and i noticed that now all aspm support is turned off by default on my card. (dmesg even says the pci driver explicitly decides to do it.) this change was apparently made because junky old pcie wireless cards, like the one in my acer aspire, get the unsupported jumbos in L0s. but L1 is supposed to be a mandatory part of the standard (L0s is optional), so maybe something is trying to shove it into L1 without checking that that's enabled. i tried turning L1 back on with the enable-aspm script (with root complex 00:1c.2 and endpoint 03:00.0), but it didn't seem to do anything. so i put the pcie_aspm=force kernel option into /boot/grub/grub.cfg, and that worked. in fact, it only enabled aspm for the atheros and its pci-e port, and only L1 for that. i guess the ath5k devs or somebody actually put code in to check if it can do it. so that's good; i shouldn't get any of the jumbo problems, even with the =force. now we'll just see if this helps with the 'ath5k phy0: failed to wakeup the MAC Chip' and 'ath5k phy0: can't reset hardware (-5)' problems.
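to "just see if this helps", a quick before/after count of the two error messages is handy. a minimal sketch: the regexes tolerate the spacing variations in the messages quoted above, and reading dmesg may need root on some systems.

```python
import re
import subprocess
from collections import Counter

# the two ath5k messages quoted above; allow "wakeup"/"wake up" and an optional space after "phy0:"
PATTERNS = {
    "wakeup": re.compile(r"ath5k phy\d+:\s*failed to wake ?up the MAC Chip"),
    "reset": re.compile(r"ath5k phy\d+:\s*can't reset hardware \(-?\d+\)"),
}


def count_ath5k_errors(log_text):
    """Count lines matching each known ath5k error message."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def current_dmesg():
    """Kernel ring buffer as text (may require root on some systems)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

running count_ath5k_errors(current_dmesg()) after similar uptimes with and without pcie_aspm=force gives a rough comparison.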

Saturday, October 30, 2010

booting from sdhc on acer aspire one

finally got my aa1 to boot with root on the sdhc card. these instructions are really good, and this reference is good, too. (though he is of the opinion that the nand flash write cycle limit is nothing to worry about, and i used to think that, too, but.... i wonder if my swap and ext4 fs had anything to do with the built-in ssd drive crashing so completely and painfully. this time i'll be going ext2 retro, just to be safe.) had to boot from a usb drive (yeah, ssd is that hosed... can't even spare a few mb for a boot partition. heck, i can't even alter the partition table.) but it seems to be booting. now i wonder if i can hack this case open and put the usb drive in the unused bay...

Thursday, October 28, 2010

flashed my aa1 bios

my acer aspire one seemed to be having hardware issues, so i flashed the bios from 0.3310 10/06/2008, InsydeH2O Rev. 3.5, vga bios IntelV1585 to... the same thing. it's the latest version on acer's website, but i reflashed it anyway in case it had been munged somehow. i used the FLASHIT.EXE and zg5ia32.fs method that doesn't require any freedos junk. but it doesn't seem to make any difference.

Wednesday, October 27, 2010

mutual information for linearly dependent rvs

i can't remember if i had recorded this anywhere, but i want to make sure i have it because it took me a little while to derive. if x1 and x2 are independent uniform random variables on [0,1] and y = a*x1 + (1-a)*x2 with 1/2 <= a < 1, then the mutual information between x1 and y is ln(a/(1-a)) + (1-a)/(2*a) nats, where ln is the natural log. (for 0 < a <= 1/2 the same calculation gives a/(2*(1-a)); the two agree at a = 1/2.) this is a useful result for testing mutual information estimators because it's on a bounded domain, so it might converge more quickly than functions of exponents. the key to deriving this result is to realise that the marginal pdf for y is a trapezoid (so for a >= 1/2, h(y) = ln(a) + (1-a)/(2*a), while h(y|x1) = h((1-a)*x2) = ln(1-a)), and to be careful about the log base when using the chain rule/integration by parts.
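a quick numerical check, as a minimal sketch: it integrates -f ln f over the trapezoid pdf of y with the midpoint rule and subtracts h(y|x1) = ln(1-a), the entropy of (1-a)*x2. the closed form ln(a/(1-a)) + (1-a)/(2*a) applies for a >= 1/2; the sketch also includes the a < 1/2 case, where the roles of the two uniforms swap and the answer is a/(2*(1-a)).

```python
import math


def mi_closed_form(a):
    """I(x1; y) in nats for y = a*x1 + (1-a)*x2, with x1, x2 iid uniform on [0, 1]."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    b = 1.0 - a
    if a >= 0.5:
        return math.log(a / b) + b / (2.0 * a)
    return a / (2.0 * b)  # roles of the two uniforms swap when a < 1/2


def trapezoid_pdf(y, a):
    """Density of y: the convolution of uniform[0, a] and uniform[0, 1-a]."""
    lo, hi = min(a, 1.0 - a), max(a, 1.0 - a)
    if y < 0.0 or y > 1.0:
        return 0.0
    if y < lo:
        return y / (lo * hi)          # rising edge
    if y <= hi:
        return 1.0 / hi               # flat top
    return (1.0 - y) / (lo * hi)      # falling edge


def mi_numeric(a, n=100_000):
    """h(y) by the midpoint rule, minus h(y | x1) = ln(1 - a)."""
    dy = 1.0 / n
    h_y = 0.0
    for i in range(n):
        f = trapezoid_pdf((i + 0.5) * dy, a)
        if f > 0.0:
            h_y -= f * math.log(f) * dy
    return h_y - math.log(1.0 - a)
```

for example, mi_closed_form(0.7) is about 1.06 nats, and mi_numeric(0.7) agrees to well within 1e-4.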

imperial websites

www.imperialbaseball.co.uk www.bright-futures.org.uk www.icfinancesociety.com

Wednesday, October 13, 2010

cheap hotels in london

http://golondon.about.com/od/londonhotels/tp/cheaphotels.htm tune hotel is supposed to have really cheap rooms available, but i can't find them in the online reservations.

Saturday, October 9, 2010

PlayerPiano

cool tool for showing 'interactive' presentations of python code. you can type random keys and it looks like you're typing in a fake python shell while it just runs your doctests (like an old-fashioned player piano with the punchcard scrolls). fast, easy, and frees you up to think about what you're saying instead of what you're typing. just don't let anyone ask questions -- they might want you to change something!
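playerpiano replays doctests, so the raw material is just ordinary doctest sessions in docstrings. a small sketch of that format, run with the standard doctest module rather than playerpiano itself (the compound function is only an illustration):

```python
import doctest


def compound(principal, rate, years):
    """Value of principal after compounding annually.

    >>> compound(100, 0.5, 2)
    225.0
    >>> round(compound(1000, 0.05, 10), 2)
    1628.89
    """
    return principal * (1 + rate) ** years


# run just this function's examples; globs tells doctest which names the examples can see
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(compound, globs={"compound": compound}):
    runner.run(test)
results = runner.summarize(verbose=False)  # TestResults(failed=0, attempted=2)
```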

Friday, October 1, 2010

impressive -- pdf slide presenter

cool little presentation slide viewer, does transitions, overview screens, highlight boxes, and cursor-following spotlights. written in python with xpdf as the renderer. runs on windows, mac, linux. also allows for hyperlinks, so my page and overview links from beamer should still work. not sure about embedded movies, audio, or 3d objects, but... still cool.

Friday, September 24, 2010

optimization of conditional value-at-risk

by rockafellar and uryasev: a good description of cvar and why it's superior to var (value at risk) for portfolio optimization. they use a math technique sort of like a lagrange multiplier (not really, but it works in a similar way) to transform the otherwise 2-step procedure for computing and optimizing conditional value at risk into a single stochastic optimization that is convex (piecewise linear). it can then be solved with either linear programming or a nonsmooth optimizer.
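the trick can be sketched on a finite set of sampled losses (a minimal illustration, not the paper's code): the auxiliary function F(alpha) = alpha + E[max(L - alpha, 0)]/(1 - beta) is convex and piecewise linear in alpha, its minimum value is cvar at level beta, and a minimizer is the var. in the paper the portfolio weights enter through the losses, and replacing each max(...) with an auxiliary variable z_i >= L_i - alpha, z_i >= 0 is what turns the whole thing into a linear program.

```python
def ru_objective(alpha, losses, beta):
    """Rockafellar-Uryasev F_beta(alpha) for equally weighted loss samples."""
    excess = sum(max(loss - alpha, 0.0) for loss in losses)
    return alpha + excess / ((1.0 - beta) * len(losses))


def var_cvar(losses, beta):
    """(VaR, CVaR) at level beta by minimizing F over alpha.

    F is convex and piecewise linear with kinks only at the sample losses,
    so checking those points finds the global minimum.
    """
    best_alpha = min(losses, key=lambda a: ru_objective(a, losses, beta))
    return best_alpha, ru_objective(best_alpha, losses, beta)


def tail_mean(losses, beta):
    """Direct check: mean of the worst (1 - beta) fraction (exact when that count is whole)."""
    k = round((1.0 - beta) * len(losses))
    return sum(sorted(losses)[-k:]) / k
```

with losses 1, 2, ..., 100 and beta = 0.95, both routes give a cvar of 98 (the average of the worst five losses).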
interesting side note: they use a sobol sequence to get superior performance over straight-up mc. (the wikipedia article on sobol quasi-random sequences is quite dense and hard to understand, but here's a nice article that shows a monte-carlo integration example with a finance application, refs to niederreiter, sobol, and faure qrs. bottom line: niederreiter is (maybe) best.) also shows examples of portfolio optimization and optimal hedging with a butterfly spread.
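a tiny taste of why low-discrepancy points beat plain monte carlo, using the 1-d van der corput sequence in base 2. this is a simpler stand-in for the sobol/niederreiter/faure generators in that article, though the first coordinate of a sobol sequence rests on the same base-2 digit-reversal idea. the integrand x^2 on [0,1] (exact answer 1/3) is just an example.

```python
import random


def van_der_corput(n, base=2):
    """n-th term: reverse n's base-b digits and put them after the radix point."""
    q, denom = 0.0, 1.0
    while n:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q


def average(f, points):
    return sum(f(x) for x in points) / len(points)


def square(x):
    return x * x


N = 1024
qmc_estimate = average(square, [van_der_corput(i) for i in range(N)])
mc_estimate = average(square, [random.random() for _ in range(N)])
# qmc error here is about 1/(2N); mc error is typically around 0.3/sqrt(N)
```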
certainly this is a better read than the highly mathematical paper that introduced cvar ('convex measures of risk and trading constraints' by föllmer and schied). all i can really remember about that one is the point about cvar being convex (while var is not) and why that's important: diversification (mathematically, linearly interpolating between two portfolios) should not increase risk. still, i think it's interesting that an industry-standard book on portfolio optimization like 'active portfolio management' by grinold and kahn would brush off all risk metrics other than variance so lightly. maybe return distributions really are close enough to gaussian (with exceptions for derivatives, etc.) that it doesn't matter in practice, as they claim. at least anything that can compute those quantities should also give the variance for comparison. call number: 336.767 GRI, ID: 2406743083. Active portfolio management : a quantitative approach for providing superior returns and controlling risk / Richard C. Grinold, Ronald N. Kahn. good book to have in the personal library, although there will be an updated version out later this year or next.

Sunday, September 12, 2010

nice fashionista blog

here's a blog that goes into some detail about why some types of clothing are higher quality than others, and how to take care of them. i don't normally go in for the gq stuff, and i currently can't afford anything in the blog, but someday i might have to. and i like the take permanentstyle has on fashion: permanent style, not semiannual closet churn.

Friday, September 3, 2010

pycxx vs. cython for python + c++

cython apparently has some capability for wrapping c++, with some workarounds and remaining issues as of v0.13. pycxx (or cxx) takes on the interface question from the c++ side, giving python a more c++-friendly api so that either language can call/access the other (though i think python is still intended to be in the driver's seat).
i guess the question is whether i want to use more c++ (with pycxx) or a pythonesque third language (cython pyx).
i've seen some references to using c++ with weave.accelerate, but not sure if that has any advantage over either of the other two.

Wednesday, September 1, 2010

quant job interview questions and answers

nice book by mark joshi et al. recent, relevant, $35 print on demand from lulu. lots of advice and questions to prep for that interview and gauge how ready you are to apply. good refs, including the big one by joshi himself ('the concepts and practice of mathematical finance'); xlw, the c++ plug-in maker for the excel api; and 'thinking in c++' (available free online from various mirrors).

Friday, August 27, 2010

beyond deterministic optimization

i've come to realize that my multiobjective optimization problem is far from simple, with a lot of functions with noise-induced multimodality (FNIM) (numerical noise creating local minima). so i've been looking for previous work on how to deal with it. the fnim literature i've found seems very navel-gazing, in that it looks at how global minima bifurcate as the noise level is increased, but not as much at how to overcome the problem.
the term 'robust optimization' seems to be dominated by some guys from stanford, mit/singapore, and israel who have a particular framework that is not so useful to me right now. they assume the data going into the objective function are from a distribution that is an unknown member of a set of possible distros, and they want to protect against the worst case from that set while strictly obeying the constraints. i think it's more of a mini-max problem, and what i need is to account for the uncertainty without needing to be so conservative. also, most, though not all, of the applications have a linear programming bent.
i'm not sure if stochastic optimization is what i need, but i'll check it out.

Wednesday, August 25, 2010

proposition 8 decision

the emergency motion to stay judge walker's ruling is a well-written review and critique of the district court's prop 8 decision, especially the first 3 pages (even though it's a summary, it's a thorough summary). actually, the whole thing is pretty good although there are a couple of pages in the middle that are more dense.

Friday, August 20, 2010

trading hocus pocus

i think i found the source for that reference to the use of fibonacci numbers in trading. there's a charlatan out there peddling books and software based on numerological mysticism. i can't believe people are paying $500 a pop for his stupid software and >$150 for the book. really makes me wonder why i can't put together some of my code and sell it, if anyone is willing to buy that bovine byproduct. check out this gem from his website:
What's in the book: Fibonacci ratios as market tools.
For centuries, scientists, mathematicians, and artisans have discovered that the Fibonacci sequence of numbers is replicated throughout nature. This amazing number series defines the appearance of physical structures, as well as the PROGRESS OF CHANGE AND GROWTH governing dynamic structures and systems.
The Fibonacci sequence is found in the structures of trees, plants, and flowers. A regular sequence of Fibonacci numbers occurs in the relationship between a particular branch and the next branch. Fibonacci relationships also exist between leaves and stems. Radio signals generated by pulsars conform to Fibonacci numbers. Research reveals that many natural crystals contain the Fibonacci golden ratio, 1.618. There is even speculation by Dr. John Penrose of the Institute of Mathematics at Oxford, that the golden ratio may provide the crucial link between the sub-atomic and the supra-atomic world!
Human behavior has a dualistic nature. We think and act both as individuals and as part of a crowd. A crowd has its own energy and collective mind. It can be viewed as a dynamic system and, as such, it is governed by the same laws that exist throughout nature. Since a crowd is a dynamic system, and since financial and agricultural markets exhibit crowd behavior, it follows logically that Fibonacci relationships should be intrinsic to all liquid markets. These all important numbers and ratios indicate areas of contraction and expansion in price-wave movements. The concepts of Fibonacci support, resistance, and expansion in price function due to the principle inherent in the unfolding of all natural phenomena, including the behavior of crowds and the collective mind. The real difficulty is in the PROPER AND PRACTICAL APPLICATION of this natural phenomena to price movements. It is precisely that difficulty which DiNapoli Levels is designed to address. The text puts it all together in a unified trading approach you can act on!
This book also dispels myths about Fibonacci analysis and only covers those techniques that are useful and practical to employ in real life trading situations.
oh, good. it dispels all the myths and only covers the practical parts of this nonsense.

Thursday, August 19, 2010

forecasting commodity markets: using technical, fundamental, and econometric analysis

1995 book by julian roche is a bit disappointing and maybe just dated; it mentions that less work has been done with commodity price prediction than in other markets, and the fact that i have had such a hard time finding refs on it bears that out. refers briefly, though apparently seriously, to the idea of using numerical patterns from astrological sources and the fibonacci series... yikes! it also briefly mentions the possibility of using futures for prediction, but then cops out with some easily overcome excuses about how hard it might be. the types of technical analysis that it cites are really dirt dumb simple, though it does justify that by stating that those predictions are only good in the short term anyway, and you're not going to have enough data to fit a complex model reliably.
one very worthwhile part of the book is the table of forecasting approaches on pages 175-179 (heavily adapted from strategic business forecasting, pp 159-164) and their characteristics, including short-, medium-, and long-term accuracies. looks like trend analysis, seasonal adjustments, and box-jenkins type arima models are easy ways to the top of the game.
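the simplest member of the box-jenkins family is an ar(1) model, x_t = c + phi*x_{t-1} + noise. a minimal least-squares sketch (full arima fitting, with differencing and moving-average terms, is better left to a library such as statsmodels):

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    return mean_curr - phi * mean_prev, phi


def forecast_ar1(last, c, phi, steps):
    """Iterate the fitted recursion forward from the last observation."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out
```

with |phi| < 1 the forecasts decay toward the long-run mean c/(1 - phi), which is why this kind of model is mostly a short-term tool.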

Saturday, August 14, 2010

chaco, traits callbacks, and garbage collection

i just learned something about traits (from enthought) the hard way.... if you drop all references to the object that has a trait callback defined on it (other than adding it to a container), the callback won't work. in my case, i had a lasso selection turned on for a chaco plot that was put into an HPlotContainer. it plotted just fine, and the interaction tools were fine.... but there was no reaction to the selection.
all i had to do was hang onto a reference to that traits object, and the selection worked. tough, tough bug to find. i feel cheated somehow.
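my guess at the mechanism (an assumption, not checked against the traits source): the notifier holds bound-method handlers through weak references, so once nothing else references the handler's object, it gets garbage-collected and the callback silently disappears. a toy observer built on weakref.WeakMethod reproduces the symptom:

```python
import gc
import weakref


class WeakSignal:
    """Toy notifier that holds bound-method handlers only weakly."""

    def __init__(self):
        self._handlers = []

    def connect(self, bound_method):
        self._handlers.append(weakref.WeakMethod(bound_method))

    def emit(self, value):
        alive = []
        for ref in self._handlers:
            method = ref()  # None once the owning object has been collected
            if method is not None:
                method(value)
                alive.append(ref)
        self._handlers = alive


class SelectionHandler:
    def __init__(self):
        self.selected = []

    def on_selection(self, indices):
        self.selected.append(indices)


signal = WeakSignal()
kept = SelectionHandler()                        # we hang onto this one...
signal.connect(kept.on_selection)
signal.connect(SelectionHandler().on_selection)  # ...but not this one
gc.collect()
signal.emit([1, 2, 3])  # only kept sees the selection
```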

Friday, July 16, 2010

notchup

here's a recruiting/headhunter marketplace jobs site that i should get on my radar (or, rather, on theirs). they use independent talent scouts that scour sites like linkedin and personal contacts to find candidates that might already be employed ('passive' candidates) under the assumption that really good people are very rarely out of a job or trolling the job websites, looking for work. hang out my shingle and let them race to me with offers... now that's the way to get a job.

Wednesday, July 14, 2010

using googlecl with blogger from bash

i think i finally found a good way to post from the cli.

google blogger post --title 'blog post title' --tags 'tag1,tag2' $'multiline content...'

the $ in front of the hard (single) quote turns on bash's ansi-c quoting: no variable or command substitution happens inside that last string, but backslash escapes are processed (\n, \t, \\, and \' for a literal single quote). so i can put all kinds of crazy chars in my multiline post, including a backslash-escaped single quote, with impunity. only problem is editing a previous line.... well, now i have an excuse for leaving the typos in.
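when posting from a python script instead of typing at the prompt, the quoting problem goes away entirely if the command is passed as an argument list. a minimal sketch that reuses only the googlecl options from the command above; dry_run just returns the equivalent posix shell line via shlex.join (python 3.8+).

```python
import shlex
import subprocess


def blogger_post_argv(title, tags, content):
    """Argument list for googlecl; each item reaches the program verbatim, no shell quoting."""
    return ["google", "blogger", "post",
            "--title", title,
            "--tags", ",".join(tags),
            content]


def post(title, tags, content, dry_run=True):
    argv = blogger_post_argv(title, tags, content)
    if dry_run:
        return shlex.join(argv)  # the same command, safely quoted for a posix shell
    subprocess.run(argv, check=True)
    return None
```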

stock index futures and price forecasting

trying to figure out exactly what numbers financial commentators use to make headlines like, "futures point to higher open..." often they point to specific futures contracts, like SPc1, NDc1, or DJc1. the cme group equity futures quotes seem to be what people are referring to (or at least have equivalent price percent changes). unfortunately, the javascript in those pages that handles the quote updates prevents convenient scraping. options pages like these, otoh, are exceedingly easy to scrape as whitespace-delimited plain text. and with the eom (end of month, european-style) options, the price/probability relationship is fairly simple. if i really need to get at the futures contract prices, i could use bloomberg, which just has static html tables (with major us and world indices on one page). back on the cme group site, i couldn't find quotes for options for oil, and even the time and price transactions list is another inaccessible js job. however, if i only care about end-of-day data, there's a lot on the ftp server (see the settle/ dir, for example) and other links from the volume:volume & open interest tab of one of the futures pages. daily settlement prices (near the bottom) has links for interest rates, equities, (agricultural) commodities, precious and industrial metals (in comex), oil and energies (in nymex and cme clearport clearing), and fx. the volume by price data on that page might be particularly useful for building distributions. it combines total volume for the day at each price, and the historical data available on the ftp server goes back a year and a half. the format is a bit obfuscated, but there are links to one-page format descriptions next to the data links.
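once the volume-by-price file is parsed into (price, volume) pairs (the parsing depends on cme's format description, so it's left out here), turning a day into a distribution is a few lines. a minimal sketch with made-up numbers:

```python
import math


def volume_weighted_stats(price_volume):
    """Mean and standard deviation of price, weighting each level by the volume traded there."""
    total = sum(v for _, v in price_volume)
    mean = sum(p * v for p, v in price_volume) / total
    var = sum(v * (p - mean) ** 2 for p, v in price_volume) / total
    return mean, math.sqrt(var)


def volume_quantile(price_volume, q):
    """Lowest price at which cumulative volume reaches fraction q of the day's total."""
    total = sum(v for _, v in price_volume)
    cumulative = 0.0
    for price, volume in sorted(price_volume):
        cumulative += volume
        if cumulative >= q * total:
            return price
    return max(p for p, _ in price_volume)  # guards against rounding when q == 1


day = [(101.0, 2.0), (100.0, 1.0), (102.0, 1.0)]  # (price, volume), made up
mean, sd = volume_weighted_stats(day)             # 101.0, sqrt(0.5)
median_price = volume_quantile(day, 0.5)          # 101.0
```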
one other thought about using derivatives for price forecasting/asset valuation: the price of the derivative depends not only on what people believe about the future price, but also on what everyone knows about the current price. so, if i had intraday data for the derivative price, i should subtract out the effect of the underlying price to get the future price. i wonder if ica on the log derivatives would achieve this... obvious maybe, but worth noting so i don't forget.

more python libs

rthread exposes a threading interface on remote processes, if you are into that kind of thing.

struct (standard library) interprets a string as a c structure.

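a quick struct example (python 3, where the packed data is bytes rather than a str). the '<' prefix means little-endian with standard sizes and no padding; the native '@' default may insert alignment padding, just as a c compiler would. the record layout here is arbitrary:

```python
import struct

# int32, float64, int16; little-endian, no padding
RECORD = struct.Struct("<idh")

packed = RECORD.pack(7, 3.25, -2)
print(RECORD.size)            # 14
print(RECORD.unpack(packed))  # (7, 3.25, -2)
```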
timeit (standard library) for quick and dirty timing of a command string, like the ipython magic and tic,toc in matlab.
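both flavors side by side (the statements being timed are arbitrary examples):

```python
import time
import timeit

# total seconds for 1000 runs of the statement; setup runs once and isn't timed
elapsed = timeit.timeit("sorted(data)", setup="data = list(range(1000, 0, -1))", number=1000)

# repeat() returns several totals; the minimum is usually the number to quote
best = min(timeit.repeat("sum(range(1000))", repeat=3, number=1000))

# tic/toc style, closest to matlab
tic = time.perf_counter()
total = sum(range(100000))
toc = time.perf_counter() - tic
```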

Tuesday, July 13, 2010

python from excel

xlrd, xlwt, and xlutils are fine for accessing excel files from python, but what about accessing python from excel (like calling out to vba hooks to define user functions)? pyxll seems to do what i would want (even claims numpy, ctypes, and cython support!), but the license seems very restrictive. if there is nothing better, it is probably the way to go. i might also check out the ironpython hook (http://www.ironpython.info/index.php/Interacting_with_Excel) and discoveryscript (http://www.xefion.com/discoveryscript.html). discoveryscript is free, but i think its license will also restrict redistribution (and the other products on the website are in the $500/license range), so that might be problematic.

Monday, July 5, 2010

clyther

tool for generating opencl code directly from python, as an alternative to cuda and pycuda. might be a little alpha at the moment, but i should keep tabs....

ubuntu one

can't believe i haven't signed up for this yet. is there any downside? i guess i could use cryptoloop on all my machines that link to it if i'm paranoid about privacy. but that would prevent me from accessing files on the web.

googlecl

google released command-line access. now i can post (like this one) from my tilda window.

Friday, July 2, 2010

tripadvisor for cheap hotels

tripadvisor.co.uk has a nice feature -- it searches a number of other sites for cheap rates on the same hotel. helps prevent non-refundable clicker's remorse.

coroutines in python

wow, am i behind the times or what? i didn't realize python 2.5 added real coroutine support by making yield an expression. now i can call the send method on a generator and it will pop in as the evaluated result of the yield.
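a minimal running-average coroutine shows the pattern: the value passed to send() becomes the value of the yield expression inside the generator, and whatever the generator yields next is what send() returns.

```python
def running_mean():
    """Coroutine: send numbers in, get the running mean back from each send()."""
    total, count, mean = 0.0, 0, None
    while True:
        value = yield mean  # yield is an expression: it evaluates to what send() passed in
        total += value
        count += 1
        mean = total / count


avg = running_mean()
next(avg)            # prime it: run up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(0))   # 10.0
```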
also the webbrowser module can pop up a browser window (or tab). handy for showing docs.
i'd only just started using with. i need to stay up to speed on these things.

test coverage in python

i think it's about time for me to abandon the half-home-baked test coverage tool i have been using in favor of coverage.py by ned batchelder. nice html reports (easy to launch and view, now that i have google-chrome), and i'm already using the html reports generated by cython -a. other possibilities are canopy and instrumental; instrumental uses ast tools to analyse code, so it may be useful for other things, too.
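as a taste of what the standard ast module makes easy, here is a minimal sketch (not taken from coverage.py or instrumental) that lists the line numbers of every statement in a piece of source, the denominator any line-coverage report needs, plus the names of all function definitions:

```python
import ast


def statement_lines(source):
    """Sorted line numbers of all statements (including nested ones) in source."""
    tree = ast.parse(source)
    return sorted({node.lineno for node in ast.walk(tree) if isinstance(node, ast.stmt)})


def function_names(source):
    """Names of every function definition, nested or not."""
    return [node.name for node in ast.walk(ast.parse(source))
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]


example = "def f(x):\n    if x:\n        return 1\n    return 0\n"
# statement_lines(example) lists lines 1-4; function_names(example) finds "f"
```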

tahoe-lafs

decentralized secure data store. if i ever need cloud storage, this might be a good alternative to cryptoloop.

Thursday, July 1, 2010

3d pdf animation

was looking into adding animation to my 3d pdf objects again. it's really hard to find any useful info on the newer prc format (the best i could find was a low-level adobe tech support guy whining about the iso application taking a long time). u3d has at least some skin-and-bone animation, beyond the rigid body + scale part-level animation that is useless to me (every reference to key frame animation, afaik, refers to the latter). but i'm not sure if acrobat or adobe reader supports that part of the u3d ecma standard. this page (google cache) from right hemisphere seems to say that it doesn't, but that might refer just to their deep exploration code. at any rate, i think they were one of the founding parties to the standard, so it's not promising if they never bothered. somewhere i found an unreferenced statement that everything supports ecma 1st edition, but acroread 7 doesn't support ecma 3rd edition.
however, the JavaScript for Acrobat 3D Annotations API Reference doc on the adobe website describes a Bone object and specifically says it 'is usually moved over time to create animated characters.' Mesh objects contain geometry, but nothing in the api doc looks like it could be actual vertex positions, unless they're lurking in the Node.metadataString. i can find no additional info on Bone.
the U3D Supported Elements doc from the 3d dev center states that skeleton descriptions are 'parsed and unused (that is, no skeletal animation, skinning, or bones)'
also, opacity and material color animation are not supported, and i'm pretty sure that's a basic u3d limitation.
it's hard to find much info about the newer prc format. i don't see anything in the adobe spec docs that obviously provides for it (though not much of anything is obvious in the specs... they're quite sparse). i think the best i can do right now is look at the asymptote code (there was a passing reference to animating the prc in one of their journal articles as a todo item; alas, the only animation they show in that article is an ordinary movie). rumor has it that adobe turned the prc implementation over to a third party, bailing again on a drive into the 3d market, so i'm not sure prc has a future any brighter than u3d, which they have said (fwiw) they will always support.
so the only way out i can see right now is to create multiple meshes, one for each time step, and use javascript to cycle the visibility toggles. i guess the only real wasted space comes from the redundant connectivity, and that might be manageable if i continue to use only the exterior surfaces. this approach would also let me circumvent the u3d material color animation limitation. a rough estimate is 6.5 bytes/exterior node for u3d file size. actually, maybe that's not too bad... a 5 second, 20 fps animation with 8000 exterior nodes would come to about 5MB. not too big for a short report or presentation. (maybe ~10 fps is a more realistic limit on what acroread can achieve.)
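the size estimate above, spelled out. a quick sketch assuming one full mesh copy per frame and that the rough 6.5 bytes/exterior node figure already covers connectivity and per-mesh overhead:

```python
BYTES_PER_NODE = 6.5   # rough u3d estimate from the text
EXTERIOR_NODES = 8000
DURATION_S = 5


def animation_size_bytes(fps, nodes=EXTERIOR_NODES, seconds=DURATION_S,
                         bytes_per_node=BYTES_PER_NODE):
    """one mesh per frame: frames * nodes * bytes per node."""
    frames = fps * seconds
    return frames * nodes * bytes_per_node


for fps in (20, 10):
    size = animation_size_bytes(fps)
    print(f"{fps} fps: {size / 1e6:.2f} MB ({size / 2**20:.2f} MiB)")
# 20 fps: 5.20 MB (4.96 MiB)
# 10 fps: 2.60 MB (2.48 MiB)
```

so 20 fps lands just under 5MB only if MB means 2^20 bytes; at 10 fps it's about half that.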
a little playing around with the quality options: dropping diffuse color quality from 300 to 100 (-dcq 100) makes practically no difference in file size. specular color quality (-spq 100) and animation quality 1000 -> 100 (-aq 100) make zero difference, since i'm not using either. dropping geometry default quality from 300 to 100 (-gq 100) makes no difference, nor does texture quality 75 -> 50 (-tq 50). texture coordinate quality 300 -> 100 (-tcq 100) and normal quality 300 -> 100 (-nq 100) also make zero difference. the one thing that does change the file size significantly is position quality: i had been using -pq 500, and -pq 100 decreases the size by about 7-8%. maybe decreasing the palette from 256 to, say, 16 colors would help, though it would surely be less than a 10% difference. so i don't think there's any way i could squeeze it down more than 20-25%.

pypy 1.3

pypy now has stackless, jit, sandbox, and low-memory versions out in 1.3. and support for c extension modules is coming along. and, as part of one of this year's gsoc projects, they are working on adding numpy to pypy. cool!

Tuesday, June 29, 2010

another python debugger

pudb. text-based but looks easy to use.

a bit less temporary /tmp

hat-tip to pycuda, i got my ubuntu machine to relax about wiping my temp dir on every reboot:

On Debian (and possibly Ubuntu?), edit the file /etc/default/rcS and change

TMPTIME=0

to the number of days that you'd like to keep files in /tmp around. "30" works for me.

now i won't be so paranoid about installing kernel updates right away.

gpu programming with python

theano seems to have an advanced api, but it might be too advanced since it uses its own optimization and other magic goodies, and it seems so focused on its own objects that it's like a metaprogramming language. still, dangerously interesting. uses cuda, so nvidia only. here's some free advice from their tutorial:
Only computations with float32 data-type can be accelerated. Better support for float64 is expected in upcoming hardware but float64 computations are still relatively slow (Jan 2010).
Matrix multiplication, convolution, and large element-wise operations can be accelerated a lot (5-50x) when arguments are large enough to keep 30 processors busy.
Indexing, dimension-shuffling and constant-time reshaping will be equally fast on GPU as on CPU.
Summation over rows/columns of tensors can be a little slower on the GPU than on the CPU
Copying of large quantities of data to and from a device is relatively slow, and often cancels most of the advantage of one or two accelerated functions on that data. Getting GPU performance largely hinges on making data transfer to the device pay off.
the same website has a link to cudamat, which might be a more cooperative, if lower-level, way to go. target seems to be basic matrix and element-wise ops. actively developed.
pystream was developed as another cuda wrapper until about mid 2008, then abandoned when the company went off to develop gpulib, a cuda api for idl and (?) matlab.
pycuda handles the background stuff, but you still have to feed it c code (though there are tools for run-time code generation). looks like it's actively developed, though, with an impressive list of users, and like the others it does play nicely with numpy arrays. i think this is the place to start.
pygpu uses pycg and pyglew to generate cg code directly from python. so you write python and it will run on nvidia or ati hardware, under both linux and windows. unfortunately, neither the homepage nor the google code page show any signs of activity in the last few years. pycg (developed by the same guy) seems to have trickled off in late 2007, though ubuntu packages were uploaded to launchpad just a year ago. too bad, this looked like it might have been a good one.
the gpu stuff on scikits oddly seems intended for actual graphics stuff.
pyopencl? here's a faq page contrasting cuda and opencl.
pycublas maybe just does matrix mult.

bond spread data

fred (research.stlouisfed.org) has a lot of economic indicator historical data, including moody's bond yields, but only for aaa and baa. i'd like to find something for junkier bonds....
moodys.com certainly would have these data, along with others i'd like to see. but registration is required and i don't know if that stuff is available for free.

Monday, June 28, 2010

Best Practices in Estimating the Cost of Capital: Survey and Synthesis

Robert F. Bruner, Kenneth M. Eades, Robert S. Harris, and Robert C. Higgins
nice peek at the popularity of various financial analysis techniques. just a bit dated now, as it came out in 1998, but still worth a look.

tea party != conservative populism

jeffrey friedman makes some good points about why the tea party leaders should not assume (or maybe even aim for) a conservative populism. here's the most sobering stat for me:
An April Rasmussen survey found that only 60 percent of Americans now believe that capitalism is better than socialism. Among those under 30, socialism and capitalism are nearly tied at 33 percent and 37 percent.

oil spill -> green pork

charles krauthammer lets fly a critique of obama's face-the-nation. seems a bit harsh for ck, but he does an effective job of deconstructing the basic argument.

obama vs. science

jonah goldberg makes an interesting connection between the anti-science claims against bush and the drilling moratorium. a minor point, perhaps, but one to put on the record.

portfolioscience

here's a company that sells software for portfolio optimization and risk analysis. maybe i should fill out their form and check out the demo some time, just to see how they do it and which metrics seem to be featured. in particular i'm curious about the riskapi efficient frontier optimizer.
aorda is another one, started circa 2006 by a ufl prof who was one of the early proponents of cvar. free download for crippleware, but have to register first. pay versions are very highly priced: commercial license is $10k/year.

why defend bp?

why are some on the right defending bp? rich lowry has a great editorial up on nro making the point that we don't have a dog in this fight, while some republicans are determined to get bitten.

Wednesday, June 23, 2010

15.535

looking at the assignments and exams, i think i understand the concepts just fine, although the intricacies of interpreting how management might be manipulating their accounting numbers are not easy.
2
http://mit.edu/wysockip/www has useful stuff but doesn't have all the stuff from class anymore
peg ratios, often cited
3
cash flows over firm's life cycle
trend analysis: cfo vs ebx
red flags: growing discrepancy between net income and cash flows
undervalued liabilities, overcapitalization
investment activity
key: proceeds from exercise of stock options. good?
firm type: growth options vs assets in place
tech, growth: not much depreciation, financing primarily related to equity
airlines: cfo large compared to net income, even in loss years; large depreciation, investing; debt financing
retailers: walmart has large difference between cfo and net income
4
problems with residual income valuation
p/e or m/b with real options?
5
abnormal earnings with dcf (discrete cash flows)
what do analysts use? refs asquith et al., 2001
earnings multiple 99%
p/e 97%
relative p/e 35%
revenue multiple 15%
price-to-book 25%
cf multiple 13%
dcf 13%
eva 2%
'model' 4%
estimate price multiples for comparable firms avg/median/etc. why not use the whole distribution?
if current earnings are not good prediction for future: forward p/e or pro forma earnings (remove non-recurring) or price to operating cash flow
other p/e: peg, p/cf, levered, (debt+equity)/ebitda
m/b market to book
stock screener links
profitability: roa (return on assets)
roa decomposed into profit margin and asset turnover
roe (return on common equity)
roe decomposed into profit margin, turnover, leverage
short term liquidity
current ratio = current assets/current liabilities: short-term debt paying ability
quick ratio = (current assets-inventory)/current liabilities: acid test ratio
long-term solvency
long term debt ratio = long term debt/(long term debt+shareholders' equity)
d/e = long term debt/shareholders' equity
total liabilities/total assets
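a quick sketch of the ratio definitions above, with the roa/roe dupont decompositions written out as products. the Financials fields and the example numbers are hypothetical, and this uses plain net income for roa (some versions add back after-tax interest):

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """one period of (hypothetical) statement figures."""
    net_income: float
    sales: float
    total_assets: float
    shareholders_equity: float
    current_assets: float
    inventory: float
    current_liabilities: float
    long_term_debt: float
    total_liabilities: float


def ratios(f):
    profit_margin = f.net_income / f.sales
    asset_turnover = f.sales / f.total_assets
    leverage = f.total_assets / f.shareholders_equity  # equity multiplier
    return {
        "roa": profit_margin * asset_turnover,  # = net income / assets
        "roe": profit_margin * asset_turnover * leverage,  # = net income / equity
        "current_ratio": f.current_assets / f.current_liabilities,
        "quick_ratio": (f.current_assets - f.inventory) / f.current_liabilities,
        "lt_debt_ratio": f.long_term_debt / (f.long_term_debt + f.shareholders_equity),
        "debt_to_equity": f.long_term_debt / f.shareholders_equity,
        "liabilities_to_assets": f.total_liabilities / f.total_assets,
    }


example = Financials(net_income=10, sales=200, total_assets=100,
                     shareholders_equity=40, current_assets=50, inventory=20,
                     current_liabilities=25, long_term_debt=30, total_liabilities=60)
for name, value in ratios(example).items():
    print(f"{name}: {value:.3f}")
```

note the long-term debt ratio works out to (d/e) / (1 + d/e), here 0.75 / 1.75 ≈ 0.429.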
7
forecast eps drifts down over the last 6 months before the earnings release, due to expectations management
8
detecting earnings management
ratio of volatility (stddev/mean) of accrual income measures to underlying volatility of sales and cfo
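one way to turn the volatility-ratio idea into numbers, as a sketch: coefficient of variation of reported earnings divided by that of sales or cfo. the example series are invented, the exact accrual measure used in class may differ, and cv only makes sense when the mean is comfortably positive:

```python
from statistics import mean, stdev


def cv(series):
    """coefficient of variation: sample stddev / mean (assumes a positive mean)."""
    return stdev(series) / mean(series)


def smoothing_ratio(earnings, benchmark):
    """cv of reported earnings relative to cv of sales or cfo. values well below 1
    mean earnings are much smoother than the underlying business."""
    return cv(earnings) / cv(benchmark)


earnings = [10, 10.5, 11, 11.5, 12]  # invented, suspiciously steady
cfo = [6, 14, 9, 16, 10]             # invented, bumpy
print(round(smoothing_ratio(earnings, cfo), 3))  # 0.198
```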
12
risk assessment
turnover: accounts receivable turnover, inventory turnover, fixed asset turnover, accounts payable turnover, days payable outstanding
short-term liquidity: current ratio, quick ratio (acid test), operating cash flow to current liabilities
long-term solvency (maybe a good way to value bonds?): debt/equity, long-term debt ratio (simple function of d/e), liabilities/assets
interest coverage ratio, in terms of both income and expenses or cash flow
refs modigliani-miller theorem without explaining: debt and equity financing are equivalent (in perfect markets, firm value doesn't depend on the debt/equity mix)
absolute metrics: interest coverage, current ratio
13
cost of capital
equity cost of capital (discount rate)
capm: estimating beta is the key issue; estimation period typically 5 years; sources: bloomberg, analysts, yahoo finance, etc
http://research.stlouisfed.org/fred/data/irates.html for risk-free rate and other data
fama-french 3-factor model extends capm with size, b/m (higher b/m->higher returns)
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html for rates, other data
long run averages: r_m-r_f (market-riskfree) 7.95% per year, r_smb (size premium) 3.32%, r_hml 5.05%
international
segmented/integrated capm: bekaert and harvey 1995
world capm holds if country stock market is integrated: http://www.msci.com/equity/index.html
otherwise, use r_country
'institutional investor' magazine ranks country credit risk 0-100
impressive fit to data: r_country = alpha + beta*rank
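a sketch of the cost-of-equity formulas above, using the long-run average premiums from the notes as defaults (annual, in decimal form). the risk-free rate and factor loadings in the example are made up, and a real beta estimate would use ~5 years of (excess) returns. estimate_beta is just an ols slope, so the same function would fit the country-rating regression:

```python
from statistics import mean

# long-run annual averages quoted in the notes
MARKET_PREMIUM = 0.0795  # r_m - r_f
SMB_PREMIUM = 0.0332     # size premium
HML_PREMIUM = 0.0505     # book-to-market premium


def estimate_beta(stock_returns, market_returns):
    """ols slope of stock returns on market returns: cov(r_i, r_m) / var(r_m)."""
    ms, mm = mean(stock_returns), mean(market_returns)
    cov = sum((s - ms) * (m - mm) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    return cov / var


def capm_cost_of_equity(rf, beta, market_premium=MARKET_PREMIUM):
    """r_e = r_f + beta * (r_m - r_f)"""
    return rf + beta * market_premium


def ff3_cost_of_equity(rf, b_mkt, b_smb, b_hml):
    """r_e = r_f + b * (r_m - r_f) + s * smb + h * hml"""
    return rf + b_mkt * MARKET_PREMIUM + b_smb * SMB_PREMIUM + b_hml * HML_PREMIUM


print(round(capm_cost_of_equity(0.04, 1.2), 4))          # 0.1354
print(round(ff3_cost_of_equity(0.04, 1.0, 0.5, 0.3), 5))  # 0.15125
```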
15
expected return depends on systematic risk
alpha = abnormal return = actual return - capm, for example
multiples valuation key assumption: earnings and book equity are comparable
drift strategies
returns over last 6-12 months predict next 6-12 months
post earnings announcement drift from under-reaction to news
red flag: again, gap between reported income and cfo
quality of earnings ratio: (earnings-cfo)/avg total assets
'widely accepted' evidence on fundamental trading strategies
e/p, b/m, cf/p: high->high future abnormal stock returns
var(cf)/p: high->low future abnormal stock returns
v/p (firm value from abnormal earnings model/price): high->low returns
short term reversal: high return this month->low next month
medium term momentum: high return past 6-12 months->high return next 6-12
accrual anomaly: high accounting accruals this quarter->low returns next quarter and beyond
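the quality-of-earnings ratio and capm alpha above, as a sketch with invented numbers (average total assets taken as the mean of beginning and ending balances, which is an assumption):

```python
def quality_of_earnings(earnings, cfo, assets_begin, assets_end):
    """(earnings - cfo) / average total assets; a large positive value means
    reported income is running well ahead of operating cash flow (the red flag)."""
    avg_assets = (assets_begin + assets_end) / 2
    return (earnings - cfo) / avg_assets


def capm_alpha(actual_return, rf, beta, market_return):
    """abnormal return = actual return - capm expected return."""
    expected = rf + beta * (market_return - rf)
    return actual_return - expected


print(round(quality_of_earnings(120, 40, 900, 1100), 3))  # 0.08
print(round(capm_alpha(0.15, 0.03, 1.2, 0.10), 3))        # 0.036
```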
16
bankruptcy detection
http://www.ibbotson.com/content/cc_1v11.asp cost of capital:$15/beta
altman z-score fit from manufacturing firm data
linear function of ratios
moody's, s&p use similar models to z-score to rate corp bonds
http://riskcalc.moodysrms.com/us/research/crm/45768.pdf
http://riskcalc.moodysrms.com/us/research/defrate.asp
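since the z-score is just a linear function of ratios, here's a sketch using the commonly quoted coefficients and cutoffs from altman's original 1968 model for public manufacturing firms (check them against the source before relying on them; the example figures are invented):

```python
def altman_z(working_capital, retained_earnings, ebit, market_value_equity,
             sales, total_assets, total_liabilities):
    """altman (1968) z-score for public manufacturing firms, ratios as decimals."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """commonly quoted cutoffs: above 2.99 'safe', below 1.81 'distress', grey in between."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


z = altman_z(working_capital=100, retained_earnings=200, ebit=80,
             market_value_equity=600, sales=1000, total_assets=1000,
             total_liabilities=400)
print(round(z, 3), zone(z))  # 2.564 grey
```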
17
mergers and acquisitions
'old' purchase method: goodwill asset created and amortized over 40 years
pooling of interests no longer permitted for valuing
18
employee stock options
20
off balance sheet activities
enron background
21
pension plans
defined benefit plans cause accounting problems
22
international financial analysis
insider (code law) codified system
close interplay gov, banks, unions, big firms
continental euro, japan
less public disclosure
outsider (common law)
us, uk, english-speaking
us vs uk differences
23
sarbanes-oxley and review
sarbox 2002
identify comparable firms
multex (?) via yahoo for quick industry benchmarks
will change: accounting rules, tech, market integration, contracting methods
won't change: thought process, economics

guild wars

skills at the end of nightfall:
n: meekness, well of dark
p: harrier's toss, never surrender, stand your ground

Wednesday, June 16, 2010

memory profiling with python

tough to find good memory profiling for python. heapy-pe and the other one (didn't bother to remember the name; pysizer?) turned out to be no good to me with numpy arrays, which (surprise!) tend to be the biggest data structures i deal with. here are a couple of others to try some time:
meliae is new and more cli-oriented, but looks easy enough to try (and script).
dowser spawned off of cherrypy, but i think it works for any python code with the web server as sort of a gui (i think unlike dozer, which targets wsgi apps. or maybe dozer is just a wsgi version of the 'gui'?).

here's an example of using objgraph to analyse memory usage.
i think these are more garbage collector approaches, rather than hook-and-trace, so maybe more likely to work with libs like numpy.

memory_profiler also comes recommended and looks interesting. pure python, so portable and hackable.
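for comparison, python 3.4+ ships tracemalloc in the standard library (not one of the tools above). it hooks python's memory allocator, and as far as i know recent numpy versions report their data buffers to it too. a minimal sketch, with build_data standing in for real work:

```python
import tracemalloc


def biggest_allocations(func, top=3):
    """run func under tracemalloc and return (result, top allocation sites by size)."""
    tracemalloc.start()
    try:
        result = func()  # keep the result alive so its memory is still counted
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # statistics() groups allocations by source line, sorted biggest first
    return result, snapshot.statistics("lineno")[:top]


def build_data():
    big = [float(i) for i in range(200_000)]
    small = list(range(100))
    return big, small


data, top = biggest_allocations(build_data)
for stat in top:
    print(stat)
```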

Tuesday, June 15, 2010

r-cran-fimport

ubuntu has a package for downloading free econometrics data: it provides an import function to access (free) data from Economagic, the US Federal Reserve, Forecasts.Org, Yahoo and other web sources. worth a look, if only for its list of data sources. the group of people who wrote this also have a link to a brief discussion of portfolio risk surfaces over the convex hull of achievable sets. interesting... i was thinking something along the same lines, and it's a little gratifying that working pros seem already to be doing something similar.

__get__ method for fun and profit

just learned (or maybe relearned) something cool about python: the __get__ special method gets called when an object whose class defines it is stored as a class attribute and then looked up through an instance (or through the class itself). not only are there potential uses for this, it also holds the key to understanding the implicit 'self' (or 'cls') argument in methods: plain functions are descriptors whose __get__ returns a bound method. this is something that confused me a couple of times before, such as passing references to instance vs. class methods from outside the class to be used inside the instance.
so, for example, i could allow instances of one of my classes to know when and through which objects they're being looked up, and something about the context when something is asked of them. maybe a quick and dirty memory leak tracker, when i know beforehand which objects are the big boys but i don't know who's pointing at them.
or maybe a little internal usage auditor, when i'm considering the impact of a refactor.
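a small sketch of both points above, in python 3 (__set_name__ needs 3.6+). AccessLog and Model are made-up names; note that __get__ only fires for attribute lookups through the owning class, so this audits reads of an attribute, not every place an object gets passed around:

```python
class AccessLog:
    """descriptor that records each read of the attribute it manages (a toy usage auditor)."""

    def __set_name__(self, owner, name):
        # called once at class creation; store the real value under a private name
        self.private_name = "_" + name
        self.reads = []

    def __get__(self, instance, owner):
        if instance is None:  # looked up on the class itself, e.g. Model.mesh
            return self
        self.reads.append(type(instance).__name__)
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        setattr(instance, self.private_name, value)


class Model:
    mesh = AccessLog()  # must be a class attribute for __get__ to fire

    def __init__(self, mesh):
        self.mesh = mesh  # routed through AccessLog.__set__

    def area(self):
        return len(self.mesh)  # routed through AccessLog.__get__


m = Model([1, 2, 3])
m.area()
print(Model.mesh.reads)  # ['Model']

# functions are descriptors too: __get__ binds the instance, which is where 'self' comes from
bound = Model.area.__get__(m, Model)
print(bound.__self__ is m)  # True
print(bound())  # 3
```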

Monday, June 14, 2010

valuation books

couple of books recommended by people in the valuation business. one comment about duffie is that he was kind of a disappointment as a consultant, since he likes to stay more in the theoretical than the practical. not sure if this is the best book from him, but it's fairly recent. (search for 'dynamic')
Investment valuation : tools and techniques for determining the value of any asset
Damodaran, Aswath.
interesting that he says most analysis/justification for valuation is on discounted cash flows (as it seems to be in the book), but most valuation in practice is with ratios in relative valuation. contingent claim valuation is a more recent perspective, looking at opportunities available to a firm and pricing them like options. i was disappointed at how little there is on bond and commodity valuation, especially given the promise in the subtitle. interesting chapter on evidence of market efficiency.
Credit risk : pricing, management and measurement
Duffie, Darrell.
financial statement analysis and security valuation, 3rd ed
stephen h. penman
658.15
more on valuation, including slightly less than simple forecasting and detecting financial statement manipulation
dynamics of markets: econophysics and finance
joseph l. mccauley
658.15:519.217
empirical refutation of common modeling assumptions
value at risk: the new benchmark for managing financial risk, 3rd ed
philippe jorion
658.155
different types of risk, some 'industry-standard' real-life practical-experience rules of thumb

Tuesday, June 8, 2010

mplayer dump

tried to record a lecture in realmedia format with mplayer, but it keeps dying in the middle. cache seems to help it get farther, but it still chokes. maybe i just need to give it a _big_ cache, or use the -cache-min option:
mplayer -audiofile-cache 8192 -cache 8192 -dumpstream -dumpfile out.rm rtsp://etc

encrypted pdf files

managed to get a pdf file with restricted permissions using pdftk (pdf toolkit). the man page was a bit unclear on this, since if you just put in an owner password it will not actually restrict the file and no passwd is requested (or needed) to open it. this will do the trick, with all restrictions:
pdftk in.pdf output out.pdf user_pw foo
or, to make 2 levels of access:
pdftk in.pdf output out.pdf owner_pw baz user_pw foo
now the restrictions will be in place if you put in foo as the passwd, but they will not if you put in baz.

redirecting stdin, stdout, stderr from python

saw a nice comment in a post on python daemon forks that explains how to redirect std* pipes, even when they are accessed from c. i've been frustrated trying to figure that out before.

More reliable i/o stream redirection. Just reassigning to the sys streams is not 100% effective if you are importing modules that write to stdin and stdout from C code. Perhaps the modules shouldn't do that, but this code will make sure that all stdin and stdout will go where you expect it to.

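import os, sys

# python 3: file() is gone, so use open(); unbuffered (buffering=0) needs binary mode
out_log = open('/out/log/file/name', 'a+')
err_log = open('/err/log/file/name', 'ab+', 0)
dev_null = open(os.devnull, 'r')
# flush python-level buffers before swapping the descriptors underneath them
sys.stdout.flush()
sys.stderr.flush()
# dup2 rebinds fds 1, 2 and 0 themselves, so writes from c code are redirected too
os.dup2(out_log.fileno(), sys.stdout.fileno())
os.dup2(err_log.fileno(), sys.stderr.fileno())
os.dup2(dev_null.fileno(), sys.stdin.fileno())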
(and another poster suggests closing sys.std* before duping.) cool. i need to remember this next time i wrap somebody's code that thinks it's a good idea to barf on the terminal without a --quiet option.
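the same dup2 trick, wrapped as a context manager so the original stream comes back afterwards. a sketch under assumptions: redirect_fd is a made-up name, it uses fd 1 directly (the usual stdout descriptor), and os.write stands in for a c extension writing straight to the descriptor. c code using buffered stdio would also need to flush its own buffers before the restore:

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def redirect_fd(fd, target_path):
    """temporarily point file descriptor fd at target_path (appending).
    works at the os level, so it also catches writes that bypass sys.stdout."""
    saved = os.dup(fd)  # keep a copy of the original destination
    target = os.open(target_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.dup2(target, fd)  # fd now refers to the log file
        yield
    finally:
        os.dup2(saved, fd)  # restore the original destination
        os.close(saved)
        os.close(target)


STDOUT_FD = 1
log_path = os.path.join(tempfile.mkdtemp(), "out.log")
sys.stdout.flush()  # don't let pending python-level output land in the log
with redirect_fd(STDOUT_FD, log_path):
    # os.write skips sys.stdout entirely, like a c extension calling write()
    os.write(STDOUT_FD, b"from the fd level\n")

with open(log_path) as f:
    print(f.read().strip())  # from the fd level
```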
also, the daemon implementation looks pretty clean, with extra tidbits sprinkled into the comments, and the author explains the reasons for doing things. i don't think i will need the double fork for my udev script, but the first fork will be necessary.