Thursday, December 23, 2010

dashboard and screen scraping

one thing that's been on my low-priority radar is a way to scrape through the complex flaming hoops that banks, credit cards, and investment brokerages put up so i can have an auto dashboard, showing me account balances and net worth at a glance.
mechanize looks like a nice package for performing many browser functions, including form interaction; probably the best of its kind i've found (and nice faq). however, it does mean writing a browsing session from scratch (read: lots of online debugging) and i'm not sure how well it can handle javascript, frames/windows, and all the other eye candy screen junk these sites like to throw at you.
someone out there recommended pyxpcom (combined with pydom in pythonext) as a way to do anything mozilla can. i think that must be true, since it seems to be just the pieces that mozilla-esque browsers are made of. as powerful and difficult to use as a build-your-own-ferrari kit.
i think the most promising option is selenium, which is apparently merging with webdriver for version 2.0. it basically drives a real browser, but can record and play back scripts in a variety of languages (including python). the webdriver type of interface seems to be the future of selenium, and it has the advantages of better navigation and less to install. it's written in java, but i think it can do python (though the docs are behind if so). so i'm not sure if i should just wait for an official release of 2.0, but it does look like selenium is what i'm after. here's the doc on using ide.
EDIT: did some more looking around with selenium, and wow! i love the ide/rc combo. i think i need to look at this blog post to get the most out of locators (css vs. xpath). some of the extra plugins for selenium-ide are worth getting, and the selenium.py module can apparently just be copied into the python path to use selenium-rc. 1.0.11 has firefox 4 support in the ide, but it's very recent (2011-04-12).
they have put out a number of rcs for v2; apparently the v2 release is coming summer 2011. no remote control javascript server is necessary for version 2 since it's integrated with webdriver. i need to know if the ide and python export will still work. right now i think python will work, but no ide yet (though 2.0 is probably backwards compatible so might run the code generated by the version 1 ide).
more selenium links: command locators, xpath/css/dom rosetta, css locators are faster than xpath, good info, stay up to date, good example.
managed to get selenium python bindings installed on a windows machine (not surprisingly, a bit more involved than on linux) with my epd python. had to manually download the tarball, run python setup.py install, and manually create the test dir structure that it would then complain about. maybe there's an option to make it skip tests, but the kludge was faster than looking that up. now i have selenium 2 with the webdriver interface, much better than rc!
and btw, my experiments confirm what others have said about locators: css is much faster than xpath, even on firefox. i've also found that, while the selenium ide is really good for getting started with the locators, it's often possible to find shorter, more informative, and likely more stable tags and ids by poking around in the html just a little rather than using the first thing that pops up in the ide table. so i'm not going to try to keep a drop-in interface to call into the ide-generated scripts; cut-n-paste of one-liners will be good enough for both dev and maintenance. still, there is tremendous value in starting with something that works, and that alone makes the ide worth the install.
some other things i've learned: the 'andWait' stuff is only relevant from the java interface. in python, there's no way to keep running async while stuff is still loading. click, get, etc. only return to the python script once the page is fully loaded, so that can be a latency bottleneck. 
i did poke around and find a possible place to change that, but i'll see if i really need to.
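for the record, here's a rough sketch of the shape i have in mind for the dashboard, not a finished thing. everything site-specific is made up: the ACCOUNTS table, the example.com urls, and the css locators are placeholders, and the function names (parse_balance, net_worth, scrape_balances) are just mine. it assumes the selenium python bindings and firefox are installed, and it skips the login steps entirely, which is where the real work would be.

```python
import re
from decimal import Decimal

# hypothetical accounts: the urls and css locators are placeholders, not real sites
ACCOUNTS = {
    "checking": {"url": "https://bank.example.com/accounts", "css": "#checking-balance"},
    "visa": {"url": "https://card.example.com/summary", "css": "span.current-balance"},
}


def parse_balance(text):
    """turn text like '$1,234.56' or '($50.00)' into a Decimal."""
    # keep only digits, decimal point, minus sign, and accounting parentheses
    cleaned = re.sub(r"[^0-9.\-()]", "", text)
    # accounting style: parentheses mean a negative amount
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    number = Decimal(cleaned.strip("()"))
    return -number if negative else number


def net_worth(balances):
    """sum a {name: Decimal} dict of balances."""
    return sum(balances.values(), Decimal("0"))


def scrape_balances(accounts):
    """visit each account page in a real browser and read one balance element."""
    # imported here so the helpers above still work without selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        balances = {}
        for name, site in accounts.items():
            driver.get(site["url"])  # returns once the page has loaded
            element = driver.find_element(By.CSS_SELECTOR, site["css"])
            balances[name] = parse_balance(element.text)
        return balances
    finally:
        driver.quit()  # always close the browser, even if a locator fails
```

usage would be something like net_worth(scrape_balances(ACCOUNTS)) once the locators point at real elements. i went with css locators rather than xpath here since they were faster in my experiments.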

Monday, December 13, 2010

scraping off microsoft's look and feel

visually impaired people have at least one thing going for them: the high contrast windows theme is much better than windows standard. and the opticwhite theme for google chrome helps as well. now i don't feel quite so out of place and my eyeballs won't melt.

Saturday, December 11, 2010

virtualbox

holy grail: windows 7 and ubuntu dual boot, with either also running under a vm in the other. i've been looking around for a good virtual machine; some of the old ones seem kinda dead or just emulate, which is way too slow (bochs, plex86), or are too slow except on a linux host or maybe forked off somewhere else (qemu/kqemu or whatever). xen looks pretty good, especially with its capability to run a guest os off a partition. but it only runs on a linux host, so that's only half the answer, and i'm not sure if anyone has gotten it to use an existing windows install. virtualbox can run on a windows or linux host, and it can run either guest at virtualized native speed. and maybe i can get it to run an existing windows or ubuntu from the other. one problem with booting the oem windows partition is that setting up virtualbox from linux will require bootrec.exe from an install (not recovery) dvd. (colinux might be able to boot an existing install, but colinux is limited to one processor atm.)

Friday, December 10, 2010

getting spyder and python to work on windows 7

had to struggle to get my python install working with numpy and spyder, probably because i copied them over from another install.
with spyder, i had to move the 2.0.0beta5 egg and give the --prefix option to setup.py in order to install and use 2.0.3.
with numpy, i was getting 'ImportError: DLL load failed: The specified module could not be found.' when i tried to import numpy, _unless_ i was in the Python26_64\Scripts dir. i think it's because of the mkl lapack dlls in there. but it all worked once i added that dir to my PATH (not PYTHONPATH) with the help of cygwin -- just using export in bash allows me to peek at what it would be in windowsese with os.environ['PATH'] so i could put it into the env editor in spyder. voila!
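a minimal sketch of doing the same PATH fix from inside python instead of the env editor. prepend_to_path is a name i made up, and MKL_DIR is a placeholder since the full path depends on where the install lives. on much newer pythons (3.8+), windows no longer consults PATH when loading extension module dlls, so os.add_dll_directory would be the thing to try there instead.

```python
import os


def prepend_to_path(directory, env=None):
    """put directory at the front of PATH unless it is already listed."""
    env = os.environ if env is None else env
    # split on the platform separator (';' on windows, ':' elsewhere)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


# placeholder: fill in the real location of the Scripts dir with the mkl dlls
MKL_DIR = r"<full path to>\Python26_64\Scripts"

# intended use, before the first numpy import:
#   prepend_to_path(MKL_DIR)
#   import numpy
```

passing a plain dict as env makes it easy to see what the new PATH would look like without touching the real environment.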

sshd on cygwin

got a new machine with windows 7, and i think i'll have to actually use the windows partition. ugh.
fortunately, cygwin comes with a kajillion unix packages that make microsoft bearable. and i just got my openssh server up and running, so i can still log in from home or elsewhere. here's how:
first, i followed the steps here to get rid of any old failed-attempt cruft. then, even though i did 'run as administrator' on the cygwin bash startup bat, it still gave me warnings when i ran ssh-host-config and tried to use my windows user for running sshd. so i went back and did all the editrights lines. rerunning ssh-host-config (probably unnecessary) gave no warnings, so i started it up with 'cygrunsrv -S sshd' as suggested here (also used the 'tty ntsec' for CYGWIN, as he suggests). and it Just Works.

Monday, December 6, 2010

the ascent of money

just watched a documentary on pbs called 'the ascent of money' (google it, easy to find). nice 4-part series on some of the history of finance and how it ties in with the history of the world. should be required viewing. i hear the book is good, too, but the video is easier for lazy people like me.