looks like amazon gives you up to a year to try out some free cpu time on their cloud computing nodes.
http://aws.amazon.com/free/terms/
also, their spot instances allow you to bid for time, rather than paying the fixed on-demand rates. looks like the discount is significant, if you can handle the unpredictability. nice example of running a jenkins build slave on spot, too. also refs princeton consultants and their optispotter, which helps smallish ($50mil) hedge funds find hft opportunities.
http://aws.amazon.com/ec2/spot-instances/
http://www.youtube.com/watch?v=-vAAuTs9iu4
picloud still looks like a good way to get started. can do fractional hours, and prices are comparable to ec2 on-demand. they allow you to create an environment on a virtual ubuntu, so you can install whatever you need as if you had a local filesystem.
http://aws.typepad.com/aws/2012/12/picloud-and-princeton-consultants-win-the-first-amazon-ec2-spotathon.html
Wednesday, April 17, 2013
jake vanderplas blog
http://jakevdp.github.io/
blog with lots of interesting python examples and demos. not just code, though; the guy seems plugged in to big-picture developments.
Saturday, April 13, 2013
python presentation videos
http://vimeo.com/pydata/videos/page:1/sort:date
http://pyvideo.org/category
no need to attend pycon or pydata, or to take notes if you do.
Friday, April 5, 2013
ipython parallel and acyclic graphs
heard an interesting tidbit from a talk by brian granger. apparently the ipython parallel kernel has the ability to take acyclic graph dependencies and intelligently distribute the computation. i need to look into that.
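i haven't dug into how the ipython scheduler actually does it, but the core idea -- run each task as soon as everything it depends on has finished -- sketches easily in plain python. this is just my illustration of dependency-driven scheduling, not ipython's api; the function and task names are made up.

```python
# sketch: run an acyclic task graph, dispatching each "level" of ready tasks
# in parallel once their prerequisites are done
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, deps, workers=4):
    """tasks: {name: callable(done_dict)}; deps: {name: [prereq names]}."""
    done = {}
    remaining = set(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining:
            # everything whose prerequisites are all satisfied can run now
            ready = [n for n in remaining
                     if all(d in done for d in deps.get(n, []))]
            if not ready:
                raise ValueError('cycle or missing dependency in graph')
            futures = {n: pool.submit(tasks[n], done) for n in ready}
            for n, f in futures.items():
                done[n] = f.result()
            remaining -= set(ready)
    return done
```

a real distributed scheduler tracks per-task completion rather than whole levels, but the ordering constraint it enforces is the same.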
Saturday, November 17, 2012
sobol sequences and python
tried it, seems to work just fine.
https://github.com/naught101/sobol_seq/blob/master/sobol_test_output.txt
http://people.sc.fsu.edu/~jburkardt/py_src/sobol/sobol.html
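for intuition (this is not the code linked above): the 1-d van der corput sequence is the simplest member of the low-discrepancy family that sobol generalizes. you just reflect the base-b digits of n around the radix point:

```python
# n-th term of the base-b van der corput low-discrepancy sequence:
# reverse the base-b digits of n and read them as a fraction in [0, 1)
def van_der_corput(n, base=2):
    x, denom = 0.0, 1.0
    while n:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x
```

successive terms keep subdividing the largest gap, which is why these sequences fill space more evenly than pseudorandom draws.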
modular toolkit for data processing
http://mdp-toolkit.sourceforge.net/
interesting project with a number of capabilities. python code for pca, ica, slow feature analysis, manifold learning methods ([Hessian] local linear embedding), classifiers, factor analysis, rbm, etc.
according to the 'intro to scipy' talk at pydata 2012, it has the fastest pca available in python (even if the interface is more difficult than scipy svd or sklearn.decomposition.PCA).
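for reference, the core computation all of these pca implementations share boils down to a centered svd; here's a bare numpy sketch of that (mdp and sklearn layer whitening, incremental updates, and so on over this):

```python
import numpy as np

def pca(x, k):
    """project rows of x onto the top-k principal components."""
    xc = x - x.mean(axis=0)                          # center each column
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T                             # scores in top-k subspace
```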
numba and cython
interesting comparison between numba and cython (and pure python). both are projects i want to keep an eye on.
http://jakevdp.github.com/blog/2012/08/24/numba-vs-cython/
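the benchmark in that post is a pairwise-distance loop; a pure-python baseline of that shape looks like the sketch below. the appeal of numba (if installed) is that decorating this with @numba.jit is basically the whole porting effort, versus adding type declarations and a compile step for cython.

```python
import math

def pairwise(points):
    """dense matrix of euclidean distances between rows of `points`."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            d[i][j] = d[j][i] = math.sqrt(s)
    return d
```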
Friday, October 26, 2012
nuitka
http://www.nuitka.net/pages/overview.html
interesting project that generates c++ from python. not sure if it's ready for me to grab and use, without becoming a dev. but it's one to keep an eye on and file away with numba, shedskin, cython, py2exe, etc.
Friday, June 22, 2012
continuous integration
recently decided it was time to stop putting off trying continuous integration for software development. (i'm only a decade behind the times; not bad.)
since i mostly use python, i had to look at buildbot. apache gump and cruisecontrol also seemed like possibilities. but in the end i tried hudson since i'd read it was easy to set up and use, and it really was. all i had to do was download the war file and run
java -jar .\hudson.war *>output.txt
(i had to redirect output so blocking console writes wouldn't make it wait for me to scroll or press a key.)
here are some motivational/informative quotes on ci:
wikipedia:
continuous integration -- the practice of frequently integrating one's new or changed code with the existing code repository -- should occur frequently enough that no intervening window remains between commit and build, and such that no errors can arise without developers noticing them and correcting them immediately.
martin fowler:
continuous integration doesn't get rid of bugs, but it does make them dramatically easier to find and remove. in this respect it's rather like self-testing code. if you introduce a bug and detect it quickly it's far easier to get rid of. since you've only changed a small bit of the system, you don't have far to look. since that bit of the system is the bit you just worked with, it's fresh in your memory -- again making it easier to find the bug. you can also use diff debugging -- comparing the current version of the system to an earlier one that didn't have the bug.
bugs are also cumulative. the more bugs you have, the harder it is to remove each one. this is partly because you get bug interactions, where failures show as the result of multiple faults -- making each fault harder to find. It's also psychological -- people have less energy to find and get rid of bugs when there are many of them...
if you have continuous integration, it removes one of the biggest barriers to frequent deployment. frequent deployment is valuable because it allows your users to get new features more rapidly, to give more rapid feedback on those features, and generally become more collaborative in the development cycle. this helps break down the barriers between customers and development -- barriers which i believe are the biggest barriers to successful software development.
paul duvall, cto, stelligent incorporated:
6 anti-patterns:
infrequent checkins, which lead to delayed integration
broken builds, which prevent teams from moving on to other tasks
minimal feedback, which prevents action from occurring
receiving spam feedback, which causes people to ignore messages
possessing a slow machine, which delays feedback
relying on a bloated build, which reduces rapid feedback
Sunday, April 1, 2012
my own doctest runner
#!/cygdrive/c/Python27/python -i 'c:\crunch6SVN\python\pyTest.py'
#!/cygdrive/c/Python26_64/python 'c:\crunch6SVN\python\pyTest.py'
#!/usr/bin/env python
#!/usr/local/python/bin/python
# this works from powershell, but not from xterm or within spyder:
#   C:\Python27\python ..\..\pyTest.py .\scanCoverage.py -g
# i think spyder puts in some trace hooks into pdb of its own
import sys
import os
#import ipdb
#dbg = ipdb.set_trace
from pdb import set_trace as dbg

def lineProfile(runStr, runContext={}, module=None, moduleOnly=False):
    # with the run string set up, i can use cProfile to find the worst offenders
    import cProfile, pstats
    import line_profiler
    prof = cProfile.Profile()
    r = prof.runctx(runStr, {}, runContext)
    # maybe use prof.dump_stats() to spit out to a file
    r = pstats.Stats(prof).strip_dirs().sort_stats('time').print_stats(5)
    # get line profiling on top 3 time hog functions
    ss = pstats.Stats(prof).sort_stats('time')
    def b(fn):
        return fn.rstrip('.py').rstrip('.pyc')  #### rstrip takes or
    if moduleOnly:
        # only show functions in this file
        hogs = [f[2] for f in ss.fcn_list if b(f[0]) == b(__file__)][:3]
        ts = [ss.stats[f][2] for f in ss.fcn_list if b(f[0]) == b(__file__)][:3]
    else:
        hogs = ss.fcn_list[:3]
        ts = [ss.stats[f][2] for f in ss.fcn_list][:3]
    fts = [t / ss.total_tt for t in ts]
    # ignore any functions beyond what accounts for 80% of the time
    for i in range(len(fts)):
        if sum(fts[:i]) > .8:
            break
    hogs, ts, fts = hogs[:i], ts[:i], fts[:i]
    hogs.reverse(); ts.reverse(); fts.reverse()  # i want longest time last
    # can't line prof builtins, so take them out of the list
    fts = [f for f, h in zip(fts, hogs) if not h[0] == '~']
    ts = [t for t, h in zip(ts, hogs) if not h[0] == '~']
    hogs = [h for h in hogs if not h[0] == '~']
    # pstats only saves module filename, so match files and search within them
    # (rstrip for .pyc, .pyo)
    modules = [x.__file__.rstrip('oc') for x in sys.modules.values()
               if hasattr(x, '__file__')]
    indices = [modules.index(h[0].rstrip('oc')) for h in hogs]
    modules = [x for x in sys.modules.values() if hasattr(x, '__file__')]
    hogMods = [modules[i] for i in indices]
    # find functions/methods within module. only searches down one level
    # instead of a full tree search, so don't get too crazy with deeply
    # nested defs
    fs = []
    for ln, h, m in zip(*zip(*hogs)[1:3] + [hogMods]):
        if hasattr(m, h) and hasattr(getattr(m, h), '__code__') \
                and getattr(m, h).__code__.co_firstlineno == ln:
            fs.append(getattr(m, h))
        else:
            for a in [getattr(m, x) for x in dir(m)]:
                if hasattr(a, h) and hasattr(getattr(a, h), '__code__') \
                        and getattr(a, h).__code__.co_firstlineno == ln:
                    fs.append(getattr(a, h))
                    break
    lprof = line_profiler.LineProfiler()
    for f in fs:
        lprof.add_function(f)
    stats = lprof.runctx(runStr, {}, runContext).get_stats()
    for ((fn, lineno, name), timings), ft in zip(
            sorted(stats.timings.items(), reverse=True), fts):
        line_profiler.show_func(fn, lineno, name,
                                stats.timings[fn, lineno, name], stats.unit)
        print 'this function accounted for \033[0;31m%2.2f%%\033[m of total time' % (ft * 100)

# monkey patches to allow coverage analysis to work
# just a little disturbing that (as of 2.4) doctest and trace coverage
# don't work together...

def monkeypatchDoctest():
    # stolen from http://coltrane.bx.psu.edu:8192/svn/bx-python/trunk/setup.py
    #
    # Doctest and coverage don't get along, so we need to create
    # a monkeypatch that will replace the part of doctest that
    # interferes with coverage reports.
    #
    # The monkeypatch is based on this zope patch:
    # http://svn.zope.org/Zope3/trunk/src/zope/testing/doctest.py?rev=28679&r1=28703&r2=28705
    try:
        import doctest
        _orp = doctest._OutputRedirectingPdb

        class NoseOutputRedirectingPdb(_orp):
            def __init__(self, out):
                self.__debugger_used = False
                _orp.__init__(self, out)

            def set_trace(self):
                self.__debugger_used = True
                pdb.Pdb.set_trace(self)

            def set_continue(self):
                # Calling set_continue unconditionally would break unit test
                # coverage reporting, as Bdb.set_continue calls sys.settrace(None).
                if self.__debugger_used:
                    pdb.Pdb.set_continue(self)

        doctest._OutputRedirectingPdb = NoseOutputRedirectingPdb
    except:
        raise
    return doctest

def monkeypatchTrace():
    import trace
    t = trace.Trace

    class NoDoctestCounts(t):
        def results(self):
            cs = self.counts
            newcs = {}
            # throw away 'files' that start with

# = 2.6 will not allow import by filename
# i should refactor the whole thing to use imp module
sys.path.insert(0, os.path.dirname(n))
n = os.path.splitext(os.path.basename(n))[0]
if not n.startswith('-'):
    if True:  # try:
        if debug:
            # __import__ needs a non-empty fromlist if it's a submodule
            if '.' in n:
                try:
                    m = __import__(n, None, None, [True, ])
                except ImportError:
                    # just run doctests for an object
                    modName = '.'.join(n.split('.')[:-1])
                    #objName = n.split('.')[-1]
                    m = __import__(modName, None, None, [True, ])
                    #doctest.run_docstring_examples(m.__dict__[objName],m.__dict__,name=objName)
                doctest.debug(m, n, True)
                sys.exit()
            else:
                m = __import__(n)
                for i in m.__dict__.values():
                    import abc
                    # if it's a class (from a metaclass or metametaclass) or function
                    if type(i) == type or type(i) == abc.ABCMeta or \
                            (type(type(i)) == type and hasattr(i, '__name__')) \
                            or type(i) == type(lineProfile):
                        try:
                            print 'Testing', i.__name__
                            doctest.debug(m, n + '.' + i.__name__, True)
                        except ValueError:
                            print 'No doctests for', i.__name__
        else:
            import pdb
            if coverage:
                #### need a better way to get module filenames without
                # importing them. (after initial import, the class and
                # def lines will not be executed, so will erroneously
                # be flagged as not tested.)
                #d,name = os.path.split(m.__file__)
                d, name = '.', n
                #bn = trace.fullmodname(name)
                bn = name.split('.')[-1]
                # ignore all modules except the one being tested
                ignoremods = []
                mods = [trace.fullmodname(x) for x in os.listdir(d)]
                for ignore, mod in zip([bn != x for x in mods], mods):
                    if ignore:
                        ignoremods.append(mod)
                tracer = trace.Trace(
                    ignoredirs=[sys.prefix, sys.exec_prefix],
                    ignoremods=ignoremods,
                    trace=0, count=1)
                if '.' in n:
                    tracer.run('m = __import__(n,None,None,[True,])')
                else:
                    tracer.run('m = __import__(n)')
                tracer.run('doctest.testmod(m)')
                r = tracer.results()
                r.write_results(show_missing=True, coverdir='.')
            else:
                # __import__ needs a non-empty fromlist if it's a submodule
                if '.' in n:
                    try:
                        m = __import__(n, None, None, [True, ])
                    except ImportError:
                        # just run doctests for an object
                        modName = '.'.join(n.split('.')[:-1])
                        objName = n.split('.')[-1]
                        m = __import__(modName, None, None, [True, ])
                        doctest.run_docstring_examples(
                            m.__dict__[objName], m.__dict__, name=objName)
                        sys.exit()
                else:
                    m = __import__(n)
                # dangerously convenient deletion of any old coverage files
                try:
                    os.remove(trace.modname(m.__file__) + '.cover')
                except OSError:
                    pass
                # need to call the profile function from the doctest so that it
                # can set up the context and identify the run string, because
                # anything not passed back will get garbage collected and
                # there's no way to pass anything back. but how can i call
                # something within pyTest from the doctest string? some kind of
                # callback? i want pyTest to decide if it gets called, so i can
                # switch from the command line
                doctest.testmod(m)
                if profile:
                    runStr, runContext = m._profile()
                    lineProfile(runStr, runContext, m)
    else:  # except Exception, e:
        print 'Could not test ' + n
        print e
        raise e

q = quit
from sys import exit as e
Wednesday, January 11, 2012
debugging c++ extensions to python
trying to debug a python extension module written in c++ (wrapped with swig). i think this would be so much easier if i were using gcc, but python is built with msvc... setup.py wants the debug versions of python libs, but i don't have them and don't really want to try to build python from scratch right now.
these refs seem relevant:
http://www.velocityreviews.com/forums/t677466-please-include-python26_d-lib-in-the-installer.html
http://vtk.org/gitweb?p=VTK.git;a=blob;f=Wrapping/Python/vtkPython.h;h=9d01ac21bafae0a24252398f268b6b3563df62cd
Wednesday, September 14, 2011
python/c++ with microsoft visual c++
finally got a 64 bit pyd python extension working, compiled with ms visual c++ and visual studio for epd on windows. figured out that the version string in python (MSC v.1500 64 bit (AMD64)) was for visual c++ 2008 == v9.0, and the express edition only builds 32 bit. so i had to get the sdk (version 7 works with 2008) and make sure the amd64 stuff got installed with it.
some notes say to install the service pack before the 64 bit stuff, but i didn't find this necessary.
i ran the 'Windows SDK Configuration Tool' from the start menu, since it sounded logical, and ticked the box to link the sdk with VC 2008. not sure if that was necessary or not.
one change i had to kludge manually was changing the references in C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\vcvarsall.bat under the amd64 label. originally they were pointing to "%~dp0bin\amd64\vcvarsamd64.bat"; they need to be "%~dp0bin\vcvars64.bat". at that point, running vcvarsall.bat amd64 should work and a simple use of weave passes:
import scipy.weave as w
c = w.inline(r'printf("hi.");',verbose=2)
now that the compiler, etc., are set up i can use swig and distutils to build bigger extensions (like in the swig docs), and it Just Works!
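the swig + distutils route from the swig docs amounts to a setup.py along these lines. this is a sketch: the module and file names here are hypothetical, and setuptools provides the same Extension/setup interface that distutils did.

```python
# hypothetical setup.py for a swig-wrapped c++ extension;
# build with: python setup.py build_ext --inplace
from setuptools import setup, Extension

example_ext = Extension(
    '_example',                            # leading underscore: swig convention
    sources=['example.i', 'example.cxx'],  # .i is the swig interface file
    swig_opts=['-c++'],                    # tell swig to emit c++ wrappers
)

setup(name='example', ext_modules=[example_ext])
```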
Saturday, May 7, 2011
karnickel: macros in python
not sure when exactly a macro would be useful in python. i remember seeing such a thing in some cython code, to deal with a c++ template, though karnickel deals with the python ast so probably not useful for that. but, there it is, if the need arises.
Thursday, May 5, 2011
python embedded in gdb
version >=7 of gdb has an embedded python interpreter. here's a tutorial on it. very handy if i need to debug c or c++. i'm guessing a million new debugger guis will be built on top of this.
Thursday, January 6, 2011
windows python in cygwin
finally solved a problem (or found a workaround, at least) for something that had bothered me for a while: when i tried to use windows python (not cygwin python, which worked fine) in an xterm, it seemed not to be connected to stdout, stderr, and stdin. neither the interpreter nor the debugger prompt would show up, and nothing happened when i used print or sys.stdout.write. the mysterious thing was that it would work from a non-x cygwin shell. but i needed mouse action on the desktop and screen (which uses a text-based x windows server) remotely.
turns out the problem is how cygwin interfaces a non-cygwin console app from the terminal. it talks to it through pipes rather than with a real pty, and the issues there are deep and woolly. so all these windows programs are buffering in the pipe, not realizing how impatient i'm getting on the other end.
fortunately, python has an easy workaround. the -i option makes it assume interactivity, skipping the tty check. i can use it on the cli or #! shebang, and now it's working. only problem is it drops me into an interpreter when the script finishes, so i have to type quit() (c-d, c-z, c-c are all ignored).
ref here
parallel, numpy, shared memory,...
trying to figure out how to do parallel processing efficiently with python, and numpy in particular. i want something simple, closely related to the original serial code (sorry, mpi, you're not welcome here).
parallelpython holds some promise, dodging the gil by starting separate interpreters and piping pickles back and forth. similar to pyro, and it looks pretty seamless between smp vs. cluster. unfortunately, pp does not provide for any shared mem, so big data (even read only) must be copied (and pickled!) on smp.
multiprocessing is now built in to 2.6 and backported as far as 2.4 or 2.3. doesn't handle remote processes, though the pp/pyro-type pickle server (manager) interfaces with inet ports. i think it basically forks the process to make the worker processes, so you get less overhead (os service vs. cranking up a new python). and there's no need to feed it modules or any other globals; these get copied on the fork. it has some capability to share memory, though i think these are only kinda raw ctype buffers. (i think all of this is similar to the approach posh used, though more generally for user-defined types -- high quality hackery but unmaintained since 2003.)
apparently some people have coaxed numpy into using these ctype arrays to make np arrays sit in shared memory land, with views available to the children. (maybe using this sort of thing.) the approach got an attaboy from the big man himself, travis oliphant, but (in the same dir) sturla has a sharedmem module written later (cleaned up and posted here) that looks like it makes lower level sys calls to create shared memory space manually. does that mean the multiprocessing shm is unsatisfactory? the paper does warn that it's a moving target, and the scipy cookbook indicates the same thing: 'this page was obsolete as multiprocessing's internals have changed.'
epd has a webinar coming up promising to demo multiprocessing with large arrays, so maybe i should see what they do. anyway, if i do use this for parallel stuff, this blog post might be useful.
here's another page that looks very useful for multiprocessing.
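the ctype-buffer trick looks roughly like this sketch (function names are mine, and it assumes a fork-based platform so children inherit the buffer): allocate a raw shared array, view it through numpy with no copy, and let workers mutate it in place instead of pickling data back and forth.

```python
import multiprocessing as mp
import numpy as np

def shared_ndarray(n):
    """numpy view onto a raw shared ctypes buffer -- zero copies."""
    buf = mp.Array('d', n, lock=False)   # raw shared doubles, no lock wrapper
    return np.frombuffer(buf, dtype=np.float64)

def double_in_place(arr):
    arr *= 2.0   # on fork, a child mutates the very same memory

if __name__ == '__main__':
    a = shared_ndarray(4)
    a[:] = [1.0, 2.0, 3.0, 4.0]
    p = mp.Process(target=double_in_place, args=(a,))
    p.start(); p.join()
    # on fork platforms the parent now sees the doubled values in `a`
```

note there's no synchronization here (lock=False); for anything beyond read-mostly or partitioned writes you'd want an mp.Lock around the mutation.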
fabric looks interesting, too, though more geared toward sysadmin stuff. maybe similar to posh in some ways.
Thursday, December 23, 2010
dashboard and screen scraping
one thing that's been on my low-priority radar is a way to scrape through the complex flaming hoops that banks, credit cards, and investment brokerages put up so i can have an auto dashboard, showing me account balances and net worth at a glance.
mechanize looks like a nice package for performing many browser functions, including form interaction; probably the best of its kind i've found (and nice faq). however, it does mean writing a browsing session from scratch (read: lots of online debugging) and i'm not sure how well it can handle javascript, frames/windows, and all the other eye candy screen junk these sites like to throw at you.
someone out there recommended pyxpcom (combined with pydom in pythonext) as a way to do anything mozilla can. i think that must be true, since it seems to be just the pieces that mozilla-esque browsers are made of. as powerful and difficult to use as a build-your-own-ferrari kit.
i think the most promising option seems to be selenium, which is apparently merging with webdriver for version 2.0. basically drives a real browser, but can record and play back scripts in a variety of languages (including python). the webdriver type of interface seems to be the future of selenium, and it has the advantages of better navigation and less to install. written in java, but i think it can do python (though the docs are behind if so). so i'm not sure if i should just wait for an official release of 2.0, but it does look like selenium is what i'm after. here's the doc on using ide.
EDIT: did some more looking around with selenium, and wow! i love the ide/rc combo. i think i need to look at this blog post to get the most out of locators (css vs. xpath). some of the extra plugins for selenium-ide are worth getting, and the selenium.py module can apparently just be copied into the python path to use selenium-rc. 1.0.11 has firefox 4 support in the ide, but it's very recent (2011-04-12).
they have put out a number of rcs for v2; apparently the v2 release is coming summer 2011. no remote control javascript server is necessary for version 2 since it's integrated with webdriver. i need to know if the ide and python export will still work. right now i think python will work, but no ide yet (though 2.0 is probably backwards compatible so might run the code generated by the version 1 ide).
more selenium links: command locators, xpath/css/dom rosetta, css locators are faster than xpath, good info, stay up to date, good example.
managed to get selenium python bindings installed on a windows machine (not surprisingly, a bit more involved than on linux) with my epd python. had to manually download the tarball, run python setup.py install, and manually create the test dir structure that it would then complain about. maybe there's an option to make it skip tests, but the kludge was faster than looking that up. now i have selenium 2 with the webdriver interface, much better than rc! and btw, my experiments confirm what others have said about locators: css is much faster than xpath, even on firefox. i've also found that, while the selenium ide is really good for getting started with the locators, it's often possible to find shorter, more informative, and likely more stable tags and ids by poking around in the html just a little rather than using the first thing that pops up in the ide table. so i'm not going to try to keep a drop-in interface to call into the ide-generated scripts; cut-n-paste of one-liners will be good enough for both dev and maintenance. still, there is tremendous value in starting with something that works, and that alone makes the ide worth the install.
some other things i've learned: the 'andWait' stuff is only relevant from the java interface. in python, there's no way to keep running asynchronously while stuff is still loading. click, get, etc. only return to the python script once the page is fully loaded, so that can be a latency bottleneck. i did poke around and find a possible place to change that, but i'll see if i really need to.
Friday, December 10, 2010
getting spyder and python to work on windows 7
had to struggle to get my python install working with numpy and spyder, probably because i copied them over from another install.
with spyder, i had to move the 2.0.0beta5 egg and give the --prefix option to setup.py in order to install and use 2.0.3.
with numpy, i was getting 'ImportError: DLL load failed: The specified module could not be found.' when i tried to import numpy, _unless_ i was in the Python26_64\Scripts dir. i think it's because of the mkl lapack dlls in there. but it all worked once i added that to my PATH (not PYTHONPATH) with the help of cygwin -- just using export in bash allows me to peek at what it would be in windowsese with os.environ['PATH'] so i could put it into the env editor in spyder. voila!
Monday, November 22, 2010
thinking in c++
both volumes of the 2nd edition are freely available online, as is a draft version of 'thinking in python'. they're all a bit old, though they look like worthwhile reads.