Only computations with float32 data-type can be accelerated. Better support for float64 is expected in upcoming hardware but float64 computations are still relatively slow (Jan 2010).
Matrix multiplication, convolution, and large element-wise operations can be accelerated a lot (5-50x) when arguments are large enough to keep 30 processors busy.
Indexing, dimension-shuffling and constant-time reshaping will be as fast on the GPU as on the CPU.
Summation over rows/columns of tensors can be a little slower on the GPU than on the CPU.
Copying of large quantities of data to and from a device is relatively slow, and often cancels most of the advantage of one or two accelerated functions on that data. Getting GPU performance largely hinges on making data transfer to the device pay off.
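To make the transfer point concrete, here is a rough back-of-envelope sketch in Python. The function name `effective_speedup` and all of the numbers are made up for illustration (they are assumptions, not measurements); the model simply adds a fixed copy cost to the accelerated compute time.

```python
def effective_speedup(cpu_time, gpu_speedup, transfer_time):
    """Overall speedup once host<->device copies are counted.

    cpu_time      -- seconds the computation takes on the CPU
    gpu_speedup   -- how many times faster the kernel itself runs on the GPU
    transfer_time -- seconds spent copying inputs to, and results from, the device
    """
    gpu_time = cpu_time / gpu_speedup + transfer_time
    return cpu_time / gpu_time


# Illustrative numbers only (assumed, not measured).
# One accelerated function on the data: the copy dominates.
one_op = effective_speedup(cpu_time=1.0, gpu_speedup=20.0, transfer_time=0.5)

# Much more work done on the same data while it stays on the device.
many_ops = effective_speedup(cpu_time=50.0, gpu_speedup=20.0, transfer_time=0.5)

print(round(one_op, 2), round(many_ops, 2))
```

With these assumed numbers, a nominal 20x kernel speedup shrinks to under 2x when the data is copied over for a single operation, and gets most of the way back to 20x once enough work is done per copy.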
the same website has a link to cudamat, which might be a more cooperative if lower-level way to go. target seems to be basic matrix and element-wise ops. actively developed.
pystream was developed as another cuda wrapper until about mid 2008, then abandoned when the company went off to develop gpulib, a cuda api for idl and (?) matlab.
pycuda handles the background stuff, but you still have to feed it c code (though there are tools for run-time code generation). looks like it's actively developed, though, with an impressive list of users, and like the others it does play nicely with numpy arrays. i think this is the place to start.
pygpu uses pycg and pyglew to generate cg code directly from python. so you write python and it will run on nvidia or ati hardware, under both linux and windows. unfortunately, neither the homepage nor the google code page shows any signs of activity in the last few years. pycg (developed by the same guy) seems to have trickled off in late 2007, though ubuntu packages were uploaded to launchpad just a year ago. too bad, this looked like it might have been a good one.
the gpu stuff on scikits oddly seems intended for actual graphics stuff.
pyopencl ? here's a faq page contrasting cuda and opencl.
pycublas maybe just does matrix mult.