18
Jan

Python multiprocessing is fucking sweet

   Posted by: Swizec   in Uncategorized

You know how it is said that every programmer needs to learn parallelisation and other funky stuff for multi-core, multi-processor beasts of machines? And how such machines aren’t even really beasts these days, they’re our run-of-the-mill desktop and portable computers?

This is the world we live in.

It’s getting worse by the hour!

Very soon the first thing a young programmer will hear out of a lecturer’s mouth will be Thread-Safe.

But there’s something we can do about that even today. First of all, we can kiss threading goodbye. Sure, it’s sweet and yes, it sort of works. But ew! It’s like trying to make a marine corps do their job with everyone’s finger in someone else’s arse. That’s the problem with threads, you see: they all share the same memory, so they keep picking each other’s arses and noses and then nobody can do any work.

Multi processing to the rescue!

In CPython the global interpreter lock keeps threads from running Python code on more than one core at a time, so running an algorithm in several processes is the only thing that makes it run on several processors in parallel. It also gives each process its own memory space, so everyone is nicely contained in their own little world. But fuck, now you can’t exactly pick another process’s arse when you need to … like when eating through a common queue of tasks.
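To make the "common queue of tasks" bit concrete, here is a minimal sketch of a few worker processes eating through one shared queue. It is not the scrobbler's actual code: process_task, worker, run and the SENTINEL marker are made-up names for illustration, and the "work" is just squaring numbers.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical "no more work" marker, one is sent per worker


def process_task(task):
    """Stand-in for real work; here it just squares a number."""
    return task * task


def worker(tasks, results):
    """Pull tasks off the shared queue until the sentinel shows up."""
    while True:
        task = tasks.get()
        if task is SENTINEL:
            break
        results.put((task, process_task(task)))


def run(numbers, n_workers=4):
    """Spread `numbers` over `n_workers` processes, return {number: result}."""
    numbers = list(numbers)
    tasks = mp.Queue()
    results = mp.Queue()
    procs = [mp.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for n in numbers:
        tasks.put(n)
    for _ in procs:
        tasks.put(SENTINEL)
    # Collect results before joining so no worker blocks on a full pipe.
    out = dict(results.get() for _ in numbers)
    for p in procs:
        p.join()
    return out


if __name__ == "__main__":
    print(run(range(10)))
```

Each worker lives in its own memory space, and the two queues are the only things they share. The multiprocessing module handles the passing around for you.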

And then Python’s multiprocessing module (library, thingy, whatever you want to call it) comes into play.

It. Just. Makes. Everything. So. Fucking. Easy!

This weekend I was working on a scrobbler for Delicious. Basically this thing is supposed to go through a user’s Delicious history, scrape every website it finds, send the results to three different semantic APIs and build connections between the tags those APIs return and the ones the user used to tag the particular link.
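The post doesn't say what a "connection" between tags looks like, so the following is only a guess at one simple way to do it: count how often a user's tag and an API's tag turn up on the same link. The function name connect_tags and the sample tags are invented for the sketch.

```python
from collections import Counter
from itertools import product


def connect_tags(user_tags, api_tags, connections=None):
    """Count every (user tag, API tag) pair seen together on one link."""
    if connections is None:
        connections = Counter()
    # De-duplicate first so a repeated tag on one link counts only once.
    connections.update(product(sorted(set(user_tags)), sorted(set(api_tags))))
    return connections


if __name__ == "__main__":
    links = Counter()
    connect_tags(["python", "parallel"], ["multiprocessing", "python"], links)
    connect_tags(["python"], ["multiprocessing"], links)
    print(links.most_common(3))
```

Pairs that keep showing up together float to the top of the counter, which is one cheap way to read "these two tags are connected".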

Now obviously there’s a lot of downtime involved here for every iteration. You’re easily looking at 10 solid seconds of waiting per website. This means that scrobbling 838 pages (my stress test) would take about two and a half hours. With multiprocessing it took something like 20 minutes.
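The arithmetic checks out: 838 pages at 10 seconds each is 8,380 seconds, so a bit over two hours and twenty minutes if you do them one by one. Below is a rough sketch of how a worker pool eats into that. The scrobble function is a hypothetical stand-in that just sleeps instead of hitting the network, and the wait time and pool size are made-up numbers, not the ones from the real scrobbler.

```python
import time
from multiprocessing import Pool

WAIT_SECONDS = 0.2  # stand-in for the ~10 s of network waiting per page


def scrobble(url):
    """Hypothetical per-page job: pretend to wait on the network."""
    time.sleep(WAIT_SECONDS)
    return url, len(url)


def sequential_estimate(pages, seconds_per_page):
    """Back-of-envelope total in hours when pages are handled one by one."""
    return pages * seconds_per_page / 3600


if __name__ == "__main__":
    urls = ["http://example.com/page/%d" % i for i in range(16)]

    start = time.time()
    with Pool(processes=8) as pool:
        results = pool.map(scrobble, urls)  # results come back in input order
    elapsed = time.time() - start

    print("one by one, 838 pages: %.2f hours" % sequential_estimate(838, 10))
    print("16 fake pages, 8 workers: %.2f s" % elapsed)
```

Since the job is mostly waiting, the pool can be bigger than the number of cores. While one process sits on a slow website, the others keep going.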

The beauty of this approach is that I’d never ever ever done anything in parallel before. And yet I could do funky things like worker pools, queues, semaphores and a bunch of other stuff I’d only heard of in fairy tales until now … in an hour.
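For the semaphore part, here is a small sketch of the usual use case: lots of workers, but only a couple allowed to talk to some API at the same time. Everything in it is assumed for illustration (the call_api name, the limit of 2, the sleep standing in for a request). The shared counters are only there so you can see the limit being respected.

```python
import multiprocessing as mp
import time


def call_api(semaphore, active, peak):
    """Pretend to call a rate-limited API while holding the semaphore."""
    with semaphore:  # blocks while too many calls are already in flight
        with active.get_lock():
            active.value += 1
            peak.value = max(peak.value, active.value)
        time.sleep(0.1)  # stand-in for the real request
        with active.get_lock():
            active.value -= 1


if __name__ == "__main__":
    limit = 2  # hypothetical cap on simultaneous API calls
    semaphore = mp.Semaphore(limit)
    active = mp.Value("i", 0)  # calls in flight right now
    peak = mp.Value("i", 0)    # most calls ever in flight at once

    procs = [mp.Process(target=call_api, args=(semaphore, active, peak))
             for _ in range(6)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    print("peak simultaneous calls:", peak.value, "with limit", limit)
```

Six processes start, but the semaphore only lets two through at a time, so the peak never goes above the limit.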

So there you go: an investment of a few hours for learning from scratch and some tweaking, in exchange for roughly a seven-fold increase in speed.


This entry was posted on Monday, January 18th, 2010 at 17:01 and is filed under Uncategorized.