Paul Koppen
http://paulkoppen.com/

WordPress Backup

Though my current hosting provider excels at keeping backups, that is certainly not true for all of them. This very simple Python script backs up all the important stuff on your WordPress blog: your uploaded photos and videos, your themes, and the database (which holds all your posts, comments, and more). Better still, it needs no configuration, because it reads everything automatically from your WordPress installation.

So this is how you run it:

wp-backup.py YOUR_WORDPRESS_FOLDER

and the backup files will be created straight away.
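
For the curious, here is a minimal sketch of the idea, assuming a standard WordPress layout. It is not the actual script (which is more thorough) and all names in it are illustrative only: it parses the database credentials from wp-config.php (which is why no configuration is needed), shells out to mysqldump, and archives wp-content.

#!/usr/bin/env python
# Sketch only, not the real wp-backup.py: read credentials from
# wp-config.php, dump the database, and archive wp-content.
import os, re, subprocess, sys, tarfile, time

def wp_config(wp_dir):
    # Parse define('KEY', 'value') pairs from wp-config.php.
    text = open(os.path.join(wp_dir, 'wp-config.php')).read()
    return dict(re.findall(r"define\(\s*'(\w+)'\s*,\s*'([^']*)'\s*\)", text))

def backup(wp_dir):
    cfg = wp_config(wp_dir)
    stamp = time.strftime('%Y%m%d')
    # The database holds the posts, comments, settings, ...
    with open('wordpress-db-%s.sql' % stamp, 'wb') as out:
        subprocess.check_call(
            ['mysqldump', '-h', cfg.get('DB_HOST', 'localhost'),
             '-u', cfg['DB_USER'], '-p' + cfg['DB_PASSWORD'],
             cfg['DB_NAME']], stdout=out)
    # wp-content holds the uploaded photos and videos, themes, plugins.
    with tarfile.open('wordpress-files-%s.tar.gz' % stamp, 'w:gz') as tar:
        tar.add(os.path.join(wp_dir, 'wp-content'), arcname='wp-content')

if __name__ == '__main__':
    backup(sys.argv[1])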

And this is where you get it: wp-backup.py

Thumbies: Picasa Thumbnails

Thumbies is a jQuery plugin that renders your Picasa albums in thumbnail view. It is particularly aimed at showing your latest album as a widget on your website (e.g. in your blog's sidebar), showcasing a few photos with easy access to the whole album on Picasa Web. I think this is best demonstrated with an example.

Such a strip of thumbnails, with the album title and links to the photos in their album, is rendered with only these few lines of code (they assume an empty element with id thumbies-example on the page):

var user    = '112717293352090528392',  // Picasa user id
    album   = '5785180675340381969',    // id of the album to show
    options = {'max-results':5, thumbsize:104, 'link-to':'link'};  // 5 thumbs, 104px
$('#thumbies-example').thumbies(user, album, options);

For more details, more options, more examples, and of course to download the script visit the Thumbies page.

Twit: Twitter Timeline Crawler

Twit crawls Twitter user timelines, extracts and parses all tweets, and writes the extracted texts and user mentions to files. Note that the texts you crawl come in all the languages of the world, so the text files are stored as UTF-8. Usually, for language learning, you want texts in only one language, or at least control over which languages you get. The user mentions (not "followers") are logged separately and can, for example, be used to visualise or assess social relationships.
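
To give an idea of the parsing step, here is a minimal sketch (the actual Twit code may differ; the function and file names are made up): extract @mentions from a tweet's text and append text and mentions to separate UTF-8 files.

import io, re

MENTION = re.compile(r'@(\w{1,15})')  # Twitter handles: at most 15 word characters

def process(text, texts_path, mentions_path):
    # Append the tweet text; UTF-8 because tweets come in any language.
    with io.open(texts_path, 'a', encoding='utf-8') as out:
        out.write(text + u'\n')
    # Log user mentions separately, e.g. for social-relationship analysis.
    with io.open(mentions_path, 'a', encoding='utf-8') as out:
        for handle in MENTION.findall(text):
            out.write(handle + u'\n')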

There is one main script that controls a central queue server and any number of crawler clients. Both the server and the crawlers run as daemons.

  • Manage the queue: manage.py queue [start|stop|save|bootstrap] where the last two commands can only be run when a queue has already been started.
  • Manage a crawler: manage.py crawl [start|stop] (a typical session is sketched below).
  • Configure both in config.py and twitter_keys.py (see comments in those files).
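
A typical session might then look like this. It is an illustration based on the commands above, not a verbatim transcript, and I assume bootstrap seeds the queue with initial users; remember that save and bootstrap need a running queue:

manage.py queue start        (on the queue server)
manage.py queue bootstrap    (seed the queue)
manage.py crawl start        (on each crawler machine)
manage.py crawl stop
manage.py queue save
manage.py queue stop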

Download here (released 25 March 2013).

I have managed to download gigabytes of text with 15 computers running simultaneously, so this scales quite well. Future releases will distribute the queue server and keep the queue in a database to free up some memory.