
Bank statement scraper for Bank of Ireland

Like many people, I was also losing track of my finances. Having bank accounts in use in both NL and IE probably didn't help. :-) As any proper FOSS geek, I learned to like the monster called GnuCash. (Psst! Guys! It's pretty amazing that a product more than ten years old still doesn't let you do operations (like delete) on multiple entries at once, don't you think?)

And there's this thing about Irish Banks. They have bigger issues to worry about than how well their Internet banking service works. What keeps you away from looking at my bank account? You (hopefully) not knowing my six-digit user ID, date of birth (top secret information! Have I mentioned that my birthday is next Saturday? ;-P) and another six-digit number, this time my PIN number. No one-time passwords, no challenge-response system, nothing else.

My only hope is that this lets you transfer money only to accounts to which I've transferred money before. IOW all you can do is give my landlady a little present. Pfew!

Also, going back to the original topic, there's no way to export info from their web interface. So I wrote one myself. One advantage of a pretty simple website is that I could easily write a scraper for it. Run it with the right arguments, and it'll spit out a CSV bank statement, ready to be fed to your favourite accounting software.
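To give an idea of what the CSV step looks like, here is a minimal sketch using Python's csv module. It is not the actual scraper code: the column names (date, description, amount) and the function name statement_to_csv are assumptions made up for illustration, and your accounting software may want a different layout.

```python
import csv
import io


def statement_to_csv(transactions):
    """Turn a list of transaction dicts into CSV text.

    The keys "date", "description" and "amount" are hypothetical;
    adapt them to whatever the scraper actually extracts.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "description", "amount"])
    for t in transactions:
        # Format amounts with two decimals so importers parse them consistently.
        writer.writerow([t["date"], t["description"], "%.2f" % t["amount"]])
    return out.getvalue()


if __name__ == "__main__":
    # Illustrative rows only, not real statement data.
    rows = [{"date": "2011-07-01", "description": "Rent, July", "amount": -500}]
    print(statement_to_csv(rows), end="")
```

The csv module takes care of quoting fields that contain commas, which is the main reason not to just join the values by hand.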

What else have I been doing? Been working on Giggity. Android development's fun. I spent the weekend scraping the Dance Valley timetable page, Google, Last.FM, Wikipedia and more to automatically generate a Giggity schedule file for it. Love it! :-)

On Pandaboard SD card performance

I've had the Pandaboard running as my home server for a while now. Until last weekend, I was using a Microdrive as its root filesystem. Sadly, the drive seems to be broken. :-( That means I finally had a chance to try bootstrapping a server very quickly using Puppet. This worked fairly well, which means the time investment is paying off already.

Since all the storage I had at home was the 32GB SD card I bought for this thing anyway, I decided to give it another chance. At some point I was reminded already that alignment really matters with these things. Some Bonnie++ runs do seem to confirm this. I removed the second partition on the SD, and recreated it on a 4MB boundary. (The trick to do this is to use the "u" command in fdisk to switch units to sectors instead of cylinders, and make sure the start sector is a multiple of 8192. With 512-byte sectors, 8192 sectors is exactly 4MB.)
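The alignment check is simple enough to script. Below is a small sketch of it, assuming 512-byte logical sectors (what fdisk normally reports, but do check for your card). The function names is_aligned and next_aligned_sector are just made up for this example.

```python
SECTOR_SIZE = 512            # bytes per sector; an assumption, check fdisk's output
ALIGNMENT = 4 * 1024 * 1024  # the 4MB boundary used above


def is_aligned(start_sector, sector_size=SECTOR_SIZE, alignment=ALIGNMENT):
    """Return True if a partition starting at start_sector sits on the boundary."""
    return (start_sector * sector_size) % alignment == 0


def next_aligned_sector(start_sector, sector_size=SECTOR_SIZE, alignment=ALIGNMENT):
    """Round start_sector up to the nearest sector that sits on the boundary."""
    sectors_per_block = alignment // sector_size  # 8192 with the defaults
    # Ceiling division, then back to a sector number.
    return -(-start_sector // sectors_per_block) * sectors_per_block


if __name__ == "__main__":
    for sector in (63, 2048, 8192):
        print(sector, is_aligned(sector), next_aligned_sector(sector))
```

Start sector 63, the old DOS default, is the classic misaligned case. 2048 is aligned to 1MB but not to 4MB.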

To be honest, I did run most of these benchmarks with the SD card reader/writer in my desktop machine. Only the last test was done on my Pandaboard, but as you can see the results are very similar.

[Bonnie++ version 1.96 results table. Columns: Size; Sequential Output (Per Char, Block, Rewrite) and Sequential Input (Per Char, Block), each in K/sec and % CPU; Random (seeks), in /sec and % CPU; Num Files; Sequential Create (Create, Read, Delete) and Random Create (Create, Read, Delete), each in /sec and % CPU.]

Click here for a table not f*cked up by my blog software.

Although the throughput numbers for ext3 are pretty similar for non-aligned and aligned access, look at the latency numbers. Unfortunately I haven't got a clue how Bonnie++ calculates these and can't find very good documentation on it. Throughput may be average and latency worst-case? Either way, as you can see a misaligned partition can cause some slowdowns.

What surprised me more is that a switch to ext4fs sped up things a lot more, up to the point that the performance is perfectly reasonable! I'm running with this SD as my root filesystem now and everything just works. (While before a simple apt-get install run could take several minutes.)

While I was at it, I also tried out logfs and nilfs2, which are officially optimised for flash media. However, AFAIK they're more meant for raw NAND storage, not for block devices with all the NAND logic abstracted away (like anything you buy in stores these days). Not worth it for these SDs.

Obviously this test is far from scientific. Only in the case of ext4-panda have I run the test five times to then pick a decent result (there were some outliers in all areas). All other tests were done on a freshly formatted filesystem, which I'm sure also doesn't make the result that reliable.

Just my 2 cents! But my Pandaboard's definitely happier now. Here's hoping that wear leveling works well..

If you're interested, here is a more thorough overview of SD card performance. The LWN article about flash storage it links to is interesting too. The Flash card I used here is a 32GB class 10 Transcend card.

Splitting PDFs with pyPdf

A simple task, yet I couldn't find a quick cmdline to do it with, apart from pdftk, 15MB of Java rubbish.

Instead, here are only 10 or so lines of Python. It was so fast I wasn't sure if it worked until I saw the results were there. Usage: split [prefix] [infiles...]. Multiple infiles possible. First argument is the filename prefix to use for all created files.

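import sys

import pyPdf

n = 0
for name in sys.argv[2:]:
    # PDFs are binary files, so open them in binary mode.
    reader = pyPdf.PdfFileReader(open(name, "rb"))
    for page in reader.pages:
        # A fresh writer per page, so every output file holds exactly one page.
        writer = pyPdf.PdfFileWriter()
        writer.addPage(page)
        with open("%s-%03d.pdf" % (sys.argv[1], n), "wb") as out:
            writer.write(out)
        n += 1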

Don't pay attention to Serendipity screwing up the code layout. We all know it's rubbish, I just can't be arsed to migrate to something better. :-/


As a bit of a cloud "sceptic" I still like to waste too much time maintaining my own network/IT infrastructure. :> I'm definitely trying to avoid the more tedious stuff though. I started using Puppet a while ago which definitely helps.

Last week I was looking for a way to automatically populate DNS reverse lookup zones. The only thing I could find was mkrdns, which has been unmaintained for almost ten years and doesn't seem to support IPv6. So I decided to write my own thing, dnsrev.

It's pretty simple, written in Python with help from some modules. It can read any number of zonefiles and update any number of reverse zonefiles. There's no need for any kind of 1:1 mapping between them, so it can deal with multiple netblocks in one zonefile, etc. I hope it'll be useful to someone. Comments, suggestions and patches are welcome.
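For anyone wondering what goes into a reverse zone: every address gets a PTR record under in-addr.arpa (IPv4) or ip6.arpa (IPv6). The sketch below shows how such names can be derived with the ipaddress module from the Python 3 standard library. This is not how dnsrev itself is implemented, just an illustration. ptr_name and ptr_record are hypothetical helper names, and the addresses and hostname come from the documentation ranges.

```python
import ipaddress


def ptr_name(address):
    """Return the fully qualified reverse-lookup name for an IPv4/IPv6 address."""
    # reverse_pointer gives e.g. "10.2.0.192.in-addr.arpa"; add the root dot.
    return ipaddress.ip_address(address).reverse_pointer + "."


def ptr_record(address, hostname):
    """Format one PTR line as it could appear in a reverse zonefile."""
    return "%s IN PTR %s." % (ptr_name(address), hostname.rstrip("."))


if __name__ == "__main__":
    print(ptr_record("192.0.2.10", "host.example.org"))
    print(ptr_record("2001:db8::1", "host.example.org"))
```

For IPv6 the name is built from all 32 nibbles of the address in reverse order, which is exactly the kind of thing you don't want to type by hand.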

Shiny happy hardware

For years I've been using Winterms as simple home "servers". It was a fun project to work on and some people were even nice enough to send me some examples of more powerful (relatively, we're talking about ~300MHz here at most) hardware. Two of them are still working as nameservers/printservers and one of them even hosted the Winterm hacking website for a while.

But they're getting old, slow, and pretty painful to upgrade. Time to move on I'm afraid. So before my last trip to the US, I ordered two shiny pieces of hardware: A Pandaboard and an Nvidia Tegra developer board. Due to circumstances, I didn't really expect both (or even either) of them to arrive - Nvidia seemed to send the board only to people who have projects they find interesting/important (stuff like the Motorola Xoom probably), and the Pandaboards never seem to be in stock.

Yet, here I am with both of them, wondering which one to actually use. :-)
Left: Pandaboard, right: Tegra2 250 Harmony board

I guess I'll just write down my findings here so far. I'll probably end up using both, one as a server and the other one to run stuff like xbmc on my TV.

Both boards seem quite similar, spec-wise. Two 1GHz ARM cores, 1G of RAM, USB, sound, networking (including WiFi and Bluetooth), HDMI output, and an SD card slot. The Pandaboard has an internal antenna, no clue about the range.

Although both boards' USB ports apparently aren't really meant for powering 2.5" USB HDDs, it seems to work quite well anyway. Which is good, because SD cards as root filesystems seem like a bad idea. Did you know that (according to bonnie++) a desktop hard disk from 2007 outperforms SD cards (at least in the Pandaboard and Tegra) not just on sequential reads, but also on seeks? So yeah, I may be using USB HDDs instead, which sadly means more power usage. :-( Especially in the Pandaboard, SD performance is too bad to be usable.

One big advantage of the Pandaboard seems to be the community. A pretty busy (and generally helpful) IRC channel, lots of info online on Wikis. The Pandaboard is "just another OMAP architecture" so lots of stuff that worked for BeagleBoard should work on the Panda with some customizations. Canonical/Ubuntu also support the thing officially.

Here comes the biggest contrast with the Tegra. Nvidia seems to be too busy with Android, the result is that there's little support for doing other stuff with the board. The only thing you get for now is L4T (Linux 4 Tegra), which is an Ubuntu Jaunty (yes, 9.04, that's two years ago by now..) image you can run on it. There are efforts on getting Lucid to run, don't know where those are ATM. But one complication there is some binary-only drivers/helpers (like nvrm_daemon, which I guess manages the memory shared between OS and video/etc), which means troubles getting X to work after an upgrade. Ouch.

The Panda also certainly wins in the bootloader department, as it just loads uboot stuff from a FAT partition on the SD card (tricky part here is that if you do anything wrong with the partitioning and formatting of this SD card, the boot process will just fail silently). For flashing the Tegra you need a proprietary fastboot flasher binary. Possibly, once booted, I can just write my kernels to NAND myself from inside the OS, but I haven't yet tried this.

So yes, with this all in mind, it's a delight to run a normal (and not outdated) Debian/Ubuntu install on a Pandaboard. Video is also supposed to work flawlessly almost out of the box on Ubuntu. However, I seem to be unlucky/doing it wrong since the framerate is not impressive, and playback seems buggy. (While the little video playback I've done on the Tegra was pretty good, super smooth, and with only 10% of CPU usage!)

For my original goal, running a simple home server, I feel that both boards are suitable - I'd just run Debian inside a chroot on the Tegra so the helper daemons (and maybe some video stuff) can run outside it. But before I get this video stuff to work, I have some work to do. And hopefully, if I wait for long enough, some other patient souls out there will also fix some of these problems...

Xen horror, just upgraded my box to Squeeze

Of course I could just not upgrade, but I'd have to do it sooner or later anyway..

It looks like Xen, now that it's owned by Citrix, also suffers from the XML manager syndrome. At least the extremely annoying clock bug was fixed, which means I get >4ms precision in timing again.

Just dumping this here in my blog since more people seem to have this problem and aren't getting very helpful answers so far. Or maybe I just didn't try the right Google queries..

hypnotoad:/tmp# xm suspend bijtje
Error: Domain is not managed by Xend lifecycle support.
Usage: xm suspend <DomainName> Suspend a Xend managed domain
means you're not using the shiny new (whatever the purpose is) lifecycle tool that keeps track of domain configs and other stuff in /var/lib/xend/domains/$UUID/. You can use "xm new" to set this up, except this is broken on Debian:
hypnotoad:/tmp# xm new
Unexpected error: <type 'exceptions.ImportError'>
ImportError: No module named xmlproc
This, my dear reader, means that Debian's shipping Xen utils that depend on a Python XML module that was actually removed from Debian since it's not maintained by upstream anymore.

So I was just trying to figure out how to make this all go, and then I realised:
hypnotoad:/tmp# xm save bijtje bijtje.sav
hypnotoad:/tmp# xm restore bijtje.sav
Oh look! I can still do it. I just have to tell Xen where to save the statefile.

So in short:
  • Xen is also suffering from XML-itus
  • Debian drops packages that other packages still depend on
  • Use "xm save", not "xm suspend"

Burn all spammers!

I have a habit of always having a tail -f /var/log/mail.log running on my mailserver somewhere. It's noisy, but has been useful in the past. Over the last weeks/months, I noticed open relay probes are getting incredibly popular (again), but also extremely aggressive. They're frequent, done by hundreds of botnet drones all the time.

Obviously my Postfix is configured properly, so this is mostly a waste of (fairly scarce, on a DSL box several km away from the exchange) bandwidth and annoying noise in the logs. But getting rid of it is harder than I hoped. :-(

This is what I have now: iptables -I FORWARD -p tcp --sport 25 -s -m string --algo kmp --string '554 5.7.1 <' -j REJECT --reject-with tcp-reset

This works, as in it kills the connection as soon as my mailserver sends a "554 5.7.1 Relaying denied" response. The REJECT goes to the mailserver, but together with the tcp-reset this also kills the TCP connection on both sides fairly quickly. However, the little fuckers are also using pipelining, so I still get a screen full of logspam for pretty much every attempt. Although this is mostly cosmetic, I'd love to get rid of that crap..

What I really wonder is, WTF are they even doing this? Are open relays really still that common? Don't they have their botnets already? I guess the open relays are nice multipliers and are also more willing to deal with stuff like graylisting...

[edit]Looks like "554 5.7.1" is not just about "relaying denied", so possibly not such a great idea. Don't try this at home!

BitlBee, alive and kicking

As quin in #bitlbee said a little while ago, I stole someone's mojo and found an amazing amount of productivity when it comes to writing code, and it feels great. I'm quite relieved that I can still find plenty of time and motivation to work on BitlBee even though during the week I already spend a lot of time at the keyboard. This after not working much on it for probably at least a year.

I managed to finally do the IRC core rewrite + abstraction that I intended to do for so long already. It'll allow adding non-IRC frontends to BitlBee if someone ever wants to, and also the IRC core has the flexibility it needs to add many more features that I wanted for years already, and were impossible to implement without adding even more horrible hacks.

There has also been a libpurple-based backend for a few months already, plus file transfer support (written by Uli Meis and Marijn Kruisselbrink actually, it just took me a long time to merge the >3000-line diff, fortunately Review Board did make it a lot less painful), all thrown into a bleeding edge branch called killerbee. It's code that needs a little bit more work before I really like it.

Also, BitlBee has had Twitter support for about two months already (thanks to hard work done by Geert Mulders), and according to the application registration page on Twitter it has almost 500 users already. It's quite likely that many of those used it for five minutes and went back to a client with more features, but it's still nice to see.

Last of all, to help with the current lack/fragmentation of online documentation there's now a BitlBee Wiki. It's supposed to have easy-to-find docs about common FAQs, but the easy-to-find part isn't really working out yet since it hardly shows up in any search results. Hopefully this hyperlink from a high-profile weblog will improve that a tiny bit. ;o) Possibly the content is not that good yet either, so if anyone has something to add to it, by all means, please do!

With a 1.2.8 release coming up, BitlBee is totally alive - and has been for almost eight years already. It's been a fun project to work on so far, and hopefully will be for a long time.