Converting TransXChange documents to GTFS

After creating tools to convert UK rail data and SSIM flight schedules to GTFS, the next logical step was bus data.

Most countries use GTFS as their standard for bus data and many feeds can be found on transitfeeds.com. However, for historical reasons the UK has its own standard for bus data: TransXChange.

To the government’s credit, a lot of documentation, schemas and even data sets are freely available. Unfortunately, that doesn’t necessarily make the format easy to work with.

There are a number of projects that will convert TransXChange documents to GTFS, but they don’t seem to be actively maintained and each comes with its own quirks.

transxchange2gtfs

Enter transxchange2gtfs, yet another tool to convert TransXChange documents to GTFS. Rather than being just another tool, this one has some advantages over the others:

Smaller output

Date handling in TransXChange is a lot more flexible than in GTFS, and different tools deal with this with varying degrees of success. A common practice in TransXChange documents is to specify a service date range and then narrow it down with periods of non-operation for specific vehicles. Rather than adding an excluded date for every day of non-operation, it’s usually possible (and more efficient) to modify the start and end dates of the trip.

Another optimization that reduces the overall file size is to re-use identical calendars between trips.
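As a rough illustration of both ideas, here is a minimal TypeScript sketch. It is not the tool’s actual code: the Calendar and DateRange types and the trimCalendar and serviceIdFor functions are hypothetical names made up for this example, and days of the week are ignored to keep it short.

```typescript
// Hypothetical types for illustration; dates are ISO strings ("YYYY-MM-DD"), inclusive.
interface DateRange { start: string; end: string; }
interface Calendar { start: string; end: string; excludes: DateRange[]; }

// Add n days to an ISO date, using UTC to avoid daylight-saving surprises.
function addDays(iso: string, n: number): string {
  const d = new Date(iso + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() + n);
  return d.toISOString().slice(0, 10);
}

// Move the start or end date when a non-operation range touches it, so that
// only ranges strictly inside the service period remain as exclusions.
function trimCalendar(cal: Calendar): Calendar {
  let start = cal.start;
  let end = cal.end;
  let remaining = cal.excludes;
  let changed = true;
  while (changed) {
    changed = false;
    const kept: DateRange[] = [];
    for (const r of remaining) {
      if (r.end < start || r.start > end) continue; // outside the period: drop it
      if (r.start <= start) { start = addDays(r.end, 1); changed = true; }
      else if (r.end >= end) { end = addDays(r.start, -1); changed = true; }
      else kept.push(r);
    }
    remaining = kept;
  }
  return { start: start, end: end, excludes: remaining };
}

// Re-use one service id for every trip whose trimmed calendar is identical.
const serviceIds: { [key: string]: string } = {};
let serviceCount = 0;

function serviceIdFor(cal: Calendar): string {
  const t = trimCalendar(cal);
  const ex = t.excludes.map(r => r.start + ":" + r.end).sort().join(",");
  const key = t.start + "_" + t.end + "_" + ex;
  let id = serviceIds[key];
  if (id === undefined) {
    serviceCount += 1;
    id = String(serviceCount);
    serviceIds[key] = id;
  }
  return id;
}
```

With this approach a service running all year but not operating in the first week of January simply starts a week later, and every trip that ends up with the same dates shares one calendar entry.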

More accurate data

Rather than listing the specific dates to be excluded from or added to the base calendar, TransXChange allows you to specify bank holidays by name. This means that the date of each bank holiday needs to be stored or calculated in the tool. A number of tools don’t quite get this right. transxchange2gtfs contains valid bank holidays for the next 7 years.
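To show what “calculated” can look like, here is a small TypeScript sketch that resolves a few bank holiday names to dates. It is an assumption-laden example rather than how transxchange2gtfs does it (the tool ships with a stored list): the bankHolidayDate function is hypothetical, only four holiday names are handled, and Easter is derived with the well-known anonymous Gregorian algorithm.

```typescript
// Easter Sunday for a given year (anonymous Gregorian algorithm), as a UTC date.
function easterSunday(year: number): Date {
  const a = year % 19;
  const b = Math.floor(year / 100);
  const c = year % 100;
  const d = Math.floor(b / 4);
  const e = b % 4;
  const f = Math.floor((b + 8) / 25);
  const g = Math.floor((b - f + 1) / 3);
  const h = (19 * a + b - d - g + 15) % 30;
  const i = Math.floor(c / 4);
  const k = c % 4;
  const l = (32 + 2 * e + 2 * i - h - k) % 7;
  const m = Math.floor((a + 11 * h + 22 * l) / 451);
  const month = Math.floor((h + l - 7 * m + 114) / 31); // 3 = March, 4 = April
  const day = ((h + l - 7 * m + 114) % 31) + 1;
  return new Date(Date.UTC(year, month - 1, day));
}

// Shift a date by n days and format it as "YYYY-MM-DD".
function offsetDays(date: Date, n: number): string {
  const d = new Date(date.getTime());
  d.setUTCDate(d.getUTCDate() + n);
  return d.toISOString().slice(0, 10);
}

// Hypothetical lookup: resolve a holiday name to a date in the given year.
// Returns undefined for names this sketch does not handle.
function bankHolidayDate(name: string, year: number): string | undefined {
  switch (name) {
    case "GoodFriday": return offsetDays(easterSunday(year), -2);
    case "EasterMonday": return offsetDays(easterSunday(year), 1);
    case "ChristmasDay": return year + "-12-25";
    case "BoxingDay": return year + "-12-26";
    default: return undefined;
  }
}
```

For example, bankHolidayDate("GoodFriday", 2019) gives 2019-04-19. Holidays that move when they fall on a weekend, or that are set by proclamation, are the ones that make a stored list attractive.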

The original tool does not calculate stop times correctly. I wasn’t able to track down the exact source of the bug, but it misses stops when the journey pattern is a reference to another vehicle.

The UK government also provides a NaPTAN data set that contains the stop name, longitude and latitude for every bus stop in the UK. This has been included in transxchange2gtfs so that the most accurate data is always provided.

Low memory usage

Most large files use less than 1GB of memory. Processing the entire UK data set only requires 2GB and takes roughly 25 minutes on a single-core machine. A proof-of-concept for a multi-process version was created but it didn’t improve overall performance very much. Be warned, the GTFS version of the full UK data set is about 3GB once it’s uncompressed.

Interchange and transfers

The GTFS standard contains a transfers.txt file for connections between routes. This file is commonly used for interchange times and transfer times between nearby stations.

A default interchange time of 120 seconds is added for each stop.

Any stops within roughly 0.5 miles of each other will have a bi-directional transfer added between them.
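The two rules above could be sketched in TypeScript along these lines. This is an illustration under stated assumptions, not the tool’s implementation: the Stop and Transfer types and the buildTransfers function are hypothetical, distance is measured with the haversine formula, and the walking speed used to estimate the time between nearby stops is my own assumption (the tool may set that value differently).

```typescript
// Hypothetical types; Transfer mirrors the columns of GTFS transfers.txt.
interface Stop { id: string; lat: number; lon: number; }
interface Transfer {
  from_stop_id: string;
  to_stop_id: string;
  transfer_type: number;     // 2 = transfer needs a minimum amount of time
  min_transfer_time: number; // seconds
}

const HALF_MILE_METRES = 804.672;
const INTERCHANGE_SECONDS = 120;
const WALK_METRES_PER_SECOND = 1.4; // assumption, not taken from the tool

// Great-circle distance in metres between two stops (haversine formula).
function haversineMetres(a: Stop, b: Stop): number {
  const R = 6371000; // mean Earth radius in metres
  const rad = (deg: number) => (deg * Math.PI) / 180;
  const sinLat = Math.sin(rad(b.lat - a.lat) / 2);
  const sinLon = Math.sin(rad(b.lon - a.lon) / 2);
  const h = sinLat * sinLat +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * sinLon * sinLon;
  return 2 * R * Math.asin(Math.sqrt(h));
}

function buildTransfers(stops: Stop[]): Transfer[] {
  const out: Transfer[] = [];
  // Default interchange time at every stop.
  for (const s of stops) {
    out.push({ from_stop_id: s.id, to_stop_id: s.id, transfer_type: 2, min_transfer_time: INTERCHANGE_SECONDS });
  }
  // Bi-directional transfers between every pair of stops within half a mile.
  stops.forEach((a, i) => {
    for (const b of stops.slice(i + 1)) {
      const metres = haversineMetres(a, b);
      if (metres <= HALF_MILE_METRES) {
        const seconds = Math.ceil(metres / WALK_METRES_PER_SECOND);
        out.push({ from_stop_id: a.id, to_stop_id: b.id, transfer_type: 2, min_transfer_time: seconds });
        out.push({ from_stop_id: b.id, to_stop_id: a.id, transfer_type: 2, min_transfer_time: seconds });
      }
    }
  });
  return out;
}
```

Comparing every pair of stops is fine for a sketch but grows quadratically, so a data set the size of NaPTAN would want some form of spatial index or grid bucketing first.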

Installation and usage

transxchange2gtfs can be installed with npm, assuming you have node 10 or above installed on your system:

npm install -g transxchange2gtfs

And run on the CLI:

# single TransXChange document
transxchange2gtfs transxchange.xml gtfs.zip

# Zip file with multiple documents
transxchange2gtfs transxchange.zip gtfs.zip

# Multiple zip files and documents
transxchange2gtfs transxchange.zip other.xml onemore.zip gtfs.zip

When processing multiple documents it is assumed that any stops with the same ATCO code are the same stop.
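In practice that assumption amounts to de-duplicating stops on their ATCO code as documents are read. A minimal TypeScript sketch, with a hypothetical Stop type and mergeStops function, and assuming the first occurrence of a code wins (the tool may resolve duplicates differently):

```typescript
// Hypothetical stop record; only the ATCO code matters for merging.
interface Stop { atcoCode: string; name: string; }

// Merge the stops from several documents, keeping one entry per ATCO code.
function mergeStops(documents: Stop[][]): Stop[] {
  const seen: { [code: string]: boolean } = Object.create(null);
  const merged: Stop[] = [];
  for (const stops of documents) {
    for (const stop of stops) {
      if (!seen[stop.atcoCode]) {
        seen[stop.atcoCode] = true; // first occurrence wins (assumption)
        merged.push(stop);
      }
    }
  }
  return merged;
}
```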

Feedback

The target audience for this tool is likely to be quite small. However, if you are in that number please try it out and file any bugs you find.
