Solving the vehicle routing problem for research biologists
20 June 2021
My partner is a research scientist working in a lab group. Every year, members of the lab are expected to go on long driving trips to collect specimens for research. This labor is shared among the members of the lab (in practice, the graduate student members of the lab). So even though one lab member’s work might only focus on a single species, a research trip might collect many different species, even if the lab member who requires them is not present on the trip.
In practice, the planned destinations are known at the beginning of the academic quarter. The trips need to be planned out in such a way that effort and route duplication is minimized, while each individual only goes on as many trips as their workload will bear.
While not a professional programmer or computer scientist by any means, I’ve been tinkering with software long enough that coming up with ways to solve problems like this is not that hard for me, and so I’ve been asked on several occasions now to solve the routing problem for them. Having done that, I thought I’d document my process (using entirely open source software and data) and provide a few hacked together scripts to simplify the process.
The core piece of the puzzle is Vroom, a software project explicitly designed to solve the vehicle routing problem in an efficient way. It’s designed to integrate with OSRM, a routing engine built with OpenStreetMap data. If you’re curious about the system, I recommend this underappreciated talk about Vroom.
Dealing with constraints can be complicated. Fortunately, I never had to worry about blackout dates, because the routes once calculated could begin whenever the driver was ready to start. That said, Vroom has one significant limitation: the only metric it allows you to optimize is total drive time. If there is 3 weeks worth of work to do, and it minimizes total drive time to send one vehicle on a very long trip, Vroom will do that. You can’t tell Vroom to, for example, minimize the length of the longest trip, or to make all trips as close together in length as possible. I thought at first that this would make Vroom unusable for my tasks (it certainly doesn’t seem like it would work for hot pizza delivery, for example), but with a lot of handheld tinkering you can get it to work using availability constraints on the vehicles. I also didn’t find any software that would handle the problem better.
While OSRM does provide a very generous free API, it’s still too limited for the kind of routing I needed to do. Vroom works by requesting a routing matrix from OSRM that contains the travel times between every pair of points you submit. Sometimes I would be given a thousand or more destinations. This meant I had to run a routing server myself. Fortunately I have an always-on server that was more than up to the task.
You can find the routing server at osrm-backend. There’s a Docker image, but as I prefer to run software in different containers I simply built it myself. The map data I needed (the US West subregion) is available to download from Geofabrik. After pre-processing, the data for the western United States is “only” 14 GB.
Very clear instructions for setting up the software are available on the project’s wiki page on Github. You’ll probably want to use the MLD pipeline as the pre-processing step is much faster. The page recommends the CH pipeline specifically for large distance matrices (exactly what we have), but as we’re only going to submit a small number of requests to this server in its lifetime, we can afford for them to be slightly slower.
Vroom has a documented JSON format that describes the available vehicles, the destinations they need to reach, and other constraints. I wrote a script to create the necessary input for Vroom based on data in a spreadsheet (exported to a TSV file).
The format of this file is
Latitude <tab> Longitude <tab> Skills <tab> Time
Skills is probably the most interesting feature I made available. Sometimes one scientist is particular good at identifying one species, and so the plan is to make sure they’re in the vehicle for every destination that involves collecting that species. In the spreadsheet, the scientists can put a number (or several numbers separated by a comma) in the corresponding Skills cell. Each vehicle has a corresponding list of numbers which identify the scientists in each vehicle (a scientist can go on multiple trips and so be in multiple vehicles). Vroom will use the skills feature to only send a vehicle to a destination if there is a scientist in the vehicle who can identify the specimen there.
The Time field is occasionally useful too. It’s an estimation of how long the vehicle will spend at the destination not moving (in seconds). In most cases, the scientists I was routing for would simply park the car, clip a branch of a tree or shrub, and get back in the car, so this was effectively zero. In some cases, however, the GPS point could be a mile or more from the road, meaning that a longer hike was in order.
The scripts that I’m making available are very much hacked together, not up to
my usual standard for code I make public. For example, there are important
constants in the
makejson.py file that have to be set or adjusted when the
script is run. I didn’t come up with a convenient way to use
We have the already-discussed list of VEHICLES:
VEHICLES = [[1,2],[3,4]]
This identifies two vehicles / trips, where two scientists (numbered 1 and 2) are in the first, while two other scientists (numbered 3 and 4) are in the second.
We have lists of starting coordinates and ending coordinates for each vehicle.
These are in
[longitude, latitude] format because that’s what Vroom itself
STARTCOORDS = [[-118.44, 34.068], [-118.44, 34.068]] ENDCOORDS = [[-118.44, 34.068], [-118.44, 34.068]]
Note how the starting and ending coordinates for the vehicles are all the same in this case. That’s a common pattern (starting and ending at the university), but you’re not locked into doing that by any means.
Last we have TIMEAVAIL:
TIMEAVAIL = [400000, 400000] # seconds
This is the messiest part. It’s the total drive time allowed for each car. In theory, this would be perfectly known to the participants in advance (they could allow more time for stops, breaks, sleeping, etc., as needed). In practice, though, it’s a balancing act. The research trips have to be done, so even if finishing all of them would take more time than the scientists theoretically have, put together, that just means that they have to find more time from somewhere. So in practice the only workable approach that I found was setting this value unreasonably high for each car, then reducing it selectively to get routes that looked reasonable. When you can’t reduce it any more without the router failing, you have a roughly optimal solution.
Vroom will print a large JSON file with the results (and the routes themselves
from OSRM in
polyline format), and so I have another script that takes the
result and prints the result as a long list of coordinates for each vehicle,
and also exports the routes (as lists of coordinates) to text files, so that
they can be shown on a map. Theoretically I ought to create KMLs or something
more useful, but this turned out to be more complicated than I had hoped to do
in Python and so I ran out of time.
My repository contains a
run.sh file that shows how I run the program for my
specific setup. Obviously you’ll need all the pieces in place (Vroom, OSRM on
a server, etc) in place for this to work. Running the script, even for a large
list of destinations, takes only 3-5 seconds, so re-running it while changing
the parameters is no big deal.
The results are extremely nice looking:
Thanks to GPS Visualizer for making it easy to plot these routes.
This solution shows three different routes, constrained to a limit of about a week per trip. Several of the points have additional constraints, such that only one scientist can collect the sample, and this has still allowed for a nice solution. In many cases, it’s possible for human intuition to come up with a map that looks a lot like this one, but getting the exact order of coordinates right turns out to be crucial in saving hours of drive time.
At the end of the day, I suspect my exact approach won’t be completely workable for most people. You’ll have slightly different constraints than me, a different input format for your data, or some other problem. That’s why I’m just sticking these scripts online instead of spending the time to make sure they’re as functional and well-written as they could be. Hopefully, others will be able to look to them for help in solving their own routing problems.