Visualizing Taxi Trips in 3D

A 3D model of New York City showing taxi journeys as lines
Animated Taxi Lines.

Some time ago, I came across a detailed 3D model of NYC from NYC DoITT.

I was interested in using it to make some 3D animations, as New York is such an iconic and well-recognized city. The model consisted of 20 GML files, which uncompressed to ~12 GB. I was able to import these into Blender with the help of this CityGML plugin.

Due to the size of the data, I imported each file separately and saved each one out as an OBJ file. I then used a welding utility I had previously written to remove duplicate vertices, which cut the file size down dramatically. Blender has a welding routine, but I found it only welded some of the vertices when working with large models.
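For illustration, welding can be sketched as merging vertices that share the same position and remapping the face indices to the surviving copies. This is a minimal sketch and not the original utility; the function name `weldVertices`, the in-memory vertex/face layout and the rounding tolerance are all assumptions.

```javascript
// Merge vertices whose coordinates match after rounding, then remap faces.
// `vertices` is an array of [x, y, z]; `faces` is an array of vertex-index arrays.
// Both the layout and the `decimals` tolerance are assumptions for this sketch.
function weldVertices(vertices, faces, decimals = 6) {
  const indexByKey = new Map(); // rounded position -> index in `welded`
  const welded = [];

  // remap[i] is the new index of original vertex i.
  const remap = vertices.map((vertex) => {
    const key = vertex.map((c) => c.toFixed(decimals)).join(",");
    if (!indexByKey.has(key)) {
      indexByKey.set(key, welded.length);
      welded.push(vertex);
    }
    return indexByKey.get(key);
  });

  const newFaces = faces.map((face) => face.map((i) => remap[i]));
  return { vertices: welded, faces: newFaces };
}
```

Two triangles that each carry their own copy of a shared edge go from six vertices to four after welding.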

3D model of New York City stretching far off into the distance
New York Combined Model.

Taxi Data

Now that I had the model imported, I set about finding a dataset I could visualize using it. One that immediately came to mind was the NYC Taxi Trip Dataset. These are CSV files that contain journey start/end locations as well as the pickup and dropoff times. They also contain the fare and any tolls paid.

I chose the January 2016 file and decided to limit it to Manhattan island for performance reasons.

Data Processing

The first thing I did was limit the data to a particular time window. Filtering with QGIS didn't work because the number of records exceeded 33 million.

Using a similar approach to my previous flight visualization, I wrote a JavaScript utility to filter trips that took place between 4 PM and 5 PM on the 6th of January 2016. I also created a four-point bounding box of Manhattan and excluded any trips that occurred outside of it.
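A filter of this kind could be sketched roughly as follows. It is not the original utility: the corner coordinates are made-up illustrative values, the column names (`tpep_pickup_datetime`, `pickup_longitude` and so on) are assumed to match the CSV header, and the time test is applied to the pickup time only, which is also an assumption.

```javascript
// Hypothetical four-corner outline around Manhattan as [lon, lat] pairs.
const MANHATTAN_QUAD = [
  [-74.03, 40.70],
  [-73.97, 40.70],
  [-73.90, 40.88],
  [-73.95, 40.88],
];

// Timestamps in "YYYY-MM-DD HH:MM:SS" form sort correctly as plain strings.
const WINDOW_START = "2016-01-06 16:00:00";
const WINDOW_END = "2016-01-06 17:00:00";

// Ray-casting test: count how many polygon edges a horizontal ray crosses.
function pointInPolygon(lon, lat, polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    const crosses =
      yi > lat !== yj > lat &&
      lon < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Keep a trip only if it starts inside the window and both ends are inside the outline.
function keepTrip(trip) {
  const pickupTime = trip.tpep_pickup_datetime;
  const inWindow = pickupTime >= WINDOW_START && pickupTime < WINDOW_END;
  return (
    inWindow &&
    pointInPolygon(Number(trip.pickup_longitude), Number(trip.pickup_latitude), MANHATTAN_QUAD) &&
    pointInPolygon(Number(trip.dropoff_longitude), Number(trip.dropoff_latitude), MANHATTAN_QUAD)
  );
}
```

With a file of this size, the rows would need to be streamed one line at a time (for example with Node's `readline` module) and passed through `keepTrip`, rather than loaded into memory in one go.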

The bounding box was very rough and a large number of points were located on the Hudson River due to GPS errors.

To clean it up further, I imported an outline of Manhattan into QGIS and then imported the taxi trips in two passes. The first pass was with the pickup location as the point geometry. I then used the marquee selection tool to delete invalid points.

I then saved the result as a CSV file without geometry, reimported it, and repeated the process with the drop-off location as the point geometry.

Route Calculation

The next problem was estimating the taxi route. Each record only contained a start/end location, meaning I had no idea which roads the driver chose. Visualizing this directly would show lines going straight through buildings.

To solve this, I used GraphHopper, an open-source routing server that uses OpenStreetMap as its input data. Geofabrik.de has this data available in convenient regional extracts. I grabbed the extract for the entire state of New York and pointed to it using the GraphHopper config file.

GraphHopper was very simple to use. Once it was running with the correct config file, I could access it via a browser UI.

A screenshot of the web interface for GraphHopper
GraphHopper Web Interface.

Looking at the documentation and browser network requests, I worked out how to construct a GET request containing the pickup and dropoff location.

http://localhost:8989/route
?point=${START_LAT},${START_LON}
&point=${END_LAT},${END_LON}
&type=json&locale=en-GB
&profile=car&points_encoded=false
&instructions=false

This returns an array of latitude/longitude points, one for each street turn.

I then looped over the records and calculated a route for each journey. GraphHopper was incredibly fast and only took ~30 seconds for the 14k records.

I used JavaScript's await keyword to perform the requests one at a time and avoid overloading the server.
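The request loop might look something like the sketch below, which assumes Node 18+ for the built-in `fetch`. The function names and the shape of the `trips` objects are hypothetical, and reading the geometry from `paths[0].points.coordinates` (as `[lon, lat]` pairs) reflects my understanding of GraphHopper's JSON response when `points_encoded=false`, so it should be checked against the server's actual output.

```javascript
const GRAPHHOPPER_URL = "http://localhost:8989/route";

// Build the GET request URL using the query parameters listed above.
function buildRouteUrl(start, end) {
  const url = new URL(GRAPHHOPPER_URL);
  url.searchParams.append("point", `${start.lat},${start.lon}`);
  url.searchParams.append("point", `${end.lat},${end.lon}`);
  url.searchParams.set("type", "json");
  url.searchParams.set("locale", "en-GB");
  url.searchParams.set("profile", "car");
  url.searchParams.set("points_encoded", "false");
  url.searchParams.set("instructions", "false");
  return url.toString();
}

// Request one route and return its coordinate array.
async function fetchRoute(start, end) {
  const response = await fetch(buildRouteUrl(start, end));
  if (!response.ok) throw new Error(`Routing failed: HTTP ${response.status}`);
  const body = await response.json();
  // Assumption: geometry lives at paths[0].points.coordinates as [lon, lat] pairs.
  return body.paths[0].points.coordinates;
}

// Route every trip strictly one after another: each await finishes
// before the next request is sent, so the server sees one request at a time.
async function routeAll(trips, route = fetchRoute) {
  const results = [];
  for (const trip of trips) {
    const coordinates = await route(trip.pickup, trip.dropoff);
    results.push({ id: trip.id, coordinates });
  }
  return results;
}
```

The plain `for...of` loop is what keeps the requests sequential; mapping the trips through `Promise.all` would instead fire all of the requests at once.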

I took the point arrays for each journey and created a combined GeoJSON file with each journey as a GeoJSON feature using the LineString geometry type.
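As a rough illustration, the combined file can be assembled as a GeoJSON FeatureCollection with one LineString feature per journey. The `journeys` input shape and the property names are assumptions made for this sketch.

```javascript
// Wrap each journey's coordinate array in a GeoJSON LineString feature.
// GeoJSON expects coordinates as [longitude, latitude] pairs.
function toFeatureCollection(journeys) {
  return {
    type: "FeatureCollection",
    features: journeys.map((journey) => ({
      type: "Feature",
      // Hypothetical properties: an ID plus the times needed later for animation.
      properties: {
        id: journey.id,
        pickup: journey.pickup,
        dropoff: journey.dropoff,
      },
      geometry: {
        type: "LineString",
        coordinates: journey.coordinates,
      },
    })),
  };
}
```

The result can be written to disk with `JSON.stringify` and opened directly in QGIS.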

Using QGIS, I reprojected the GeoJSON file to NYC 2263 (EPSG:2263, the New York Long Island state plane projection) and saved it out as an ESRI shapefile, which BlenderGIS supports.

Blender Alignment

To align the data accurately in Blender, I came up with the following process. I am sure there is a better solution but this was simple and easy to perform.

First, I set the pivot point of the 3D city model to a particular building that I could locate easily using Google Maps. I chose the center of the rooftop dome at the Metropolitan Museum of Art in Central Park. I then moved the entire model to the world origin so that the dome coordinates were now (0, 0, 0).

Next, I used Google Maps to find the latitude and longitude of this dome. I created a point at that location in QGIS and saved it as a shapefile, which I imported into Blender.

Now I had the 3D city model at the origin. I also had the taxi journeys imported as individual lines. Finally, I had the dome point. The dome point and the taxi lines were correctly aligned together as they were both imported via BlenderGIS.

I then set the pivot point of each taxi line to be the location of the dome point. Finally, I transformed all the lines to (0, 0, 0).
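Numerically, this amounts to subtracting the dome's projected coordinates from every vertex of the taxi lines, so that the dome lands on the origin just as it does in the city model. The sketch below shows only that arithmetic; the function name `recenter` and the sample coordinates in the usage note are made up, and in practice Blender performs this step when the objects are moved.

```javascript
// Shift every point so that `anchor` ends up at (0, 0, 0).
// Points and anchor are [x, y, z] in the same projected coordinate system.
function recenter(points, anchor) {
  return points.map(([x, y, z = 0]) => [
    x - anchor[0],
    y - anchor[1],
    z - (anchor[2] ?? 0),
  ]);
}
```

For example, with a made-up anchor of `[1000, 2000, 0]`, the point `[1010, 2020, 0]` moves to `[10, 20, 0]`.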

The result was a perfect alignment of the taxi journey lines with the 3D city model. The key was figuring out the projection of the 3D city model. I confirmed it by opening one of the city GML files in a text editor and finding NYC 2263 in the file header.

Animating the lines

I wanted to create an animation that represented one hour of taxi journeys. Each line should start and end animating at the correct relative time. To do this, I wrote a Python script in Blender. Each journey's 3D line was named with an ID number, which was used as a lookup to set the start and end keyframes correctly.

Now that the lines were animating correctly, I gave them a glowing neon material. I used an HDRI for lighting and rendered out an animation.
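The timing arithmetic behind this can be sketched independently of Blender's API: each pickup and dropoff time is mapped linearly from the one-hour window onto the animation's frame range, and the result is stored against the journey ID. The sketch is in JavaScript to match the earlier utilities rather than the Python used inside Blender, and the function names, the `trips` shape, the clamping of late drop-offs and the frame count in the usage note are all assumptions.

```javascript
// Parse "YYYY-MM-DD HH:MM:SS" as UTC milliseconds. Only differences are used,
// so the choice of time zone does not affect the result.
function toMillis(timestamp) {
  return Date.parse(timestamp.replace(" ", "T") + "Z");
}

// Map a timestamp linearly from [windowStart, windowEnd] to [0, totalFrames].
// Times outside the window are clamped to its edges.
function timeToFrame(timestamp, windowStart, windowEnd, totalFrames) {
  const start = toMillis(windowStart);
  const end = toMillis(windowEnd);
  const fraction = (toMillis(timestamp) - start) / (end - start);
  const clamped = Math.min(1, Math.max(0, fraction));
  return Math.round(clamped * totalFrames);
}

// Build the ID -> keyframe lookup used to time each line's animation.
function buildKeyframeLookup(trips, windowStart, windowEnd, totalFrames) {
  const lookup = new Map();
  for (const trip of trips) {
    lookup.set(trip.id, {
      startFrame: timeToFrame(trip.pickup, windowStart, windowEnd, totalFrames),
      endFrame: timeToFrame(trip.dropoff, windowStart, windowEnd, totalFrames),
    });
  }
  return lookup;
}
```

With a hypothetical 1500-frame animation covering 16:00 to 17:00, a journey picked up at 16:15 and dropped off at 16:30 would animate from frame 375 to frame 750.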

Blender Render Watchdog

I had an issue with Blender crashing, possibly due to the size of the file and the number of lines animating. The crashes were random: sometimes it would render 100 frames before crashing, and other times it would crash every few frames.

I wrote a watchdog script that I could point at a Blender file to render it headlessly. I could set how many frames it needed to render. If Blender crashed, the watchdog script would check whether rendering had finished. If the final image file was not on disk, it would reopen Blender to continue rendering.
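A watchdog along these lines could be sketched as below. It is not the original script: the frame file naming, the restart limit and the choice to resume from the first missing frame are assumptions. The Blender flags used are `-b` (run without the UI), `-s` and `-e` (start and end frame) and `-a` (render the animation). The output directory is assumed to match the one set inside the .blend file.

```javascript
const { spawnSync } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical naming scheme: frames are written as 0001.png, 0002.png, ...
function framePath(outputDir, frame) {
  return path.join(outputDir, `${String(frame).padStart(4, "0")}.png`);
}

// First frame in [start, end] with no image on disk, or null if all exist.
function firstMissingFrame(outputDir, start, end, exists = fs.existsSync) {
  for (let frame = start; frame <= end; frame++) {
    if (!exists(framePath(outputDir, frame))) return frame;
  }
  return null;
}

// Launch Blender headlessly for the given frame range and wait for it to exit.
function renderWithBlender(blendFile, start, end) {
  const args = ["-b", blendFile, "-s", String(start), "-e", String(end), "-a"];
  return spawnSync("blender", args, { stdio: "inherit" });
}

// Relaunch the renderer until every frame exists or the restart limit is hit.
function watchdog(blendFile, outputDir, start, end, options = {}) {
  const { render = renderWithBlender, exists = fs.existsSync, maxLaunches = 100 } = options;
  let launches = 0;
  let next = firstMissingFrame(outputDir, start, end, exists);
  while (next !== null && launches < maxLaunches) {
    render(blendFile, next, end); // may crash part-way; the disk is re-checked below
    launches++;
    next = firstMissingFrame(outputDir, start, end, exists);
  }
  return { finished: next === null, launches };
}
```

Calling `watchdog("scene.blend", "/path/to/frames", 1, 1500)` (hypothetical paths) would keep relaunching Blender from the first missing frame until the final image is on disk. The `maxLaunches` guard stops it from looping forever if a frame can never be rendered.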