Introduction

The haversine formula implemented below is not the most accurate distance calculation on the surface of a sphere, but when the distances are short (i.e. GPS tracks) is completely adequate and very fast.

The key to fast calculations of piecewise GPS segments is to avoid looping and utilize the great vectorization potential in NumPy/pandas.

Haversine distance

This implementation of haversine takes pandas series or NumPy arrays or scalars and returns the distance in km.

def haversine(lon1, lat1, lon2, lat2):
''' Returns the distance between the points in km. Vectorized and will work with arrays and return an array of
distances, but will also work with scalars and return a scalar.'''
lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
a = np.sin((lat2 - lat1) / 2.0)**2 + (np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0)**2)
distance = 6371 * 2 * np.arcsin(np.sqrt(a))
return distance

Speed calculations based on haversine

Below is a vectorized speed calculation based on the haversine distance formula. It works on pandas series input and can easily be parallelized to work on several trips at a time.

def gps_speed(longitudes, latitudes, timestamps):
"""
Calculates the instantaneous speed from the GPS positions and timestamps. The distances between the points
are calculated using a vectorized haversine calculation the great circle distance between two arrays of points on
the earth (specified in decimal degrees). All args must be of equal length.

Args:
longitudes: pandas series of longitudes
latitudes:  pandas series of latitudes
timestamps: pandas series of timestamps

Returns:
Speed is returned an array in km/h.

Example:
>>> df['gpsSpeed'] = gps_speed(df.longitude, df.latitude, df.recordedAt)
"""

assert longitudes.shape > 1
assert latitudes.shape > 1
assert timestamps.shape > 1

lon1 = longitudes.values[:-1]
lat1 = latitudes.values[:-1]
lon2 = longitudes.values[1:]
lat2 = latitudes.values[1:]

# Vectorized haversine calculation
lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
a = np.sin((lat2 - lat1) / 2.0)**2 + (np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0)**2)
km_array = 6371 * 2 * np.arcsin(np.sqrt(a))
time_array = (timestamps.diff().dt.seconds / 3600).values[1:]

# Calculate the speed
time_array[time_array == 0] = np.nan  # To avoid division by zero
speed = km_array / time_array

# Make the array as long as the input arrays
speed = np.insert(speed, 0, np.nan, axis=0)

return speed

Only registered users can comment.

1. David Kovar says:

My python fu isn’t quite up to this – how would you add a Z (altitude) component to the function?