## Introduction

The haversine formula implemented below is not the most accurate distance calculation on the surface of a sphere, but when the distances are short (i.e. GPS tracks) is completely adequate and very fast.

The key to fast calculations of piecewise GPS segments is to avoid looping and utilize the great vectorization potential in NumPy/pandas.

## Haversine distance

This implementation of haversine takes pandas series or NumPy arrays or scalars and returns the distance in km.

def haversine(lon1, lat1, lon2, lat2): ''' Returns the distance between the points in km. Vectorized and will work with arrays and return an array of distances, but will also work with scalars and return a scalar.''' lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2]) a = np.sin((lat2 - lat1) / 2.0)**2 + (np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0)**2) distance = 6371 * 2 * np.arcsin(np.sqrt(a)) return distance

## Speed calculations based on haversine

Below is a vectorized speed calculation based on the haversine distance formula. It works on pandas series input and can easily be parallelized to work on several trips at a time.

def gps_speed(longitudes, latitudes, timestamps): """ Calculates the instantaneous speed from the GPS positions and timestamps. The distances between the points are calculated using a vectorized haversine calculation the great circle distance between two arrays of points on the earth (specified in decimal degrees). All args must be of equal length. Args: longitudes: pandas series of longitudes latitudes: pandas series of latitudes timestamps: pandas series of timestamps Returns: Speed is returned an array in km/h. Example: >>> df['gpsSpeed'] = gps_speed(df.longitude, df.latitude, df.recordedAt) """ assert longitudes.shape[0] > 1 assert latitudes.shape[0] > 1 assert timestamps.shape[0] > 1 lon1 = longitudes.values[:-1] lat1 = latitudes.values[:-1] lon2 = longitudes.values[1:] lat2 = latitudes.values[1:] # Vectorized haversine calculation lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2]) a = np.sin((lat2 - lat1) / 2.0)**2 + (np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0)**2) km_array = 6371 * 2 * np.arcsin(np.sqrt(a)) time_array = (timestamps.diff().dt.seconds / 3600).values[1:] # Calculate the speed time_array[time_array == 0] = np.nan # To avoid division by zero speed = km_array / time_array # Make the array as long as the input arrays speed = np.insert(speed, 0, np.nan, axis=0) return speed