Vectorized GPS distance/speed calculation for pandas

Introduction

The haversine formula implemented below is not the most accurate distance calculation on the surface of a sphere, but when the distances are short (i.e. GPS tracks) is completely adequate and very fast.

The key to fast calculations of piecewise GPS segments is to avoid looping and utilize the great vectorization potential in NumPy/pandas.

Haversine distance

This implementation of haversine takes pandas series or NumPy arrays or scalars and returns the distance in km.

def haversine(lon1, lat1, lon2, lat2):                                                                                  
    ''' Returns the distance between the points in km. Vectorized and will work with arrays and return an array of       
    distances, but will also work with scalars and return a scalar.'''                                                  
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])                                                  
    a = np.sin((lat2 - lat1) / 2.0)**2 + (np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0)**2)                 
    distance = 6371 * 2 * np.arcsin(np.sqrt(a))                                                                         
    return distance

Speed calculations based on haversine

Below is a vectorized speed calculation based on the haversine distance formula. It works on pandas series input and can easily be parallelized to work on several trips at a time.

def gps_speed(longitudes, latitudes, timestamps):                                                                       
    """                                                                                                                 
    Calculates the instantaneous speed from the GPS positions and timestamps. The distances between the points          
    are calculated using a vectorized haversine calculation the great circle distance between two arrays of points on   
    the earth (specified in decimal degrees). All args must be of equal length.                                         
 
    Args:                                                                                                               
        longitudes: pandas series of longitudes                                                                         
        latitudes:  pandas series of latitudes                                                                          
        timestamps: pandas series of timestamps                                                                         
 
    Returns:                                                                                                            
        Speed is returned an array in km/h.                                                                             
 
    Example:                                                                                                            
        >>> df['gpsSpeed'] = gps_speed(df.longitude, df.latitude, df.recordedAt)
    """                                                                                                                 
 
    assert longitudes.shape[0] > 1                                                                                      
    assert latitudes.shape[0] > 1                                                                                       
    assert timestamps.shape[0] > 1                                                                                      
 
    lon1 = longitudes.values[:-1]                                                                                       
    lat1 = latitudes.values[:-1]                                                                                        
    lon2 = longitudes.values[1:]                                                                                        
    lat2 = latitudes.values[1:]                                                                                         
 
    # Vectorized haversine calculation                                                                                  
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])                                                  
    a = np.sin((lat2 - lat1) / 2.0)**2 + (np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0)**2)                 
    km_array = 6371 * 2 * np.arcsin(np.sqrt(a))                                                                         
    time_array = (timestamps.diff().dt.seconds / 3600).values[1:]                                                       
 
    # Calculate the speed                                                                                               
    time_array[time_array == 0] = np.nan  # To avoid division by zero                                                   
    speed = km_array / time_array                                                                                       
 
    # Make the array as long as the input arrays                                                                        
    speed = np.insert(speed, 0, np.nan, axis=0)                                                                         
 
    return speed

Leave a Reply