Toll Data#

GitHub

Introduction#

The purpose of this module is to import, summarize, analyze, and export toll data. The pandas and NumPy libraries are used extensively. Unfortunately, the repository does not include sample data for privacy reasons; it is shared both for reference and as a work sample.

Detailed API information can be found in the Docs folder as HTML files.

Transaction Files#

Transaction files are the most basic storage unit of toll data and contain information such as time of travel, axle count, transponder (if present), license plate, etc. Both Excel and CSV files can be processed, and data can be exported in memory as a DataFrame or to disk as a CSV file.

Trip Files#

Trip files contain similar information to transaction files, with a few differences: trip files contain the fare assigned to a trip, as well as the final OCR value that will be sent to the customer service system. A toll trip consists of one or more toll transactions, aggregated by trip ID.
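The trip-ID aggregation can be sketched with a pandas groupby; the column names and fares below are invented for illustration:

```python
import pandas as pd

# Illustrative transactions; real trip files also carry the final OCR value.
trx = pd.DataFrame({
    'TRIP_ID': [1, 1, 2],
    'TRX_ID': [101, 102, 103],
    'FARE': [0.75, 0.50, 1.25],
})

# Aggregate one or more transactions into a single trip record by trip ID.
trips = trx.groupby('TRIP_ID').agg(
    TOTAL_FARE=('FARE', 'sum'),
    TRX_COUNT=('TRX_ID', 'count'),
).reset_index()
```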

Rate Assignment#

Rate assignment is straightforward for time-of-day facilities. If you know the following parameters you can assign a rate.

  • Datetime

  • Transaction type

  • Axles

  • Transponder/Tag Status

  • Pay-by-Mail status

  • Holidays

One exception is that it is not possible to know in advance whether a vehicle will pay by plate or pay by mail, so this can be assigned using a probability model or defaulted to the pay-by-mail rate.
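A sketch of such an assignment follows. The rate table, peak window, holiday list, and axle surcharge are all invented for illustration; real schedules are facility-specific:

```python
from datetime import datetime

# Hypothetical rate table keyed on (is_peak, has_transponder).
RATES = {(True, True): 2.00, (True, False): 3.00,
         (False, True): 1.00, (False, False): 1.50}

PEAK_HOURS = range(6, 10)       # assumed morning peak window
HOLIDAYS = {(1, 1), (12, 25)}   # assumed holidays as (month, day)

def assign_rate(dt, has_transponder, axles=2):
    """Assign a toll rate for a time-of-day facility."""
    # Holidays are tolled at the off-peak rate in this sketch.
    is_peak = dt.hour in PEAK_HOURS and (dt.month, dt.day) not in HOLIDAYS
    base = RATES[(is_peak, has_transponder)]
    # Assumed surcharge of 50% of the base rate per axle beyond two.
    return round(base * (1 + 0.5 * max(axles - 2, 0)), 2)
```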

Plate Combinatorics#

This class provides a simple way of determining a set of possible OCR mistakes from common errors. For example, the value of B is often mistaken for the numerical value of 8. A plate with a value of 88 would return the combinations of BB, B8, 8B, and 88. This process is executed for arbitrarily complex plates, using a lookup table of common errors.
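The expansion described above amounts to a character-wise Cartesian product over a confusion table. A minimal sketch (the table below is a small invented subset, not the module's actual lookup):

```python
from itertools import product

# Hypothetical lookup table of commonly confused OCR characters.
OCR_CONFUSIONS = {'8': ['8', 'B'], 'B': ['B', '8'],
                  '0': ['0', 'O'], 'O': ['O', '0']}

def plate_combinations(plate):
    """Return every plate value that could result from common OCR errors."""
    # For each character, list its possible readings (itself if unambiguous).
    options = [OCR_CONFUSIONS.get(ch, [ch]) for ch in plate]
    return [''.join(combo) for combo in product(*options)]

plate_combinations('88')  # -> ['88', '8B', 'B8', 'BB']
```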

AVI Validation#

This class performs automated AVI testing; essentially, it checks whether a plate is read without its associated tag. The read threshold, which represents the minimum number of times a plate and tag must be seen together, can be set to constrain the number of errors detected. The input data must contain the following fields and be in CSV or Excel format:

  • TAG_ID, representing the transponder ID

  • PLATE, for the plate value without state identifier

  • TRX_ID, for the transaction identification number

The output file includes two new fields: AVI_MISMATCH, set to True or False based on the result of the test, and MISSED_TAG_ID, the ID of the missed tag, if applicable.

A CSV file can be used as a lookup dictionary, or the dictionary can be generated as the analysis is performed. The API also lets users choose between a static dictionary that is not updated as the test progresses and one that is continuously updated.
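The core of the check can be sketched as follows, using the field names above. This is a simplification of the described behavior, not the class's actual implementation:

```python
import pandas as pd

def avi_validate(df, read_threshold=5):
    """Flag transactions where a known plate appears without its tag."""
    # Build the plate -> tag lookup from pairs seen together at least
    # `read_threshold` times (the in-memory dictionary case).
    pairs = (df.dropna(subset=['TAG_ID'])
               .groupby(['PLATE', 'TAG_ID']).size())
    lookup = {plate: tag for (plate, tag), n in pairs.items()
              if n >= read_threshold}

    out = df.copy()
    expected = out['PLATE'].map(lookup)
    # A mismatch: the plate has a known tag, but no tag was read.
    out['AVI_MISMATCH'] = expected.notna() & out['TAG_ID'].isna()
    out['MISSED_TAG_ID'] = expected.where(out['AVI_MISMATCH'])
    return out
```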

AVI Test#

This class performs an AVI validation test, similar to AVI Validation, but designed to be more extensible and easier to repeat. A minimum of 30 days of data is required, and the source data is left unmodified. This requirement is based on previous experience and on the fact that a large statistical sample gives better validation.

The test takes some time to execute because it imports a large number of files. The results of the first run are exported to a .pkl file; if a .pkl file already exists, the import step can be skipped, reducing runtime.

The process works as follows:

  1. Import files for analysis, or use existing data

  2. Select random start date

  3. Build plate/tag dictionary

  4. Run AVI Validation test and compute metrics

This analysis can be repeated any number of times, and the get_test_result method can be used to aggregate this information.
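Step 2, selecting a random start date that still leaves a full window of data, can be sketched as follows (the function and its parameters are illustrative; the real class manages this internally):

```python
import random
from datetime import date, timedelta

def random_window(dates, window_days=30, seed=None):
    """Pick a random start date with a full `window_days` of data after it."""
    dates = sorted(dates)
    if (dates[-1] - dates[0]).days + 1 < window_days:
        raise ValueError('at least 30 days of data required')
    # Only starts that leave a complete window are candidates.
    latest_start = dates[-1] - timedelta(days=window_days - 1)
    candidates = [d for d in dates if d <= latest_start]
    start = random.Random(seed).choice(candidates)
    return start, start + timedelta(days=window_days - 1)
```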

Travel Time#

This module provides accurate travel times for user-defined trips. It does so by calculating all node-to-node travel, and then the total travel time based on the specified trip. The provided DataFrame needs to include: unique trip identifiers, datetime information, and node names. Once a TravelTime object is created, travel times can be calculated using get_travel_time_all_day or get_travel_time.

The module also includes data from a real-world example.

Import the test file as a dataframe, then parse the datetime information.

    import pandas as pd

    df = pd.read_csv('_hashed_export_test_data_trip.csv')
    datetime_format = '%m/%d/%Y %H.%M.%S.%f'
    df['DATETIME'] = pd.to_datetime(df['Trans Time'], format=datetime_format)

Create a travel time object.

    sample_travel_time = TravelTime(df)

Define the trip using a selection of toll points.

    trip_def = ['SB01', 'SB02', 'SB03', 'SB04', 'SB08', 'SB09', 'SB10']

Get travel times for the entire day.

    travel_times = sample_travel_time.get_travel_time_all_day(trip_def)

Trip Builder#

This module allows the grouping of transactions into trips. It takes in a Pandas DataFrame and requires transaction ID, datetime, plaza, transponder ID, and plate ID fields. This class includes detailed logging that can be enabled.

Using this class is very simple. First define a list of exit nodes. The exit nodes define locations where trips automatically end. The example exit nodes are from a real-world system.

    import TripBuilder as tb
    df = [DataFrame]
    exit_nodes = ['NB10', 'NB05', 'SB06', 'SB10', 'SB11']

Import the dataframe and create a TripBuilder object

    build = tb.TripBuilder(df, exit_nodes=exit_nodes)

Run the build and get the results

    build.build_trips()
    df_result = build.get_dataframe()

Testing#

To test this module, run python -m pytest in the top-level directory tolldata. This will execute the test scripts for the various modules. While the tests are not very extensive, they should catch major errors.