Toll Data#

GitHub

Introduction#

The purpose of this module is to import, summarize, analyze, and export toll data. The pandas and NumPy libraries are used extensively. Unfortunately, the repository does not include sample data for privacy reasons; it is shared both for reference and as a work sample.

Detailed API information can be found in the Docs folder as HTML files.

Transaction Files#

Transaction files are the most basic storage unit of toll data and contain information such as time of travel, axle count, transponder (if present), license plate, etc. Both Excel and CSV files can be processed, and data can be exported in memory as a DataFrame or to disk as a CSV file.

Trip Files#

Trip files contain similar information to transaction files, with a few differences: trip files contain the fare assigned to a trip, as well as the final OCR value that will be sent to the customer service system. A toll trip consists of one or more toll transactions, aggregated by trip ID.
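The trip-ID aggregation can be sketched with a pandas groupby; the column names and fares below are invented for illustration:

```python
import pandas as pd

# Illustrative transactions; real trip files also carry the final OCR value.
trx = pd.DataFrame({
    'TRIP_ID': [1, 1, 2],
    'TRX_ID': [101, 102, 103],
    'FARE': [0.75, 0.50, 1.25],
})

# Aggregate one or more transactions into a single trip record by trip ID.
trips = trx.groupby('TRIP_ID').agg(
    TOTAL_FARE=('FARE', 'sum'),
    TRX_COUNT=('TRX_ID', 'count'),
).reset_index()
```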

Rate Assignment#

Rate assignment is straightforward for time-of-day facilities. If you know the following parameters you can assign a rate.

  • Datetime

  • Transaction type

  • Axles

  • Transponder/Tag Status

  • Pay-by-Mail status

  • Holidays

One exception is that it is not possible to know in advance whether a vehicle will pay by plate or pay by mail, so this can be assigned using a probability model or defaulted to the pay-by-mail rate.
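A sketch of such an assignment follows. The rate table, peak window, holiday list, and axle surcharge are all invented for illustration; real schedules are facility-specific:

```python
from datetime import datetime

# Hypothetical rate table keyed on (is_peak, has_transponder).
RATES = {(True, True): 2.00, (True, False): 3.00,
         (False, True): 1.00, (False, False): 1.50}

PEAK_HOURS = range(6, 10)       # assumed morning peak window
HOLIDAYS = {(1, 1), (12, 25)}   # assumed holidays as (month, day)

def assign_rate(dt, has_transponder, axles=2):
    """Assign a toll rate for a time-of-day facility."""
    # Holidays are tolled at the off-peak rate in this sketch.
    is_peak = dt.hour in PEAK_HOURS and (dt.month, dt.day) not in HOLIDAYS
    base = RATES[(is_peak, has_transponder)]
    # Assumed surcharge of 50% of the base rate per axle beyond two.
    return round(base * (1 + 0.5 * max(axles - 2, 0)), 2)
```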

Plate Combinatorics#

This class provides a simple way of determining a set of possible OCR mistakes from common errors. For example, the value of B is often mistaken for the numerical value of 8. A plate with a value of 88 would return the combinations of BB, B8, 8B, and 88. This process is executed for arbitrarily complex plates, using a lookup table of common errors.
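The expansion described above amounts to a character-wise Cartesian product over a confusion table. A minimal sketch (the table below is a small invented subset, not the module's actual lookup):

```python
from itertools import product

# Hypothetical lookup table of commonly confused OCR characters.
OCR_CONFUSIONS = {'8': ['8', 'B'], 'B': ['B', '8'],
                  '0': ['0', 'O'], 'O': ['O', '0']}

def plate_combinations(plate):
    """Return every plate value that could result from common OCR errors."""
    # For each character, list its possible readings (itself if unambiguous).
    options = [OCR_CONFUSIONS.get(ch, [ch]) for ch in plate]
    return [''.join(combo) for combo in product(*options)]

plate_combinations('88')  # -> ['88', '8B', 'B8', 'BB']
```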

AVI Validation#

This class performs automated AVI testing; essentially, it checks whether a plate is read without its associated tag. The read threshold, which represents the minimum number of times a plate and tag must be seen together, can be set to constrain the number of errors detected. The input data must contain the following fields and be in CSV or Excel format:

  • TAG_ID, representing the transponder ID

  • PLATE, for the plate value without state identifier

  • TRX_ID, for the transaction identification number

The output file includes two new fields: AVI_MISMATCH, set to True or False based on the result of the test, and MISSED_TAG_ID, the ID of the missed tag, if applicable.

A CSV file can be used as a lookup dictionary, or the dictionary can be generated as the analysis is performed. The API also lets users choose between a static dictionary that is not updated as the test progresses and one that is continuously updated.
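The core of the check can be sketched as follows, using the field names above. This is a simplification of the described behavior, not the class's actual implementation:

```python
import pandas as pd

def avi_validate(df, read_threshold=5):
    """Flag transactions where a known plate appears without its tag."""
    # Build the plate -> tag lookup from pairs seen together at least
    # `read_threshold` times (the in-memory dictionary case).
    pairs = (df.dropna(subset=['TAG_ID'])
               .groupby(['PLATE', 'TAG_ID']).size())
    lookup = {plate: tag for (plate, tag), n in pairs.items()
              if n >= read_threshold}

    out = df.copy()
    expected = out['PLATE'].map(lookup)
    # A mismatch: the plate has a known tag, but no tag was read.
    out['AVI_MISMATCH'] = expected.notna() & out['TAG_ID'].isna()
    out['MISSED_TAG_ID'] = expected.where(out['AVI_MISMATCH'])
    return out
```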

AVI Test#

This class performs an AVI validation test, similar to AVI Validation, but designed to be more extensible and easier to repeat. A minimum of 30 days of data is required, and the source data is left unmodified. This requirement is based on previous experience and on the fact that a large statistical sample gives better validation.

The test takes some time to execute because it imports a large number of files. The results of the first run are exported to a .pkl file; if a .pkl file already exists, the import step can be skipped, reducing runtime.

The process works as follows:

  1. Import files for analysis, or use existing data

  2. Select random start date

  3. Build plate/tag dictionary

  4. Run AVI Validation test and compute metrics

This analysis can be repeated any number of times, and the get_test_result method can be used to aggregate this information.
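Step 2, selecting a random start date that still leaves a full window of data, can be sketched as follows (the function and its parameters are illustrative; the real class manages this internally):

```python
import random
from datetime import date, timedelta

def random_window(dates, window_days=30, seed=None):
    """Pick a random start date with a full `window_days` of data after it."""
    dates = sorted(dates)
    if (dates[-1] - dates[0]).days + 1 < window_days:
        raise ValueError('at least 30 days of data required')
    # Only starts that leave a complete window are candidates.
    latest_start = dates[-1] - timedelta(days=window_days - 1)
    candidates = [d for d in dates if d <= latest_start]
    start = random.Random(seed).choice(candidates)
    return start, start + timedelta(days=window_days - 1)
```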

Travel Time#

This module provides accurate travel times for user-defined trips. It does so by calculating all node-to-node travel, and then the total travel time based on the specified trip. The provided DataFrame needs to include: unique trip identifiers, datetime information, and node names. Once a TravelTime object is created, travel times can be calculated using get_travel_time_all_day or get_travel_time.

The module also includes data from a real-world example.

Import the test file as a dataframe, then parse the datetime information.

    import pandas as pd

    df = pd.read_csv('_hashed_export_test_data_trip.csv')
    datetime_format = '%m/%d/%Y %H.%M.%S.%f'
    df['DATETIME'] = pd.to_datetime(df['Trans Time'], format=datetime_format)

Create a travel time object.

    sample_travel_time = TravelTime(df)

Define the trip using a selection of toll points.

    trip_def = ['SB01', 'SB02', 'SB03', 'SB04', 'SB08', 'SB09', 'SB10']

Get travel times for the entire day.

    travel_times = sample_travel_time.get_travel_time_all_day(trip_def)

Trip Builder#

This module allows the grouping of transactions into trips. It takes in a Pandas DataFrame and requires transaction ID, datetime, plaza, transponder ID, and plate ID fields. This class includes detailed logging that can be enabled.

Using this class is very simple. First define a list of exit nodes. The exit nodes define locations where trips automatically end. The example exit nodes are from a real-world system.

    import TripBuilder as tb
    df = [DataFrame]
    exit_nodes = ['NB10', 'NB05', 'SB06', 'SB10', 'SB11']

Import the dataframe and create a TripBuilder object

    build = tb.TripBuilder(df, exit_nodes=exit_nodes)

Run the build and get the results

    build.build_trips()
    df_result = build.get_dataframe()

Testing#

To test this module, run python -m pytest in the top-level directory tolldata. This will execute the test scripts for the various modules. While the tests are not very extensive, they should catch major errors.