How TallyRoad Processes Your Traffic Video

1 February 2025 · 6 min read

When you upload a video to TallyRoad and configure your intersection lines, a sequence of steps happens automatically. Here is what is actually going on.


Step 1: Upload and metadata extraction

Your video is uploaded directly to Cloudflare R2 object storage. As it uploads, TallyRoad extracts metadata: duration, frame rate, resolution, and file size. This metadata determines whether the video needs to be processed as a single unit or split into chunks (videos longer than 30 minutes are automatically chunked).
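The chunking decision described above is just a threshold check on the extracted duration. A minimal sketch, assuming a simple metadata dict (the field name is illustrative, not TallyRoad's actual schema):

```python
# Videos over 30 minutes are split into chunks (per the text above).
CHUNK_THRESHOLD_S = 30 * 60

def needs_chunking(metadata: dict) -> bool:
    """Return True when the video is long enough to be processed in chunks."""
    return metadata["duration_s"] > CHUNK_THRESHOLD_S

needs_chunking({"duration_s": 45 * 60})  # 45-minute video: chunked
needs_chunking({"duration_s": 10 * 60})  # 10-minute clip: single pass
```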


Step 2: Configuration

The intersection configuration you draw — the approach lines — is stored as a JSON object containing the coordinates of each line relative to the video frame dimensions. When you draw a line from one side of the intersection to the other, you are defining a virtual tripwire that vehicles will cross.
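As a rough illustration, the stored configuration might look like the following. The field names and coordinate convention here are assumptions for the example; TallyRoad's actual schema may differ.

```python
import json

# Illustrative intersection configuration: each approach line is a pair of
# endpoints in pixel coordinates relative to the video frame dimensions.
config = {
    "frame_width": 1920,
    "frame_height": 1080,
    "approach_lines": [
        {"name": "North", "start": [400, 200], "end": [1500, 220]},
        {"name": "South", "start": [380, 900], "end": [1520, 880]},
    ],
}

serialized = json.dumps(config)  # stored as a JSON object
```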


Step 3: Detection with YOLOv11

Processing starts by running each video frame through a YOLOv11 object detection model. TallyRoad samples frames at 5 frames per second rather than processing every frame — this is fast enough for accurate tracking while significantly reducing processing time.
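Sampling at 5 fps amounts to picking every Nth frame based on the source frame rate. A minimal sketch (the function and its parameters are illustrative):

```python
TARGET_FPS = 5  # detection sample rate quoted in the text

def sample_indices(total_frames: int, source_fps: float) -> range:
    """Indices of the frames that get run through the detector."""
    stride = max(1, round(source_fps / TARGET_FPS))
    return range(0, total_frames, stride)

# A 30 fps source: every 6th frame is processed.
indices = list(sample_indices(60, 30.0))
```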

YOLOv11 produces a list of detected objects for each frame: bounding box coordinates, a class label (car, truck, bus, motorcycle, bicycle, pedestrian), and a confidence score. Only detections above the configured confidence threshold are used.
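Thresholding the detections is a straightforward filter. A sketch, assuming each detection is a (box, label, score) tuple and using 0.4 as an example threshold value:

```python
def filter_detections(detections, threshold=0.4):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d[2] >= threshold]

raw = [
    ((10, 20, 50, 60), "car", 0.91),       # confident detection: kept
    ((300, 40, 340, 90), "bicycle", 0.22), # low confidence: discarded
]
kept = filter_detections(raw)
```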


Step 4: Multi-object tracking with ByteTrack

Detection alone gives you a list of objects in each frame, but not which object in frame 1 is the same as which object in frame 2. That is what ByteTrack does.

ByteTrack maintains a list of active tracks (individual vehicles being followed) and uses a combination of position prediction and bounding box overlap to match detections across frames. Each track gets a unique ID that persists across the entire video.

When a vehicle enters the frame for the first time, ByteTrack creates a new track. As the vehicle moves, ByteTrack updates the track position each frame. When the vehicle exits the frame, the track is marked as complete.
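The core of the matching step can be sketched as bounding-box overlap (IoU) between active tracks and new detections. This is a toy version: real ByteTrack also predicts motion with a Kalman filter and matches high- and low-confidence detections in two stages.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match(tracks, detections, min_iou=0.3):
    """Greedily pair each track ID with its best-overlapping detection index."""
    pairs, used = [], set()
    for tid, tbox in tracks.items():
        best = max(
            (i for i in range(len(detections)) if i not in used),
            key=lambda i: iou(tbox, detections[i]),
            default=None,
        )
        if best is not None and iou(tbox, detections[best]) >= min_iou:
            pairs.append((tid, best))
            used.add(best)
    return pairs
```

Unmatched detections become new tracks; tracks with no match for several frames are marked complete.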


Step 5: Approach line crossing detection

For each processed frame, the counter checks whether any tracked vehicle has crossed one of the configured approach lines. A crossing is detected when a vehicle's centroid moves from one side of the line to the other between consecutive sampled frames.
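The side-of-line test is a standard 2-D cross product: its sign tells you which side of the line a point is on, so a crossing is simply a sign change between two consecutive positions. A minimal sketch (function names are illustrative):

```python
def side(line_start, line_end, point):
    """Signed 2-D cross product: positive on one side of the line, negative on the other."""
    (x1, y1), (x2, y2), (px, py) = line_start, line_end, point
    return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)

def crossed(line, prev_centroid, curr_centroid):
    """True when the centroid changed sides between two consecutive positions."""
    a = side(line[0], line[1], prev_centroid)
    b = side(line[0], line[1], curr_centroid)
    return a * b < 0

line = ((0, 100), (200, 100))          # horizontal line at y = 100
crossed(line, (50, 90), (52, 110))     # moved across the line
```

Note this treats the line as infinitely long; a production version would also check that the crossing point lies within the drawn segment.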

The first line a vehicle crosses is its entry approach. The second line it crosses (if any) is its exit approach. The combination of entry and exit approach determines the movement classification: left turn, straight, right turn, or U-turn.
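With approaches labelled by compass direction, the movement can be read off from the relative positions of the entry and exit approaches. The mapping below is one reasonable convention for right-hand traffic, not necessarily TallyRoad's exact lookup table:

```python
ORDER = ["N", "E", "S", "W"]  # clockwise compass order

def classify(entry: str, exit: str) -> str:
    """Movement type from the (entry, exit) approach pair."""
    delta = (ORDER.index(exit) - ORDER.index(entry)) % 4
    return {0: "u_turn", 1: "left", 2: "straight", 3: "right"}[delta]

# A vehicle entering from the North approach (heading south):
classify("N", "S")  # exits via South -> straight
classify("N", "W")  # exits via West  -> right turn
```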


Step 6: Result aggregation

Once processing is complete, all the track records are aggregated. Vehicle counts are grouped by class and by time interval. The movement records (for TMC mode) are grouped by entry approach, movement type, and time interval.

The Peak Hour Factor (PHF) is calculated: the algorithm finds the one-hour window with the highest total volume, identifies the busiest single 15-minute interval within that window, and computes PHF = peak hour volume ÷ (4 × peak 15-minute volume).
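Given a list of 15-minute interval volumes, the formula above can be sketched directly: slide a four-interval window to find the peak hour, then divide by four times its busiest interval.

```python
def peak_hour_factor(volumes_15min):
    """PHF = peak hour volume / (4 * peak 15-minute volume within that hour)."""
    windows = [volumes_15min[i:i + 4] for i in range(len(volumes_15min) - 3)]
    peak_hour = max(windows, key=sum)
    return sum(peak_hour) / (4 * max(peak_hour))

# Six 15-minute intervals; the peak hour is intervals 2-5 (570 vehicles).
phf = peak_hour_factor([110, 130, 170, 150, 120, 100])
```

A PHF near 1.0 means traffic was evenly spread across the peak hour; lower values indicate a sharp 15-minute spike.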


Step 7: Chunked processing and stitching

For long videos (over 30 minutes), TallyRoad splits the video into overlapping chunks. Each chunk is processed independently. The overlap (30 seconds between chunks) allows the stitching algorithm to detect and remove duplicate tracks that span chunk boundaries.
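The chunk boundaries follow directly from the figures quoted above. A sketch, assuming a 30-minute chunk length with 30 seconds of overlap between consecutive chunks:

```python
CHUNK_S, OVERLAP_S = 30 * 60, 30

def chunk_spans(duration_s):
    """(start, end) times in seconds for each chunk of the video."""
    spans, start = [], 0
    while start < duration_s:
        end = min(start + CHUNK_S, duration_s)
        spans.append((start, end))
        if end == duration_s:
            break
        start = end - OVERLAP_S  # next chunk re-covers the last 30 s
    return spans

spans = chunk_spans(4000)  # a ~67-minute video splits into three chunks
```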

The stitched result is a single dataset as if the video had been processed in one pass.
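The deduplication idea can be sketched as follows: a track from one chunk that ends inside the overlap window, and a track from the next chunk that starts there at a nearby position, are treated as the same vehicle and only counted once. Field names and the distance threshold are illustrative, not TallyRoad's actual stitching logic.

```python
def stitch(tracks_a, tracks_b, overlap_start, overlap_end, max_dist=50):
    """Merge two chunks' tracks, dropping duplicates that span the boundary."""
    merged = list(tracks_a)
    for tb in tracks_b:
        duplicate = any(
            overlap_start <= ta["end_t"] <= overlap_end
            and overlap_start <= tb["start_t"] <= overlap_end
            and abs(ta["end_x"] - tb["start_x"]) <= max_dist
            and abs(ta["end_y"] - tb["start_y"]) <= max_dist
            for ta in tracks_a
        )
        if not duplicate:
            merged.append(tb)
    return merged
```

A fuller implementation would merge the two halves into one track record rather than dropping the second copy, but the counting effect is the same.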


Step 8: Report generation and storage

The final results are used to generate an Excel report with multiple sheets: Summary, Interval Counts, Turning Movements (if TMC mode), and Peak Hour Analysis. An annotated preview video is also generated for videos under 2 hours, with bounding boxes and track IDs drawn on each frame.

Both files are uploaded to R2 and made available via presigned download URLs. The original video is deleted from storage after processing completes to minimise storage costs.