Evaluation Metrics and Toolkit

In LaSOT, we conduct one-pass evaluation (OPE) to assess the performance of each tracker. Specifically, we use three metrics, Precision, Normalized Precision, and Success, to measure different tracking algorithms. The definitions of the three metrics can be found in the conference and journal papers.
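As an illustration of how these metrics are typically computed in OPE-style benchmarks, the sketch below (a minimal NumPy implementation, not the official toolkit) derives per-frame center errors and overlaps from `[x, y, w, h]` boxes; the 20-pixel precision threshold and the 21-point success AUC follow common benchmark conventions and are assumptions here.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are rows of [x, y, w, h]."""
    pc = pred[:, :2] + pred[:, 2:] / 2
    gc = gt[:, :2] + gt[:, 2:] / 2
    return np.linalg.norm(pc - gc, axis=1)

def norm_center_error(pred, gt):
    """Center error normalized by the ground-truth box size (basis of Normalized Precision)."""
    pc = pred[:, :2] + pred[:, 2:] / 2
    gc = gt[:, :2] + gt[:, 2:] / 2
    return np.linalg.norm((pc - gc) / np.maximum(gt[:, 2:], 1e-12), axis=1)

def iou(pred, gt):
    """Intersection-over-union of axis-aligned boxes [x, y, w, h]."""
    lt = np.maximum(pred[:, :2], gt[:, :2])
    rb = np.minimum(pred[:, :2] + pred[:, 2:], gt[:, :2] + gt[:, 2:])
    inter = np.prod(np.clip(rb - lt, 0, None), axis=1)
    union = np.prod(pred[:, 2:], axis=1) + np.prod(gt[:, 2:], axis=1) - inter
    return inter / np.maximum(union, 1e-12)

def precision(pred, gt, threshold=20):
    """Fraction of frames whose center error is within `threshold` pixels."""
    return (center_error(pred, gt) <= threshold).mean()

def success_auc(pred, gt):
    """Area under the success curve: mean success rate over IoU thresholds in [0, 1]."""
    thresholds = np.linspace(0, 1, 21)
    overlaps = iou(pred, gt)
    return np.mean([(overlaps >= t).mean() for t in thresholds])
```

For example, a perfect prediction gives Precision and Success of 1.0, while a box far from the target gives a Precision of 0 at the 20-pixel threshold.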

The evaluation toolkit (note: conference version) can be found here or on GitHub.

A new version of the evaluation toolkit (supporting both the conference and journal versions) with complete tracking results can be downloaded here (local), here (Google Drive), or here (Baidu Pan, pwd: 2020).

Evaluation Protocol

We define three protocols for evaluating trackers on LaSOT as follows:


  • Protocol I (no constraint): All 1,400 sequences in LaSOT (note: conference version) are employed for evaluation. Researchers are allowed to leverage any videos except for those in LaSOT for training their trackers.

  • Protocol II (full-overlap): Only the 280 sequences in the testing subset (note: conference version) of LaSOT are utilized for evaluation. Researchers are allowed to leverage the 1,120 sequences in the training set of LaSOT to develop their trackers. (Training/Testing split: Training Subset | Testing Subset)

  • Protocol III (one-shot): Only the 150 sequences in the newly collected extension subset (note: journal version) of LaSOT are utilized for evaluation. Researchers are allowed to leverage all 1,400 sequences in the training set of LaSOT to develop their trackers. (Training/Testing split: Training Subset | Testing Subset)
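The three protocols above can be summarized as data for use in an evaluation script. The structure and names below (`PROTOCOLS`, `evaluation_set_size`) are purely illustrative and not part of the official toolkit; the sequence counts come from the protocol definitions.

```python
# Hypothetical summary of the LaSOT evaluation protocols; names are illustrative.
PROTOCOLS = {
    "I": {
        "training_data": "any videos except those in LaSOT",
        "eval_sequences": 1400,  # all sequences (conference version)
    },
    "II": {
        "training_data": "LaSOT training subset (1,120 sequences)",
        "eval_sequences": 280,   # testing subset (conference version)
    },
    "III": {
        "training_data": "all 1,400 LaSOT sequences",
        "eval_sequences": 150,   # extension subset (journal version)
    },
}

def evaluation_set_size(protocol: str) -> int:
    """Number of sequences evaluated under the given protocol."""
    return PROTOCOLS[protocol]["eval_sequences"]
```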

Evaluated Trackers

We assess 48 popular tracking algorithms on LaSOT under Protocols I, II, and III (see their definitions above). These trackers include deep learning based ones, correlation filter based ones with hand-crafted or deep features, sparse representation based ones, and other representatives. Table 1 shows these trackers.

Table 1. Description of each tracking algorithm in detail (columns: Tracker, Paper, Where, When, Speed, Code).

Note: Each tracker is used as-is from the authors' implementation, without any modification.

Evaluation Results

The following plots present the evaluation results of the tracking algorithms under the three protocols using the three metrics. Click on each image to zoom in for a better view.

Plots of Protocol I (i.e., evaluation on all 1,400 videos)

Plots of Protocol II (i.e., evaluation on 280 videos in testing set)

Plots of Protocol III (i.e., evaluation on 150 videos in extension subset)
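The success plots above rank trackers by the area under their success curve. A minimal sketch of how such a ranking can be produced from per-frame overlaps is shown below; the 21 evenly spaced thresholds follow common benchmark convention, and the function names are illustrative, not the official toolkit's API.

```python
import numpy as np

def success_curve(overlaps, thresholds=None):
    """Success rate (fraction of frames with IoU >= t) at each overlap threshold."""
    if thresholds is None:
        thresholds = np.linspace(0, 1, 21)
    overlaps = np.asarray(overlaps, dtype=float)
    return np.array([(overlaps >= t).mean() for t in thresholds])

def rank_by_success(per_tracker_overlaps):
    """Rank trackers by the area under their success curve (higher is better)."""
    scores = {name: float(success_curve(ov).mean())
              for name, ov in per_tracker_overlaps.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Plotting the curve of each tracker over the threshold axis, with the AUC score in the legend, reproduces the familiar success-plot layout.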