How do you choose between FRC rating models? We compare several models on three characteristics: predictive power, interpretability, and accessibility.
Given the importance of match strategy and alliance selection, several models have been developed that attempt to quantify an FRC team's contribution to match outcomes. We consider a wins baseline, the popular OPR and Elo rating systems, and the new Expected Points Added (EPA) model deployed on Statbotics. A brief summary of each model is included below.
When choosing between FRC rating models, multiple factors play a role. In particular, we consider the following three: predictive power, interpretability, and accessibility.
We evaluate predictive power by comparing each model's predictions to the actual match outcomes. Accuracy is the percentage of matches whose winner the model predicted correctly. The Brier Score is the mean squared error between the predicted win probability and the actual outcome (1 for a win, 0 for a loss), so it also reflects calibration and reliability. A Brier Score of 0 indicates perfect predictions, and a Brier Score of 0.25 indicates no skill (equivalent to predicting 50% for every match). Going back to 2002, we evaluate models on 160,000 matches, with special emphasis on 15,000 champs matches and the 80,000 recent matches since 2016.
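As a minimal sketch of the two metrics described above, the snippet below computes accuracy and Brier Score from a list of predicted red-alliance win probabilities. The function names, the toy data, and the choice to count a 50% prediction as a pick for red are assumptions made for illustration, not the Statbotics implementation.

```python
def accuracy(probs, outcomes):
    """Fraction of matches where the favored alliance actually won.

    probs: predicted probability that red wins (0.0 to 1.0)
    outcomes: 1 if red won, 0 if blue won (ties are assumed to be excluded)
    """
    correct = sum((p >= 0.5) == bool(y) for p, y in zip(probs, outcomes))
    return correct / len(outcomes)


def brier_score(probs, outcomes):
    """Mean squared error between predicted probability and actual outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


# Toy data (hypothetical): four matches.
probs = [0.9, 0.6, 0.3, 0.5]
outcomes = [1, 0, 0, 1]

print(f"accuracy = {accuracy(probs, outcomes):.2f}")     # 3 of 4 correct
print(f"brier    = {brier_score(probs, outcomes):.4f}")

# Always predicting 50% gives the "no skill" Brier Score of 0.25.
print(brier_score([0.5] * 4, outcomes))
```

Note that a confident wrong prediction (0.6 for a loss) is penalized more heavily by the Brier Score than an uncertain one, which accuracy alone cannot capture.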
Simply predicting "Red Alliance" each match results in an accuracy of 50%. To approximate the accuracy of an idle spectator, we consider a simple wins baseline that predicts the winner based on the alliance with the most combined wins. To evaluate the predictive power of a model, we compare its accuracy to the accuracy of the wins baseline. This reflects the extent to which the model is able to predict match outcomes beyond the eye test.
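The wins baseline can be sketched in a few lines. This version assumes matches are processed in chronological order, that each match is a `(red_teams, blue_teams, winner)` tuple, and that equal win totals default to a red prediction; the data layout, tie-breaking rule, and team numbers are all hypothetical.

```python
from collections import defaultdict


def wins_baseline_accuracy(matches):
    """Predict each match from combined prior wins, then update the win counts.

    matches: iterable of (red_teams, blue_teams, winner), winner in {"red", "blue"}.
    """
    wins = defaultdict(int)  # team number -> wins so far
    correct = 0
    total = 0
    for red, blue, winner in matches:
        red_wins = sum(wins[team] for team in red)
        blue_wins = sum(wins[team] for team in blue)
        # Assumption: equal win totals default to predicting red.
        predicted = "red" if red_wins >= blue_wins else "blue"
        correct += predicted == winner
        total += 1
        # Only update after predicting, so no match sees its own result.
        for team in (red if winner == "red" else blue):
            wins[team] += 1
    return correct / total


# Toy schedule with made-up team numbers.
matches = [
    ((1, 2, 3), (4, 5, 6), "red"),
    ((1, 4, 5), (2, 3, 6), "blue"),
    ((4, 5, 6), (1, 2, 3), "red"),
]
print(f"{wins_baseline_accuracy(matches):.3f}")
```

Updating the counts only after each prediction keeps the baseline honest: it never uses a match's own result to predict that match.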
The table below can be customized to include and exclude models, metrics, and years.
| Years | Wins Baseline | OPR | Caleb Sykes Elo | EPA |
|---|---|---|---|---|
| 2016 - 2022 | 65.80% | 70.13% | 70.85% | 72.04% |
The Wins Baseline predicts match outcomes with 65% accuracy on average. In comparison, the OPR model reaches 68%, the Elo model 69%, and the EPA model 70%. Since 2016, these numbers are 66% (Wins), 70% (OPR), 71% (Elo), and 72% (EPA). While the EPA model outperforms the OPR/Elo models by only 1-2 percentage points in absolute terms, its improvement over the wins baseline is ~20% larger than that of the OPR/Elo models. The EPA model is the best performing model in 15 of the 20 years, including six of the last seven. The one exception is 2018, where the EPA model struggles somewhat with the nonlinear scoring system.
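To make the "relative to the baseline" comparison concrete, here is a short worked calculation using the 2016 - 2022 accuracies quoted above. The variable names are illustrative. On these numbers, EPA's gain over the baseline comes out roughly 24% larger than Elo's and roughly 44% larger than OPR's, so the ~20% figure is best read as the comparison against the stronger of the two.

```python
# Accuracy (%) for 2016 - 2022, taken from the figures quoted in the text.
accuracy_pct = {"Wins": 65.80, "OPR": 70.13, "Elo": 70.85, "EPA": 72.04}

baseline = accuracy_pct["Wins"]

# Percentage-point gain of each model over the wins baseline.
gain = {m: a - baseline for m, a in accuracy_pct.items() if m != "Wins"}

# How much larger EPA's gain is than each other model's gain.
relative_gain = {m: gain["EPA"] / gain[m] - 1 for m in ("OPR", "Elo")}

for model, points in gain.items():
    print(f"{model}: +{points:.2f} points over baseline")
for model, rel in relative_gain.items():
    print(f"EPA gain vs {model}: {rel:+.1%}")
```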
There are two caveats regarding the EPA model's improved performance. First, while we compare EPA to Elo and OPR individually, Statbotics previously used a combination of both, which is more accurate than either model alone. Still, the EPA model on its own outperforms this ensemble, and future EPA iterations can ensemble with other models to reach even higher performance. Second, while the EPA model significantly outperforms other models during the season, this does not translate to champs. EPA stabilizes to an accurate prediction faster during the season, but by champs, Elo has caught up and is roughly equivalent to EPA.
We evaluate interpretability by considering the units of the model and the ability to separate the model into components. The OPR and EPA models are in point units, and can be separated into auto, teleop, and endgame components. Elo is in arbitrary units (1500 mean, ~2000 max) and cannot be separated into components. One benefit of the Elo model is that ratings can be roughly compared across years. Using normalized EPA, we can compare EPA ratings across years as well (blog post coming soon). In summary, the EPA model combines the best of both worlds: point units, separable into components, and comparable across years.
We evaluate accessibility by considering the availability of the model. Statbotics previously included the OPR and Elo models, but has now transitioned to the EPA model. The Blue Alliance (TBA) calculates OPR and its own TBA Insights model. Caleb Sykes publishes a comprehensive scouting database in Excel. Statbotics and TBA have APIs that allow for integration with external projects. Each model has distribution channels that are more or less equally accessible to teams.
The EPA model is the best performing model in 15 of the 20 years, including six of the last seven. The EPA model is the most interpretable model, with point units, separable components, and year-normalized ratings. Finally, the EPA model is highly accessible, available on the Statbotics website and through a Python API. In summary, we highly recommend the EPA model for teams and scouting systems. Please reach out to us if you have any questions or feedback.