The Soccer World Cup has arrived, and mathematics is being used to analyze potential results. The Spanish newspaper El Pais combined current statistics with a parametric model, based on Poisson regression and uncertainty analysis, to simulate potential results and to estimate each team's probability of winning individual games and of becoming the new world champion. The results appear at the end of the post. Much more important than the results, however, is the methodology behind the model. This post focuses on the mathematics and gives a short summary of the model. Further statistical and mathematical questions about the model may be discussed later.

The model can be described in three parts: estimating the strength of each team, simulating individual games, and simulating the whole tournament.

**Strength of team**

Some teams (Brazil, Argentina or Germany) are stronger than others (Panama, Egypt or Saudi Arabia). Measuring this difference is harder than usual here, because national teams do not play many games together, so individual players such as Neymar may be vital to winning some games. Team strength was calculated with the well-known Elo Rating System, which estimates the relative skill of each participant from its results against other rated participants. Although originally developed for chess players, the Elo Rating System has been applied successfully to several sports. The El Pais model used three different Elo ratings: one for the players, one for the teams, and one based on the goals scored by each team.
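To make the Elo idea concrete, here is a minimal sketch of the standard Elo expected-score and update formulas. El Pais does not publish the details of its three ratings, so the `k` factor, the 400-point scale, and the function names below are illustrative assumptions, not the newspaper's implementation.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return the new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k (assumed value) controls how strongly a single result moves the ratings.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


# Hypothetical ratings: a 400-point favourite is expected to score about 0.91.
print(round(expected_score(1900, 1500), 3))
```

An upset moves the ratings more than an expected result, because the update is proportional to the gap between the actual and the expected score.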

**Simulating individual games**

Each individual game is simulated by modelling the probability of the number of goals scored by each team. This technique is based on a Poisson regression model proposed by Dixon and Coles (1995). From these goal probabilities, the model calculates the probability of victory. The model was calibrated on more than 17000 games, and the calibration allowed for different performance in home games, away games and games on neutral ground.
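As a rough illustration of how goal probabilities turn into outcome probabilities, the sketch below treats the two teams' goals as independent Poisson variables and sums over the score grid. This is a simplification: it omits the regression that would produce the scoring rates and any adjustment for low scores, and the rates `lam_a` and `lam_b` are hypothetical inputs rather than values from El Pais.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_a, lam_b, max_goals=10):
    """Return (P(A wins), P(draw), P(B wins)) assuming independent Poisson goals.

    Scores above max_goals are ignored and the result is renormalised,
    which is a negligible approximation for realistic scoring rates.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            p = poisson_pmf(goals_a, lam_a) * poisson_pmf(goals_b, lam_b)
            if goals_a > goals_b:
                win_a += p
            elif goals_a == goals_b:
                draw += p
            else:
                win_b += p
    total = win_a + draw + win_b
    return win_a / total, draw / total, win_b / total


# Hypothetical rates: team A expects 1.8 goals, team B expects 0.9.
print(outcome_probabilities(1.8, 0.9))
```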

The calibrated model was evaluated based on the Rank Probability Score proposed by Constantinou and Fenton (2012).
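For readers unfamiliar with this score, here is a minimal sketch of the Rank Probability Score for a single match, using its usual definition over ordered outcomes (for example home win, draw, away win). Lower is better, and 0 means a perfect forecast. The function name and the example probabilities are illustrative assumptions.

```python
def rank_probability_score(probs, outcome_index):
    """Rank Probability Score of one forecast.

    probs: forecast probabilities for the ordered outcomes, summing to 1.
    outcome_index: position of the outcome that actually happened.
    """
    n_outcomes = len(probs)
    cumulative_forecast = 0.0
    cumulative_observed = 0.0
    total = 0.0
    # Compare cumulative distributions over the first n_outcomes - 1 categories.
    for i in range(n_outcomes - 1):
        cumulative_forecast += probs[i]
        cumulative_observed += 1.0 if i == outcome_index else 0.0
        total += (cumulative_forecast - cumulative_observed) ** 2
    return total / (n_outcomes - 1)


# Hypothetical forecast (home win, draw, away win) for a match the home team won.
print(rank_probability_score([0.6, 0.25, 0.15], 0))
```

Because the score works on cumulative probabilities, a forecast that leaned towards a draw is penalised less for a home win than one that leaned towards an away win.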

Image. Calibration of the model (Source: El Pais)

**Simulating the whole tournament**

The previous step simulates not only the winner of each game but also the number of goals. This matters for simulating the whole tournament: from the simulated goals, the model can predict which teams finish first and second in each group and, hence, the matches of the following stages. The last two steps were repeated 10 000 times (10 000 iterations) to account for the different sources of uncertainty. Although no details are given about the probability rules used to generate each iteration, this step is important because it lets the model estimate each team's probability of winning a specific game, of advancing to the next stage, and of becoming the new champion.
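The repeated simulation can be sketched for a single group as follows. This is only an illustration under strong assumptions: each team has one fixed, hypothetical scoring rate that does not depend on the opponent, goals are sampled from independent Poisson distributions, and ties are broken by goal difference, goals scored, and then at random. The team names and rates are invented, and El Pais may use different rules.

```python
import math
import random
from itertools import combinations


def sample_poisson(lam, rng):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_group(rates, rng):
    """Play one round robin and return the teams ranked from first to last."""
    table = {team: [0, 0, 0] for team in rates}  # points, goal difference, goals scored
    for a, b in combinations(rates, 2):
        goals_a = sample_poisson(rates[a], rng)
        goals_b = sample_poisson(rates[b], rng)
        table[a][1] += goals_a - goals_b
        table[b][1] += goals_b - goals_a
        table[a][2] += goals_a
        table[b][2] += goals_b
        if goals_a > goals_b:
            table[a][0] += 3
        elif goals_b > goals_a:
            table[b][0] += 3
        else:
            table[a][0] += 1
            table[b][0] += 1
    # The random last key breaks any remaining ties (an assumed rule).
    return sorted(rates, key=lambda t: (*table[t], rng.random()), reverse=True)


def qualification_probabilities(rates, n_iter=10_000, seed=0):
    """Estimate each team's probability of finishing in the top two."""
    rng = random.Random(seed)
    counts = dict.fromkeys(rates, 0)
    for _ in range(n_iter):
        for team in simulate_group(rates, rng)[:2]:
            counts[team] += 1
    return {team: count / n_iter for team, count in counts.items()}


# Hypothetical group: expected goals per game for four invented teams.
group = {"Team A": 2.0, "Team B": 1.4, "Team C": 1.1, "Team D": 0.8}
print(qualification_probabilities(group))
```

Extending this to a full tournament would mean feeding each simulated group's top two into the knockout bracket within the same iteration, then counting how often each team reaches each stage.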

The following image shows the results for the best teams. You can visit the whole table.

Image. Simulation result of the best teams