Bundesliga Data Shootout: Machine Learning competition on football data
17 May 2023 – Today, most key events in Bundesliga and Bundesliga 2 matches are captured in a time-consuming manual process that involves several people and a multi-step sequence. In a Kaggle competition titled Bundesliga Data Shootout, more than 500 participating software developer teams submitted their concepts for reliably recognising match events using Artificial Intelligence (AI). Kaggle is an online platform focused on data analysis and Machine Learning. The Bundesliga Data Shootout was initiated by the DFL and its subsidiary Sportec Solutions AG (STS), which specialises in processing match data.
“We collect comprehensive data from all Bundesliga and Bundesliga 2 matches and broadcast it in real time with the help of our technology partner, Amazon Web Services. The clubs and our media partners can utilise this data in many different ways,” explains Dr. Hendrik Weber, Director – Sport Technology and Innovation at DFL. “The data provides valuable tactical insights for sport-related purposes such as match analysis, training and individual player coaching. But it also forms the basis for many of the DFL’s own activities and business areas. This is why we are exploring how the current manual process of capturing event data could be automated, at least partially and, if possible, in a scalable way. This would not only create synergies for our own competitions but also hold potential benefits for other leagues and competitions.”
Current technology can already capture player and ball position data in a highly automated process, but this does not yet apply to event data, points out Mirko Janetzke, Senior Vice President Germany at STS, describing the specific challenge addressed by the Data Shootout competition: “We launched a Kaggle competition in search of self-learning software models that can reliably identify three types of match events in selected video recordings from Bundesliga matches and time-stamp them properly: throw-ins, passes and tackles.”
The winners of the Bundesliga Data Shootout, a team of three consisting of Dr. Philipp Singer, Pascal Pfeiffer and Yauhen Babakhin, had participated in similar competitions in the past and earned top international Kaggle rankings with their developments. “All three of us are football fans,” says Pfeiffer, explaining what motivated the winning team to join. “That alone was enough to make us eager for this competition. We also love challenges in the computer vision domain, which deals with the digital interpretation of images. In this case, the requirement to match the identified events on the pitch to the match clock with to-the-second accuracy added a complexity to the task that appealed to us. We saw an opportunity to demonstrate and further develop our skills.”
Babakhin, Pfeiffer and Singer are all AI professionals. One of their working principles is to avoid reinventing the wheel whenever possible and to skip anything that could overcomplicate solving the task. “We always start very simply,” says Pfeiffer. “We make sure we build models that generalise well, being careful not to get lost in details. Since we prefer lean end-to-end solutions, we used only the original video frames as input, without any manual modification, to achieve the desired insights.” Using greyscale instead of colour frames significantly reduced the volume of data to be processed and sped up processing.
“All in all, the quality of the more than 500 submissions was quite extraordinary,” says Weber. “But judging by the selection criteria, the winning solution was the most accurate model. We were impressed by the lean concept and the on-target accuracy of the results.” Janetzke adds: “This solution wasn’t nearly as complex as we had expected. That showed in the relatively short execution time. For the competition, we specified that submitted models had to be able to fully evaluate nine match sequences of 30 minutes each within a total processing time of nine hours. This included submitting a list of the captured events per match, the relevant time stamps, and an evaluation of those results. The winning solution accomplished all that in five hours.” This means that the core purpose of this development competition has been achieved, says Weber. “The winning solution may not be production-ready, but it can now serve as a benchmark for what can realistically be achieved. We at DFL will build upon it and adapt it for practical use,” he explains.
Luccas Roznowicz, Head of Digital Innovations at DFL GmbH, agrees that the result of the competition confirms the DFL’s innovation strategy: “The DFL Group benefits from challenges such as this one because they provide us with access to external know-how that can enhance our work. At the same time, we create an attractive and unique platform for all participating teams to demonstrate their innovative capabilities. Open innovation is thus a win-win situation for everyone involved.” The three top-ranking teams in the Bundesliga Data Shootout each received a prize of USD 25,000, co-funded by Amazon Web Services (AWS).
This competition has lent new impetus to the discussion about AI-based match event recognition at DFL and Sportec Solutions. A clearer picture is gradually emerging of what a future AI-based solution for reliably extracting match event data could look like. The plan is for such a solution to combine raw data from automated match event recognition with the position data available today, says Janetzke.
As for the winners, the outcome of the competition is certainly a fine achievement. “First place is always special,” says Pascal Pfeiffer with a smile. “And the prospect that our approach might at some point be put to practical use makes it all the more thrilling.”