Using ArcGIS for sports analytics

The statistical component of sport has always provided a fascinating way to analyze performance and success. This might simply be the final score, but for some sports, such as football, baseball, cricket, golf and tennis, meaningful analysis of every facet of the game and a player or team’s actions is part of the essence of the game itself. It is as common to see statistics and graphical summaries of the action reported as it is to see the action itself and this provides a fascinating insight into strategy as well as an explanation of outcome. In this blog entry we explore the results of the London Olympics Gold Medal tennis match between Roger Federer and Andy Murray to show how you can use GIS to identify particular patterns within the match that may not have been exposed by using traditional non-geographical analysis and display techniques.

Created using ArcGIS, figure 1 shows where each player played a winning shot and how they moved during every point of the gold medal match.

Figure 1: An infographic showing the player movement and winning shot positions from the Olympic Gold Medal Match between Roger Federer and Andy Murray.

Whilst figure 1 certainly carries a lot of visual impact it doesn’t actually tell us a whole lot. The player movement lines overlap one another and make it hard to distinguish which line relates to which point. We cannot tell the direction of movement in many cases because there are no directional arrows. The infographic also doesn’t show where the winning stroke landed, or the direction of the shot. It also fails to show the temporal component of the match.

Figure 2: The complete data set from the Olympic Gold Medal Match. 1708 point locations were collected from the three-set match.

Capturing the data

For the study we captured the tennis match data using ArcScene 10.1 and video footage of the match (see figure 3). We built a court at a scale of 1:1 in its correct geographic location (Centre Court at Wimbledon) and were able to quickly capture the location of each player’s stroke and corresponding ball bounce for the match entirely from the video footage. At each location we collected a set of key attributes such as who played the stroke, what type of stroke it was, the stroke number, point number, game number, set number and who was serving. The data captured provides a statistical summary of every shot in the match.

Figure 3: Video footage of the match in ArcScene. The red dots represent the player’s stroke position and ball bounce. The green lines represent the direction of ball travel for each shot.

By using ArcScene we were able to plot the players’ positions and ball bounces to within ±20 cm using the 3D editing tools. We approximated the camera angle of the video footage and set our data view to match. This made the data capture process faster and more accurate than in a 2D environment, because we were able to continuously match the changing camera view in the video by using the Navigate Scene control in ArcScene. This also helped us counter the scale distortion in the camera view when capturing points at the end furthest from the camera.

Once all of the point data was captured, we used the XY To Line tool to create connectivity between the points using the shot, point, game and set number attributes. The lines are instrumental in allowing us to visualize stroke patterns (as you will see later in the blog entry). We ran the same XY To Line process to create player movement lines.
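To make the connectivity step more concrete, here is a minimal sketch in plain Python of how stroke points might be paired into line segments before any line-building tool is run. It is not the workflow we used in ArcGIS; the field names (set, game, point, stroke, x, y) and the coordinates are hypothetical and purely illustrative. The idea is simply to group points by set, game and point number, order them by stroke number, and join each point to the next one.

```python
from collections import defaultdict

# Hypothetical records: field names and coordinates are illustrative only,
# not values from the actual match data set.
strokes = [
    {"set": 1, "game": 1, "point": 1, "stroke": 1, "x": 0.0, "y": -11.9},
    {"set": 1, "game": 1, "point": 1, "stroke": 2, "x": 2.5, "y": 10.4},
    {"set": 1, "game": 1, "point": 1, "stroke": 3, "x": -3.1, "y": -9.8},
    {"set": 1, "game": 1, "point": 2, "stroke": 1, "x": 1.0, "y": -11.9},
    {"set": 1, "game": 1, "point": 2, "stroke": 2, "x": -2.0, "y": 11.2},
]


def build_segments(records):
    """Pair consecutive strokes within each rally into start/end segments."""
    rallies = defaultdict(list)
    for rec in records:
        # A rally is identified by its set, game and point numbers.
        rallies[(rec["set"], rec["game"], rec["point"])].append(rec)

    segments = []
    for key in sorted(rallies):
        # Order the strokes within the rally by stroke number.
        ordered = sorted(rallies[key], key=lambda r: r["stroke"])
        # Join each stroke to the one that follows it.
        for start, end in zip(ordered, ordered[1:]):
            segments.append({
                "rally": key,
                "from_stroke": start["stroke"],
                "start": (start["x"], start["y"]),
                "end": (end["x"], end["y"]),
            })
    return segments


segments = build_segments(strokes)
for seg in segments:
    print(seg)
```

Each resulting row holds a start and an end coordinate pair, which is the kind of table a line-building tool can turn into line features. Segments never cross from one rally into the next because the grouping happens first.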

Visualising the data

Statistics from the match tell us that Andy Murray made a total of 18 winners to Roger Federer’s 13. What these statistics don’t tell us is where those winners occurred, the stroke of each winner, when the winner occurred and what led to the winning shot occurring. They also fail to show us any potential stroke patterns during the match. By capturing and storing all of the match data in a file geodatabase (figure 4) we are able to take advantage of the geo-location of these winners and create some interesting visualizations to tell a far more interesting story than single snapshots allow.

Figure 4: Using a file geodatabase to store sports data in ArcGIS

One of the challenges in dealing with sports data is that there are many instances of similar events occurring at the same or similar locations over relatively short periods of time. This often results in very tight clusters of points over very small areas of your court, pitch or field. If your data has an element of connectivity, you will additionally have overlapping lines along similar bearings and distances, or lines that run in completely random directions, depending on the type of sport you are analyzing. This presents an interesting challenge: how to represent and compare this information meaningfully.

One way to make sense of so many overlapping points and lines is to use a visualization technique (often promoted by Edward Tufte) called Small Multiples (see figure 5). Small multiples use a series of common basemaps (in our case a tennis court) with different slices of data on top of each map. The maps are arranged in a logical sequence, much like animated movie frames. Small multiples are useful to disaggregate your data, reducing the visual complexity and quantity of information so that it can more easily be seen and interpreted.

Figure 5: Andy Murray’s winning three shot sequence visualized using small multiples. The green lines represent the forehand winning strokes and the blue lines, the backhand winning strokes.

Figure 5 allows us to very quickly see some important patterns from the match that were not visible using traditional tabular statistics. The most immediate pattern observed is the direction of each winning shot (half of Murray’s backhands were down-the-line winners). You can also quickly identify where the player made the winning shot (half of Murray’s shots were made deep inside the court, near or around the service line) and the type of shot that was played (Murray’s ratio of forehand to backhand winners was 10 to 8). Temporally, we can see that 7 of Murray’s winners were made on game point, either for or against him. Figure 6 explains how much information each small multiple carries and, therefore, the potential for recognizing patterns across a game or match.

Figure 6: An explanation of the variables being mapped in the small multiples matrix

Each individual image presents a second level of visual information that is likely to suit coaches, players or die-hard fans who want to know a little more about the game’s pattern of play than your average tennis fan or someone scanning the morning news. We have added some important temporal labels to the images to help users identify when the winning shot occurred, and we have varied the colour and lineweight of lines in each image to reflect a level of importance and distinguish between line classes. Each stroke location is dynamically labeled from the stroke field in our file geodatabase, as is the sequence number. The player movement lines show us where the player has run from to make the winning shot. In 6 of Andy Murray’s winners, he moved a considerable distance across the court to make the winning shot. The player movement lines also allow us to see the previous one or two strokes without actually showing the stroke lines on the map.

You will notice we are only showing the two shots prior to the winning shot being made. We are mapping the ‘set-up’ stroke (point 1), the opponent’s returning stroke (point 2) and the winning stroke (point 3). Showing more than two lead-up strokes prior to the winning shot can confuse and distract the user (figure 7).
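The three-shot selection can be sketched in a few lines of Python. This is only an illustration under assumed names: the stroke field and the sample rally below are hypothetical, and the function is not part of any ArcGIS tool. It keeps the winning stroke plus a chosen number of lead-up strokes and drops everything earlier in the rally.

```python
def winning_sequence(rally, lead_up=2):
    """Return the last `lead_up` strokes plus the final (winning) stroke.

    `rally` is a list of dicts with a hypothetical "stroke" field holding
    the stroke number within the point.
    """
    ordered = sorted(rally, key=lambda r: r["stroke"])
    # Slicing from the end copes with rallies shorter than lead_up + 1.
    return ordered[-(lead_up + 1):]


# Illustrative rally of five strokes, deliberately out of order.
rally = [{"stroke": n} for n in (3, 1, 5, 2, 4)]

sequence = winning_sequence(rally)
print([s["stroke"] for s in sequence])
```

Raising lead_up shows more of the rally, which is an easy way to experiment with how much lead-up a reader can absorb before the image becomes cluttered.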

Figure 7: The image on the left displays all of the strokes (14 in total) leading up to the 4th winning shot. The image on the right displays only two shots leading up to the winning shot.

Some generalization is needed to ensure you don’t overwhelm the user with information. Finding the correct balance of generalization is one aspect of the research that we are continuing to explore. Determining how many events, and what type of events, led to a particular event happening is complex and prone to error, so it is vital that erroneous assumptions aren’t introduced during generalization.

In order for the small multiples to work better in sequence we rotated the data frame of each image using the Data Frame tools in ArcGIS. This allowed us to map all of Murray’s shots from one end and Federer’s from the other, which enabled clearer patterns in the match to be seen. Whilst it was suitable in this instance to shift all of a player’s strokes to one end for visualization, in some cases this might not be suitable if, for instance, there were particular weather conditions that made play at one end more challenging. In this situation, being able to assess how different players react to different conditions might be an important component of the pattern of the match itself.
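We rotated the display rather than the data, but the same effect can be pictured as a coordinate transform. The sketch below is an assumption-laden illustration, not the Data Frame rotation itself: it assumes a local coordinate system with the origin at the centre of the net and the y axis running along the length of the court, and the function names are hypothetical. Rotating a rally by 180 degrees about the court centre moves a player from the far end to the near end while keeping the shape of the rally intact.

```python
def rotate_180(point, centre=(0.0, 0.0)):
    """Rotate a single (x, y) point by 180 degrees about the court centre."""
    cx, cy = centre
    x, y = point
    return (2 * cx - x, 2 * cy - y)


def normalise_rally(points, hitter_index, centre=(0.0, 0.0)):
    """Rotate a whole rally so the chosen hitter stands at the near end.

    The near end is assumed to be the half of the court where y is less
    than or equal to the centre's y value.
    """
    _, cy = centre
    if points[hitter_index][1] <= cy:
        # Hitter is already at the near end, so leave the rally unchanged.
        return list(points)
    return [rotate_180(p, centre) for p in points]


# Illustrative coordinates only: the hitter (index 0) starts at the far end.
rally = [(1.0, 10.0), (-2.0, -9.0)]
print(normalise_rally(rally, hitter_index=0))
```

Because every point in the rally is rotated together, shot directions relative to the player are preserved. If conditions at one end matter to the analysis, the original coordinates should be kept alongside the rotated ones.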

Having already explored Murray’s winning shot sequence, let’s take a quick look at Federer’s three stroke winning pattern in figure 8, below.

Figure 8: Roger Federer’s winning three shot sequence. The green lines represent the forehand winning stroke and the blue lines, the backhand winning stroke.

Federer made only two winners on his backhand side (indicated by the blue lines), and 10 of his 13 winners came as a direct result of moving his opponent off the court with a wide serve, leaving an open court for Federer to hit an easy winner into. His two backhand winners were both struck with little or no room for error. These two shots could have easily missed the mark, leaving Federer only 11 winners from 3 sets of tennis, all from the forehand side. Five of Federer’s 13 winners came at game point, either for or against him.

The small multiple format was perfect for this type of analysis. We were able to present a series of events over time in a logical, clear and concise manner. The two examples of gameplay explored in this blog entry show how powerful representing the results of sports data in a graphic form can be using GIS. By glancing at the images you take more away from the data than you would by simply seeing the totals of each winner in tabular form. By exploring them in detail we are able to reveal dimensions in the points, games and match that are simply impossible to gauge from other approaches. We are currently working on ways to animate particular scenes and looking into applications that serve the data up in an online environment giving users the ability to query the map for themselves and run their own analysis on the data.

Sports analytics is a growing field, but one that is currently less frequented in the world of GIS. Some of the world’s largest sporting organizations, like Manchester City, Adidas and Nike, and leagues like the EPL, NBA and AFL are capturing every movement their players make and recording their actions. The challenge is to understand the best way to present this data to the players, coaches, media and fans.