Around the world in 80 days, talking maps and tennis!

For those of you who follow my Twitter feed, you will have noticed that I’ve been traveling quite a bit lately. It’s been a busy summer, to say the least. So what’s been going on at GameSetMap? Plenty, in fact…

During August I took a trip back to Australia and presented to a group of students from the School of Mathematical and Geospatial Science at my former university, RMIT. It’s always great to revisit the place where the seeds of my career were planted. And of course it’s rewarding to share your work with students and give them a taste of what’s possible with their geospatial knowledge. Thanks to Gita and Lucas for having me!

RMIT University

Back where it all started at RMIT University, Melbourne.

After Australia I set off for Dresden, Germany, where I presented my tennis work at the 26th International Cartographic Conference (ICC). The conference is the premier biennial global cartographic and geospatial meeting, attracting over 1,300 delegates. I presented my work in the 4D Cartography stream. Much of sports analytics is performed across four dimensions, space (x, y, z) and time, so this was a perfect slot for my spatio-temporal tennis analysis. It was great to see a packed house for my talk; it certainly raised a few eyebrows!

Google Glass and Tennis

Talking about wearable technology and Google Glass for tennis at ICC in Dresden.

After Germany I was invited to talk at the IE Sports Analytics Innovation Summit in Boston, MA. There were some big names on the program from Manchester United, the New York Knicks, the NFL, Nike and Adidas, so it was humbling to share the stage with some of the big guns of the sports world. Much of the talk at the conference was about data, in particular geo-sports data: what to do with it, how to make sense of it, and so on. This area is about to blow up big time!

The slides from the conference can be viewed below:


Along the way I met so many great people and collected many new ideas! My mind is buzzing with endless possibilities. Over the next few months you’ll start to see these ideas come to fruition on GameSetMap.com. Stay tuned…

A 3D Lesson in Clutch Point Serving by S. Stakhovsky

The story from week one at Wimbledon was the exit of so many big-name players, either through defeat or injury. Rafael Nadal and Maria Sharapova were both forced to pack their bags and head home much earlier than they would have liked. So was the reigning champion, Roger Federer.

Sergiy Stakhovsky played out of his skin against the Swiss maestro, putting on a clinic of clutch-point serving throughout the match. He backed up his serving with sublime touch at the net, winning the match 6-7, 7-6, 7-5, 7-6 in just under 3 hours.

To celebrate Sergiy’s win I’ve prepared a unique 3D tennis visualization that invites you to step onto Centre Court at Wimbledon and see how Sergiy bundled out the seven-time Wimbledon champion Roger Federer in the second round.

3D Interactive Tennis Visualization

Click here to open the 3D application. (Best viewed in Google Chrome on a desktop machine). 

Sergiy served almost exclusively to Federer’s backhand at important points (37 out of 43, 86%). On 4 occasions he went to Federer’s forehand side, and of those 4 serves he aced him twice! In the deuce court he went straight at Federer’s body two times, succeeding on one of them.

When Federer was able to return Sergiy’s serve into play (shown by the white lines on the map), he won 9 of the 22 points (41%), while Sergiy won 13 of 22 (59%).

The visualization only includes serves at 15-30, 30-30, 15-40, 30-40 and 40-Ad, plus all of Sergiy’s serves during each tiebreak.
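As an illustration of that selection rule, here is a minimal sketch in Python. It assumes a simple per-serve record. The `Serve` class, its field names and the outcome codes are hypothetical and do not come from the data behind the visualization. Scores are written with the server's score first.

```python
from dataclasses import dataclass


# Hypothetical serve record; field names are illustrative only.
@dataclass
class Serve:
    server: str
    score: str          # point score before the serve, server first, e.g. "30-40"
    in_tiebreak: bool
    outcome: str        # hypothetical codes: "ace", "return_error" or "in_play"


# Pressure scores listed above (server's score first).
PRESSURE_SCORES = {"15-30", "30-30", "15-40", "30-40", "40-Ad"}


def pressure_serves(serves, server):
    """Keep one player's serves at the listed scores, plus all of his tiebreak serves."""
    return [
        s for s in serves
        if s.server == server and (s.in_tiebreak or s.score in PRESSURE_SCORES)
    ]


# Usage with made-up records: only the 30-40 serve and the tiebreak serve survive.
serves = [
    Serve("Stakhovsky", "30-40", False, "ace"),
    Serve("Stakhovsky", "40-15", False, "in_play"),
    Serve("Stakhovsky", "3-2", True, "return_error"),
    Serve("Federer", "30-40", False, "in_play"),
]
selected = pressure_serves(serves, "Stakhovsky")
```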

The red lines on the map are aces. The green lines are where Sergiy forced Federer into a direct error on his return of serve. The white lines are serves that Federer put back in play.

The 3D map is fully interactive. Click on any line to retrieve information about when the serve was hit and what the score was.

You can even add a little more realism to the scene by adding shadows to the court.

3D Tennis Visualization Shadows

Use the eye icons in the menu below to turn on/off layers in your scene.

3D Tennis Visualization Menu

To record the historic moment I have added the final score of the match, the match duration and the time the match was completed (local time) to the scoreboard!

3D Tennis Visualization Scoreboard

Spatial serve variation is thought to be a good indicator of one’s serve success. However, as you can see in the visualization, Sergiy was not afraid of becoming predictable. I guess when you are having so much success doing one thing, why change it up, right?

I hope you enjoy this immersive 3D tennis experience!

The scene above uses HTML5 and WebGL, so there is no need to install a plugin to view it. For more information about the CityEngine viewer click here.

“OK Glass, show me Tennis Analytics”. How Google Glass will revolutionize the way we see tennis.

Early in 2012, the tech world was buzzing with the news that Google was about to release a wearable augmented reality device. Enter Google Glass. Google Glass puts augmented reality right in front of your eyes, literally!

Sergey Brin, co-founder of Google, models Google Glass earlier this year.

There has been plenty of hype surrounding the product since its preview early last year, and we have seen examples of how Google Glass can be used to take a picture, record a video, or get directions.

But what else might one do with Google Glass?

To activate Google Glass, you start by saying “OK Glass”. Then you ask Google Glass to show, do, or tell you something. So let’s give it a try:

Let’s start with a simple question: “OK Glass, show me the weather forecast at the Australian Open today.”

Google Glass Australian Open

Imagine sitting courtside at the Australian Open and wondering what the weather is going to be like for the afternoon’s play. Up pop the current weather conditions. It’s as simple as that.

Google Glass has the ability to overlay all kinds of information in your field of view. So let’s try this:

“OK Glass, show me Federer’s second shot placement”

Google Glass Federer

Imagine sitting courtside at the Cincinnati Open and wondering where Federer has previously played his second shot after Novak’s return of serve. Bam, up pop the trajectory lines of Federer’s second shots, showing where he is likely to hit his next one. Excited yet? Let’s try one more example.

“OK Glass, show me a stroke pattern heat map”

French Open Heat Map

Imagine sitting in the stands at Court Philippe Chatrier and wondering where this player is going to hit his forehand. Google Glass immediately overlays the stroke pattern right onto the court so you can see where his shots have been passing. Wow!

These images are a few quick examples I put together to show the potential of Google Glass in tennis. Google Glass will enhance our viewing experience of tennis (and all sports) tenfold! Sitting courtside, we will be able to control when we see the stats, which stats we see and for how long. Whether it is a live heat map or a 3D ball trajectory, the possibilities are endless.

Of course, if tennis analytics isn’t your thing you may find Google Glass useful to find a friend in the crowd, or to video a point and share it on Facebook. You might even ask Google Glass for directions to Arthur Ashe Stadium!

Real-time visualization of sports statistics and Google Glass are a match made in heaven. Let’s hope the ATP, WTA and ITF fast-track the delivery of real-time tennis analytics to everyone, so that when Google Glass goes live, the game and our eyes will be ready!

To find out more about Google Glass visit their homepage.

Image Credits:

Sergey Brin wearing Google Glass: Copyright CBS Interactive

Australian Open: http://madamebonbon.com.au/blog/archives/7968

Roland Garros pic: http://lewebpedagogique.com/alaricenglishspeakers/the-tennis-and-roland-garros/

Cincy Tennis: https://shop.cincytennis.com/SeatViewer.aspx

Unlocking Hawk-Eye data: What it means for tennis, the ATP, WTA and ITF.

Since 2005 the governing bodies of tennis (ATP, WTA and ITF) have been collecting data using Hawk-Eye for many top-level tournaments and the Grand Slams. So what have the governing bodies been doing with this data? Where is it stored? Who owns it? Who has access to it?

Hawk-Eye was introduced to tennis in 2005. Since then, the governing bodies of tennis have been collecting valuable data about match play. Image: Hawk-Eye Innovations.

Some background

Early in 2012 I set out to start mapping tennis matches. As a cartographer and tennis player, this made sense and excited me! Tennis is a spatial game, meaning that the locations of the ball and the players are linked spatially to the court. So at any time during a match we can plot where and when a stroke is played, or where a player is. The concept of mapping sports matches is not new. It has been around for some time and is commonly referred to as sports analytics or spatial analytics. Sports like football (soccer), basketball and baseball have been using analytics for years to explore previously unknown patterns about the game, their players and their opponents’ tactics. We have all seen Moneyball, right?

To kick off my research into tennis mapping, I manually plotted the ball location and player movement from the London Olympics men’s tennis final using video footage and a 3D visualization application. The results of the research can be read here. This method of data capture was perfect at the time because it allowed me to capture the tags I needed for my analysis. As a result of the research, tennis players, coaches and other tech companies have contacted me wanting help analyzing their players’ patterns, strengths and weaknesses using methods similar to those outlined in my research. Sure, I replied with over-the-top enthusiasm. But we have to capture the data manually first, and that tends to be time-consuming and a tad laborious. So the client says, “Can’t we use Hawk-Eye?” That’s a great question, I tell them, but it’s not that easy…

The search begins for Hawk-Eye data

So how would one go about getting access to this infamous Hawk-Eye data that everyone apparently knows about (like it’s their brother) and has seen on TV, but no one knows where it is or whom to contact to get access to it? Go direct to Hawk-Eye?

To cut a long story short: Hawk-Eye states that it doesn’t own the data it captures. The tournaments do. Or do they? After spending the last six months trying to track down the right people in the right place at the right time, I recently received this response from Tennis Properties, the management group that runs the ATP: “Tennis Properties own all of the Hawk-Eye data from the Masters 1000 tournaments. We don’t license this data to 3rd parties.” Well, at least that clears up who owns the data. But of course that wasn’t the response I had hoped for!

I then turned to Tennis Australia. I figured they might care to share some Hawk-Eye data with another Aussie. This was their response “The Hawk-Eye data is owned by our commercial/IT teams…. but it is not for use for commercial or external endeavors”. So they own their Hawk-Eye data, not Tennis Properties. Confused yet?

So my search turned to the ATP 500 series tournaments. Tennis Properties had told me that each of these tournaments has its own agreement in place with Hawk-Eye and that the ATP does not control the data captured at them. Sounds promising, right? Well, it was. The team running the Swiss Indoors tournament in Basel granted me access to all of their match data from their 2012 tournament. I was ecstatic. Finally I would be able to grow my research, and potentially help with some of the pending requests from other interested parties. However, they didn’t have the Hawk-Eye data in-house (sigh). I was then directed to Hawk-Eye themselves to retrieve the data…

The Swiss Indoors at Basel granted me access to their Hawk-Eye data from their 2012 tournament. Image: Swiss Indoors.

A further six long months have passed and I have yet to see any of the data from Hawk-Eye. Apparently they are too busy to attend to the Swiss Indoors’ request to release the data (grrrggh!).

Why is Hawk-Eye data so protected?

The answer is simple. The data that Hawk-Eye collects is very powerful. It records the location of the ball and players, and the spin, speed and flight of the ball, to name just a few. If the data lands in the hands of someone who can pull it apart and reveal patterns about players and opponents (that may not have been seen before), then it becomes a potential sticking point for the ATP, WTA or ITF. Or does it? Let’s take a look at this from another point of view.

Bob Kramer, the former tournament director of the Farmers Classic* in Los Angeles, said the technology at his tournament cost about $60,000-$70,000 for one court, with much of that cost going to installing the infrastructure. Now if I were a tournament director spending that kind of money on new technology, I would be keen to explore ways to recoup some of those costs. One of those ways may be selling or licensing the Hawk-Eye data back to the players, the media and fans. Oh, but wait, the tournaments can’t do this because the ATP, WTA and ITF control the data. Or do they?

So who really owns Hawk-Eye data?

The tournaments seem to be funding the implementation of the technology (the richer tournaments like Indian Wells have more Hawk-Eye courts than, say, Miami), so is it their data to share and/or commercialize? Or is the data in fact the players’ data? They are the ones putting on the show; the data is about them, not the tournament. What if Roger Federer or Serena Williams wanted access to the Hawk-Eye data? How quickly would the ATP, the tournaments and Hawk-Eye react to their request? Are they even permitted to access the data?

Tennis, unlike basketball, baseball and football (soccer), is an individual sport, played mostly on neutral territory (with the exception of the Davis Cup). In team sports, it is the teams who collect the data at their home games, not the governing bodies of each sport. So where does this leave the players? Does Novak Djokovic have to bring his own data capture equipment on court to trace his movements and map his shots? Let’s hope not!

World number 1 Novak Djokovic may have to bring his own data capture equipment to matches to record his shot patterns and movements! Image: Reuters

What’s in it for the ATP, WTA and ITF to unlock (open) Hawk-Eye data?

Open data initiatives have been gaining momentum outside of sport as governments and private industry see the benefit of making their data freely available. Late last year, however, Manchester City Football Club (MCFC) opened up some of its match data so it could crowdsource new ways of visualizing the data and encourage innovative uses of it (read the Forbes article about the MCFC program here). The club was essentially tapping into the crowd’s knowledge and passion for the game to better understand its players and opposing teams. If the governing bodies of tennis were to do this, it would open up a unique opportunity to engage with fans and the media like never before. Tim Davies, an open data advocate, calls this making use of the “social infrastructure” that surrounds sport. Opening up the vast amounts of available tennis match data at a relatively low cost (or for free) would lead to third-party innovation: the next generation of tennis fans could design innovative products, which may spark a new wave of interest in tennis analytics and spawn many new tennis products. Imagine what IBM could do with the data, or anyone else with an interest in commenting and reporting on the game! Imagine the maps and graphics the tournaments could supply to the pressroom at the end of the day to help report on the day’s play!

Opening data can be scary (but it’s time to be brave!)

Opening up your data to the whole world can seem scary at first. There is no doubt the ATP, WTA and ITF will have reservations about doing so. But think of the increased two-way interaction between the innovators and the data suppliers. Perhaps Hawk-Eye data can be extended way beyond what it is currently being used for? Perhaps there is a revenue stream back to the tournaments that may offset their cost of installing the technology. The data may even be turned into physical products, like artwork for Nike’s next Rafael Nadal t-shirt! Who knows? History has shown that opening up data is not in fact scary; it is incredibly exciting, and the possibilities appear endless.

Andy Murray poses in front of ‘tennis art’ at the O2 Arena in London last year. Andy created the unique portrait of himself, which was auctioned off for charity late last year.

Natural Evolution for Tennis

Unlocking Hawk-Eye data is a natural evolution for tennis. As pressure builds on the ATP, WTA and ITF to be seen to be keeping up with other sports, perhaps the locks will come off the data. At present, only the TV broadcasters and national tennis associations appear to have a key to the data. Sadly, a very valuable stockpile of data is gathering dust on some internal server at Hawk-Eye, with no use being made of it at all! Of course, you might get lucky and be granted access to a portion of that data, but never actually see it! It will only take one of the ‘next gen’ of players who understand what modern analytics can do for their game, like Sloane Stephens or Milos Raonic, or one commentator (hint hint, Justin Gimelstob) to lean hard on the governing bodies to move this issue in the right direction. Imagine how powerful the ATP FedEx Reliability Stats could be if they integrated space by using Hawk-Eye data! Let’s hope that happens quickly. Then we can sit back and watch it open up a whole new world of tennis analytics, third-party products and applications that will benefit the players, the tournaments, the fans, the media and, most of all, the great game of tennis itself!

* The Farmers Classic will not be returning to the ATP circuit in 2013. After 86 years as the longest-running annual professional sporting event in Los Angeles, it held its last event in 2012.

 

Using spatial analytics to study spatio-temporal patterns in tennis

Late last year I introduced ArcGIS users to sports analytics, an emerging and exciting field within the GIS industry; that post, Using ArcGIS for sports analytics, can be read here. Recently I expanded the work by using a number of spatial analysis tools in ArcGIS to study the spatial variation of serve patterns in the London Olympics gold medal match between Roger Federer and Andy Murray. In this blog I present results suggesting there is potential to better understand players’ serve tendencies using spatio-temporal analysis.

The full research paper, and an in depth discussion about the importance of understanding space-time relationships in sport can be read here.

Figure 1: Igniting further exploration using visual analytics. Created in ArcScene, this 3D visualization depicts the effectiveness of Murray’s return in each rally and what effect it had on Federer’s second shot after his serve.

The Most Important Shot in Tennis?

The serve is arguably the most important shot in tennis. The location and predictability of a player’s serve have a big influence on their overall winning serve percentage. A player who is unpredictable with their serve, and can consistently place it wide, at the body or down the T, is more likely either to win the point outright or at least to weaken their opponent’s return [1].

The results of tennis matches are often determined by a small number of important points. It is common to see a player win a match having won the same number of points as his opponent. The scoring system in tennis even makes it possible for a player to win fewer points than his opponent yet win the match [2]. Winning these big points is critical to a player’s success. The serving player aims to produce an ace or force their opponent into an outright error, as this could make the difference between winning and losing. Coaches and players are therefore particularly interested in how successful a player’s serve is at these big points.

Geospatial Analysis

In order to demonstrate the effectiveness of geo-visualizing spatio-temporal data using GIS we conducted a case study to determine the following: Which player served with more spatio-temporal variation at important points during the match?

To find out where each player served during the match, we plotted the x,y coordinates of each serve bounce. A total of 86 points were mapped for Murray and 78 for Federer. Only serves that landed in were included in the analysis. Visually we could see clusters formed by wide serves, serves into the body and serves hit down the T. The K Means algorithm [3] in the Grouping Analysis tool in ArcGIS (Figure 2) enabled us to statistically replicate the characteristics of the visual clusters and tag each point as a wide serve, a serve into the body or a serve down the T. Serves were grouped by direction of serve, which also told us which service box each point belonged to. Grouping by direction was preferable to grouping by proximity, which could have merged points lying close together in neighbouring service boxes.
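The grouping step can be approximated outside ArcGIS. This is a minimal sketch, not the Grouping Analysis tool itself: a plain Lloyd's k-means run on a single attribute, the serve direction. Several things are assumptions for the example. Coordinates are taken to be in metres, with the net at y = 0. Direction is measured as the angle from the court's long axis. The centroids start from a deterministic spread across the sorted values. The helper names are hypothetical.

```python
import math


def serve_angle(hit, bounce):
    """Serve direction in degrees from the court's long (y) axis.

    Assumption: (x, y) in metres, y running along the length of the court.
    """
    dx = bounce[0] - hit[0]
    dy = bounce[1] - hit[1]
    return math.degrees(math.atan2(dx, abs(dy)))


def kmeans_1d(values, k=3, max_iter=100):
    """Plain Lloyd's k-means on one attribute (here, serve direction)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign each value to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Made-up serves into one box from a hypothetical contact point.
server = (0.5, -11.9)
bounces = [(0.1, 5.5), (0.3, 5.5), (0.2, 5.5), (1.9, 5.5), (2.1, 5.5),
           (3.6, 5.5), (3.9, 5.5), (3.7, 5.5)]
angles = [serve_angle(server, b) for b in bounces]
labels, centroids = kmeans_1d(angles, k=3)
```

In this made-up geometry, label 0 (smallest angle) would correspond to serves down the T and label 2 to wide serves. Which label means which group depends on the service box and the side of the net.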

Figure 2. The K Means algorithm in the Grouping Analysis tool in ArcGIS groups features based on attributes and optional spatial or temporal constraints.

To determine who changed the location of their serve the most, we arranged the serve bounces into a temporal sequence by ranking the data by side of the net (left or right), court location (deuce or ad court), game number and point number. The sequence of bounces then allowed us to create Euclidean lines (Figure 3) between p1 (x1,y1) and p2 (x2,y2), p2 (x2,y2) and p3 (x3,y3), p3 (x3,y3) and p4 (x4,y4), and so on, in each court location. The mean Euclidean distance between sequential serve locations then indicates who was the more predictable server. For example, a player who served to the same part of the court each time would exhibit a smaller mean Euclidean distance than a player who frequently changed the position of their serve. The mean Euclidean distance was calculated by summing all of the distances linking the sequence of serves in each service box and dividing by the total number of distances.
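The calculation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The bounce records use hypothetical dictionary keys (`side`, `court`, `game`, `point`, `x`, `y`). Game numbers are assumed to run across the whole match. The distances from all service boxes are pooled before averaging.

```python
import math
from collections import defaultdict


def mean_sequential_distance(bounces):
    """Mean Euclidean distance between consecutive serve bounces in each service box.

    `bounces` is a list of dicts with hypothetical keys: side, court, game, point, x, y.
    """
    # Group bounces by service box (side of the net, deuce/ad court).
    boxes = defaultdict(list)
    for b in bounces:
        boxes[(b["side"], b["court"])].append(b)

    distances = []
    for serves in boxes.values():
        # Temporal order within the box: game number, then point number.
        serves.sort(key=lambda b: (b["game"], b["point"]))
        for prev, curr in zip(serves, serves[1:]):
            distances.append(math.hypot(curr["x"] - prev["x"], curr["y"] - prev["y"]))

    if not distances:
        raise ValueError("need at least two serves in one service box")
    # Sum of all linking distances divided by the number of distances.
    return sum(distances) / len(distances)


# Made-up bounces: the deuce box links have lengths 4 and 3, the ad box link has length 1.
bounces = [
    {"side": "left", "court": "deuce", "game": 1, "point": 1, "x": 0.0, "y": 0.0},
    {"side": "left", "court": "deuce", "game": 3, "point": 1, "x": 3.0, "y": 4.0},
    {"side": "left", "court": "deuce", "game": 1, "point": 3, "x": 0.0, "y": 4.0},
    {"side": "left", "court": "ad", "game": 1, "point": 2, "x": 1.0, "y": 1.0},
    {"side": "left", "court": "ad", "game": 3, "point": 2, "x": 1.0, "y": 2.0},
]
result = mean_sequential_distance(bounces)  # (4 + 3 + 1) / 3
```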

Figure 3. Calculating the Euclidean distance (shortest path) between two sequential serve locations to identify spatial variation within a player’s serve pattern.

To identify where a player served at key points in the match, we assigned an importance value to each point based on the work by Morris [4]. The table in Figure 4 shows the importance of each point to winning a game when the server has a 0.62 probability of winning a point on serve. It shows that the two most important points in tennis are 30-40 and 40-Ad, highlighted in dark red. To simplify the rankings we grouped the data into three classes, as shown in Figure 4.
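Morris's importance of a point is usually defined as the probability of winning the game if the server wins that point, minus the probability of winning it if he loses that point. The following is a minimal sketch of that definition. It assumes every point on serve is won independently with probability p = 0.62. It does not reproduce the exact table in Figure 4, and the helper names are hypothetical.

```python
from functools import lru_cache

P_SERVE = 0.62  # probability the server wins any single point


def make_game_model(p):
    q = 1 - p
    deuce = p * p / (p * p + q * q)  # closed-form probability of winning from deuce

    @lru_cache(maxsize=None)
    def win(s, r):
        """Probability the server wins the game from s points to r (0-3 = 0, 15, 30, 40)."""
        if s >= 4 and s - r >= 2:
            return 1.0
        if r >= 4 and r - s >= 2:
            return 0.0
        if s >= 3 and r >= 3:
            if s == r:
                return deuce
            # Advantage server (Ad-40) or advantage receiver (40-Ad).
            return p + q * deuce if s > r else p * deuce
        return p * win(s + 1, r) + q * win(s, r + 1)

    def importance(s, r):
        # Morris: P(win game | win this point) - P(win game | lose this point).
        return win(s + 1, r) - win(s, r + 1)

    return win, importance


win, importance = make_game_model(P_SERVE)
LABELS = ["0", "15", "30", "40"]
scores = {f"{LABELS[s]}-{LABELS[r]}": importance(s, r) for s in range(4) for r in range(4)}
scores["Ad-40"] = importance(4, 3)
scores["40-Ad"] = importance(3, 4)

for label, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{label:>6}: {value:.3f}")
```

Under this model, 30-40 and 40-Ad both come out on top. Both have an importance equal to the probability of winning from deuce, which is about 0.727 when p = 0.62.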

Figure 4. The importance of points in a tennis match as defined by Morris. The data for the match was classified into 3 categories as indicated by the sequential colour scheme in the table (dark red, medium red and light red).

To see whether there was a relationship between outright success on serve and the important points, we mapped the distribution of successful serves and overlaid the results onto a layer containing the important points. If the player returning the serve made an error directly on the return, this was deemed an outright success for the server. An ace was also deemed an outright success for the server.
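The success rule above can be tallied against the importance classes with a short sketch. It assumes each serve has already been tagged with one of the three importance classes and a hypothetical outcome code. Only aces and direct return errors count as outright successes.

```python
from collections import Counter

# Hypothetical outcome codes; an ace or a direct return error counts as outright success.
OUTRIGHT = {"ace", "return_error"}


def outright_success_rate(serves):
    """Share of serves in each importance class that won the point outright.

    `serves` is a list of (importance_class, outcome) pairs.
    """
    totals = Counter()
    wins = Counter()
    for importance_class, outcome in serves:
        totals[importance_class] += 1
        if outcome in OUTRIGHT:
            wins[importance_class] += 1
    return {c: wins[c] / totals[c] for c in totals}


# Made-up serves: 2 of 4 high-importance serves and 1 of 3 low-importance serves succeed.
serves = [
    ("high", "ace"), ("high", "in_play"), ("high", "return_error"), ("high", "in_play"),
    ("low", "in_play"), ("low", "in_play"), ("low", "ace"),
]
rates = outright_success_rate(serves)
```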

Results

Federer’s spatial serve cluster in the ad court on the left side of the net was the most spread of all his clusters. However, he served out wide with great accuracy into the deuce court on the left side of the net, hugging the line 9 times out of 10 (Figure 5). Murray’s clusters appeared to be grouped more tightly overall in each of the service boxes. He showed a clear bias towards serving down the T in the deuce court on the right side of the net. Visually there appeared to be no other significant differences between each player’s patterns of serve.

Figure 5. Mapping the spatial serve clusters using the K Means Algorithm. Serves are grouped according to the direction they were hit. The direction of each serve is indicated by the thin green trajectory lines. The direction of serve was used to statistically group similar serve locations.

By mapping the location of the players’ serve bounces and grouping them into spatial serve clusters, we were able to quickly identify where in the service box each player was hitting their serves. The spatial serve clusters (wide, body or T) were each symbolized using a unique color, making it easier for the user to identify each group on the map. To give the location of each serve some context, we added the trajectory (direction) line for each serve. These lines link where the serve was hit from to where it landed. They enhance the visual structure of each cluster and improve the visual summary of the serve patterns.

The Euclidean distance calculations showed Federer’s mean distance between sequential serve bounces was 1.72 m (5.64 ft), whereas Murray’s mean Euclidean distance was 1.45 m (4.76 ft). These results suggest that Federer’s serve had greater spatial variation than Murray’s. Visually, we could detect that the network of Federer’s Euclidean lines showed a greater spread than Murray’s in each service box. Murray served with more variation than Federer in only one service box, the ad service box on the right side of the net.

Figure 6. A comparison of spatial serve variation between each player. Federer’s mean Euclidean distance was 1.72 m (5.64 ft); Murray’s was 1.45 m (4.76 ft). The results suggest that Federer’s serve had greater spatial variation than Murray’s. The lines of connectivity represent the Euclidean distance (shortest path) between each sequential serve bounce in each service box.

The directional arrows in Figure 6 allow us to visually follow the temporal sequence of serves from each player in any given service box. We have maintained the colors for each spatial serve cluster (wide, body, T) so you can see when a player served from one group into another.

At the most important points in each game (30-40 and 40-Ad), Murray served out wide, targeting Federer’s backhand, 7 times out of 8 (88%). His serve was an outright success 38% of the time (3 of 8), drawing 3 outright errors from Federer. Federer mixed up the location of his 4 serves at the big points across all of the spatial serve clusters: 2 wide, 1 body and 1 T. He had success 25% of the time, drawing 1 outright error from Murray. At other, less important points Murray tended to favour going down the T, while Federer continued his trend of spreading his serve evenly across all the spatial serve clusters (Figure 7).

The proportional symbols in Figure 7 indicate the level of importance of each serve. The larger circles represent the most important points in each game, the smallest circles the least important. The ticks represent the success of each serve. By overlaying the ticks on top of the graduated circles, we can clearly see the relationship between point importance and serve success. The map also indicates where each player served.

Figure 7. A proportional symbol map showing the relationship of where each player served at big points during the match, and their outright success at those points.

The results suggest that Murray served with more spatial variation across the two most important point categories, recording a mean Euclidean distance of 1.73 m (5.68 ft) to Federer’s 1.64 m (5.38 ft).

Conclusion

Successfully identifying patterns of behavior in sport is an ongoing area of work [5] (see Figure 8), be it in tennis, football or basketball. The examples in this blog show that GIS can provide an effective means to geovisualize spatio-temporal sports data in order to reveal potential new patterns within a tennis match. By incorporating space and time into our analysis, we were able to focus on the relationships between events in the match, not the individual events themselves. The results of our analysis were presented using maps. These visualizations function as a convenient and comprehensive way to display the results, as well as acting as an inventory of the spatio-temporal component of the match [6].

Figure 8. The heat map above shows how frequently Federer’s shots passed through a given point on the court. The map displays stroke paths from both ends of the court, including serves. The heat map can be used to spot potential anomalies in the data that may warrant further analysis.
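A heat map like the one in Figure 8 counts how many stroke paths pass through each part of the court. This is a minimal sketch of one way to compute such counts, not necessarily how Figure 8 was produced. Each stroke is treated as a straight 2D line, although real ball flight curves. The line is sampled at a fixed step, and each grid cell is counted at most once per stroke. The grid origin, cell size and step are assumptions.

```python
import math


def stroke_heatmap(strokes, width, length, cell=0.5, step=0.05):
    """Count how many stroke paths pass through each grid cell.

    strokes: list of ((x1, y1), (x2, y2)) straight-line paths in metres,
    with the court's corner at (0, 0). Each stroke counts once per cell.
    """
    cols = math.ceil(width / cell)
    rows = math.ceil(length / cell)
    grid = [[0] * cols for _ in range(rows)]
    for (x1, y1), (x2, y2) in strokes:
        # Sample the path roughly every `step` metres, including both ends.
        n = max(1, math.ceil(math.hypot(x2 - x1, y2 - y1) / step))
        visited = set()
        for i in range(n + 1):
            t = i / n
            x = x1 + t * (x2 - x1)
            y = y1 + t * (y2 - y1)
            col, row = int(x // cell), int(y // cell)
            if 0 <= col < cols and 0 <= row < rows:
                visited.add((row, col))
        for row, col in visited:
            grid[row][col] += 1
    return grid


# Two made-up strokes on a 2 m x 2 m patch with 1 m cells: one along y, one along x.
strokes = [((0.5, 0.1), (0.5, 1.9)), ((0.1, 0.5), (1.9, 0.5))]
grid = stroke_heatmap(strokes, width=2, length=2, cell=1.0)
```

Counting each cell once per stroke means the map shows how many shots passed through an area rather than how long each shot spent there.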

Expanding the scope of geospatial research in tennis and other sports relies on open access to reliable spatial data. At present, such data is not publicly available from the governing bodies of tennis. An integrated approach with these organizations, players, coaches and sports scientists would allow for further validation and development of geospatial analytics for tennis. The aim of this research is to evoke a new wave of geospatial analytics in the game of tennis and across other sports, and to encourage published tennis statistics to become more time- and space-aware, improving the understanding of the game for everyone.

References

[1] United States Tennis Association, “Tennis Tactics: Winning Patterns of Play”, Human Kinetics, 1st Edition, 1996.

[2] G. E. Parker, “Percentage Play in Tennis”, In Mathematics and Sports Theme Articles, http://www.mathaware.org/mam/2010/essays/

[3] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A K-Means Clustering Algorithm”, Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28, No. 1, pp. 100-108, 1979.

[4] C. Morris, “The most important points in tennis”, In Optimal Strategies in Sports, vol. 5 of Studies in Management Science and Systems, North-Holland Publishing, Amsterdam, pp. 131-140, 1977.

[5] M. Lames, “Modeling the interaction in game sports – relative phase and moving correlations”, Journal of Sports Science and Medicine, vol. 5, pp. 556-560, 2006.

[6] J. Bertin, “Semiology of Graphics: Diagrams, Networks, Maps”, Esri Press, 2nd Edition, 2010.

 

Using ArcGIS for sports analytics

The statistical component of sport has always provided a fascinating way to analyze performance and success. This might simply be the final score, but for some sports, such as football, baseball, cricket, golf and tennis, meaningful analysis of every facet of the game and of a player’s or team’s actions is part of the essence of the game itself. It is as common to see statistics and graphical summaries of the action reported as it is to see the action itself, and this provides a fascinating insight into strategy as well as an explanation of outcome. In this blog entry we explore the results of the London Olympics gold medal tennis match between Roger Federer and Andy Murray to show how you can use GIS to identify particular patterns within the match that may not have been exposed by traditional non-geographical analysis and display techniques.

Created using ArcGIS, Figure 1 shows where each player played a winning shot and how each player moved during every point of the gold medal match.

Figure 1: An infographic showing the player movement and winning shot positions from the Olympic Gold Medal Match between Roger Federer and Andy Murray.

Whilst figure 1 certainly has visual impact, it doesn't actually tell us very much. The player movement lines overlap one another, making it hard to tell which line relates to which point. In many cases we cannot tell the direction of movement because there are no directional arrows. The infographic also doesn't show where the winning stroke landed or the direction of the shot, and it fails to show the temporal component of the match.

Figure 2: The complete data set from the Olympic Gold Medal Match. 1708 point locations were collected from the three-set match.

Capturing the data

For the study we captured the tennis match data using ArcScene 10.1 and video footage of the match (see figure 3). We built a court at a scale of 1:1 in its correct geographic location (Centre Court at Wimbledon) and were able to quickly capture the location of each player's stroke and the corresponding ball bounce for the entire match from the video footage. At each location we collected a set of key attributes: who played the stroke, the stroke type, the stroke number, point number, game number and set number, and who was serving. The captured data provides a statistical summary of every shot in the match.
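The attribute list above maps naturally onto a record type. The sketch below is a minimal illustration, assuming Python and hypothetical field names; it is not the schema of the actual file geodatabase, which isn't shown here. The sort key gives the chronological order (set, game, point, stroke) that connecting the points into lines depends on.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class StrokeEvent:
    """One captured location: a player's stroke position or the ball's bounce.

    Field names are hypothetical; they mirror the attributes described in the text.
    """
    kind: Literal["stroke", "bounce"]  # player position at contact, or ball bounce
    player: str          # who played the stroke
    stroke_type: str     # e.g. "forehand", "backhand", "serve"
    stroke_number: int   # order of the shot within the point
    point_number: int
    game_number: int
    set_number: int
    server: str          # who was serving the point
    x: float             # court coordinates (1:1 scale, assumed metres)
    y: float

    def sort_key(self) -> tuple:
        """Chronological order: set, then game, then point, then stroke number."""
        return (self.set_number, self.game_number, self.point_number, self.stroke_number)
```

Sorting a list of these records with `key=StrokeEvent.sort_key` replays the match in playing order.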

Figure 3: Video footage of the match in ArcScene. The red dots represent the player’s stroke position and ball bounce. The green lines represent the direction of ball travel for each shot.

Using the 3D editing tools in ArcScene, we were able to plot the player's position and ball bounces to within ±20 cm. We approximated the camera angle of the video footage and set our data view to match. Compared with a 2D environment, this made data capture faster and more accurate, because we could continuously match the changing camera view in the video using the Navigate Scene control in ArcScene. It also helped us counter the scale distortion in the camera view when capturing points at the end farthest from the camera.

Once all of the point data was captured, we used the XY To Line tool to create connectivity between the points using the shot, point, game and set number attributes. The lines are instrumental in allowing us to visualize stroke patterns (as you will see later in the blog entry). We ran the same XY To Line process to create player movement lines.
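The XY To Line tool builds each line from a start and an end coordinate pair stored in a single table row. A minimal plain-Python sketch of how such rows could be prepared from the captured points is shown below; the dictionary keys are hypothetical, and this is not necessarily how the study prepared its input.

```python
from collections import defaultdict


def build_segments(points):
    """Turn captured stroke points into start/end rows, one per pair of
    consecutive shots within the same point (rally).

    Each input point is a dict with hypothetical keys:
    "set", "game", "point", "shot", "x", "y".
    The output rows carry the start and end coordinates a line-building
    tool such as XY To Line reads.
    """
    rallies = defaultdict(list)
    for p in points:
        rallies[(p["set"], p["game"], p["point"])].append(p)

    segments = []
    for key in sorted(rallies):  # chronological order of points
        shots = sorted(rallies[key], key=lambda p: p["shot"])
        # Connect shot n to shot n + 1; a one-shot point yields no line.
        for a, b in zip(shots, shots[1:]):
            segments.append({
                "set": key[0], "game": key[1], "point": key[2],
                "from_shot": a["shot"],
                "start_x": a["x"], "start_y": a["y"],
                "end_x": b["x"], "end_y": b["y"],
            })
    return segments
```

Grouping by set, game and point before connecting shots keeps lines from jumping between unrelated rallies.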

Visualising the data

Statistics from the match tell us that Andy Murray made a total of 18 winners to Roger Federer's 13. What these statistics don't tell us is where those winners occurred, which stroke produced each winner, when each winner occurred and what led up to it. They also fail to show any potential stroke patterns during the match. By capturing and storing all of the match data in a file geodatabase (figure 4), we can take advantage of the geolocation of these winners and create visualizations that tell a far richer story than single snapshots allow.

Figure 4: Using a file geodatabase to store sports data in ArcGIS

One of the challenges in dealing with sports data is that there are many instances of similar events occurring at the same or similar locations over relatively small periods of time. This often results in very tight clusters of points over very small areas of your court, pitch or field. If your data has an element of connectivity, you will also have lines that overlap along similar bearings and distances, or lines that run in completely random directions, depending on the sport you are analyzing. This raises the challenge of how to represent and compare this information meaningfully.

One way to make sense of so many overlapping points and lines is to use a visualization technique popularized by Edward Tufte called small multiples (see figure 5). Small multiples use a series of identical basemaps (in our case a tennis court), each with a different slice of data on top. The maps are arranged in a logical sequence, much like animated movie frames. Small multiples are useful for disaggregating your data, reducing the visual complexity and quantity of information in each map so that it can be seen and interpreted more easily.
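Laying out small multiples is largely bookkeeping: each slice of data gets its own panel, in sequence. The hypothetical helper below assumes the slices are already in chronological order and only computes panel positions; drawing the shared court basemap in each panel is left to the mapping software.

```python
import math


def small_multiples_grid(slices, ncols):
    """Assign each data slice (e.g. one winning point) to a (row, col) panel.

    Panels fill left to right, then top to bottom, like movie frames,
    so reading order matches the order of `slices`.
    Returns the number of rows and a list of (row, col, slice) tuples.
    """
    if ncols < 1:
        raise ValueError("ncols must be at least 1")
    nrows = math.ceil(len(slices) / ncols)
    layout = [(i // ncols, i % ncols, s) for i, s in enumerate(slices)]
    return nrows, layout
```

For example, 18 winners laid out 6 panels wide give a 3-row grid.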

Figure 5: Andy Murray’s winning three-shot sequence visualized using small multiples. The green lines represent the forehand winning strokes and the blue lines, the backhand winning strokes.

Figure 5 lets us see very quickly some important patterns from the match that were not visible in traditional tabular statistics. The most immediate pattern is the direction of each winning shot (half of Murray's backhand winners went down the line). You can also quickly identify where the player made the winning shot (half of Murray's winners were struck from well inside the court, near or around the service line) and the type of shot that was played (Murray hit 10 forehand winners to 8 backhand winners). Temporally, we can see that 7 of Murray's winners came on game point, either for or against him. Figure 6 explains the variables mapped in each small multiple and, therefore, the potential for recognizing patterns across a game or match.
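Tallies such as the forehand-to-backhand ratio or the number of winners on game point come from querying the winners' attributes. The sketch below assumes hypothetical fields (`stroke`, `direction`, `game_point`); the records in the test are made-up toy data, not the match data.

```python
from collections import Counter


def winner_summary(winners):
    """Summarise one player's winners from their attribute records.

    Each record is a dict with hypothetical fields:
      stroke: "forehand" or "backhand"
      direction: e.g. "down-the-line" or "cross-court"
      game_point: True if the point was game point for either player
    """
    strokes = Counter(w["stroke"] for w in winners)
    backhands = [w for w in winners if w["stroke"] == "backhand"]
    down_the_line = sum(1 for w in backhands if w["direction"] == "down-the-line")
    return {
        "forehands": strokes["forehand"],
        "backhands": strokes["backhand"],
        # Share of backhand winners hit down the line (0.0 if there were none).
        "backhand_dtl_share": down_the_line / len(backhands) if backhands else 0.0,
        "on_game_point": sum(1 for w in winners if w["game_point"]),
    }
```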

Figure 6: An explanation of the variables being mapped in the small multiples matrix

Each individual image presents a second level of visual information that is likely to suit coaches, players or die-hard fans who want to know more about the pattern of play than the average tennis fan or someone scanning the morning news. We have added temporal labels to the images to help users identify when the winning shot occurred, and we have varied the colour and line weight of the lines in each image to reflect their importance and to distinguish between line classes. Each stroke location is dynamically labeled from the stroke field in our file geodatabase, as is the sequence number. The player movement lines show where the player ran from to make the winning shot: in 6 of Andy Murray's winners, he moved a considerable distance across the court to make the winning shot. The player movement lines also allow us to see the previous one or two strokes without actually showing the stroke lines on the map.
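Whether a player "moved a considerable distance" can be made measurable by summing the length of each movement line. This is a sketch under stated assumptions: vertices are (x, y) court coordinates in metres, and the threshold is a hypothetical parameter, not one used in the study.

```python
import math


def path_length(vertices):
    """Total length of a player-movement polyline given as (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def long_runs(movements, threshold_m):
    """Return the ids of winners where the player covered more than
    threshold_m before striking.

    movements maps a winner id to its movement-line vertices;
    threshold_m is a hypothetical cut-off, not a value from the study.
    """
    return [wid for wid, verts in movements.items() if path_length(verts) > threshold_m]
```

Measuring along the polyline, rather than start-to-end, counts changes of direction as distance run.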

You will notice we are only showing the two shots prior to the winning shot. We are mapping the ‘set-up’ stroke (stroke 1), the opponent's return (stroke 2) and the winning stroke (stroke 3). Showing more than two lead-up strokes before the winning shot can confuse and distract the user (figure 7).
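Selecting the strokes to map for each winner amounts to keeping the last three strokes of the point. A minimal sketch, assuming each stroke record carries a hypothetical `shot` number and that the winning stroke is the final stroke of the point:

```python
def three_shot_sequence(strokes):
    """Return the set-up stroke, the opponent's return and the winning stroke.

    strokes: all strokes of one point, in any order, each a dict with a
    hypothetical "shot" number. The winner is assumed to be the last stroke.
    Points with fewer than three strokes return whatever strokes exist.
    """
    ordered = sorted(strokes, key=lambda s: s["shot"])
    return ordered[-3:]
```

Changing the slice to `ordered[-n:]` is the single knob for experimenting with how many lead-up strokes to show.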

Figure 7: The image on the left displays all of the strokes (14 in total) leading up to the 4th winning shot. The image on the right displays only two shots leading up to the winning shot.

Some generalization is needed to ensure you don't overwhelm the user with information. Finding the correct balance of generalization is one aspect of the research that we are continuing to explore. Determining how many events, and which types of events, led to a particular outcome is complex and highly context-dependent, so it is vital that erroneous assumptions aren't introduced during generalization.

In order for the small multiples to work better in sequence, we rotated the data frame of each image using the Data Frame tools in ArcGIS. This allowed us to map all of Murray's shots from one end and all of Federer's from the other, which made patterns in the match clearer. Whilst it was suitable in this instance to shift all of a player's strokes to one end for visualization, this might not be appropriate if, for instance, particular weather conditions made play at one end more challenging. In that situation, assessing how different players react to different conditions might be an important component of the pattern of the match itself.
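Rotating a data frame by 180 degrees has the same effect as rotating every point in that image about the centre of the court. The sketch below assumes local court coordinates with the centre on the net and y increasing towards the far end; because the captured data sits in real geographic coordinates, the appropriate centre point would need to be supplied. All points in a sequence rotate together, so the geometry of the rally is preserved.

```python
def rotate_180(point, centre=(0.0, 0.0)):
    """Rotate an (x, y) point 180 degrees about the court centre."""
    cx, cy = centre
    x, y = point
    return (2 * cx - x, 2 * cy - y)


def normalise_end(sequence, centre=(0.0, 0.0), hitter_index=-1):
    """Rotate a whole stroke sequence so that the player who hit
    sequence[hitter_index] (by default the winning stroke) is at the near end.

    Assumes y increases towards the far end and centre lies on the net,
    so a hitter position with y greater than the centre's y is on the far half.
    """
    if sequence[hitter_index][1] > centre[1]:
        return [rotate_180(p, centre) for p in sequence]
    return list(sequence)
```

Because the rotation is decided per sequence, it can simply be skipped for images where the end of the court matters, such as the weather scenario described above.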

Having already explored Murray’s winning shot sequence, let’s take a quick look at Federer’s three-stroke winning pattern in figure 8, below.

Figure 8: Roger Federer’s winning three-shot sequence. The green lines represent the forehand winning stroke and the blue lines, the backhand winning stroke.

Federer made only two winners on his backhand side (indicated by the blue lines), and 10 of his 13 winners came directly from pulling his opponent off the court with a wide serve, leaving an open court for Federer to hit an easy winner into. His two backhand winners were both struck with little or no margin for error; had they missed, Federer would have been left with only 11 winners from three sets of tennis, all from the forehand side. Five of Federer's 13 winners came on game point, either for or against him.

The small multiples format was perfect for this type of analysis: we were able to present a series of events over time in a logical, clear and concise manner. The two examples of gameplay explored in this blog entry show how powerful it can be to present the results of sports data graphically using GIS. By glancing at the images you take more away from the data than you would from simply seeing the winner totals in tabular form. By exploring them in detail, we can reveal dimensions of the points, games and match that are simply impossible to gauge with other approaches. We are currently working on ways to animate particular scenes and are looking into applications that serve the data online, giving users the ability to query the map for themselves and run their own analysis on the data.

Sports analytics is a growing field, but it is still relatively unexplored in the world of GIS. Some of the world's largest sporting organizations, like Manchester City, Adidas and Nike, and leagues like the EPL, NBA and AFL are capturing every movement their players make and recording their actions. The challenge is to understand the best way to present this data to the players, coaches, media and fans.