Pinpointing the serve. Who’s better? The Big Boys or the School Boys?

(Part 1 of 3)

We have all been there, standing on the baseline when the coach places three cones in each service box and says “There’s your target, if you hit the cones you’ll get a free can of drink”.  If you were like me, you rarely hit the cone, and if you did, it was more luck than anything else!

Coaches have been using these types of serving drills for many years. Why? Well, in order to develop a successful serve, you need to practice the placement of your serve. In the USTA book titled Tennis Tactics, Winning Patterns of Play, drill 4.2 (p 45) outlines four target zones in each service court to aim for (see Figure 1).  It is in these zones where coaches place their cones to improve the serve placement of their players (and give away free drinks!).

USTA Target Serve Zones

Figure 1. The four serve target zones in each service court as recommended by the USTA: down the T (T1), a body serve (T2), a wide serve (T3) and a short-ish out-wide serve (T4). Source: Tennis Tactics, Winning Patterns of Play, USTA.

Given the continuous emphasis on serve placement, I set out to run a simple analysis to see who was the more ‘accurate’ server: the Big Boys (professional players) or the School Boys (college level players)? Included in this analysis are Roger Federer and Andy Murray representing the Big Boys, while the School Boys (who shall remain nameless) are from the NCAA Division 1 tennis competition.

Some Context:

  • Murray defeated Federer: 6-2, 6-1, 6-4
  • School Boy A defeated School Boy B: 6-1, 6-1

Total number of serves hit by each player:

School Boy A: 58   School Boy B: 54   Federer: 95   Murray: 111

Total number of serves hit IN:

School Boy A: 44 (76%)   School Boy B: 45 (83%)   Federer: 78 (82%)   Murray: 86 (77%)

Total number of serves hit OUT:

School Boy A: 14 (24%)   School Boy B: 9 (17%)   Federer: 17 (18%)   Murray: 25 (23%)
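The percentages above follow directly from the raw counts. As a quick sanity check, here is a minimal Python sketch that recomputes them; the dictionary layout and the `pct` helper are my own illustration, not part of the original analysis.

```python
# Serve counts as reported above: (total serves, serves in, serves out).
serves = {
    "School Boy A": (58, 44, 14),
    "School Boy B": (54, 45, 9),
    "Federer": (95, 78, 17),
    "Murray": (111, 86, 25),
}


def pct(part, total):
    """Percentage of `part` in `total`, rounded to the nearest whole number."""
    return round(100 * part / total)


summary = {}
for player, (total, n_in, n_out) in serves.items():
    # Serves in plus serves out should account for every serve hit.
    assert n_in + n_out == total
    summary[player] = (pct(n_in, total), pct(n_out, total))
    print(f"{player}: {summary[player][0]}% in, {summary[player][1]}% out")
```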

In order to determine which player landed the highest percentage of balls in the four USTA zones (and therefore could claim they were the most accurate server!) I ran a simple select-by-location query between each serve bounce and the four target zones in each service court. This returned a count of how many balls landed in each zone, for each player. Figure 2 shows the results of the selection.


Figure 2. The percentage of serves that landed in the USTA defined target zones for each player.
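For readers without GIS software, the select-by-location step boils down to a point-in-rectangle test. The sketch below is a plain Python stand-in under stated assumptions: the zones are treated as axis-aligned rectangles, and the zone coordinates and bounce points are made up for illustration. It is not the ArcGIS tool itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """An axis-aligned rectangular target zone (coordinates in metres)."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def count_by_zone(bounces, zones):
    """Count how many bounce points fall inside each zone."""
    counts = {z.name: 0 for z in zones}
    for x, y in bounces:
        for z in zones:
            if z.contains(x, y):
                counts[z.name] += 1
                break  # a bounce belongs to at most one zone
    return counts


# Hypothetical zones (0.75 m square) and bounce points, for illustration only.
zones = [
    Zone("T1", 0.0, 5.0, 0.75, 5.75),
    Zone("T3", 3.2, 5.0, 3.95, 5.75),
]
bounces = [(0.3, 5.4), (3.5, 5.1), (2.0, 3.0), (0.7, 5.7)]

counts = count_by_zone(bounces, zones)
hit_rate = sum(counts.values()) / len(bounces)
print(counts, f"{hit_rate:.0%} of serves landed in a target zone")
```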

Surprised? Most of us would expect the Big Boys to place a higher percentage of their serves in the target zones than the School Boys, right? However, the results showed that School Boy A landed 15 out of his 58 (26%) serves in the target zones, making him arguably the most ‘accurate’ server of the four players. School Boy B closely followed with 12 out of 54 (22%). Murray was next up, landing 23 of 111 (21%) serves in the zones, while Federer brought up the rear with only 16 out of his 95 (17%) serves landing in the zones.

Accuracy: If we loosely define accuracy as how close a measured value is to an actual value, where the actual value is the USTA target zone, then we can with some caution claim the School Boys outserved the Big Boys in the accuracy department. Hard to believe, I know.

But wait a minute, what if the Big Boys weren’t actually aiming for the USTA target zones, and instead were aiming outside of those zones? Perhaps they were aiming for the lines, which are outside the USTA defined target areas but still legally within the service court? What would the results look like if we extended the target zones further towards the lines? Let’s see…

Playing the Lines

You could argue that the service line is the optimum position for the placement of your serve, and that the corners of each service box are the ultimate targets. However, targeting the lines brings a higher degree of risk and a lower margin of error, which is why coaches and the USTA don’t recommend that we amateurs go for these targets every time! At the top level where the Big Boys play, however, there is so much on the line and so little margin for error (in all facets of the game) that players are more likely to take the risk. By sending their serves as close to the lines as possible they give themselves a greater chance of setting up the point in their favor. We would also expect them to execute consistently at a higher level of accuracy, given their higher-level skill set. We shall see…

In order to test this I added two more 12.5cm (4.9 inch) wide target zones around the original USTA target zones. I call these Medium and High risk zones, where the High risk zone abuts and includes the service lines. By running the selection again using these two extra zones we will see who is taking the risk and pushing their serve towards the lines more, the School Boys or the Big Boys?


Figure 3. The percentage of serves that landed in the two additional High and Medium risk serve zones for each player. The width of each additional zone is 12.5cm (4.9 inches), roughly twice the width of a tennis ball. In the second part of this blog we will see the spatial spread of serves across all target zones and all service boxes.
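One way to picture the Medium and High risk zones is as two 12.5cm rings grown outward from a target zone. The Python sketch below makes that simplifying assumption: it buffers a rectangular zone equally on every side and ignores the clipping to the service lines that the real zones would need. The rectangle coordinates are hypothetical.

```python
RING_WIDTH = 0.125  # metres: the 12.5cm zone width quoted in the text


def inside(rect, x, y, grow=0.0):
    """True if (x, y) lies in rect = (xmin, ymin, xmax, ymax) grown by `grow`."""
    xmin, ymin, xmax, ymax = rect
    return (xmin - grow <= x <= xmax + grow) and (ymin - grow <= y <= ymax + grow)


def classify(rect, x, y):
    """Label a bounce as 'target', 'medium', 'high', or None (outside all zones)."""
    if inside(rect, x, y):
        return "target"
    if inside(rect, x, y, RING_WIDTH):
        return "medium"
    if inside(rect, x, y, 2 * RING_WIDTH):
        return "high"
    return None


# Hypothetical 0.75 m square target zone.
zone = (1.0, 1.0, 1.75, 1.75)
labels = [classify(zone, x, 1.5) for x in (1.2, 1.8, 1.95, 2.1)]
print(labels)
```

Checking the innermost zone first means each bounce gets exactly one label, so the counts for the three zones never overlap.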

Figure 3 starts to tell a different story. With the target moved, Federer was now clearly winning the most accurate server competition, landing 13 (14%) serves in the medium risk zone and 18 (19%) in the high-risk zone. Murray’s success in these zones was a little lower than Federer’s, with 10 (9%) in the medium risk zone and 13 (12%) in the high-risk zone. School Boy A scored 3 (4%) in the medium risk zone and 5 (13%) in the high-risk zone, while School Boy B scored 2 (5%) and 7 (8%).

Clearly Federer was able to consistently pop more serves in the high-risk zones than any of the other three players. This would suggest that the Fed is arguably the most accurate server of the bunch? Most commentators of the game are unlikely to argue with that statement, but of course it depends on where the target is and where the players are aiming! School Boy A has every right to claim he is the most accurate server given he landed the highest proportion of his serves in the USTA target zones.

Some Further Ponderings

Given that each of the four USTA target zones in each service box is roughly 0.75m (2.46 ft) square, I am surprised that the Big Boys are not landing a higher percentage of serves in these areas. No disrespect to the School Boys, they aren’t playing NCAA Division 1 tennis for no reason, but I expected the professional players to have a higher percentage of serves land in the target zones than the School Boys. I also expected Federer and Murray to land more serves in the higher risk zones. The results showed this was partly the case. Murray’s numbers in these zones are a little surprising given he swept aside Federer in straight sets on that day.

Perhaps at the highest level, simply aiming your serve at the USTA zones is not enough. Maybe the margin is too great. And in doing so you make life a little too easy for the returner?

So why do the School Boys have such a high percentage of serves in the USTA zones (compared to the Big Boys)? Is it because they serve with less speed and spin, therefore allowing them to slow things down and hit the ‘safe’ targets? Perhaps at this level, the players are taught to play the percentages? Perhaps their skill level forces them to do so?

The School Boys will no doubt develop their serving skills, and put more serve speed and aggressive ‘kick’ on the ball as they mature. Being able to maintain that accuracy as they increase their serve speed and spin will be an ongoing player development challenge.

It is worth noting that each School Boy in the study served just over 50 times in their match, roughly half as many as Federer and Murray. Would they be able to maintain their high serve percentage into the USTA zones over a longer match where they may be required to hit 100+ serves? Would we see the same consistency, or could we expect to see it drop off?

So what do these figures mean, if anything? What if I miss the USTA zones by a ball width or two? Am I still an accurate server? What if I’m only a little bit too short, or a little bit too central to the service box on my serve? Will I still win the same number of points if I’m a few centimeters or inches wide of the mark?

In part 2…

In the second part of this three-part blog I will endeavor to determine if there is a positive relationship between serve position, and outright success. I’ll explore if it’s possible to determine if the game of serving is really about a few centimeters or inches here and there? And in part 3 we will answer the most important question of all, who takes home the most free drinks!

Note: This study only looked at a very small sample of data from all players, so we need to be careful about making gross assumptions based on the findings.

Using spatial analytics to study spatio-temporal patterns in tennis

Late last year I introduced ArcGIS users to sports analytics, an emerging and exciting field within the GIS industry. Using ArcGIS for sports analytics can be read here. Recently I expanded the work by using a number of spatial analysis tools in ArcGIS to study the spatial variation of serve patterns from the London Olympics Gold Medal match played between Roger Federer and Andy Murray. In this blog I present results that suggest there is potential to better understand players’ serve tendencies using spatio-temporal analysis.

The full research paper, and an in-depth discussion about the importance of understanding space-time relationships in sport, can be read here.

Figure 1: Igniting further exploration using visual analytics. Created in ArcScene, this 3D visualization depicts the effectiveness of Murray’s return in each rally and what effect it had on Federer’s second shot after his serve.

The Most Important Shot in Tennis?

The serve is arguably the most important shot in tennis. The location and predictability of a player’s serve has a big influence on their overall winning serve percentage. A player who is unpredictable with their serve, and can consistently place their serve wide into the service box, at the body or down the T, is more likely to either win a point outright or at least weaken their opponent’s return [1].

The results of tennis matches are often determined by a small number of important points during the game. It is common to see a player win a match having won the same number of points as his opponent. The scoring system in tennis also makes it possible for a player to win fewer points than his opponent yet win the match [2]. Winning these big points is critical to a player’s success. For the player serving, the aim is to produce an ace or force their opponent into an outright error, as this could make the difference between winning and losing. It is of particular interest to coaches and players to know how successful a player’s serve is at these big points.

Geospatial Analysis

In order to demonstrate the effectiveness of geo-visualizing spatio-temporal data using GIS we conducted a case study to determine the following: Which player served with more spatio-temporal variation at important points during the match?

To find out where each player served during the match we plotted the x,y coordinate of the serve bounce. A total of 86 points were mapped for Murray, and 78 for Federer. Only serves that landed in were included in the analysis. Visually we could see clusters formed by wide serves, serves into the body and serves hit down the T. The K Means algorithm [3] in the Grouping Analysis tool in ArcGIS (Figure 2) enabled us to statistically replicate the characteristics of the visual clusters. It enabled us to tag each point as either a wide serve, a serve into the body or a serve down the T. The organisation of the serves into each group was based on the direction of serve. Using the serve direction allowed us to know which service box the points belonged to. Direction gave us an advantage over proximity, which would have grouped points in neighbouring service boxes together.

Figure 2. The K Means algorithm in the Grouping Analysis tool in ArcGIS groups features based on attributes and optional spatial temporal constraints. 
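To give a feel for grouping serves by direction, here is a minimal Python sketch. It is not the Grouping Analysis tool and not the Hartigan-Wong variant cited in [3]: it swaps in plain Lloyd-style k-means on a single attribute, the serve angle. The hit position, the bounce coordinates and the helper names are all hypothetical.

```python
import math


def serve_angle(hit, bounce):
    """Direction of a serve in degrees, measured from the hit point to the bounce."""
    return math.degrees(math.atan2(bounce[1] - hit[1], bounce[0] - hit[0]))


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd-style k-means on scalar values; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break  # converged
        centroids = new
    return labels, centroids


# Hypothetical data: one hit position and six bounces in a single service box.
hit = (0.0, 0.0)
bounces = [(4.0, 18.0), (4.2, 17.5), (2.0, 18.0), (2.1, 17.8), (0.2, 18.2), (0.1, 18.0)]
angles = [serve_angle(hit, b) for b in bounces]
labels, centroids = kmeans_1d(angles, k=3)
print(labels)
```

With three groups per service box, the cluster with the most angled direction would correspond to the wide serves and the straightest to the T, with body serves in between.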

To determine who changed the location of their serve the most we arranged the serve bounces into a temporal sequence by ranking the data according to the side of the net (left or right), court location (deuce or ad court), game number and point number. The sequence of bounces then allowed us to create Euclidean lines (Figure 3) between p1 (x1,y1) and p2 (x2,y2), p2 (x2,y2) and p3 (x3,y3), p3 (x3,y3) and p4 (x4,y4), etc. in each court location. Using the mean Euclidean distance between sequential serve locations, it is possible to determine who served with greater spatial variation and, conversely, who was the more predictable server. For example, a player who served to the same part of the court each time would exhibit a smaller mean Euclidean distance than a player who frequently changed the position of their serve. The mean Euclidean distance was calculated by summing all of the distances linking the sequence of serves in each service box and dividing by the total number of distances.

Figure 3. Calculating the Euclidean distance (shortest path) between two sequential serve locations to identify spatial variation within a player’s serve pattern.
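The mean Euclidean distance described above is straightforward to compute once the bounces are ordered in time within each service box. A minimal Python sketch follows, assuming the serves have already been sorted and grouped; the box names and coordinates are made up for illustration.

```python
import math


def mean_sequential_distance(serves_by_box):
    """Mean straight-line distance between consecutive bounces, pooled over boxes.

    `serves_by_box` maps a service box name to its bounces in temporal order.
    Distances are only measured between serves hit into the same box.
    """
    dists = []
    for bounces in serves_by_box.values():
        dists.extend(math.dist(p, q) for p, q in zip(bounces, bounces[1:]))
    return sum(dists) / len(dists) if dists else 0.0


# Hypothetical bounce sequences (metres).
serves_by_box = {
    "deuce": [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)],  # distances 5 and 0
    "ad": [(0.0, 0.0), (6.0, 8.0)],                 # distance 10
}
mean_dist = mean_sequential_distance(serves_by_box)
print(f"Mean Euclidean distance: {mean_dist:.2f} m")
```

A server who keeps hitting the same spot drives the value towards zero, while one who alternates between wide and T serves pushes it up.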

To identify where a player served at key points in the match we assigned an importance value to each point based on the work by Morris [4]. The table in Figure 4 shows the importance of points to winning a game, when a server has a 0.62 probability of winning a point on serve. This shows the two most important points in tennis are 30-40 and 40-Ad, highlighted in dark red. To simplify the rankings we grouped the data into three classes, as shown in Figure 4.

Figure 4. The importance of points in a tennis match as defined by Morris. The data for the match was classified into 3 categories as indicated by the sequential colour scheme in the table (dark red, medium red and light red).
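The importance values can be approximated with a few lines of Python. The sketch below assumes importance is taken to be the server's probability of winning the game if they win the point, minus that probability if they lose it, with every point won independently with probability 0.62. The function names and the score encoding (points won by server, points won by returner) are my own.

```python
def game_win_prob(s, r, p):
    """Probability the server wins the game from s points to r points."""
    q = 1.0 - p
    if s >= 4 and s - r >= 2:
        return 1.0  # server has already won the game
    if r >= 4 and r - s >= 2:
        return 0.0  # returner has already won the game
    if s >= 3 and r >= 3:
        deuce = p * p / (p * p + q * q)  # closed form for winning from deuce
        if s == r:
            return deuce
        return p + q * deuce if s > r else p * deuce
    return p * game_win_prob(s + 1, r, p) + q * game_win_prob(s, r + 1, p)


def point_importance(s, r, p):
    """Swing in game-win probability between winning and losing this point."""
    return game_win_prob(s + 1, r, p) - game_win_prob(s, r + 1, p)


P_SERVE = 0.62  # server's point-win probability, as quoted in the text
scores = {"0-0": (0, 0), "30-30": (2, 2), "40-30": (3, 2),
          "30-40": (2, 3), "Deuce": (3, 3), "40-Ad": (3, 4)}
importance = {label: point_importance(s, r, P_SERVE) for label, (s, r) in scores.items()}
for label, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{label:>6}: {value:.3f}")
```

Under these assumptions 30-40 and 40-Ad come out level and ahead of every other score, which agrees with the ranking described above.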

In order to see a relationship between outright success on serve and the important points, we mapped the distribution of successful serves and overlaid the results onto a layer containing the important points. If the player returning the serve made an error directly on their return, then this was deemed to be an outright success for the server. An ace was also deemed to be an outright success for the server.
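In tabular form, the overlay amounts to flagging each serve as an outright success and summarising by importance class. Here is a small Python sketch of that bookkeeping; the record fields and the sample serves are hypothetical, not taken from the match data.

```python
def outright_success(serve):
    """A serve is an outright success if it is an ace or draws a return error."""
    return serve["ace"] or serve["return_error"]


def success_rate(serves, importance):
    """Share of serves in one importance class that were outright successes."""
    subset = [s for s in serves if s["importance"] == importance]
    if not subset:
        return None  # no serves recorded in this class
    return sum(outright_success(s) for s in subset) / len(subset)


# Hypothetical serve records, for illustration only.
serves = [
    {"importance": "high", "ace": False, "return_error": True},
    {"importance": "high", "ace": False, "return_error": False},
    {"importance": "high", "ace": False, "return_error": False},
    {"importance": "low", "ace": True, "return_error": False},
    {"importance": "low", "ace": False, "return_error": False},
]
rate_high = success_rate(serves, "high")
rate_low = success_rate(serves, "low")
print(f"high: {rate_high:.0%}, low: {rate_low:.0%}")
```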

Results

Federer’s spatial serve cluster in the ad court on the left side of the net was the most spread of all his clusters. However, he served out wide with great accuracy into the deuce court on the left side of the net by hugging the line 9 times out of 10 (Figure 5). Murray’s clusters appeared to be grouped more tightly overall in each of the service boxes. He showed a clear bias by serving down the T in the deuce court on the right side of the net. Visually there appeared to be no other significant differences between each player’s patterns of serve.

Figure 5. Mapping the spatial serve clusters using the K Means Algorithm. Serves are grouped according to the direction they were hit. The direction of each serve is indicated by the thin green trajectory lines. The direction of serve was used to statistically group similar serve locations.

By mapping the location of the players’ serve bounces and grouping them into spatial serve clusters we were able to quickly identify where in the service box each player was hitting their serves. The spatial serve clusters (wide, body or T) were each symbolized using a unique color, making it easier for the user to identify each group on the map. To give the location of each serve some context we added the trajectory (direction) lines for each serve. These lines helped link where the serve was hit from to where the serve landed. They help enhance the visual structure of each cluster and improve the visual summary of the serve patterns.

The Euclidean distance calculations showed Federer’s mean distance between sequential serve bounces was 1.72 m (5.64 ft), whereas Murray’s mean Euclidean distance was 1.45 m (4.76 ft). These results suggest that Federer’s serve had greater spatial variation than Murray’s. Visually, we could detect that the network of Federer’s Euclidean lines showed a greater spread than Murray’s in each service box. Murray served with more variation than Federer in only one service box, the ad service box on the right side of the net.

Figure 6. A comparison of spatial serve variation between each player. Federer’s mean Euclidean distance was 1.72m (5.64 ft); Murray’s was 1.45m (4.76 ft). The results suggest that Federer’s serve had greater spatial variation than Murray’s. The lines of connectivity represent the Euclidean distance (shortest path) between each sequential service bounce in each service box.

The directional arrows in Figure 6 allow us to visually follow the temporal sequence of serves from each player in any given service box. We have maintained the colors for each spatial serve cluster (wide, body, T) so you can see when a player served from one group into another.

At the most important points in each game (30-40 and 40-Ad), Murray served out wide targeting Federer’s backhand 7 times out of 8 (88%). He had success doing this 38% of the time, drawing 3 outright errors from Federer. Federer mixed up the location of his 4 serves at the big points across all of the spatial serve clusters: 2 wide, 1 body and 1 T. He had success 25% of the time, drawing 1 outright error from Murray. At other less important points Murray tended to favour going down the T, while Federer continued his trend of spreading his serve evenly across all spatial serve clusters (Figure 7).

The proportional symbols in Figure 7 indicate a level of importance for each serve. The larger circles represent the most important points in each game, the smallest circles the least important. The ticks represent the success of each serve. By overlaying the ticks on top of the graduated circles we can clearly see the relationship between big points and success on serve. The map also indicates where each player served.

Figure 7. A proportional symbol map showing the relationship of where each player served at big points during the match, and their outright success at those points.

The results suggest that Murray served with more spatial variation across the two most important point categories, recording a mean Euclidean distance of 1.73 m (5.68 ft) to Federer’s 1.64 m (5.38 ft).

Conclusion

Successfully identifying patterns of behavior in sport is an ongoing area of work [5] (see Figure 8), be that in tennis, football or basketball. The examples in this blog show that GIS can provide an effective means to geovisualize spatio-temporal sports data, in order to reveal potential new patterns within a tennis match. By incorporating space-time into our analysis we were able to focus on relationships between events in the match, not the individual events themselves. The results of our analysis were presented using maps. These visualizations function as a convenient and comprehensive way to display the results, as well as acting as an inventory for the spatio-temporal component of the match [6].

Figure 8. The heatmap above shows Federer’s frequency of shots passing through a given point on the court. The map displays stroke paths from both ends of the court, including serves. The heat map can be used to study potential anomalies in the data that may result in further analysis.

Expanding the scope of geospatial research in tennis and other sports relies on open access to reliable spatial data. At present, such data is not publicly available from the governing bodies of tennis. An integrated approach with these organizations, players, coaches and sports scientists would allow for further validation and development of geospatial analytics for tennis. The aim of this research is to evoke a new wave of geospatial analytics in the game of tennis and across other sports. A further aim is to encourage published tennis statistics to become more time and space aware, to improve the understanding of the game for everyone.

References

[1] United States Tennis Association, “Tennis tactics, winning patterns of play”, Human Kinetics, 1st Edition, 1996.

[2] G. E. Parker, “Percentage Play in Tennis”, In Mathematics and Sports Theme Articles, http://www.mathaware.org/mam/2010/essays/

[3] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A K-Means Clustering Algorithm”, Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28, No. 1, pp. 100-108, 1979.

[4] C. Morris, “The most important points in tennis”, In Optimal Strategies in Sports, vol 5 in Studies in Management Science and Systems, North-Holland Publishing, Amsterdam, pp. 131-140, 1977.

[5] M. Lames, “Modeling the interaction in games sports – relative phase and moving correlations”, Journal of Sports Science and Medicine, vol 5, pp. 556-560, 2006.

[6] J. Bertin, “Semiology of Graphics: Diagrams, Networks, Maps”, Esri Press, 2nd Edition, 2010.