Friday, November 3, 2017

Getting into Sports Analytics



I’ve been working almost a year and a half now in a full time role as a data scientist at Opta and Perform Group working in football analytics for a variety of clients both in the pro and media spaces.

I’m not going to lie it’s a pretty cool gig. Which is why I guess it isn’t too surprising the question I get most often is “how do I get a job in football analytics?”

A few years ago I was asking a lot of “more established” analysts this question too. Most of the answers you get are things like: do more work publicly, show you know how to ask a good question and approach the problem in a meaningful way. When I was asking these questions and getting these answers I was always a bit disappointed — I get that you have to do work but what else should I be doing?

As time went on I wrote more and more, got good feedback on my work from people in the analytics community, sort of stumbled my way into a freelance writing gig (from what I understand of the journalism game I think this was a pretty lucky break if I am being honest, less to do with me and more just being in the right place at the right time), and eventually started to get in contact with performance analysts at clubs doing the odd piece of work here and there. It was probably about three or four years after I had first started in the football analytics community that I graduated from university and lo and behold I found myself in a full time job in football analytics.

Obviously there was lots of hard work along the way but it wasn’t like I had a concrete plan of going from step one to step two and ending up at Opta. So when people ask “how do I get a job in sports analytics” I don’t really have a satisfactory answer either beyond what was told to me or an in-depth and overly specific life story.

But because this is a question I get so much I thought I could give at least a few tips that I’ve picked up along the way that have either helped me or people I know in the industry.

Start doing work now and make it public

As of today I have done work in Python, R, SQL, Stata, Apache Spark and MATLAB. I have experience with general linear models, supervised and unsupervised machine learning models, Bayesian models and more.

I don’t say that to brag. I say that because when I started writing about football analytics I would have only known what about 20% of the words in that sentence were.

When you get into an industry like this you’ll inevitably compare yourself to the people who are better than you, and there are always people who are better than you. When I started most of my work was in Excel with data I’d copy+pasted. I compared myself to people who knew how to code and wished I could be as good as them. As I started learning how to code I’d compare myself to people whose math ability made their modelling skills significantly better than mine. Today my coding and math backgrounds are both pretty good and I compare myself to people who make fancier web-apps than I can or academics who build complex models with tracking data.

The point is you are never going to be the best so don’t wait until you are. I hear from a lot of people that they want to learn how to do x before they start writing. If you do that you’ll be waiting forever. Start now. Doesn’t matter how simple or silly you think the idea is. If you get your work out there, even if you just demonstrate you know how to ask a good question and your methodology is suspect people will still find it. The feedback may not be what you want all the time, but you’ll learn from it. I grimace when I look back at some of my early stuff, but I know if I’d never done it or put it out there I certainly wouldn’t be where I am now.

So don’t let the gatekeepers keep you out. Publish something and you’ll have an audience.

Sports Analytics isn’t a degree (and it doesn’t need to be)

To be fair the sentence “sports analytics isn’t a degree” probably isn’t true anymore. Actually I think now there are several degrees offered in sports analytics, but that’s beside the point. One of the follow up questions that comes with how to get into sports analytics is “what should I study to work in sports analytics?” This one I think I have a better answer to: study something you are interested in.

I studied economics (with lots of political science and math thrown in), because it’s something I was — and still am — really interested in. People I know have gotten jobs or are big names in sports analytics who have studied history, philosophy, chemistry, meteorology, physics, theology and just about everything in between. A lot of sports analytics is about thinking intelligently about a problem and communicating complex concepts in terms people can understand. These are skills you will pick up across a myriad of academic disciplines. Going into a degree because you think it will get you a job in sports analytics is a) incredibly limiting and b) probably not true, the first thing a potential employer in sports is going to look at is never going to be what your degree was in.

Learn

Saying that you should start publishing work as soon as possible and study what you are interested in doesn’t mean you shouldn’t learn skills specifically for sports analytics. The learning part is often a bit more intimidating but it really shouldn’t be.

The first is learn your sport. There is a tendency in some areas of the media to frame analytics as something which is at odds with the expertise of experts in the field, and sometimes analysts themselves fall into this trap, but there is so much you can learn about the sport from people in it. Analytics gives you a new way to approach and sometimes challenge these ideas, but it is important to learn from experts in the sport to even be able to start discussing these ideas.

The second is learning to code. Choose a language (I’d recommend Python or R) and learn how to code in it. This is often the most intimidating step for newcomers, but regardless of your educational background coding is something you can learn.

One thing I would suggest is to learn with a project you are interested in. Yes, it’s important to learn how to print “hello world” but it can also be a bit tedious. If you start with a project you want to work on, preferably a simple one to start with, it can make the process of learning a new language, and inevitably getting frustrated with it along the way, a bit more palatable. Again don’t get fed up if you aren’t an expert right away, because the truth is you’ll never be one: there will always be people better than you who you’ll be learning from.

Finally I think if you really want to advance and work in a more technical field it’s important to learn a bit of math and get an underlying understanding of probability and statistics. Again there are plenty of places to learn online and you don’t need a degree in a mathematical field to necessarily be a good analyst.

Working in a club isn’t the only job

In analytics circles working in a club or team has come to be seen as the pinnacle, the sign that an analyst has really made it. The truth is club jobs are only a fraction of the jobs out there. There are people like myself and many others who work for data companies or consultancies. As analytics gains more traction in the mainstream the demand for journalists and people in the media who have a good understanding of analytics work will increase as well. In fact many people work in clubs and don’t enjoy it, finding that they prefer working in media or consultancy spaces.

The roles are all different but the skill sets are similar: you need to understand the sport, what the problems that the sport presents are, how data can be used to approach these problems and how to communicate all of this in a succinct and easy to understand manner.

Don’t limit yourself to looking at clubs, the demand for smart people working in sports is much broader than just clubs.

You still might not get a job and that’s okay

I worked hard to get where I am and following some of the steps above helped me get a job in the industry. I’m also a very privileged straight, white guy who had lots of support from family + friends and moved across an ocean on my path to working in football. It’s no secret sports jobs are pretty highly sought after so I can’t guarantee that by following these steps you’ll get a job.

That being said the skills I’ve outlined are things that are useful in many fields and it’s not like if you don’t get a job in sports analytics it will have all been for naught. If you learn how to code, how to use data and how to communicate mathematical concepts effectively you will be more employable in any field.
Also of course — this stuff is fun! We do it because we love the sport and we want to learn more about it. If you aren’t enjoying it, stop because there are industries that will pay you more.

While this isn’t a blueprint of “how to get into sports analytics” I hope some of this helps. And if you don’t think any of this is useful then ignore it all — you may well have better ideas!

It still feels surreal that a sport I grew up loving, playing (poorly), coaching and refereeing is now my full time job. I still have imposter syndrome all the time when I am in meetings with coaches, analysts, players, broadcasters and journalists, but slowly as more and more people listen to me and realise I genuinely have something to offer it becomes more natural. It isn’t like there was ever a breakthrough moment when I became a proper sports data analyst™, so hopefully talking about all of these things helps in some way to de-mystify the process.

So now that I’ve written all of these ideas out somewhere I hope that next time instead of messaging me to ask for advice on how to break into the industry you’ll message me with your first blog post or an example of some public work!

Thanks for reading and good luck!

Cross posted to medium: https://medium.com/@GregorydSam/getting-into-sports-analytics-ddf0e90c4cce

Thursday, June 30, 2016

PEDs and Moralizing: The Hazy Lines

2016 has not been a good year for Performance Enhancing Drugs in sport. Maria Sharapova was suspended for two years for a failed drug test at the Australian Open, the entire Russian team has been banned from the Olympics (contingent on the 67 athletes who appealed earlier this week) and the discussion worked its way into football through the Mamadou Sakho case.

One thing that separates PED or steroid cases from other suspensions is the moralizing that comes into play. People who are found guilty of using PEDs in sport are not just athletes who cheated, they are "cheaters". There are very few other infringements I can think of across any sport that brands athletes cheaters for life (corked bats in baseball?).

So what is it about steroids? Well firstly there is a complicated history behind the PED-taboo. I can't claim to be an expert in this area but a lot of the current atmosphere surrounding PEDs stems from East German cases during the 1970s at the height of the Cold War. In an era where everything was a race between the Eastern and Western blocs any excuse to say that what the others were doing was cheating made for easy propaganda. Since then we've had hundreds of high profile cases from Ben Johnson at the 1988 Olympics to baseball in the 1990s-2000s and of course Lance Armstrong.

Whether rightly or wrongly Ben Johnson was completely disgraced and spent the rest of his career doing dumb commercials for Cheetah energy drinks which made use of the oh so clever cheater-cheetah pun, Barry Bonds one of the most prolific baseball players of all time looks like he will never get near the hall of fame and Lance Armstrong is most remembered today for an Oprah interview. The discovery of PED use ends careers and at this point it would be surprising if Maria Sharapova ever sheds the title of cheater.

The problem is that none of these infractions are nearly as clear cut as they may initially sound.

Firstly the definition of a PED is problematic. The name Performance Enhancing Drug as a synonym for something illegal is incredibly misleading, since every professional athlete on the planet takes performance enhancing drugs of some sort or another, whether those are pain-killers, anti-depressants, or anabolic steroids. They all improve performance and some I think all sport fans would be okay with. Anti-depressants for example are an essential part of many people's day-to-day lives and they surely enhance performance for athletes who suffer from depression, yet no one would ever suggest banning athletes for taking these types of drugs. So what is an illegal PED? Well, something that the competition rules say is illegal.

These banned substance lists are fluid, constantly changing and always up for debate. What is a banned substance today might not be tomorrow and vice-versa. So here the moralizing starts to get a little hazy. If an athlete takes a drug that was legal and then stops taking it when it becomes banned was that athlete cheating before hand? Should we look back on their results differently? Sure they were following the rules, but if we truly believe that taking PEDs is a moral infringement then it shouldn't matter what the "rules" were at the time. Unless of course it is the simple act of breaking these rules that is so egregious.

However, this brings up another problem. If it is only the breaking of the rule that is the problem, not the taking of the drug itself, why is it so much bigger a deal to break PED rules than, say, the Laws of the Game? Players are often commended for pushing boundaries during games if they don't get caught. For example in football we often talk about the "dark arts of defending" where players attempt to put each other off in the box by pulling shirts or holding onto arms and we never brand players cheaters for doing this. Obviously there are differences with PEDs but supposedly the main issue with both is that they are breaking a rule set out by the competition.

The main response to this line of thinking is often the health effects, that PEDs negatively affect an athlete's health. I'm not a doctor and I'm not well read enough on the subject to address the validity of these claims, but I'm sure some banned substances have negative effects on an athlete's health (on the flip side I'm sure many have negligible negative health effects as well but that's beside the point). This is obviously a reasonable concern, however the problem is that being a professional athlete is almost certainly bad for your health regardless of what drugs you take.

How many athletes do we hear about across all sports who struggle with chronic pain, depression, and a variety of mental and physical health problems after retirement? There is a difference between consistently exercising and the incredible physical and emotional stress that professional athletes are subjected to. So sure many banned substances may be bad for your health but so is being a professional athlete and we don't hear calls to mandate training hour limits or intensity limits (and if there were these regulations athletes would find ways around them just like they find ways around PED regulations).

Now all of this is not to say PED use is something that should be ignored. I think it's important to make it clear that especially in the cases of minors, young athletes should not be forced into making decisions between their health and their athletic futures. If kids feel the need to resort to PEDs to make it to the next level then we clearly have a problem. However, it's not clear the current PED approach is the best to make sure that doesn't happen.

In order to make progress on this front one thing that needs to happen around PED use in sport is an understanding that the line is not as clear as many make it out to be. Athletes caught using banned substances are not "cheaters" in a way that is substantially different to athletes who consistently infringe upon the laws of their sport. Moralizing about it doesn't help, whereas being understanding about it does.

Undoubtedly countless athletes who have used PEDs their whole careers will never be caught, so right now all that is happening with these severe punishments is that the few who are caught are taking the brunt of the discipline for not having as good masking agents or whatever else as their competitors. I don't think these cases should be ignored, but banning Sharapova for two years and essentially ending her career is not going to end doping in tennis. She made a mistake, she did something wrong, but to ban her for two years suggests she did something far more egregious than what she really did. She tried to gain an edge and didn't get away with it.

The Sakho case earlier this year is inevitably just the tip of the iceberg when it comes to PED use in football, and FIFA need a plan. The haphazard approach of arbitrary suspension lengths, which often serve to end or at least poison careers just doesn't match up with the reality of what PED use is and isn't. It has hurt baseball, it has hurt cycling and if it continues in tennis it will inevitably hurt that sport as well.

So what's the right approach? I'm not sure, but PED use in sport is messy and complicated whereas the approach to dealing with it so rarely reflects this reality. There is no clear line of what is right and what is wrong yet so often we talk in terms of this imaginary line. Hopefully sport can learn from its mistakes of the past and come up with a strategy that better reflects the reality of the modern day professional athlete.

Wednesday, April 13, 2016

What really is a Football Club?

One thing I've heard a lot this season is something along the lines of "Brighton are having a really good year because they do analytics." Last year it was Brentford and Midtjylland people were saying this about, who knows which teams it will be next year.

Which brings me to the question what the hell does "doing analytics" actually mean? Does it mean using data? Well, every club in the world uses data. Does it mean using data well? Well, what does using data well mean? Who is using the data? What are they using it for? Recruitment? Opposition analysis?

It's a meaningless statement to say a football club "does" anything. Football clubs don't do things. Football clubs are the output of the work of sometimes hundreds of different people all of whom from the academy coaches to the first team players have different ideas about what they'd like the club to be in an ideal world.

Think of a football club like a movie. A movie doesn't do things, it is the final product of the work of the many individuals who were involved in putting it together. A movie can be shitty because the lead actor is terrible. Now whose fault is that? The casting director? The director? The actor himself? The screenwriter for giving him poor material to work with? It's hard to say and like a football club the final product is the result of many different people's opinions, work and effort.

A year ago I really had no idea how football clubs worked. Having worked with Analytics FC for the past six months or so now I've had a lot of really cool opportunities and although I'm far from a voice of authority on this matter I feel I have a much better idea of what football clubs are and how they operate.

The first thing I think is important to say is that I've spoken with people at just about every level at a club from Directors of Football to Academy Coaches to Performance Analysts to 'Analytics' staff and none of them are stupid. I don't say this to be deferential or anything, it's just that these people watch tons of football, think about the sport every second they are at work and all bring valuable skills and experience to the club.

So when people say "x club is dumb" it's not only a simplistic way of thinking about things but it is wrong. There are bound to be plenty of smart people at that club and smart people in important positions. So why is the club unsuccessful? It could be something to do with the process that is wrong, it could be that people are in roles to make decisions that they really shouldn't be given their specific skill set, it could just be a matter of bad luck. The world isn't divided into smart and dumb clubs, because again clubs aren't single-entities they are the joint output of many different members of staff. Sometimes the output is good, sometimes it isn't.

Many of the same ideas are true with respect to analytics. There aren't clubs that do analytics and clubs that don't do analytics, because doing analytics isn't a thing. There are people at every club who use data to varying degrees and with varying efficiency. That being said every single club in the world could improve how they use data, and how they incorporate the use of data in their decision making process.

I think about my personal skillset and I honestly believe I could offer something to every club out there. That isn't to be arrogant, this sentiment probably applies to most of the people reading this as well. At some clubs like Arsenal who already have a full time analytics company working for them my role would just be to assist people smarter than me in continuing to do what they already do and allowing them to watch more games or whatever else it might be. At other clubs it would be to try and incorporate new ideas that would be considered low hanging fruit to many in the online analytics community.

The way analytics works its way into a club isn't to turn a dumb club into a smart club, but it's the idea of improving the process and increasing the number of ideas heard by people in important decision making roles. That's why people say again and again communication is the most important part of analytics, because it isn't something we do after the analysis, it is part of analysis itself because without it none of those things happen.

One thing I've heard from everyone who has worked at a club in whatever role is that they know when to push an idea and when to just accept decisions they might disagree with, because a club is not one person making decisions it's a massive team. This may sound obvious, but I've found it really helpful to think about all these moving parts when we talk about clubs. It helps to move past useless debates about "smart" and "dumb" clubs and actually address what it is that makes certain clubs more successful than others, because there are smart people everywhere in football. Finding what these people are good at and what they should and shouldn't be responsible for is much more challenging and really a more interesting question to ask.

Friday, October 16, 2015

Heart of a Lion Joe and Lazy Bob

One of the biggest critiques of analytics and statistics in general is that they don’t take into account intangibles like confidence, “wanting it more”, desire and heart. On the surface this seems like a fair argument, we don’t have a statistic for leadership, heart and character per 90, but to suggest analytics is ignorant of these ideas ignores how these factors actually affect the game.

I’m not a sports psychologist so I’m not going to say which of these intangible traits matter in a game like football and which don’t, but let’s assume for the sake of argument that some of them do matter. I don’t think this is a ridiculous assumption, personally I’m much better at a lot of things when I’m confident or motivated.

Assuming these things do matter there’s a few ways they could present themselves on a football pitch, let’s unpack these one-by-one.

First imagine two players, let’s call them Heart of a Lion Joe and Lazy Bob. On a purely technical level Heart of a Lion Joe and Lazy Bob are equivalent, they are both strikers and have the exact same skill set. What is different about them comes down entirely to attitude. Heart of a Lion Joe loves football, has lots of heart and always wants to win football matches above everything else. Lazy Bob isn’t that bothered about winning and probably won’t commit to a 50-50 ball with the same vigour that Heart of a Lion Joe will.

Individual Effects

Now assume that intangibles affect the game through individual effects. That is to say that because Heart of a Lion Joe has better intangibles than Lazy Bob he will be a better player. Well if he is a better player because of these intangibles then these effects will show up in the data. Sure in a comparison of individual skills outside of a game context Heart of a Lion Joe and Lazy Bob are identical, but if these intangibles really do make you a better player then we will see Heart of a Lion Joe putting up better numbers than Lazy Bob.

Since they are strikers we will probably see Heart of a Lion Joe score more and take more shots than Lazy Bob because he “wants it more” and will be more aggressive in going for the ball or pushing himself that little bit harder. So sure in the statistics we won’t see that Heart of a Lion Joe has more ‘heart’ than Lazy Bob, but we will see that Heart of a Lion Joe is a better player who puts up better numbers than Lazy Bob and if we are capturing that does it really matter that we aren’t capturing the exact intangible that makes him better? Probably not.

Team Effects

The second argument that anti-stats people will tell you about intangibles is that you can’t measure them because they don’t just affect one individual player, they affect the whole team. When a team plays with Heart of a Lion Joe up front it gives the whole team a boost and they all play better because they are inspired by Heart of a Lion Joe’s incredible leadership. When Lazy Bob starts up front though he just mopes around and that dampens everyone’s mood.

Well luckily we can measure this as well! Using Shapley Values or GoalImpact we can compare how players affect their teammates. These are completely agnostic measures which means they don’t try and pin down the mechanisms through which players are making their team play better or worse they just compare how the team plays with or without them in the team (weighting for other factors like who they are playing with, strength of opponent etc.). So even though there isn’t a way to directly measure how one player’s intangibles affect another’s, if these mechanisms exist they will be picked up by Shapley Values or GoalImpact. Good news for Heart of a Lion Joe, we can even figure out if he makes his teammates better through his inspiring leadership.
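To make the with-or-without idea a bit more concrete, here is a minimal sketch of an exact Shapley value calculation in Python. Everything in it is made up for illustration: the three-player squad, the `team_goal_difference` function and its numbers are hypothetical, and this is not how GoalImpact or any real model is built. A real version would estimate the value of each lineup from match data and would need sampling rather than looping over every ordering, since the number of orderings explodes with squad size.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution,
    averaged over every order in which the squad could be assembled."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        lineup = frozenset()
        for p in order:
            with_p = lineup | {p}
            # How much does the team's value change when p joins?
            totals[p] += value(with_p) - value(lineup)
            lineup = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def team_goal_difference(lineup):
    """Hypothetical value of a lineup (made-up numbers)."""
    # Every player has the same individual output...
    total = 0.2 * len(lineup)
    # ...but Joe's "leadership" lifts each teammate on the pitch with him.
    if "Joe" in lineup:
        total += 0.1 * (len(lineup) - 1)
    return total


players = ["Joe", "Bob", "Ann"]
values = shapley_values(players, team_goal_difference)
for name, v in values.items():
    print(f"{name}: {v:.2f}")
```

The calculation never asks why the team is better with Joe in it. His individual output is identical to Bob's in this toy setup, yet he ends up with the larger share because lineups containing him are worth more.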

One argument against this method, which is probably the only one I don’t have a good answer for, is that maybe a player’s magical leadership qualities are so powerful that he doesn’t even need to be on the pitch to transfer them to his teammates. His presence in the dressing room alone is enough to make his team better. If that’s the case then maybe there is room to hire players purely for their dressing room abilities, but we probably shouldn’t pay them players’ wages. Teams should hire these people as inspirational speakers or as people who just “hang out” with the team.

Jekyll and Hyde

After all this time maybe we find out that Heart of a Lion Joe and Lazy Bob were actually the same person. When he’s playing well he has confidence and takes on the persona of Heart of a Lion Joe and when he’s playing poorly he becomes Lazy Bob. This is probably the narrative we hear most often in the media, the confidence storyline. Analytics people respond to this narrative by saying it’s just random variation.

If you take off your analytics-tinted-glasses for a second you have to admit the mainstream media story of ‘confidence’ is a lot more satisfying than ‘randomness’. Randomness is an ugly word in sport. We like to think that everything happens for a reason so attributing these changes in performance level to confidence just feels better than saying it’s random variation.

The correct response here is who cares? Really, maybe the media are right and all random variation in scoring or performance level comes down to confidence, but it looks random from a data perspective. All that matters in the end is output, so if a player can’t control when he is Heart of a Lion Joe or Lazy Bob then why should teams?

If it looks like random variation that the player can’t control and it acts like random variation the player can’t control then for all intents and purposes we should treat it like random variation the player can’t control. Sure, maybe it’s confidence but if confidence is completely exogenous to what a footballer has control over why should teams care about it or evaluate based on it?
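For anyone who wants to see what "looks like random variation" means in practice, here is a small Python sketch under some loud assumptions: a hypothetical striker who takes exactly 3 shots every match and converts each one with the same 12% chance, whatever his mood. Those numbers are invented, not taken from any data.

```python
import random


def simulate_goals(shots_per_match, conversion_rate, matches, seed=0):
    """Goals per match for a striker whose ability never changes."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < conversion_rate for _ in range(shots_per_match))
        for _ in range(matches)
    ]


def longest_drought(goals_per_match):
    """Longest run of consecutive matches without a goal."""
    longest = current = 0
    for goals in goals_per_match:
        current = current + 1 if goals == 0 else 0
        longest = max(longest, current)
    return longest


goals = simulate_goals(shots_per_match=3, conversion_rate=0.12, matches=38)
print("goals per match:", goals)
print("longest run of matches without scoring:", longest_drought(goals))
```

Change the `seed` a few times and the same constant-ability striker will look like Heart of a Lion Joe in some stretches and Lazy Bob in others, with nothing about him actually changing.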

The mainstream media is probably right that the footballing world is full of a mixture of Heart of a Lion Joes and Lazy Bobs, but they are wrong to suggest that analytics ignores the differences between these two players.