Another interesting data-centric post appeared over at NCCSEF, and when it comes to data, I just can’t help but comment.
This time we’ve got some slides that seem to be trying to draw a relationship between WJC results and winning a medal at (I believe) the Vancouver Olympic Games. We’re only shown the (partial) results for six skiers, so I’m not sure what exactly the lesson is supposed to be.
We seem to be mixing sprint and distance results together as an indicator for future success. That seems strange to me, but I’m certainly not an expert in that sort of thing. We’ve also selected a curiously successful subset of Olympic medalists to examine. Absent is Pietro Piller Cottrer, whose best (and only) result at WJC was 32nd (admittedly, a long time ago). Also missing is Aino-Kaisa Saarinen, whose WJC results were 15th and 23rd. How about Tobias Angerer (WJC: 18th, 26th, 28th)? On the other hand, we are shown Marcus Hellner, whose WJC results were good but not spectacular: 15th and 21st.
The further information provided at the bottom regarding time to an athlete’s first podium also contains mostly skiers who achieved this feat fairly young, but then also two who did not (Gaillard and Rickardsson).
What am I to learn from this? That the right path is to podium at WJC (Northug), except when it isn’t (Bjørgen, Haag)? That the right path is to be successful early in your 20s on the WC (Northug, Harvey), except when it isn’t (Gaillard, Rickardsson)?
When I read stuff like this, I’m left feeling mostly confused, like I’ve been presented with a bunch of data, but no one has gone to the trouble of transforming this data into information. The reader is left alone, drifting in a sea of numbers, wondering what exactly the author’s point was.
I’m absolutely not going to argue with the idea that skiers who show considerable promise early on are more likely to develop into successful WC skiers. Indeed, I’m less interested in the nuts and bolts of what results mean at a given age than I am in effective and clear presentation of data.
I’ve written about connections between WJC results and medals on another occasion, and I tried to emphasize that when you look at all the data, there’s certainly a connection. But the paths that skiers take toward success vary so much that it’s difficult to draw many useful generalizations just from the data.
But let’s revisit this idea with a few simple approaches and see if we can organize the data in a way that’s informative (and maybe interesting too!). First, I’m going to broaden the scope from medals to top ten results at either Olympics or World Championships. The problem with looking only at medalists is that there are just too few of them. Much can be learned by imitating a single good skier, but there’s always the danger that what worked for them only worked because of something unique about them, rather than having stumbled across some universal truth of skiing.
Let’s tackle the connection between WJC results and whether or not someone achieves a top ten result at the Olympics or World Championships. I fit a simple model (actually, not so simple; no OLS regressions here!) and plotted the model’s predictions for the probability of a top ten result at a major championship based on that athlete’s best result at WJC (sprint or distance).
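For readers who want a feel for what a model like this might look like, here is a minimal sketch. The post doesn’t say which model was actually fitted, so this assumes a plain logistic regression of "top ten at a major championship" on best WJC result, fitted by Newton’s method. The data is simulated from made-up coefficients purely for illustration; it is not real WJC or championship data, and every name in the code (`best_wjc`, `top_ten`, `fit_logistic`, `predict`) is hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, n_iter=50, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by maximum likelihood (Newton's method)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Entries of the 2x2 information matrix X'WX.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # information matrix is (near) singular; stop updating
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


# SIMULATED data (an assumption for illustration, not real results):
# each skier has a best WJC finishing place and a 0/1 flag for whether
# they later reached a top ten at a major championship.
random.seed(1)
TRUE_B0, TRUE_B1 = 1.0, -0.12  # made-up coefficients used only to simulate
best_wjc = [random.randint(1, 60) for _ in range(500)]
top_ten = [
    1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * r) else 0
    for r in best_wjc
]

b0, b1 = fit_logistic(best_wjc, top_ten)


def predict(rank):
    """Fitted probability of a top ten for a given best WJC result."""
    return sigmoid(b0 + b1 * rank)


for rank in (1, 5, 10, 20, 40):
    print(f"best WJC result {rank:>2}: P(top ten) ~ {predict(rank):.2f}")
```

The point of modelling a probability rather than picking out individual skiers is that every athlete contributes, including the ones who finished 30th at WJC and went nowhere, and the ones who finished 30th and later did well. A more flexible fit (a smoother or extra terms, say) would be a natural next step, but that goes beyond what this sketch assumes.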