
Ruka Classic Sprint Predictions

The methodology is also still unfinished. I’m sure I’ll be fiddling with it a lot over the course of the season, but I’m trying to commit to publishing these for as many World Cup events as I can. It does depend on my having a reliable start list, so if I’m busy or traveling the day before a race, I may not get to it in time. We’ll see.

For the moment all I’m going to say is that I’m publishing predicted finishing order for the entire start list. The “Predicted PR” column is the actual value the model outputs, so it gives some relative sense of certainty. Values that are further apart represent more certainty of the gap in finishing place. But they are not probabilities and do not have any particular units.

Also, in case it wasn’t clear, I’m not terribly interested in participating in Twitter anymore, so anything I produce this season will be going here.

Men's Ruka Classic Sprint Predictions

Generated by wpDataTables

Women's Ruka Classic Sprint Predictions


Diversity of Nations

I thought it might be a decent time to revisit a thing I’ve done periodically, which is measure the diversity of nationalities among the top finishers. There are a ton of different diversity indexes, coming mostly from either ecology (species diversity) or information theory (think Claude Shannon). I don’t think I’ve been very consistent about which one I use, but here we’ll use the Gini-Simpson index since it has a convenient interpretation: the probability that two skiers chosen at random (with replacement) are from different nations. The higher the probability, the more ‘diverse’ the nationalities.
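For anyone who wants to see the arithmetic, the index is just one minus the sum of the squared nation shares. Here is a minimal Python sketch; the function name and the example podiums are made up for illustration, and this is not the code behind the plots.

```python
from collections import Counter


def gini_simpson(nations):
    """Probability that two finishers drawn at random (with replacement)
    come from different nations: 1 - sum(share_i ** 2)."""
    n = len(nations)
    if n == 0:
        raise ValueError("need at least one finisher")
    counts = Counter(nations)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical podiums, not real results.
sweep = ["NOR", "NOR", "NOR"]   # one nation takes everything
mixed = ["NOR", "SWE", "FIN"]   # three different nations

print(gini_simpson(sweep))  # 0.0, no diversity at all
print(gini_simpson(mixed))  # about 0.667, the maximum possible for a Top 3
```

Note that the ceiling depends on how many places you look at: with only three finishers the index cannot exceed 2/3, which is one reason the panels need separate y-axis scales.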

Here’s a plot of this measure for major international distance events going back to the early 90’s.

I want to draw your attention to the separate y-axis scales in each of the panels. This is important to keep in mind, to remember that despite the appearances of how “bouncy” the lines are, the variability in the Top 30 panel is much, much smaller.

The variability of the diversity for podium finishers (Top 3) is quite large, which we’d expect since it will be very responsive to dominant seasons by 1-2 individuals, like, say, Marit Bjoergen. What a lot of fans today will see confirmed here is the general steady decline in diversity among the men since 2008 or so. The women’s diversity at this level was also at its highest in the 00’s and then fell during the heyday of Bjoergen and Kowalczyk, but it has been climbing back up lately. A similar trend can be spotted at the Top 10 and (kind of) Top 30 levels, although the changes are happening at much smaller scales (again, keep the y-axis scales in mind).

Here’s the same plot for sprints, going back to 2002 or so. This time I’m using the Top 12, to generally align with the semifinals.

Again, keep in mind the different scales on the y-axis. The Top 30 level is bouncing around much, much less than the others, despite how jagged the graph looks.

Not as much of an obvious overall trend here, except that the 2015 season seems to have marked a fairly dramatic transition from a period of relatively high nationality diversity in sprints to one dominated by Norway & Sweden that we’re only sort of coming out of lately.

Bolshunov’s Decimation of the Field

Alexander Bolshunov won the 2022 Olympic 30k skiathlon the other day with a rather eye-popping margin. This sort of thing always causes people to start asking questions about You Know What. I’m not interested or qualified to weigh in on that, but anything unusual is always interesting from a data perspective, and this is no exception.

I posted a couple of graphs on Twitter showing how extreme the field separation was in this race compared to either just skiathlons or all skiathlons and other long mass start races as shown below:

Someone asked if I could do the same thing but include all events, even shorter races and interval starts. I think the reason I restricted myself to skiathlons and mass starts initially is pretty self-explanatory, but just for fun let’s look at all race types. I’m still going to exclude pursuit starts because, well, frankly at this point I hardly even consider them real races (🔥). This includes all major international events stretching back to the late 70’s. The ones that I have, at least. My data from the 70’s and 80’s is incomplete; I probably only have around 60-70% of the races from that era. Anyway, here you go:

This one is a bit too busy to annotate cleanly on the graph, so here’s a list of the 30 most extreme cases for easier browsing. Be sure to sort on the percent difference from 10th place column first.

pb10_top30_list


Sprint Quarterfinal Choices

Of course Devon would spend a bunch of time talking numbers in this week’s podcast when I happen to have tons of stuff I should be doing besides making skiing graphs. Oh well. That’s what I get for outsourcing all my ideas on what to write about to other people.

Anyhoo, Devon spent some time talking about how rare it is for sprint podium finishers to have come from the later quarterfinals (i.e. the top finishers tend to come from quarterfinals 1 & 2). As you can see below, that is obviously true, although, as he also alluded to, this is a textbook correlation-causation gotcha.

The best skiers tend to qualify well, so they tend to have better quarterfinal choice options, so they will tend to pick the option that gives them the most rest between the semifinal and the final. One interesting thing, though, is how much more severe the pattern is for the men than the women.

A more sophisticated analysis would attempt to somehow control for skier ability in the above graph, although that gets complicated pretty quickly. Perhaps I’ll revisit that later when I have more time.

(And yes, people who follow the World Cup closely will know that the bar for 7th place finishers in a sprint final in the women’s panels is not a mistake. It would have been really funny if it had been Sophie joining him on this episode rather than Sadie.)

Sophia Laukli Has A Strong Early Race in Beitostoelen

The first round of ‘pre-season’ races is complete and one American result stood out to me in Beitostoelen. Sophia Laukli finished 7th in a 10k freestyle in a field made up mostly of Norwegians. She finished 1:24 behind the winner Therese Johaug, or around 6.4% back. That’s a big margin to be behind most winners, but maybe not so big a margin to be behind Johaug. Also, the result is interesting because Laukli is fairly young.
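Percent back is just the time gap divided by the winner’s time. A minimal sketch follows; the function names are hypothetical, and the winning time is backed out from the 1:24 and 6.4% quoted above (roughly 21:52) rather than taken from the official results, so treat it as approximate.

```python
def gap_to_seconds(gap):
    """Convert a 'm:ss' gap string such as '1:24' into seconds."""
    minutes, seconds = gap.split(":")
    return int(minutes) * 60 + float(seconds)


def percent_back(gap_s, winner_time_s):
    """Time behind the winner as a percentage of the winner's time."""
    return 100.0 * gap_s / winner_time_s


gap_s = gap_to_seconds("1:24")  # 84.0 seconds

# ASSUMPTION: winning time inferred from the two numbers in the post,
# not read off a results sheet.
implied_winner_s = gap_s / 0.064  # about 1312.5 s, i.e. roughly 21:52

print(round(percent_back(gap_s, implied_winner_s), 1))
```

Because the denominator is the winner’s time, the same 84 seconds would be a smaller percent back in a longer race, which is why percent back travels between events better than raw seconds do.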

These early season races obviously should be taken with a grain of salt. The fields can be uneven, and many of the athletes are not necessarily in top form, or may be using them more as training than anything else. But this race happens to offer some convenient potential for comparisons because with Johaug in the race your percent back from her is pretty likely a decent preview of your percent back in a genuine World Cup.

The split times don’t show anything too surprising. Johaug pulled away early and the top 4 all maintained their positions fairly consistently through the race. Laukli started somewhat conservatively and moved up dramatically during the first half and then held on during the second lap. (Relative to Johaug, at least.)

Next, we can look at a lap pacing plot.

This tells roughly the same story, but maybe emphasizes a bit more that Laukli really did seem to accelerate through the first lap and then hold that speed quite well for the second lap. Laukli’s Instagram post on the race suggests that she caught a ride from Heidi Weng for a good chunk of the first lap (second lap, I was wrong) (and looked really good going through the 2.5k marker, frankly).

So what does 6.4% behind Johaug in an interval start race translate into, roughly? Well, this wouldn’t be cross-country ski racing if there weren’t a healthy bit of variability. In last year’s major international interval start races that would have put Laukli anywhere from 8th (in the WSC 10k freestyle, no less!), to 18th, 20th, 26th or 29th. That’s just based on where 6.4% back would have put her last season in the available interval start races. That WSC race seems like a bit of an outlier, so high-teens to low-20’s is probably the safe estimate.
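The comparison above amounts to dropping a percent back value into another race’s result list and counting who was faster. A small sketch of that idea, with an entirely made-up result list and a hypothetical function name:

```python
def projected_place(pct_back, field_pct_back):
    """Place a skier would have taken in a race, given her percent back
    and the percent back of everyone who actually finished it.
    Ties go to the hypothetical skier (only strictly faster skiers count)."""
    return 1 + sum(1 for p in field_pct_back if p < pct_back)


# Made-up percent back values for the first nine finishers of one race.
example_race = [0.0, 1.2, 2.5, 3.1, 4.0, 5.5, 6.0, 6.9, 7.3]

print(projected_place(6.4, example_race))  # 8
```

Running the same function over each race from a season gives one projected place per race, and the spread of those places is the variability described above.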

One of the interesting things for American fans, though, is how young Laukli is. If we take US women’s results in interval start major international races since 1992, tack on Laukli’s Beitostoelen race (the assumption being that Johaug provides a decent yardstick), and plot their percent back versus age we get this:

That’s pretty solidly in the “exciting” quadrant of the graph. Zooming in a bit, we get this:

Again, this is all an enormous amount of analysis for one pre-Thanksgiving result. But what are pre-Thanksgiving races for if not for generating either irrational exuberance or premature panic? Isn’t that their whole purpose? 😉

The USA-CAN Men’s Relay Rivalry Has A Very Long History

Warning: this post contains no statistics and virtually no data, but I suspect it may be fairly entertaining!

During most of my lifetime, the US & Canadian men’s 4x10km relay teams have not often been in medal contention, although the Canadians definitely had some strong teams for a period with Devon Kershaw, Alex Harvey, Ivan Babikov and others that were occasional outside contenders for a medal. (For instance, I believe the Canadians were 3rd in one WC sprint and 5th in another in the last 10 years or so.) But frequently both North American men’s relay teams have been out of medal contention fairly early. Naturally, you would expect a friendly competitive spirit to develop between the two neighboring nations as a sort of “race within the race”.

I’ve been doing some long overdue organizational work with the several boxes of paper race results records that I’ve received (primarily from Ruff Patterson & John Estle). In the process I (re)-discovered an entertaining little nugget: a telex (!) sent to the US from Europe summarizing the men’s & women’s relays at the 1985 World University Games in Nevegal, Italy. I haven’t done the research to determine for sure who sent the telex and who specifically the recipient was (presumably either Ruff Patterson, or US Ski Team staff generally) but it tells quite an entertaining story.

Here’s the scene. The US men’s relay team consisted of Joe Galanes, Terry Daley, Josh Thompson (future biathlete) and Todd Boonstra, in that order. The Canadians ran Allain Masson, Wayne Dustin, Owen Spence and Benoit Letourneau. Before we even get to the North American “race within a race” on the men’s side, there was some notable drama in the women’s race. What follows is a lightly edited transcript of the start of the telex:

Team nordic Sovi-jet (sic) women nailed for skating in tag zone, cost ’em gold. No impact on US. For (sic) good legs from men and we’d have bronze except the Canucks blew in and took it with USA fourth.

Hotline: ‘They upheld a protest against skating in the tag zone of a cross-country ski relay today…and it cost the Soviet Union a gold.

Three-by-five-kilometer relay at the World University Games in Nevegal, Italy…But the five-member jury disqualified the Soviets for skating in the exchange zone between laps. The decision was the first time skating protests have been upheld in a major international race…coming on the heels of disallowed protests last month at the Nordic World Championships in Austria and last week at the World Junior Championships in Switzerland, by knocking out the Soviet women, the jury handed the relay gold medal to Czecoslovakia (sic)…with Poland the silver medalist and Finland third.

Exciting! The telex continues in depth on how the men’s relay played out:

In the men’s relay…the Soviets tooch (sic) charge on the final lap and earned the gold with Italy second and Canada holding off the lead from–of all teams Japan early on the second lap while the Soviets were struggling. However, Vladimir Nikitin took over at the start of the final 10-kilometer lap and pulled the Soviets in firts (sic), Todd Boonstra kept the US close on the last lap but it wans’t (sic) enough as the US had to settle for fourth in the 11-team field.

This section of the telex (the “Hotline”) reads like a press-release of sorts. What follows appears to be more of a direct communication between ski team staff describing the race in more detail:

Bat fecal matter (sic), got the okay for Joe to race and he was accredited without problem…but who wud (sic) have figured the Canadians wud (sic) replace Austrians and Finns? We had figured Soviets and ITA as 1-2 and that gave us a shot at the bronze we got four excellent legs…but the Canadians upset everything. Galanes was six secs out of first at about 3.5km, sitting in sisth (sic) place at back of second three-man knot when his pole got caught in netting along track on a lefthand curve. He lost about 20 secs but regained it and was in fourth before running out of gas at end of lap. Still we were close to third, which was what we wanted, Josh and Terry gave us strong middle laps we needed, but the Canadians wudn’t (sic) drop. They went with speed to hang-on at the end and pulled it off. Masson had ’em third behind JPN and SOV, just 25 secs out of first…Dustin took lead and they held it till end of third leg when Soviets regained it, and Nikitin cruised. Boonie cudn’t (sic) catch junior Benoit Letourneau altho (sic) he cut 11 secs off the gap…women never in it with Butts skiing scramble and seventh (last) right from the start. When the Sovi-jets (sic) were dumped, that moved US to sixth.

No, I do not understand the “bat fecal matter” reference either. It continues…

Taylor: ‘We wanted four good legs and we got ’em but there was no way we could anticipate the Canadians being so tough. Dustin is skiing very hot right now and is the classic picture of someone getting better as his confidence grows. Getting Joe last night was a shot-in-the-arm and obviously was the key to us being any kind of a contender, who knows what wud hv (sic) happened if he hadn’t gotten hung-up in the nets but he gutted it out and made good comeback. Terry and Josh really showed something and I’m going to talk with Terry about his thoughts on skiing for us, he’s raw talent and wud (sic) be a terrific addition if he decided he’d put his career on hold. Boonie skied a honey of a race, too, but he’s not the Todd Boonstra we’ve seen in the past, still, absolutely no complaints–except someone forgot to tell the Canadians what we planning (sic)’

The US would have won the bronze if it hadn’t been for those meddling Canadians! After all that, one of the things that actually fascinates me the most is that after mentioning the surprise position of the Japanese team early in the race, the telex fails to mention that they were disqualified! Here are images of the actual race results:

I can’t find an Article 382.5 in the current FIS rules, so I’m not sure what the Japanese team was disqualified for. But I guess that was overshadowed by the drama with the Soviet women skating and the battle royale between the US & Canadian men.

Even Bigger Winner Margins!

Therese Johaug already made me write one post on this topic this championships and now she’s forced me to write another. So I promise I’ll keep this short, unlike the gaps from Johaug to the next skier.

Therese Johaug once again won a race by a very large margin, this time in the 30km classic mass start at the 2021 Oberstdorf WSCs. Personally, I don’t think that I’m as impressed by huge mass start winning margins as other people seem to be. Obviously, unlike in interval start races, the visual impact of large gaps in mass starts can make quite an impression. But for me, at least, the relationship between my amazement and the size of the gap is not strictly linear. It’s probably something more like this:

When a skier pulls away in a mass start race, I do become steadily more amazed as the gap grows, up until it reaches a certain point and then my amazement sort of plateaus, and even declines a bit after a while. (Obviously, the specific units on the x-axis of this Very Serious and Rigorous graph would change for races of different lengths.)

Which is all just to say that I found Johaug skiing away from the field extremely amazing, up until the gap started getting up into the 90 second territory. After that, I’m not sure any further increases in the margin mean all that much. But that’s just me! You can obviously take joy and amazement from whatever you like.

A consequence of this is that I don’t think you can meaningfully compare winning margins from mass starts except at the level of breaking it down to basically three categories:

  • It was a sprint finish
  • They were in contact but not enough for a sprint
  • There was a very big gap, it wasn’t close at all

For me, mass start winning margins fall into one of those three categories, and I personally don’t get much out of any further parsing of the time gap. But what would I even be here for if I didn’t give you a list? So here’s a list of the biggest (by percent back) winning margins in major international events in the last 30 years or so.

Largest %-Back Time Gaps


So perusing this table you can see why Johaug’s 10km freestyle interval start gap is still the most impressive thing I’ve seen recently.