I thought we could take a look at race data and try to figure out what it tells us about Zwift racing in a few posts. This will be the first one.
Background
Few will remember, but it has been a few years since I stopped spamming the forum with opinions and race data. The reason I stopped was that my goal was achieved. Zwift moved from the WKG-based race categorization to a results-based one. (Oh, and I got long COVID too, which made racing a bit less fun, but that’s another story.)
I do have opinions on ZRS. It is not perfect. But the important step was this move from a very unhealthy system. Which means that I can look at race data today with just curiosity and no real agenda.
I could be wrong, but my impression a few years back was that Zwift didn’t do much race data analysis. Today I think they do. Zwift provides us with plenty of data, but they tell us very little of their own analysis of that data. It is understandable that they don’t. They have no real reason to. In fact, there are reasons why they shouldn’t. The obscurity of the ZRS model, for example, makes sure we can’t exploit it easily, the way we exploited the simple WKG system.
Third parties, e.g. ZwiftRacing, and ZwiftPower before Zwift acquired it, don’t really share much either. But we Zwifters have no stake in this other than our commitment to the racing itself, and we don’t have access to the actual Zwift database other than the parts shown to us in the Zwift app or on ZwiftPower. So why not allow ourselves to guesstimate a little based on what little we can see, just out of curiosity? (It’s fun!)
Does the data we are shown explain race results?
ZwiftPower has limitations. We only get to see 400 pages of events. It may seem like a lot, but with today’s heavy calendar that only amounts to a month’s worth of racing. But let’s start with that. If we look at our row in the results table for our latest race on ZwiftPower, does the data displayed there explain our finish position?
We filter out all time trials and funny formats and just look at the mass-start races. We shouldn’t mix categories, so we can look at cat C specifically and only races with at least 5 finishers. Cat A is its own beast and differs in some respects, but among B to E, C is the largest category, meaning we get more races we don’t have to filter out because of low attendance. Also, there are, I think, no strong reasons to believe that what might explain results in cat C should be much different in B or D.
One month’s racing with those exclusion criteria means we get 284 races with 5766 unique riders, some of whom have attended more than one race. It is a small data set but it will have to do for now.
WKG
WKG should never be used as a basis for categorization, but that doesn’t mean it is unimportant in explaining race results. Of course WKG must matter. It is a measure of fitness, one of many, but still an important one. So how is average WKG over races distributed in cat C?
This is interesting. We get something of a bell curve, but there is also a spike around the 3.2 mark. I have an idea why, but I will leave it to you to interpret it for yourselves.
A measure of success
We need a good measure for race results. Finish position just won’t do. Finishing 3rd in a race sounds pretty good, but there is a huge difference between finishing 3rd in a field of 50 riders compared to a race with only 5 in it. So we normalize finish position, i.e. we convert it into a number between 0 and 1, where a number close to 0 is a very good finish and close to 1 means finishing among the last. That makes races comparable. Let’s call this measure rank percent.
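The exact formula behind rank percent isn’t spelled out here, so take the following as a minimal sketch under an assumption: the winner maps to 0 and the last finisher to 1, via `(position - 1) / (finishers - 1)`. The function name `rank_percent` is made up for illustration.

```python
def rank_percent(position: int, finishers: int) -> float:
    """Scale a 1-based finish position to 0..1 (0 = winner, 1 = last).

    Assumed formula (not confirmed by the post): (position - 1) / (finishers - 1).
    """
    if finishers < 2:
        raise ValueError("need at least two finishers to rank")
    if not 1 <= position <= finishers:
        raise ValueError("position must be between 1 and finishers")
    return (position - 1) / (finishers - 1)


# Finishing 3rd means very different things in different field sizes.
third_of_50 = rank_percent(3, 50)
third_of_5 = rank_percent(3, 5)
print(f"3rd of 50: {third_of_50:.3f}")
print(f"3rd of 5:  {third_of_5:.3f}")
```

Other scalings, such as `position / finishers`, would work too. The point is only that the result no longer depends on field size.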
Then if we look at avg. WKG and this rank percent, how do they relate to each other? And is it any different from how average raw Watts relates to rank percent?
Looking at the left graph, we see that rank percent loosely follows avg. WKG, but with a rather wide spread. Also, it is not a linear relationship. Instead, it is something of an exponential curve. Then, if we compare it to the second picture, we see the same kind of relationship between avg W and rank percent, which isn’t really surprising. But we also get an even wider spread.
The implications of this are huge to me. Most races in Zwift are mostly flat, and we know, or at least we think we know, that on the flat raw power rules. WKG is for the climbs, right? Yet the narrower spread in the leftmost picture speaks clearly. The implication is that WKG is actually a stronger potential explanatory variable for race results than raw Watts. This surprises me. (Divide into small groups and discuss.)
Correlation (a value from -1 to 1) is a statistical measure of how closely tied changes in one variable are to changes in another. If we increase A, will B increase too (positive correlation) or decrease (negative correlation)? A value near 0 means there is no linear relationship. We can calculate the correlations between rank percent, i.e. our relative finish position, and avg. WKG and also avg. Watt. Let’s throw in Normalized Power too for good measure. The correlations between these three power-related measures and rank percent are:
- avg. WKG: -0.55
- NP: -0.47
- avg. W: -0.46
(They are all negative because rank percent is reversed: 0 is good and 1 is bad.) The correlations, all around 0.5 in absolute value, are not super strong yet still significant. If we increase WKG, results tend to improve as well, although not one-to-one. But it also implies that other things explain results as well. What could they be? Before we start looking for other deciding factors in races, let’s just pause for a minute and contemplate the fact that WKG is more strongly correlated with results than both NP and avg power.
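For anyone who wants to try this at home, here is a small sketch of how such a correlation can be computed. It assumes the plain Pearson correlation coefficient, which the post doesn’t confirm as the method used. The numbers are made up for illustration and are not from the ZwiftPower data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, a value from -1 to 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up example rows: higher avg WKG tends to go with a lower (better) rank percent.
avg_wkg = [2.6, 2.8, 3.0, 3.1, 3.2, 3.3]
rank_pct = [0.9, 0.8, 0.5, 0.6, 0.2, 0.1]

r = pearson(avg_wkg, rank_pct)
print(f"correlation: {r:.2f}")
```

The sign comes out negative for the same reason as above: a better result is a lower rank percent.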
HR
One aggregated measure that has interested me in the past is the max-to-avg HR ratio. I leave it to you to figure out why. But could a high HR ratio relate to results these days with ZRS?
The answer is no. Unlike in the previous graphs, there is no tendency here. Interesting and, I think, good news.
Modeling
Now let’s try to model what explains race results using the data we have available. I have picked the following variables from the data set, which all seemed potentially interesting:
- avg WKG
- avg W
- NP
- 20 min W
- 5 min W
- 1 min W
- weight
- avg HR
- max HR
Then we need to pick a model, a method. I have evaluated a few but a method called random forest regression works pretty well here. I won’t bore you with the details, but let’s just say it is a type of statistical model that can be used to predict things. We plug in variables we think might explain something, like e.g. race results, and then we train the model on our data. We sort of let a computer tune it to the data. If we pick good variables and the model turns out good, then we learn something from it, but we can also use the model to make predictions with. Given so and so avg WKG, 20 min W, etc, what is the finish position? And we can find out if it is any good at making predictions.
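To make the idea concrete, here is a rough sketch of training such a model. It assumes scikit-learn’s `RandomForestRegressor`; the post doesn’t say which library or settings were actually used. The data is synthetic with an invented relationship, only three of the nine variables are included, and the column names are hypothetical.

```python
import random

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = random.Random(0)
feature_names = ["avg_wkg", "one_min_w", "weight"]  # hypothetical column names

# Synthetic stand-in data, NOT real race data.
X, y = [], []
for _ in range(400):
    avg_wkg = rng.uniform(2.4, 3.4)
    one_min_w = rng.uniform(300, 550)
    weight = rng.uniform(55, 100)
    # Invented relationship plus noise, clipped to the 0..1 rank-percent range.
    score = 2.4 - 0.5 * avg_wkg - 0.0012 * one_min_w + rng.gauss(0, 0.12)
    X.append([avg_wkg, one_min_w, weight])
    y.append(min(1.0, max(0.0, score)))

# Hold back a quarter of the rows so the model is judged on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# score() returns R2 on the held-out rows.
r2 = model.score(X_test, y_test)
# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))

print(f"R2 on held-out data: {r2:.2f}")
for name, value in importances.items():
    print(f"{name}: {value:.2f}")
```

Holding back part of the data is what lets us ask whether the model is any good at predictions, rather than just good at memorizing the rows it was trained on.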
So, did the model turn out well? Well, it depends on how you look at it. Models need to be evaluated and there are statistical measures for how accurate a model is. One suitable measure here is R2 (a value from 0 to 1). What it says here is, simplified, how much the race results can be explained by the 9 variables I picked. A 1.0 means they explain results perfectly, 100%, that they are all there is to it, so to speak. Anyway, after some tweaking, I could get the model up to:
Model R2: 0.42
You could say these 9 variables explain your results to a degree, 42% specifically, but that things are still missing in the model. What could that be? More Watt measures? Possibly, but we also already know that a million things like tactical decisions, strength of the field, etc matter a lot too. Some things are awfully hard to capture in data.
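For the curious, R2 can be computed by hand. The sketch below uses the standard definition, 1 minus the ratio of the model’s squared errors to the squared errors of always guessing the average. The numbers are made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up rank percents for five riders.
actual = [0.1, 0.3, 0.5, 0.7, 0.9]

perfect = r_squared(actual, actual)                     # every prediction spot on
baseline = r_squared(actual, [0.5] * len(actual))       # always guess the average
partial = r_squared(actual, [0.2, 0.2, 0.5, 0.8, 0.8])  # close but not exact

print(f"perfect: {perfect:.2f}, baseline: {baseline:.2f}, partial: {partial:.2f}")
```

So an R2 of 0.42 sits somewhere between always guessing the field average and predicting every result exactly.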
Anyway, I wouldn’t use this model to bet money on your races. Still, this is interesting. Even though there is more to it, these 9 variables do actually matter for results. Which of these variables matter the most then? There are caveats here, but basically the importance (again a statistical measure) of each of the 9 variables looks like this (they sum up to 1.0, or 100%):
Again, to the extent that the model can explain race results (42%), avg WKG turns out to be the most important factor. But 1 min W also seems to be rather important. It makes sense when you think of it. Avg WKG can loosely be explained as “fitness”. The fitter you are, the better you do. The ability to keep pressure on the pedals all through the race, come what may. Then, if you can keep up with the front until the finish, the race is decided in a sprint or otherwise short bursts intended to drop others. And that is the 1 min W. As for the other variables, well, they contribute to race results a bit, but not by that much actually.
So, wasn’t this interesting? Not really, if you ask me. All we did here was to look intra-race. It would be much more interesting to look at riders’ previous race data and see to what extent it can predict future race results. We will look at that in the next post.



