Race Data Analysis #1

Andreas_Traff · May 31, 2026, 1:10am

I thought we could take a look at race data and try to figure out what it tells us about Zwift racing in a few posts. This will be the first one.

Background

Few will remember, but it has been a few years since I stopped spamming the forum with opinions and race data. The reason I stopped was that my goal was achieved. Zwift moved from the WKG-based race categorization to a results-based one. (Oh, and I got long COVID too, which made racing a bit less fun, but that’s another story.)

I do have opinions on ZRS. It is not perfect. But the important step was this move from a very unhealthy system. Which means that I can look at race data today with just curiosity and no real agenda.

I could be wrong, but my impression a few years back was that Zwift didn’t do much race data analysis. Today I think they do. Zwift provides us with plenty of data, but they tell us very little of their own analysis of that data. It is understandable that they don’t. They have no real reason to. In fact, there are reasons why they shouldn’t. The obscurity of the ZRS model, for example, makes sure we can’t exploit it easily, the way we exploited the simple WKG system.

Third parties, e.g. ZwiftRacing, and ZwiftPower before Zwift acquired it, don’t really share much either. But we Zwifters have no stake in this other than our commitment to the racing itself, and we don’t have access to the actual Zwift database other than the parts shown to us in the Zwift app or on ZwiftPower. So why not allow ourselves to guesstimate a little based on what little we can see, just out of curiosity? (It’s fun!)

Does the data we are shown explain race results?

ZwiftPower has limitations. We only get to see 400 pages of events. It may seem like a lot, but with today’s heavy calendar that only amounts to a month’s worth of racing. But let’s start with that. If we look at our row in the results table for our latest race on ZwiftPower, does the data displayed there explain our finish position?

We filter out all time trials and funny formats and just look at the mass-start races. We shouldn’t mix categories, so we can look at cat C specifically and only races with at least 5 finishers. Cat A is its own beast and differs in some respects, but among B to E, C is the largest category, meaning we get more races we don’t have to filter out because of low attendance. Also, there are, I think, no strong reasons to believe that what might explain results in cat C should be much different in B or D.

One month’s racing with those exclusion criteria means we get 284 races with 5766 unique riders, some of whom have attended more than one race. It is a small data set but it will have to do for now.

WKG

WKG should never be used as basis for categorization, but that doesn’t mean it is unimportant in explaining race results. Of course WKG must matter. It is a measure of fitness, one of many, but still an important one. So how is average WKG over races distributed in cat C?

This is interesting. We get something of a bell curve, but there is also a spike around the 3.2 mark. I have an idea why, but I will leave it to you to interpret it for yourselves.

A measure of success

We need a good measure for race results. Finish position just won’t do. Finishing 3rd in a race sounds pretty good, but there is a huge difference between finishing 3rd in a field of 50 riders compared to a race with only 5 in it. So we normalize finish position, i.e. we convert it into a number between 0 and 1, where a number close to 0 is a very good finish and close to 1 means finishing among the last. That makes races comparable. Let’s call this measure rank percent.
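A minimal sketch of that normalization in Python. The post doesn't say exactly which formula was used, so the one below, (position - 1) / (finishers - 1), is an assumption: it gives the winner 0 and the last finisher 1. The function name `rank_percent` is made up for illustration.

```python
def rank_percent(position: int, finishers: int) -> float:
    """Map a 1-based finish position to [0, 1]; 0 = winner, 1 = last.

    Assumed formula, not necessarily the one used for the graphs in this post.
    """
    if finishers < 2:
        raise ValueError("need at least 2 finishers to normalize")
    if not 1 <= position <= finishers:
        raise ValueError("position must be between 1 and finishers")
    return (position - 1) / (finishers - 1)


# 3rd place means very different things in different field sizes.
print(rank_percent(3, 50))  # 2/49, roughly 0.04
print(rank_percent(3, 5))   # 0.5
```

Another common choice would be position / finishers, which never reaches 0. Either way the point is the same: results from fields of different sizes end up on one scale.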

Then if we look at avg. WKG and this rank percent, how do they relate to each other? And is it any different from how average raw Watts relates to rank percent?

Looking at the left graph, we see that rank percent loosely follows avg. WKG, but with a rather wide spread. Also, it is not a linear relationship. Instead, it is something of an exponential curve. Then, if we compare it to the second picture, we see the same kind of relationship between avg W and rank percent, which isn’t really surprising. But we also get an even wider spread.

The implications of this are huge to me. Most races in Zwift are mostly flat, and we know, or at least we think we know, that on the flat raw power rules. WKG is for the climbs, right? Yet the narrower spread in the left graph speaks clearly. The implication is that WKG is actually a stronger potential explanatory variable for race results than raw Watts. This surprises me. (Divide into small groups and discuss.)

Correlation (a value from -1 to 1) is a statistical measure of how closely tied changes in one variable are to changes in another. If we increase A, will B increase too (positive correlation) or decrease (negative correlation)? We can calculate the correlations between rank percent, i.e. our relative finish position, and avg. WKG and also avg. Watt. Let’s throw in Normalized Power too for good measure. The correlations between these three power related measures and rank percent are:

avg. WKG: -0.55
NP: -0.47
avg. W: -0.46

(They are all negative which is because rank percent is reversed – 0 is good and 1 is bad.) The correlations, all around 0.5, are not super strong yet still significant. If we increase WKG, then it will have an impact on results as well, although not one-to-one. But it also implies that other things explain results as well. What could that be? Before we start looking for other deciding factors in races, let’s just pause for a minute and contemplate the fact that WKG is more strongly correlated with results than both NP and avg power.
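For anyone who wants to compute this kind of number themselves, here is a small sketch of the Pearson correlation coefficient in plain Python. The post doesn't state which correlation measure was used, so Pearson is an assumption here, and the sample numbers are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient, a value between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples of length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers, NOT from the real data set:
# higher avg WKG tends to go with a lower (better) rank percent.
avg_wkg = [2.6, 2.8, 3.0, 3.1, 3.3, 3.4]
rank_pct = [0.90, 0.75, 0.60, 0.40, 0.35, 0.10]

r = pearson_r(avg_wkg, rank_pct)
print(round(r, 2))  # negative, because a low rank percent is a good result
```

The toy data is far cleaner than real race data, so it gives a much stronger correlation than the roughly -0.5 reported above.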

HR

One aggregated measure that has interested me in the past is the max-to-avg HR ratio. I leave it to you to figure out why. But could a high HR ratio relate to results these days with ZRS?

The answer is no. Unlike in the previous graphs, there is no tendency here. Interesting and, I think, good news.

Modeling

Now let’s try to model what explains race results using the data we have available. I have picked the following variables from the data set, which all seemed potentially interesting:

avg WKG
avg W
NP
20 min W
5 min W
1 min W
weight
avg HR
max HR

Then we need to pick a model, a method. I have evaluated a few but a method called random forest regression works pretty well here. I won’t bore you with the details, but let’s just say it is a type of statistical model that can be used to predict things. We plug in variables we think might explain something, like e.g. race results, and then we train the model on our data. We sort of let a computer tune it to the data. If we pick good variables and the model turns out good, then we learn something from it, but we can also use the model to make predictions with. Given so and so avg WKG, 20 min W, etc, what is the finish position? And we can find out if it is any good at making predictions.

So, did the model turn out well? Well, it depends on how you look at it. Models need to be evaluated and there are statistical measures for how accurate a model is. One suitable measure here is R² (a value from 0 to 1). What it says here is, simplified, how much the race results can be explained by the 9 variables I picked. A 1.0 means they explain results perfectly, 100%, that they are all there is to it, so to speak. Anyway, after some tweaking, I could get the model up to:

Model R²: 0.42

You could say these 9 variables explain your results to a degree, 42% specifically, but that things are still missing in the model. What could that be? More Watt measures? Possibly, but we also already know that a million things like tactical decisions, strength of the field, etc matter a lot too. Some things are awfully hard to capture in data.
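To make R² a bit more concrete, here is a sketch of the usual definition, 1 minus the ratio of the model's squared errors to the squared errors you would get by always guessing the average result. The numbers are made up and the function name is only for illustration.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up rank percents, not real race data.
actual = [0.10, 0.30, 0.50, 0.70, 0.90]
perfect = list(actual)    # a model that nails every result
always_mean = [0.50] * 5  # a "model" that always guesses mid-pack

print(r_squared(actual, perfect))      # 1.0
print(r_squared(actual, always_mean))  # approximately 0
```

One caveat: when a model is scored on data it was not trained on, R² can even come out below 0, which simply means it did worse than always guessing the average.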

Anyway, I wouldn’t use this model to bet money on your races. Still, this is interesting. Even though there is more to it, these 9 variables do actually matter for results. Which of these variables matter the most then? There are caveats here, but basically the importance (again a statistical measure) of each of the 9 variables looks like this (they sum up to 1.0, or 100%):

Again, to the extent that the model can explain race results (42%), avg WKG turns out to be the most important factor. But 1 min W also seems to be rather important. It makes sense when you think of it. Avg WKG can loosely be explained as “fitness”. The fitter you are, the better you do. The ability to keep pressure on the pedals all through the race, come what may. Then, if you can keep up with the front until the finish, the race is decided in a sprint or otherwise short bursts intended to drop others. And that is the 1 min W. As for the other variables, well, they contribute to race results a bit, but not by that much actually.
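For the curious, the whole modeling exercise (train a random forest, score it with R², read off the importances) can be sketched in a few lines. This assumes scikit-learn and NumPy, which the post does not name as the tools actually used, and it runs on synthetic stand-in data with only three of the nine variables. The numbers it prints say nothing about real Zwift racing.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data, NOT the ZwiftPower data set.
avg_wkg = rng.normal(3.0, 0.3, n)
one_min_w = rng.normal(400, 60, n)
weight = rng.normal(78, 10, n)
X = np.column_stack([avg_wkg, one_min_w, weight])

# Invented target: stronger riders get a lower (better) rank percent, plus noise.
# Weight deliberately has no effect here.
score = -1.5 * (avg_wkg - 3.0) - 0.004 * (one_min_w - 400) + rng.normal(0, 0.4, n)
rank_pct = 1 / (1 + np.exp(-score))  # squash into (0, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, rank_pct, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)
model.fit(X_train, y_train)

# R² on riders the model has not seen during training.
r2 = model.score(X_test, y_test)
print(f"held-out R²: {r2:.2f}")

# Impurity-based importances; scikit-learn normalizes them to sum to 1.
for name, imp in zip(["avg WKG", "1 min W", "weight"], model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

Scoring on a held-out quarter of the riders is a design choice made for this sketch. Scoring on the training data would flatter the model.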

So, wasn’t this interesting? Not really, if you ask me. All we did here was to look intra-race. It would be much more interesting to look at riders’ previous race data and see to what extent it can predict future race results. We will look at that in the next post.

David_Galbraith · May 31, 2026, 7:27am

Looking at NP/kg might be interesting …

RogueTrail · May 31, 2026, 11:00am

Yeah, I agree. I have w/kg and NP/kg setup in one of my data overlays during rides. NP/kg, for me, better reflects my perceived exertion. Also I use the delta between w/kg and NP/kg to gauge how ‘smooth’ my ride is - more surging and uneven pacing means bigger delta.
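A rough sketch of that delta, assuming 1 Hz power samples and the commonly published Normalized Power recipe (30 s rolling average, raise to the 4th power, take the mean, then the 4th root). The function names and sample rides are made up for illustration.

```python
def normalized_power(watts: list[float], window: int = 30) -> float:
    """NP from 1 Hz samples: 4th root of the mean of the 4th powers
    of the rolling average over `window` seconds."""
    if len(watts) < window:
        raise ValueError("need at least one full window of samples")
    window_sum = sum(watts[:window])
    rolling = [window_sum / window]
    for i in range(window, len(watts)):
        window_sum += watts[i] - watts[i - window]
        rolling.append(window_sum / window)
    return (sum(r ** 4 for r in rolling) / len(rolling)) ** 0.25


def smoothness_delta(watts: list[float], weight_kg: float) -> float:
    """NP/kg minus avg W/kg; a bigger value means more surging."""
    avg = sum(watts) / len(watts)
    return (normalized_power(watts) - avg) / weight_kg


# Two invented 10-minute rides with the same 200 W average.
steady = [200.0] * 600
surging = ([100.0] * 60 + [300.0] * 60) * 5

print(smoothness_delta(steady, 75.0))   # approximately 0
print(smoothness_delta(surging, 75.0))  # clearly positive
```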

Andreas_Traff · June 1, 2026, 5:59am

Just the fact that NP and kg are both in the model captures some of it, but that’s an interesting idea I’ve never thought of. I like the concept of NP. NP/kg makes a lot of sense intuitively.

But it touches on a problem I decided not to weigh down the already long post with - multicollinearity. Some of the 9 variables are highly correlated with each other. They are not independent. Particularly the various W measures.

In other types of regression models, this is a big problem. Less so with random forest but still a problem. The best would be if I had the treasure trove Zwift has, the full data set, or access to the API ZwiftPower must have used back in the day when community members ran it. Then, instead of using a few select power measures, we could use the entire power curve as “one” variable (a long array of numbers).

But I’ll look into NP/kg for the next post.

Tom_Shelton · June 1, 2026, 7:42am

No, because the data that we are shown is only watts or w/kg. Neither of those correlates well with speed over a wide range of rider weights, let alone the impact of height or even bike and wheel choice.

If you don’t account for those things, by coming up with a formula that maps power and weight (and ideally height) to speed, then you’ll never find a good correlation.
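As an illustration of the kind of formula meant here, below is a sketch of the standard flat-road, no-wind power balance, P = v * (Crr * m * g + 0.5 * rho * CdA * v^2), solved for speed. This is textbook physics, not Zwift's actual speed model, and every constant (bike weight, CdA, Crr, air density) is an assumed value. CdA is held fixed across riders, which is itself a simplification, since height and weight would change it.

```python
def flat_speed_kmh(power_w: float, rider_kg: float, bike_kg: float = 8.0,
                   cda: float = 0.30, crr: float = 0.004, rho: float = 1.225) -> float:
    """Steady-state speed on a flat, windless road.

    Solves P = v * (crr * m * g + 0.5 * rho * cda * v**2) for v by bisection.
    All default constants are illustrative assumptions.
    """
    g = 9.81
    mass = rider_kg + bike_kg

    def power_needed(v: float) -> float:
        return v * (crr * mass * g + 0.5 * rho * cda * v ** 2)

    lo, hi = 0.0, 40.0  # m/s; 40 m/s is far above any cycling speed
    for _ in range(60):
        mid = (lo + hi) / 2
        if power_needed(mid) < power_w:
            lo = mid
        else:
            hi = mid
    return lo * 3.6


# Same 3.0 W/kg, different weights: the heavier rider is faster on the flat.
print(round(flat_speed_kmh(180, 60), 1))
print(round(flat_speed_kmh(270, 90), 1))

# Same 240 W, different weights: the lighter rider is slightly faster.
print(round(flat_speed_kmh(240, 60), 1))
print(round(flat_speed_kmh(240, 90), 1))
```

Under these assumptions neither Watts nor W/kg alone pins down flat speed.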

Andreas_Traff · June 2, 2026, 10:02pm

Well, weight is there already, also as a separate variable. I have skipped height here because the impact is fairly small even though it’s there of course. Bikes and wheels we can never get data on, sadly, although there too the impact isn’t huge.

In other human studies similar to this you rarely get R² values much higher than 0.4. The rest, what’s missing, is “noise”, i.e. all those little variables affecting an outcome that you couldn’t get to or didn’t even know about. We humans are so charmingly unpredictable, or… hard to predict at least. By far the best predictor of things is almost always just “past behavior/performance”.

And speaking of, don’t worry. I’ll show you a simple yet quite decent model soon. I have one in store… Those power metrics are actually quite powerful. I just didn’t want to bloat with too much information at a time.
