Race Scoring (Ranking) - a new discussion

Really interesting stuff James, thanks.

Are these current system flaws likely to impact Zwift’s forthcoming ranking system and are they easily fixable?

Surely any such loopholes need to be addressed in tandem with the new system; otherwise, what’s the point?

Just looking at how the CE system has been watered down and fudged to placate a vocal minority, I don’t really have much faith in any Zwift racing system being designed with fairness as one of its cornerstones.

2 different things. The flaws with the existing system (either ZP cats or CE) are mainly around the fact they categorise on threshold, whereas the key factor for success is short duration power and repeatability, with just enough threshold to stay in the mix. Of course the benefit of a decent ranking system is that the ‘reasons’ for success are pretty irrelevant - as it is success itself that determines the ranking.

In terms of Zwift’s upcoming rankings, I am still hoping to steer them towards this methodology. The flaws with the ZP/USAC ranking system are explained in previous posts, and are patently obvious when you compare the results of the ELO-MMR approach to that one.

This approach (ELO-MMR) is born from online gaming. It was developed with that in mind, with all the constraints and challenges that go with it. I don’t think it is a coincidence that I researched ranking systems and settled on ELO-MMR as a great fit for Zwift, and soon afterwards found a member of the community who had already developed a ranking system based on exactly that methodology - because if you think about it properly, it’s a no-brainer.

Very impressive that you were able to come up with this in such a short amount of time. I hope Zwift can be convinced to give this system a look, considering how quickly it can be applied.

It wasn’t me. I had the same idea but someone far cleverer than me had already done it!

i see the logic but i am seeing people who can’t do 8+wkg for 2 mins in the top 10 already. the best on zwift should really be capable of 7-8wkg for 3 mins, let alone 2, if you are designing a ranking that would sort most riders for most courses.

i am not a sport scientist so i am not sure where the cross over point between W’ and whatever the correct term is for an aerobic dominant effort lies exactly, but i assume it’s different for every person and probably between the 2-4min range

Perhaps true, but it ranks based on results, not physiological metrics (new riders aside), so if they keep beating better riders (maybe through careful course selection or race type) then so be it. If they race those 8+wkg riders and lose as you are expecting them to, then they will lose ranking.

Also you haven’t seen that they can’t do it - you’ve seen that they haven’t done it in a race. Maybe the same, maybe not.

Let’s take ranked rider #6. They look like an anomaly on physiological metrics, but in their 21 races they have podiumed 17 and won 7. There comes a point that they need to face better riders - which is what pretty much everyone has asked for.

Of course, being able to actually set up races based on ranking forces the issue somewhat, which isn’t available at present. (Note: in this system there are 7 ranks, with the bottom 5 ranks having 3 sub-levels each, so there are 17 levels in theory. Even if race organisers couldn’t customise the boundaries exactly, if they could choose which ranks go into which pen, this would offer ‘enough’ dynamism.)
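As a rough sketch of the level arithmetic above: the code below assumes ranks are numbered 1 (top) to 7 (bottom) and that each of the bottom five splits into three sub-levels. The numbering, the function names and the pen labels are made up for illustration; the real rank names and boundaries aren't given in this thread.

```python
# Hypothetical numbering: rank 1 is the top rank, rank 7 the bottom.
RANKS = range(1, 8)
SUB_LEVELS = 3                    # sub-levels within each split rank
SPLIT_RANKS = set(range(3, 8))    # the bottom 5 ranks


def build_levels():
    """Return every (rank, sub_level) pair; unsplit ranks get sub_level None."""
    levels = []
    for rank in RANKS:
        if rank in SPLIT_RANKS:
            levels.extend((rank, sub) for sub in range(1, SUB_LEVELS + 1))
        else:
            levels.append((rank, None))
    return levels


def assign_pens(pen_to_ranks):
    """Invert an organiser's {pen: [ranks]} choice into {rank: pen}."""
    rank_to_pen = {}
    for pen, ranks in pen_to_ranks.items():
        for rank in ranks:
            if rank in rank_to_pen:
                raise ValueError(f"rank {rank} assigned to two pens")
            rank_to_pen[rank] = pen
    return rank_to_pen


levels = build_levels()
print(len(levels))  # 2 unsplit ranks + 5 * 3 sub-levels = 17

# Example organiser choice (pen labels are placeholders).
pens = assign_pens({"pen 1": [1, 2], "pen 2": [3, 4], "pen 3": [5, 6, 7]})
print(pens[4])  # pen 2
```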

In a way showing you the top of the rankings chart was not the best idea. That is not the true value in this system when comparing to ZP. The true value is for the other 30,000 racers who race infrequently or at a much lower standard.

Thanks for sharing James, much appreciated.

I wonder if W’/kg would be a more relevant metric, as the benefit of W’ is weight dependent. Maybe it doesn’t change much in the big picture or for the highest-ranked racers.

I think the formula in column N (weight) contains an error, which is why riders at and above 100 kg show as 10 kg in the sheet.

Can you share which period this file covers?

Ah I removed some decimal places which must have limited it to 2 characters. Ooops!

It covers the whole of 2022, but there are 3 seasons in a year. Rankings are ‘sort of’ reset at the beginning of each season (you don’t start from scratch, but the weighting of results resets). I probably haven’t explained that well, more detailed info to come ASAP. This ‘seasons’ approach is typical in online gaming to keep things interesting, and that’s why it was included. It also means you can have end of season champions and such.

I’m not sure this is true, in that I have never seen W’ used in this manner.

Compound Score looks to show a pretty good correlation with ranking for the top 10k racers.

well, regardless, it’s the closest thing to a good system i’ve seen anyone actually propose since it’s simple enough and targets the efforts that actually decide zwift races. whether it’s better than a cat system or not, hard to say without seeing it in practice. but i’m not against it

there are benefits to being light too, despite what zwiftinsider says. if you have a large W’ at a low weight then the metabolic cost of a deep effort is lower and higher repeatability is possible. the problem with being light on zwift is that most light people have never performed a deadlift in their lives so obviously the bigger guy (who has probably also never performed a deadlift in his life) is going to win that sprint

I agree with you. My question was more whether raw W’ is such an important contributor to the position as indicated by the trendline. At CP = 4 W/kg, a 60 kg rider and a 70 kg rider need 18,000 J vs 21,000 J to hold 5 W/kg for 5 min. So both need 300 J/kg for the “same” effort.
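The arithmetic behind those numbers, as a small sketch. It assumes the simple two-parameter critical power model, where any power above CP is paid for out of W’; the function name and parameters are just for illustration.

```python
def w_prime_needed(weight_kg, cp_wkg, target_wkg, duration_s):
    """Joules above CP needed to hold target_wkg for duration_s.

    Assumes the simple CP model: work above CP = (P - CP) * t.
    """
    surplus_watts = (target_wkg - cp_wkg) * weight_kg
    return surplus_watts * duration_s


for weight in (60, 70):
    joules = w_prime_needed(weight, cp_wkg=4.0, target_wkg=5.0, duration_s=300)
    print(weight, joules, joules / weight)
# 60 18000.0 300.0
# 70 21000.0 300.0
```

The absolute joules scale with weight, but the per-kg cost is identical, which is the point of asking whether W’/kg would be the fairer comparison.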

depends on the length of the race, i’d say. i’ve observed it in KISS 100 races enough, there’s only so much food you can take on. i can get to the end of one in B on a bag of haribo and a single bidon and do a spectacular finish but a 90KG guy is going to be suffering even if you have him on an IV and drip feed him lucozade for the entire 3 hours

but most races are shorter than an hour so probably not an issue

There are many interesting things you can glean from the data. However, they are really just observations: there are many, many variables at play, and they are not the same variables for the top riders exclusively racing other top riders as for the occasional dip-in C racer. Take race tactics, for example. One thing I have noticed floating between B and A is that in higher-quality B fields, the start is far tougher than in A cat, and stays tough for a long time. My guess is that this is primarily due to a slightly heavier average weight and generally less race craft. If you watch the Zwift Grand Prix you will see power drop very, very low where it doesn’t make sense for anyone to attack - and then when it goes off, it really goes off. The joy of this system is that the most effective riders, for whichever physiological or tactical reason, will rise to the top. Which is great, but what is better is that a returning C racer, after a few more summer beers than most, will immediately be able to enter enjoyable races with all to play for.

[Chart: rating vs W/kg at different durations]

I created a chart for rating vs different W/kg duration.

Observations:
Generally, we see that rating is more or less linearly correlated with W/kg for the different durations.
The graphs may break a bit for ratings above ca 4400, meaning it takes more W/kg per rating point to climb beyond that level.

The gradient gets less steep with longer durations - Short power duration is more decisive and longer power duration less decisive.

We can see that 5 and 15-sec W/kg have about the same gradient - They are equally important.
Also, note that there is very little difference between 20m and CP - CP is not a better metric than 20m in predicting rider ranking.

We also see that the variance increases with higher ratings. This is probably because there are more data points at lower ratings.

EDIT: Note that the data series are averaged to make them less noisy and make it easier to see the trends.

A couple of interesting ‘state of Zwift racing’ stats (we’re getting close to release now!)

Since the start of September, 34,360 Zwifters have completed a race.

Of those, 9,783 (over a quarter) have only done 1 race.

7,743 have done 5 or more races.
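For anyone who wants the shares rather than the raw counts, a quick sketch. The three counts are the ones quoted above; the 2 to 4 race group is derived by subtraction, which assumes all three figures come from the same population.

```python
total_racers = 34_360
one_race_only = 9_783
five_or_more = 7_743

# Derived, not quoted: everyone who is in neither of the two groups above.
two_to_four = total_racers - one_race_only - five_or_more


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


print(round(share(one_race_only, total_racers), 1))  # 28.5
print(round(share(five_or_more, total_racers), 1))   # 22.5
print(two_to_four)                                    # 16834
```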

Hey James, great stuff thanks for digging in! I love reading about this stuff. Two thoughts:
I did some similar analysis in the winter/spring and summarized my takeaways in some zwiftinsider articles (can’t include links :confused:)

  • zwiftinsider: racing-landscape-1
  • zwiftinsider: racing-landscape-1b
  • zwiftinsider: racing-landscape-2

In particular, on the last one I was looking at how predictive ZP race rankings were, even between cats (eg will a rank 200 B actually beat a rank 300 A - typically not). Would probably be resolved if the CE cats were wholesale replaced, but some interesting data.

I think my main concern with Elo (having quite a bit of experience with it in Chess, StarCraft, LoL, etc) is that sandbagging / “smurfing” (as it is sometimes called) is still very much a problem in those games. So Elo doesn’t actually solve that.

I appreciate you quoting the actual abstract that says Elo accounts for it, but the key assumption that is based on is that everyone is optimizing to get the best ranking possible. In fact, that’s not really the case in those games, nor will it be in Zwift - often people just want to enter races they can win regardless. Deliberate poor performance is counterproductive for your ranking, but not for your likelihood to get on the podium in your next race.
Talk to folks from games that actually use an Elo system and they will tell you sandbagging is a problem. Because zwift is a mass race format, it gets even more exaggerated (eg if 1/10 people are sandbagging in StarCraft, you will run into them once every 10 games, in Zwift, 1/10 means pretty much every race)
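To put a rough number on the mass-start point: the sketch below assumes each entrant is independently a sandbagger with probability 1/10, and the field size of 50 is a made-up figure, not Zwift data.

```python
def p_meet_sandbagger(rate, opponents):
    """Chance that at least one of `opponents` is a sandbagger.

    Assumes each opponent is a sandbagger independently with probability `rate`.
    """
    return 1 - (1 - rate) ** opponents


# 1v1 game: a single opponent, so you meet one about 1 game in 10.
print(p_meet_sandbagger(0.1, 1))

# Mass-start race with a hypothetical field of 50 other riders.
print(p_meet_sandbagger(0.1, 50))
```

With 50 opponents the result is above 99%, which is the "pretty much every race" effect described above.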

If I understand your solution of not allowing your rating to degrade - it is sort of not really doing the Elo system then. I believe fundamental to that algorithm is when you lose your rating goes down, when you win it goes up. If you restrict it so it only goes up weird things will happen and it’s not Elo anymore.
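A minimal sketch of why a no-drop rule stops being Elo. This is the classic two-player Elo update, not the Elo-MMR system discussed in this thread and not a mass-start adaptation; K = 32 is just a conventional choice, and the `allow_drop` flag is a hypothetical switch for the "rating can't degrade" idea.

```python
def expected_score(r_a, r_b):
    """Classic Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_winner, r_loser, k=32, allow_drop=True):
    """Return (new_winner, new_loser) after one head-to-head result.

    With allow_drop=False the loser keeps their rating (hypothetical variant).
    """
    delta = k * (1.0 - expected_score(r_winner, r_loser))
    new_winner = r_winner + delta
    new_loser = r_loser - delta if allow_drop else r_loser
    return new_winner, new_loser


# Standard Elo: points move from loser to winner, so the total is conserved.
print(elo_update(1500, 1500))

# No-drop variant: the winner gains, nobody pays, and the total inflates.
print(elo_update(1500, 1500, allow_drop=False))
```

In the standard case the two ratings still sum to 3000 after the game; in the no-drop case they sum to more, and that inflation compounds with every result.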

All of this to say, I think some version of “best x performances in last y days” is required to minimize sandbagging, which essentially gets you back to the ZP ranking / USA cycling system. So I think Zwift is probably on the right track
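A sketch of what "best x performances in last y days" could look like. The data layout, the function name and the "higher score is better" convention are all assumptions for illustration; this is not the actual ZP or USA Cycling formula.

```python
from datetime import date, timedelta


def best_x_in_y_days(results, x, y, today):
    """Average of the best `x` scores from the last `y` days.

    `results` is a list of (race_date, score) pairs; higher score is
    assumed to be better. Returns None if there are no recent results.
    """
    cutoff = today - timedelta(days=y)
    recent = [score for race_date, score in results if race_date >= cutoff]
    if not recent:
        return None
    best = sorted(recent, reverse=True)[:x]
    return sum(best) / len(best)


results = [
    (date(2022, 8, 1), 900),    # too old, falls outside the window
    (date(2022, 11, 1), 500),
    (date(2022, 11, 20), 200),  # a deliberately poor ride
    (date(2022, 11, 25), 450),
]
print(best_x_in_y_days(results, x=2, y=90, today=date(2022, 12, 1)))  # 475.0
```

The poor 200 result never enters the average, which is why this style of rating is hard to lower on purpose.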

The only edit I would suggest would be a separate ranking by event type (climbing, short, long, etc) and potentially some sort of rating floor set with 5min power metrics (what I would consider to be the power length required to finish with the front group)

Loving the discussion overall though

Hi Joe,

Some of those points were more just putting different ideas out there, rather than my actual recommendation. All of your points are very valid, but they have been addressed to an extent (not saying it’s perfect, but 1000x better than ZP/USAC) by the ELO-MMR system detailed higher up this chain (ELO has some clear flaws, which this extended approach helps to solve - primarily via the Bayesian methodology used).

Can you lower your rank by deliberately performing badly? Yes. Would it pay off versus the commitment required? Really, no. There will always be ways to game the system, the goal is to make it so pointless / boring that no-one bothers. Getting rid of the limited A-D thinking is of course critical to this.

I will DM you so you can check out the ranking system as it is now alive and soon to be shared publicly. Feel free to do some analysis on it, particularly if you want to compare it to the ZP rankings. You will see, especially for lower ranked riders, it does an incredible job and really you cannot compare the two (The #1 issue with ZP/USAC is that you have 20,000 racers with practically the same ranking with vastly different ability).

James, is this blue-sky thinking or a viable alternative to Zwift’s new ranking system?

It exists! (in beta). I’ve put together an article alongside its creator, giving an overview of the system, the thinking behind it, and some next steps. Will be shareable soon (this week).

Exciting :+1:

How and where will this be implemented? Are Zwift on board with it?

Sorry if I’ve missed all the background on this. I’ve not really been following the discussion too closely :blush: