New racing categories

Thanks, Zwift, for continuing to work on racing. The new idea of alternating racing categories, so that people have a better chance of finding a band they can be competitive in, is a great one!

Yes, it’s great. And it might placate all the whiners who used to be in the top of a category but now find themselves in the bottom of one and they just can’t cope with it at all.

I still don’t like the point span between categories though. It’s harmonized. Basically the same steps for every cat. It would have been fine if it wasn’t for the seeding.

The seeding is based on a physical performance measure. That measure can be expected to be more or less normally distributed across Zwift’s subscribers: probably skewed, but at least somewhat bell-shaped.

This bell shape, with its abundance of riders with sort of middling 5 min power, is then translated into a points scale with equal-width bands, i.e. as if it were an even distribution. They are slapping a bell-shaped distribution onto an even one. This means a lot of people end up with an initial seed of about 250-400, a 150-point range, whereas far fewer riders land in the equally wide 100-250 or 400-550 ranges.
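To make the point above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that seed scores follow a normal distribution with mean 325 and standard deviation 75 (made-up numbers, not Zwift data), and compares equal-width 150-point bands with cutoffs chosen so that each band holds the same share of riders.

```python
from statistics import NormalDist

# Hypothetical assumption: seed scores ~ Normal(mean=325, sd=75).
# These parameters are illustrative only, not measured Zwift data.
SEED = NormalDist(mu=325, sigma=75)


def band_share(lo: float, hi: float, dist: NormalDist = SEED) -> float:
    """Fraction of riders whose seed score falls in [lo, hi)."""
    return dist.cdf(hi) - dist.cdf(lo)


def equal_population_cutoffs(n_bands: int, dist: NormalDist = SEED) -> list[float]:
    """Score cutoffs that would give every band the same share of riders."""
    return [dist.inv_cdf(i / n_bands) for i in range(1, n_bands)]


# Equal-width 150-point bands, as described in the post.
for lo, hi in [(100, 250), (250, 400), (400, 550)]:
    print(f"{lo}-{hi}: {band_share(lo, hi):.0%}")

# Cutoffs for five equally populated bands under the same assumption.
print([round(c) for c in equal_population_cutoffs(5)])
```

With those made-up parameters, the 250-400 band holds about 68% of riders and the 100-250 and 400-550 bands about 16% each, while five equally populated bands would need cutoffs packed much closer together around the middle of the scale.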

The points adjustments in races seem to be non-relative, i.e. they don’t take into account the actual ZRS of the other racers the way an Elo system would, or if they do, then not by much. I assume here that the points adjustments in races are the same for every category, even though the number of contestants varies from one category to the next.

And all of this is problematic. First, the upper and the lower cats have few participants and it wouldn’t have to be that way. You could have a fairly even distribution of participants in all categories if they hadn’t translated the seeding into ZRS in such a simplistic and thoughtless way.

Second, since this is so, it also means that the same score change will have a different impact on a racer depending on which category he is in. And while that is perhaps not a game-breaker, I still don’t think it is a good idea.
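A rough way to see why the same score change means different things in different parts of the scale, as a hedged sketch: assume (hypothetically) that scores across the player base are normally distributed with mean 325 and standard deviation 75, and count how many riders a +10 change moves you past at different starting scores.

```python
from statistics import NormalDist

# Hypothetical assumption: scores across the player base ~ Normal(325, 75).
# Illustrative numbers only, not Zwift data.
SCORES = NormalDist(mu=325, sigma=75)


def riders_crossed(score: float, delta: float, population: int,
                   dist: NormalDist = SCORES) -> float:
    """Expected number of riders whose score lies between `score` and
    `score + delta`, i.e. how many people a change of `delta` moves you past."""
    lo, hi = sorted((score, score + delta))
    return population * (dist.cdf(hi) - dist.cdf(lo))


# The same +10 points near the bottom, the middle and the top of the scale.
for score in (150, 325, 500):
    print(score, round(riders_crossed(score, 10, 10_000)))
```

With these assumed numbers, +10 points in the middle of the scale crosses more than ten times as many riders as the same +10 near either end.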

Yeah this is my only complaint.

The only races I ride tend to be 650-1000 races known for being hard, with lots of 900+ riders and orange tick riders in attendance.

My score is 645, and it drops if I don’t finish in the top half, which is a tough ask in races where the top 10 all average over 4.7 W/kg.

Yesterday I placed 16 / 37, just 32s behind the winner (2 orange tick riders on the podium) and only gained 2 points.

Other riders who are far less strong than I am in terms of watts ride easier races (either lower level or less well attended) and rack up points, so they end up with a higher score than me.

Of course, I could also do this and surely my score would increase.

But it is odd that a rider who consistently averages 3.5 W/kg in races has a higher score than a rider who averages 4.5 W/kg.

They do take it into account somehow, but I’d agree they don’t seem to do it well or enough.
The only sign I’ve noticed is a race where I had by far the highest score at the time and lost a few points for coming 2nd.
That race happened during the seed score V1 period. After the seed algorithm was changed and the scores were recalculated, I now show a small gain for that race.

Yeah, that is not only weird but, if I understand it right, an undesirable property of ZRS. Results, and results alone, should somehow drive the score in races, so it should be completely disjoint from any Newtonian measures once the initial seeding is history. But… at the same time, let’s be real. You can’t just “race smart” (I hate that argument) as a 3.5 and expect to win against someone with 4.5. It just doesn’t happen because the difference is too big, so a 3.5 rider outscoring a 4.5 rider does point to some undesirable properties. Indirectly, then, the hierarchy between different levels of performance should be reflected in ZRS as a consequence. A results-based system that can reflect this is showing healthy signs. And at the same time you can’t hardcode Newton into the algorithm; we had that before and it was bad.

Very interesting. I want to give them credit for the new system. I do think they thought hard about it, so it doesn’t surprise me that there is a bit of “relativity” in the score. But it’s no easy thing to balance a results-based system. I am convinced it is possible, to do it well enough at least. We’re just not quite there yet, and I do think a bit more relativity would be good or even necessary. I guess one reason you’d want to avoid it is that it can blow up the calculations, potentially to the nth power. Simple updates from a single race won’t require enormous computing power, but if you’re seeding many thousands of subscribers and then slipping into results-based mode from there, before everyone has a reasonably stable score… yeah, that would require vast cloud server resources, and even then calculations may grind to a halt depending on the formulas. But I hope they keep looking for a better middle way than this.

Still, though, have you noticed how the heated and quite active discussions from last year have died down lately? Or rather, since v3. I think there is a reason for that. So I’m optimistic.

Edit: By “middle way” I mean that with a little creativity it is often possible to simplify the calculations enough to make them manageable, working with proxies like averages or various kinds of indexing, in order to take other participants’ scores and placings into account when adjusting your own score after a race. I don’t really like the not-cut-in-stone-yet-still-very-rigid tendency that those in the upper half get score upgrades and those in the bottom half get downgrades. It’s a nice carrot to chase if you’re on the cusp, I can attest to that. But it’s undesirable if, e.g., you place at the top of the bottom half, ahead of people with higher scores than you, and still get a downgrade.
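One cheap proxy of the kind described above is borrowed from chess: the linear “performance rating”, i.e. the average score of your opponents plus a term for how many of them you beat versus lost to. The sketch below is a hypothetical illustration, not how ZRS works; the 400 spread constant is the chess Elo convention and the 10% nudge toward the performance is made up. It only needs the field’s total score, so it stays cheap even for big fields.

```python
def performance_rating(field: list[float], own_pos: int, scale: float = 400.0) -> float:
    """Chess-style linear performance rating for one rider in one race.

    field:   pre-race scores in finishing order (index 0 = winner)
    own_pos: the rider's 0-based index in `field`
    scale:   hypothetical spread constant; 400 is the chess Elo convention
    Every rider behind you counts as a win, every rider ahead as a loss.
    """
    n_opp = len(field) - 1
    if n_opp < 1:
        raise ValueError("need at least two finishers")
    avg_opp = (sum(field) - field[own_pos]) / n_opp  # average via one sum, no pairwise loop
    wins = n_opp - own_pos  # riders who finished behind you
    losses = own_pos        # riders who finished ahead of you
    return avg_opp + scale * (wins - losses) / n_opp


def nudge(score: float, perf: float, alpha: float = 0.1) -> float:
    """Move a score a fraction `alpha` (hypothetical) of the way toward perf."""
    return score + alpha * (perf - score)


# 6th of 10, i.e. top of the bottom half, ahead of four higher-scored riders.
field = [450, 440, 430, 420, 410, 300, 400, 390, 380, 370]
perf = performance_rating(field, own_pos=5)
print(round(perf, 1), round(nudge(300, perf), 1))
```

In the example the rider finishes 6th of 10, at the top of the bottom half, yet still gains points (about 6.6 with these constants) because the riders behind them had higher scores.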

Yeah, if ZRS applied to real life, a lower tier ‘pro’ rider who podiumed a lot in local races would have a higher score than a world tour domestique who finished all three Grand Tours in the same season.

I mean, I have raced in a ‘harder’ Zwift race with some guys who have a significantly higher score than I do. They get dropped, I don’t.

There are so many ways to do this, although it’s not self-evident what would be the best way. You’d have to try, I guess.

The way they do it in cross-country skiing is interesting, although their ranking doesn’t translate to Zwift, since we have a lot of people coming and going and they don’t, and they have a distinct season, which we don’t. Etc. But anyway, the ranking is calculated with a formula whose inputs include the time gap between the winner and the participant being ranked, and a factor that differs between types of races (long, short etc). It’s a relative rank indeed.
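The FIS-style race points formula, as I understand its basic shape, is P = (Tx / To - 1) × F, where Tx is the rider’s time, To the winner’s time and F a factor that depends on the race type. A minimal sketch follows; the factor values are hypothetical placeholders, and the official rules add further terms (such as a penalty based on the strength of the field) that are left out here.

```python
# Hypothetical race-type factors: placeholders, not official FIS values.
RACE_FACTOR = {"long": 800.0, "short": 1200.0}


def time_gap_points(winner_s: float, rider_s: float, race_type: str) -> float:
    """Points from the relative time gap to the winner.

    The winner scores 0 and lower is better, so the number measures how far
    behind the winner you were, scaled by the race type's factor.
    """
    if winner_s <= 0 or rider_s < winner_s:
        raise ValueError("times must be positive and not faster than the winner")
    return (rider_s / winner_s - 1.0) * RACE_FACTOR[race_type]


# 32 s down on a one-hour winner in a long race.
print(round(time_gap_points(3600, 3632, "long"), 2))
```

Being 32 s down on a one-hour winner in a “long” race gives about 7.11 points here; lower is better, so the winner always gets 0.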

I’ve also always kinda liked the USA Cycling points system. We can’t use it either, for the same reasons as with skiing. But I like the feature where points are distributed according to finishing position, with the most points going to no. 1, a little less to no. 2, and so on, and with the depth of the payout scheme depending on the size of the field. And once your season total is high enough you are eligible for an upgrade, or low enough for a downgrade. It’s sort of pseudo-relative.
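A sketch of that kind of position-based, field-size-dependent payout with season thresholds. The payout tables and the upgrade/downgrade thresholds are hypothetical placeholders, not USA Cycling’s actual rules.

```python
# Hypothetical payout tables, keyed by minimum field size (largest first).
# Illustrative only; these are not USA Cycling's actual tables.
PAYOUTS = [
    (50, [15, 12, 10, 8, 7, 6, 5, 4, 3, 2, 1]),
    (20, [10, 8, 6, 5, 4, 3, 2, 1]),
    (10, [7, 5, 4, 3, 2, 1]),
    (0, [3, 2, 1]),
]
UPGRADE_AT = 20    # hypothetical season total that earns an upgrade
DOWNGRADE_AT = 3   # hypothetical season total at or below which you drop down


def race_points(position: int, field_size: int) -> int:
    """Points for a 1-based finishing position; bigger fields pay deeper."""
    if position < 1 or position > field_size:
        raise ValueError("position must be between 1 and field_size")
    for min_field, table in PAYOUTS:
        if field_size >= min_field:
            return table[position - 1] if position <= len(table) else 0
    return 0


def season_verdict(results: list[tuple[int, int]]) -> str:
    """results: (position, field_size) pairs for one season."""
    total = sum(race_points(pos, size) for pos, size in results)
    if total >= UPGRADE_AT:
        return "upgrade"
    if total <= DOWNGRADE_AT:
        return "downgrade"
    return "stay"


print(season_verdict([(1, 12), (3, 37), (16, 37)]))  # 7 + 6 + 0 = 13, prints "stay"
```

Here a season of a win in a 12-rider field, 3rd of 37 and 16th of 37 adds up to 13 points, which falls between the two thresholds, so the rider stays put.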

Anyway, I really think they should consider at least the average ZRS of both those placing ahead of you and those behind you. And they should have Elo-like factoring, so that if you place ahead of a crowd scored higher than you, you get a considerably bigger score upgrade than if you place ahead of equals (you’d have to eliminate tanking racers first, of course). I haven’t tried to reverse-engineer any numbers, but the impression is they don’t really do either of those two things, at least not both. I could be wrong, but in that case there are other things going on in the score calculation at the same time, and they should weight the above things more aggressively and give them a bigger impact.
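The Elo-like factoring suggested above can be sketched as a multiplayer Elo update that treats a race as pairwise duels between all finishers. This is a hypothetical illustration of the idea, not ZRS’s actual formula; K = 32 and the 400-point spread are common chess values, and tanking riders are assumed to have been filtered out first.

```python
def expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Elo expectation that a rider rated r_a finishes ahead of one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_race_update(scores: list[float], k: float = 32.0,
                    scale: float = 400.0) -> list[float]:
    """Multiplayer Elo: score a race as pairwise duels between all finishers.

    scores: pre-race scores in finishing order (index 0 = winner); tanking
            riders are assumed to have been filtered out beforehand.
    k, scale: hypothetical tuning constants (32 and 400 are common chess values).
    Each rider moves by k times the average (actual - expected) over opponents.
    """
    n = len(scores)
    if n < 2:
        return list(scores)
    updated = []
    for i, r_i in enumerate(scores):
        surprise = 0.0
        for j, r_j in enumerate(scores):
            if i != j:
                actual = 1.0 if i < j else 0.0  # 1 if rider i finished ahead of j
                surprise += actual - expected(r_i, r_j, scale)
        updated.append(r_i + k * surprise / (n - 1))
    return updated


# Winning from 300 against a stronger field vs against equals.
print(round(elo_race_update([300, 400, 400, 400])[0] - 300, 1))
print(round(elo_race_update([300, 300, 300, 300])[0] - 300, 1))
```

With these constants, winning from 300 against three 400-rated riders gains about 20.5 points, versus 16 points for winning against three equals. The pairwise loop is O(n²) per race, which is trivial for a single race of a few hundred riders.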

Or use the vELO method already built into Zracing.app. To me those scores make sense. It does 99 percent of what we want, with no need for all this ‘complexity’ or for huge amounts of cloud resources. However, this has been stated numerous times and still we are stuck with ZRS, where you are punished for racing a lot and for entering poorly attended races, and where riders in the same race differ hugely in power.

OK, I’m going to say something very important to you and everyone else here.

First, I have the utmost respect for Bruno Gregory, the creator of vELO. He understands data science for real. I would guess he doesn’t identify as a data scientist, but I’m sure a lot of companies looking for a data science consultant would have him.

(Note: Data science is not an academic subject. Well, actually it is. But it is the application of statistical and mathematical methods to data. “Data scientist” is not an academic title but a business title. It’s the person who creates, among many other things, the algorithms that make you see things on social media that someone else wants you to see, because the data scientist has shown that it is profitable or beneficial for the client/employer to have you see them…)

Anyway, I have read his informal “paper” on the development of vELO. The work really seems to be by the book. Solid stuff. And quite impressive too. Well, an important part of impressive results, when it comes to machine learning, lies in actually getting any results at all, something you can’t control. Either there are tendencies in the data that can be discerned through machine learning methods and perhaps even, as in this case, be used for predictions, and that’s not thanks to you. Or such tendencies simply aren’t there, or there is too much noise in the data. Still, it’s impressive work, vELO.

Second, I can understand the hype behind vELO. It came out of nowhere, and from a non-profit third party, at a time when Zwift racing was at peak poop, so to speak. Racing was more or less the same as always but people had opened their eyes and could see the problems the WKG categorization caused, but at the time Zwift wasn’t willing to change anything. A small community around vELO materialized, and these people felt that this was sooo much better than what we had. I can sympathize with that too.

So there is vELO. It’s tested, in a double sense: both statistically and tried and tested through many races. People who got to know it liked it. And yet, Zwift decided not to go with it. Why? Morons! Why is that? Narrow-mindedness? Pride? Greed? (As in they don’t want to end up having to pay Gregory for a thorough analysis of their own proprietary product?)

I am so glad they didn’t. For two reasons. First, it would have been a bad choice. The least bad choice, at worst, we’ll see. But as long as other routes had not yet been tried and tested, there was still hope for something else, something proper. Second, Zwift’s decision seemed to suggest that they had actually understood what I had been yapping about for years, and, given such a benevolent interpretation, that was heart-warming.

You see, vELO is fantastic as an analytical tool. The predictive power of it is staggeringly high, all things considered. It may not seem like it just looking at the numbers, but with some prior experience of trying to predict complex things…

As an illustrative example, I remember this class back when I was a student, where the teacher gave us this course project/group assignment to predict the stock market index, using multiple regression (which Gregory tried first with vELO, before evaluating other, better methods to predict race results). We were allowed to use any variables, any data we could get our hands on as input: oil prices, interest rates, astrological conjunctions, whatever. It taught us the method well, but the project as such was doomed to begin with, and of course the teacher knew that. There is too much noise from millions of unknown variables affecting the decisions of the players in the market, and should you ever find the faintest trace of a pattern, someone will already have spotted it before you and exploited it to make a profit, which in turn makes the pattern disappear. You simply can’t predict the stock market, no matter what the silly TikTok bros claim. But apparently, you can actually predict race results in Zwift with fairly high accuracy. Amazing!

So vELO is a fantastic analytical tool. But it doesn’t cut it as a dynamic, progressive method to categorize racers in order to make Zwift racing sports-like. The majority of the features (variables, input parameters) in the model are Newtonian measures. Racer physical performance - Watts and stuff. Only three features have to do with past race results. The model is much more sophisticated than the 20 min WKG categorization of old as well as the later “CE” model, sure, but essentially they are the same. And this is not what I want. Nobody should want it, because it kills the sport.

vELO is a predictive model. Imagine a perfect vELO. One that doesn’t predict a racer’s placing in the upcoming race with just a fair degree of certainty, but one that predicts it exactly. If the model says you’re gonna finish 4th given your best effort and given the competition, you finish 4th. Complete determinism. Then why bother to race at all? There is no point.

From there, now imagine a less-than-perfect vELO. I.e. vELO as it stands today. The model might still predict you end up 4th, which makes you suitable for the category in question, you do seem to belong there, but it places you in the upper end among all the participants. Why race? You race because you know the prediction, i.e. your categorization, isn’t perfect. Your upcoming race result isn’t entirely determined by past Watts and placings. Maybe you have improved since? Maybe someone ahead of you is going to have bad legs? It says 4th but you could still win. Well, isn’t that a good system then? No! If you win, you only win because you “beat the system”. You can only improve your racing in the cracks and blind spots of the model. So what good does the model do you? “It groups me together with racers of somewhat equal prowess”, you say, “which makes for fun racing.” Yeah… it works in the here and now. It works for the next race. But it doesn’t work for Zwift as a sport.

What you want is a system that rewards accomplishments and improvements. A system that promotes you through categories, looking forwards, not backwards. You did good? Good boy! Here are some points, or here is a rank score upgrade, as a reward. Collect more of that and we will promote you to the next challenging category. This is what we want.

vELO isn’t that system. It’s static. It doesn’t promote anything. I want nothing to do with it as a race categorization.

But can you really design a promotion system that works under our severe conditions? You know… No seasons. People coming and going all the time. People loafing, tanking, cruising. Various kinds of races. Well, whatever it is, it will never be perfect, that is for sure. But I still have hope for a system that is good enough. And we won’t know until we have tried. They try by tweaking and making changes, just as Bruno Gregory did; that’s what he means when he talks about applying an agile method. We, in turn, try by testing and giving feedback. It’s the only way forward, the only alternative to giving up, to resorting to something like 20 min WKG. Or vELO.

Keep going, Zwift! This is the right track.

EDIT:
Two more things:

  1. People should stop staring at the ZRS. The exact number isn’t necessarily important. If a low-ranked rider keeps winning and winning in a category, then we have a problem.
  2. I was in a reasonably well-attended race yesterday. I knew I couldn’t podium, so I just tried to place as high as I could. Afterwards, looking at the ZHQ data, I could spot one single rider who maybe loafed a little bit, but she wasn’t anywhere near the podium anyway, so I don’t see the slightest problem in that. Her choice. Other than that, everyone with an HRM worked their behinds off. People really tried their best, and so every placing was deserved. This is to be expected in a system that isn’t completely dysfunctional. It will necessarily drive people towards exerting themselves, just like in a crit or a cyclocross race. It’s full gas. I don’t think I’ve ever experienced races like this before. It’s a sign of health, a sign we are getting somewhere. We shouldn’t stop here, but progress is happening for real this time.

One thing I hate about Zwift Racing Score is that, because post-race points allocation isn’t based on a vELO-style system, the odds of everyone racing to the best of their ability are very low.

If I’m going to get a points increase for simply finishing in the top ~55% of the field, regardless of whether on paper I’m only in the worst 25% of the field by ability and relative abilities don’t change the points allocation, why would I risk increasing my score from 178 to 180+?

At 180+, I’m going to be allocated to pen D for all the community events using default pen ranges, which iirc is 180-350.

It’s bad enough trying to race against some riders under 210 in the pen E, range 2 Zmonthly races, and the level of riders under 260 in pen E of the Tiny Races is insane.

Essentially flat races bore me to tears, but on races with longer climbs of approx. 260+ feet there are numerous racers with a Racing Score under 260 blasting out in excess of 3.2 W/kg (the old 6-minute zMAP limit for the weakest pen under Category Enforcement) for well over 10 mins.

So, as I say, chapeau to Zwift for continuing to iterate. And the new, more granular categories make racing enjoyable for more people.

But if you have categories at all, you’re always going to have people gaming them. I don’t blame them at all. I’d do the same to stay in a category that’s competitive for me.

I would love it if I could sign up for a race at a certain time, and then Zwift looked at everyone signed up, say 2-5 mins before the start, and assigned each rider to the group that was most competitive at that time, with no categories. The algorithm could assign pen numbers based on optimal splits of the riders who had signed up and drop them in.
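One hedged way to sketch the “optimal splits” idea: sort the signed-up riders by score and cut them into a chosen number of pens so that the total within-pen squared deviation is as small as possible. This is optimal one-dimensional k-means (the idea behind Jenks natural breaks), solved by dynamic programming; the function name, the minimum pen size and the choice of objective are all assumptions for illustration.

```python
from itertools import accumulate


def optimal_pens(scores: list[float], n_pens: int, min_size: int = 1) -> list[list[float]]:
    """Split riders into `n_pens` contiguous score groups (pen 1 = strongest),
    minimising the total within-pen sum of squared deviations from the pen mean.
    Hypothetical helper: optimal 1-D k-means solved by dynamic programming.
    """
    xs = sorted(scores, reverse=True)
    n = len(xs)
    if n_pens < 1 or min_size < 1 or n_pens * min_size > n:
        raise ValueError("need n_pens >= 1, min_size >= 1 and enough riders")
    pre = [0.0] + list(accumulate(xs))                 # prefix sums
    pre2 = [0.0] + list(accumulate(x * x for x in xs))  # prefix sums of squares

    def cost(i: int, j: int) -> float:
        """Sum of squared deviations of xs[i:j] from its own mean."""
        s = pre[j] - pre[i]
        return (pre2[j] - pre2[i]) - s * s / (j - i)

    inf = float("inf")
    # best[p][j]: minimal cost of putting the first j riders into p pens.
    best = [[inf] * (n + 1) for _ in range(n_pens + 1)]
    cut = [[0] * (n + 1) for _ in range(n_pens + 1)]
    best[0][0] = 0.0
    for p in range(1, n_pens + 1):
        for j in range(p * min_size, n + 1):
            for i in range((p - 1) * min_size, j - min_size + 1):
                c = best[p - 1][i] + cost(i, j)
                if c < best[p][j]:
                    best[p][j] = c
                    cut[p][j] = i
    # Walk back through the stored cut points to recover the pens.
    pens, j = [], n
    for p in range(n_pens, 0, -1):
        i = cut[p][j]
        pens.append(xs[i:j])
        j = i
    return pens[::-1]


print(optimal_pens([610, 600, 590, 420, 410, 400, 405, 210, 200], 3))
```

The dynamic program is O(pens × riders²), which is fine for a few hundred sign-ups a couple of minutes before the start; `min_size` keeps tiny pens from appearing.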
