Results/ranking based categories NOW!

People are tired of the sandbagging and constant abuse of the race category system by a few against a majority of paying subscribers, ruining the fun and motivation of thousands. Nobody likes it. It cannot be justified. And there is only one way out of it. And that has nothing to do with Watt dampeners, cones, ghosting, enforcing ZP categories, introducing age categories or anything of the kind.

Instead the W/kg based categorization must be removed and replaced by the only proven system within sports and esports that has ever had a chance to deliver fun, challenge, personal development and fairness across the board over all demographics and levels of fitness, skill or aptitude. And that is a results based system. Past achievements (not efforts) should determine your race category and nothing else, because nothing else matters more.

In the interview with the Zwift Podcast of Feb 6 Eric Min hinted at a future replacement of the W/kg categorization with some form of ranking or results based system. In the meantime we were to be given some sort of consolation in the form of various measures against sandbagging, which have since replaced each other because none of them really works. And the reason why they don’t work is that they don’t solve the underlying problem.

So why the delay? Why the hesitation? Why are we not seeing a results based categorization yet? Granted, it is no easy task to fine tune such a system, but it is not rocket science. It worked for chess players, what, 100 years ago already? And many esports have already implemented their versions of player rankings and quite successfully too. Oh, and in the US this is how riders have been categorized in IRL racing since forever. So obviously it would work for Zwift too.

Any race category system will suck for some, even a system where you have only one category, because that is also a categorization. The problem with the current system is that it is as bad as categorizing riders based on their shoe size. So if your race performance is 3.1 or 3.9 +/-0.2 or your shoe size is 10, then you are a contender by definition. And if you’re not, then you’re a loser. This will happen with any system. The problem with the current system is that it does not encourage category mobility, not enough. There are no inherent mechanics to put riders in their proper categories. You give riders the incentive to stay size 10 even when they are not. Not everyone will grab the opportunity to stay behind, but the opportunity is there and will always attract enough people to ruin racing for the majority.

Never mind the weight or height cheaters, the problems are worse still. Look at the result list of any race, look at the participants and their efforts. In any race there will be people giving it all to do the very best they can. The zone 4 crowd. And then there are those who ease along in zone 3 until there is a hill or a final sprint. That’s when they drop the competition, while still carefully managing their average W/kg.

There will always be an overrepresentation of the Big Fish in a Small Pond type on any podium below cat A in Zwift, and in cat D and C it is particularly bad. I’m talking about the people to whom winning is so important that they’d rather stay behind in a lower category than move up to where they belong. Just so that they can win. Thereby they remove some of the incentives to participate in the race for everyone else and they also cause some other problems in races. We all know that. I’m perfectly fine with getting beaten by my superiors. Fair is fair. I want them to beat me. But I don’t want to pay to get beaten by cheaters. And these ‘effort managers’, who know how to stay within limits in a lower category, are effectively cheaters. They get rewarded for not doing their best.

Now imagine instead a system where past results will determine your category. If someone has a bad day and can’t or doesn’t want to give it all, then that is fine. But you can’t easy-mode yourself into wins if wins or otherwise good results are what move you up to the next category. There will always be an incentive to do your best. There will be no way to stay behind in a category and keep winning at the same time. You will have to choose, if you even have the opportunity to. Category limits will still hurt those who happen to fall just above them, but at least the system won’t encourage dishonesty and sub-par efforts.

A results based system will not work well without certain standards when it comes to races. And this will put some pressure on the many clubs in Zwift who do an amazing job already in staging races. But I think most would agree that it would be worth the effort as it benefits everyone in the long run.

So what do I propose? Two things:

  1. Zwift should deliver on the promise ASAP. There should be a results based category system NOW.

  2. As for you Zwift riders… well, why don’t you do like me? If you can’t beat them, join them, they say. So why not help create urgency and emphasize the need for a new category system by breaking the current system apart? I’m dropping category now. Officially, it’s due to previous covid-19 infection, ‘corona kilos’ and a new set of scales. But I intend to crush in the category below me after summer and ■■■■■■ podiums from honest, hard-working, paying subscribers. I’m just waiting for the ZP 3 month cooldown. I suggest you do the same. Because this nonsense has to stop.

9 Likes

Well written @Andreas_Traff.

I would say I agree with 90% of what you said. I don’t completely agree with point 2, and the reason is that it will spoil the fun for other racers like you and me. It won’t impact Zwift in any way.

This is not a new issue. It has been asked for since 2017. One of the popular topics is this one: Auto-Assign Race Categories

But 3 years down the line we have moved past asking for races based on W/kg and are asking for races based on performance instead.

Imagine how much fun we could have if every rider had a rank and we could set up handicap races based on rank.

Or races using dynamic ranks, so one day you would race in group 7 but the next in group 5 just because different rank people show up.

4 Likes

Schrodinger’s ‘cat’. It’s dead but somehow still alive.

I hate my point 2, I want to stress that. But I see no other way around it anymore.

You are welcome to follow my point 2 exploits on Zwift, Strava, Xert from now on. I had a shaky start yesterday as I rode a race just after posting here. I overdid it slightly. It takes some practice to cheat effectively apparently. The thing about cheating in Zwift is you want to win but at the same time you are running against a bunch of other cheaters in any race, so there’s a fine balance there. You mustn’t overdo it. But it’s probably like drafting, it just takes practice. I’m sure I’ll get the hang of it soon enough.

Anyway, I crushed them in the sprint yesterday, like I have been crushed so many times before by other cheaters. (Note: I have never crushed anyone in any sprint in any race before, ever.) There were plenty of green cones and dropouts in that race by the way, but I somehow managed to slip under Zwift’s radar at least, although I got a UPG on ZP of course.

The ambition now is to create a suitable set of 3 Best Races with an average that will have ZP allow me in races below my category after the summer. I expect to make it. Then I’ll crush some more.

I see no other option now than protest and a little civil disobedience. Like you say, 3 years later and nothing has happened so far. And what annoys me the most is that Zwift seem hesitant to make the shift, with some weird references to subscriber happiness (listen to the interviews). As if the primary objective was to cater to the cheaters, who are in the minority at that. Who would openly object to a fairer race category system? ‘Nah, these races are too much fair game for my taste, imma cancel my subscription now!’ Really?

I’m sure it must be provoking that somebody steps forward and openly admits to cheating. And even worse, encourages others to cheat. On purpose. But look at it this way. This has been going on for a long time, on purpose, we are all aware of that. And still there has been far too little protest and far too little effort in finding the remedy to cheating. Nothing ever happened! Who ever stepped up to my defense when I got crushed over and over by cheaters? Nobody. Nothing more than weak gestures. And if nobody will help you, you have to help yourself. I’m merely trying to make the problem more visible, otherwise there’s nothing new really.

It’s harsh, I know, but I strongly believe that the only effective measure from a subscriber right now is to run the system into the ground for these two reasons:

  1. to try to create urgency
  2. because you still can (you should never have been able to)

Otherwise nothing will ever happen. And I’m really sorry for all the collateral damage in the coming months, paying subscribers getting their experience ruined, by me. But I believe it will be for the better in the long run in a hopefully not too distant future. At this point it seems only a sacrifice will have a chance to pay off. Like we haven’t sacrificed ourselves already, to cheaters who didn’t want the system changed.

Turn on Zwift. Tune in to what is actually going on beneath the surface in races. And draw your own conclusions.

My own conclusion is it’s time to kill the Schrodinger cats for good. Go drop cat and help make it happen.
#turnontuneindropcat

1 Like

Fighting crime with crime is never a good approach. That being said, the rest of your points are correct. It is inexcusable that this has not been resolved yet. Zwift are going down completely the wrong rabbit hole with their anti-sandbagging measures. There are plenty of proven ranking systems in eSports that they can borrow from.

Unless the excuse is ‘meh, we’re taking your money anyway’.

3 Likes

This got zero traction but could we not take control with the tools available/soon to be available?

That’s EXACTLY what the excuse is. Only it’s not an excuse, it’s a reason.

1 Like

Garry, I think this could be an excellent idea. Your system is simple (and simplicity might actually be a good thing here) but should also be effective. You start at the dirty end with W/kg because, well, you have to start somewhere. But then transition into a well-defined results based categorisation. 6 move up, 6 move down. Easy. Clear. Indisputable. I think it’s going to work splendidly for a single league. And - sounding pompous here - I salute your approach and intent.
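
For illustration, here is a minimal sketch of the "6 move up, 6 move down" rule. It assumes the final season standings of one division are available as a list ordered best first. The function name and rider labels are made up for this example and are not taken from Garry's proposal.

```python
def promote_relegate(standings: list[str], n_move: int = 6):
    """Split final standings (best first) into promoted, staying, relegated."""
    if n_move < 0 or 2 * n_move > len(standings):
        raise ValueError("division is too small for that many moves")
    cut = len(standings) - n_move
    promoted = standings[:n_move]      # top n_move go up a division
    staying = standings[n_move:cut]    # the middle stays put
    relegated = standings[cut:]        # bottom n_move go down a division
    return promoted, staying, relegated


# Hypothetical 20-rider division, already sorted by season result.
division = [f"rider{i}" for i in range(1, 21)]
up, stay, down = promote_relegate(division)
print(up)    # the six best riders
print(down)  # the six riders at the bottom
```

The point of the rule is visible in the code: nothing but the finishing order decides who moves.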

As for Zwift as a whole, I think a new system would have to be a little more intricate as it’s going to have to span across various leagues and single races.

Why don’t we take the time to discuss exactly how such a Zwift-wide system could be designed? I have thought some about it, but there are certainly angles to consider that I’m not even aware of that some others may have spotted already. I’m just convinced it can be done, because it has been done before. No system will be perfect, but it can be a lot better than what we have and that’s good enough for me.

I will, however, continue to break the law without actually breaking the law until your league is up and other organisers follow suit. I want to see a movement here. Then things will materialise. They always do, even if it’s just money talking.

I figured another reason wkg cats are flawed, and it isn’t anything to do with cheating.

Speaking as a lighter rider who races B I usually put out 4wkg on the dot for races. I never podium because heavier riders will put out higher absolute power at 4wkg.
If I did match their performance I would have to put out something like 4.3wkg, which puts me firmly in the A’s. (not that I could)

If I can’t even podium in B, what would be the point of racing A?
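
To put rough numbers on this, a quick sketch. The 65 kg and 70 kg body masses are hypothetical, and matching raw watts is a simplification that ignores drag, drafting and gradient.

```python
def absolute_power(w_per_kg: float, mass_kg: float) -> float:
    """Absolute power in watts for a given W/kg and body mass."""
    return w_per_kg * mass_kg


def w_per_kg_to_match(target_watts: float, mass_kg: float) -> float:
    """W/kg a rider of mass_kg needs to produce target_watts."""
    return target_watts / mass_kg


light_kg, heavy_kg = 65.0, 70.0  # assumed masses, not from the thread

heavy_watts = absolute_power(4.0, heavy_kg)  # heavier rider at 4.0 W/kg
light_watts = absolute_power(4.0, light_kg)  # lighter rider at 4.0 W/kg
needed = w_per_kg_to_match(heavy_watts, light_kg)

print(heavy_watts - light_watts)  # watt gap at the same W/kg
print(round(needed, 2))           # W/kg the lighter rider needs to close it
```

With these assumed masses the lighter rider needs roughly 4.3 W/kg to produce the same watts, which is the kind of gap described above.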

If the pro teams that have made a huge “investment” in Zwift recently (mostly in time I would think, I doubt they are making any financial contributions to Zwift other than monthly subscriptions like the rest of us) had to deal with racing like we all do I think there would be a faster solution being presented. But it seems the pro teams get their very own races and invitation only events where sandbagging is not an issue (they should all be A+ or what are they doing on the team?).

Imagine if I could enter the upcoming virtual Tour de France riding a grossly miscalibrated spin bike and blow the race apart. I bet the solution would come real quickly!

2 Likes

Chris, I am aware of that problem (I’m normally light myself, not just right now… err…). There is also another problem, just as severe if not worse, that also shows how flawed W/kg is as a basis for categorization.

IRL specialization aside, you should never have to end up in the situation where you have to stop to think “Wait, I can’t join this race, it’s too long, 50k. I won’t be able to compete” or “Wait, I can’t join a sprint race because I might get upgraded and then I have to come in last in every race longer than a sprint from now on up in the next category”. But this is how people think today, because their concerns are very real.

1 Like

@Andreas_Traff Mike may be on to something. Let’s just say I still don’t think it is a good idea to spoil races for other riders.

Currently we don’t see any broadcast of C and D races because it is a joke. The broadcaster has to ignore the front 20 riders because all of them are out for cat.

2 Likes

I have some other thoughts on the W/kg problem as well.

ZP has done loads for Zwift and the community. There is no denying that. But where we stand today, looking at the W/kg system in isolation, I believe they are more of a detriment than a helper, because right now they only serve to consolidate a fundamentally flawed system. The reason for this is the loose ties between Zwift and ZP, which leaves the W/kg system a headless monster that nobody fully controls despite best intentions.

If you have ever worked in the IT industry you will know that systems or projects without clear ownership always create problems. What I would do if I was in the upper Zwift management would be to either bring ZP closer or to distance the company from them.

Next thing I would do would be something slightly unusual. I would divide my reasonably sized data science team (that I would have hired since long ago) into two, allocate some time and resources to them among their other duties, and then have the two sub teams compete in coming up with a new system on paper, backed by data and ideally also user experiences. Then I’d bring the two teams together for a comparison and discussion. And then let them work on an optimal synthesis. Then just have the IT division develop the new system and push it. And never look back. Not such a huge cost but a big win.

It’s all about customer retention at the lower end, which is where most of the nasty stuff happens and where the subscribers are normally far less loud mouthed than me and where blind loyalty is not won yet - A and B riders may complain about things, but will keep paying anyway, just like hardcore gamers in triple A games. And customer retention has a massive effect on the bottom line. It’s about keeping and cultivating the subscribers that were won during the unfortunate corona crisis. Everybody wins. Even Zwift. Oh, except the cheaters.

5 Likes

Meanwhile, there are tons of games with online mode where you automatically join rooms that are based on your demonstrated results. And there are no “subscriber happiness” issues with that. You crush them, you will face tougher opponents. You get crushed, you get an easier set of opponents next time. Keeps everyone happy and challenged. Tested, proven, in real life just as much as in the online world.

You don’t even need fixed categories. You can auto-define and fill the cats based on attendance and rankings. I’m certain there’s off-the-shelf code for all that functionality - there’s off-the-shelf code for everything.
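
A minimal sketch of what "auto-define and fill the cats based on attendance and rankings" could look like. It assumes every entrant already has some ranking score (higher is stronger). The function name, the scores and the equal-size split are assumptions for illustration, not anything Zwift or ZP actually does.

```python
def dynamic_groups(entrants: dict[str, float], n_groups: int) -> list[list[str]]:
    """Split the riders who showed up into near-equal start groups, strongest first."""
    if n_groups < 1:
        raise ValueError("need at least one group")
    ranked = sorted(entrants, key=entrants.get, reverse=True)
    n_groups = min(n_groups, len(ranked)) or 1  # never more groups than riders
    base, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # spread the remainder over the top groups
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Hypothetical sign-up list with made-up ranking scores.
signed_up = {"a": 1500, "b": 1400, "c": 1300, "d": 1200, "e": 1100}
print(dynamic_groups(signed_up, 2))
```

Because the cut points come from whoever signed up, the same rider can land in a different group from one day to the next, as suggested earlier in the thread.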

This would allow points-handicap races and TTs, points improvement challenges, name it. It’s simple, it’s effective.

It’s not coming soon to a screen near you.

7 Likes

I’m not sure anything off the shelf is compatible with Zwift.

2 Likes

I agree, I started a similar thread a couple months ago. I showed there the amount of overlap in finishing times between people in different categories, who ought to be racing each other. There are many threads on this general concept before mine too, but it seems the Zwift team doesn’t want to invest the money to make it happen.

3 Likes

I couldn’t agree more about the need for something akin to auto matching and/or performance-based groups. Does anyone really think W/kg at FTP is a good solo metric to predict race performance?

As someone whose fitness has plummeted from parenting and life related time constraints, I am not worrying about ranking highly, but rather improving over my own level. I would argue that accurate categories are equally important to filthy casuals like me :wink:

What I really want is to ride hard with a group that pushes me to go harder, and motivates me to chase. I’m totally fine with a DNF or being la lanterne rouge but it is disheartening to see the “matched” group leave you like you aren’t moving when you are giving it your all.

I recently tried my first Zwift race and chose group C due to my FTP (a short sprint race, ouch, as my VO2 max is still a work in progress). Since it was my first race I made some rookie errors (Trainer difficulty at 100% wasn’t great for the short downhill with a compact chainset. PS: why is this an option in races?)

Off the start I realized I had made a mistake trying to hang on to a group that was clearly capable of going far far harder than me. Once I fell back I was able to cluster with a group that offered a genuine challenge and motivation to ride with. That was a lot of fun. But I would say I was really racing with 5 of 30ish people. Seeing the top cluster of Cs finish in the upper Bs made me wonder who they are trying to impress… It was interesting too that very few of the top spots were zwiftpower users.

I can see a lot of potential for fun in these races, but they would be more fun in a better matched group of people who you can draft with and be challenged by, or at least still see off in the distance.

6 Likes

You don’t even need fixed categories. You can auto-define and fill the cats based on attendance and rankings.

Very important point.

Some related random thoughts on cat systems and rankings below:

The W/kg cat limits were, I would assume, not arbitrary originally. Well, they were in a sense, but Zwift probably had good reasons for them. You can’t have too many categories because then smaller races will not get enough participants to make it interesting, so probably somewhere around 4, or 5 at most like in the US IRL system. And where do you draw the lines? I’m guessing they expected (and probably got) something that started to look like a bell shape, although perhaps a skewed one, with C and B as the big cohorts (meaning 5 cats could be a possibility) in terms of subscribers although with high activity in B and A. And D as the ‘growth’ cat, which a fair chunk of subscribers will only pass through and not stay in, plus a lower race participation. A is unattainable for a majority, the majority that is going to finance the whole enterprise.

But a results based system is never truly static. And so you don’t even have to try to make it so. In fact, you definitely shouldn’t.

Typical league-with-divisions style rankings with a predetermined number to be culled from a cat and then moved either up or down don’t make sense when you have to handle loads of various races of which most are organized out of your direct control by the community itself. Tennis style rankings are absolute and theoretically very fair, but there are far too many Zwifters to manage such a system, it would just be a pain with little gain to even try and for many reasons. So you’d have to go with some kind of ranking score instead. Like the relatively pointless ZP rank (a shame really).

Score ranking systems share a common problem. Common in a double sense. There is always the issue with score stability and in more than one way. First you need to calibrate mobility, which is not trivial. What would warrant an increase in your ranking and by how much? You want reasonable mobility up and down. Not too fast, because then you’d just oscillate wildly around your ‘true’ rank. Not too slow or people get restless and give up. Rather, just right, or thereabout.

Second, there is another stability issue, and that is the risk of rank score inflation. If we think of cycling prowess as a typical human characteristic, it will, like everything else of the kind, have a gaussian distribution. I.e. Zwifters will fit under a bell shaped curve, with most being ‘average’ - they go under the fat middle of the curve - and smaller numbers in the weak and the strong end respectively. Now, if nothing else, the strong end tends to drive score inflation. Why? Because they keep winning. And if every podium increases your rank, even if only by a little… yeah, you get it.

Score inflation, however, is largely a ‘cosmetic’ problem, especially with cycling. One example of a system that suffered (or suffers) from inflation is the ranking that was/is used in the online computer game Dota 2, the world’s ‘biggest’ pro e-sport (not in numbers but in prize pool). Top players kept driving their rankings up and up and you could never really reach those numbers, no matter how talented you were, if you picked up the game late. So seen as a tennis style ranking it sucked because it did a bad job at ranking individuals against each other. But as a tool to pool appropriate levels of competition together and make fun and fair games it still worked pretty damn well - I actually think they loosened it up a little on purpose just to create a higher variance and make games more unpredictable and thus more interesting (although more streaky).

So all the developer had to do once inflation became too obvious was a) to hide the actual rank score and replace it with a symbol or, actually, a categorization (you’re silver, and you’re gold, and you there are gold with two stars and a swoosh, sort of), and b) to openly reset and recalibrate the rankings at appropriate intervals, to make sure newcomers had a chance to get on a level with the veterans and not trail behind. But that’s ranking score as a sum. A ratio based ranking will behave differently. There are many ways to design a system.
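
The two measures above can be sketched roughly like this. The tier floors, tier names, baseline and `keep` factor are all invented for illustration, and pulling scores part of the way back to a baseline is just one simple way to recalibrate, not necessarily what the game's developer did.

```python
import bisect

TIER_FLOORS = [1000, 1200, 1400, 1600]  # hypothetical score floors
TIER_NAMES = ["bronze", "silver", "gold", "platinum", "diamond"]


def tier(score: float) -> str:
    """Hide the raw score behind a label."""
    return TIER_NAMES[bisect.bisect_right(TIER_FLOORS, score)]


def soft_reset(scores: dict[str, float], baseline: float = 1200.0,
               keep: float = 0.5) -> dict[str, float]:
    """Pull every score part of the way back to the baseline at season end."""
    return {rider: baseline + keep * (s - baseline) for rider, s in scores.items()}


season_end = {"veteran": 2000.0, "newcomer": 1200.0}  # made-up scores
print({r: tier(s) for r, s in season_end.items()})
print(soft_reset(season_end))  # the veteran's lead over the newcomer is halved
```

With `keep=0.5` the gap a newcomer has to close shrinks each season while the order among existing riders is preserved.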

The system of systems, the so-called ELO system in chess, is also suspected [sic!] to suffer from inflation but only by surprisingly little considering everything. It seems stable enough over time. So it’s obviously possible to get it just right, or right enough. And cycling would be less afflicted anyway, since both chess and Dota 2 are games where two players or teams face each other, so on average and all else equal every second game you play you win and your ranking has a chance to increase (depending on how highly ranked your opponents are). But even the best cyclists typically don’t win every second race since there are so many participants in each race and so many factors at play that mediate success.

And besides, like Robert points out above, stability doesn’t have to be a big problem since category limits can be made dynamic. C is swelling too much? Make it smaller by adjusting the limits. It’s not much of a problem.

Mobility, though, can be a little tricky. When is a rider doing too well, and in what contexts? And how quickly should we make that decision?

But the real challenge lies elsewhere. A chess game is a chess game, but in Zwift you need to design a system that can handle many different kinds of races. And how do you value those? Should a high placing in a sprint race award the same rank score increase as a win in a bambino fondo length race? And is a 33 km race worth as much as a 44 km race or rather only 75%, and why then 75% and not 80% or 65%? And by what factor should rank score gains be affected by the rank score of other race participants (or would it be too complicated to even consider those)?

Another issue is that you will need some kind of governance of race formats to help both the system and the clubs that organize races to fill the calendar with races that can fit into the cat system and not ruin it. And you also need to take full control and ownership of the categorization. It can’t be left to a third party to decide. Zwift would thus have to ‘intrude’ a little more on the community than they do today (plus allocate staff/resources to manage it all). I wouldn’t mind at all though.

Actually ELO (or the alternate whole history method used in GO) could be promising. Since ELO roughly translates to your likelihood of winning against someone it could form logical groupings. (which could be as broad or narrow as desired)
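
As a rough sketch of how a rating gap maps to group width: in the standard Elo formula the expected score of A against B is 1 / (1 + 10^((Rb - Ra) / 400)), so a chosen maximum win probability inside a group can be turned into a band width in rating points. The function name and the 75% figure are just examples.

```python
import math


def band_width(max_win_prob: float, scale: float = 400.0) -> float:
    """Rating gap at which the stronger rider is expected to win max_win_prob of the time."""
    if not 0.5 <= max_win_prob < 1.0:
        raise ValueError("max_win_prob must be in [0.5, 1.0)")
    # Inverse of the Elo expected-score formula.
    return scale * math.log10(max_win_prob / (1.0 - max_win_prob))


# Example: keep the strongest rider in a group at no more than a 75% edge
# over the weakest rider in the same group.
print(round(band_width(0.75)))
```

A narrower band means tighter racing but more groups, a wider band the opposite.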

In various implementations of ELO you can score across a tournament. A single bike race is similar to a collection of one on ones in that sense.
Let’s say you race against 4 other people and place 3rd that’s 2 wins and 2 losses so 2 points. Now if the two people that beat you were higher ranked and two you beat were lower your score won’t change much as your pre calculated expected score will be close to 2. But if the two below you were higher ranked you move up (and similarly they move down). If you were all similarly ranked the scores would stay about the same. So this could tolerate the expected fluctuations in bike racing results and start to separate the broad clusters.
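
The five-rider example can be written out as a small sketch. It uses the standard Elo expected-score formula and treats the race as a set of one-on-ones. The ratings, the K value of 16 and the function names are assumptions picked for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo estimate of the chance that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def race_update(ratings: dict[str, float], finish_order: list[str],
                k: float = 16.0) -> dict[str, float]:
    """Treat one race as a collection of one-on-ones and return new ratings."""
    n = len(finish_order)
    updated = {}
    for place, rider in enumerate(finish_order):
        actual = n - 1 - place  # one point per rider beaten
        expected = sum(
            expected_score(ratings[rider], ratings[other])
            for other in finish_order
            if other != rider
        )
        updated[rider] = ratings[rider] + k * (actual - expected)
    return updated


# Five riders, "you" finish 3rd (2 wins, 2 losses).
order = ["p1", "p2", "you", "p4", "p5"]
as_expected = {"p1": 1600, "p2": 1600, "you": 1500, "p4": 1400, "p5": 1400}
upset = {"p1": 1600, "p2": 1600, "you": 1500, "p4": 1600, "p5": 1600}

print(round(race_update(as_expected, order)["you"], 1))  # stays at about 1500
print(round(race_update(upset, order)["you"], 1))        # moves above 1500
```

In this form the points one rider gains are exactly the points the others lose, so the total stays constant within a race.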

The main complication I initially see is selecting a good score multiplier (K) to calculate the change in score. I imagine you would want to correct for large vs small events via a score limit or variable K to avoid over inflating scores from one good well attended race.
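
Both ideas, a variable K and a score limit, fit in a few lines. This is a sketch under assumptions: the base K of 32, the `exponent` knob and the function names are invented, and the right values would have to be calibrated on real results.

```python
def scaled_k(base_k: float, field_size: int, exponent: float = 1.0) -> float:
    """Shrink K as the field grows so one big race cannot swing a score too far."""
    if field_size < 2:
        raise ValueError("a race needs at least two riders")
    # exponent=1.0 gives every race the same maximum swing;
    # a smaller exponent lets bigger fields count for more.
    return base_k / (field_size - 1) ** exponent


def clamp_change(change: float, limit: float) -> float:
    """Alternative: cap the score change from any single race."""
    return max(-limit, min(limit, change))


print(scaled_k(32, 5))         # small race
print(scaled_k(32, 101))       # big race, much smaller K per head-to-head
print(scaled_k(32, 5, 0.5))    # softer scaling
print(clamp_change(50.0, 25.0))
```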

Now in regards to long vs short races etc. That is a good point. I see two additional options. Ignore it and accept that people will have their specialties (and racing in a different format will go poorly), or you could do something sophisticated and build an event type specific score, say TT vs under 1 hour vs over 1 hour events. I can see pros and cons for both and something in the middle.
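
The event type specific option could be as simple as keeping one score per bucket. A sketch, assuming the three buckets named above and a made-up default score for riders with no history in a bucket:

```python
DEFAULT_SCORE = 1500.0  # hypothetical starting score


def event_bucket(is_time_trial: bool, expected_minutes: float) -> str:
    """Classify an event as TT, under 1 hour, or over 1 hour."""
    if is_time_trial:
        return "tt"
    return "under_1h" if expected_minutes < 60 else "over_1h"


def score_for(rider_scores: dict[str, float], bucket: str) -> float:
    """Look up a rider's score for this kind of event, falling back to the default."""
    return rider_scores.get(bucket, DEFAULT_SCORE)


# Hypothetical rider who has only raced short events and TTs so far.
rider = {"tt": 1620.0, "under_1h": 1540.0}
print(score_for(rider, event_bucket(False, 45)))
print(score_for(rider, event_bucket(False, 90)))  # no history, so the default
```

The something-in-the-middle option would blend the bucket score with an overall score rather than falling back to a flat default.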

The great thing is Zwift has tons of data and they could test ranks like this on real data and see if the predicted ranks better align with actual results. Many races seem to have all categories so you can look across cats as well.
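
One simple way to run such a test is to ask, for every pair of finishers, whether the rider with the higher pre-race score actually finished ahead. A sketch with invented data; the function name and the numbers are assumptions, and real results would of course be needed to say anything.

```python
from itertools import combinations


def pairwise_accuracy(pre_race_score: dict[str, float], finish_order: list[str]) -> float:
    """Share of rider pairs where the higher pre-race score finished ahead."""
    correct = total = 0
    # combinations() keeps list order, so `ahead` always finished before `behind`.
    for ahead, behind in combinations(finish_order, 2):
        if pre_race_score[ahead] == pre_race_score[behind]:
            continue  # ties carry no prediction
        total += 1
        if pre_race_score[ahead] > pre_race_score[behind]:
            correct += 1
    return correct / total if total else float("nan")


finish = ["a", "b", "c", "d"]  # made-up race result
results_rank = {"a": 1600, "b": 1500, "c": 1400, "d": 1300}
wkg = {"a": 3.1, "b": 3.9, "c": 3.0, "d": 3.5}

print(pairwise_accuracy(results_rank, finish))
print(pairwise_accuracy(wkg, finish))
```

Running the same measure over many past races for both W/kg and a candidate ranking would show which one predicts finishing order better.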

  • edited out a comment on score inflation. I need to think on that a bit more, but I believe ELO has some favorable characteristics in this regard -
3 Likes

In various implementations of ELO you can score across a tournament. A single bike race is similar to a collection of one on ones in that sense.
Let’s say you race against 4 other people and place 3rd that’s 2 wins and 2 losses so 2 points. Now if the two people that beat you were higher ranked and two you beat were lower your score won’t change much as your pre calculated expected score will be close to 2. But if the two below you were higher ranked you move up (and similarly they move down). If you were all similarly ranked the scores would stay about the same. So this could tolerate the expected fluctuations in bike racing results and start to separate the broad clusters.

Yeah, I think a system would have to take the other participants’ ranking into account when calculating score change. Or you’d risk getting a model with rampant inflation in any segment. It may seem complicated but it’s nothing compared to the complications you will run into if you don’t.

The main complication I initially see is selecting a good score multiplier (K) to calculate the change in score. I imagine you would want to correct for large vs small events via a score limit or variable K to avoid over inflating scores from one good well attended race.

Exactly. And that’s another aspect you bring up as well. Number of participants in a race must matter to some extent, though, or you could exploit small races, short-term at least, so you’d get too much variance in the system. And so you get another calibration problem.

Now in regards to long vs short races etc. That is a good point. I see two additional options. Ignore it and accept that people will have their specialties (and racing in a different format will go poorly), or you could do something sophisticated and build an event type specific score, say TT vs under 1 hour vs over 1 hour events. I can see pros and cons for both and something in the middle.

Mhm, and that’s why I think Zwift would have to intrude more on the community. You may have to standardize races a little more. Right now almost anything goes. And I like the variety, I think most do. So you’d have to find some middle road there. Just like IRL. But a better cat system doesn’t have to prevent people from organizing weird single-occasion races. Like a 400 yard sprint or a quadruple 4 Horsemen in reverse or whatever. You shouldn’t thwart organizers’ creativity, but not all races have to be ‘official’ and affect rank. You could still use the official rank in ‘funnies’ to create good start groups. And people can even run leagues outside of the system, only seasonal changes in participant fitness won’t be reflected in what start group you get placed in - but it sure will matter for your results in the league still. It’s gonna work out fine enough and in a sense you come closer to how IRL racing leagues work anyway.

The great thing is Zwift has tons of data and they could test ranks like this on real data and see if the predicted ranks better align with actual results. Many races seem to have all categories so you can look across cats as well.

This!
They have sooo much exciting data! (I wish we could get a dump of it, that would be so cool to dig through.) And that’s why they need their data scientists (I sure do hope they have a DS team, sometimes I wonder). Because this is a classic machine learning optimization problem, nothing a decent data scientist couldn’t handle.

Sometimes I wonder, maybe they have looked into it a bit and run into perceived problems we haven’t even discussed here for whatever reasons. Simple things such as their current IT architecture being an obstacle in some way can matter. I just wish they would communicate more. We could work things out. Sure, they have their competitors to worry about, I get that. But you don’t need an audit or assessment by McKinsey to draw the conclusion that their customer loyalty (or let’s just call it for what it is: Love) is their biggest asset, not their business secrets. Proper asset management in the proper place.

1 Like

Brad, I hear you. You just wrote a description that could fit me perfectly, or any of A LOT of the Zwift subscribers. It’s exactly like that.

And let’s not forget that although the subscribers will never reflect ‘the average person riding a bike with a funny handlebar’, as they will be somewhat skewed towards elite and top amateurs, i.e. typical people stupid enough to dump a few thousand dollars worth of equipment and fees into a silly computer game, still, even so, Zwift has the same business situation as e.g. the top bicycle brands. While the elite customers may drive the direction the business takes and the way it develops its product, it’s still us, the hopeless, sucky, over-age riders, who look up to the elite but who will never see an A race or even a B race from the inside, that pay Zwift’s hefty cloud storage bills. We actually matter quite a lot. And it’s in our fold that the biggest category problems exist.

2 Likes