Results/ranking based categories NOW!

Chris, I am aware of that problem (I’m normally light myself, not just right now… err…). There is also another problem, just as severe if not worse, that also shows how flawed W/kg is as a basis for categorization.

IRL specialization aside, you should never have to end up in the situation where you have to stop to think “Wait, I can’t join this race, it’s too long, 50k. I won’t be able to compete” or “Wait, I can’t join a sprint race because I might get upgraded and then I have to come in last in every race longer than a sprint from now on up in the next category”. But this is how people think today, because their concerns are very real.

1 Like

@Andreas_Traff Mike may be up to something. Let’s just say I still don’t think it is a good idea to spoil races for other riders.

Currently we don’t see any broadcast of C and D races because it is a joke. The broadcaster has to ignore the front 20 riders because all of them are out of cat.

2 Likes

I have some other thoughts on the W/kg problem as well.

ZP has done loads for Zwift and the community. There is no denying that. But where we stand today, looking at the W/kg system in isolation, I believe they are more of a detriment than a helper, because right now they only serve to consolidate a fundamentally flawed system. The reason for this is the loose ties between Zwift and ZP, which leaves the W/kg system a headless monster that nobody fully controls despite best intentions.

If you have ever worked in the IT industry you will know that systems or projects without clear ownership always create problems. What I would do if I was in the upper Zwift management would be to either bring ZP closer or to distance the company from them.

Next thing I would do would be something slightly unusual. I would divide my reasonably sized data science team (that I would have hired long ago) into two, allocate some time and resources to them among their other duties, and then have the two sub teams compete in coming up with a new system on paper, backed by data and ideally also user experiences. Then I’d bring the two teams together for a comparison and discussion. And then let them work on an optimal synthesis. Then just have the IT division develop the new system and push it. And never look back. Not such a huge cost but a big win.

It’s all about customer retention at the lower end, which is where most of the nasty stuff happens and where the subscribers are normally far less loud mouthed than me and where blind loyalty is not won yet - A and B riders may complain about things, but will keep paying anyway, just like hardcore gamers in triple A games. And customer retention has a massive effect on the bottom line. It’s about keeping and cultivating the subscribers that were won during the unfortunate corona crisis. Everybody wins. Even Zwift. Oh, except the cheaters.

5 Likes

Meanwhile, there are tons of games with online mode where you automatically join rooms that are based on your demonstrated results. And there are no “subscriber happiness” issues with that. You crush them, you will face tougher opponents. You get crushed, you get an easier set of opponents next time. Keeps everyone happy and challenged. Tested, proven, in real life just as much as in the online world.

You don’t even need fixed categories. You can auto-define and fill the cats based on attendance and rankings. I’m certain there’s off-the-shelf code for all that functionality - there’s off-the-shelf code for everything.
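A rough sketch of what "auto-define and fill the cats based on attendance and rankings" could look like, purely for illustration. Everything here is an assumption: the function name, the idea that a higher rank score means a stronger rider, the target group size of 30, and the cap of 5 groups. It is not anything Zwift or ZP actually does.

```python
from math import ceil

def assign_start_groups(rank_scores, target_group_size=30, max_groups=5):
    """Split the riders who signed up into start groups by rank score.

    rank_scores: dict of rider name -> rank score (assumed: higher is stronger).
    Returns a dict of rider name -> group label ("A" is the strongest group).
    """
    labels = "ABCDE"  # hypothetical labels, one per possible group
    riders = sorted(rank_scores, key=rank_scores.get, reverse=True)
    # The number of groups follows attendance: small fields get fewer groups.
    wanted = round(len(riders) / target_group_size)
    n_groups = max(1, min(max_groups, len(labels), wanted))
    group_size = ceil(len(riders) / n_groups)
    return {rider: labels[i // group_size] for i, rider in enumerate(riders)}
```

With only a handful of sign-ups everybody lands in one group; with a big field the same code cuts the start list into up to five groups of similar size, so the boundaries move with who actually shows up.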

This would allow points-handicap races and TTs, points improvement challenges, name it. It’s simple, it’s effective.

It’s not coming soon to a screen near you.

7 Likes

I’m not sure anything off the shelf is compatible with Zwift.

2 Likes

I agree, I started a similar thread a couple months ago. I showed there the amount of overlap in finishing times between people in different categories, who ought to be racing each other. There are many threads on this general concept before mine too, but it seems the Zwift team doesn’t want to invest the money to make it happen.

3 Likes

I couldn’t agree more about the need for something akin to auto matching and/or performance-based groups. Does anyone really think W/kg at FTP is a good solo metric to predict race performance?

As someone whose fitness has plummeted from parenting and life related time constraints, I am not worrying about ranking highly, but rather improving over my own level. I would argue that accurate categories are equally important to filthy casuals like me :wink:

What I really want is to ride hard with a group that pushes me to go harder, and motivates me to chase. I’m totally fine with a DNF or being la lanterne rouge but it is disheartening to see the “matched” group leave you like you aren’t moving when you are giving it your all.

I recently tried my first Zwift race and chose group C due to my FTP (a short sprint race, ouch, as my VO2 max is still a work in progress). Since it was my first race I made some rookie errors (Trainer difficulty at 100% wasn’t great for the short downhill with a compact chainset. PS why is this an option in races?)

Off the start I realized I had made a mistake trying to hang on to a group that was clearly capable of going far far harder than me. Once I fell back I was able to cluster with a group that offered a genuine challenge and motivation to ride with. That was a lot of fun. But I would say I was really racing with 5 of 30ish people. Seeing the top cluster of Cs finish in the upper Bs made me wonder who they are trying to impress… It was interesting too that very few of the top spots were ZwiftPower users.

I can see a lot of potential for fun in these races, but they would be more fun in a better matched group of people who you can draft with and be challenged by, or at least still see off in the distance.

6 Likes

You don’t even need fixed categories. You can auto-define and fill the cats based on attendance and rankings.

Very important point.

Some related random thoughts on cat systems and rankings below:

The W/kg cat limits were, I would assume, not arbitrary originally. Well, they were in a sense, but Zwift probably had good reasons for them. You can’t have too many categories because then smaller races will not get enough participants to make it interesting, so probably somewhere around 4, or 5 at most like in the US IRL system. And where do you draw the lines? I’m guessing they expected (and probably got) something that started to look like a bell shape, although perhaps a skewed one, with C and B as the big cohorts (meaning 5 cats could be a possibility) in terms of subscribers although with high activity in B and A. And D as the ‘growth’ cat, which a fair chunk of subscribers will only pass through and not stay in, plus a lower race participation. A is unattainable for a majority, the majority that is going to finance the whole enterprise.

But a results based system is never truly static. And so you don’t even have to try to make it so. In fact, you definitely shouldn’t.

Typical league-with-divisions style rankings, with a predetermined number to be culled from a cat and then moved either up or down, don’t make sense when you have to handle loads of various races of which most are organized out of your direct control by the community itself. Tennis style rankings are absolute and theoretically very fair, but there are far too many Zwifters to manage such a system; it would just be a pain with little gain to even try, and for many reasons. So you’d have to go with some kind of ranking score instead. Like the relatively pointless ZP rank (a shame really).

Score ranking systems share a common problem. Common in a double sense. There is always the issue with score stability and in more than one way. First you need to calibrate mobility, which is not trivial. What would warrant an increase in your ranking and by how much? You want reasonable mobility up and down. Not too fast, because then you’d just oscillate wildly around your ‘true’ rank. Not too slow or people get restless and give up. Rather, just right, or thereabout.

Second, there is another stability issue, and that is the risk of rank score inflation. If we think of cycling prowess as a typical human characteristic, it will, like everything else of the kind, have a gaussian distribution. I.e. Zwifters will fit under a bell shaped curve, with most being ‘average’ - they go under the fat middle of the curve - and smaller numbers in the weak and the strong end respectively. Now, if nothing else, the strong end tends to drive score inflation. Why? Because they keep winning. And if every podium increases your rank, even if only by a little… yeah, you get it.

Score inflation, however, is largely a ‘cosmetic’ problem, especially with cycling. One example of a system that suffered (or suffers) from inflation is the ranking that was/is used in the online computer game Dota 2, the world’s ‘biggest’ pro e-sport (not in numbers but in prize pool). Top players kept driving their rankings up and up and you could never really reach those numbers, no matter how talented you were, if you picked up the game late. So seen as a tennis style ranking it sucked because it did a bad job at ranking individuals against each other. But as a tool to pool appropriate levels of competition together and make fun and fair games it still worked pretty damn well - I actually think they loosened it up a little on purpose just to create a higher variance and make games more unpredictable and thus more interesting (although more streaky).

So all the developer had to do once inflation became too obvious was a) to hide the actual rank score and replace it with a symbol or, actually, a categorization (you’re silver, and you’re gold, and you there are gold with two stars and a swoosh, sort of), and b) to openly reset and recalibrate the rankings at appropriate intervals, to make sure newcomers had a chance to get on a level with the veterans and not trail behind. But that’s ranking score as a sum. A ratio based ranking will behave differently. There are many ways to design a system.

The system of systems, the so-called Elo system in chess, is also suspected [sic!] to suffer from inflation but only by surprisingly little considering everything. It seems stable enough over time. So it’s obviously possible to get it just right, or right enough. And cycling would be less afflicted anyway, since both chess and Dota 2 are games where two players or teams face each other, so on average and all else equal every second game you play you win and your ranking has a chance to increase (depending on how highly ranked your opponents are). But even the best cyclists typically don’t win every second race since there are so many participants in each race and so many factors at play that mediate success.

And besides, like Robert points out above, stability doesn’t have to be a big problem since category limits can be made dynamic. C is swelling too much? Make it smaller by adjusting the limits. It’s not much of a problem.

Mobility, though, can be a little tricky. When is a rider doing too well and in what contexts. And how quickly should we make that decision?

But the real challenge lies elsewhere. A chess game is a chess game, but in Zwift you need to design a system that can handle many different kinds of races. And how do you value those? Should a high placing in a sprint race award the same rank score increase as a win in a bambino fondo length race? And is a 33 km race worth as much as a 44 km race or rather only 75%, and why then 75% and not 80% or 65%? And by what factor should rank score gains be affected by the rank score of other race participants (or would it be too complicated to even consider those)?

Another issue is that you will need some kind of governance of race formats to help both the system and the clubs that organize races to fill the calendar with races that can fit into the cat system and not ruin it. And you also need to take full control and ownership of the categorization. It can’t be left to a third party to decide. Zwift would thus have to ‘intrude’ a little more on the community than they do today (plus allocate staff/resources to manage it all). I wouldn’t mind at all though.

Actually Elo (or the alternate whole history method used in Go) could be promising. Since Elo roughly translates to your likelihood of winning against someone it could form logical groupings (which could be as broad or narrow as desired).

In various implementations of Elo you can score across a tournament. A single bike race is similar to a collection of one-on-ones in that sense.
Let’s say you race against 4 other people and place 3rd: that’s 2 wins and 2 losses, so 2 points. Now if the two people that beat you were higher ranked and the two you beat were lower, your score won’t change much, as your pre-calculated expected score will be close to 2. But if the two below you were higher ranked you move up (and similarly they move down). If you were all similarly ranked the scores would stay about the same. So this could tolerate the expected fluctuations in bike racing results and start to separate the broad clusters.

The main complication I initially see is selecting a good score multiplier (K) to calculate the change in score. I imagine you would want to correct for large vs small events via a score limit or variable K to avoid over-inflating scores from one good, well-attended race.
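To make the one-on-ones idea concrete, here is a minimal sketch of a race treated as a set of pairwise Elo games. The expected-score formula is the standard Elo one; everything else is my own assumption for illustration: the function names, K = 32, and dividing the update by the number of opponents so that one big race cannot move a score more than a small one.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expectation that A beats B in a single head-to-head."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def update_ratings(finish_order, ratings, k=32.0):
    """Treat one race as a round of one-on-ones between all finishers.

    finish_order: list of unique rider names, winner first.
    ratings: dict of rider name -> pre-race rating.
    Returns a new dict with post-race ratings; the input is not modified.
    """
    n = len(finish_order)
    new_ratings = dict(ratings)
    if n < 2:
        return new_ratings
    for place, rider in enumerate(finish_order):
        wins = n - 1 - place  # everyone who finished behind counts as a win
        expected_wins = sum(
            expected_score(ratings[rider], ratings[other])
            for other in finish_order
            if other != rider
        )
        # Assumption: normalise by field size so large events are not over-weighted.
        new_ratings[rider] = ratings[rider] + k * (wins - expected_wins) / (n - 1)
    return new_ratings
```

In the 5 rider example, a rider rated 1500 who places 3rd behind a 1700 and a 1600 and ahead of a 1400 and a 1300 has an expected score of 2 wins, so the rating does not move. The updates also sum to zero across the field, which is one reason this kind of scheme is less prone to inflation than "every podium adds points".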

Now in regards to long vs short races etc. That is a good point. I see two additional options. Ignore it and accept that people will have their specialties (and racing in a different format will go poorly), or you could do something sophisticated and build an event type specific score, say TT vs under 1 hour vs over 1 hour events. I can see pros and cons for both and something in the middle.

The great thing is Zwift has tons of data and they could test ranks like this on real data and see if the predicted ranks better align with actual results. Many races seem to have all categories so you can look across cats as well.
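One simple way to check whether a candidate ranking "aligns with actual results" is to count how often the pre-race score order agrees with the finish order, pair by pair. The sketch below is just that; the function name and the higher-is-stronger score convention are assumptions, and a real evaluation would run it over many historical races and compare systems.

```python
from itertools import combinations

def pairwise_agreement(finish_order, pre_race_scores):
    """Share of rider pairs where the higher pre-race score finished ahead.

    finish_order: list of rider names, winner first.
    pre_race_scores: dict of rider name -> score (assumed: higher is stronger).
    Pairs with equal scores are skipped. Returns None if no pair is comparable.
    """
    agree = 0
    total = 0
    # combinations() keeps list order, so `ahead` always finished before `behind`.
    for ahead, behind in combinations(finish_order, 2):
        score_ahead = pre_race_scores[ahead]
        score_behind = pre_race_scores[behind]
        if score_ahead == score_behind:
            continue
        total += 1
        if score_ahead > score_behind:
            agree += 1
    return agree / total if total else None
```

A value near 1.0 means the ranking predicted the race well, 0.5 is no better than a coin flip. Running the same measure with today's W/kg categories as the "score" would give a baseline to beat.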

  • edited out a comment on score inflation. I need to think on that a bit more, but I believe Elo has some favorable characteristics in this regard -
3 Likes

In various implementations of Elo you can score across a tournament. A single bike race is similar to a collection of one-on-ones in that sense.
Let’s say you race against 4 other people and place 3rd: that’s 2 wins and 2 losses, so 2 points. Now if the two people that beat you were higher ranked and the two you beat were lower, your score won’t change much, as your pre-calculated expected score will be close to 2. But if the two below you were higher ranked you move up (and similarly they move down). If you were all similarly ranked the scores would stay about the same. So this could tolerate the expected fluctuations in bike racing results and start to separate the broad clusters.

Yeah, I think a system would have to take the other participants’ ranking into account when calculating score change. Or you’d risk getting a model with rampant inflation in any segment. It may seem complicated but it’s nothing compared to the complications you will run into if you don’t.

The main complication I initially see is selecting a good score multiplier (K) to calculate the change in score. I imagine you would want to correct for large vs small events via a score limit or variable K to avoid over-inflating scores from one good, well-attended race.

Exactly. And that’s another aspect you bring up as well. Number of participants in a race must matter to some extent, though, or you could exploit small races, short-term at least, so you’d get too much variance in the system. And so you get another calibration problem.

Now in regards to long vs short races etc. That is a good point. I see two additional options. Ignore it and accept that people will have their specialties (and racing in a different format will go poorly), or you could do something sophisticated and build an event type specific score, say TT vs under 1 hour vs over 1 hour events. I can see pros and cons for both and something in the middle.

Mhm, and that’s why I think Zwift would have to intrude more on the community. You may have to standardize races a little more. Right now almost anything goes. And I like the variety, I think most do. So you’d have to find some middle road there. Just like IRL. But a better cat system doesn’t have to prevent people from organizing weird single-occasion races. Like a 400 yard sprint or a quadruple 4 Horsemen in reverse or whatever. You shouldn’t thwart organizers’ creativity, but not all races have to be ‘official’ and affect rank. You could still use the official rank in ‘funnies’ to create good start groups. And people can even run leagues outside of the system, only seasonal changes in participant fitness won’t be reflected in what start group you get placed in - but it sure will matter for your results in the league still. It’s gonna work out fine enough and in a sense you come closer to how IRL racing leagues work anyway.

The great thing is Zwift has tons of data and they could test ranks like this on real data and see if the predicted ranks better align with actual results. Many races seem to have all categories so you can look across cats as well.

This!
They have sooo much exciting data! (I wish we could get a dump of it, that would be so cool to dig through.) And that’s why they need their data scientists (I sure do hope they have a DS team, sometimes I wonder). Because this is a classic machine learning optimization problem, nothing a decent data scientist couldn’t handle.

Sometimes I wonder, maybe they have looked into it a bit and run into perceived problems we haven’t even discussed here, for whatever reasons. Simple things such as their current IT architecture being an obstacle in some way can matter. I just wish they would communicate more. We could work things out. Sure, they have their competitors to worry about, I get that. But you don’t need an audit or assessment by McKinsey to draw the conclusion that their customer loyalty (or let’s just call it what it is: Love) is their biggest asset, not their business secrets. Proper asset management in the proper place.

1 Like

Brad, I hear you. You just wrote a description that could fit perfectly on me or any of A LOT of the Zwift subscribers. It’s exactly like that.

And let’s not forget that although the subscribers will never reflect ‘the average person riding a bike with a funny handlebar’, as they will be somewhat skewed towards elite and top amateurs, i.e. typical people stupid enough to dump a few thousand dollars worth of equipment and fees into a silly computer game, still, even so, Zwift has the same business situation as e.g. the top bicycle brands. While the elite customers may drive the direction the business takes and the way it develops its product, it’s still us, the hopeless, sucky, over-age riders, who look up to the elite but who will never see an A race or even a B race from the inside, that pay Zwift’s hefty cloud storage bills. We actually matter quite a lot. And it’s in our fold that the biggest category problems exist.

2 Likes

The points inflation and weighting by events problems are already solved elsewhere. I’m sure there are other similar systems, but the FIS ranking system, for example, works for events with a large number of starters, uses the best 3 ranks from the top 5 event finishers to establish event base points, is asymptotic to zero (it’s a lower-is-better system, and racers gravitate towards zero without ever reaching it), and has a built-in points penalty system to cater for race type/duration (used in Nordic events). Racer points are based on the best 5 events in the last 12 months, which automatically makes your points degrade if you don’t participate in events with a certain regularity.

Forgot - individual points for an event are based on the race base points (3 best ranked in top 5), race penalty, and the ratio of an individual’s time over the winner’s. So no matter the race length or your placement, you always have an incentive to be as fast as possible, regardless of your position in the race. In other words: finishing within 1% of the winner’s time pays the same, regardless of the number of other racers between you and the winner.
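For anyone who wants to see the arithmetic, here is a minimal sketch of points in that style, following the description above. The shape "factor times (your time / winner's time - 1), plus the race penalty" is the FIS race points form as I understand it; the factor of 800 is only a placeholder, since the real factors differ per discipline and format, and the penalty is taken as an input rather than derived from the top finishers. Function names are made up for the example.

```python
def event_points(rider_time, winner_time, race_penalty, factor=800.0):
    """Lower-is-better points for one race.

    rider_time, winner_time: finish times in the same unit (e.g. seconds).
    race_penalty: points reflecting how strong the field was (taken as given here).
    factor: placeholder scaling constant, an assumption for this sketch.
    """
    return factor * (rider_time / winner_time - 1.0) + race_penalty

def rider_points(results_last_12_months, best_n=5):
    """Average of the rider's best (lowest) event points in the last 12 months.

    Returns None when the rider has no results in the window.
    """
    best = sorted(results_last_12_months)[:best_n]
    return sum(best) / len(best) if best else None
```

Finishing 1% behind the winner costs `factor * 0.01` points whether the race took 20 minutes or 2 hours and no matter how many riders finished in between, which is the property described above.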

The wheel exists, it does not need to be invented.

2 Likes

Robert, let me make sure I understand this. Correct me if I’m wrong, but what you’re saying is that there is already a working system in another sport that is not particularly sensitive to the number of participants in a race (except maybe races with very few participants) and not sensitive at all to race distance or difficulty because of the 1% thing? A system, even, that promotes a reasonable (as opposed to unreasonable) amount of race activity and that also promotes making your best effort in every race?

Wow.

Let’s do some thinking here. To implement such a system you would need to collect a whole lot of data of course and then do many calculations on it. So let’s go through the entire list of types of datapoints needed for a model implementation to work. For all participants in every ‘official’ race Zwift would have to collect:

  1. PLACING
  2. FINISH TIME

Looking right now at the Zwift Companion app, to my surprise I get the impression that Zwift does in fact already collect every point in this list. There is more to the model but they seem to be just constants. Zwift even seems to store the above datapoints long term, because I notice that I can go back in time and see my placings and finish times in races way back. I can also look at my competitors’ placings and finish times in those races.

So then, this big project, if we imagine for a second that Zwift would set out to implement a FIS style categorization, would require quite a bit of work on their part. Let’s see now. Let’s pretend we are chief product owners here for a minute and rough sketch this project from start to finish.

First, they would need to add at least one, probably two new fields to every subscriber profile. Two new columns in some database table. One for the rank score. And then, for convenience, another one for race category. This could take several minutes to add.

Then you need a server side application to calculate the stuff going on in the model for every race and then update the subscriber profiles with any resulting changes. You also need to feed the race reports. This is way worse. It might take a skilled developer a whole day to code all of this, but let’s be generous and give them a full week.

Then you need to visualize this somehow to the subscribers. Some changes are needed to the interface. This would be a completely different team’s responsibility, although one that would normally work in parallel with the model application team. But it’s gonna take quite a bit of time nevertheless. Let’s give the frontend team a week of their own too, and let’s make it a week where nobody else does any job on the project.

The model is going to affect race generation too or there would be no point to it. Instead of the current free choice, subscribers will now be forced into a specific category (making ZP redundant). This could take a whole day too but let’s plan for the standard week again.

We’re up to 3 weeks already! And as we all know any project always takes longer than expected, even when you try to account for that fact. But let’s cut them some preliminary slack and make it a full month.

Then, or actually before you start any of the above tasks, you need to have a couple of data analysts look at suitable numbers of categories, project optimal category sizes, look into the seasonality issue, some validity reality checks on historical data, report all of this back to some manager etc etc. That could take a full month.

Even before the analysts come into play, and in fact during the entire project, management needs some time to sit and discuss everything in virtual meetings (it’s still corona days after all). Should we really do this? It seems so scary! Will subscriber numbers take a hit? Risk / consequence analysis plz! Etc. Ok, let’s do it! How do we plan this? Let’s do it so and so. How is the model application team progressing? Any blockers? Blah blah. And so on. Let’s give them another 30 days to spread out over the project, during which nobody else can work on this because they are waiting for meeting appointments and manager decisions.

So that’s 3 months. That means we can look forward to a launch on September 13. Yaaay!

It’s mainly a matter of which year. Will it be 2018 or 2019? Or even 2020?

I seem to have a bit of a cold right now. Very inconvenient. Or I could have cheated today, that was the plan, honestly. But I promise to be a good boy and continue cheating properly like the others as soon as I feel better and get the opportunity. Because I still can, since Zwift and even ZP will allow it.

1 Like

That system is flexible enough to be used for world rankings in alpine skiing (with event durations in the 2 minute range and time differences in the hundredths of a second) as well as in XC skiing, with event durations in the hour or two range, with minimal adaptation. If there’s one winter sport that’s similar to cycling, it’s XC skiing.

You don’t need a field for category. In database designs, you generally want to avoid having a field that’s a direct calculation of another field.
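A small sketch of what "category as a calculation of the score" could look like: store only the rank score and derive the category on demand. The limits, labels, class name and the higher-is-better convention are all hypothetical; with a lower-is-better score like the FIS one the labels would simply run the other way.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical category limits. Because the category is derived, changing
# these numbers re-categorises everyone without touching any stored data.
CATEGORY_LIMITS = [1200.0, 1500.0, 1800.0]
CATEGORY_LABELS = ["D", "C", "B", "A"]

@dataclass
class RiderProfile:
    name: str
    rank_score: float  # the only stored ranking field

    @property
    def category(self) -> str:
        """Look up the category from the score; a score on a limit moves up."""
        return CATEGORY_LABELS[bisect_right(CATEGORY_LIMITS, self.rank_score)]
```

This also fits the dynamic limits idea from earlier in the thread: if C is swelling too much, you move a limit and every profile follows automatically.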

1 Like

I have lost all faith in Eric Min; he seems to lack the passion to fix anything. Persistent problems remain unaddressed and the yearly interview on Zwiftcast is getting embarrassing with the lack of development. If this ever gets fixed it will be by the community, without which Zwift would be a shadow of what it is. The Zwift MO of minimum acceptable change is long running.

1 Like

Correct me if I’m wrong, but doesn’t the FIS system fit into the traditional model of earning points over a season to rank in tournaments? (I.e. world cups)

One of the things I conceptually like about an Elo type system is that it also works well for matchmaking in games across a wide range of players (frequent, infrequent, old, new).

I wonder if there is a single system which really fits both needs well. I am interested in setting up a challenging ride with similarly skilled riders. But that is subtly different from appropriately progressing into higher ranks and more traditional racing season structures.

However, I totally agree that there is no need to reinvent the wheel. There are so many well developed systems out there. It’s likely just a simple parameter optimization problem to figure out which ones work best at predicting performance in Zwift.

That would be the World Cup points system, not the ranking system. Different animal. The ranking system is used to determine eligibility to events and event starting order, among other things.

1 Like

I have lost all faith in Eric Min; he seems to lack the passion to fix anything. Persistent problems remain unaddressed and the yearly interview on Zwiftcast is getting embarrassing with the lack of development. If this ever gets fixed it will be by the community, without which Zwift would be a shadow of what it is. The Zwift MO of minimum acceptable change is long running.

I refuse to lose faith in Min because I refuse to lose faith in reason and fairness. I’m having this change, results based categories. Period. I will never give up. I am flexible with other things in Zwift and usually happy with whatever they come up with. But this is the game breaker and not up for discussion. It’s happening.

I saw something mentioned in the other big thread, the one where people are still stuck thinking inside the box (guys, you really can’t save the W/kg categories, they were doomed to begin with). And that’s the priorities argument. One reason why the category system is not reworked could be that staff resources are spent elsewhere, on issues and ideas that may seem to impact revenue more. You can’t fix everything at once, you have to make priorities. But I think that argument is dumb.

What is the standard onboarding scenario for a new Zwifter? First it’s hooking up and getting it to work at all. Then what? You probably just ride around a bit to get a feel for everything. I know I did. Then you may want to take part in more structured functionality, so perhaps you try a workout on your own. You’re still a little scared to interact with other riders. But group rides or GWOs seem fairly safe socially, so eventually you sign up for a group ride. It could be nice.

You want to fit in so you try your best to stay close to the group leader as instructed. And it really is a great experience. However, you notice that quite a few riders seem to ignore the group leader. You on the other hand are barely clinging on to her wheel for dear life. So you’re a little disappointed that what could have been a 100% positive new experience for you is overshadowed by what may be perceived as disrespect for the leader and in a sense the fliers also crap on your own effort - it’s so easy for them to go flying and doing so is sort of disrespectful also to you. At any rate it makes you feel bad about yourself although it really shouldn’t. So first contact with social interaction in Zwift wasn’t all positive. And you’re still just on trial…

Negative interaction between riders is a disaster for Zwift because then competitors like TrainerRoad and similar become options. I really get why they listened to group leaders and decided to create the Fence. I also heard a convincing explanation of why they had to drop it. And now I really understand why they are trying to find a way to make it work again after all.

Once you’re comfortable with attending group rides and you feel reasonably included, you might want to try a race, just to see what it’s all about since everybody is talking about them. Conscientiously you sign up to the right category. But once again you get crapped on by a set of other participants (the sandbaggers), whether you understand it or not. You end up feeling bad about yourself, again, and may decide to never race again. It wasn’t for you. You stick with the group rides and try to avoid the ones without a good blob.

Revenue follows these people. Treat them well and you will have revenue all over Zwift. Let other subscribers abuse them and they will hide away in nooks and crannies within Zwift where they feel safe.

Fix the categories. Results based categories now!

2 Likes

Hi Andreas,
I commend you on your quest to improve the racing scene.

I totally agree.

I hope as we speak, Zwift is finalising a workable solution.
Good luck with your efforts and I hope you don’t run out of energy, or patience.
“Ride On”

ps I don’t think we need the fence back, just let the flyers fly. As long as the leader sticks at the suggested pace, everyone can be happy.

1 Like

ps I don’t think we need the fence back, just let the flyers fly. As long as the leader sticks at the suggested pace, everyone can be happy.

Well, I don’t mind too much either. In some very large group events like the ones we have seen during corona, it’s even a good thing to let fliers of all levels fly, since it results in a group ride that fits everyone fitness wise. I nevertheless think it’s good for group leaders to have the choice whether to use a fence or not. So I’m all for the efforts from Zwift to find a working solution.

I commend you on your quest to improve the racing scene.

Hey, what’s stopping you from fighting the good fight yourself? Justice and customer value needs YOU! Anyway, I’m on a temporary break where I have to ride outdoors (the horror!) but as soon as I get the chance I will continue to hone my skills in cruising Zwift races and to report on my cheating progress, if not here then at https://zwiftcruiser.blogspot.com/ which is under construction atm.

1 Like