Zwift Classics 2021 - Trofeo Bologna (Bologna Giro d'Italia TT course)

Hi Andreas, what do you mean by “50% time overlap” and is this something you’ve seen in two specific categories of one race in one time zone, or several?

Several. The overlap is too wide. There will always be overlaps, that’s natural, or progression couldn’t occur smoothly or at all. But they can’t be that wide. It undermines the whole purpose of categorization and must surely be below the standards the AutoKitten dev team set for themselves.

My point is that this is yet another sign that their project is doomed. And was doomed to begin with. You can’t make a fair and efficient performance based categorization. It will never be fair and it won’t work well enough even if you disregard the unfairness. It’s too complex and fuzzy. Whereas a results based categorization can be made mindnumbingly simple (think football or something) and still work decently.

It’s time to lay AutoKitten to rest. Sure, maybe they tweaked some knobs this week and tried some extremes in the parameters, and yada yada [insert arbitrary defense speech], but it still won’t work. It will perform worse than straight 20 min W/kg because there will be too many quirks because of adding too many features that don’t actually have anything to do with winning or losing races. They will never be able to calibrate the model to a decently high validity. The only thing in it that does work is the cat enforcement part, that C1 can’t join C3 or whatever. But then you don’t need an algorithm for that. You just enforce the cats, whatever they are. It is very simple. But we still don’t have it, except in these “beta” WTRL races.

1 Like

Yep, finished pretty much mid-pack: 32nd in 50:03, which is 4:15 behind 1st (45:48) and roughly 5 minutes in front of the last group. I am not sure how much of a spread the Autocat system was trying to achieve, but 10 minutes across 60+ riders seems quite large to me.

Of all the Classic races I have done, I think I enjoyed this one the most. A big factor was making my way from 70th to around 40th on the first climb; it felt so rewarding.

Looking at the last race being another crit style race, I think that’s me done with this series.

Look forward to reading the feedback and comments next week though.

Cheers.

Hi Andreas, what I meant is I’m interested in looking at the data you’re talking about, and I was hoping you’d be able to help point towards it.

I enjoyed it too. As a workout. It’s always easier for me to work hard in a race than in a solo workout, and I know I worked fairly hard tonight. So I’m reasonably happy. But I enjoyed it less as a race, looking at my start list before the race and browsing the results from various pens and time zones afterwards, same as last week. And the one before that.

1 Like

Looking at the West Oz series of races, if I was in the C4 race, I would’ve killed everyone by as much as the A riders killed me in the C3 race. There were 8 riders in C4, 37 riders in C3, 19 riders in C2 and 35 riders in C1. There was no C0 cat. Seems like Autocat would be dispersing according to the numbers in the various pens. Instead, we’ve had the same cast of characters in C3 with a spread of w/kg from about 3.1 to 4.5. And a race ranking of 600 to 125. I know the bottom end of C3 is pretty tired of this charade.

I finished 29th out of 37 and over 6:30 down on the A rider that’s been King of C3 for nearly all of the races. I will say, I did not have an acceptable performance on my part and was about 0.2w/kg below where I thought I would be but none of this makes any sense at all.

The other funny thing is riders saying they weigh 88kg so they suck at climbing, but if they’re capable of putting down 3.9w/kg for 20 minutes and I’m putting down 3.5w/kg with my skinny 66kg ass, they’re still gonna kick said ass up, down and across the Baloney course. Or any Zwift course.
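To put rough numbers on that comparison, here is a minimal sketch that converts the figures quoted above into absolute watts. The function name `absolute_watts` is made up for illustration; the only inputs are the weights and 20-minute w/kg values from the post.

```python
def absolute_watts(weight_kg: float, w_per_kg: float) -> float:
    """Absolute power in watts from body weight and relative power."""
    return weight_kg * w_per_kg


heavy = absolute_watts(88, 3.9)  # the 88 kg rider at 3.9 w/kg
light = absolute_watts(66, 3.5)  # the 66 kg rider at 3.5 w/kg

print(f"heavy: {heavy:.1f} W, light: {light:.1f} W, gap: {heavy - light:.1f} W")
```

The heavier rider in this example is ahead on both relative and absolute power, so there is no terrain where the lighter rider's numbers come out on top.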

The WTRL Autocat should be addressing this sh*t instead of telling us “Everybody is perfectly categorized” and “Nobody is complaining about the race grouping”.

3 Likes

It really is this simple, isn’t it? It is both frustrating and incredible that over and over again we hear the mythical idea that you can’t sort a simple problem with a simple solution. You can, and the longer ZwiftHQ (and others) continue to trade on the idea that we need to do anything else, the more they undermine their credibility.

3 Likes

The first point about overlap is tricky.

Nos. 2-4 are easy. Too easy. Well, there are no precedents and no standards in studying Zwift, so you have to create the study design yourself, and there is of course always the risk of pitfalls that would create tendencies in the results that have little to do with the underlying data but rather with the design. Or differently put, be careful what you ask for because you might just get it (and for the wrong reasons).

You could do 3-4 differently than what I did, but I’ll eat my cycling cap if you can come up with reasonably robust designs that yield very different results when looking at standard Zwift races. And I’m pretty sure the tendencies will still be there when looking at WTRL results too. Weight advantage for sure, except… (I’ll come to that).

Thing is when it comes to the weight advantage it is sooo misunderstood. People really can’t get into their heads that heavy riders are not at a disadvantage even in a race with several or heavy climbs in standard Zwift races (outdoors they are but not here). And they have a huge advantage on the flat (more so than outdoors… except in cat A). And it all has to do with W/kg cats. But when it comes to an AutoKitten race like tonight’s course, then to be honest I’m not totally sure what to expect. The reason is that the AutoKitten cats are so fuzzy and overlapping and the grounds for promotion within the system are a great unknown, so the heavy rider does not have that solid upper W/kg limit to lean against to equalize against a lighter rider in a climb. Hence it will be interesting to find out how much weight matters and in what direction on AutoKitten Bologna. As for the previous couple of weeks on flatter courses, it’s a different story. I’m totally convinced that you will see the same heavy weight advantage as in standard races. And I will check myself once the season is over.

When it comes to cruising… hmm… I could imagine a scenario where the WTRL races showed somewhat smaller amounts of cruisers than in standard races. And not because of AutoKitten but because of the setting, because WTRL might attract more hard workers or something… extraneous. Maybe. But the cruisers are still there. I know that. I have seen them. In some cases it’s even names I know since before (when I cruised myself). And I know exactly what to look for, regardless of whether I participate in a race or just look at data from the outside.

When it comes to a supposed difference between men and women, this is my gamble. I admit it. I have no study, no solid data to show for it. But it did seem that way to me the other week when taking a long look at people’s Companion App profiles. Those that weren’t set to private (mostly men’s), that is. A locked profile is a warning sign to me although I don’t draw conclusions from them of course, only from open profiles. But the abundance of locked profiles presents a methodological problem in a study of course, one you will have to deal with one way or the other. Zwift insiders with full access to data won’t have that problem. Zwift insiders actually have far fewer problems than me. I have had to scrape ZwiftPower in the past for data but the race history is still cut short for outsiders. My intention has been to write and set up a permanent automated scraper and just let it run, getting EVERYTHING, but I have so far never got around to it (I do actually have a life too although it may not seem that way). Anyway, I would do a study on the men-women thing myself once the WTRL season was finished, but without an automated scraper it won’t be possible as ZP only lets you peek back a week or so of race history, and I have been too busy and “distracted” by RL to gather data while it is still “there”.

The overlap thing can’t really be studied easily. I dunno, you’d have to run simulations against different tweaks of AutoKitten, which is a well-kept secret to begin with. You’d need a simulated league. You’d need to know the basis for upgrades and downgrades in the AutoKitty system, and I’m still not really seeing any of that. Were they supposed to kick in now already or are we all still just “calibrating”, perhaps even until a next presumed season? How fast or slow are upgrades supposed to happen really? Does Zwift even know themselves? I’m completely in the dark here and couldn’t come up with a reasonable study for this even as a sketch on a paper napkin.

No, I’m mainly interested in hearing reasoning and motivations for why a 50% cat overlap time wise would be a reasonable thing within a game that tries to be a sport (but fails miserably).

Why is it good and helpful to the system that the winner of one cat would place midpack in the next higher cat in several cases? Why is it good and helpful to the system that the lanterne rouge would place midpack in the cat below in several cases? How does that provide efficient dynamics/mobility and desirable characteristics in the system? I want to hear convincing arguments. Because from my perspective it seems like total crap.
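One way to make the "winner of one cat would place midpack in the next higher cat" question concrete is to take the lower-cat winner's time and look up where it would slot into the higher cat's results. The sketch below does that with made-up finish times (NOT real race data); `to_seconds` and `hypothetical_place` are illustrative names. It also ignores race dynamics entirely, which a pure time comparison between separately raced pens cannot capture.

```python
from bisect import bisect_left


def to_seconds(mmss: str) -> int:
    """Convert a 'MM:SS' finish time to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def hypothetical_place(time_s: int, field_s: list[int]) -> int:
    """1-based place a rider with time_s would take if added to field_s.

    Ties are resolved in favour of the added rider.
    """
    return bisect_left(sorted(field_s), time_s) + 1


# Made-up finish times for two adjacent categories, NOT real race data.
higher_cat = [to_seconds(t) for t in ["41:10", "41:50", "42:30", "43:20", "44:00", "45:10"]]
lower_cat = [to_seconds(t) for t in ["42:40", "43:30", "44:20", "45:00", "46:15", "47:30"]]

winner_lower = min(lower_cat)
place = hypothetical_place(winner_lower, higher_cat)
print(f"Lower-cat winner would place {place} of {len(higher_cat) + 1} in the higher cat")
```

Running the same lookup for the lanterne rouge of the higher cat against the lower cat's results gives the other half of the overlap picture.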

1 Like

I was almost the Lanterne Rouge in my C3 race and would’ve won the C4 race very handily. At least in West Oz. Looks like top 10 in Central Urp. Don’t have time to dig deeper. The winner of C3 was a cruiser as near as I can tell because his final w/kg were well below what he was capable of so I can’t really draw any conclusions about that relative to how he would’ve finished in C2.

1 Like

That’s just the thing about cruisers, isn’t it? :stuck_out_tongue: You simply can’t tell how good they are. You can only lament over their zone 2-3 efforts day in and day out. It’s a comfy existence they have, I tell you! (I know this.) And I’m supposed to be content with working hard and “improving myself”? Ok… yes, I’m really happy to be the stuffing of their ego turkey so that they may become fat and happy… I know my place…

I myself have placed solidly in midpack in all races in my post-covid cat. Because I actually belong there, for a change. But looking around, I would have to see my categorization mainly as a streak of luck.

2 Likes

Without knowing quite what the auto-cat criteria are, there is a fair bit of consistency.

I did C2 in Western Europe.

Roughly:

  • C1: 34 finishers, top 15 ranged from 41:14 - 43:39
  • C2: 43 finishers, top 15 ranged from 43:25 - 45:27
  • C3: 79 finishers, top 15 ranged from 45:40 - 47:13

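So at the front end of each group, the top 15 finished roughly ahead of the winner in the next category down (the 15th rider in C1 was 14 seconds behind the C2 winner, while the C2 top 15 were all ahead of the C3 winner). So reasonably good. Not a lot of sandbagging going on.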

If your goal was to have every rider in each category ahead of everyone in the next category, that didn’t happen. And I suspect would be fairly difficult.
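The top-15 figures listed above can be checked mechanically. This is a small sketch that assumes each quoted range means "fastest time - 15th fastest time" and compares the 15th rider of each category with the winner of the next one down. The variable names are made up for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'MM:SS' finish time to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (fastest, 15th fastest) as quoted in the post for Western Europe.
top15 = {
    "C1": ("41:14", "43:39"),
    "C2": ("43:25", "45:27"),
    "C3": ("45:40", "47:13"),
}

cats = list(top15)
gaps = {}
for upper, lower in zip(cats, cats[1:]):
    fifteenth_upper = to_seconds(top15[upper][1])
    winner_lower = to_seconds(top15[lower][0])
    # Positive gap: the whole top 15 beat the next winner. Negative: overlap.
    gaps[(upper, lower)] = winner_lower - fifteenth_upper
    print(f"{upper} 15th vs {lower} winner: {gaps[(upper, lower)]:+d} s")
```

On these numbers the C1/C2 boundary overlaps by a few seconds and the C2/C3 boundary is clean by a similar margin, so "roughly separated at the front" is a fair reading.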

Anecdotal. Have a look at America East C3-C2 and you have a different anecdote. And there are more cases like that in the European time zones. Too many cases. And why top 15? That’s arbitrary and not always a good choice since the pen sizes vary so much (although Europe West happened to be well-attended). Better off with percentages IMO.

Still, it’s beside the point. And that is my underlying point. With a results based system none of this would matter. It would sort itself out. The right riders would be promoted through the system. You wouldn’t have to worry about things like this. The only worry would be if there was a constant stream of new riders passing through several ranks and screwing up races for the regulars, so a decently proper entry point in the system as a new rider is of some significance. Other than that, not a problem. But they insist on reinventing wheels, square wheels, and somehow making them roll. If they don’t roll well, they experiment with new wheel dimensions. I can’t sugar coat it more than this: It’s stupid. I think they know that too by now. But they are obviously not allowed to change the rules and start using round wheels (or they would have already). Someone up the hierarchy says no.

2 Likes

You can even expand your examples to the Top 20 List in Central Europe across C1 - C4.
I like the AutoCat system a lot more than the A - D categorization. There’s room for improvement for sure, but it’s a start.

Think if you look at the spread of each class you can see where some improvements could be made though.

For example I imagine a fair portion of those C4 riders would have had a better race experience by being in C5.


On anything other than a TT or pure hill climb, race times are a bit of a red herring IMO. They depend too much on race dynamics and pack sizes.

In NZ, C1 was actually faster than C0. I’d love to argue that this was due to the ridiculous categorisation error I’ve previously mentioned, and that did probably play a part, but there were also many more riders in C1 and even after the hill there was a front group being chased by a 2nd group which might have spurred them on to try harder. Or not, I don’t know for sure. But it makes time comparisons difficult.

This doesn’t however detract from the overall point, that there is huge overlap in ability and poor division of the riders into different fields, of which there can be little doubt. I see no evidence that the new system is any better than ABCD, but that’s not to say it is actually worse.

6 Likes

@_JamesA_ZSUNR you’re touching on a key point here. @stuart_lynne and @Anthro_Solipsist2507 are onto something, I don’t deny that. Of course “random” events in the races matter. So there was a breakaway and the 2nd group happened to give up and got slower finish times than they could have managed. Or some other scenario. Obviously things like that will influence time dispersion.

BUT… you can’t have deviations like that with AutoKitten. If you are to calibrate it well and make it useful and fair there cannot be wide discrepancies between pens like the ones @Andy_Med showed (I like that graph, more pretty data plx! :grin:). You’d need a massive data set where variance is known and under control, where random events cancel each other out. So what to do? You tweak AutoKitten more! And then some! And introduce additional variables! And tweak them too! And on you go. But the old data is now obsolete since you have changed the parameters so you’re still stuck with a minuscule data set. And the more you try to fix it, the worse it will perform.

They’re stuck with the square wheels. Or the ever more complicated proofs that the Earth is flat. Just give up! Please! Kill the project NOW and move on to solid ground, to something constructive, something that will last. Keep it stupidly simple, stupid, for all I care. It’s fine. It’s still going to be better than this and than what we have had for the last few years.

1 Like

I agree this is a tricky course for comparisons, also with regards to 20min wkg. Many riders top the climb within 20min only to be followed by a 0W super tuck. The faster you go, the longer the climbing effort is offset by the super tuck. The finisher time plot, though useful, would be more informative if it were violin shaped, showing percentiles or at least a median or something. Especially the tail can skew the impression a lot, because dropped solo riders typically lose a lot of time.
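As a rough illustration of why a median or percentiles would help, the sketch below summarizes one pen's finish times with quartiles from Python's standard `statistics` module. The times are invented for the example (NOT real results): a tight bunch plus two dropped solo riders.

```python
import statistics

# Invented finish times in seconds: seven riders in a bunch, two dropped riders.
finish_s = [2700, 2710, 2720, 2730, 2740, 2750, 2760, 3100, 3300]

mean = statistics.mean(finish_s)
median = statistics.median(finish_s)
quartiles = statistics.quantiles(finish_s, n=4)  # three cut points: Q1, Q2, Q3

print(f"mean:   {mean:.0f} s")
print(f"median: {median} s")
print("quartile cut points:", quartiles)
```

The two dropped riders pull the mean well above the median, while the median stays with the bunch, which is the skew from the tail described above.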

As to my own experience with auto-cat, it was pretty much as I expected based on spectating earlier results. It did reasonably well for me. I did not survive the front ‘split’ and finished in the 2nd quartile of participants. Reaching the first quartile feels achievable with training, but I wouldn’t quite be able to win it either. One category higher would put me in the last quartile (probably around last places due to lack of draft); a category lower would make me feel out of place too. So autocat did a good job in my case. That said, it also did not change my experience much compared to ZRL wkg category enforcement, in which I perform pretty similarly (in the lower divisions). It’s a glass half-full half-empty sort of thing for me.

This actually happens to be (presumably) much less of a problem with Bologna than one might think (Innsbruckring is a MUCH trickier course and so is Lutscher). If 20 min WKG does still influence their model - and we all assume it does, right? - then the 20 min avg that AutoKitten will judge you by as a rider will most likely coincide with the first 20 min block in almost any pen. And once you’re through those first 20 min they will (in this case) have no bearing on the rest of the race. Regardless of whether you supertuck the first descent or not. Yes, a semi-high cat may show lower WKG for the first 20 min than the cat below where people barely make the climb in 20 min but as for finish time dispersion I don’t think it will matter.

It’s a bit hard to explain these quirks of ZP WKG (there are many others) in just a few words, but it is exactly this feature of the ZP cat system that you exploit when you are deliberately cruising races FTW. And it holds true even if you are not cruising. I have described it here.

1 Like

(Disclaimer - I haven’t participated in these test races - I couldn’t be bothered to sign up for another third party site after already signing up to ZwiftPower. I have however been lurking and reading all these threads assiduously).

The more of these various threads I read the more I wonder if all this Autocat/other testing was doomed from the start. The ‘human effect’ of races makes interpreting data hard…

For racers:

  • Good day/bad day for performance
  • Heavy Day/Light Day in regards to weight (I can vary +/- 2kg in 24 hours in water weight)
  • Race tactics
  • Cruisers/sandbaggers
  • Category entered
  • Size and ‘quality’ of race field
  • etc. (other things I’ve forgotten)

My intuition says you need as much data as possible to try to damp out those human effects... but it appears WTRL are now confounding the data with Autocat changes as well. If you change what Autocat/other is doing each race you cut your data set down into very small sample sets (my intuition only - I don’t have access to any data to do any testing myself obviously). Surely any experiment that is aiming to change cat boundaries (which is what Autocat appears to be designed to do) should instead start by obtaining as much existing race data as possible and start making models and predictions around that. Not start by messing with cat boundaries.

Surely with Autocat changes happening you’ve cut down the data set you can base hypotheses/conclusions on to just a single Autocat setting? So your human effects are going to swamp your ‘Autocat effect’? (Pesky humans!)
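The sample-size intuition can be sketched with the textbook standard error of a mean, which shrinks with the square root of the number of observations. This assumes race-to-race noise is independent and equally large in every race, which real races will not satisfy exactly. All the numbers below are hypothetical.

```python
import math


def standard_error(noise_sd: float, n_races: int) -> float:
    """Standard error of a mean over n_races independent observations."""
    return noise_sd / math.sqrt(n_races)


noise_sd = 60.0   # hypothetical race-to-race noise, in seconds
total_races = 8   # hypothetical number of races in a series
settings = 4      # hypothetical number of different Autocat settings tried

se_all = standard_error(noise_sd, total_races)
se_per_setting = standard_error(noise_sd, total_races // settings)
ratio = se_per_setting / se_all

print(f"one setting for all races: +/- {se_all:.1f} s")
print(f"races split over settings: +/- {se_per_setting:.1f} s ({ratio:.1f}x noisier)")
```

Under these assumptions, splitting the races evenly across k settings makes each per-setting estimate sqrt(k) times noisier, so a small "Autocat effect" gets harder to tell apart from the human effects listed above.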

2 Likes

I don’t know if I understand you correctly. What I try to say is that on this course some riders will need 20 minutes of FTP effort to reach the top of the climb. Others go harder and reach it within 18 minutes of FTP effort. This means the latter group rides 2 minutes “ahead”, but due to the 2 additional minutes of 0W super tuck could still end up with a similar 20 minutes average as the first group. In my understanding this is the equivalent of cruising, except that it is now guided by the course and there’s no way to avoid it. If anything in defense of Autocat, I believe its original intention was to extrapolate that 18 minute effort to obtain a more accurate FTP estimate. This should theoretically reduce the opportunity for cruising, however I have no clue how many riders are actually being re-categorized based on such sub-20min efforts.
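The arithmetic behind that can be sketched as follows. The wattages are hypothetical, and `window_average` is an illustrative helper, not how ZwiftPower or Autocat actually compute anything. Rider A climbs for the full 20 minutes; rider B climbs harder for 18 minutes and then super tucks at 0 W for 2 minutes.

```python
def window_average(segments, window_min=20.0):
    """Average power over the first window_min minutes.

    segments is a list of (minutes, watts) tuples in ride order.
    """
    remaining = window_min
    total = 0.0
    for minutes, watts in segments:
        used = min(minutes, remaining)
        total += used * watts
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("segments do not cover the whole window")
    return total / window_min


power_a = 300.0                # hypothetical: rider A holds 300 W for 20 min
power_b = power_a * 20 / 18    # power rider B needs to show the same 20 min average

avg_a = window_average([(20, power_a)])
avg_b = window_average([(18, power_b), (2, 0.0)])

print(f"rider A: {avg_a:.1f} W average over 20 min")
print(f"rider B: {avg_b:.1f} W average over 20 min, climbing at {power_b:.1f} W")
```

In this example rider B climbs at about 11% more power than rider A, yet both show the same 20-minute average, which is the course-driven "cruising" effect described above.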

1 Like