Alternative to W/kg for race ranking

This is a bit long and maybe a bit technical, but I think there are serious issues with the W/kg metric used for ranking racers. I like this sort of thing, and it seems to me there is room for a lot of improvement on what’s done presently. Computers are good at calculating complicated things, so why stick with overly simplistic formulas which don’t work well?

problems with present approach

One is the use of 20 minute power. Racing requires power over all durations, not just 20 minutes, and 20 minutes is on the long end of the efforts required to be competitive. Consider two riders with the same 20 minute power, where the second has superior 5 minute and 30 second power. The first rider won’t really have a chance. However, if one rider has better 5 minute and 30 second power, but the other has better 20 minute power, then the race gets interesting: it becomes a matter of tactics.

Another is the use of power per mass. Power per mass is a good metric up L’Alpe de Zwift, which is sustained and steep, but on Fuego Flats it’s more like power squared per mass or, equivalently for ranking purposes, power per square root of mass. So if you have two riders matched on power per mass, but one is heavier, the heavier rider will almost always have an advantage. IRL lighter riders are typically better climbers and worse sprinters, but in Zwift categories, the lighter riders are no better at climbing and worse at everything else.

So we can correct these two issues.

use flat speed and VAM, not W/kg

We know actual racing consists of a mix of climbing, descending, and flats. Fortunately, Zwift knows how to calculate speed (IRL this involves coefficients we don’t know). So what Zwift could do is, for a given power number (and mass and height), assuming a “standard bike” (Zwift frame + wheels), on smooth pavement, calculate flat speed and climbing VAM (VAM is the rate of vertical ascent on steep climbs).

speed rating = sqrt [ flat speed ⨉ VAM ]

With this formula, two riders where one climbs 10% faster but is 10% slower on the flats will land in essentially the same category. That makes for a more interesting race than under W/kg, where they climb the same but the heavier, more powerful rider is always better on the flats.
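
To make the speed rating concrete, here is a rough Perl sketch. It is not Zwift's physics: the air density, rolling resistance, bike mass, climb gradient and the riders' CdA values are all assumptions I picked for illustration (CdA stands in for whatever Zwift derives from mass and height), and the function names are made up. The rating multiplies m/s by m/h, so only comparisons between riders mean anything.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative constants only: NOT Zwift's actual physics.
my $RHO   = 1.225;   # air density, kg/m^3 (assumed)
my $CRR   = 0.004;   # rolling resistance coefficient (assumed)
my $G     = 9.81;    # gravity, m/s^2
my $BIKE  = 8;       # "standard bike" mass, kg (assumed)
my $GRADE = 0.10;    # gradient used for the climbing VAM (assumed)

# Find the speed (m/s) at which aero drag plus rolling and gravity
# resistance use up the given power, by bisection.
sub speedForPower {
  my ($power, $mass, $cda, $grade) = @_;
  my $theta = atan2($grade, 1);
  my $k = ($CRR * cos($theta) + sin($theta)) * ($mass + $BIKE) * $G;
  my ($lo, $hi) = (0, 30);
  for (1 .. 60) {
    my $v    = ($lo + $hi) / 2;
    my $need = 0.5 * $RHO * $cda * $v**3 + $k * $v;
    if ($need < $power) { $lo = $v } else { $hi = $v }
  }
  return ($lo + $hi) / 2;
}

# speed rating = sqrt( flat speed x VAM )
sub speedRating {
  my ($power, $mass, $cda) = @_;
  my $flat   = speedForPower($power, $mass, $cda, 0);
  my $vClimb = speedForPower($power, $mass, $cda, $GRADE);
  my $vam    = $vClimb * sin(atan2($GRADE, 1)) * 3600;   # m/h
  return sqrt($flat * $vam);
}

# two hypothetical riders, both at 3.2 W/kg; the CdA values are guesses
my $light = speedRating(3.2 * 55, 55, 0.28);
my $heavy = speedRating(3.2 * 80, 80, 0.33);
printf "55 kg rider rating = %.1f\n", $light;
printf "80 kg rider rating = %.1f\n", $heavy;
```

Under these assumptions the 80 kg rider gets the higher rating despite the identical W/kg, so the two would no longer be treated as equals.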

what to use for “power”?

Now we need a power number. Presently 20 minute power is typically used (ramp test is 1 minute power). The problem here is that a rider may produce a really good 19 minute power, but then stop. This will result in a 5% lower 20 min power, or perhaps no 20 minute power at all. Or a rider may climb L’Alpe de Zwift and get a really good 50 minute power, but not as good a 20 minute power. The additional issue is that we want to include all powers over a range: 10 second, 3 minute, 20 minute, 60 minute in calculating race ability, since all contribute.

cleaning up maximal power curve

So we have a maximal power curve, which is the best effort for every duration from 1 second to 3600 seconds (for example). But we need to clean it up first. To clean it up, I’ll make two assumptions:

  1. If I can produce P average power for seconds 1 to t, then for second t+1 I can produce at least 2P/3.
  2. If I can produce P average power for t seconds, then I can produce P for any duration < t (sometimes maximal power curves have bumps due to interval efforts, when the duration reaches into a second interval).

So this can be fixed:

  1. Step thru the curve from 1 to 3600 seconds. For each duration t, take the maximal work (power times time) and calculate a lower bound on the maximal work for the next duration (anything less is assumed to be due to submaximal effort by the rider at that duration). The lower bound on maximal work for duration t + 1 = the maximal work for duration t multiplied by (t + 2/3) / t. Unless the actual maximal work for duration t + 1, from rider data, is more than this, use the lower bound as the maximal work for duration t + 1. From maximal work, maximal power = work / duration. So riders are always assumed to be able to produce 2/3 of the average power they have sustained so far for one extra second, then 2/3 of that new average power for the next second, etc. If rider data has a better result than this, we use rider data instead, and that result sets the lower bound for the next second.

  2. Step thru the curve from 3600 seconds to 1 second. If the maximal power ever drops as the duration gets shorter, don’t allow it to drop: carry the longer-duration value down instead. Note I’m going in reverse duration direction. If you can produce a power for longer times you can produce the same power for shorter times.

With this algorithm, it is assumed that riders can hold at least roughly 70% of their 20 minute power for an hour, which I feel is a safe lower bound even for extremely endurance-challenged riders. This is determined by the 2/3 coefficient, which could be increased if this is too conservative.
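
A quick way to check that figure, as a minimal sketch: if there is no rider data beyond 20 minutes, the forward pass multiplies power by (t + 2/3) / (t + 1) for each extra second, so the guaranteed fraction at 60 minutes is just the product of those factors. The variable names are mine.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Fraction of 20 minute power that the lower bound guarantees at 60 minutes,
# assuming the rider has no data at all beyond 1200 seconds.
my $f     = 2 / 3;   # coefficient from assumption 1
my $ratio = 1;
for my $t (1200 .. 3599) {
  # lower bound: P(t+1) = P(t) * (t + f) / (t + 1)
  $ratio *= ($t + $f) / ($t + 1);
}
printf "60 min floor = %.1f%% of 20 min power\n", 100 * $ratio;
```

The product comes out just under 70%, close to (1/3)^(1/3), and raising $f pushes the floor up.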

calculating effective power from maximal power curve

Now I have a cleaned up maximal power curve from 1 to 3600 seconds. I need to calculate an effective power from the whole curve, not just one arbitrary duration on the curve.

The following formula is one suggestion:

Pavg = exp ( [ sum from 5 to 3600 { ln | P(t) | / ( t + 10 ) } ] / 5.51725 )

I added 10 seconds to each duration in the denominator to slightly de-weight very short powers in the sum. The 5.51725 is the sum of 1 / (t + 10) from 5 to 3600, so the exponent is a weighted average of ln P(t).

BTW, I changed this from what I originally posted, since I think the original over-weighted sprinting powers. By averaging the natural logarithm, then taking the exponent at the end, it’s fractional differences in power which matter, not absolute differences. So if one rider has twice the power at the sprint end, and another has twice at the endurance end, those will cancel out, rather than the larger difference in absolute watts at the sprint end dominating. I’m assuming time here is in seconds in 1-second increments, and power is watts.
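
Here is a small Perl sanity check of the formula, as a sketch only. It recomputes the 5.51725 constant instead of hard-coding it and confirms two things a weighted log-average should do: a flat curve gives back its own power, and doubling every power doubles the result. The function name and the flat 250 W test curve are made up for illustration; index t - 1 holds the power for t seconds.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Weighted geometric mean of a power curve with weights 1 / (t + offset).
# Returns the effective power and the sum of the weights.
sub effectivePower {
  my ($P, $tmin, $tmax, $toffset) = @_;
  my ($sum0, $sum1) = (0, 0);
  for my $t ($tmin .. $tmax) {
    my $w = 1 / ($t + $toffset);
    $sum0 += $w;
    $sum1 += $w * log($P->[$t - 1]);
  }
  return (exp($sum1 / $sum0), $sum0);
}

# hypothetical flat curve: 250 W at every duration
my @flat = (250) x 3600;
my ($peff, $norm) = effectivePower(\@flat, 5, 3600, 10);
printf "sum of weights = %.5f, effective power = %.1f\n", $norm, $peff;

# doubling every power should double the effective power
my @double = map { 2 * $_ } @flat;
my ($peff2) = effectivePower(\@double, 5, 3600, 10);
printf "doubled curve: effective power = %.1f\n", $peff2;
```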

Summary:

  1. don’t divide by mass: calculate a flat speed and a climbing VAM from power and mass and height, assuming a standard bike and wheels, and take the geometric mean.

  2. calculate a “corrected” maximal power curve, to compensate for the fact there aren’t quality efforts at every duration.

  3. from the maximal power curve, calculate an “average” power for the curve, which is applied to the speed formula.

I like this approach. Would it be worthwhile to take a sample of riders’ data from zwiftpower.com, run your algorithm on it, and see if and by how much the current rankings differ from your newly calculated ones?

That’s an excellent idea. Of course a limitation is the algorithm expects to have maximal power data for every second from 5 up to 3600. But even given the few points publicly disclosed by ZP, I can “fill in” the missing data using my algorithm (next second is 2/3 of the average), and calculate from that.

I was afraid I’d get responses like “this is too complicated – W/kg is simpler!” But who cares about simple? Computers are fast, the maximal power curve needs to be calculated anyway, and it only needs to be done once per rider per activity. The ZwiftPower rider ratings are also complicated: you just trust they’ll do the right thing and they generally do.

Here’s some Perl code to test it. It may need to be tuned. I increased 15 second power 100%, 60 second power 60%, 5 minute power 20%, but kept the same 20 minute power. In Zwift, nothing would change, since only 20 minute power matters. But here my “effective power” increased 43%, which is close to the average of the increases in those four key powers (45%).

To reduce the influence of short-term power, and increase the influence of long term power, the weighting could be changed from 1 / (t + 10) to, for example, 1 / (t + 60). If I do that then the effective power increases from 241.4 to 317.1, a 31.3% increase.

#! /usr/bin/env perl
use strict;
use warnings;

# Clean up the maximal power curve, in place.
# $Pmax->[$i] is the best average power (W) for a duration of $i + 1 seconds.
sub processMaximalPowerCurve {
  my ($Pmax, $tmin, $tmax) = @_;

  # first, go thru the curve and set a lower bound, over-riding the data if needed:
  # the next second is worth at least $f times the average power so far
  my $f = 2 / 3;
  for my $t ( $tmin .. $tmax ) {
    my $i = $t - 1;
    $Pmax->[$i] //= 0;     # missing data counts as zero
    next if ( $i < 1 );    # the first second has no predecessor
    my $lowerBound = $Pmax->[$i - 1] * ($i + $f) / ($i + 1);
    $Pmax->[$i] = $lowerBound if ( $Pmax->[$i] < $lowerBound );
  }

  # now go from end to beginning: a shorter duration can never have
  # lower power than a longer one
  for my $j ( reverse 0 .. $#$Pmax - 1 ) {
    $Pmax->[$j] = $Pmax->[$j + 1]
      if ( ($Pmax->[$j] // 0) < ($Pmax->[$j + 1] // 0) );
  }

  # return the revised list
  return $Pmax;
}

# Weighted average of ln(power) over $tmin .. $tmax seconds,
# with weight 1 / (t + $toffset), converted back to watts.
sub calcEffectivePower {
  my ($P, $tmin, $tmax) = @_;

  my $sum0 = 0;
  my $sum1 = 0;
  my $toffset = 10;
  for my $t ( $tmin .. $tmax ) {
    $sum0 += 1 / ($t + $toffset);
    $sum1 += log($P->[$t - 1]) / ($t + $toffset);
  }
  return exp($sum1 / $sum0);
}


# test with my data
# (index i is read as duration i + 1, so these land one second later
# than the nominal 15 s, 60 s, 5 min and 20 min)
my @P;
$P[15] = 428;
$P[60] = 289;
$P[300] = 254;
$P[1200] = 228;
my $P = processMaximalPowerCurve(\@P, 1, 3600);
my $Peff = calcEffectivePower($P, 5, 3600);

print "effective power = $Peff\n";
# result: 262.1

# now boost my short term power and see what happens (same FTP)
# @P was cleaned in place above; cleaning only ever raises values,
# so boosting these entries and re-running gives the same curve as a fresh start
$P[15]  *= 2;
$P[60]  *= 1.6;
$P[300] *= 1.2;
$P = processMaximalPowerCurve(\@P, 1, 3600);
$Peff = calcEffectivePower($P, 5, 3600);
print "effective power (boosted) = $Peff\n";
# result: 375.5

If that’s too complicated… this is much simpler.

At present, there are two limits for each category: W/kg, unless that works out below a certain fixed power, in which case that power is the limit. So no matter how high your W/kg, if your power is below 200 W, you’re still a C.

This is the right idea, but it can be substantially improved.

A super-simple alternative would extend the present idea that light riders have a disadvantage. Instead of a 200 W floor (for C’s) and then 3.2 W/kg beyond that, use a linear function, recognizing that at the same W/kg lighter riders have a more difficult time in every situation other than a steep climb (and even on a steep climb IRL, because their bikes are a larger % of their weight; I’m not sure if this is in Zwift physics).

So let’s assume the present W limit applies to 75 kg:

C limit: 3.2 W/kg @ 75 = 240 W

then let’s assign the 200 W to 45 kg:

C limit: 45 kg: 200 W

I can fit a simple linear function thru these points:

slope = 40 W / 30 kg = 1.33 W/kg
intercept = 200 W - (4/3) W/kg ⨉ 45 kg = 140 W

C limit: maximum FTP = 140 W + (4/3) W/kg ⨉ mass

Then scale the results for different categories. So using the 3:4:5:6 ratio in the minimum powers:

D: 105 W + 1 W/kg ⨉ mass
C: 140 W + (4/3) W/kg ⨉ mass
B: 175 W + (5/3) W/kg ⨉ mass
A: 210 W + 2 W/kg ⨉ mass

So for example, I’m 55 kg, my maximal W would be:

D: 160 W
C: 213 W
B: 267 W
A: 320 W
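
For anyone who wants to plug in their own weight, here is a minimal Perl sketch of the linear limits. The intercepts and slopes are the ones from the table above; the function names and the "above A" label are just mine for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# upper limit for each category = intercept (W) + slope (W/kg) * mass (kg)
my %limit = (
  D => [105, 1],
  C => [140, 4 / 3],
  B => [175, 5 / 3],
  A => [210, 2],
);

sub categoryLimit {
  my ($cat, $mass) = @_;
  my ($intercept, $slope) = @{ $limit{$cat} };
  return $intercept + $slope * $mass;
}

# lowest category whose limit the rider's FTP does not exceed
sub categoryFor {
  my ($ftp, $mass) = @_;
  for my $cat (qw(D C B A)) {
    return $cat if $ftp <= categoryLimit($cat, $mass);
  }
  return 'above A';   # hypothetical label, not part of the proposal
}

for my $cat (qw(D C B A)) {
  printf "%s: %.0f W\n", $cat, categoryLimit($cat, 55);
}
print "232 W at 55 kg -> ", categoryFor(232, 55), "\n";
```

At 75 kg the C limit is the familiar 240 W (3.2 W/kg), and at 45 kg it is 200 W, which are the two points the line was fitted thru.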

This would put me mid-range B’s (maximum 20 min power = 245W, approx, so 95% = 232 W). This corresponds fairly well to how well I do in flattish/mixed terrain races. If I find myself getting over a climb with an 80 kg rider @ the same W/kg I’ve got little chance in the sprint after the descent. But if I improve my W/kg to outclimb him, I get bumped into the A’s, against even more powerful 80 kg riders.

You might argue – “yeah, but then the lighter riders will tend to climb better in the same category”. Well, duh. That’s how it’s supposed to be. It’s the one chance lighter riders have to show strength. They get crushed in the mass sprints and the flats.


This is brilliant and would work much better than w/kg. Zwift should thank you for doing so much of their work for free.


This all reads very well. I’m struggling to hang on in flat courses and I’m not pulling away from 80-90 kg riders on the rises. But after using Zwift for only a few months I am loving it.


Daniel, that would definitely be a nifty improvement over the current system…

But I think classifying riders based on power data is really not a way forward. There’s more to racing than power output profiles. It’s the same as classifying football teams at the end of the season based on how hard the players kicked the ball, or how many passes they made. Is there any sports-classification system that ranks players based on the equivalent of watt output? (perhaps martial arts do… although it’s not entirely comparable)

The only good classification system is one that uses race results. We really should be advocating that. Power-based classification system will always have perverse incentive effects.

I generally agree, but then pro-level newbies will be riding D races until they upgrade. Since there is a constant stream of newbies… true D level riders will never be competitive in their own events.

And the algorithm would need to be carefully chosen to allow riders some mobility without rewarding intentionally doing poorly in selected races to keep a score down. I do a lot of moderate training days and I could do those in races instead, finishing near DFL. For example, in USA Cycling upgrades are based on your best results for the year, not in any way your average results.

So if you are going to use power, there are better approaches… two I suggest here are (1) use the full power profile rather than just 20 minutes, (2) rely less on W/kg and put some more emphasis on raw watts as well (ie flat-land speed). But if you want to use results instead we could propose algorithms, I’m sure.
