Median instead of average in summary statistics

(Stephane) #1

Would it be possible to display median values for cadence, heart rate, and power instead of average? If a sensor drops out periodically (e.g., heart rate and cadence, as mine often do), then zeroes are recorded. A median value does a much better job of ignoring outliers. For example, the set {150, 149, 162, 155, 0, 153, 0, 160, 159, 0} has an average value of 108.8 and a median value of 151.5. The lowest non-zero value is 149, so the median provides a much better summary of “typical” values in the set.

I assume the issue might be that a slightly more sophisticated algorithm is required to calculate a median efficiently (i.e., without sorting the data), whereas the average requires only a single pass and can be computed by maintaining a running sum of the values during the activity. CS-types on Zwift will know that a median can be computed in time proportional to the number of data points (the same as is required for computing the average). Furthermore, simple algorithms exist with good expected running times, as do good one-pass approximation algorithms.
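For anyone who wants to check the numbers above, here is a minimal Python sketch. It uses quickselect, one of the simple selection algorithms with linear expected running time, so the data is never fully sorted. The function names `quickselect` and `median` are just illustrative; nothing here reflects how Zwift actually computes its summary statistics.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-indexed) without fully sorting."""
    pool = list(values)
    while True:
        pivot = random.choice(pool)
        lows = [v for v in pool if v < pivot]
        pivots = [v for v in pool if v == pivot]
        if k < len(lows):
            # The target lies among the elements smaller than the pivot.
            pool = lows
        elif k < len(lows) + len(pivots):
            # The target is equal to the pivot.
            return pivot
        else:
            # The target lies among the larger elements; shift the index.
            k -= len(lows) + len(pivots)
            pool = [v for v in pool if v > pivot]


def median(values):
    """Median via quickselect; averages the two middle values for even sizes."""
    n = len(values)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return quickselect(values, mid)
    return (quickselect(values, mid - 1) + quickselect(values, mid)) / 2


samples = [150, 149, 162, 155, 0, 153, 0, 160, 159, 0]
mean_value = sum(samples) / len(samples)
median_value = median(samples)
print(mean_value, median_value)  # 108.8 151.5
```

Each round throws away part of the pool, which is where the linear expected time comes from; the worst case is still quadratic if the random pivots are consistently unlucky.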

(Chris) #2

Why not just ignore zeros?

Median doesn’t really make sense - what if the middle value is zero?
Mode almost makes sense, but I’m still not sure it is that useful.

Unless you have a very poor setup, the dropouts shouldn’t affect the overall mean average too much.

(Stephane) #3

Chris: a median of a given set S is a value m such that at least half of the values in S are <= m and at least half are >= m. The only way to obtain a median of zero is if at least half of the values are zero, in which case zero arguably correctly summarizes the data. Mode also doesn’t work. See my previous example: the mode is zero. Zeroes and, more generally, any extreme outliers, can have a significant effect because they pull the mean far away from what its value would be if the outliers were omitted. E.g., say your power meter has an error spike of 2,500 W in the following stream of power data: 150, 150, 150, 2500, 150. The mean power is 620, but the median is 150. Many data sets on Zwift (not just mine) have frequent drops in the recorded heart rate and cadence data. Using a median avoids having to decide what classifies as an outlier.
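The figures in this post can be reproduced with Python's standard `statistics` module. This is only a quick check of the arithmetic; the variable names are illustrative, and the last two lists are made-up data to show when a median of zero can occur.

```python
import statistics

# The dropout example from post #1 and the power spike example above.
dropouts = [150, 149, 162, 155, 0, 153, 0, 160, 159, 0]
spike = [150, 150, 150, 2500, 150]

mode_dropouts = statistics.mode(dropouts)   # zero is the most frequent value
mean_spike = statistics.mean(spike)         # pulled up by the single spike
median_spike = statistics.median(spike)     # unaffected by the spike

# Made-up data: the median is zero only once at least half the readings are zero.
mostly_zero = statistics.median([0, 0, 0, 150, 150])
mostly_valid = statistics.median([0, 0, 150, 150, 150])

print(mode_dropouts, mean_spike, median_spike)  # 0 620 150
print(mostly_zero, mostly_valid)                # 0 150
```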

(Chris) #4

I just do not think it is workable - the values would have to be constantly added and reordered.

And if your most frequent reading is zero, you really need to address the issue in your setup, not in Zwift itself.

Your example doesn’t work because the data is captured so frequently (probably every second) that the set would be much, much bigger. A spike or drop for a second here and there is not going to impact the average, unless you plan on training for only about a minute.

Why not just ignore zeros in the calc?
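On the concern about constantly adding and reordering values: one way around it, sketched below, is to keep a count per possible reading rather than a sorted list. This assumes readings are whole numbers in a bounded range, which seems plausible for heart rate, cadence, and power but is an assumption here. The class name `RunningMedian` and the cap of 2500 are hypothetical, not anything taken from Zwift.

```python
class RunningMedian:
    """Median of a stream of bounded, non-negative integer readings."""

    def __init__(self, max_value=2500):
        # One counter per possible reading (assumed range: 0..max_value).
        self.max_value = max_value
        self.counts = [0] * (max_value + 1)
        self.total = 0

    def add(self, reading):
        """Record one reading in constant time; nothing is re-sorted."""
        if not 0 <= reading <= self.max_value:
            raise ValueError("reading outside the assumed range")
        self.counts[reading] += 1
        self.total += 1

    def _kth(self, k):
        """Return the k-th smallest reading (0-indexed) by walking the counts."""
        seen = 0
        for value, count in enumerate(self.counts):
            seen += count
            if seen > k:
                return value
        raise IndexError("k is out of range")

    def median(self):
        if self.total == 0:
            raise ValueError("no readings yet")
        mid = self.total // 2
        if self.total % 2 == 1:
            return self._kth(mid)
        return (self._kth(mid - 1) + self._kth(mid)) / 2


tracker = RunningMedian()
for reading in [150, 149, 162, 155, 0, 153, 0, 160, 159, 0]:
    tracker.add(reading)
print(tracker.median())  # 151.5
```

Memory use depends on the range of possible readings, not on the length of the activity, and reading out the median costs one pass over the counters.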

(Stephane) #5

Hi Chris. I agree that ignoring zeroes might solve this problem for many cases. Perhaps Zwift already does this, but it’s difficult to know without seeing the actual data and comparing the averages with and without zeroes included against the average displayed by Zwift. You are correct that the issue might be solvable by ensuring sensors are charged, within range, etc., but the problem is that I’m connected at the start, when Zwift confirms signals from all my sensors; my sensors might subsequently connect and disconnect during my activity. If there’s an easy software update to eliminate the problem for many users having similar issues, then why not implement it in Zwift? If you examine other users’ plots, you can see that dropped sensor signals are common. I’m attaching plots from two of my recent activities. This is the same setup, same distance from the ANT+ sensor, same batteries, etc. For some reason the HR sensor dropped frequently on one day but remained well connected the next.
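The comparison described above (average with zeroes versus average without) could be done on exported data along these lines. This is a rough sketch: the helper names are made up, the sample list is the example from post #1 rather than real activity data, and returning 0.0 when every reading is zero is an arbitrary choice.

```python
def mean_all(samples):
    """Plain average, with dropout zeroes included."""
    return sum(samples) / len(samples)


def mean_nonzero(samples):
    """Average over non-zero readings only; 0.0 if every reading is zero."""
    nonzero = [s for s in samples if s != 0]
    if not nonzero:
        return 0.0
    return sum(nonzero) / len(nonzero)


samples = [150, 149, 162, 155, 0, 153, 0, 160, 159, 0]
with_zeroes = mean_all(samples)
without_zeroes = mean_nonzero(samples)
print(round(with_zeroes, 1), round(without_zeroes, 1))  # 108.8 155.4
```

Whichever of the two figures matches the average Zwift displays would suggest which method is in use. One caveat: for cadence and power a zero can be a genuine reading (coasting, for instance), so dropping zeroes is not always the same as dropping dropouts.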

(Chris) #6

There is definitely something wrong with your setup or sensors; that second chart should not be happening very often at all.

What was the average for that second activity? That would tell you if it is an issue or not, wouldn’t it?
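To put rough numbers on how far dropouts would move the average, here is a small sketch. It assumes one sample per second, a steady true value, and dropouts recorded as zero; the function name `diluted_mean` and all figures are hypothetical, not taken from the attached plots.

```python
def diluted_mean(true_mean, total_seconds, dropped_seconds):
    """Average when dropped seconds count as zero and the rest sit at true_mean."""
    return true_mean * (total_seconds - dropped_seconds) / total_seconds


hour = 3600  # assumed one-hour activity at one sample per second
results = {
    dropped: round(diluted_mean(150, hour, dropped), 1)
    for dropped in (10, 180, 900)
}
print(results)  # {10: 149.6, 180: 142.5, 900: 112.5}
```

Under these assumptions a handful of dropped seconds barely registers, while dropouts covering a quarter of the ride pull the average down by a quarter.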