My inefficient .fit file processor

janoma · January 3, 2021, 11:43pm

It took me around 1 hour to create a very inefficient script in Python that processes a .fit file and obtains the normalized power, plus the best average for 15 seconds, 1 minute, 5 minutes and 20 minutes. When used, it looks something like this:

~/workspace/fit_process ❯ ./fit_process --file 2020-12-31-15-21-37.fit
Normalized power: 214W
15 seconds: 341W
1 minute: 313W
5 minute: 307W
20 minute: 298W

Now, believe me when I tell you it’s very inefficient, because I wanted something quick and dirty to have a rough idea.
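For anyone wondering what goes into those numbers, here is a minimal sketch, not the script described above. It assumes the power data is already a plain list with one sample per second, and it uses the usual definition of normalized power (30-second rolling means, raised to the fourth power, averaged, then the fourth root). The function names are made up for illustration, and it is deliberately naive, recomputing every window sum from scratch.

```python
def rolling_means(power, window):
    """Mean of every run of `window` consecutive samples (assumes 1 sample per second)."""
    return [sum(power[i:i + window]) / window
            for i in range(len(power) - window + 1)]


def best_average(power, window):
    """Highest rolling mean, or None if the ride is shorter than the window."""
    means = rolling_means(power, window)
    return max(means) if means else None


def normalized_power(power):
    """Fourth root of the mean of the 30 s rolling means raised to the fourth power."""
    means = rolling_means(power, 30)
    if not means:
        return None
    return (sum(m ** 4 for m in means) / len(means)) ** 0.25
```

With a perfectly steady effort, normalized power equals the plain average; the more the power varies, the further it rises above it.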

The script in the example took a little less than 2 seconds, and the full workout was a little longer than 1 hour. My home computer has 12 cores, which (more or less) means I can process 12 similar .fit files every 2 seconds, or about 6 per second. With this, I could process a full race with 240 riders in 40 seconds, or a ride with 1,000 riders in 3 minutes. And that’s with a single server.
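The back-of-the-envelope estimate can be written out as below. This is only a sketch: it assumes one file per core, perfect scaling across the 12 cores, and roughly 2 seconds per file, and `batch_seconds` is a hypothetical helper name.

```python
def batch_seconds(n_files, seconds_per_file=2.0, cores=12):
    """Wall-clock estimate, assuming one file per core and perfect scaling."""
    return n_files * seconds_per_file / cores


print(batch_seconds(240))        # 40.0 seconds for a 240-rider race
print(batch_seconds(1000) / 60)  # a little under 3 minutes for 1,000 riders
```

Real-world scaling would be somewhat worse because of disk access and process start-up, but the order of magnitude holds.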

Oh, and did I mention how inefficient this is? I’m repeating some calculations to save me the trouble of writing extra lines of code. An optimized version would take a fraction of the time. But again, it only took me one hour to write.
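One common way to avoid that kind of repeated work is a running (prefix) sum, so every window sum becomes a single subtraction and the same pass is shared by all durations. The sketch below is an illustration of that idea, not the optimized version of the script; it assumes one sample per second, and `best_averages` is a hypothetical name.

```python
from itertools import accumulate


def best_averages(power, windows):
    """Best mean power for each window length, sharing one prefix-sum pass."""
    # prefix[i] holds the sum of the first i samples, so any window sum
    # is a single subtraction instead of a fresh sum over the window.
    prefix = [0] + list(accumulate(power))
    results = {}
    for w in windows:
        if w > len(power):
            results[w] = None  # ride shorter than this window
            continue
        best_sum = max(prefix[i + w] - prefix[i]
                       for i in range(len(power) - w + 1))
        results[w] = best_sum / w
    return results
```

For a one-hour ride, a call such as `best_averages(power, [15, 60, 300, 1200])` does about 3,600 subtractions per duration, instead of summing up to 1,200 samples for each of the 3,600 starting points.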

Of course, Zwiftpower does more, like writing these results to a database, but next time they take hours or even days to report results, this will put the delay in perspective.

I’m planning to share the script, ugly and inefficient as it is, as soon as I make some small improvements to see how much better it can be if I put some thought and a bit more time into it.

Anna_Ronkainen · January 3, 2021, 11:58pm

That’s the spirit! Though I think ZP does more than just find those specific averages, as the analysis gives at least the complete critical power curve for the whole effort (though not quite perfectly, as you can see anomalies along the lines of the 20-second average being higher than the 19-second average…).

janoma · January 4, 2021, 12:05am

Of course! Those are things I would have to add to it, but the game for me (so to speak) was to get a rough estimate just to put things in perspective, which I hope to have achieved.

Now, keep in mind that the power curve is something that Zwift itself gives you immediately after a ride, so that alone could not justify the delays we’re seeing. The same goes for intervals.icu and other similar tools.

janoma · January 4, 2021, 12:30am

Also, cases where the “n+1” seconds average is higher than the “n” seconds average are perfectly possible, especially in the presence of sprints or short bursts, so don’t get mad next time you see one. Here’s an example, where each number represents 1 second of power data, in watts:

230,100,100,100,100,600

Here you have 6 seconds of data, where the best 5-second average is 200W (the last five values, 1,000/5), but the 6-second average is 205W (1,230/6). You can get larger differences by playing with the numbers, but the key is to have a sudden “bump” in power, which is exactly what happens when you sprint.
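The same six numbers can be checked with a few lines of code. This is a brute-force sketch that computes the best average for every duration, assuming one sample per second; `power_curve` is a name chosen for illustration.

```python
def power_curve(power):
    """Best mean power for every duration from 1 sample up to the full ride."""
    curve = []
    for w in range(1, len(power) + 1):
        best_sum = max(sum(power[i:i + w]) for i in range(len(power) - w + 1))
        curve.append(best_sum / w)
    return curve


samples = [230, 100, 100, 100, 100, 600]
curve = power_curve(samples)
print(curve[4], curve[5])  # 200.0 205.0
```

The curve drops from 600W at 1 second to 200W at 5 seconds and then ticks back up to 205W at 6 seconds, because the only 6-second window also picks up the 230W sample at the start.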

Lin_Alan · January 4, 2021, 12:32am

That’s not impossible. Actually, it’s quite common if you’re doing intervals or surging a lot. One would see the curve go up and down like a sawtooth. An example:

Anna_Ronkainen · January 4, 2021, 12:37am

Yeah, that makes sense, considering it’s the average, now that I think of it.

isnogud77 · November 16, 2022, 9:58pm

Is the method ZP uses to calculate these numbers documented somewhere? I had some issues recently with missing or nonsense data in the interval averages and wanted to build a script to investigate.

Here’s my first approach (which may be similar to yours):

import fitparse
import pandas as pd

fitfile = fitparse.FitFile("2022-11-14-18-00-06.fit")

# Map each record's timestamp to its power value (None when a record has no power field)
d = dict()
for record in fitfile.get_messages("record"):
    d[record.get_value("timestamp")] = record.get_value("power")

s = pd.Series(data=d)

# Best rolling mean per window; windows count samples, so this assumes one record per second
print("15s", round(s.rolling(window=15).mean().max()))
print("30s", round(s.rolling(window=30).mean().max()))
print("1m", round(s.rolling(window=60).mean().max()))
print("5m", round(s.rolling(window=5*60).mean().max()))
print("20m", round(s.rolling(window=20*60).mean().max()))

I tested this on a race for which ZP shows all averages, and the results are close but not identical (except for the 20m, which is identical).
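One thing worth checking on the script’s side is whether the records really are one second apart, since a sample-count window of 15 only covers 15 seconds if there are no gaps. Whether that explains the difference from ZP is not known; this is just a sketch of the check, assuming the timestamps are `datetime` objects in recording order, with `find_gaps` as a hypothetical helper.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(seconds=1)):
    """Return (previous, current) pairs where consecutive records are not `expected` apart."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a != expected]
```

With the script above, something like `find_gaps(list(d.keys()))` would list any places where the recording skipped or doubled up; an empty list means the sample-count windows really do match the durations.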