Stages Power Meter Compared (my best DC Rainmaker impersonation)

Back by popular demand, the general all-things Road forum!

Moderator: robbosmans

savechief
Posts: 354
Joined: Wed Aug 31, 2011 2:36 am

by savechief

I can't calculate a Stages percent accuracy from the data I have. To do so would assume that the numbers from the Drivo are 100% accurate, which they aren't.

Your % accuracy formula should be something more like ((Measured - Actual) / Actual) * 100
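The formula only gives a percent difference relative to whichever meter you choose as the reference, and that reference is not ground truth. Here is a minimal Python sketch of it. The function name and the readings are hypothetical and for illustration only.

```python
def percent_difference(measured_w: float, reference_w: float) -> float:
    """((measured - reference) / reference) * 100.

    Negative means the measured meter reads low relative to the reference.
    The reference is just another meter, not ground truth.
    """
    if reference_w == 0:
        raise ValueError("reference power must be non-zero")
    return (measured_w - reference_w) / reference_w * 100.0


# Hypothetical readings, not taken from the ride files.
print(round(percent_difference(194.0, 200.0), 2))  # → -3.0
```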
Time VXRS Ulteam (7.16 kg)
viewtopic.php?f=10&t=120268



youngs_modulus
Posts: 668
Joined: Wed Sep 20, 2006 1:03 am
Location: Portland, OR USA

by youngs_modulus

savechief wrote:Finally, some data. Sorry for the delay, there was an issue getting the files imported with a Mac running Safari:

https://analyze.dcrainmaker.com/#/publi ... 8d57d6e08f

My Drivo is set via the Elite app to 10 second smoothing. I also used 10 second smoothing within Ray's DC Rainmaker Analyzer tool. Once you click on the link, you can use your mouse to zoom in.

Those data are very interesting...thanks for posting them! If you look at the plots, the Stages trace clearly has better resolution with respect to time. I can't help wondering whether the 10-second-smoothed Drivo trace isn't smoothed twice: once on the trainer and a second time in DCRainmaker's program. I'd be curious to see what would happen if you superimposed the plots of (a) the Stages trace with DCR's 10-second smoothing and (b) the Drivo's trace with no DCR smoothing. If you could get screenshots of each individually, I could overlay them myself and post them here.
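The double-smoothing hypothesis can be illustrated with a minimal Python sketch. It assumes 1 Hz samples and a plain trailing moving average in both the Elite app and the DC Rainmaker Analyzer. Neither tool's actual smoothing algorithm is documented in this thread, so this is an illustration only. Two 10-sample averages in series behave like one 19-sample triangular window, so a step change takes longer to appear in full.

```python
from typing import List, Sequence


def rolling_mean(samples: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; early outputs average whatever samples exist so far."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def first_index_at(trace: Sequence[float], target: float, tol: float = 1e-9) -> int:
    """Index of the first sample that reaches target (within tol), or -1 if none does."""
    for i, value in enumerate(trace):
        if value >= target - tol:
            return i
    return -1


# Hypothetical 1 Hz trace: 10 s at 0 W, then a step up to 300 W.
step = [0.0] * 10 + [300.0] * 40
once = rolling_mean(step, 10)                     # 10 s smoothing applied once
twice = rolling_mean(rolling_mean(step, 10), 10)  # 10 s in the trainer app, then 10 s again in the analyzer
print(first_index_at(once, 300.0), first_index_at(twice, 300.0))  # → 19 28
```

The twice-smoothed trace needs nine extra seconds to reach the new level. That kind of lag would make one trace look "smoother" and slower than the other even if both meters agreed perfectly.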

It does seem that the Drivo reads high compared to the Stages, but it's hard to say how much of that is due to smoothing. It's also possible that the people who developed the Drivo added a correction factor to account for powertrain losses so that their trainer attempts to return power at the crank. That would be silly, but then again, car manufacturers do that all the time. They report horsepower at the crank rather than at the wheels so the numbers look better. For example, my car makes about 200 HP at the crank but only about 155-160 at the wheels. Car transmissions are way less efficient than bike transmissions, but even on a bike the effect is non-negligible.

Again, I don't think it's likely that the Drivo includes a correction factor. But I'd be really interested in seeing an apples-to-apples comparison with smoothing parity.

savechief
Posts: 354
Joined: Wed Aug 31, 2011 2:36 am

by savechief

Re: Drivo data smoothing, I will have to do another trainer session with the settings changed in the Elite app. I will hopefully do that either tonight or tomorrow.

I found the following two graphs interesting. I used 30s smoothing in the DC Rainmaker Analyzer for the graphs just to see overall trends. In the first interval, the Stages consistently reads lower than the Drivo. The same is true for the second interval. In the third interval, however, the Drivo reads consistently lower than the Stages. The only changes from the first two intervals to the third were that A) I was more fatigued and B) I changed gears (I was using Resistance mode on the Drivo, not Erg mode) and had a lower cadence. For the first two intervals, my cadence was in the high 90's and into the low 100's. For the third interval, my cadence was generally in the low to mid 90's.

[Graph: 1st Interval, high 90's cadence: Stages lower than Drivo (2nd interval is similar)]

[Graph: 3rd Interval, low 90's cadence: Drivo lower than Stages]

pantelones
Posts: 35
Joined: Mon Jan 31, 2011 2:24 am

by pantelones

bilwit wrote:Everyone always complains about one-sided power meters (particularly Stages) but the fact is that it has a few Grand Tours under its belt so it's more than good enough for all-world riders/teams and regardless of "true" accuracy, the results are normalized to that individual PM so the only real issue with training with something like this is how its results translate to other PMs you might be using in other setups, in which case it would be better to keep separate FTP for each one (admittedly, would be a hassle). I guess for Zwift superstars this might be an issue since you would want the maximum amount of watts accounted for because it directly relates to everyone else's readings.



Just for the record, Team Sky has not used single-sided Stages exclusively. In fact, as soon as they could start using prototypes and transition to dual-sided units, they did.
https://www.dcrainmaker.com/2017/08/han ... meter.html

The Stages data quality debate is a pure circle jerk by people with Stages powermeters to make themselves feel good about their purchase. Stages is a decent product, but only records 50% of the rider data.

bilwit
Posts: 1526
Joined: Sun Apr 03, 2016 5:49 am
Location: Seattle, WA

by bilwit

pantelones wrote:Just for the record, Team sky has not used single sided Stages exclusively. In fact, as soon as they could start using prototypes and transition to dual sided units they did.
https://www.dcrainmaker.com/2017/08/han ... meter.html

The Stages data quality debate is a pure circle jerk by people with Stages powermeters to make themselves feel good about their purchase. Stages is a decent product, but only records 50% of the rider data.


Do you know what normalized means? It's pretty asinine to dismiss left-side-only PMs without actually knowing what that entails. It literally does not matter unless you are comparing the results with a different PM altogether or are trying to look good on Zwift. I'm not sure where all the anger is coming from anyway; no one has ever said Stages or 4iii or any other left-side-only PM is the ultimate, end-all PM everyone should use.

youngs_modulus
Posts: 668
Joined: Wed Sep 20, 2006 1:03 am
Location: Portland, OR USA

by youngs_modulus

savechief wrote:Re: Drivo data smoothing, I will have to do another trainer session with the settings changed in the Elite app. I will hopefully do that either tonight or tomorrow.

I found the following two graphs interesting. <snip> For the first two intervals, my cadence was high 90's and into the low 100's. For the third interval, my cadence was generally in the low up to mid 90's.

1st Interval, high 90's cadence: Stages lower than Drivo (2nd interval is similar)

3rd Interval, low 90's cadence: Drivo lower than Stages


I really doubt that a ~6% reduction in cadence is causing the Stages meter to go from reading low to reading high. It looks like the difference in smoothing is still a factor in both the first and third intervals. Smoothing probably isn't the sole cause of the difference, but it makes any other hypotheses less certain.

I suspect the lower/higher switch is partly due to a difference in thermal compensation. I know Stages uses thermal compensation and I assume Drivo does too. But since your Stages is open to room air and your Drivo is (presumably) heating up as you ride, the Drivo's T-comp has to deal with a much higher thermal delta than the Stages meter does.

Thermal compensation is a big deal in instrument design, and I'm a little astonished that 4iii doesn't use it. I used to design Coriolis flow meters, which are incredibly precise...we were measuring mass flow at the rate of 0.5-1 gram per hour. Because we hadn't yet implemented thermal compensation, we had to seal off the HVAC vents in the room we used for testing. Prior to doing so, we could instantly see the effect of the HVAC system cycling on and off in our plots. We could even tell if someone got within about six feet of our prototype, since their breath would warm our testbed slightly. We joked that we hadn't developed a flowmeter so much as a curiosity detector.

So the meter-trace flip-flop in your graphs might have something to do with thermal compensation or it could be something else entirely. I'm looking forward to seeing the promised plots with equal smoothing applied.

pantelones
Posts: 35
Joined: Mon Jan 31, 2011 2:24 am

by pantelones

bilwit wrote:
pantelones wrote:Just for the record, Team sky has not used single sided Stages exclusively. In fact, as soon as they could start using prototypes and transition to dual sided units they did.
https://www.dcrainmaker.com/2017/08/han ... meter.html

The Stages data quality debate is a pure circle jerk by people with Stages powermeters to make themselves feel good about their purchase. Stages is a decent product, but only records 50% of the rider data.


Do you know what normalized means? It's pretty asinine to dismiss left-sided PM without actually knowing what that actually entails. It literally does not matter unless you are comparing the results with a different PM altogether or are trying to look good on Zwift. I'm not sure where all the anger is coming from anway, no one has ever said Stages or 4iii or whatever left-sided PM are the ultimate, end-all PM everyone should use.


Normalized power is a method for finding the physiological equivalence between a non-steady-state (crit) effort and a steady-state (TT) effort. It is not a method for analyzing and comparing data between measurement systems (Stages vs. SRM).

Sure, Stages works for some people, but the fact is there are a lot of people out there who don't know how poor their data is. Maybe those people also don't care. Those people should just use a speedometer and heart rate monitor for training.

From MATLAB:
In cycling, a power meter is an indispensable tool to record power output (in Watts) and measure fitness gains and performance metrics. When analyzing the data though, many different workouts can yield approximately the same average power, despite major differences between workouts (e.g., a long steady effort vs. sprints or intervals). Normalized power (NP) is a method to measure the effect of more intense efforts on the overall workout. NP is calculated by the following four steps (from Training and Racing with a Power Meter by Allen and Coggan):

1. Calculate a 30-second rolling average of the power data
2. Raise these values to the fourth power
3. Average the resulting values
4. Take the fourth root of the result
You will be provided with the 30-second rolling average power data set (vector). Write a function to return the average power (using the rolling average data) and the normalized power using steps 2–4 above. Round the values to the nearest integer.
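For anyone who wants to try it, the four steps and the exercise's rounding can be sketched in Python rather than MATLAB. The function names are hypothetical, not the exercise's expected signature, and the sketch assumes 1 Hz samples so that a 30-sample window equals 30 seconds.

```python
from typing import List, Sequence, Tuple


def rolling_average(power_1hz: Sequence[float], window: int = 30) -> List[float]:
    """Step 1: trailing rolling average over `window` samples (full windows only)."""
    return [sum(power_1hz[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(power_1hz))]


def average_and_normalized_power(rolling: Sequence[float]) -> Tuple[int, int]:
    """Steps 2-4 applied to the rolling-average data, rounded as the exercise asks.

    Note: Python's round() rounds exact .5 ties to even, unlike MATLAB's round().
    """
    if not rolling:
        raise ValueError("need at least one rolling-average value")
    avg = sum(rolling) / len(rolling)
    mean_fourth = sum(p ** 4 for p in rolling) / len(rolling)  # steps 2 and 3
    norm_power = mean_fourth ** 0.25                           # step 4
    return round(avg), round(norm_power)


# Usage with hypothetical rolling-average values (not ride data):
print(average_and_normalized_power([100.0, 300.0]))  # → (200, 253)
```

The fourth power is what makes the hard 300 W stretch count for more than the easy 100 W one, which is why NP lands above the plain average.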

kulivontot
Posts: 1163
Joined: Sun May 16, 2010 7:28 pm

by kulivontot

This of course is irrelevant to the discussion of precision vs accuracy above. But cool. Didn't know that MATLAB had a section devoted to cycling power.

savechief
Posts: 354
Joined: Wed Aug 31, 2011 2:36 am

by savechief

@pantelones
I get it, you don't like Stages power meters. But claiming that Stages is no better for many than just a heart rate monitor and speedometer is just ridiculous.

@youngs_modulus
The Drivo uses an optical system to measure power and no temperature compensation is needed. See Ray's write-up under Accuracy Testing:

https://www.google.com/amp/s/www.dcrain ... w.html/amp
Last edited by savechief on Thu Oct 26, 2017 7:34 am, edited 2 times in total.

pantelones
Posts: 35
Joined: Mon Jan 31, 2011 2:24 am

by pantelones

kulivontot wrote:This of course is irrelevant to the discussion of precision vs accuracy above. But cool. Didn't know that matlab had a section devoted to cycling power.


I was replying to someone who asked me if I knew what "normalized" meant, implying that because the normalized power from the two measurement systems was the same, the measurement systems are of equal accuracy/precision/validity. That is not true: just because two hikers end at the same place (avg power, or normalized power) doesn't mean they walked the same route.
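The hiker analogy can be made concrete with two hypothetical 1 Hz traces of the same 60-second effort. Both traces are invented for illustration. They share the same average and, with a 30-second rolling window, even the same normalized power, yet they disagree by 50 W every single second.

```python
from statistics import mean
from typing import List


def normalized_power_w(trace_1hz: List[float], window: int = 30) -> float:
    """30-sample rolling average, then fourth power, average, fourth root."""
    rolled = [mean(trace_1hz[i - window + 1:i + 1]) for i in range(window - 1, len(trace_1hz))]
    return mean(p ** 4 for p in rolled) ** 0.25


# Hypothetical 1 Hz traces of the same 60 s effort, as two different meters might record it.
meter_a = [200.0] * 60
meter_b = [150.0, 250.0] * 30

print(mean(meter_a), mean(meter_b))                                            # identical averages
print(round(normalized_power_w(meter_a)), round(normalized_power_w(meter_b)))  # identical NP too
print(max(abs(a - b) for a, b in zip(meter_a, meter_b)))                       # yet 50 W apart every second
```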

The MATLAB reference was just the first search result with a decent explanation.

kulivontot
Posts: 1163
Joined: Sun May 16, 2010 7:28 pm

by kulivontot

I think you need to re-read bilwit's post. The implication was that the actual number doesn't matter, except for bragging rights on Zwift or swapping between PMs, as long as it's consistent.
That said, as with every power meter accuracy discussion, the results are ¯\_(ツ)_/¯. OP's trainer and OP's Stages are inaccurate in different ways and are measuring at different points in the drivetrain.

glepore
Posts: 1410
Joined: Thu Mar 28, 2013 4:42 pm
Location: Virginia USA

by glepore

kulivontot wrote:That said, as with every power meter accuracy discussion the results are ¯\_(ツ)_/¯. Op's trainer and op's stages are inaccurate in different ways and measuring at different points in the drive train.


This. The nature of the device is subject to numerous external and internal factors, as well explained by youngs_modulus upthread. It's always been that way. Folks talk about the "gold standard" this and that, and while some construction methods and algorithms may be better, absent true lab testing it's all conjecture as to a given device. None are "awful" for training, despite what marketing might say, but there are reliability and ease-of-use considerations.
Cysco Ti custom Campy SR mechanical (6.9);Berk custom (5.6); Serotta Ottrott(6.8) ; Anvil Custom steel Etap;1996 Colnago Technos Record

savechief
Posts: 354
Joined: Wed Aug 31, 2011 2:36 am

by savechief

Did another ride last night:
https://analyze.dcrainmaker.com/#/publi ... cbae30914c

I had hoped to record the Stages power with my Edge 520 and the Drivo power with my Edge 510, then do all smoothing within the DC Rainmaker Analyzer, but my Edge 510 had no battery. As a result, I had to record the Drivo only through TrainerRoad. I did set smoothing within TrainerRoad to "none", but I was only able to reduce the smoothing within the Elite app to 2 seconds (it was previously 10 seconds). So again, the smoothing between the Stages and Drivo is not equal, but it's closer than it was before. The rest of the smoothing was done in the DC Rainmaker Analyzer and is either 0 seconds or 10 seconds, as labeled on the graphs below. I did 7 total intervals, but won't show the 4th, since a trainer glitch caused me to stop mid-interval, creating strange power spikes and a time offset between the two power meters.

[Graph: First 3 Intervals, No Smoothing]

[Graph: First 3 Intervals, 10 Second Smoothing]

[Graph: 2nd Interval, No Smoothing]

[Graph: 2nd Interval, 10 Second Smoothing]

[Graph: Last 3 Intervals, No Smoothing]

[Graph: Last 3 Intervals, 10 Second Smoothing]

[Graph: 7th Interval, No Smoothing]

[Graph: 7th Interval, 10 Second Smoothing]

Looking at the first 60-second interval, the Stages power meter averaged 3.1% lower than the Drivo. Over the course of that interval, the Stages varied from 6.5% lower to 0.7% higher.
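Numbers like these can be computed in two ways: from the interval averages, or as the mean of the second-by-second percentages. The two methods give different results, and the post doesn't say which one the 3.1% figure uses. The sketch below computes both, plus the per-second range. It assumes two time-aligned 1 Hz lists with the Drivo as the reference. The snippet values are hypothetical, not savechief's data.

```python
from statistics import mean
from typing import Sequence, Tuple


def interval_offsets(stages_w: Sequence[float],
                     drivo_w: Sequence[float]) -> Tuple[float, float, float, float]:
    """Compare two time-aligned 1 Hz traces, using the Drivo as the reference.

    Returns (difference of averages %, mean per-second %, lowest %, highest %).
    """
    if len(stages_w) != len(drivo_w) or not stages_w:
        raise ValueError("traces must be non-empty, time-aligned and equal length")
    per_second = [(s - d) / d * 100.0 for s, d in zip(stages_w, drivo_w)]
    of_averages = (mean(stages_w) - mean(drivo_w)) / mean(drivo_w) * 100.0
    return of_averages, mean(per_second), min(per_second), max(per_second)


# Hypothetical 4-second snippet, not savechief's data:
stages = [190.0, 200.0, 205.0, 150.0]
drivo = [200.0, 200.0, 200.0, 160.0]
print(interval_offsets(stages, drivo))
```

Any time offset between the two recordings, like the one from the 4th-interval glitch, inflates the per-second spread. The traces need to be aligned before running this comparison.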

And yes, I know that we've covered how comparing average and normalized powers can be very misleading, but here are the numbers anyway...

Average Power
Stages = 204W
Drivo = 206W

Normalized Power
Stages = 236W
Drivo = 241W

youngs_modulus
Posts: 668
Joined: Wed Sep 20, 2006 1:03 am
Location: Portland, OR USA

by youngs_modulus

savechief wrote:The Drivo uses an optical system to measure power and no temperature compensation is needed. See Ray's write-up under Accuracy Testing:

https://www.google.com/amp/s/www.dcrain ... w.html/amp


I'm going to use this as an opportunity to explain why it makes little sense to fret about whether one's power meter is "accurate." I'm not saying you're doing this, Savechief; I think you're asking good questions. While I think the answers are interesting, they all point to the fact that all the strain-gage-based power meters currently on the market are just fine for training (even for pros). A meter that's more accurate than average is only really useful if you're running an exercise physiology lab or (more likely) using the Chung method to gauge aero gains without a wind tunnel.

Here's the tl;dr version:

  • There's way more uncertainty in power measurements than manufacturers admit.
  • Most people who know power well (e.g., athletes, coaches, marketers employed by power meter companies, even Ray Maker) misunderstand power meter accuracy and error.
  • It turns out that they don't need to understand those things, because accuracy and error matter a whole lot less than consistency.
  • Nearly all power meters return consistent results as long as their offset is set properly (i.e., they're zeroed regularly).
  • The variation in meter-on-meter plots comes largely from algorithms in the power meter and in the bike computer that writes out the power file.
  • Don't worry about accuracy. You can't validate manufacturers' claims, and besides, the meters are accurate enough. Buy based on features, price, service or whatever else matters to you.
  • Don't forget to set your offset (zero your meter).
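To make "set your offset" concrete, here is a simplified model. It is an assumption for illustration, not any manufacturer's firmware. The meter converts strain to torque, subtracts the zero offset captured at the last calibration, and multiplies by crank angular velocity. If the gauges drift and nobody re-zeroes, the stale offset becomes a constant torque error, so the power error grows with cadence. The function name and drift value are hypothetical.

```python
import math


def crank_power_w(measured_torque_nm: float, zero_offset_nm: float, cadence_rpm: float) -> float:
    """Power = (torque - zero offset) x angular velocity.

    zero_offset_nm is what the gauges report with no load on the crank;
    zeroing the meter means re-measuring it.
    """
    omega_rad_s = cadence_rpm * 2.0 * math.pi / 60.0
    return (measured_torque_nm - zero_offset_nm) * omega_rad_s


# Hypothetical: the gauges have drifted so that an unloaded crank reads 1 N·m.
true_torque, drift, cadence = 25.0, 1.0, 90.0
stale = crank_power_w(true_torque + drift, 0.0, cadence)    # meter still assumes a 0 N·m offset
fresh = crank_power_w(true_torque + drift, drift, cadence)  # meter re-zeroed
print(round(stale - fresh, 1))  # → 9.4
```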


Manufacturers tend to quote a single vague accuracy number, and that doesn't tell you much. No one mentions this to consumers, but accuracy is usually quoted as a percentage of full scale. "Full scale" basically means the largest reading the meter is rated to read. For a power meter, that might be 2000 watts. If your meter claims ±1%, that's ±20 watts no matter the current reading. That means if you're spinning along at a true 100 watts, your meter might read anywhere from 80 to 120 watts. So accuracy that's ±1% of full scale can easily be ±20% of reading at low power outputs.

That sounds really bad, and it is. The assumption in the metrology field is that accuracies are quoted as %FS (percent of full scale) unless they're specifically claimed to be %R (percent of reading). So according to the manufacturers' claims, all power meters are pretty inaccurate at low power outputs.

But in reality, their strain gages are accurate to something like ±0.05%-±0.25% of full scale. If full scale is 2000 watts, that's more like ±5 watts in the worst case. But there's additional uncertainty added by thermal effects, hysteresis, material variations and other factors. That's why most instrument manufacturers quote accuracy as the sum of a small full-scale error plus a larger reading error. That lets you calculate the accuracy of your meter at any point in its operating range.
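The difference between %FS, %R, and the combined "full-scale plus reading" spec is easiest to see with numbers. This Python sketch uses the post's 2000 W full scale. The combined coefficients (0.25% FS + 1% of reading) are hypothetical, picked only to illustrate the shape of such a spec.

```python
def error_band_w(reading_w: float, fs_pct: float, reading_pct: float,
                 full_scale_w: float = 2000.0) -> float:
    """Half-width of the error band, in watts, for a spec of
    ±(fs_pct % of full scale + reading_pct % of reading)."""
    return full_scale_w * fs_pct / 100.0 + abs(reading_w) * reading_pct / 100.0


for watts in (100.0, 300.0, 1000.0):
    pure_fs = error_band_w(watts, fs_pct=1.0, reading_pct=0.0)    # "±1%" read as %FS
    combined = error_band_w(watts, fs_pct=0.25, reading_pct=1.0)  # hypothetical combined spec
    print(f"{watts:6.0f} W: ±{pure_fs:.0f} W ({pure_fs / watts:.0%} of reading) "
          f"vs ±{combined:.1f} W ({combined / watts:.1%} of reading)")
```

Setting `fs_pct=0` gives a pure %R spec. For example, Stages' stated ±2% of reading is `error_band_w(watts, 0.0, 2.0)`.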

Reporting accuracy this way is mostly done when scientists design meters for other scientists. If SRM or Saris reported their accuracy this way, it would just confuse consumers. As a matter of fact, Stages reports their error as ±2% of reading. That's almost certainly a "dumbed-down" version of reality, but it shows that they know the difference between %FS and %R. Basically, power meter manufacturers fudge an accuracy number that's understandable for their target market. That's not necessarily a problem, either.

It can be taken too far, though. When Elite/Drivo asserts that their optical system has no use for thermal compensation, it's a triumph of marketing over science. They claim that they're not using strain gages, but of course they are. Theirs are optical rather than resistive: they're using optical occlusion to detect a phase shift* due to torsional strain in a shaft. It's true that this method is less sensitive to thermal effects than the more-common resistive strain gages. But temperature still matters a lot, and it's clear that whoever talks to the press for Elite doesn't have a great technical understanding of their device.**
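The physics behind "phase shift due to torsional strain" is ordinary elastic torsion. The twist angle between two points on a shaft is proportional to the torque it carries (T = G·J·θ / L), and power is torque times angular speed. The sketch below is generic textbook torsion, not Elite's published design. The shaft dimensions, twist, and speed are hypothetical, and it assumes the measured phase shift has already been converted into a mechanical twist angle.

```python
import math


def polar_moment_solid_m4(diameter_m: float) -> float:
    """Polar second moment of area of a solid round shaft: J = pi * d^4 / 32."""
    return math.pi * diameter_m ** 4 / 32.0


def torque_from_twist_nm(twist_rad: float, shear_modulus_pa: float,
                         diameter_m: float, length_m: float) -> float:
    """Elastic torsion of a uniform shaft: T = G * J * theta / L."""
    return shear_modulus_pa * polar_moment_solid_m4(diameter_m) * twist_rad / length_m


# Hypothetical shaft: steel (G ~ 79 GPa), 20 mm diameter, 100 mm between the two sensing points.
G, d, L = 79e9, 0.020, 0.100
twist = 1.0e-4   # rad of twist between the sensing points (hypothetical)
omega = 300.0    # rad/s shaft speed (hypothetical)
torque = torque_from_twist_nm(twist, G, d, L)
print(f"{torque:.2f} N·m -> {torque * omega:.0f} W")
```

Note that G appears directly in the torque formula. Anything that changes the shaft's stiffness, such as temperature, changes the torque inferred from a given twist.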

The marketing BS spins out of control when Elite presents its fake "certificate" to Ray Maker. I don't doubt that they had their trainer tested by the lab they mention, and I don't doubt that it tested quite well. But if you read the "certificate," it's clear that it wasn't written by the German lab but rather by one of Elite's marketing staff, perhaps a 15-year-old intern. I mean, the tone is absurd. No credible lab anywhere is going to issue a report saying:

some adolescent Elite marketing person wrote:The current DRIVO [and all future DRIVO models] can be used as references for other powermeter systems in the market. ELITE DRIVOs are the new reference for precision power measurements. The question "who is right" is answered by ELITE today.
(Emphasis added). Seriously, that's what it says. See for yourself here: https://media.dcrainmaker.com/images/20 ... mage12.png

Whoever wrote this "certificate" could just as well have added "Giulio Bertolo [Elite's CEO] is both handsome and extraordinarily well-endowed."

I've had several of my devices certified by third-party labs, and coincidentally, it's usually a German one: TÜV Rheinland. The certs we get back from labs just attest that our product is claimed to meet XXX standard and, after performing X, Y and Z tests, the lab attests that our product meets XXX standard. They're really boring. They'd never make claims about future models or weigh in on what instrument is "the new reference" in my field.

In the comments section of Ray Maker's article, someone named Robert (comment #40) picks up on the fact that Elite's accuracy claim is incoherent and that the "certificate" is meaningless. He's a lot more reserved about it than I am, though.

It's telling that Ray Maker published that "certificate" on his web site, though he immediately explains that he's more interested in evaluating the Drivo himself. Maker is clearly a very smart guy, but he doesn't have a hard-science background at all. That's not a criticism, either. He does what he claims to (write about and evaluate power meters) very, very well. But quantifying error and accuracy for instruments like this is really hard...undergrad engineering curricula often don't address it, and I only got a little exposure to it in grad school. It wasn't until I started working in the metrology field that I wrapped my brain around it to any degree. It's not Ray Maker's fault that he doesn't know this stuff; it's esoteric. And I understand that, to his credit, Maker consults with Tom Anhalt and Robert Chung when getting into the details of data analysis. I'm not sure what more anyone could ask of him.

The point is that unless you're designing power meters, it doesn't matter whether you understand accuracy and error. There are two reasons why:
  • While power meter accuracy numbers are fudged and somewhat made-up, most makers quote numbers that represent realistic error levels for a broad range of power inputs. The specs are technically wrong, but in a broader sense, they're basically right.
  • There are so many factors that affect power readings (altitude, fitness, muscle fatigue, injury, time discretization, meter algorithms, head-unit algorithms, etc.) that the raw error/accuracy of the meter just fades into the noise. Meters don't need great accuracy to be fully valuable training tools; they just need to be consistent. And all of the strain-gage-based meters I've seen (including left-side-only ones) meet the consistency criterion.

I'm defining "consistency" as "pretty precise" and "somewhat accurate." A "pretty precise" meter will give essentially the same output under the same conditions every time. It may be off by a bit (like a left-side-only meter for someone with a large L/R imbalance) but it's off by the same amount every time. A "somewhat accurate" meter will be off from the "true" value, but maybe only 7-10 watts at your FTP.

And you can get that consistency from nearly any power meter on the market. Stages meters do just fine. People have complained that "they only capture half the data," but that's not true; they capture most of the data. Most people's dominant limbs are a little stronger than their non-dominant ones, but so what? If, at FTP, your left leg puts out 150 watts and your right leg puts out 160 watts, you'll think your FTP is 300 watts instead of the true 310 watts. So what? As athletes, we're interested in relative improvement--how our speed now compares to our speed before--and not our absolute power output. For all intents and purposes, a one-sided Stages will capture a 5% improvement in FTP just as accurately as a Quarq or a Powertap will. Your one-sided meter will report that your FTP went from 300 to 315 watts, while a Powertap hub, capturing both sides, would report an increase from 310 watts to 325.5 watts.
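The arithmetic in that example is easy to check. A left-only meter doubles the left leg, so the absolute numbers differ from a two-sided meter, but the relative gain comes out the same. The sketch uses the post's 150/160 W split and assumes, as the example implicitly does, that both legs improve by the same 5%. The function names are hypothetical.

```python
def left_only_estimate_w(left_w: float) -> float:
    """A left-side-only meter reports twice the left leg's power."""
    return 2.0 * left_w


def pct_change(before: float, after: float) -> float:
    return (after - before) / before * 100.0


left, right = 150.0, 160.0   # the post's example split at FTP
gain = 1.05                  # assumption: both legs improve by the same 5%
one_sided = (left_only_estimate_w(left), left_only_estimate_w(left * gain))
two_sided = (left + right, (left + right) * gain)
print(one_sided, two_sided)                                                # absolute numbers differ...
print(round(pct_change(*one_sided), 2), round(pct_change(*two_sided), 2))  # ...the relative gain does not
```

If the legs improve unevenly, the one-sided trend will over- or under-state the change. The sketch doesn't model that case.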

But wait! A Powertap hub measures power after the drivetrain, so it reads low compared to a crank-based meter. Reasonably well-maintained bike drivetrains are around 96% efficient, so if your "true" FTP is 310 watts, your Powertap will suggest that your FTP is 96% of that, or just under 298 watts. That's lower than the supposedly-awful left-side-only Stages at 300 watts. If you reject one-sided power meters, you should also reject hub-based power meters, because in most cases, the discrepancy in reported power is about the same. One might counter that "at least with the Powertap you know what you're not measuring." Well, sort of, but drivetrain losses vary with chain tension, gear selection and lubrication condition. So you don't really know; you've just got a rough idea. My point is that no matter how you measure, as long as your rough idea is consistent, you're producing perfectly useful data.
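The hub-versus-crank comparison can be checked the same way. The 96% efficiency and the 310 W / 300 W figures come from the post. The 95% and 97% values are hypothetical, included only to show how much the hub number moves as drivetrain losses vary.

```python
def hub_reading_w(crank_w: float, drivetrain_efficiency: float) -> float:
    """A hub-based meter sees crank power minus drivetrain losses."""
    return crank_w * drivetrain_efficiency


crank_true_w = 310.0   # the post's "true" FTP, at the crank
left_only_w = 300.0    # the post's left-only reading (2 x 150 W)
for eff in (0.95, 0.96, 0.97):  # 0.96 is the post's figure; 0.95 and 0.97 are hypothetical
    hub_w = hub_reading_w(crank_true_w, eff)
    print(f"{eff:.0%} drivetrain: hub {hub_w:.1f} W ({hub_w - crank_true_w:+.1f} W), "
          f"left-only {left_only_w:.1f} W ({left_only_w - crank_true_w:+.1f} W)")
```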

When you're talking about high-quality power meters, there's no point to claiming that one is useless while the other is not. Comparing data traces, while good fun, often comes down to the truism, "A man with a watch knows what time it is. A man with two watches is never sure."

With rare exceptions, SRMs, Powertaps and Stages power meters are equally useful as training tools. If you're in the market for a power meter, choose the one that fits your bike, budget and sensibilities best. You'll get the most useful results if you frequently set the offset/zero, and a $500 power meter that's properly zeroed will give much more useful, repeatable results than a $2000 meter that gets zeroed once a month.



* And since they're detecting phase shift, they'd better not be doing so in the time domain. Detecting phase differences is considerably more accurate in the frequency domain, though shifting from the time domain to the frequency domain requires some fancy math in the form of a fast Fourier transform (FFT). That's a computationally intensive operation best done with an FPGA or ASIC, which are specialized computer chips that can be dedicated to a single set of operations. They're not cheap, though.
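Here is what frequency-domain phase detection looks like, as a sketch. It is not Elite's implementation, which the post doesn't describe. Instead of a full FFT, it evaluates a single DFT bin directly, which is all you need when the signal frequency is known. The signals, bin, and lag are hypothetical. The method assumes each signal contains a whole number of cycles in the window. Otherwise spectral leakage biases the phase.

```python
import cmath
import math
from typing import Sequence


def bin_phase(samples: Sequence[float], k: int) -> float:
    """Phase (radians) of DFT bin k, evaluated directly.

    This is a single-bin DFT; an FFT computes every bin at once, which only
    pays off when you need many of them.
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return cmath.phase(acc)


def phase_difference(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Phase of a minus phase of b at bin k, wrapped to [-pi, pi)."""
    d = bin_phase(a, k) - bin_phase(b, k)
    return (d + math.pi) % (2.0 * math.pi) - math.pi


# Hypothetical sensor signals: 5 whole cycles in 256 samples, b lags a by 0.1 rad
# and also carries a constant 0.3 offset, which lands in bin 0, not bin 5.
n, k, lag = 256, 5, 0.1
a = [math.sin(2 * math.pi * k * i / n) for i in range(n)]
b = [math.sin(2 * math.pi * k * i / n - lag) + 0.3 for i in range(n)]
print(round(phase_difference(a, b, k), 6))  # → 0.1
```

The DC offset in `b` doesn't disturb the result. That kind of robustness is part of why phase is usually measured in the frequency domain rather than by timing zero crossings.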

** If we assume a 50°C operating range for the Drivo meter's internals, then its accuracy varies over an extra 2% range simply because, at human-friendly temperatures, every 50°C rise in temperature makes chrome-moly steel 2% more flexible. So Elite could considerably improve the accuracy of their meter just by implementing temperature compensation. They seem to make a pretty accurate power meter, and it's fair to say that it's considerably less temperature-sensitive than resistive-strain-gage-based meters. But the statement Elite makes in Ray Maker's article--that thermal compensation doesn't apply to their technology--is just plain wrong. Again, I expect their marketing department's braggadocio is no reflection on the accuracy of the Drivo power meter.
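A minimal sketch of what thermal compensation would mean for a twist-based meter. It takes the footnote's figure of roughly 2% more flexible per 50°C and assumes, as a simplification, that the change is linear. The 20°C calibration temperature, stiffness, and torque are hypothetical. A warm shaft twists further under the same torque, so a meter that ignores temperature reads high.

```python
from typing import Optional

ALPHA_PER_C = 0.02 / 50.0  # the footnote's figure: ~2% more flexible per 50 °C, assumed linear


def stiffness_nm_per_rad(k_ref: float, temp_c: float, ref_temp_c: float = 20.0) -> float:
    """Torsional stiffness at temp_c, given stiffness k_ref calibrated at ref_temp_c."""
    return k_ref * (1.0 - ALPHA_PER_C * (temp_c - ref_temp_c))


def torque_nm(twist_rad: float, k_ref: float, temp_c: Optional[float] = None,
              ref_temp_c: float = 20.0) -> float:
    """Torque from measured twist; temp_c=None means no thermal compensation."""
    k = k_ref if temp_c is None else stiffness_nm_per_rad(k_ref, temp_c, ref_temp_c)
    return k * twist_rad


# Hypothetical: shaft calibrated at 20 °C, now at 70 °C, carrying a true 30 N·m.
k_ref = 12_000.0
true_torque = 30.0
twist = true_torque / stiffness_nm_per_rad(k_ref, 70.0)  # a warmer shaft twists further
print(round(torque_nm(twist, k_ref), 2))          # uncompensated: reads about 2% high
print(round(torque_nm(twist, k_ref, 70.0), 2))    # compensated
```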


pantelones
Posts: 35
Joined: Mon Jan 31, 2011 2:24 am

by pantelones

kulivontot wrote:I think you need to re-read Bilwift's post. The implication was that the actual number doesn't matter except for bragging rights on zwift or swapping between PM's as long as it's consistent.


How do you determine that a power meter is actually consistent for the short and long term? :noidea: Most people I hear use very subjective means to validate their idea of consistency.
