Hi everyone,
Due to your requests, I'm going to try and explain a bit about scientific standard procedures, as this is really useful stuff if you're dealing with aircraft tests, and how they apply to the Me 109G-2 discussion.
First of all, if we're talking about "errors", science usually distinguishes between "random" errors and "systematic" errors.
"Random" errors are due to inaccuracies both in the experiment itself as well as introduced by the observing instruments. Random errors mean that each repetition of the experiment will yield a different set of results which - depending on the accuracy of testing procedure and equipment - will be more or less similar to those from other repetitions of the same experiment. To keep the inaccuracies introduced by random error down to a minimum, experiments often have to be repeated multiple times to yield an average result that can be used with some sort of confidence. (There are mathematical concepts for this kind of confidence :-)
"Systematic" errors are errors that would be predictable and correctable if you figured out their reasons. The problem is, you never know if you have figured out all of the errors in your experiment. Unlike random errors, systematic errors don't yield varying results with repetitions of the experiment with "everything else being equal", but rather give the results an identical bias to one side or the other.
Let's have a look at the FAF Me 109G-2 data now:
http://www.x-plane.org/users/hohun/me109g-2b.jpg
First, concentrate on the yellow graph. That's the data from Gripen's table, with the 10 km value taken from Gripen's narrative at his request. Below full throttle height, everything looks fine, but from full throttle height up, you see a rather kinked graph.
Are the kinks due to systematic errors? Probably not. To begin with, the table provided by Gripen seems to account for some systematic errors, like the compressibility error of the airspeed indicator, the error of the engine tachometer, the difference between standard and real atmosphere, etc.
More importantly, if the kinks in the graph were due to systematic errors, the test aircraft's engine would have to be considered seriously screwed up because it loses power dramatically above full throttle height, then sustains and even increases power beyond that of a properly performing engine up to 9 km before dropping dramatically again. Such an aircraft obviously would be unsuitable for testing, and if tested, completely non-representative of the tested type.
While it's impossible to exclude that the FAF flight tests could have suffered from unrecognized systematic errors, the kinks in the curve you're seeing are probably due to random errors only. Remember that you reduce the impact of random errors by repeating an experiment multiple times. From the caption of the table Gripen provided, it seems that the Me 109G-2 data in that table was collected in just 50 min, which probably means that every data point was measured only once. That leaves the results wide open to random errors.
(It's no surprise that the random errors are recognizable so clearly above full throttle height, since specific excess power is so poor there that it takes a long time to accelerate to top speed, and minor disturbances of the experiment have the greatest effect up there.)
Now have a look at the red and blue graphs that are labelled "lower/upper error boundary". They define a +/- 15 km/h envelope around the "5.4.43" data and include all FAF test data (except the 10 km value Gripen requested a downgrade for). The actual speed of the FAF Me 109G-2 probably (but not definitely) should be expected to be within the envelope defined by the red and the blue graph.
As you can see, my speed prediction is safely within these bounds. It's close to the upper boundary, but doesn't exceed it, so the FAF data points do not contradict my prediction.
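The kind of check described above, whether one curve stays inside a +/- 15 km/h band around a reference curve, can be sketched in a few lines of Python. The function name is hypothetical and the altitude/speed pairs are placeholders I made up so the code runs; they are NOT the FAF values, the "5.4.43" data or my actual prediction.

```python
def within_envelope(reference, candidate, half_width=15.0):
    """True if candidate is within reference +/- half_width at every altitude.

    Both arguments map altitude in km to speed in km/h and are expected
    to cover the same altitudes.
    """
    return all(
        abs(candidate[altitude] - ref_speed) <= half_width
        for altitude, ref_speed in reference.items()
    )

# Placeholder curves, purely illustrative (NOT real test or prediction data).
reference_curve = {7.0: 600.0, 8.0: 590.0, 9.0: 570.0}
predicted_curve = {7.0: 610.0, 8.0: 602.0, 9.0: 584.0}

print(within_envelope(reference_curve, predicted_curve))
```

A result of True only says the candidate curve is not contradicted by the band. It does not say the curve is the most probable one.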
Considering the random error inherent in the FAF test, it's completely pointless to try and argue against my prediction based on a single data point because the accuracy of the FAF test simply doesn't allow it. All in all, the FAF data just provides five measurements of performance from full throttle height up, which is slightly better than a single data point but still insufficient to call any prediction within (or somewhat beyond) these bounds impossible, or even just improbable.
Judging by the FAF data, my prediction may not be the most probable one (that would be the average graph the FAF provided, of course), but it's definitely not an unlikely one either.
The exact math would be rather complicated, but you might find it interesting that in the age of slide-rules, the performance graph often resulted from bending a rubber-encased lead rod into the approximate shape of a typical performance curve, and after using Eyeball Mk I to decide whether the optimum fit had been achieved, using the lead rod as a ruler to draw the curve :-)
With data sets like the one provided by Gripen, you wouldn't get much closer to the truth with complex math, anyway.
So I hope I've now managed to explain why I couldn't be any less impressed by the failure of my prediction to match the FAF data above full throttle height :-)
Only one data point from the FAF data has actually provided information to be used by my prediction, and that's top speed at full throttle height. The rest is based on a decent physics model and a DB605A power chart. Both might contain errors resulting in an unrealistic estimate, but comparison to this particular set of FAF data will not help to find out if that's the case. And the physics model definitely is unbiased - it was developed for analysis of the P-40 :-)
Regards,
Henning (HoHun)