DZ, if the GPU cannot run these calculations at at least 50% of the CPU's speed, there is no gain at all. Because the card's T&L calculations run in parallel with the CPU's other work, the GPU only needs to be half as fast as the CPU, or better, at the same calculations to come out ahead.
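To make the overlap argument concrete, here is a toy model with made-up frame timings (the 10 ms figures are illustrative, not measured): without hardware T&L the CPU does everything serially, while with it the frame is bound by whichever side finishes last.

```python
def frame_time_software(other_ms, tnl_cpu_ms):
    # CPU does its other work and the T&L math back to back.
    return other_ms + tnl_cpu_ms

def frame_time_hardware(other_ms, tnl_cpu_ms, gpu_speed_ratio):
    # GPU T&L overlaps the CPU's other work; the frame is bound
    # by whichever finishes last. gpu_speed_ratio is the GPU's
    # speed at this math relative to the CPU (0.5 = half as fast).
    gpu_tnl_ms = tnl_cpu_ms / gpu_speed_ratio
    return max(other_ms, gpu_tnl_ms)

# 10 ms of other CPU work plus 10 ms of T&L on the CPU = 20 ms.
# A GPU at half the CPU's speed takes 20 ms for the T&L, which
# exactly matches the software frame - the break-even point.
print(frame_time_software(10, 10))        # 20
print(frame_time_hardware(10, 10, 0.5))   # 20
print(frame_time_hardware(10, 10, 1.0))   # 10
```

Below the 0.5 ratio, hardware T&L is a net loss even though the card is "accelerating" the math.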
Now, that is just the calculation side. To perform these calculations, the video card must have all of the vertex buffer data for a given object stored in video memory for that object to be rendered correctly.
If the card runs out of memory for all the vertex data (which can be a very significant amount), then the card must either unload buffers to make room for the new object or operate over the AGP bus, working on the data directly in CPU RAM.
Either of the above can be expensive.
This is why most T&L cards come with at least 32MB of RAM.
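Here is a rough sketch of the decision the driver faces. The eviction policy (least-recently-used) and the buffer names are my own assumptions for illustration; real drivers are more sophisticated, but the two fallbacks are the ones described above.

```python
class VideoMemory:
    """Toy model of where an object's vertex data ends up.
    Hypothetical policy: evict the least-recently-uploaded buffer
    to make room, and fall back to AGP (CPU RAM) when the buffer
    can never fit on the card at all. Both paths cost performance."""

    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.resident = {}  # name -> size; insertion order doubles as LRU order

    def upload(self, name, size_mb):
        if size_mb > self.capacity:
            return "agp"                      # too big to ever be resident
        while sum(self.resident.values()) + size_mb > self.capacity:
            oldest = next(iter(self.resident))
            del self.resident[oldest]         # expensive: must re-upload later
        self.resident[name] = size_mb
        return "vram"

vm = VideoMemory(32)                  # a 32MB card
print(vm.upload("terrain", 20))       # vram
print(vm.upload("cockpit", 20))       # vram - but terrain got evicted
print(vm.upload("huge_mesh", 40))     # agp - never fits on-card
```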
I can tell you, as I have tested it: a GF2 MX 32MB card paired with a 900+ MHz CPU will perform much slower with T&L enabled than with it disabled.
And on all GF2 cards (I have not yet tested the GF3), if there are more than 8 light sources, the performance goes into the bit-bucket with T&L enabled. This is usually not an issue for a flight sim, but it is worth noting.
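Eight is also the minimum light count the fixed-function OpenGL pipeline guarantees (GL_LIGHT0 through GL_LIGHT7), so a common workaround is to pick the lights that matter most per object and drop the rest. A minimal sketch of that idea, assuming a distance-based priority (real engines also weigh intensity and attenuation):

```python
import math

MAX_HW_LIGHTS = 8  # the fixed-function limit discussed above

def pick_lights(object_pos, lights):
    """Return at most the 8 lights closest to the object; only
    these would be submitted to the hardware T&L unit."""
    def distance(light):
        return math.dist(object_pos, light["pos"])
    return sorted(lights, key=distance)[:MAX_HW_LIGHTS]

# 12 candidate lights strung out along the x axis.
lights = [{"pos": (float(i), 0.0, 0.0)} for i in range(12)]
chosen = pick_lights((0.0, 0.0, 0.0), lights)
print(len(chosen))  # 8
```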
One of the misunderstandings about T&L is the one you brought up: it will not enhance the visual quality of a 3D image. It only serves to move existing operations from the CPU onto the video card. In fact, it can have a detrimental effect on image quality, depending on what lighting algorithm the card uses.
I am speaking from experience here, as I have written a lot of code to test the various impacts hardware T&L can (and cannot) have.
The GF2 cards are more L than T in their hardware implementation, while the Radeon is pretty balanced between the two. The GF2 really did not add anything on the T side beyond basic texture management, whereas the Radeon actually added cubic environment bump mapping.
Anyway... I guess that is enough for right now.