It's tough to compare a SPARC, Alpha, or PA-RISC processor to a standard desktop PC processor. Dual Athlons are a very powerful combination, but they are only 32-bit processors. SPARC, Alpha, and PA-RISC are 64-bit processors. There are some areas (working with very large, very small, or very high precision numbers) where a 32-bit processor just isn't good enough. I'm not familiar with exactly what you are doing, Vermillion, but I'm guessing that absolute precision is probably not that important. Very rarely I have to run simulations which basically involve running the same calculation on literally billions of data points. (For example, when using Silvaco tools to calculate the voltages at as many points as possible within a 2D (or even worse, 3D) MOSFET model.) It's quite possible that this is flat-out impossible on a 32-bit processor. Even a single 64-bit integer operation has to be split into several 32-bit steps on an Athlon or P4, which costs extra clock cycles compared with one of the 64-bit processors mentioned above.
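To make the "more clock cycles" point concrete, here is a rough sketch of how a 64-bit integer add can be built out of 32-bit pieces, the way a 32-bit machine has to do it. The function names (split64, join64, add64_with_32bit_words) are made up for illustration, and Python is only standing in for the two dependent machine adds a real CPU would issue.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit register can only hold values 0..MASK32


def split64(x):
    """Split a 64-bit unsigned value into (low word, high word)."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    """Glue two 32-bit words back into one 64-bit value."""
    return (hi << 32) | lo


def add64_with_32bit_words(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit numbers using only 32-bit-wide steps."""
    lo = a_lo + b_lo
    carry = lo >> 32                     # 1 if the low add overflowed 32 bits
    lo &= MASK32                         # keep only what fits in the register
    hi = (a_hi + b_hi + carry) & MASK32  # second add must wait for the carry
    return lo, hi


a, b = 0x00000001FFFFFFFF, 1
total = join64(*add64_with_32bit_words(*split64(a), *split64(b)))
print(hex(total))
```

The second add cannot start until the first one has produced its carry, so the work is serialized. A 64-bit processor does the same thing in one step.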
Remember, 32 bits means that you only have a range of 2^32, or 4,294,967,296, possible values. 64 bits gives you 2^64, or 18,446,744,073,709,551,616. You tell me which one is better for serious math. It's also worth noting that if you need to deal with both positive and negative numbers, you must sacrifice one bit to represent the sign. [Either two's-complement or ones'-complement format.]
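If you want to check those numbers yourself, a quick sketch like the one below works. The helper names are just ones I picked for illustration; the ranges are the standard ones for unsigned, two's-complement, and ones'-complement integers of a given width.

```python
def unsigned_range(bits):
    """(min, max) for an unsigned integer of the given width."""
    return 0, 2**bits - 1


def twos_complement_range(bits):
    """(min, max) for a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def ones_complement_range(bits):
    """(min, max) for a ones'-complement signed integer.

    One bit pattern is spent on a "negative zero", so the range is
    one value smaller than two's complement on the negative side.
    """
    return -(2 ** (bits - 1) - 1), 2 ** (bits - 1) - 1


for bits in (32, 64):
    print(bits, "bits:", 2**bits, "distinct values")
    print("  unsigned:        ", unsigned_range(bits))
    print("  two's complement:", twos_complement_range(bits))
    print("  ones' complement:", ones_complement_range(bits))
```

Either signed format cuts the largest positive value roughly in half, which is the "sacrificed" bit.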
AMD and Intel both know this. Intel teamed up with HP recently to develop Itanium, which can do 64-bit operations and can also run x86 instructions (though performance suffers severely when it does). AMD is supposed to release "Hammer" at the end of this year. It takes a different approach and extends the x86 instruction set to handle 64-bit operations (called x86-64) while still maintaining the ability to do 32-bit operations without resorting to any sort of emulation mode like Itanium must. That means it can run all current programs that can be run on a PC AND has the capability to do 64-bit operations with no loss in speed. (It also has the ability to do them at the same time. Itanium (at least the current version of it) cannot do this.)
There are good reasons not to do 64-bit unless it is necessary. I won't go too far into this, because it would get very technical, but suffice it to say that building a "simple" 64-bit CLA (carry lookahead adder) requires MANY MANY MANY more transistors than a 32-bit one. If you resort to the simpler CRA (carry ripple adder), a 64-bit adder is MUCH MUCH slower than the 32-bit equivalent. If speed is the top priority, the number of logic gates (each made of transistors) in a carry lookahead adder grows roughly with the square of the number of bits. (Notice I said MANY MANY MANY more.) If speed is less of a concern, it still requires at least 2x as many gates (transistors) as a 32-bit model. Similar tradeoffs exist for multiplication and divide units and registers.
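For anyone who wants to see the adder tradeoff in numbers, here is a simplified sketch. It assumes an idealized single-level lookahead where every carry is computed directly from the generate/propagate signals with gates of unlimited fan-in. Real adders break the logic into blocks, so treat the counts as an illustration of the trend, not actual transistor figures. All of the function names are hypothetical.

```python
def ripple_carry_add(a, b, width):
    """Add two unsigned ints one bit at a time, like a chain of full adders."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                    # sum bit of this full adder
        carry = (x & y) | (carry & (x ^ y))  # carry handed to the next stage
        result |= s << i
    return result, carry


def ripple_cost(width):
    """One full adder per bit, but the carry crosses every stage in turn."""
    return {"full_adders": width, "carry_stages": width}


def flat_lookahead_and_terms(width):
    """AND terms needed to compute every carry directly (no rippling).

    Carry out of bit i is
        g_i + p_i*g_(i-1) + ... + p_i*...*p_1*g_0 + p_i*...*p_0*c_0
    which needs i + 1 AND terms, so the total is width * (width + 1) / 2.
    """
    return sum(i + 1 for i in range(width))


for width in (32, 64):
    print(width, "bits:", ripple_cost(width),
          "lookahead AND terms:", flat_lookahead_and_terms(width))
```

Doubling the width doubles the ripple adder's hardware and its carry chain, while the flat lookahead logic grows to roughly four times as many terms.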
Basically, the manufacturing costs associated with a 64-bit processor are inherently much higher (larger die size, lower yield, more pins needed) than those of a 32-bit processor.