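First off, a USB-connected sound device does carry interrupt overhead, but not one interrupt per byte. USB audio streams over isochronous transfers: the host controller moves the data by DMA and interrupts the CPU when a transfer completes, which drivers typically arrange per frame or per batch of frames. No way around that part. An IRQ does not invalidate the CPU cache, though the handler can push some of your data out of it, and only the core that takes the interrupt is held up, not all of them.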
Next, FFT code running on a CPU is going to be orders of magnitude slower than a dedicated hardware FFT chip. Those high-end audio cards can process up to 256 FFT operations in parallel per cycle, versus the 4 a high-end quad-core CPU can manage, and the hardware FFT chips need fewer cycles per operation as well.
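For a sense of what the CPU-side work looks like, here is a minimal sketch of a radix-2 Cooley-Tukey FFT in plain Python. It is only an illustration of the algorithm, not what any sound driver or DSP chip actually runs, and the function name `fft` is my own.

```python
import cmath

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT.

    len(samples) must be a power of two (an assumption of this sketch).
    """
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    # Split into even- and odd-indexed halves and transform each.
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor rotates the odd half before the butterfly.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

Feed it one cycle of a sine wave over 8 samples and the energy lands in bins 1 and 7. The work grows as n log n butterflies, which is the part a dedicated chip does in parallel.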
As far as 3D positioning of sound in a game goes, that is actually done on the graphics card, not the CPU. When you use 3D positioning in a game, you set a vertex as the source and a vector for the direction. The GPU translates it, not the CPU, and you use the translated result for the 3D positioning. Virtually no overhead at all. What overhead there is, is the same for a high-end audio card as for a dumb motherboard chip.
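Wherever it runs, the math is small. A rough sketch, assuming an omnidirectional source (so the direction vector is ignored here), inverse-distance attenuation and a constant-power pan law. `position_source` and its parameters are made up for illustration and do not come from any particular audio API.

```python
import math

def position_source(source, listener, right, ref_dist=1.0):
    """Return (left_gain, right_gain) for a point source.

    source, listener: (x, y, z) world positions.
    right: unit vector pointing to the listener's right.
    ref_dist: distance at which gain is 1.0 (assumed attenuation model).
    """
    # Translate the source into listener-relative coordinates.
    rel = tuple(s - l for s, l in zip(source, listener))
    dist = math.sqrt(sum(c * c for c in rel))
    if dist == 0.0:
        # Source sits on the listener: centred, full volume.
        g = math.sqrt(0.5)
        return (g, g)
    # Inverse-distance attenuation, clamped so close sources don't blow up.
    gain = ref_dist / max(dist, ref_dist)
    # Pan in [-1, 1]: projection of the unit direction onto the right vector.
    pan = sum(r * c for r, c in zip(right, rel)) / dist
    pan = max(-1.0, min(1.0, pan))
    # Constant-power pan law: -1 is hard left, +1 is hard right.
    angle = (pan + 1.0) * math.pi / 4.0
    return (gain * math.cos(angle), gain * math.sin(angle))
```

That is one subtraction, one square root, one dot product and two trig calls per source, which is why the cost doesn't depend on what sound hardware you have.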
Now, if you are talking about Dolby, that is a different beast. Dolby support via their hardware chip on a sound card is much faster than what the CPU can accomplish. Licensed sound cards use hardware from Dolby Laboratories for this. However, only a moron would use Dolby positioning in a game.