HT is mostly marketing hype. No different than Intel's marketing hype about quad-pumped memory access. HT and FSB are both full duplex.
If you use AMD's math, then Intel's FSB is currently 1.6GHz. Between serial (HT) and parallel (FSB), there are advantages and disadvantages to each design. The current implementation of the HT bus has drawbacks, and these will become clear when native 64-bit applications ship. MS has already said that most 64-bit applications are going to run slower than their 32-bit counterparts. It is not hard to see why.
AMD had no choice but to move the memory controller on die, as no one could build an external memory controller worth a damn for their CPUs.
Architectural differences between CPU designs dictate the best approach to handling memory access. The current serial memory bus (HT) has definite drawbacks. Until a serial bus has the flexibility to access data types based on their size, performance will always remain inconsistent. The easiest way to handle this would be to use multiple serial channels that scale with the data being accessed. Or get the serial bus running at 12.8GHz so that 64-bit data accesses do not impede performance. Or get multiple samples per clock implemented.
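To put rough numbers on the width argument, here is a quick Python sketch. The figures (a 200MHz quad-pumped 64-bit parallel bus and an 800MHz double-pumped 16-bit link) are assumed round numbers for illustration only, not measurements, and the function names are made up for this example.

```python
def peak_bandwidth_bytes_per_s(clock_hz, transfers_per_clock, width_bits):
    """Peak one-way bandwidth = clock * transfers per clock * bytes per transfer."""
    return clock_hz * transfers_per_clock * width_bits // 8


def transfers_for_datum(datum_bits, width_bits):
    """How many bus transfers it takes to move one datum (ceiling division)."""
    return -(-datum_bits // width_bits)


# Assumed example figures, not measured values.
parallel_bw = peak_bandwidth_bytes_per_s(200_000_000, 4, 64)  # wide parallel bus
serial_bw = peak_bandwidth_bytes_per_s(800_000_000, 2, 16)    # narrow link, one direction

print("parallel bus:", parallel_bw / 1e9, "GB/s")
print("narrow link :", serial_bw / 1e9, "GB/s per direction")
print("transfers for a 64-bit datum, 64-bit bus :", transfers_for_datum(64, 64))
print("transfers for a 64-bit datum, 16-bit link:", transfers_for_datum(64, 16))
print("transfers for a 32-bit datum, 16-bit link:", transfers_for_datum(32, 16))
```

Under those assumptions, the narrow link needs several transfers for each 64-bit datum where the wide bus needs one, which is why the link either has to clock much faster, move more samples per clock, or gang up multiple channels.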
Of course, the memory bus itself (HT or FSB) is still the gating factor in performance. I see many AMD people talk about how speeding up the memory bus does not make a lot of difference with an AMD CPU.
Well, that is just wrong, as the speed of memory access will dictate the actual performance of a system. The only time this would not be true is when all the information needed to run the computer resides in the CPU cache. Not going to happen.
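The standard average memory access time formula (hit time plus miss rate times miss penalty) makes the same point. A minimal Python sketch follows; the latencies and miss rate are assumed example values, not figures for any particular CPU.

```python
def average_access_time_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Assumed example values: 1 ns cache hit, 5% of accesses go out to memory.
slow_memory = average_access_time_ns(1.0, 0.05, 100.0)
fast_memory = average_access_time_ns(1.0, 0.05, 50.0)
all_in_cache = average_access_time_ns(1.0, 0.0, 100.0)

print("slow memory :", slow_memory, "ns")
print("fast memory :", fast_memory, "ns")
print("all in cache:", all_in_cache, "ns")
```

Only when the miss rate is zero does the miss penalty drop out of the result. With any misses at all, a faster path to memory lowers the average.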
On the dual-core issue: the biggest single performance hit in a multi-CPU environment is bus contention. It is expensive. This happens whenever multiple CPUs need access to anything outside of their own silicon.
Many (read most) operations in an operating system occur in a very synchronous manner as the operating system is simply acting on behalf of the application.
An application reading data will see zero gain in performance on a multi-CPU system, and could actually degrade in performance due to bus contention, depending on the efficiency of the locking mechanism in the operating system and hardware. Even the most efficient mechanisms penalize overall system performance.
If two applications need access to the same data, then it would be best handled by one CPU, not two. The memory contention and cache misses would hurt performance in this instance as well.
A multi-CPU system is most efficient when running at least two disparate applications which share no data between them and make little use of external shared libraries. That is the best-case scenario for a multi-CPU system.
In your scenario, Kev, I do not see a gain in performance. The bus contention would probably negate any cache hit gains. Most of the time the operating system is just handling I/O and is bound by the performance of those devices.
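A rough way to see this is Amdahl's law with an extra contention term bolted on. The Amdahl part is standard; the per-CPU contention penalty is a toy assumption of mine for illustration, and the numbers are made up.

```python
def speedup(parallel_fraction, cpus, contention_per_extra_cpu=0.0):
    """Amdahl's law plus a hypothetical contention cost per additional CPU.

    parallel_fraction: share of the work that can run on all CPUs at once.
    contention_per_extra_cpu: toy penalty (fraction of single-CPU run time)
    added for each CPU beyond the first; this term is an assumption.
    """
    serial = 1.0 - parallel_fraction
    parallel = parallel_fraction / cpus
    contention = contention_per_extra_cpu * (cpus - 1)
    return 1.0 / (serial + parallel + contention)


# Fully synchronous application: no gain from a second CPU.
print("serial app, 2 CPUs, no contention:", speedup(0.0, 2))
# Same application with some bus contention: slower than one CPU.
print("serial app, 2 CPUs, contention   :", speedup(0.0, 2, 0.05))
# Two independent applications sharing nothing: the best case.
print("independent work, 2 CPUs         :", speedup(1.0, 2))
```

The only case that approaches a doubling is the one where the work is fully independent and the contention term is zero.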