Beetle, the AMD64 instruction set is a superset of the standard x86 ISA. It extends x86 to support 64 bit addressing and widens all GPRs to 64 bits. In addition, AMD64 enhances the x86 ISA by doubling the number of directly accessible GPRs (from 8 to 16) and doubling the number of 128 bit XMM registers (also from 8 to 16). (XMM registers are used by SSE, SSE2, etc.; 3DNow! actually works on the 64 bit MMX registers instead.) The default operand size in 64 bit mode is still 32 bits, so most instructions need a REX prefix to operate on 64 bit values.
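To make the "default operand size is still 32 bits" point concrete, here's a rough Python sketch that encodes the register-to-register form of ADD (opcode 01 /r). It's a simplified illustration, not a real assembler: the function name encode_add and the register table are my own, and it only handles this one instruction form. It shows that a 64 bit operand needs the REX.W bit, and that the 8 extra GPRs (r8-r15) are reached through the REX.R and REX.B extension bits.

```python
# 64 bit register names in encoding order (0-15). When wide=False the same
# numbers denote the 32 bit halves: eax, ecx, ..., r8d-r15d.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode_add(dst: str, src: str, wide: bool) -> bytes:
    """Encode the register form of ADD dst, src (opcode 01 /r)."""
    d, s = REGS.index(dst), REGS.index(src)
    rex = 0x40                 # REX prefixes have the bit pattern 0100WRXB
    if wide:
        rex |= 0x08            # REX.W: 64 bit operand instead of the default 32
    if s >= 8:
        rex |= 0x04            # REX.R: 4th bit of ModRM.reg (the source here)
    if d >= 8:
        rex |= 0x01            # REX.B: 4th bit of ModRM.rm (the destination here)
    # mod=11 means both operands are registers; low 3 bits of each go in ModRM.
    modrm = 0xC0 | ((s & 7) << 3) | (d & 7)
    # A bare 0x40 REX carries no information for this instruction, so omit it.
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x01, modrm])
```

For example, encode_add("rax", "rbx", False) gives 01 D8 (add eax, ebx), while encode_add("rax", "rbx", True) gives 48 01 D8 (add rax, rbx). The extra 0x48 byte is the REX.W prefix, which is exactly the cost of not having 64 bits as the default.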
The Athlon 64's 64 bit long mode has two submodes: a true 64 bit mode and a compatibility mode. Compatibility mode allows 32 bit applications to run without recompiling them. Full 64 bit mode requires recompiling the application, which lets it make use of the additional registers AMD64 adds. The 64 bit version of Windows runs in long mode, and 32 bit applications run under what Microsoft calls WOW64 (Windows on Windows 64). Essentially, the 32 bit applications still see a 32 bit Windows environment, but it is provided by a thin compatibility layer running on top of the 64 bit OS, not by a full second copy of Windows. There really isn't much of a performance hit in doing this, because the 32 bit code itself still executes natively on the CPU and WOW64 only adds a small overhead.
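Which submode a piece of code runs in is decided by CPU state, not by the application: with long mode active (the EFER.LMA bit set), the L and D bits of the current code segment descriptor select 64 bit mode or compatibility mode. The sketch below models only that decision table. cpu_submode is a hypothetical helper, the bits are passed in as booleans rather than read from real hardware, and real mode and virtual-8086 mode are ignored.

```python
def cpu_submode(efer_lma: bool, cs_l: bool, cs_d: bool) -> str:
    """Map long-mode-active flag plus code segment L/D bits to an operating mode.

    Simplified model: only protected/long mode is considered.
    """
    if not efer_lma:
        # Long mode not active: ordinary legacy protected mode.
        return "legacy protected mode, 32 bit" if cs_d else "legacy protected mode, 16 bit"
    if cs_l:
        # L=1 selects 64 bit mode; D must be 0 in that case.
        if cs_d:
            raise ValueError("L=1 with D=1 is reserved in long mode")
        return "long mode: 64 bit mode"
    # L=0 inside long mode: compatibility mode, sized by the D bit.
    return ("long mode: compatibility mode, 32 bit" if cs_d
            else "long mode: compatibility mode, 16 bit")
```

The point of the design is that the OS can run a 64 bit kernel and give each process (or even each code segment) either a 64 bit or a 32 bit compatibility code segment, which is why unmodified 32 bit binaries run without recompiling.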
Edit: Kev, the 64 bit version of Windows DOES run legacy 32 bit applications through a software layer (WOW64), but it is not emulation in the Itanium sense, and 16 bit applications are not supported on 64 bit Windows at all. The difference between the Athlon 64 and Itanium is that AMD64 is a superset of x86 with enhancements, meaning it simply extends x86 to 64 bits, so 32 bit code runs natively on the CPU. Thus there is little to no performance hit. All WOW64 is really doing is thunking (e.g. converting 32 bit arguments and pointers to their 64 bit form) so that 32 bit applications can talk to a 64 bit OS. The Athlon 64's compatibility mode allows 32 bit apps to run without recompiling them, but it does not let them use the additional GPR and XMM registers.
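Here's a toy illustration of what "thunking" means. The argument block below is made up (a length, a pointer and a handle; none of these names or layouts come from any real Windows structure), and I'm assuming the common conventions that 32 bit pointers are zero-extended and handles are sign-extended when widened. Real WOW64 thunks cover far more cases than this.

```python
import struct

# Hypothetical argument block; field names and layouts are illustrative only.
LAYOUT32 = struct.Struct("<IIi")    # length, 32 bit pointer, 32 bit handle (12 bytes)
LAYOUT64 = struct.Struct("<I4xQq")  # length, 4 pad bytes, 64 bit pointer, 64 bit handle (24 bytes)


def thunk_args(raw32: bytes) -> bytes:
    """Widen a 32 bit argument block into its (hypothetical) 64 bit layout."""
    length, ptr32, handle32 = LAYOUT32.unpack(raw32)
    # Pointer was unpacked as unsigned, so repacking as Q zero-extends it.
    ptr64 = ptr32
    # Handle was unpacked as signed, so repacking as q sign-extends it
    # (e.g. a pseudo-handle of -1 stays -1 rather than becoming 0xFFFFFFFF).
    handle64 = handle32
    return LAYOUT64.pack(length, ptr64, handle64)
```

Usage: thunk_args(LAYOUT32.pack(16, 0x80001000, -1)) produces a 24 byte block whose pointer field is 0x0000000080001000 and whose handle field is all 0xFF bytes. This is bookkeeping on call boundaries, not translation of the application's instructions, which is why it's cheap.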
The Itanium is completely different in that it actually translates x86 instructions into IA64 instructions. That is why it performs so poorly: IA64 and x86 are vastly different instruction sets, and translating each x86 instruction into a sequence of IA64 instructions that performs the same task carries a huge performance hit. Itanium originally executed x86 code in hardware on the CPU itself, which didn't work very well. It is now primarily done in software (Intel's IA-32 Execution Layer, a dynamic binary translator), but even so the ISA differences result in a severe performance hit. Boiled down to the simplest possible terms: the IA64 architecture gets its performance from a large number of execution units running in parallel at a relatively low clock speed, whereas x86 chips rely on a small number of execution units running at high speed. Itanium therefore has to group x86 instructions so that as many execution units as possible stay busy, and this has proven to be very difficult. As a result, when Itanium runs x86 code it actually uses only a small percentage of its execution units, whereas the Athlon 64 is still fundamentally an x86 chip. Sorry if that is confusing, but I really don't have the time to explain this any more clearly than that...
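To show why the grouping is the hard part, here's a deliberately simplified model (not how Itanium hardware or Intel's translator actually schedules code): instructions are packed in order into issue groups of a fixed width, and a new group starts whenever an instruction reads or writes a register already written earlier in the same group. The function names, the width of 6 and the register names are all assumptions for illustration.

```python
from typing import List, Set, Tuple

# (destination register, source registers); a toy instruction representation.
Instr = Tuple[str, Tuple[str, ...]]


def group_instructions(code: List[Instr], width: int) -> List[List[Instr]]:
    """Greedily pack instructions, in order, into parallel issue groups."""
    groups: List[List[Instr]] = [[]]
    written: Set[str] = set()  # registers written by the current group
    for dest, srcs in code:
        # RAW or WAW hazard with the current group forces a new group.
        conflict = dest in written or any(s in written for s in srcs)
        if len(groups[-1]) == width or conflict:
            groups.append([])
            written = set()
        groups[-1].append((dest, srcs))
        written.add(dest)
    return groups


def utilization(code: List[Instr], width: int) -> float:
    """Fraction of execution-unit slots actually filled."""
    groups = group_instructions(code, width)
    return len(code) / (len(groups) * width)


# A serial chain on one register versus six independent operations.
chain = [("eax", ("eax",))] * 4
independent = [(f"r{i}", (f"r{i + 10}",)) for i in range(1, 7)]
print(utilization(chain, 6), utilization(independent, 6))
```

With a width of 6, the dependent chain fills only one slot per group (about 17% utilization), while the six independent operations fill a whole group. Serial code of the first kind is what x86's small register file tends to encourage, which is the intuition behind the poor x86 numbers on Itanium.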