The 2.4 'C' type P4 is definitely a very good deal and an excellent overclocker. I've done quite a few of these on Asus P4P800s. If you plan to overclock, it is certainly the CPU with the best bang/buck ratio.
The 3.2 GHz 'C' type P4 is well matched with the Athlon 64 3200+ (single channel) in a lot of gaming tests, but your post mentioned the Athlon 64 FX (dual channel memory and 200 MHz faster). The Athlon 64 FX is much faster (and a LOT more expensive) for gaming than the standard Athlon 64 3200+.
Utilization of memory is hardly a "second order" effect. Modern CPUs can spend almost 90% of their time waiting for data and/or instructions from DRAM. Memory latency (the amount of time the CPU spends waiting for data from either DRAM or the hard drive once it is requested, measured in nanoseconds or CPU clock cycles) is BY FAR the most critical bottleneck on system performance. Bandwidth, or the maximum amount of data that can be transferred per unit of time, is secondary to latency for almost all applications and certainly nearly every game. Video encoding is probably the one application where bandwidth is more critical than latency, as is most code that is heavy on vector operations.
Let's go into more detail here because I think this will be helpful for others to read later:
The bandwidth available from a single DDR400 memory channel during a burst mode transfer is 3200 MB/second, and a single channel is 64 bits wide. Dual channel DDR400 offers a 128 bit wide memory path and 6400 MB/second maximum bandwidth. The P4 'C' type "800 MHz" effective FSB can offer 6400 MB/sec of bandwidth and the Athlon XP's 400 MHz FSB can offer 3200 MB/sec - BUT the latency is the same because both use a 200 MHz fundamental clock.
Why is this important? When an Athlon or a P4 issues a memory read request, that request must travel over the FSB to the chipset Northbridge. The Northbridge processes the request and translates it to the physical location in main memory. The read command is sent to DRAM, the row is activated (RAS), and finally the column (CAS) is addressed and the data from memory (64 bits at a time, from one row of a single memory module) is placed into a Northbridge buffer. The contents of that buffer are then sent back over the FSB to the CPU and placed in the CPU's L2 cache (usually), and only then can the CPU work with that data directly.
Obviously this takes a considerable amount of time. How long? Depending on the size of the data requested (32 bit, 64 bit doubles, 128 bit double quadwords, or 256 bit vector data, which is two registers' worth since SSE2 registers themselves are 128 bits wide), between 50 nanoseconds (32/64 bit - a 32 bit request is transferred as 64 bits and the top 32 bits are simply discarded) and 300 nanoseconds (256 bit). That's up to 900 clock cycles on a 3 GHz CPU for 256 bit vector data!
Now consider this: the P4 can issue 4 instructions per clock - 2 memory reads/writes PER clock cycle and 2 more integer/floating point executes. The Athlon/Athlon 64 can do 6 per clock - 3 reads/writes and 3 int/fp instructions. So a 3 GHz P4 that issues two 256 bit vector read requests could waste up to 1800 useful clock cycles doing essentially nothing. An Athlon 64 at 3 GHz (doesn't exist yet) could waste up to 2700 clock cycles if the latency to its main memory were the same.
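For anyone who wants to check the arithmetic above, here is a rough Python sketch. The function names are made up for illustration, the 64 bit bus widths and the double/quad pumped transfer counts are my assumptions about how the "400 MHz" and "800 MHz" ratings are reached, and the stall calculation simply repeats the back-of-envelope reasoning from the post (latency times clock speed times the number of outstanding reads). It is not a model of a real pipeline.

```python
# Back-of-envelope numbers only; all names here are hypothetical helpers.

def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock):
    """Peak burst bandwidth in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


def stall_cycles(latency_ns, cpu_ghz, outstanding_requests=1):
    """Cycles lost while waiting, following the post's simple reasoning:
    one latency's worth of cycles per outstanding memory request."""
    return round(latency_ns * cpu_ghz) * outstanding_requests


# Memory side: DDR moves data twice per 200 MHz clock.
ddr400_single = peak_bandwidth_mb_s(64, 200, 2)    # 3200
ddr400_dual = peak_bandwidth_mb_s(128, 200, 2)     # 6400

# FSB side (assumed 64 bit wide): quad pumped P4, double pumped Athlon XP.
p4_fsb = peak_bandwidth_mb_s(64, 200, 4)           # 6400
athlon_xp_fsb = peak_bandwidth_mb_s(64, 200, 2)    # 3200

print(ddr400_single, ddr400_dual, p4_fsb, athlon_xp_fsb)

# Worst case from the post: 300 ns for a 256 bit read on a 3 GHz CPU.
print(stall_cycles(300, 3.0))       # 900
print(stall_cycles(300, 3.0, 2))    # 1800, P4 with 2 reads in flight
print(stall_cycles(300, 3.0, 3))    # 2700, hypothetical 3 GHz Athlon 64
```

Notice that the 200 MHz base clock shows up in every bandwidth figure. Only the bus width and the number of transfers per clock change, which is why the peak bandwidth differs while the latency does not.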
Obviously memory latency is absolutely critical, and even more so for the Athlon. This is why the Athlon 64 (K8) CPUs have an on-die memory controller: it offers vastly reduced latencies. What about bandwidth? I more or less answered that already when I said the CPU can request anything from 32 bit up to 256 bit (vector) pieces of data. A single DDR400 channel can move a whole 64 bits per transfer and a dual channel setup can move 128 bits per transfer. Most memory accesses are 32 bit or 64 bit. That's why having dual channel memory doesn't always result in greatly improved performance. Dealing with dual channels can also slightly increase latencies. From what I said previously, I think you can now see why it matters that AMD's "400 MHz" Athlon XP FSB and the P4 'C's "800 MHz" FSB offer the same latencies. If you are dealing with 32 or 64 bit reads/writes, they both have sufficient bandwidth and the extra bandwidth offered by the 800 MHz FSB is wasted.
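To put rough numbers on why the extra bandwidth is wasted for small accesses, here is a minimal Python sketch. It assumes a very simple model (total time = fixed latency + size / peak bandwidth) and borrows the 50 ns figure quoted earlier for a 64 bit read. The function names are hypothetical and real memory systems are more complicated than this.

```python
import math


def beats_needed(access_bits, channel_bits):
    """How many bus transfers it takes to move one access of the given size."""
    return math.ceil(access_bits / channel_bits)


def access_time_ns(access_bits, latency_ns, bandwidth_mb_s):
    """Simple model: fixed latency plus the time to stream the bytes."""
    access_bytes = access_bits / 8
    transfer_ns = access_bytes * 1000 / bandwidth_mb_s  # bytes / (MB/s) is in microseconds
    return latency_ns + transfer_ns


# Transfers needed on a 64 bit (single) vs 128 bit (dual) channel.
for bits in (32, 64, 128, 256):
    print(bits, beats_needed(bits, 64), beats_needed(bits, 128))

# A 64 bit read with the 50 ns latency assumed from the post.
single = access_time_ns(64, 50, 3200)   # 52.5 ns
dual = access_time_ns(64, 50, 6400)     # 51.25 ns
print(single, dual)
```

Under these assumptions, doubling the bandwidth takes a 64 bit read from 52.5 ns to 51.25 ns, a gain of roughly 2.4%. A 32 or 64 bit access also needs just one transfer on either channel width, so the wider path only starts to pay off for 128 bit and larger requests.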
You guys might be interested in this page of the Ace's Hardware P4 EE / Athlon 64 review:
http://www.aceshardware.com/read.jsp?id=60000258 Just look at how much the on-die memory controller helped the Athlon 64 CPUs in latency and, to a lesser extent, bandwidth. The reduced latency is the sole reason why the Athlon 64 series CPUs are so good for gaming. It also explains why Quake 3 favored the P4 so much: look at the 64 bit latencies of the 'C' type P4 on i875 Canterwood versus the Athlon XP 3200+ on nForce 2. It's plainly obvious that the memory controller on the i875 is better than the memory controller on the nForce 2 chipset.