Deja, did you mistype or did you actually think the P3 benched worse than the P2? You should know that the P2 and P3 (at least the Katmai core) were totally identical except for SSE support and the processor serial number. They benchmarked identically unless SSE was used.
I also find it hard to believe the P4 is specifically designed for video, etc. That's just putting a positive spin on the fact that the pipeline is very long, so the applications that take the smallest performance hit are going to be those that "behave well" with regard to branch prediction. Anyone want to guess what applications those would be?

Besides some SSE2 instructions, I personally can't find a single architectural feature of the Willamette core (current P4s) that is "optimized for video." Nah, it's just that video applications generally run the same algorithms time after time, and their conditional statements are very easy to predict.
On the other hand, the P4 does have a few excellent features, but the clock speeds the chip currently runs at prevent them from helping performance much. (I am of course speaking of the ALUs running at 2x speed. As clock speed increases, the integer performance difference between the P4 and the Athlon should shrink some more.) Right now there is hardly anything the P4 does better than the Athlon, even when the P4 has a 200 - 400 MHz clock speed advantage.
Personally I'd say if AMD falls behind, which I don't see happening for a while, AND once the 478-pin socket and the new 0.13 micron P4s (I think Northwood is the name being used) are in full-scale use, the P4 may eventually be OK in my book. Right now in its current form it just stinks IMHO.

I'm really unhappy with both AMD and Intel right now, but at least I don't feel like I've violated some ethical code in telling someone the Athlon is better for the money than the P4. I'd have a pretty hard time telling ANYONE to buy a P4, mainly because there is no upgrade path for all those who bought systems with Socket 423 and Rambus RAM.