/// AMD uses macros to offer that instruction set and they will always suffer performance losses to any comparable Intel CPU when the SSE instructions are used. //// Neither Sony, Adobe, nor Pinnacle uses the Intel compiler.
a.) By "macros", do you mean macro-ops? Intel uses macro-ops too (see:
http://www.anandtech.com/show/1998/3), which are then fed into schedulers that emit the machine-level instructions. I don't understand comparing "SSE performance" as such. One has to go a level further down, because what the CPU executes is not "SSE" but the equivalent assembly-level instructions. That is where one looks at the micro-op latency and throughput tables for those many instructions. For a comprehensive list, see:
http://www.agner.org/optimize/instruction_tables.pdf. I suggest reading p. 71 for Intel SB and p. 161 for AMD K10 (Athlon/Phenom/Thuban). If that gets too tedious, a much more concise list of common instructions is available here:
http://gmplib.org/~tege/x86-timing.pdf, starting at p. 3, for an Intel SB vs. AMD K10 comparison. As you can glean from the tables, it is no easy task to compare "SSE performance" between two modern uarchs. Even people who do this for a living don't make such blanket claims, because it is far more complex than most people imagine. One has to actually disassemble the app/workload into assembly and tally the latency of each and every instruction to arrive at a latency/throughput figure. No easy task, but if you have time to disassemble AH and the related DLLs... that would be some exercise! =)
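To make the "count each instruction" point concrete, here is a minimal Python sketch of the two usual back-of-the-envelope estimates: a chain of dependent instructions costs roughly the sum of their latencies, while independent instructions are limited roughly by their reciprocal throughputs. The timing numbers and the four-instruction "kernel" are made-up placeholders, not figures from the tables linked above.

```python
# Hypothetical per-instruction timings: (latency in cycles, reciprocal throughput in cycles).
# Placeholder numbers for illustration only, NOT values from Agner Fog's or GMP's tables.
TIMINGS = {
    "load": (4, 0.5),
    "mul": (5, 1.0),
    "add": (3, 1.0),
    "store": (3, 1.0),
}

def dependent_chain_cycles(instrs, timings=TIMINGS):
    """Each instruction waits for the previous result, so latencies add up."""
    return sum(timings[op][0] for op in instrs)

def independent_stream_cycles(instrs, timings=TIMINGS):
    """Rough estimate for independent instructions: charge each its reciprocal
    throughput. Ignores port-level overlap, decode limits, cache misses, etc."""
    return sum(timings[op][1] for op in instrs)

kernel = ["load", "mul", "add", "store"]
print(dependent_chain_cycles(kernel))     # 15
print(independent_stream_cycles(kernel))  # 3.5
```

The same four instructions come out more than 4x apart depending only on how they depend on each other, which is one reason a single "SSE performance" number says so little.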
The more likely explanation for this "better SSE performance" is not the hardware but the way the software is optimized.
There are many ways to optimize software for a specific uarch, summarized in this post I made with a lecture slide from UC Berkeley:
http://www.amdzone.com/phpbb3/viewtopic.php?f=532&t=138786&start=125#p210572
Even cache-management strategies alone can yield vastly different results between AMD's (exclusive-inclusive) and Intel's (inclusive) cache systems.
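A minimal sketch of one way the inclusion policy matters, assuming a simple L1/L2 model and hypothetical sizes (not any specific CPU's): in an inclusive hierarchy the inner level's lines are duplicated in the outer level, while in an exclusive one every line lives in exactly one level. It ignores latency, replacement policy, and non-inclusive designs.

```python
def unique_capacity_kb(levels_kb, policy):
    """Amount of distinct data the hierarchy can hold.

    levels_kb is ordered from innermost (L1) to outermost level.
    'inclusive': inner levels are copies of data in the outermost level,
                 so unique capacity equals the outermost level's size.
    'exclusive': each line lives in exactly one level, so sizes add up.
    """
    if policy == "inclusive":
        return levels_kb[-1]
    if policy == "exclusive":
        return sum(levels_kb)
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical L1D/L2 sizes in KB, for illustration only.
l1_l2 = [64, 512]
print(unique_capacity_kb(l1_l2, "inclusive"))  # 512
print(unique_capacity_kb(l1_l2, "exclusive"))  # 576
```

A working set between those two numbers fits entirely on-chip under one policy and spills under the other, so the same code can behave quite differently.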
But of course, the easiest way to optimize for Intel is to use ICC, since it automatically de-optimizes for VIA/AMD. =)
Also, about Sony not being optimized for Intel..
I haven't found any optimization info for Adobe/Pinnacle, but Intel pays developers a lot of $$$ for Intel-specific software optimization.
One last thing about performance, and it's about cache: did you know that AMD's years-old K10 architecture has faster/lower-latency L1 and L2 caches than even Intel SB? See:
http://images.hardwarecanucks.com/image/mac/reviews/AMD/Bulldozer/3.jpg
The bottom line of this long post really is that it's not SSE performance that gives Intel its lead, but Intel-specific software optimization, from using ICC (in some cases) to tailoring code to run best on one uarch over the other.
The Intel compiler is made by Intel. Yes, it does favor Intel CPUs. Guess what? THEY WROTE IT! ////
b.) My issue is that the use of ICC in the many Windows benchmarks misleads the ordinary consumer into believing one CPU is far better than the other, when in actual workloads they perform just the same, if the supposedly slower one isn't even marginally better. I don't even see disclaimers. I would call that fraud and misrepresentation.
Most of your support information is from AMD, or AMD support sites. They are biased towards their own product. //// If you choose to partake of only one side of a story, it will inherently cause you to make potentially poor decisions.
c.) Unfortunately, these are the only places where you can read non-mainstream opinions that are de-popularized by big-corporation marketing. Where would you want me to read about AMD's good points? AnandTech? xD
///// Intel is just following their lead.
d.) Intel has more than cheating to worry about, for example bringing the latest-gen GPU in SB up to 2011 standards. As it is, it performs only as well as a 2005 NV/ATI GPU:
http://techreport.com/articles.x/21099/11. It gets good fps in reviews mainly because it is doing less rendering work and producing a worse picture. I'm not even going to mention its drivers, which are worse than ATI's and NV's combined.
//// Basically it is saying the Intel CPUs execute certain instructions better than AMD's, and for some reason we are supposed to think that is a bad thing.
e.) It is a bad thing if it misleads people. Again, running the SSE2 code path in benchmarks when the CPU is AMD, and the SSE3+/SSE4+ paths when it is Intel, is a fraudulent practice by benchmarketers.
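Here is a simplified Python model of the difference between gating code paths on the CPU vendor string and gating them on the reported feature flags. "GenuineIntel" and "AuthenticAMD" are the real CPUID vendor strings; the feature names and the feature set are hypothetical, and this is not ICC's or any benchmark's actual dispatcher.

```python
# Simplified model of two CPU-dispatch policies. Feature names are hypothetical
# labels, not real CPUID bit names; this is NOT any real compiler's dispatcher.
FAST_PATHS = ("sse4.1", "sse3")  # preferred order, best first
BASELINE = "sse2"

def pick_path_by_vendor(vendor, features):
    """Vendor-gated dispatch: faster paths are only considered on Intel CPUs."""
    if vendor == "GenuineIntel":
        for path in FAST_PATHS:
            if path in features:
                return path
    return BASELINE  # every other vendor falls back to the SSE2 path

def pick_path_by_features(vendor, features):
    """Feature-based dispatch: the vendor string is ignored, only features count."""
    for path in FAST_PATHS:
        if path in features:
            return path
    return BASELINE

# Two hypothetical CPUs reporting the same feature set.
same_features = {"sse2", "sse3", "sse4.1"}
for vendor in ("GenuineIntel", "AuthenticAMD"):
    print(vendor,
          pick_path_by_vendor(vendor, same_features),
          pick_path_by_features(vendor, same_features))
```

With identical features, only the vendor-gated version sends the AMD CPU down the SSE2 path, which is exactly the kind of difference that ends up in a benchmark chart.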
//////// The only area where AMD falls flat on its face is streaming video or anything that makes extensive use of the SSE family of instructions (most high-end video editors). A bit of a shame, as AMD has a better FPU than Intel does.
Actually, the majority of SSE instructions used for media/video editing/transcoding are integer, not floating-point. See 3-operand AVX and XOP.
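As an illustration of the integer side, here is a plain-Python emulation of SSE2's PADDUSB (the `_mm_adds_epu8` intrinsic): 16 unsigned bytes added lane by lane with saturation, the kind of operation applied to 8-bit pixel data. It only shows the semantics, not the speed, and the pixel values are hypothetical.

```python
def paddusb(a, b):
    """Emulate SSE2 PADDUSB: add 16 unsigned bytes lane by lane,
    clamping each result to 255 instead of wrapping around."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("an XMM register holds exactly 16 bytes")
    return [min(x + y, 255) for x, y in zip(a, b)]

# Brightening 16 hypothetical 8-bit luma pixels by 40.
pixels = [0, 16, 100, 200, 215, 216, 240, 255] * 2
brighter = paddusb(pixels, [40] * 16)
print(brighter[:8])  # [40, 56, 140, 240, 255, 255, 255, 255]
```

Saturation is what makes this useful for pixels: a plain 8-bit add would wrap 216 + 40 around to 0, turning a bright pixel black.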
/////// Still waiting for AMD's Bulldozer to see what it brings to the application party.
Great for multithreaded workloads, but for single threads the narrow cores don't do very well. =)
The 8-cores are targeted at people who run heavily threaded workloads or keep multiple apps open at the same time.
People who use lightly threaded apps, or use only one app at a time on the desktop, should be buying faster dual/quad-cores anyway.
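The trade-off between core count and per-core speed can be roughed out with Amdahl's law. A minimal sketch, assuming two hypothetical chips (8 cores at relative speed 1.0 vs. 4 cores at 1.3) and hypothetical parallel fractions; these are not measured Bulldozer or Sandy Bridge figures.

```python
def amdahl_time(parallel_fraction, cores, per_core_speed=1.0):
    """Run time relative to a single core of speed 1.0 (Amdahl's law).

    The serial part runs on one core, the parallel part is split evenly
    across all cores, and a faster core shrinks both parts equally.
    """
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / cores) / per_core_speed

# Hypothetical chips: 8 cores at speed 1.0 vs. 4 cores that are each 30% faster.
for p in (0.5, 0.95):
    eight = amdahl_time(p, cores=8)
    four = amdahl_time(p, cores=4, per_core_speed=1.3)
    print(f"p={p}: 8 cores -> {eight:.3f}, 4 faster cores -> {four:.3f}")
# p=0.5 (lightly threaded): the 4 faster cores finish first (~0.48 vs ~0.56).
# p=0.95 (heavily threaded): the 8 cores finish first (~0.17 vs ~0.22).
```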
-MainConcept
http://www.lostcircuits.com/mambo//index.php?option=com_content&task=view&id=102&Itemid=1&limit=1&limitstart=17
-MediaShow
http://www.guru3d.com/article/amd-fx-8150-processor-review/14
-H.264
http://www.guru3d.com/article/amd-fx-8150-processor-review/14
-VP8
http://www.guru3d.com/article/amd-fx-8150-processor-review/17
-SHA1
http://www.guru3d.com/article/amd-fx-8150-processor-review/17
-Photoshop CS5
http://www.lostcircuits.com/mambo//index.php?option=com_content&task=view&id=102&Itemid=1&limit=1&limitstart=14
-Photoshop CS5
http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-15.html
-WinRAR, faster than the 2600K
http://www.techspot.com/review/452-amd-bulldozer-fx-cpus/page7.html
-WinRAR, improves over the X6
http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-16.html
-7-Zip, better than the 2600K here:
http://images.anandtech.com/graphs/graph4955/41698.png http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/7
-7-Zip, same performance as the 2600K
http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-16.html
-POV-Ray, faster than the 2600K
http://www.legitreviews.com/article/1741/10/
-POV-Ray
http://www.nordichardware.se/test-lab-cpu-chipset/44360-amd-fx-8150-bulldozer-goer-entre-pa-marknaden-test.html?start=15#content
-x264 (2nd pass, AVX enabled)
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/7
-x264 (2nd pass, better overall than the 2600K)
http://www.bjorn3d.com/read.php?cID=2125&pageID=11108
-x264 (2nd pass, +0.3 over the SB 2600K)
http://www.legitreviews.com/article/1741/7/
-HandBrake
http://www.legitreviews.com/article/1741/9/
-TrueCrypt
http://www.bjorn3d.com/read.php?cID=2125&pageID=11111
-SolidWorks, faster than the 2600K
http://www.techspot.com/review/452-amd-bulldozer-fx-cpus/page7.html
-ABBYY FineReader
http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-16.html
-C-Ray, as fast as the $1k i7-990X
http://i664.photobucket.com/albums/vv4/wuttzi/c-rayir38.png