Diit.cz - News and information about hardware, software and the internet

AMD Bulldozer in a detailed test by SiSoftware

8-Core Bulldozer CPU
People at SiSoft inadvertently published test results for an AMD FX processor with four Bulldozer modules, that is, eight cores. The results have since disappeared from the web, but Google Cache captured them in time, and we decided to show you the whole article as it appeared on 3 September (in case it disappears from Google Cache as well)…

We will of course borrow the tables, and we cannot skip the very first one, which makes clear what was compared: a desktop Zambezi at 2.8 to 3.8 GHz against a desktop Sandy Bridge at 3.0 to 3.6 GHz.

Specs.
CPU | AMD "Zambezi" | Intel "Sandy Bridge"
Speed / Turbo | 2800MHz (14x200) / 3800MHz (19x200) | 3000MHz (30x100) / 3600MHz (36x100)
CU / Cores / Threads | 4CU / 8C / 8T | 4C / 8T
Caches L1/L2/L3 | 8x16kB / 4x2MB / 8MB | 4x32kB / 4x256kB / 6MB
Power (TDP) | 95W | 95W
Cost (USD) | |
Memory | 2x DDR3 PC3-10700 | 2x DDR3 PC3-10700
Speed/Timing | 1333MHz 9-9-9-24 4-33-10-5 | 1333MHz 9-9-9-25 4-34-10-5
Memory Controller Speed | 2200MHz (11x200) | 3000MHz (30x100)
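
The clock figures in the specs table are each a multiplier times a reference clock. As a minimal sketch (the function and dictionary names are illustrative assumptions, not anything from SiSoftware), the printed values can be checked like this:

```python
# Effective clock = multiplier x reference clock, using the values printed in the specs table.
# The function and dictionary names are hypothetical; only the numbers come from the table.

def effective_clock_mhz(multiplier: int, reference_mhz: int) -> int:
    """Return the effective clock in MHz for a given multiplier and reference clock."""
    return multiplier * reference_mhz

clocks = {
    "Zambezi base": effective_clock_mhz(14, 200),
    "Zambezi turbo": effective_clock_mhz(19, 200),
    "Zambezi memory controller": effective_clock_mhz(11, 200),
    "Sandy Bridge base": effective_clock_mhz(30, 100),
    "Sandy Bridge turbo": effective_clock_mhz(36, 100),
}

for name, mhz in clocks.items():
    print(f"{name}: {mhz} MHz")
```

All five products agree with the MHz values printed next to the multipliers, so the table is at least internally consistent on this point.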

We will pretend not to see the typo in Zambezi's maximum clock; if it read 3.7 GHz instead of 3.8 GHz, this would be the FX-8100, the slowest planned eight-core built on the Bulldozer microarchitecture. The Intel Sandy Bridge as presented (with the listed clocks) does not exist: Intel has no Sandy Bridge part with a base clock of exactly 3 GHz that reaches 3.6 GHz in Turbo. In theory it could be, for example, a Xeon E3-1235 with its base clock lowered from 3.2 to 3.0 GHz and Turbo left at 3.6 GHz. Or it may be some similar equivalent with modified multipliers. Such modifications strike us as silly, though, because the test is then not run against real products. It also crosses our mind that this is a joke by someone at SiSoft, and that the whole thing may be one big fake.

Nevertheless, look through the results in the original tables, which you will find below, together with the accompanying text in its original wording, as a screenshot for download.

CPU Processing Benchmarks | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
Native Dhrystone/Whetstone (GOPS) | 54.64 (34% lower) | 85.15 (baseline) | The native CPU benchmark reveals that Intel has done a very good job with "Sandy Bridge", and it is hard for AMD, even with a fresh design, to keep up with its very latest. Had it come earlier, it could have defeated the older "Lynnfield/Nehalem" (Core gen 1).
Java Dhrystone/Whetstone (GOPS) | 37 (49% lower) | 72.7 (baseline) | Java VM is for sure not Zambezi's strong point, as it scores 50% lower. Considering Sun Microsystems (now part of Oracle) sold AMD Opteron servers, you would expect the JVM x64 to be tuned for AMD.
.Net Dhrystone/Whetstone (GOPS) | 17.66 (33% lower) | 26.55 (baseline) | In the .Net environment, AMD catches up a little bit, but is still not up to par.
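
The "% lower" and "% higher" figures in these tables are differences relative to the Sandy Bridge baseline. A minimal sketch of that calculation, assuming the usual (value - baseline) / baseline definition (the function name is hypothetical):

```python
# Signed percentage difference of a score against the baseline score.
# Assumption: the tables use (value - baseline) / baseline; SiSoftware does not state its formula.

def relative_to_baseline(value: float, baseline: float) -> float:
    """Return how far value is above (+) or below (-) baseline, in percent."""
    return (value - baseline) / baseline * 100.0

java = relative_to_baseline(37, 72.7)          # Java Dhrystone/Whetstone row
dotnet = relative_to_baseline(17.66, 26.55)    # .Net Dhrystone/Whetstone row
native = relative_to_baseline(54.64, 85.15)    # Native Dhrystone/Whetstone row

print(f"Java:   {java:.1f}%")
print(f".Net:   {dotnet:.1f}%")
print(f"Native: {native:.1f}%")
```

The Java and .Net rows reproduce the printed 49% and 33%. The native row works out to roughly 36% from the two printed scores rather than the printed 34%, so one of those figures may have been transcribed inexactly.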

Multi-Media Benchmarks | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
CPU Multi-Media SSE/128-bit (Mpix/s) | 132.3 (15% lower) | 155 | AMD CPUs were always good performers in Multi-Media tasks and this is a respectable result. We hope that the gap will not be wider when "Ivy Bridge" launches (Core gen 3).
CPU Multi-Media AVX/256-bit (Mpix/s) | 147.4 (33% lower) | 217.5 (baseline) | Using AVX, "Sandy Bridge" is 50% faster than with SSE(x), while Zambezi does not improve much (most likely due to the shared FPU within CUs); the performance gap has thus doubled. At least you don't need to pay to upgrade your software to AVX.
GPU Multi-Media (Mpix/s) | - | 20.13 | A pretty woeful result from Sandy Bridge's so-called GPU; any GPU AMD will include in the APU version should blow this away.
GPCPU OpenCL (Mpix/s) | 46 (17% lower) | 55.6 (baseline) | A decent score coming from the Zambezi chip; however, a difference that cannot be ignored remains between the two.
Java Multi-Media (Mpix/s) | 22.86 (10% lower) | 25.32 (baseline) | Java VM is a Multi-Media scenario that's quite favourable to the AMD CPU, and with optimisations it could overtake its competition.
.Net Multi-Media (Mpix/s) | 15.73 (22% lower) | 20.15 (baseline) | The .Net environment is not as kind to the AMD CPU; we're seeing twice the gap of Java.

Cryptography Benchmarks | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
Crypto (MB/s) | 925 (16% higher) | 797 (baseline) | In ALU mode (no AES and no AVX) the AMD chip scores its first win by a good margin (16% faster)! Finally the "true core" design shines through!
Crypto AES/AVX (MB/s) | 1277 (44% lower) | 2270 (baseline) | Despite having both AES and AVX support, Zambezi is almost half as fast; looking at the individual results, it seems its AVX hashing performance is the issue (337MB/s vs. 943MB/s), again most likely due to its shared FPU design.
GPGPU Crypto (MB/s) | - | 417 (baseline) | The integrated GPU in Sandy Bridge just manages 50% of non-accelerated CPU performance and 20% of accelerated AES/AVX performance. Why bother?
GPCPU OpenCL Crypto (MB/s) | 583 (18% higher) | 494 (baseline) | In OpenCL AMD takes the lead by almost 20%, though to be fair it is AMD's OpenCL run-time; they are not going to optimise for the competition.

Memory Bandwidth Benchmarks | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
Memory Bandwidth (GB/s) | 15.33 (13% lower) | 17.58 (baseline) | While the memory controller in the AMD CPU runs 36% slower, its bandwidth is only ~10% lower, thus it looks to be more efficient. If only it could squeeze more bandwidth out of the memory, but at least it does better than previous Core CPUs.
Memory Bandwidth AVX | 15.41 (12% lower) | 17.56 (baseline) | With AVX, Zambezi improves a bit, though not enough to be noticeable.
Memory Bandwidth AVX, Internal GPU enabled | - | 17.3 (baseline) | Unlike many designs, Sandy Bridge does not lose performance when the GPU is enabled. We shall have to see how the AMD APU behaves.
GPCPU OpenCL Memory Bandwidth | 8.44 (34% lower) | 12.73 (baseline) | Even though we're using AMD's own OpenCL runtime, memory transfers are 35% slower, 3x the gap we saw in native memory bandwidth benchmarks. It seems AMD's OpenCL team has some optimisations to make for Zambezi!

Transcoding Benchmarks | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
CPU Transcode (kB/s) | 657 (22% lower) | 837 (baseline) | Even with its 8 real cores and improved multi-threading, Zambezi still cannot match Sandy Bridge even in software mode.
GPU Transcode (kB/s) | - | 4800 (baseline) | It is not fair to compare the transcoding capabilities found in a modern GPU with those of a CPU, but we cannot help it: Intel's QuickSync is a "killer feature" even for high-end CPUs. You'd need 6 times more cores (24 = 4x6) to match the performance in software! Zambezi would need ~8 times more cores (64 = 8x8), so keep an eye out for that 8-way Opteron system.
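
The core counts quoted in the QuickSync comment follow from dividing the GPU transcode rate by each CPU's software rate and rounding up to whole chips. The sketch below assumes perfectly linear scaling across chips, which real transcoding would not achieve, and all names in it are hypothetical:

```python
import math

# Rates in kB/s as printed in the transcoding table.
QUICKSYNC_KBPS = 4800
SANDY_BRIDGE_CPU_KBPS = 837   # 4 cores
ZAMBEZI_CPU_KBPS = 657        # 8 cores

def chips_to_match(target_rate: float, chip_rate: float) -> int:
    """Whole chips needed to reach target_rate, assuming perfectly linear scaling."""
    return math.ceil(target_rate / chip_rate)

sandy_chips = chips_to_match(QUICKSYNC_KBPS, SANDY_BRIDGE_CPU_KBPS)
zambezi_chips = chips_to_match(QUICKSYNC_KBPS, ZAMBEZI_CPU_KBPS)

print(f"Sandy Bridge: {sandy_chips} chips, {sandy_chips * 4} cores")
print(f"Zambezi:      {zambezi_chips} chips, {zambezi_chips * 8} cores")
```

The raw ratios are about 5.7 and 7.3, so the comment's 6x and ~8x are those ratios rounded up to whole chips.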

Cache and Memory Benchmarks | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
Cache & Memory SSE-128 (GB/s) | 66.45 (25% lower) | 88.7 (baseline) | The new approach with the shared L2 cache in the Zambezi may not be the best given the bandwidth measured.
Cache & Memory AVX-256 (GB/s) | 69.7 (25% lower) | 93.16 (baseline) | We see the same difference in bandwidth when the AVX instructions are used; both systems improve marginally by using AVX.
Cache & Memory int. GPU AVX-256 (GB/s) | - | 88.63 (baseline) | It remains to be seen how much the APU version of Zambezi is affected when the internal GPU is enabled; Sandy Bridge is not affected much, unlike older systems with shared graphics.

Latency Benchmarks | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
Linear (ns) | 12.3 (73% higher) | 7.1 (baseline) | Zambezi's result is not satisfactory in latency terms: another line in the list of improvements that need to be made in future revisions, or perhaps a BIOS optimisation is enough.
Random (ns) | 98.4 (34% higher) | 73.5 (baseline) | The random latency test fares better for AMD, but it is still too high considering we use the same memory speed/timings. The memory controller needs some optimisations to be competitive.

Power Efficiency (this measures the efficiency of power design, or TDP) | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
Standard Performance vs. Power | 54.64GOPS, 95W, 59.52MOPS/W (34% lower) | 85.15GOPS, 95W, 89.63MOPS/W | The performance gap in the native processing benchmark is reflected here, as both chips have the same TDP, with Zambezi fielding 4 CUs with 8 cores against Sandy Bridge's 4 cores and 8 threads. At least Zambezi is more efficient than previous Core CPUs ("Nehalem"/"Westmere"; these are ~40-50% less efficient than "Sandy Bridge", while Zambezi is "only" 34% less efficient).
Multi-Media Performance vs. Power | 147Mpix/s, 95W, 1.55Mpix/W (32% lower) | 217Mpix/s, 95W, 2.28Mpix/W | Similar difference in AVX mode, but in SSE(x)/128-bit mode it's "only" 15% less efficient, so for current and older software the gap is not significant. It also beats previous Core CPUs, as they don't support AVX anyway.
Cryptographic Performance vs. Power | 1277MB/s, 95W, 13.4MB/W (44% lower) | 2270MB/s, 95W, 23.9MB/W | The highest difference yet, even though Zambezi has 8 real cores, not 4 like Sandy Bridge (and AES & AVX support).
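
The per-watt columns are the benchmark score divided by the TDP. A minimal sketch using the multi-media and cryptography rows (the function and constant names are illustrative assumptions):

```python
# Performance per watt = benchmark score / TDP, using figures printed in the table.
TDP_W = 95  # both chips are listed at 95 W

def per_watt(score: float, tdp_w: float = TDP_W) -> float:
    """Return the score normalised by TDP."""
    return score / tdp_w

def deficit_percent(value: float, baseline: float) -> float:
    """Return how much lower value is than baseline, in percent."""
    return (1.0 - value / baseline) * 100.0

mm_amd, mm_intel = per_watt(147), per_watt(217)            # Mpix/s per W
crypto_amd, crypto_intel = per_watt(1277), per_watt(2270)  # MB/s per W

print(f"Multi-Media: {mm_amd:.2f} vs {mm_intel:.2f} ({deficit_percent(mm_amd, mm_intel):.0f}% lower)")
print(f"Crypto:      {crypto_amd:.1f} vs {crypto_intel:.1f} ({deficit_percent(crypto_amd, crypto_intel):.0f}% lower)")
```

Because both chips share the same 95 W TDP, dividing by it cannot change the relative gap, which is why these percentages mirror the raw benchmark results.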

Speed Efficiency (how performance scales with speed and how they perform at the same speed) | AMD "Zambezi" | Intel "Sandy Bridge" | Comments
Standard Performance vs. Speed | 54.64GOPS, 2800MHz, 2.02MOPS/MHz (29% lower) | 85.15GOPS, 3000MHz, 2.84MOPS/MHz | Similar to the power efficiency, Zambezi is 30% less efficient clock for clock; this means it is less efficient than even previous Core CPUs (which are only ~10-15% less efficient than Sandy Bridge).
Multi-Media Performance vs. Speed | 147Mpix/s, 2800MHz, 53kpix/MHz (26% lower) | 217Mpix/s, 3000MHz, 72kpix/MHz | Similar difference to the power efficiencies: Zambezi is 26% less efficient in AVX mode clock for clock.
Cryptographic Performance vs. Speed | 1277MB/s, 2800MHz, 0.45MB/MHz (40% lower) | 2270MB/s, 3000MHz, 0.75MB/MHz | The highest difference of all tests; 40% less is pretty significant clock for clock.

Conclusion

The conclusion of the people who perpetrated this test is that, judging by the results, AMD is a whole generation behind Intel. It is better than K8 once was, and in some tests even better than the competition (Cryptography, Multi-Media), but 256-bit performance (AVX/FMA), for instance, is simply slow.

We would only add that these are synthetic tests and real-world use may look different (better, but also worse). We would also rather see a comparison of the fastest processor that will (hopefully) be launched soon (FX-8150 at 3.6 to 4.2 GHz) with Intel's fastest mainstream part, the Core i7-2600K at 3.4 to 3.8 GHz.

WIFT

A former long-time editor of the internet magazine CDR-Server / Deep in IT, who wrote articles about IT and related topics almost from the founding of CD-R server. Since 2014 he has effectively given up writing articles.
