AMD Bulldozer - FX 8150 Performance Review

Product: AMD FX 8150 / Asus Crosshair V
Company: AMD
Author: James Prior
Editor: Charles Oliver
Date: October 11th, 2011

Product Strategy

The overall impression from AMD's Zambezi tech day was that the Bulldozer microarchitecture was designed for low power, high scalability and high performance/watt. Those are the priorities of the x86 server market, where for the last decade the more CPUs in a box, the better; higher density and efficiency meant more performance per dollar and per watt. Pretty soon we saw 8-socket servers, and environments that leveraged them. On the desktop, this cycle took longer, waiting for the multi-core revolution to really kick off, but pretty soon one CPU core became two, became hyperthreaded, became four and then six.

With the design team focused on a scalable design for throughput, with a server workload bias and an all-encompassing performance-per-watt mantra, it's no wonder that the resulting product is very wide - eight x86 cores, with lots of cache: 2MB L2 per module and 8MB of L3 shared between all four modules. A Bulldozer module consists of two integer thread processors, a separate floating point unit shared between them, and instruction decode, scheduling and prediction capabilities.
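To put those figures side by side, here is a minimal sketch of the arithmetic, using only the numbers quoted above (four modules, two integer cores and 2MB of L2 per module, 8MB of shared L3). The constant and function names are ours, purely for illustration, and L1 caches are left out because the text doesn't give their sizes.

```python
# Figures quoted in the review for the eight-core FX part.
MODULES = 4
CORES_PER_MODULE = 2      # two integer cores per Bulldozer module
L2_PER_MODULE_MB = 2      # L2 is shared within a module
L3_SHARED_MB = 8          # L3 is shared between all four modules


def cache_summary():
    """Return (core count, total L2 in MB, combined L2 + L3 in MB)."""
    cores = MODULES * CORES_PER_MODULE
    total_l2 = MODULES * L2_PER_MODULE_MB
    return cores, total_l2, total_l2 + L3_SHARED_MB


cores, total_l2, l2_plus_l3 = cache_summary()
print(f"{cores} cores, {total_l2}MB L2, {l2_plus_l3}MB L2+L3")
```

In other words, the L2 across all modules adds up to as much as the shared L3, which is part of why the design reads as "wide".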

AMD's focus on performance/watt for the CPU design mirrors the shift in focus we saw for GPUs. In 2009, AMD was careful to point out how the Evergreen and Cypress designs were both good gaming products and useful compute products. Nearly two years later, AMD's flagship professional and server products based on GPU technology all use Cypress as their core, as AMD attempts to gain market share in the parallel compute market. Bulldozer is more of the same strategy, designed to combine with a discrete-class or better GPU, either as an APU (Trinity) or as a platform design in servers and workstations.

Cypress, and its successor Cayman, turned out to be great enthusiast consumer products. They weren't the fastest ever, as NVIDIA's GF100 and GF110 outperformed them, but the AMD cards were priced appropriately and, more importantly, lower. Performance/watt and performance/$ were clear wins for the Markham boys. The problem is that the enthusiast consumer market is shrinking, becoming more and more of a niche. As the market shrinks, so does AMD's incentive to design, build and market a product specifically for that segment. For GPUs, the requirements for compute and gaming are pretty close and are actually converging, as new techniques and API functions that leverage that compute power aim to make software development and design easier and more consistent. A design that can be executed successfully in the compute and professional markets works well in the consumer space too, with minimal changes. There are obvious concessions to gaming needs, like GCN's inclusion of fixed function hardware for rasterization operations, texture operations, etc.

For the CPU market, there are differences between what is useful for a server and the popular desktop PC use model, and also between the normal desktop and the enthusiast consumer platform and workstation. Those differing needs would call for three designs, where today AMD are offering two - the APU platform, and the new Bulldozer-based FX platform. For the server side, drop-in upgrades on verified platforms are the order of the day (with a BIOS update, naturally). This isn't really for the end-purchaser's benefit, but more for the OEM and system builder partners, who can now validate and specify Bulldozer-based products far quicker than if they were testing a whole new platform.

AMD considers the Bobcat architecture to consist of 'small cores', and Bulldozer to be 'big cores'. Server workloads are either megathreaded or megatasking, and need more 'big cores' offering integer performance and throughput in a fixed TDP. Interlagos, the Opteron with a multi-chip module design (two Orochi dies on a single package with quad memory controllers), has that covered, in spades. It is debatable, though, how 'big' those 'big cores' really are, considering that improving IPC wasn't the highest priority, even if IPC is up versus STARS/K10.5 and the shared functionality does a good job of reducing die size and thus power use.

For the consumer side, the 8-series chipset was re-launched with a new CPU socket, as the 9-series boards featuring the AM3+ socket. The socket is backwards compatible, meaning existing AM3 processors will work in it, but AM3 is not forward compatible - this is an electrical upgrade offering support for the new power states and requirements of Zambezi products. The requirements of consumer enthusiast platforms are less about scalability and more about n threading (where n is between 2 and 4), and single task performance. There are notable cases for more cores, like video editing/transcoding, image processing, compression and encryption, but the problem is that these functions tend to be more efficiently and swiftly accelerated by GPU compute instead, so that future problem is being addressed with AMD's Fusion System Architecture.

AMD's concession to the consumer market is clock speed, and lots of it. The outgoing Phenom II X4 980 is clocked at 3.7GHz, and the Phenom II X6 1100T at 3.3GHz with a 3.7GHz Turbo mode. The top FX processor runs at 3.6GHz on all 8 cores, except when it doesn't and runs at 3.9GHz courtesy of the first stage of Turbo boost. The second stage takes it to 4.2GHz on four cores (any four, regardless of module). Introduced at ~$245 USD, this is aimed squarely at the top end of the market for enthusiast CPUs.
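For a sense of scale, the sketch below works out how much each Turbo stage adds over the FX 8150's base clock, using only the clocks quoted above. The names are illustrative, and this is clock-speed arithmetic only; it says nothing about how real-world performance scales with those clocks.

```python
# Clock speeds quoted in the review for the top FX processor.
BASE_GHZ = 3.6
TURBO_STAGES_GHZ = {
    "Turbo stage 1 (all cores)": 3.9,
    "Turbo stage 2 (four cores)": 4.2,
}


def uplift_percent(base_ghz, boosted_ghz):
    """Percentage clock increase of boosted_ghz over base_ghz."""
    return (boosted_ghz / base_ghz - 1.0) * 100.0


for name, clock in TURBO_STAGES_GHZ.items():
    gain = uplift_percent(BASE_GHZ, clock)
    print(f"{name}: {clock}GHz, +{gain:.1f}% over base")
```

That works out to roughly 8.3% for the first stage and 16.7% for the second.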

Bulldozer will be part of the Fusion family soon - very soon if reports of a Q1 introduction of Trinity are correct - and this will address some of the concerns. Trinity is part of the Piledriver family, which is the next Bulldozer evolution - AMD is ramping up how quickly changes are rolled into designs and pushed into products. Each iteration is aimed to provide at least a 10% per-core performance increase, delivered through design and process improvements as well as clock speed. AMD have working Piledrivers in various forms, so they can be fairly confident in what they're saying, although we hope it won't take nearly as many stepping revisions to get final silicon as this first FX did (B0 samples were the first seen out in the wild, and AMD are launching with B2[G]). What we're hearing is that while Trinity is coming in 2012, featuring Piledriver modules, the FX replacement isn't coming until 2013; and when it does, it will have Intel's Haswell architecture to compete with.