AMD Hot Chips 2012

Company: AMD
Author: James Prior
Editor: Charles Oliver
Date: August 28th, 2012

Hot Chips 2012

AMD CTO Mark Papermaster is back in his old stomping grounds of Cupertino to deliver the opening keynote of the Hot Chips 2012 symposium, spilling the beans on what AMD is calling the Surround Computing Era. He is joined at Hot Chips by not one but four AMD Fellows: Mike Mantor, GCN Architect; Bryan Black, Jaguar core Architect; Jeff Rupley, Jaguar Architect; and Sebastien Nussbaum, Trinity SoC Architect.

The Surround Computing concept is simple: the premise is that computing is everywhere and pervasive throughout our lives and no longer a distinct or distant activity. We're not stuck next to power substations or chained to desks or even tethered to electrical outlets; we're carrying computing everywhere with us and expecting what we have experienced to be back at home or in the office when we get there, too. Computing is entering new form factors at a furious rate, and we want, expect and demand that it interface with our lives in a natural, human manner - a need for natural user interface paradigms that learn our way of communicating rather than constraining us within the boundaries of the technology. This requires more, much more, compute power.

The compute power is needed in two places: at the front-end natural UI level in your pocket or on your person, and in the amorphous cloud where things need to be processed to get to you. Data demand over the internet is exploding, and data platforms like Hadoop are changing what is stored from structured, relational data to unstructured, event-driven data that requires superior index and search capabilities to deliver on the promise of asking new questions and finding new information.

This analysis drives AMD to consider how the previous interactive computing revolution made possible by accelerated graphics will evolve into the interface revolution - client devices that interact with people in an increasingly natural manner. We're seeing innovations from vendors like Synaptics with new touch and gesture hardware, but the visual and audible aspects also need developing beyond Kinect, Move, Siri and Google Voice. This is the end to which the HSA Foundation was created, bringing the power to interpret natural human actions into something that can be assisted by computers to simplify life - context aware augmentation of daily tasks, from simple things like helping you stick to your diet to being a valuable resource in collating and sorting non-obviously related data. Fundamentally this drives AMD to consider where the computing enhancements will be needed, and it is in two places - the client and the cloud.

For the cloud, AMD is talking about their first evolution of the Bulldozer architecture, codenamed Piledriver. This architecture is already in the wild in the APU codenamed Trinity, where it sits alongside the VLIW4 architecture graphics first seen in 2010's Cayman graphics processor series. Because Piledriver is both the first revision of Bulldozer and a direct replacement for the STARS architecture in APUs, most focus is on the improvements from STARS to Piledriver in the APU space, but there are improvements from Bulldozer on the pure CPU side of things, namely in the instruction predictor and scheduler, as well as attention to the L2 cache efficiency. AMD are starting to address the shortfalls the enthusiast performance market found in the FX series of processors while continuing to address the improvements needed for the primary purposes of the architecture, cloud computing and APU integration.

After Piledriver comes Steamroller, the third generation of the Bulldozer architecture and somewhat of a deviation for AMD design. While the focus is still on performance per watt and computational efficiency, the Steamroller design moves away from hand-placed layout, manually optimized for density and speed, to a trick from the GPU design team - placement via a high density library. Process tools are becoming more and more important as density increases, and the new high density library design of Steamroller offers 15%-30% area and power reductions, the type of gains normally seen with a process node change. This manifests itself as lower energy per operation for power constrained devices - i.e. HSA platform SoCs, a critical market. It also allows tuning for performance: AMD is now claiming a 30% improvement in operations per cycle and 'no compromises' for two thread performance inside a module.

When AMD bought ATI they wanted to bring the GPU into the CPU and make it a peer level device, recognizing its importance and key place in the whole system and leveraging it for more than vector graphics. As this process begins with HSA, there's no reason to think that GPUs are the only devices that need to be included and elevated to this position of equality; indeed, the foundation for truly super computing is constrained by how we describe our data, our problems, to machines and how they work together to give us more answers and a new set of questions to ponder. To that end AMD are talking about die stacking and how different process technologies suit different aspects of an SoC better than others.

Large monolithic designs with high powered mainboards may have provided high performance in the past, but at high cost. As the data becomes less structured and the questions asked of computing get broader, the compute needs become more parallel. Making CPUs into HSA platforms with high bandwidth interconnects is the next step, and AMD are there with the oddly enthusiastically named Freedom Fabric. This off-chip interconnect allows massive numbers of SoC modules to share I/O and aggregate bandwidth for the benefit of all, and gives a hint of the direction AMD are heading in.

While FX may not be the processor choice for desktop enthusiasts, the Bulldozer based Opterons have enjoyed considerable success and there's no reason to suggest that this year's Piledriver revision and next year's Steamroller won't capitalize on that. Trinity has already startled the world with its performance against the behemoth Intel, and the evolution into Steamroller with (presumably) GCN architecture graphics integrated even closer to the CPU can only get better. The second generation 'small' core APUs are based on the Jaguar architecture, which gets 28nm and the ability to be a quad core design, with added instruction set support and architecture improvements delivered in a smaller package at lower or similar power. Designs using Jaguar cores have the unenviable task of improving on AMD's most successful product to date, and also take AMD into new markets. AMD claims a typical IPC improvement of over 15% for Jaguar vs. Bobcat, and a >10% frequency improvement, too. That's a healthy bump considering the move to four cores from two in Brazos, and the new chip reduces space and power too. Without knowing details of the graphics architecture paired with the little kitten that could, it's hard to say how this will play out, but hopefully AMD is going to push for the most recent graphics and latest HSA features to make it the most attractive APU yet.
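
To put the Jaguar claims in perspective, here is a rough back-of-the-envelope sketch of how the IPC and frequency figures compound. It assumes per-core performance scales as IPC times clock speed and treats the quoted figures (over 15% and over 10%) as lower bounds; the function name and the idealized doubling from two cores to four are illustrative assumptions, not anything AMD has stated.

```python
def combined_speedup(ipc_gain: float, freq_gain: float) -> float:
    """Per-core throughput multiplier when IPC and clock both rise.

    Assumes performance scales as IPC x frequency, so the gains multiply.
    """
    return (1.0 + ipc_gain) * (1.0 + freq_gain)


# Lower bounds quoted in the article: >15% IPC, >10% frequency.
per_core = combined_speedup(0.15, 0.10)

# Hypothetical best case for a perfectly parallel workload when going
# from two cores (Brazos) to four; real-world scaling will be lower.
core_ratio = 4 / 2
ideal_chip = per_core * core_ratio

print(f"per-core: {per_core:.3f}x, ideal four-core vs two-core: {ideal_chip:.2f}x")
```

Under those assumptions the two gains multiply rather than add, so the per-core uplift is a little better than 25%, and only a workload that scales perfectly across cores would see roughly double that again.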

The fundamental approach here is to bring smart designs for low power and high efficiency. As a performance desktop enthusiast it's hard to not be discouraged by statements like 'pure speeds and feeds race is over' - FX wasn't what many desired in the competition against Sandy Bridge or Ivy Bridge. But the correct focus is on what those designs are making possible, and making that better. Better gaming, better video consumption, better interaction with computers is the end game for AMD. It just might not be delivered in the same way, like when PC desktops rose to gaming prominence over cartridge consoles, and how digital delivery is changing how we buy and enjoy music, TV, movies and games. With the flexible design strategy HSA offers, AMD may be in a unique position to provide the horsepower for many different types of device - tablets, ultra thin and light notebooks, desktop replacement notebooks, set top boxes, network security appliances, cloud infrastructure, smart TVs - next gen game consoles?
