AFDS 2012 Day 1

Company: AMD
Author: James Prior
Editor: Charles Oliver
Date: June 13th, 2012

AFDS Day 1: Monday, June 11th

Monday was arrival day, with a few interesting sessions happening while everyone checked in and got registration taken care of. I sat in on a session titled GPU Accelerated Rigid Body and Soft Body Game Physics, presented by Erwin Coumans, AMD's Bullet Physics Architect, and Takahiro Harada, also an AMD employee.

The session discussed and demonstrated how GPGPU acceleration of physics calculations can make a meaningful impact on game design. Where a CPU becomes bottlenecked at around 30,000 objects, a discrete GPU like the AMD Radeon HD 7970 can process 100,000 objects in around 10ms, which is less time than the CPU needs. The session covered how the Bullet physics library achieves this, along with how Heterogeneous System Architecture platforms (formerly known as Fusion) can speed things up by leveraging shared memory pointers. Letting the GPU and the CPU address the same memory space means that processing rigid bodies on the APU is now feasible, despite the large performance disparity between the APU's GPU and a discrete GPU like the 7970. Consider the latency of copying the data back and forth, before and after the discrete GPU does the physics calculation: in that same time you can run a decent number of objects through the APU graphics core and still be faster than the CPU alone, while freeing up the CPU to do other work at the same time.
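To make the copy-versus-compute trade-off concrete, here is a rough back-of-the-envelope sketch. Only the 100,000 object count comes from the session; the bytes per object, bus bandwidth, fixed transfer latency and APU per-object cost are all hypothetical placeholders, not measured figures.

```python
def round_trip_copy_ms(n_objects, bytes_per_object, bandwidth_gb_per_s, fixed_latency_ms):
    """Time to copy object data to a discrete GPU and back again, in ms."""
    payload_bytes = n_objects * bytes_per_object
    transfer_ms = payload_bytes / (bandwidth_gb_per_s * 1e9) * 1e3
    one_way_ms = fixed_latency_ms + transfer_ms
    return 2.0 * one_way_ms  # once before the physics step, once after


def objects_in_budget(budget_ms, ms_per_object):
    """How many objects a processor can simulate within a time budget."""
    return int(budget_ms // ms_per_object)


if __name__ == "__main__":
    # All values below except the object count are assumptions for illustration.
    copy_ms = round_trip_copy_ms(
        n_objects=100_000,        # object count quoted in the session
        bytes_per_object=64,      # hypothetical rigid body state size
        bandwidth_gb_per_s=8.0,   # hypothetical bus bandwidth
        fixed_latency_ms=0.05,    # hypothetical per-transfer overhead
    )
    apu_objects = objects_in_budget(copy_ms, ms_per_object=0.0005)  # hypothetical APU cost
    print(f"round-trip copy overhead: {copy_ms:.2f} ms")
    print(f"objects the APU could process in that time: {apu_objects}")
```

The point of the model is only that the copy overhead is time a shared-memory APU does not pay, so that time becomes compute budget; the real numbers depend entirely on the hardware and the data layout.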

The second part of the session discussed soft body physics, typically represented as cloth or water interactions. Previous cloth demos we've seen on Bullet have been basic, simple scenes. More complex scenes need more features, in particular more constraint types to help define mesh shapes. With this in mind, new versions of the Bullet API were introduced in the previous few months. A new approach to parallelism is also needed, as simple serial processes don't map well onto GPUs. Simply dispatching a kernel to solve each batch, a naive serial-to-parallel implementation, will thrash the caches and memory, causing a performance bottleneck and low efficiency, which runs counter to the purpose of using the GPU in the first place.
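The session did not spell out how batches are formed, so the following is a minimal sketch under one common assumption: a batch is a group of constraints that touch no vertex in common, so every constraint in the batch can be solved in parallel without write conflicts. The function name build_batches and the greedy strategy are illustrative, not Bullet's actual API.

```python
def build_batches(constraints):
    """Greedily group (vertex_a, vertex_b) constraints into conflict-free batches.

    Within one batch no vertex appears twice, so the constraints in a batch
    can be solved in parallel. Batches themselves must run one after another.
    """
    batches = []        # list of lists of constraints
    used_vertices = []  # used_vertices[i] = vertices touched by batches[i]
    for a, b in constraints:
        for batch, verts in zip(batches, used_vertices):
            if a not in verts and b not in verts:
                batch.append((a, b))
                verts.update((a, b))
                break
        else:
            # No existing batch can take this constraint; start a new one.
            batches.append([(a, b)])
            used_vertices.append({a, b})
    return batches


if __name__ == "__main__":
    # A strip of cloth reduced to a chain of five vertices.
    chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
    for i, batch in enumerate(build_batches(chain)):
        print(f"batch {i}: {batch}")
```

In the naive scheme described above, each of these batches would become its own kernel dispatch, and every dispatch would re-read vertex data from global memory.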

Instead, the technique demonstrated was to dispatch per soft body rather than per batch, and to batch inside the SIMD, reducing reads by using a single SIMD to solve one soft body's computation. On the GCN architecture this can leverage the local data share (LDS) for accumulation before passing on to constraint processing, solving the vertices using atomics and global memory. This serial-inside-parallel approach reduces power use and speeds up computation throughput, making it a more efficient method of solving the problem.
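The serial-inside-parallel idea can be sketched on the CPU as follows. This is a plain Python stand-in, not GPU code: each call to the hypothetical solve_soft_body plays the role of one work-group, the local copy of the positions stands in for on-chip local memory, and positions are one-dimensional to keep the example short. The distance constraint used here is an assumption chosen for illustration.

```python
def solve_soft_body(positions, batches, rest_length, iterations=10):
    """Solve one soft body the way a single work-group might.

    Batches are walked serially; constraints inside a batch are independent
    and would map onto SIMD lanes on a GPU.
    """
    pos = list(positions)  # local working copy, read from "global memory" once
    for _ in range(iterations):
        for batch in batches:      # serial across batches
            for a, b in batch:     # parallel-safe within a batch
                delta = pos[b] - pos[a]
                error = abs(delta) - rest_length
                direction = 1.0 if delta >= 0 else -1.0
                correction = 0.5 * error * direction
                pos[a] += correction
                pos[b] -= correction
    return pos  # written back once at the end


def solve_all(bodies, rest_length, iterations=10):
    """One 'dispatch': every soft body is handled by its own work-group."""
    return [solve_soft_body(p, b, rest_length, iterations) for p, b in bodies]


if __name__ == "__main__":
    bodies = [
        ([0.0, 2.0, 4.0], [[(0, 1)], [(1, 2)]]),
        ([0.0, 3.0], [[(0, 1)]]),
    ]
    for solved in solve_all(bodies, rest_length=1.0, iterations=50):
        print([round(x, 4) for x in solved])
```

The contrast with the naive scheme is where the loop over batches lives: here it sits inside the per-body work, so vertex data is loaded once per soft body instead of once per batch dispatch.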

On APUs this can be spread between the CPU and GPU, allowing the bounding volume hierarchy (needed for soft body collision detection) to be built on the CPU and then passed to the GPU, which processes the traversal for the vertices. On a traditional system with a CPU and discrete GPU, this approach kills performance because both the compute and the data have to move, but on the APU shared pointers allow the CPU and GPU to overlap and run concurrently, each processing its side of the work.
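As a rough illustration of the two halves of that split, here is a small bounding volume hierarchy sketch in Python, assuming 2D axis-aligned boxes and a simple median split. The build function is the part the session placed on the CPU and the query function is the per-vertex traversal placed on the GPU; here both run as ordinary Python, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Node:
    box: Box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    item: Optional[int] = None  # index of the original box, set on leaves only


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def build(boxes: List[Box], indices: Optional[List[int]] = None) -> Node:
    """Build step: recursive median split on box centers along x."""
    if indices is None:
        indices = list(range(len(boxes)))
    if not indices:
        raise ValueError("cannot build a hierarchy from zero boxes")
    if len(indices) == 1:
        return Node(box=boxes[indices[0]], item=indices[0])
    indices = sorted(indices, key=lambda i: boxes[i][0] + boxes[i][2])
    mid = len(indices) // 2
    left, right = build(boxes, indices[:mid]), build(boxes, indices[mid:])
    return Node(box=union(left.box, right.box), left=left, right=right)


def contains(box: Box, point: Tuple[float, float]) -> bool:
    return box[0] <= point[0] <= box[2] and box[1] <= point[1] <= box[3]


def query(root: Node, point: Tuple[float, float]) -> List[int]:
    """Traversal step: find every leaf box containing the vertex."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if not contains(node.box, point):
            continue  # prune the whole subtree
        if node.item is not None:
            hits.append(node.item)
        else:
            stack.extend((node.left, node.right))
    return hits


if __name__ == "__main__":
    boxes = [(0, 0, 1, 1), (2, 0, 3, 1), (0.5, 0.5, 2.5, 1.5)]
    tree = build(boxes)
    print(sorted(query(tree, (0.75, 0.75))))
```

The traversal uses an explicit stack rather than recursion, which is the usual shape for GPU-friendly tree walks; with shared pointers, the tree built on one side could in principle be walked on the other without a copy.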

HP Sleekbook with AMD 'Trinity' APU

After the session, the assembled press were brought together for a mini-briefing on what's coming over the next three days and given weird little pouches. Some kind of decentralized dance party is coming to Bellevue, and AMD is involved in getting people in the streets for party time. After the briefing, the experience zone was opened up, with iBuyPower desktop systems and HP notebooks on display, as well as stands from the main sponsors of AFDS: HP, Penguin Computing and MulticoreWare.