NAVI No Longer MCM

and the only trick is if this multi GPU approach is completely transparent to the O/S and developers

We had this.
It was called 'SLI with driver profiles'.

And seriously, did you just ignore the bit where I pointed out that games (and indeed other high performance apps) don't treat the CPU as some kind of black box and have some awareness of what they are running on?

What you are suggesting is the GPU version of telling every app they are running on a single core, single thread machine and hope some magic lets them scale up via the OS or hardware just 'rescheduling work' behind the scenes...
 
At least the bandwidth issue is easily solved with HBM, since the memory bus runs inside the GPU package rather than across the video card PCB, so it can be made as wide as it needs to be. Current Vega chips use an HBM bus 1024 bits wide for each memory stack, which would be undoable on a PCB in the traditional way, and the memory runs at a pretty low clock speed, so the next generation coming up next year passing the 1 TB/sec mark wouldn't be the least bit surprising, and it doesn't even require a new memory type either. It's all in how wide the memory bus is.
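To put rough numbers on that "just make it wider" claim, here is a quick back-of-the-envelope sketch. The per-pin rate and stack width are approximate Vega 64 HBM2 figures; the 4-stack case is only the hypothetical doubling, not an announced part:

```cpp
// Back-of-the-envelope HBM bandwidth: bus width x per-pin rate.
// ~1.89 Gbps/pin and 1024 bits/stack are approximate Vega 64 HBM2 figures;
// the 4-stack case is the hypothetical "twice as wide" bus.
#include <cstdio>

int main() {
    const double pin_rate_gbps = 1.89;   // per-pin data rate, Gbit/s (approx.)
    const int bits_per_stack   = 1024;   // HBM interface width per stack

    const int stack_counts[] = {2, 4};   // Vega-style 2 stacks vs. a doubled 4-stack setup
    for (int stacks : stack_counts) {
        int bus_bits    = stacks * bits_per_stack;
        double gbytes_s = bus_bits * pin_rate_gbps / 8.0;
        std::printf("%d stacks -> %4d-bit bus -> ~%.0f GB/s\n", stacks, bus_bits, gbytes_s);
    }
    // Prints roughly 484 GB/s and 968 GB/s: doubling the width alone lands near 1 TB/s.
    return 0;
}
```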

Awwww, bless, you think there is such a thing as 'enough bandwidth', how adorable :heart:

You put a 1TB/s bus on a single GPU and someone will find a way to use it to increase the quality of what is being rendered - that's basically the history of graphics development right there.

It's also not just about the total bandwidth, but about usage - you don't want one GPU to be doing a heavy-bandwidth process at the same time as the other GPU, so you start going down the route of cross-GPU syncs/stalls and communications, which brings a whole host of New **** That Can Go Wrong.

And this is without taking into account the overhead of all the new signaling and various other factors, which means that as soon as you start hammering the bandwidth from two locations in a non-coherent fashion, your overall bandwidth availability drops faster than you expect. (Early PS4 docs, for example, had some bandwidth numbers for when the GPU and CPU were both touching the bus at the same time... the fall-off was non-linear. Two GPUs only make the problem worse.)

Frankly, trying to say 'Oh, they can do it on a CPU, so a GPU is easy...' is naïve at best, utterly foolish at worst...
 
We had this.
It was called 'SLI with driver profiles'.

And seriously, did you just ignore the bit where I pointed out that games (and indeed other high performance apps) don't treat the CPU as some kind of black box and have some awareness of what they are running on?

What you are suggesting is the GPU version of telling every app they are running on a single core, single thread machine and hope some magic lets them scale up via the OS or hardware just 'rescheduling work' behind the scenes...



And we seem to have multi-GPU support in the latest version of DirectX 12, so driver profiles for either SLI or Crossfire are no longer needed, which seems to suggest that hardware resource management is handled directly by the API with no driver help required any longer.

Awwww, bless, you think there is such a thing as 'enough bandwidth', how adorable :heart:

You put a 1TB/s bus on a single GPU and someone will find a way to use it to increase the quality of what is being rendered - that's basically the history of graphics development right there.

It's also not just about the total bandwidth, but about usage - you don't want one GPU to be doing a heavy-bandwidth process at the same time as the other GPU, so you start going down the route of cross-GPU syncs/stalls and communications, which brings a whole host of New **** That Can Go Wrong.

And this is without taking into account the overhead of all the new signaling and various other factors, which means that as soon as you start hammering the bandwidth from two locations in a non-coherent fashion, your overall bandwidth availability drops faster than you expect. (Early PS4 docs, for example, had some bandwidth numbers for when the GPU and CPU were both touching the bus at the same time... the fall-off was non-linear. Two GPUs only make the problem worse.)

Frankly, trying to say 'Oh, they can do it on a CPU, so a GPU is easy...' is naïve at best, utterly foolish at worst...


I never said that. I said that whatever bandwidth is required is easily achieved through HBM simply by making the bus as wide as it needs to be, and my quoting the 1 TB figure was simply to keep it simple: make the bus twice as wide and all the other variables stay the same. There's no major re-engineering involved.


I'd have a hard time believing that, if we took the GDDR6 route, they could go from the usual 384-bit memory bus a GTX 1080 Ti (or other previous high-end Nvidia cards) uses to a 768-bit bus to double the overall bandwidth, and somehow find room for 24 GDDR6 memory modules in the process on a card that has to stay the same physical size to be PCI-e certified.


See, just like GDDR5, GDDR6 uses a 32-bit memory interface for each individual memory module, so 24 of them would need to be fitted to the PCB, powered up, and have traces to the GPU of exactly the same length so the timing stays identical. Hence HBM not only makes that connection within the GPU package, which gets the modules as physically close to the GPU as possible with a bus as wide as needed, but also stacks the memory dies vertically like a high-rise building. Kills three birds with one stone.
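To make the numbers concrete, a quick sketch of that package-count arithmetic (the 14 Gbps/pin rate is an assumed early-GDDR6 speed, not a figure from the cards above):

```cpp
// Package-count arithmetic for the 768-bit GDDR6 thought experiment.
// The 14 Gbps/pin figure is an assumed early-GDDR6 data rate.
#include <cstdio>

int main() {
    const int bits_per_package = 32;     // GDDR5/GDDR6 interface width per memory package
    const double pin_rate_gbps = 14.0;   // assumed per-pin data rate, Gbit/s

    const int bus_widths[] = {384, 768}; // 1080 Ti-class bus vs. the doubled hypothetical
    for (int bus_bits : bus_widths) {
        int packages    = bus_bits / bits_per_package;
        double gbytes_s = bus_bits * pin_rate_gbps / 8.0;
        std::printf("%d-bit bus -> %2d packages on the PCB -> ~%.0f GB/s\n",
                    bus_bits, packages, gbytes_s);
    }
    // 24 discrete packages all need power and length-matched traces to the GPU,
    // which is exactly the routing problem HBM sidesteps by moving onto the interposer.
    return 0;
}
```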


As for the last part: given that CPUs run hundreds of different programs that behave differently in every case, and they went MCM with it, going the same route with a GPU, where the primary goal is running graphics and supporting the graphics features of DX12 and Vulkan, doesn't sound like a big deal when you look at the big picture.
 
And we seem to have multi-GPU support in the latest version of DirectX 12, so driver profiles for either SLI or Crossfire are no longer needed, which seems to suggest that hardware resource management is handled directly by the API with no driver help required any longer.

Yes, I know, and it does this by not hiding anything behind any kind of abstraction and letting you query the system for all the hardware available, meaning it isn't "completely transparent to the O/S and developers".

Which really gives you the two ends:
- multiple bits of hardware hidden = 'profiles'
- everything can be queried = requires dev support

There is no magic here.
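For anyone wondering what 'everything can be queried' actually looks like, here's a minimal sketch of the DXGI/D3D12 enumeration path (standard Windows SDK calls; build against dxgi.lib and d3d12.lib). The point is that the app, not a driver profile, walks the adapter list and decides what to do with each GPU:

```cpp
// Minimal sketch: enumerate the adapters D3D12 exposes and report linked nodes.
// Standard Windows SDK calls only; link with dxgi.lib and d3d12.lib.
#include <dxgi1_4.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>

using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory)))) return 1;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) continue;   // skip WARP / software

        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device)))) {
            // GetNodeCount() > 1 means a linked-node (SLI/CrossFire-style) group;
            // the app still has to address each node explicitly via node masks.
            std::wprintf(L"Adapter %u: %ls, %llu MB VRAM, %u node(s)\n",
                         i, desc.Description,
                         static_cast<unsigned long long>(desc.DedicatedVideoMemory) / (1024 * 1024),
                         device->GetNodeCount());
        }
    }
    return 0;
}
```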
 
As for the last part: given that CPUs run hundreds of different programs that behave differently in every case, and they went MCM with it, going the same route with a GPU, where the primary goal is running graphics and supporting the graphics features of DX12 and Vulkan, doesn't sound like a big deal when you look at the big picture.

You didn't repeat the mantra, did you.

Getting MCM to work and getting MCM to work well are two totally different problems, but as you seem to have such a hard-on for the CPU comparison, let me point out that when CPUs started to either come in pairs (ah, the dual-CPU Celeron boards) or pick up extra cores, it took time for games to make use of it, and it took even longer to make anything approaching optimal use of it as core counts climbed.

However, I guess I shouldn't be too hard on you... the fact you seem to think you can just slap these things on an interconnect, make the bus a bit wider, and everything will be great is to be expected... I mean, if you had any real knowledge or training in this area you wouldn't be making these arguments.

Edit:
And this is my last post on the subject... I really REALLY cba going around in circles on this subject any more, I'm bored of it already frankly...
 
You didn't repeat the mantra, did you.

Getting MCM to work and getting MCM to work well are two totally different problems, but as you seem to have such a hard-on for the CPU comparison, let me point out that when CPUs started to either come in pairs (ah, the dual-CPU Celeron boards) or pick up extra cores, it took time for games to make use of it, and it took even longer to make anything approaching optimal use of it as core counts climbed.

However, I guess I shouldn't be too hard on you... the fact you seem to think you can just slap these things on an interconnect, make the bus a bit wider, and everything will be great is to be expected... I mean, if you had any real knowledge or training in this area you wouldn't be making these arguments.

Edit:
And this is my last post on the subject... I really REALLY cba going around in circles on this subject any more, I'm bored of it already frankly...



My arguments are based on technical feasibility, design and development cost, and mass-production cost, so you'll have to do better than 'it's more trouble to program for on your end of things' as the only major drawback here.


I'm done too.....:rolleyes:
 
My arguments are based on technical feasibility, design and development cost, and mass-production cost, so you'll have to do better than 'it's more trouble to program for on your end of things' as the only major drawback here.


I'm done too.....:rolleyes:

Well, if both NV and AMD do get it working and both go that way (and they are both talking about it), then game programmers will just have to live with the extra work if need be, or see a loss of sales, as both 4K and 8K gaming are going to need more than 7nm+ can give, I think.
 
Well, if both NV and AMD do get it working and both go that way (and they are both talking about it), then game programmers will just have to live with the extra work if need be, or see a loss of sales, as both 4K and 8K gaming are going to need more than 7nm+ can give, I think.


It's a sure thing it won't be enough, especially at 8K. Assuming the scene in question is completely GPU limited, with the CPU and I/O playing no part, rendering each frame is 4x more demanding than at 4K, and there are plenty of titles right now that barely hit 60 fps on a single high-end GPU. Running the exact same game in that completely GPU-limited scenario at 8K means doing no better than 15 fps - a slide show - and even putting 2 GPUs together and overclocking them to hell and back might only get close to 30 fps. Still far from smooth.
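The arithmetic behind that, spelled out (it assumes frame cost scales linearly with pixel count, which only holds in the completely GPU-limited case described above):

```cpp
// Pixel-count arithmetic behind the 4K -> 8K framerate claim,
// assuming frame cost scales linearly with resolution (purely GPU-limited case).
#include <cstdio>

int main() {
    const long long px_4k = 3840LL * 2160;   //  8,294,400 pixels
    const long long px_8k = 7680LL * 4320;   // 33,177,600 pixels
    const double scale    = double(px_8k) / double(px_4k);   // 4x the work per frame

    const double fps_at_4k = 60.0;           // a title that just hits 60 fps at 4K
    std::printf("8K is %.0fx the pixels of 4K -> ~%.0f fps if the GPU is the only limit\n",
                scale, fps_at_4k / scale);   // ~15 fps, i.e. the slide show described above
    return 0;
}
```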
 
Ok... I've been weighing this thought for a while...

What if the MCM idea, instead of making multiple whole GPUs, were simply to split up the compute clusters?

Say 2 compute clusters connected to 1 "interface die" with maybe an L3 cache and the core's memory controller, so the OS only sees one "GPU" since it's only interfacing with 1 chip?
 
Ok... I've been weighing this thought for a while...

What if the MCM idea, instead of making multiple whole GPUs, were simply to split up the compute clusters?

Say 2 compute clusters connected to 1 "interface die" with maybe an L3 cache and the core's memory controller, so the OS only sees one "GPU" since it's only interfacing with 1 chip?


Not totally certain, but I think it comes down to achievable clock speeds if they did it that way: the path between individual units when everything is arranged as an MCM is, relatively speaking, quite a bit longer than having it all built into the same die, so hitting GHz-level clock speeds between them would be hard.
 
Say 2 compute clusters connected to 1 "interface die" with maybe an L3 cache and the core's memory controller, so the OS only sees one "GPU" since it's only interfacing with 1 chip?

I think that is the goal of MCM, but for gaming I think it has latency and other issues. Though AMD has stated it's not going to be an issue for pro compute workloads... so I would say it's not ready for gaming *yet*.

Though I think the split is not about memory controllers, as GPUs connect memory directly to compute units; my guess is it would be processing units each tied to an HBM stack, with a group of these controlled by one or more other chips which do the higher-level stuff and farm it out to the processing units. And if they could use a separate chunk of HBM and a controller chip for texture memory, you would not have to duplicate so much information as in CrossFire.
This would allow smaller, easily fabricated chips that scale.


The thing is, the super-wide memory interface of HBM allows huge chip-to-chip bandwidth, so what was impossible before is doable now - they just have to work out how.
 
According to lots of rumours, Navi will be a Polaris refresh, which makes sense as the Polaris cards have been out at least two years. It looks like "next gen" is going to be AMD's next enthusiast card, which is probably Q4 2019 at the earliest.

TSMC are saying 7nm is looking like early 2019 at best, so it fits in with that timeline. The Polaris refresh might be as fast as a Vega 64 but at a lower price point, so it will sell like hot cakes IMHO. TBH I'm in no rush, so that timescale suits me fine.
 
If it’s as fast as a Vega 64 but a little cheaper, that will not sell like hot cakes.
 
You probably won't see a card faster than the Vega 64 till 2020. If they do release something, it will probably be a less power-hungry version of the Vega 64, which, given the rumors that it's a "Polaris" refresh, would sound about right.
 