
HD 5870 Discussion thread.

Status
Not open for further replies.

wolf

Performance Enthusiast
Joined
May 7, 2007
Messages
7,753 (1.25/day)
System Name MightyX
Processor Ryzen 5800X3D
Motherboard Gigabyte X570 I Aorus Pro WiFi
Cooling Scythe Fuma 2
Memory 32GB DDR4 3600 CL16
Video Card(s) Asus TUF RTX3080 Deshrouded
Storage WD Black SN850X 2TB
Display(s) LG 42C2 4K OLED
Case Coolermaster NR200P
Audio Device(s) LG SN5Y / Focal Clear
Power Supply Corsair SF750 Platinum
Mouse Corsair Dark Core RBG Pro SE
Keyboard Glorious GMMK Compact w/pudding
VR HMD Meta Quest 3
Software case populated with Artic P12's
Benchmark Scores 4k120 OLED Gsync bliss
I don't think they are bandwidth limited to any real degree; I think we've played with the bandwidth enough, and lost so little performance, that it must be apparent.

I also think doubling the bandwidth for a (maybe) ~20% gain is in no way worth ATi's time making that happen.
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.50/day)
Location
Reaching your left retina.
Ok, here is the graph that shows what I think is the reality of the HD5870 in relation to its memory bandwidth:




I think it's self-explanatory, but here are some hints:

- In red I'm showing a representation of wolf's results; the dotted lines represent the low and high ends of those results (1000-1300 MHz). The relative performance is based on this one, with performance at 950 core / 1000 MHz memory being the baseline (100%).
- In green is the card at stock, with dotted lines representing the card's stock bandwidth (1200 MHz) and 2x bandwidth with a 512-bit bus. As you can see, this line is always lower than the red line, but at lower bandwidth both show pretty similar performance; that's where the bottleneck happens.
- The red and green lines should be closer to each other, but I've drawn them this way so that the lines don't mix up; I was especially afraid of the horizontal dotted lines blending together (I thought the resolution would be much worse).
- In order to create the red/green lines I first drew the axes, then I drew the two red dotted lines where they should be, and then I drew a logarithmic progression intercepting those two points. IMO it must be pretty accurate.

That's been the overall picture of the situation in my head, and different reviews and tests back up that graph IMO. As you can see, the 1000 MHz to 1300 MHz gap is already above the point where performance scales linearly. I've done it that way because that's the only explanation I can find for the card not being slowed down linearly when the memory was downclocked. Now that I think about it, that's a point I might have failed to clarify until now.
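Benetanegia's construction (a logarithmic curve forced through two known points) can be sketched numerically. This is just an illustration of the method; the two anchor values below are hypothetical, not taken from the thread's data:

```python
import math

def log_curve_through(p1, p2):
    """Return f(x) = a + b*ln(x) that passes exactly through p1=(x1,y1) and p2=(x2,y2)."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (math.log(x2) - math.log(x1))
    a = y1 - b * math.log(x1)
    return lambda x: a + b * math.log(x)

# Hypothetical anchors: 100% relative performance at 1000 MHz memory, 104% at 1300 MHz
f = log_curve_through((1000, 100.0), (1300, 104.0))
print(round(f(1000), 1), round(f(1300), 1))  # 100.0 104.0 (by construction)
print(round(f(600), 1))                      # ~92.2, extrapolated below the measured range
```

The curve is fully determined by the two anchors, which is why extrapolating it outside the measured clock range (as the graph does) is the part open to debate.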


and even Benetanegia (a green man) is willing to concede

:laugh: I find that funny. Then you (and many others with the same reaction towards me) complain when I call you fanboys. Only a red man can interpret my presence and contribution in this thread as a green man conceding. As if for once my mind had been touched by the Holy Light of God AMD, Master of the smart and the honest!

Sorry man, but this is just a man with no brand loyalties defending his view based on proof and actual data, just as he always does in every topic. Just because you (and others) are more willing to accept some BS as proof in other topics, and thus I have to defend the side with real proof (i.e. the thread about Lucid Hydra), that doesn't make me a green man with a momentary revelation...

I guess that's my fate on these forums...
 
Joined
May 4, 2009
Messages
1,970 (0.36/day)
Location
Bulgaria
System Name penguin
Processor R7 5700G
Motherboard Asrock B450M Pro4
Cooling Some CM tower cooler that will fit my case
Memory 4 x 8GB Kingston HyperX Fury 2666MHz
Video Card(s) IGP
Storage ADATA SU800 512GB
Display(s) 27' LG
Case Zalman
Audio Device(s) stock
Power Supply Seasonic SS-620GM
Software win10
Great work Benetanegia! What software did you use for the graph?
 

Benetanegia

Great work Benetanegia! What software did you use for the graph?

Thanks mate. I used Freehand, it's actually easier for me to do it there, since I work with that program and PS a lot. :)
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Ok then, do a graph that shows how half of 5870's core (in this case, a 4890) still benefits from having 5870's full bandwidth all to its own.

I already provided sources with several benchmark tests (dozens, in fact). Those numbers are not from my own ass, so you'd be able to touch them with a 10-foot pole! Hehe...

I mean, you are basing your convictions on just one test. That other theoretical fillrate test with the memory downclocked is such an incomplete test that if you tried downclocking the memory much further, like 50% or more, you'd still be getting the same results, therefore it would have wreaked havoc with the graph that you just created. It'd make your graph look like fireworks that have gone haywire!

The red line in the graph, once it reaches 1300MHz, might already be in the new error-correcting slow-down territory. It could also be plagued with more than one step of latency increase when overclocking the memory. That's why it's better to not be too absolute in little-developed convictions.

Steevo, if there is a white paper saying anything about how a 5870 is not bandwidth/latency-bottlenecked, I'd appreciate it. Also, I'd appreciate it if any of you guys with a 5870 could do some more tests on this issue. However, in order for those tests to have any validity, it must be shown that there is not any kind of error-correcting work going on with overclocked memory.. not a tiny little bit. How do we do this? Perhaps we could ask ATI for a special program that the engineers use (like the one that Nvidia released to the public that shows the usage percentage of the ROP's).

I'm basing mine on the trend of a 4890 and a 5770 and a 5870, and also 4890's/5770's in Crossfire. It's just that the two halves of a 5870 do not scale as badly as you would think, with diminishing returns far worse than Crossfire scaling with two 5770's. A 5870 core is still a massively parallel processor that needs to be fed gobs and gobs of bandwidth in order to function.

I know that the analogy of 512-bit to 1GB is not exactly the same thing, but in some scenarios, there is a HUGE benefit from having twice the buffer. The same goes for bandwidth.

If you truly think it should be such a linear world, please think about the effects of reduced bandwidth. It does not reduce performance linearly, even if you cut the bandwidth in half. Most games do not give linear predictions. One game might need to be fed 200GB/s of bandwidth at 1920x1200 with 4xAA to show a large boost in performance compared to 190GB/s. Enabling Adaptive AA can take up much more bandwidth, especially if there are lots of alpha textures. Another game would do just fine with simple polygonal fillrate and simple AA handled by the shaders, and only scale linearly with the speed of the TMU's/shaders, without needing gobs of bandwidth.

Sometimes, overclocking the memory by 5% would give a 10-15% boost in a certain scenario. I'll have to find a few of them, but after looking at millions of benchmarks for more than a decade, yes, I'll have to backtrack through a lot, heh!


It's like having a construction worker who just turned into an 8-foot giant, but now he has only one arm to do all of the work. Do you ignore that missing arm and say that it's all because the driver program in his brain is not quite so efficient? Me, I'd just point out the obvious and say that he'd do a whole heck of a lot better with two arms, dammit! :D
 
Last edited:

Benetanegia

Ok then, do a graph that shows how half of 5870's core (in this case, a 4890) still benefits from having 5870's full bandwidth all to its own. [...]

I'm not basing that graph or my opinion on wolf's test alone; I'm basing it on many tests and reviews* (12+ games if you take all reviews into consideration), some of which I have linked before in one post. I told you to read them and you said you did, but the truth is you didn't; that's clear right now considering your response. First read them, then come back with your closed mind if you wish.

There's enough proof already that the HD5870 is not bottlenecked. No one gives a crap about what happens with the HD4890 or the HD5770; those are not the HD5870, and what may be true for the former is definitely not true for the latter. I have yet to see A SINGLE proof of that magical improvement that happens in the HD4890 BTW, because I've seen none yet and I read every review linked in TPU's Today's Reviews section (which amount to 20+ for each card), as well as reading any thread about GPU performance I see in my everyday internet browsing. Not even the HD5770 shows the kind of improvement you claim; all the reviews I've seen contradict it.

So before attempting to reply again with vague estimations that do seem to come from your ass, why don't you provide some proof?

And yes, please show me proof of a 5% increase in memory clock opening up a 15% performance increase too, because in 12 years of reading tech, 3 years studying computer tech at uni and 5 years being a die-hard overclocker (now retired), I have never seen such a thing. As it stands now, that's a blatant lie to me. As Mussels said
if your design has a bottleneck, you get linear progression as you up computational power/bandwidth.

Once you get past that bottleneck, you get diminishing returns

And that's all that I have seen in 12 years. Show me proof and I'll consider it.

* In fact, after reading them, especially the last one, pay attention to my graph and you will see how my graph shows the same 2-3% performance increase over the same 1200 to 1318 MHz jump.
 
Last edited:

Bo_Fox

Sorry, but no, that's not math at all. That's just some random numbers and correlations that came out right from your ass, as I tried to say with some humor in my first reply to that post.

I'm not basing that graph or my opinion in wolf's test alone I'm basing it in many tests and reviews* (12+ games if you take all reviews in consideration) [...]



12+ games?? I do not recall you linking to 12 benchies. Hmm? I disregarded the Firingsquad review because FS utterly failed to ever mention the new error-correcting algorithm that could kick in. Batman: Arkham Asylum was still more bandwidth-limited than core-limited anyway. We'd be seeing more of those games, but the true increase in bandwidth is potentially giving "diminishing returns" due to hidden latency increases and also due to a few errors being produced and then corrected. Firingsquad actually said that they just upped the memory clock "once" so that it matched the percentage gain of the core overclock (a 9% clock increase). It could be like overclocking your 800MHz CAS5 memory all the way to 1200MHz CAS6: a few errors are being produced, so it only gives a 1% boost in performance. Anyway, I think I already mentioned that about Firingsquad in this thread long before you linked to it.

Doubling the bus width does not automatically increase latency the way the internal driver program does when it overclocks the memory at your command without letting you know what the latency changes are. That is another "plus" of simply doubling the bus width. You are right that bandwidth has not doubled quite as often as the fillrate performance of those GPU's over the past several years (since the 9800 Pro). Since a 4890 showed consistent gains with 5870's bandwidth, a similar chip with 2x the shaders and TMU's would be even further behind in needing greater bandwidth (50% or more would have been nice, at least). Latency also matters, giving undisclosed speed benefits (and it has also improved over the years), and doubling the bus width should pretty much preserve the latency advantage.

Sometimes you do not know if you have already gotten past a bottleneck; in fact, it could be much farther ahead than you assume. When you get past that critical point, there is sometimes a large, sudden jump in performance, and only then do you get diminishing returns. (Not to mention the new error-correcting algorithm that kicks in at some point...) In terms of linearity, the core (TMU's, shaders) scales the most linearly, memory bandwidth less so, and memory buffer size even less.

It's like being in a mach 1+ plane approaching 700+ mph. As it approaches the speed of sound, there is more and more turbulence. Once it gets past the speed of sound, the turbulence goes away, and the plane is now cruising along smoothly, creating a sonic boom.

I've already shown you 20+ benchmarks with 20+ different games that benefit from 800 shaders and 40 TMU's in combination with 3.9Gbps bandwidth (4890) and 4.8Gbps bandwidth (4890 with oc'ed memory only), and 2.4Gbps bandwidth (5770 performing ~20% worse than a stock 4890).

If a 5770 would benefit so much, I would go further to say that each unit in a 5870 (yes, a 5870) would each benefit as a whole.

Otherwise, nobody would be buying a 5870 and everybody would be buying 2x 5770's to avoid those prematurely diminishing returns that you claim.

Hey, quit talking about my ass! :D Let's keep it civil, ok?

EDIT: Same here; I've majored in computer engineering, computer science, computer engineering tech, and information tech, and have had more than a decade of closely following GPU architecture!
 
Last edited:

wolf

Well guys this is really inspiring, and thanks heaps for that graph Basque (that was your nickname, right?)

anyways I'm going to provide you with a whoooooooooooole lot more data if you're prepared to compile it, and at the very least we will have lots of data to discuss and mull over.

tests will be: memory speed varying from 900mhz to 1300mhz in 50mhz increments, so 900/950/1000/1050/1100/1150/1200/1250/1300; these speeds will ALL be tested at every core speed.

core speeds will move from 650mhz to 950mhz in 75mhz increments, so 650/725/800/875/950.

I do apologize, I will be missing stock core speed, but this will give us 45 points of data to use.

thoughts? (testing right now)
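For reference, wolf's proposed sweep can be enumerated in a few lines of Python (clock values straight from the post above):

```python
# wolf's test plan: every memory clock crossed with every core clock
mem_clocks = list(range(900, 1301, 50))   # 900..1300 MHz in 50 MHz steps
core_clocks = list(range(650, 951, 75))   # 650..950 MHz in 75 MHz steps
runs = [(core, mem) for core in core_clocks for mem in mem_clocks]
print(len(runs))  # 45 data points, as stated
```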
 
Joined
Nov 21, 2007
Messages
3,688 (0.61/day)
Location
Ohio
System Name Felix777
Processor Core i5-3570k@stock
Motherboard Biostar H61
Memory 8gb
Video Card(s) XFX RX 470
Storage WD 500GB BLK
Display(s) Acer p236h bd
Case Haf 912
Audio Device(s) onboard
Power Supply Rosewill CAPSTONE 450watt
Software Win 10 x64
hells yea wolf, this should be interesting to view, and if Benetanegia will compile the data into a badass graph like he posted above.. well, that'd just be sexy. :toast:
 

wolf

well, 4 hours later here are my results, with the original test I showed a few pages back also re-run for consistency; the results were identical, hence only 1 run per setting, no repeats.

Unigine Heaven Benchmark - 1680x1050 - 4xAA, 16xAF - max w/tesselation - Cat 9.10 official 5800 support.

core vs mem

this shows the relationship between keeping the core speed constant and increasing the memory speed/bandwidth in 50mhz increments.

first number shown is core speed in mhz.

second number shown is memory speed in mhz before its 4x multiplication for GDDR5.
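As a sanity check on that 4x multiplier, here is how the memory clock maps to total bandwidth, assuming the HD 5870's 256-bit bus:

```python
def bandwidth_gb_s(mem_mhz, bus_bits=256):
    # GDDR5 is quad-pumped: effective rate = base clock x 4; divide bits by 8 for bytes
    return mem_mhz * 1e6 * 4 * bus_bits / 8 / 1e9  # GB/s

print(bandwidth_gb_s(1200))                       # 153.6 (stock HD 5870)
print(bandwidth_gb_s(900), bandwidth_gb_s(1300))  # 115.2 166.4 (wolf's test range)
```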

650/900 = 23.9 fps
650/950 = 24.4 fps
650/1000 = 24.7 fps
650/1050 = 24.9 fps
650/1100 = 25.1 fps
650/1150 = 25.2 fps
650/1200 = 25.5 fps
650/1250 = 25.7 fps
650/1300 = 25.8 fps

725/900 = 26.0 fps
725/950 = 26.2 fps
725/1000 = 26.7 fps
725/1050 = 27.0 fps
725/1100 = 27.3 fps
725/1150 = 27.4 fps
725/1200 = 27.8 fps
725/1250 = 28.0 fps
725/1300 = 28.2 fps

800/900 = 27.7 fps
800/950 = 28.2 fps
800/1000 = 28.6 fps
800/1050 = 28.9 fps
800/1100 = 29.3 fps
800/1150 = 29.6 fps
800/1200 = 29.8 fps
800/1250 = 30.1 fps
800/1300 = 30.4 fps

875/900 = 29.4 fps
875/950 = 29.9 fps
875/1000 = 30.2 fps
875/1050 = 30.7 fps
875/1100 = 31.1 fps
875/1150 = 31.4 fps
875/1200 = 31.8 fps
875/1250 = 32.1 fps
875/1300 = 32.4 fps

950/900 = 30.7 fps
950/950 = 31.1 fps
950/1000 = 31.7 fps
950/1050 = 32.1 fps
950/1100 = 32.6 fps
950/1150 = 33.3 fps
950/1200 = 33.6 fps
950/1250 = 33.8 fps
950/1300 = 34.4 fps

mem vs core

this shows the relationship between keeping memory speed/bandwidth constant and increasing the core speed in 75mhz increments.

first number shown is memory speed in mhz before its 4x multiplication for GDDR5.

second number shown is core speed in mhz.

900/650 = 23.9 fps
900/725 = 26.0 fps
900/800 = 27.7 fps
900/875 = 29.4 fps
900/950 = 30.7 fps

950/650 = 24.4 fps
950/725 = 26.2 fps
950/800 = 28.2 fps
950/875 = 29.9 fps
950/950 = 31.1 fps

1000/650 = 24.7 fps
1000/725 = 26.7 fps
1000/800 = 28.6 fps
1000/875 = 30.2 fps
1000/950 = 31.7 fps

1050/650 = 24.9 fps
1050/725 = 27.0 fps
1050/800 = 28.9 fps
1050/875 = 30.7 fps
1050/950 = 32.1 fps

1100/650 = 25.1 fps
1100/725 = 27.3 fps
1100/800 = 29.3 fps
1100/875 = 31.1 fps
1100/950 = 32.6 fps

1150/650 = 25.2 fps
1150/725 = 27.4 fps
1150/800 = 29.6 fps
1150/875 = 31.4 fps
1150/950 = 33.3 fps

1200/650 = 25.5 fps
1200/725 = 27.8 fps
1200/800 = 29.8 fps
1200/875 = 31.8 fps
1200/950 = 33.6 fps

1250/650 = 25.7 fps
1250/725 = 28.0 fps
1250/800 = 30.1 fps
1250/875 = 32.1 fps
1250/950 = 33.8 fps

1300/650 = 25.8 fps
1300/725 = 28.2 fps
1300/800 = 30.4 fps
1300/875 = 32.4 fps
1300/950 = 34.4 fps
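Aggregating wolf's numbers shows the scaling directly. A minimal Python summary (fps values copied from the tables above):

```python
# wolf's Unigine Heaven results: core MHz -> fps at 900..1300 MHz memory (50 MHz steps)
results = {
    650: [23.9, 24.4, 24.7, 24.9, 25.1, 25.2, 25.5, 25.7, 25.8],
    725: [26.0, 26.2, 26.7, 27.0, 27.3, 27.4, 27.8, 28.0, 28.2],
    800: [27.7, 28.2, 28.6, 28.9, 29.3, 29.6, 29.8, 30.1, 30.4],
    875: [29.4, 29.9, 30.2, 30.7, 31.1, 31.4, 31.8, 32.1, 32.4],
    950: [30.7, 31.1, 31.7, 32.1, 32.6, 33.3, 33.6, 33.8, 34.4],
}

# A +44.4% memory overclock (900 -> 1300 MHz) yields well under half that in fps,
# and the payoff grows slightly with core clock
for core, fps in sorted(results.items()):
    gain = (fps[-1] / fps[0] - 1) * 100
    print(f"core {core} MHz: +{gain:.1f}% fps for +44.4% memory clock")
```

The per-core gains come out to roughly +7.9% at 650 MHz rising to +12.1% at 950 MHz, which is the far-from-linear scaling both sides go on to argue about.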
 

Benetanegia

Great work wolf!! That must have taken you a lot of time!

OK, I'll steal some of your merit and effort by doing some eaaasy graphs out from your data. :p

Too many numbers this time, and no need to paint my idea, so I used Excel this time. Here we go.

- First the chart and graph of memory scaling:



- GPU core scaling:



- Same as above, but with results normalized to show the percentage increase. Baseline is 650/900 MHz:




- Percentage increase normalized for every GPU clock (baseline 900 MHz):



- Percentage of change from one memory clock to the next, for every GPU clock, i.e. the first line shows the results for the 650 MHz GPU clock:
0 = no change, baseline
2.1 = change from 900 MHz memory to 950.
1.2 = change from 950 MHz to 1000.
0.8 = change from 1000 MHz to 1050
and so on.
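That per-step calculation can be reproduced from wolf's data; the function below emits exactly the sequence described (shown here for the 650 MHz core run):

```python
def step_changes(fps_series):
    """Percent change from each memory clock to the next; first entry is the 0 baseline."""
    return [0.0] + [round((b / a - 1) * 100, 1)
                    for a, b in zip(fps_series, fps_series[1:])]

fps_650 = [23.9, 24.4, 24.7, 24.9, 25.1, 25.2, 25.5, 25.7, 25.8]  # 900..1300 MHz memory
print(step_changes(fps_650))  # [0.0, 2.1, 1.2, 0.8, 0.8, 0.4, 1.2, 0.8, 0.4]
```

The 2.1 / 1.2 / 0.8 opening matches the worked example above, and the noisy tail is the "chaotic" part discussed next.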



In this last graph the results look quite chaotic at first glance, TBH. I included an average of the changes, in order to see if there was some logarithmic progression going on, and on average there seems to be: i.e. the numbers on the left are higher than on the right of the chart.

All in all, these results do fall within my first graph quite accurately (surprisingly), but sadly they do not confirm it conclusively. They match my curve in the 900 to 1300 MHz range almost to perfection (I'm really :eek: about that), but basically we don't know what happens below and above our data range, and although there is a slight logarithmic factor in the results, they are not conclusive. They do not confirm nor deny Bo Fox's idea either, since we don't know how linearly/logarithmically the scaling would continue as we went up.

@ Bo Fox, so you disregard the most "official" of the tests that have been done regarding HD5870 memory overclocking? weeeeeeeell...
Regarding the sudden jump, that never happens, not consistently at least. In one game/app at one set of settings with one hardware setup you can probably find an example, but it's probably because of the software and not the hardware. Just like in Enemy Territory: Quake Wars with the GTX295 using the first drivers: the performance gap between that card and the rest grew bigger and bigger as resolution and AA settings went up, and suddenly at 2560x1600 4xAA it produced 10 fps, just to jump to 30-40 when 2560x1600 8xAA was used. If you can prove that it happens, then show me the proof. Show me the moneeeeey!!!!

Also, you keep extrapolating results from cards with half the power to the HD5870, and that extrapolation will never hold. You don't need a saloon bar twice as big to serve twice the people; a bar 50% bigger will serve them well, because of the spatial organization. Memory is similar, but with spatial and time organization.
 

Bo_Fox

Nicely done!!! Double thanks to you, Wolf (actually, rabid Koala)! As we can see, the results are hardly diminishing; one would be hard pressed to tell that the scaling is logarithmically diminishing rather than linear.

Ok, I did not play all of my cards at once with the links (sources) that Benetanegia provided.

Firingsquad was disregarded because their overclocking article never even mentioned the new error-correcting algorithm, even though it showed that Batman was more bandwidth-limited than core-limited. In their original review of the 5870, Firingsquad did not mention the algorithm either. The way FS stated that the memory was overclocked just once to match the core overclock percentage says a lot (namely, that they did not overclock with care).

You (Benetanegia) also linked to the new 5850 article over at Xbitlabs. Now I'll show how your claims of diminishing returns can actually be used against your side of the argument. Apparently you did not read the comments on that article over at Xbit (at the end of the article, you can see my comment, which was there before you linked to it in this thread). Whoohoo, I called dibs on that article first!!! Hehe.. Here it is:



source: http://www.xbitlabs.com/articles/video/display/radeon-hd5850_10.html#sect0

Let's look at the 1920x1200 chart on the right side. The green bars represent a 5850 with its memory overclocked to the 5870's speed; the 5850's core is also overclocked to 850MHz, the same clock as a 5870.

The only difference between this green 5850 and a 5870 is that a 5870 has 160 more shader processors and 8 more TMU's.

That is an 11% advantage over this green 5850.

In the graph, we are seeing an average of less than 3% performance difference, not 11%. If there were enough memory bandwidth, we would certainly be seeing an 11% difference, with the 5870 staying 11% ahead, but no, a 5870 is less than 3% better. Diminishing returns or not, core scaling is supposed to be quite linear, so the shortfall tells you something.
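The 11% figure follows from the unit counts (HD 5870: 1600 shaders / 80 TMUs; HD 5850: 1440 shaders / 72 TMUs, per AMD's specs):

```python
# Theoretical core advantage of the HD 5870 over an HD 5850 at equal clocks
sp_5870, sp_5850 = 1600, 1440    # shader processors (stream cores)
tmu_5870, tmu_5850 = 80, 72      # texture units
shader_adv = (sp_5870 / sp_5850 - 1) * 100
tmu_adv = (tmu_5870 / tmu_5850 - 1) * 100
print(round(shader_adv, 1), round(tmu_adv, 1))  # 11.1 11.1
# Bo_Fox's argument: the Xbit charts show <3% real-world advantage, and he
# attributes the missing ~8% to a memory bandwidth bottleneck.
```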

In three games, there is a 0.0% difference in performance. One of them is Batman, which is consistent with the Firingsquad results; however, in this case PhysX was being used for Batman, so when tested by Xbit it was CPU-limited by a single CPU thread. Although I disregarded Firingsquad's article, I'll go ahead and consider Batman one of the games that is more bandwidth-limited than core-limited, according to Firingsquad's exact words.

The biggest difference was in Left 4 Dead, a game that uses the oldest engine of the bunch (the simplest stuff), yet it still only shows half of the 11% difference that we should be seeing. It is closer to a theoretical fill-rate test, but it is still quite bandwidth-limited.

Cheers, guys. :toast:
 
Last edited:

Benetanegia

Nicely done!!! Double thanks to you, Wolf (actually, rabid Koala)! As we can see, the results are hardly diminishing--one would be hard pressed to see that it's logarithmically diminishing rather than linear.

Ok, I did not play all of my cards at once with the links (sources) that Benetanegia provided.

Firingsquad was disregarded because their overclocking article never even mentioned the new error-correcting algorithm, even though it showed that Batman was more bandwidth-limited than core-limited. Firingsquad did not mention the algorithm in their original HD 5870 review article either.

You (Benetanegia) also linked to the new 5850 article over at Xbitlabs. Now I'll show how your claims of diminishing returns can actually be used against your side of the argument. [...]

http://www.xbitlabs.com/images/video/radeon-hd5850/diagr/22_585vs587aa_big.png

source: http://www.xbitlabs.com/articles/video/display/radeon-hd5850_10.html#sect0

Cheers, guys. :toast:

OR the core might be limited in its ability to feed all those shaders with enough threads, which is what some of us are saying (a problem in the setup engine/thread dispatcher). In fact, the HD5850 proves exactly that, rather than your point, which it doesn't prove at all: overclocking the core further with a very limited memory overclock (4960MHz, 3%) also yields a massive performance increase across the board*. So the bottleneck is undoubtedly somewhere inside the core. Memory is NOT the limiting factor, because there's as much change going from 725/4000 to 850/4800 (17%/20% OC) as there is going from 850/4800 to 1010/4960 (19%/3% OC). Memory is completely out of the question; something else must be the reason when 11% fewer shaders offer the same performance when clocked at the same MHz as the HD5870, i.e. the thread dispatcher.

EDIT: To be clearer, it's overclocking the core to 850MHz that makes the HD5850 as fast as the HD5870, not overclocking the memory.

*The smallest improvement happens in Warhammer, which goes from -5.4% to 0.8%; that is a 6.2% increase, much more than the 3% OC on the memory. The rest are usually well over 10%.

EDIT:

BTW - http://forums.techpowerup.com/showpost.php?p=1634710&postcount=321

5- In the Xbitlabs HD5850 review that you posted, we can see a huge performance increase across the board when moving from 850/4800 to 1010/4960, which amounts to a 19% overclock on the core and only 3% on the memory: http://www.xbitlabs.com/misc/picture/?src=/images/video/radeon-hd5850/diagr/21_585vs587_big.png&1=1 - compare green and red; if the card were anywhere close to being memory-bottlenecked, we wouldn't see an increase greater than 3%, yet that's what we're seeing across the board.
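As a sanity check on those clock figures, here is the overclock arithmetic for the three Xbit configurations (an illustrative sketch using only the clocks quoted in the post):

```python
def oc_pct(new_clock, old_clock):
    """Overclock expressed as a percentage of the old clock."""
    return (new_clock - old_clock) / old_clock * 100

# Core/effective-memory clock steps from the Xbit HD 5850 article (MHz).
core_step1 = oc_pct(850, 725)     # ~17.2% core OC
mem_step1  = oc_pct(4800, 4000)   # 20.0% memory OC
core_step2 = oc_pct(1010, 850)    # ~18.8% core OC
mem_step2  = oc_pct(4960, 4800)   # ~3.3% memory OC

# If the card were memory-bound, the second step could gain at most ~3.3%;
# the ~10%+ gains in the graph therefore point at a limit inside the core.
```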
 
Last edited:

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Wrong: the average gain from 1010MHz over 850MHz (a 19% clock difference, or 160MHz) is much less than the gain from 850MHz over 725MHz (a 17% clock difference, or 125MHz).

I do not feel like calculating the exact average, but it's obvious by looking at the graph.
 
Last edited:

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.50/day)
Location
Reaching your left retina.
Wrong, the average gain with 1010MHz over 850MHz (19% clock difference) is much less than the gain from 850MHz over 725MHz (17% clock difference).

I do not feel like calculating the exact average, but it's obvious by looking at the graph.

Want to bet?

Have you subtracted the blue result from the green one, AND the green from the red? Because that's what you have to do in order to calculate the differences...

For example, World in conflict
Blue -13.7% // Green -3.9 // Red 7.8%
Blue-to-Green 9.8%
Green-to-Red 11.7%

Crysis

Blue -17.2% // Green -3.4 // Red 6.9%
Blue-to-Green 13.8%
Green-to-Red 10.3%

Left 4 Dead

Blue -16.5% // Green -6.1% // Red 8.7%
Blue-to-Green 10.5%
Green-to-Red 14.8%

I bet the averages would come out pretty much the same in both cases. Anyway, the green-to-red results are waaaaay higher than 3%, so no memory bottleneck at all.
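For anyone who wants to reproduce the subtractions, here they are in Python (the blue/green/red values are the ones read off Xbit's graph above; note Left 4 Dead's first step comes out as 10.4 rather than 10.5, within reading error of the graph):

```python
# Per-game deltas vs the common baseline, in %:
# blue = 725/4000, green = 850/4800, red = 1010/4960.
graph = {
    "World in Conflict": (-13.7, -3.9, 7.8),
    "Crysis":            (-17.2, -3.4, 6.9),
    "Left 4 Dead":       (-16.5, -6.1, 8.7),
}

def step_gains(blue, green, red):
    # All three values share the same baseline, so the gain of each
    # overclocking step is simply the difference between adjacent configs.
    return round(green - blue, 1), round(red - green, 1)

gains = {game: step_gains(*vals) for game, vals in graph.items()}
# e.g. gains["Crysis"] == (13.8, 10.3)
```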
 
Joined
Sep 1, 2009
Messages
1,183 (0.22/day)
Location
CO
System Name 4k
Processor AMD 5800x3D
Motherboard MSI MAG b550m Mortar Wifi
Cooling Corsair H100i
Memory 4x8Gb Crucial Ballistix 3600 CL16 bl8g36c16u4b.m8fe1
Video Card(s) Nvidia Reference 3080Ti
Storage ADATA XPG SX8200 Pro 1TB
Display(s) LG 48" C1
Case CORSAIR Carbide AIR 240 Micro-ATX
Audio Device(s) Asus Xonar STX
Power Supply EVGA SuperNOVA 650W
Software Microsoft Windows10 Pro x64
So what's holding the 5870 back: drivers, game coding, or something physical in the core?
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.50/day)
Location
Reaching your left retina.
So what's holding the 5870 back: drivers, game coding, or something physical in the core?

Aaahh my friend, that's the billion-dollar question we've been debating for 15 pages... :D

IMO something in the core: setup engine or thread dispatcher or much less likely (IMO) the intercommunication/caches/registers/etc.
 
Joined
Oct 1, 2006
Messages
4,884 (0.76/day)
Location
Hong Kong
Processor Core i7-12700k
Motherboard Z690 Aero G D4
Cooling Custom loop water, 3x 420 Rad
Video Card(s) RX 7900 XTX Phantom Gaming
Storage Plextor M10P 2TB
Display(s) InnoCN 27M2V
Case Thermaltake Level 20 XT
Audio Device(s) Soundblaster AE-5 Plus
Power Supply FSP Aurum PT 1200W
Software Windows 11 Pro 64-bit
So what's holding the 5870 back: drivers, game coding, or something physical in the core?
That is what the two gentlemen are trying to figure out up there. ;)
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.50/day)
Location
Reaching your left retina.
Wrong, the average gain with 1010MHz over 850MHz (19% clock difference) is much less than the gain from 850MHz over 725MHz (17% clock difference).

I do not feel like calculating the exact average, but it's obvious by looking at the graph.

Want to bet?

I like to back up my claims (and had 5 spare minutes to waste), so:

http://img.techpowerup.org/091116/diff.jpg

11.5% versus 9.6%; not a whole lot of difference.

How much money did I win? :D
 
Joined
May 4, 2009
Messages
1,970 (0.36/day)
Location
Bulgaria
System Name penguin
Processor R7 5700G
Motherboard Asrock B450M Pro4
Cooling Some CM tower cooler that will fit my case
Memory 4 x 8GB Kingston HyperX Fury 2666MHz
Video Card(s) IGP
Storage ADATA SU800 512GB
Display(s) 27' LG
Case Zalman
Audio Device(s) stock
Power Supply Seasonic SS-620GM
Software win10
Bo_Fox you're speculating again that the games utilise 100% of the shaders...
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Bo_Fox you're speculating again that the games utilise 100% of the shaders...

0.5Hz: Well, when some of the relatively new games do utilize all of the shaders, especially with 4x/8x AAA at 1920x1200 or above....

But you are right that some games cannot use 100% of the shaders if the core is not being fed enough bandwidth. :p
 
Last edited:

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
I like to back up my claims (and had some spare 5 mins to waste) so:

http://img.techpowerup.org/091116/diff.jpg

11.5% versus 9.6%; not a whole lot of difference.

How much money did I win? :D

So the gain is larger for only a 125MHz core step (an 11.5% performance difference) than for a 160MHz step (a 9.6% performance difference).

If the memory gain were 0% instead of a further 3.3% (a 160MHz effective memory clock increase, with perhaps a tiny bump in latency) alongside that 160MHz core increase, you'd be seeing more like a 7.5% performance difference instead of the 9.6% that sounds so wonderful.

How much money did I win, at your expense, for real? :p
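Bo_Fox's point here is easier to see when the two averaged gains are normalized per MHz of core clock; a quick illustrative sketch (the 11.5%/9.6% figures are the averages quoted above; nothing else comes from the thread):

```python
# Averaged step gains (from the earlier calculation) and core clock deltas.
gain_step1, dclk1 = 11.5, 850 - 725    # +125 MHz core
gain_step2, dclk2 = 9.6, 1010 - 850    # +160 MHz core

per_mhz1 = gain_step1 / dclk1   # ~0.092 % gained per extra core MHz
per_mhz2 = gain_step2 / dclk2   # ~0.060 % gained per extra core MHz

# Per MHz, the second step yields noticeably less: this is the scaling
# drop-off Bo_Fox is pointing at. Benetanegia's counter is that clock
# steps should be compared as percentages of the baseline, not in MHz.
```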
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.50/day)
Location
Reaching your left retina.
More so for only 125MHz clock (11.5% performance difference)

and for 160MHz clock (9.6% performance difference).

Yeah, and my 25MHz 486 DX was 32% faster with a mere 8MHz boost; turbo mode enabled 33MHz.

8MHz --> 32% faster, so I guess I win?? :laugh:

You can't mix percentages with absolute numeric values...

If the memory gain was 0% instead of a further 3.3% (160MHz memory clock increase with perhaps a tiny bump in latency) along with that 160MHz clock further increase for the core, you'd be seeing more like a 7.5% performance difference instead of 9.6% that sounds so wonderful.

No offense, but you're starting to sound like a silly child asking for attention.

Seriously, you constantly fail to see that if there were a bottleneck, or anything close to a bottleneck, there would be NO difference at all beyond that 3%...
 

Bo_Fox

New Member
Joined
May 29, 2009
Messages
480 (0.09/day)
Location
Barack Hussein Obama-Biden's Nation
System Name Flame Vortec Fatal1ty (rig1), UV Tourmaline Confexia (rig2)
Processor 2 x Core i7's 4+Gigahertzzies
Motherboard BL00DR4G3 and DFI UT-X58 T3eH8
Cooling Thermalright IFX-14 (better than TRUE) 2x push-push, Customized TT Big Typhoon
Memory 6GB OCZ DDR3-1600 CAS7-7-7-1T, 6GB for 2nd rig
Video Card(s) 8800GTX for "free" S3D (mtbs3d.com), 4870 1GB, HDTV Wonder (DRM-free)
Storage WD RE3 1TB, Caviar Black 1TB 7.2k, 500GB 7.2k, Raptor X 10k
Display(s) Sony GDM-FW900 24" CRT oc'ed to 2560x1600@68Hz, Dell 2405FPW 24" PVA (HDCP-free)
Case custom gutted-out painted black case, silver UV case, lots of aesthetics-souped stuff
Audio Device(s) Sonar X-Fi MB, Bernstein audio riser.. what??
Power Supply OCZ Fatal1ty 700W, Iceberg 680W, Fortron Booster X3 300W for GPU
Software 2 partitions WinXP-32 on 2 drives per rig, 2 of Vista64 on 2 drives per rig
Benchmark Scores 5.9 Vista Experience Index... yay!!! What??? :)
Those percentages you talk about are "skewed" percentages. The graph from Xbitlabs does not take that into account.

Let's look at the percentage increase of 1010MHz over 850MHz. It's a 19% increase, but with respect to 850MHz itself.

850MHz over 725MHz is a 17% increase, but that's with respect to 725MHz.

It's just more appropriate to use simple numerical differences rather than percentages when talking about Xbitlabs' graph.

Please let's keep it civil, assuming you have already taken some statistics in college, or at least have some common sense about how percentages do not do exact justice to this graph. We could measure percentages from one common baseline (say, 725MHz: 1010MHz is 39.3% more than 725MHz, not 19% plus 17%, which is only 36% total and thus misleading).
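The baseline disagreement here is just adding versus compounding percentages, which a few lines of arithmetic make explicit (an illustrative sketch of the numbers in the post):

```python
# Two successive core overclocks: 725 -> 850 -> 1010 MHz.
step1 = 850 / 725 - 1    # ~17.2% vs the 725 MHz baseline
step2 = 1010 / 850 - 1   # ~18.8% vs the 850 MHz baseline

added      = (step1 + step2) * 100                    # naive sum: ~36.1%
compounded = ((1 + step1) * (1 + step2) - 1) * 100    # ~39.3%
total      = (1010 / 725 - 1) * 100                   # ~39.3% from one baseline

# Compounding, not adding, reproduces the single-baseline figure.
```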
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.50/day)
Location
Reaching your left retina.
Those percentages you talk about are "skewed" percentages. The graph from Xbitlabs does not take that into account.

Let's look at the percentage increase of 1010MHz over 850MHz. It's a 19% increase, but with respect to 850MHz itself.

850MHz over 725MHz is a 17% increase, but that's with respect to 725MHz.

It's just more appropriate to use simple numerical differences rather than percentages when talking about Xbitlabs' graph.

Once again, please let's keep it civil, assuming you have already taken some statistics in college, or at least have some common sense about how percentages do not do exact justice to this graph. We could measure percentages from one common baseline (say, 725MHz: 1010MHz is 39.3% more than 725MHz, not 19% plus 17%, which is only 36% total and thus misleading).

Have a beer, dude.

No, using the total percentages or values is what's misleading. It's obvious that Xbitlabs is basing it on the 17%+19% model, because 850MHz is the baseline; they wouldn't be using negative percentages otherwise.
 