2 x 4MB or 6MB cache?
Easy Rhino
04-30-2008, 03:18 AM
Which will perform better for games and video rendering: a 2x4MB cache or a 6MB cache?
kwchang007
04-30-2008, 03:22 AM
One's a quad and one's a dual, right? It depends on the apps: if an app can use all four cores, the quad will perform better, but if it can't, the faster dual will probably do better.
farlex85
04-30-2008, 03:24 AM
Hey, the more the merrier, right? I think 2x4MB, but the quad-core aspect of that is the most important part, I think.
ex_reven
04-30-2008, 03:29 AM
Cache size can be quite important, especially when working with larger amounts of data.
I'm assuming 2x4MB means there are two L2 caches. In which case I'd go for that, because it has a larger overall cache size. But only if we are talking about similarly specced processors.
lemonadesoda
04-30-2008, 09:00 AM
It depends on the application.
Separate caches might only be duplicating data. So one application running as two threads basically has the 2 caches filled with similar data. In which case the 6MB cache is better.
Unfortunately, the thread scheduler in Windows isn't clever enough to set or manage "affinity" with knowledge of the number of cores and their caches. So there really is NO optimum. It's a bit hit or miss. If MS wrote a "professional scheduler" where you could set priorities for applications and threads across cores and caches... then one could optimise the system for each CPU architecture. But that would be very complicated.
ex_reven
04-30-2008, 09:01 AM
It depends on the application.
Separate caches might only be duplicating data. So one application running as two threads basically has the 2 caches filled with similar data. In which case the 6MB cache is better.
Unfortunately, the thread scheduler in Windows isn't clever enough to set or manage "affinity" with knowledge of the number of cores and their caches. So there really is NO optimum. It's a bit hit or miss.
See, that's what I was wondering. Are there any true dual L2 caches? And if there are, what the hell is the point of separating them??? Why can't it just be one big cache? Why is it split (other than for the reason of having the same data in both)?
Because cache is built into the core. 2 cores, 2 caches.
DaedalusHelios
04-30-2008, 09:07 AM
It's a shared cache on Core 2s. It's only split when it needs to be. ;)
ex_reven
04-30-2008, 09:08 AM
Because cache is built into the core. 2 cores, 2 caches.
I see, I thought it was on the wafer itself.
reviewhunter
04-30-2008, 09:43 AM
It should be 2 dies with 2 cores each,
each die having a shared L2 cache between the 2 cores.
ex_reven
04-30-2008, 09:46 AM
Yay, now I'm confused.
btarunr
04-30-2008, 09:50 AM
Intel quads are built in a way that two cores share an L2 cache. So, if your encoder is multi-threaded, no thread can have access to more than half of the total L2. If I'm using a Q9300, I have 2x 3MB cache; the Q9450 gives 2x 6MB, and the Q6600 gives 2x 4MB.
A lot depends on the encoder, which instruction sets it uses, if it could use 4 threads, etc. I'd pick 2x 4MB.
ex_reven
04-30-2008, 09:53 AM
That makes sense.
Darknova
04-30-2008, 10:22 AM
It also depends on the architecture :)
Intel uses such huge cache amounts because it's still using the aging FSB, which just can't cope with the massive amounts of bandwidth the Core 2 architecture asks for, so it uses a large cache to keep a lot of data stored there.
AMD uses HyperTransport, which is much faster, and dedicated to the CPU - NB link, so it can afford to use a smaller cache size. :)
lemonadesoda
04-30-2008, 01:34 PM
AMD uses HyperTransport, which is much faster, and dedicated to the CPU - NB link, so it can afford to use a smaller cache size. :)
Not trying to split hairs, but since this is a technical discussion, let me put a few facts out:
1./ HyperTransport is NOT dedicated to the CPU. It is a communications interface/protocol that is used on a mainboard between chips. An analogy would be "PCIe", or "ethernet". On AMD it is used between north and south bridge, just as between northbridge and CPU. On other architectures it is used for interface cards, just like PCIe on PCs. HT is also used in routers and switches.
2./ HT (HyperTransport) is an industry STANDARD. What does that mean? There are LOTS of chips that support the protocol.
3./ Look at a typical intel schematic.
http://upload.wikimedia.org/wikipedia/en/thumb/9/98/Motherboard_diagram.png/382px-
Intel uses a proprietary communications protocol between the CPU and the northbridge, and between the north and southbridge. With HT, you would replace "internal bus" and "fsb bus" with HT. One standard. No chips to make... they are out there already... and the CPU, or mainboard, can be designed INDEPENDENTLY of each other, and of the manufacturer of the HT components. "Plug-and-go".
4./ HT works at the following speeds in consumer situations (there are other versions of HT being used in very expensive situations that are faster, e.g. Cray). These are the clock and transfer rates on HyperTransport 1.x (i.e. available on socket 754 processors) and HT 2.x (s939):
200 MHz = 400 MT/s = 800 MB/s
400 MHz = 800 MT/s = 1,600 MB/s
600 MHz = 1,200 MT/s = 2,400 MB/s
800 MHz = 1,600 MT/s = 3,200 MB/s
1,000 MHz = 2,000 MT/s = 4,000 MB/s
Only 1,000 MHz is FASTER than FSB 800, but not by a lot. On the newer Intel FSB 1066, 1333 and 1600, the Intel FSB is faster.
However, unless using VERY VERY VERY expensive RAM, we use FSB/memory dividers on Intel to get back to 800 or 1066 RAM. Same on AMD. So they are approximately the same. Although in theory, Intel is better if you get faster RAM.
5./ Latency is the issue. The HT bus has lower inherent latency than the FSB-memory bus. Depending on the type of application, this can be a benefit. Intel compensates for higher latency with a bigger cache. The cache is lower latency than HT, so, in practice, Intel wins out on averages. (Cache misses: slower. Cache hits: faster. Average: faster if the cache is big.)
6./ How to go faster? Intel is going down the route of triple and quad channel memory, to double bandwidth further. This means that data can be ripped at 64-bit words rather than 32-bit words, which is the max of HT and the existing Intel FSB. It means redesigning the CPU-northbridge link to be wider than 32 bits. Given an overhaul of the CPU and interface protocol, why not use that also in the north-south bridge interface, just like HT uses the same bus across all chips? This is what Intel calls "QuickPath".
QuickPath = Intel's version of HT, but with a WIDER bus (64-bit) than HT (32-bit). In theory, it will be faster, at consumer prices. (There is an HT 3.x which is used in very expensive equipment not practicable for consumer boxes).
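For anyone who wants to check the arithmetic behind the HT table in point 4, here is a minimal Python sketch. It assumes double data rate signaling (two transfers per clock) and a 2-byte (16-bit) link in one direction. That width is not stated above; it is inferred from the MB/s column, and the function name `ht_rates` is just made up for illustration.

```python
# Sketch only. LINK_WIDTH_BYTES is an assumption inferred from the
# MB/s column of the table (MB/s = 2 x MT/s), not a quoted spec value.
LINK_WIDTH_BYTES = 2  # assumed 16-bit link, one direction


def ht_rates(clock_mhz, width_bytes=LINK_WIDTH_BYTES):
    """Return (MT/s, MB/s) for a double-data-rate link at clock_mhz."""
    transfers = clock_mhz * 2             # two transfers per clock cycle
    bandwidth = transfers * width_bytes   # bytes moved per transfer
    return transfers, bandwidth


for clock in (200, 400, 600, 800, 1000):
    mts, mbs = ht_rates(clock)
    print(f"{clock} MHz = {mts} MT/s = {mbs} MB/s")
```

Changing `width_bytes` shows how the same clock scales with a wider or narrower link.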
Apologies for any errors in the above. I'm only an "armchair" expert :pimp:
btarunr
04-30-2008, 02:36 PM
One point perhaps missed is that the 'internal bus' between the NB and SB in Intel chipset models is DMI (Direct Media Interface). By today's standards that is a very weak chipset bus. In fact, to overcome the bottleneck of DMI, Intel connected the NB to the SB in the 5400 XS (Skulltrail) using DMI + PCI-E x4. NVidia uses an 8- or 16-bit HyperTransport link as the chipset bus. Ironically, AMD chipsets are different. They're reminiscent of the chipset division of ATI. The NB connects to the SB using 2~4 PCI-E lanes (depending on load), what ATI used to call A-Link. It's still in use with the 7-series chipsets. But hey, PCI-E x4 is better than DMI.
Intel is innovating a technology to rival HyperTransport. They should use it to build a strong chipset bus first.
Morgoth
04-30-2008, 03:06 PM
Now I'll jump in and show you a demonstration of QuickPath
http://www.intel.com/technology/quickpath/index.htm?iid=tech_arch_nextgen+body_quickpath_bullet
---
Cache Demonstration
http://www.intel.com/technology/architecture-silicon/next-gen/demo/demo.htm
lemonadesoda
04-30-2008, 03:50 PM
That demo is awful. I hope it's not true! (That is, I hope it is just some graphics designed as smoke and mirrors for the consumer, rather than for a technically savvy audience.)
1./ Exclusive memory for each CPU? Well, that's going to require duplication, just like it does in Crossfire. Terrible from a memory efficiency standpoint. (Same asset loaded in both banks).
2./ CRC 72bit rather than 32bit. WOW, big deal. Since the data is a 64-bit word rather than a 32-bit word, you need a bigger 64-bit CRC anyway.
3./ Better "self-repair" of bad memory transfers. Big deal, I don't need my 0.0001% of memory errors being repaired quicker. #1, it makes no practical performance difference; #2, if it does, then there is a serious problem with memory errors that needs to be fixed by a better design, not a better band-aid.
4./ CRC error correction: WAIT, I don't need such a complicated error management system. This is not ethernet. We are INSIDE the PC. If there are so many errors, we have a FAULT and we need a FAULT to be shown, not the PC crawling at a snail's pace by rerouting data. Not unless this is in the "shutdown/suspend" situation, where the PC survives a crash more politely. But I don't want the machine to continue going, without notice, and to be performing at a snail's pace.
5./ Cache snoop. Just smoke and mirrors. There's no reason all CPUs couldn't share a single bus, rather than having separate interconnects. If it takes 4 hops, then you might as well take 1 hop at 25% of the speed, and have the same net result, but with a much simpler, and much more scalable, architecture. I understand the reason for it... that you could have simultaneous transfers between CPU 1 and 2, and between 3 and 4. But the number of permutations where you get that true independence, and the number of occasions you actually need it, is going to be small. IMO this is like the new "netburst". It's adding a lot of complexity for very little gain. For the system to work... the thread scheduler would have to be completely redesigned. And how would it work if there were 8 CPUs? The number of interconnects goes up like the triangular numbers, i.e. quadratically. So either the demo is wrong, or it is an unscalable design.
>> Number of interconnects between CPUs: 2 CPUs, 1 interconnect; 3 CPUs, 3 interconnects; 4 CPUs, 6 interconnects; 5 CPUs, 10 interconnects; 6 CPUs, 15 interconnects; 7 CPUs, 21 interconnects; 8 CPUs, 28 interconnects. WTF. This bus system is not tractable.
No. of interconnects for x CPUs = no. of interconnects for (x-1) CPUs + (x-1), and no. of interconnects for 1 CPU = 0
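The recurrence above can be checked with a short Python sketch. It assumes, as the post does, that every CPU gets a direct link to every other CPU (a fully connected layout); the function names are made up for illustration. The closed form x(x-1)/2 is the standard count of pairs among x items.

```python
def links_recursive(cpus):
    """links(1) = 0; links(x) = links(x - 1) + (x - 1)."""
    if cpus <= 1:
        return 0
    return links_recursive(cpus - 1) + (cpus - 1)


def links_closed_form(cpus):
    """Number of distinct CPU pairs: x * (x - 1) / 2."""
    return cpus * (cpus - 1) // 2


for n in range(2, 9):
    # Both forms should agree for every CPU count.
    assert links_recursive(n) == links_closed_form(n)
    print(f"{n} CPUs: {links_closed_form(n)} interconnects")
```

The count grows with the square of the CPU count, which is the scaling concern raised in point 5.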
Morgoth
04-30-2008, 03:56 PM
It's more for multi-socket CPUs.