This guy does NOT like the Pentium 4


  • This guy does NOT like the Pentium 4


    Ouch, he's really steamed
    "That's right fool! Now I'm a flying talking donkey!"

    P4 2.66, 512 MB PC2700, ATI Radeon 9000, Seagate Barracuda IV 80 GB, Acer AL732 17" TFT

  • #2
    This thing is quite old.
    But even though he's a bit of an argumentative guy, he's also a quite gifted programmer and definitely knows what he's talking about. You should especially read the part about the P-IV's missing pipelines to feed all its integer/FPU cores at once - now that's really a triple D'oh for Intel: giving the CPU those extra cores and then scrapping the pipelines needed for them all to really operate at once....
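    Roughly what "feeding all the integer/FPU cores at once" means in practice - a minimal C sketch of my own (nothing from his article): a loop body mixing independent integer and floating-point work, which the CPU can only overlap if it has enough issue pipelines to dispatch to both kinds of units every cycle.

    #include <stddef.h>

    /* Independent integer and FP work in one loop body. Whether the integer add
       and the FP multiply/accumulate actually execute in the same cycle depends
       on the CPU having enough issue ports/pipelines to feed both units. */
    double mixed_work(const double *f, const int *v, size_t n, long *int_sum_out)
    {
        double fp_sum = 0.0;
        long   int_sum = 0;
        for (size_t k = 0; k < n; k++) {
            fp_sum  += f[k] * 1.0001;   /* FP unit */
            int_sum += v[k];            /* integer unit, independent of fp_sum */
        }
        *int_sum_out = int_sum;
        return fp_sum;
    }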
    But we named the *dog* Indiana...
    My System
    2nd System (not for Windows lovers )
    German ATI-forum



    • #3
      Well, I would be pissed too if all my hard work suddenly didn't look so good anymore due to a new technology.
      Here is a guy who has been optimizing his emulators for a specific architecture, and suddenly a new one comes along which is totally different!

      About the claims he makes... there are a lot of different ways of doing/programming things, and somewhere on the Web I have seen an article where he and Tim Sweeney debate ways to optimize things!

      I'm not saying the current P4 is perfect, far from it, but it's a step in the right direction and hopefully Northwood will be another step in the right direction.
      Fear, Makes Wise Men Foolish !
      incentivize transparent paradigms



      • #4
        But no optimization can make up for the fact that Intel equipped the PIV with all those mighty (and costly!) CPU/FPU cores and then afterwards stripped out the pipelines needed to feed them all at once (btw, those pipelines apparently were there in the early tech papers...).

        So if you really optimize your code, making sure that the FPU/INT cores get used in an optimal way, you still might be trapped by the pipeline issues (and pipeline stalls are a MAJOR slowdown on the PIV).
        Again: the current implementation of the PIV seems a bit "beta" to me. Maybe they'll get the real thing out soon.
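        To make the pipeline-stall point concrete, a generic C sketch (my own, not his code): on a long FP pipeline, a loop in which every operation depends on the previous result exposes the full latency on every iteration, while splitting the work across independent accumulators lets the pipeline stay full.

        #include <stddef.h>

        /* Dependent chain: each add must wait for the previous result,
           so a deep FP pipeline stalls on every iteration. */
        double sum_dependent(const double *a, size_t n)
        {
            double s = 0.0;
            for (size_t i = 0; i < n; i++)
                s += a[i];
            return s;
        }

        /* Independent accumulators: four adds can be in flight at once,
           so the pipeline can be kept fed. */
        double sum_unrolled(const double *a, size_t n)
        {
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
            size_t i;
            for (i = 0; i + 4 <= n; i += 4) {
                s0 += a[i];
                s1 += a[i + 1];
                s2 += a[i + 2];
                s3 += a[i + 3];
            }
            for (; i < n; i++)
                s0 += a[i];
            return (s0 + s1) + (s2 + s3);
        }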
        But we named the *dog* Indiana...
        My System
        2nd System (not for Windows lovers )
        German ATI-forum



        • #5
          If Intel decides to put two FPUs in Northwood instead of the one they currently have in Willamette, things could look very different and pipeline stalls could be less of a problem. If they don't, then I think the only difference could be in the trace cache + a larger L1 + L2.

          And yes... the current P4 looks much like the Pentium Pro, which was replaced by the PII.

          If he so dislikes this issue, then why doesn't he write code to take advantage of SSE2?
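          For what "taking advantage of SSE2" looks like, here is a minimal C sketch using the standard <emmintrin.h> intrinsics (my own toy example, nothing to do with his emulator): two doubles are added per instruction instead of one at a time through the x87 FPU.

          #include <emmintrin.h>   /* SSE2 intrinsics */
          #include <stddef.h>

          /* c[i] = a[i] + b[i], two doubles per SSE2 add.
             Assumes a, b and c are 16-byte aligned; any n is handled. */
          void add_sse2(const double *a, const double *b, double *c, size_t n)
          {
              size_t i;
              for (i = 0; i + 2 <= n; i += 2) {
                  __m128d va = _mm_load_pd(&a[i]);
                  __m128d vb = _mm_load_pd(&b[i]);
                  _mm_store_pd(&c[i], _mm_add_pd(va, vb));
              }
              for (; i < n; i++)       /* scalar tail */
                  c[i] = a[i] + b[i];
          }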
          Fear, Makes Wise Men Foolish !
          incentivize transparent paradigms



          • #6
            SSE2 ("Screaming Sandy Extensions, twice") is not a magic bullet!

            It isn't the glorified solution to all the P4's speed problems!!
            If there's artificial intelligence, there's bound to be some artificial stupidity.

            Jeremy Clarkson "806 brake horsepower..and that on that limp wrist faerie liquid the Americans call petrol, if you run it on the more explosive jungle juice we have in Europe you'd be getting 850 brake horsepower..."



            • #7
              Never said it was, but it can fix the specific problem with the FPU in this case.
              Fear, Makes Wise Men Foolish !
              incentivize transparent paradigms



              • #8
                It's just that this reminds me so much of the K6-2 days:

                "If everybody uses 3DNow!, our K6-2 will be faster than a P2 at the same MHz," said AMD.

                "A real CPU, like ours, has a real FPU," said Intel!

                Joe Average:
                "Duh... I haven't seen one game that has 3DNow optimising or whatever they call it, it's cheating anyway."

                Now Intel comes out with an FPU-weak CPU and says the same thing, but I haven't heard one reviewer make the connection that Intel is actually using the same argument that AMD has stopped using.

                Many (but not all) just scream, "But why don't they use SSE2-optimised applications?"

                Intel condemned that approach at the time but is now embracing it!

                And optimizing for SSE2 would in his case only benefit a very tiny part of his userbase.

                I have a big dislike for "spiffy new instructions that need special compiling or optimising", because it means that I could be left out if I use the wrong CPU!

                And the above is aimed at BOTH Intel and AMD!!
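                The usual way around "being left out on the wrong CPU" is to detect the instruction set at run time and pick a code path - a rough C sketch, assuming a GCC-style <cpuid.h> (the SSE2 flag is bit 26 of EDX in CPUID leaf 1):

                #include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */

                /* Returns 1 if the CPU reports SSE2 support, 0 otherwise. */
                static int has_sse2(void)
                {
                    unsigned int eax, ebx, ecx, edx;
                    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                        return 0;
                    return (edx >> 26) & 1;   /* EDX bit 26 = SSE2 */
                }

                An application would call this once at startup and install function pointers to either the SSE2 routines or the plain x87 ones, so the same binary runs on every CPU.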
                If there's artificial intelligence, there's bound to be some artificial stupidity.

                Jeremy Clarkson "806 brake horsepower..and that on that limp wrist faerie liquid the Americans call petrol, if you run it on the more explosive jungle juice we have in Europe you'd be getting 850 brake horsepower..."



                • #9
                  AMD might adopt SSE2 for their next CPUs; then we'll quite likely see SSE2 optimizations.
                  But if you hear game developers talk, they don't care that much about SSE2 ("too complicated to use", "not the instructions needed for games", ...); nearly all of them prefer and really like NVIDIA's programmable vertex shaders.
                  But we named the *dog* Indiana...
                  My System
                  2nd System (not for Windows lovers )
                  German ATI-forum



                  • #10
                    I don't dislike new things... normally new things mean progress.
                    In the case of the FPU vs anything else, the x86 FPU was getting to a point where the only way to make it better was to increase the number of FPUs (3 in AMD's case), so instead *ntel decided to use another approach -> SSE2, which also means that more instructions can be executed simultaneously (under the right circumstances).

                    One of the big differences between RISC and x86 used to be FPU performance, but not anymore.

                    I also remember the 3DNow debate.

                    And yes, game developers are saying that it's not the CPU which is holding a game back but the performance of the GFX card (G550).
                    Fear, Makes Wise Men Foolish !
                    incentivize transparent paradigms



                    • #11
                      The bright side of the Pentium IV is that it's a real evolution in memory bandwidth. I can also dig the extended SIMD instructions, because they are the future. The pure x86 FPU has severe limitations; you'd need a major brute-force approach to get the same power out of the x86 FPU as out of SIMD instructions.

                      What I don't feel so comfortable with is the weak x86 FPU power compared with the Athlon. Legacy support, anyone? It wouldn't hurt to give it roughly the same FPU power as the PIII. What saves the PIV is the insane clock speeds they're running at. But I still find it amusing to see reviewers raving about the PIV 1800 MHz finally scoring better than the Athlon 1400 in some benchmarks. Yes, big thing for a chip running 400 MHz faster and costing 3 times as much.

                      RAMBUS was also a very, very bad choice. Intel is finally getting it, though. Dual-channel DDR SDRAM (a la nForce) for the PIV, anyone?

                      As a side note, why oh why such a small amount of L1 cache??? Unbelievable. How hard could it be to give it at least 32 KB of L1? Not a million more transistors required, for sure.



                      • #12
                        About the L1 cache... the smaller the L1 cache, the LOWER the latency.

                        The P4 can boast THE lowest latency to date on its L1/L2 cache.

                        It's all a matter of design decisions... a larger L1 cache -> higher latency. Fewer cache misses, yes, but it really doesn't matter that much, because the P4 is not as dependent on its x86 decoders as other x86 CPUs, thanks to its trace cache.

                        The P4 L1 cache is not divided into the usual data and instruction parts; instead there is a small data cache and a special trace cache.
                        The trace cache sorts instructions so they lie in sequential order. Furthermore, it doesn't store x86 instructions but micro-ops, and it doesn't store instruction addresses as normal CPUs do - it stores the EXPECTED program flow.
                        The BIG thing here is that the micro-ops stored in the trace cache can be executed again and again without having to be decoded every time. This gives the P4 very good performance in loops compared to the P3. Furthermore, the trace cache makes the number of concurrent instructions the P4 can execute independent of the number of x86 decoders.

                        The L1 cache reads data in 128-bit chunks (P3: 32 bit), which means that the content of the cache changes rapidly. This is a great advantage when processing huge amounts of sequential data, but a disadvantage when the working set changes constantly (128 bits have to be changed at a time).
                        So Intel needed to design an L1 which wouldn't hold the rest of the CPU back, and to do so they needed an L1 with a low latency. They ended up with an 8 KB L1 data cache with a latency of 2 clocks, and since data can be delivered on every clock cycle, the bandwidth of the P4 L1 cache is 48 GB/s at 1.5 GHz. This bandwidth can only be reached when using 128-bit SSE2 instructions; otherwise it's about half of that.
                        The L2 cache has 45 GB/s of bandwidth at 1.4 GHz with a latency of 7, and it can also transfer data on every clock cycle. So L1 latency + L2 latency = 9 clocks, compared to a 20-clock latency on the Athlon.
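                        To make the sequential-vs-random point concrete, a generic C sketch (not tied to any particular cache size): walking an array in order uses every byte of each wide cache fill, while jumping around in it throws most of each fill away and causes far more cache traffic for the same work.

                        #include <stddef.h>

                        /* Sequential walk: every element of each cache fill gets used. */
                        long sum_sequential(const int *a, size_t n)
                        {
                            long s = 0;
                            for (size_t i = 0; i < n; i++)
                                s += a[i];
                            return s;
                        }

                        /* Strided walk over the same data: with a large stride only one
                           element per fill is used before the line is evicted, so the
                           cache content keeps changing and bandwidth is wasted. */
                        long sum_strided(const int *a, size_t n, size_t stride)
                        {
                            long s = 0;
                            if (stride == 0)
                                stride = 1;
                            for (size_t start = 0; start < stride; start++)
                                for (size_t i = start; i < n; i += stride)
                                    s += a[i];
                            return s;
                        }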
                        Last edited by Kosh Naranek; 15 July 2001, 18:08.
                        Fear, Makes Wise Men Foolish !
                        incentivize transparent paradigms



                        • #13
                          Reaching 3 GHz with an L1 cache size of 64 KB is almost impossible without holding the rest of the CPU back.
                          Double pumping... a 1.5 GHz CPU -> 3 GHz ALUs, a 3 GHz CPU -> 6 GHz ALUs.
                          Fear, Makes Wise Men Foolish !
                          incentivize transparent paradigms

