
T&L doesn't help at low-res says nVidia?!



Pace
12th May 2000, 05:00
I was over at VE (http://www.voodooextreme.com) and saw a link to this article (http://www.nvnews.net/articles/clrespondsto3dfx.shtml) on nVNews (http://www.nvnews.net). It is an argument between 3dfx (Voodoo5) and Creative Labs/NVIDIA (GeForce) about the (lack of) benefits from T&L.

The Creative guy (Steve Mosher) shows off this graph:
http://www.nvnews.net/articles/clv3dfx/cpubound2.gif
and says this about it:

How is this possible? Isn't the GeForce 2 much faster? Well, it's simple. At this resolution you are CPU bound. In the case of the GeForce 2 the graphics happens twice as fast, but the frame rate is still determined by the CPU speed. In fact the graphics could happen INFINITELY FAST, and at 512*384 your frame rate would still be 119.
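Mosher's argument is a bottleneck model: if the CPU's share of a frame (game logic, AI, feeding the driver) overlaps with the card's rendering, each frame takes as long as the slower of the two. A minimal Python sketch under that assumption (perfect overlap, no driver or bus overhead); `frame_rate` is a hypothetical helper, the 119 fps figure comes from the quote, and the GPU times are made up:

```python
def frame_rate(cpu_ms: float, gpu_ms: float) -> float:
    """Frames per second when CPU and GPU work on consecutive frames in parallel.

    The slower stage sets the pace; the faster one waits.
    """
    return 1000.0 / max(cpu_ms, gpu_ms)

cpu_ms = 1000.0 / 119  # CPU time per frame implied by the quoted 119 fps

# Hypothetical GPU times: halve the graphics time, then make it "infinitely fast".
for gpu_ms in (4.0, 2.0, 0.0):
    print(f"GPU {gpu_ms:.1f} ms -> {frame_rate(cpu_ms, gpu_ms):.0f} fps")
```

Every line prints 119 fps: once the CPU is the slower stage, shrinking the GPU time changes nothing, which is the flat low-res result in the graph.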
Surely the low-res scores are limited by polygon throughput (transform, lighting and clipping)? Isn't that what "CPU limited" used to mean? So if T&L is now done on the graphics card, surely upping the CPU speed should make little or no difference. So then, upping the CPU for the Voodoo 5, or the GPU for the GeForce, would raise polygon/T&L performance and hence low-resolution performance and frame rates.

3DMark2000 is very polygon-intensive, runs at resolutions where fill rate is not the limiting factor, and says (I think) that T&L will be very helpful in this situation.

RSVP, Paul.

Corwin_Brute
12th May 2000, 06:45
Well, yes and no. A T&L engine has to get data to work properly. This data comes in the form of vertices. The vertices are calculated by the CPU. Then, they are transformed and lit by either the CPU or the GPU. So if the CPU can't compute vertices faster, the T&L engine sits there, idle.
BTW, another bottleneck is the AGP bus (all vertices travel from the CPU to the GPU via the AGP bus). This should change with DirectX 8 and higher order surfaces (which neither the GeForce nor the GeForce 2 can handle).
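Corwin's point is that the stages form a chain (the CPU generates vertices, AGP carries them, the T&L unit transforms and lights them), so the chain runs at the speed of its slowest link. A rough Python sketch; the function name, the 32-byte vertex layout and both per-stage rates are assumptions for illustration, and the AGP figure is the approximate theoretical peak of AGP 4x, not a measured rate:

```python
# Approximate peak rate of AGP 4x; real transfers achieve less.
AGP_4X_BYTES_PER_S = 1066e6
VERTEX_BYTES = 32  # assumed layout: position (12) + normal (12) + one UV pair (8)

def vertex_throughput(cpu_vps: float, agp_bytes_per_s: float,
                      vertex_bytes: int, tl_vps: float) -> float:
    """Vertices per second that make it through a CPU -> AGP -> T&L chain.

    Each stage can only pass on what the previous stage delivers, so the
    slowest stage sets the rate and the faster ones sit partly idle.
    """
    agp_vps = agp_bytes_per_s / vertex_bytes  # bus limit in vertices per second
    return min(cpu_vps, agp_vps, tl_vps)

tl_vps = 25e6  # hypothetical T&L capacity
rate = vertex_throughput(cpu_vps=5e6, agp_bytes_per_s=AGP_4X_BYTES_PER_S,
                         vertex_bytes=VERTEX_BYTES, tl_vps=tl_vps)
print(f"{rate / 1e6:.1f} M vertices/s, T&L unit busy {rate / tl_vps:.0%}")
```

With a CPU that (hypothetically) only produces 5 million vertices per second, the T&L unit is busy a fifth of the time, which is the "sits there, idle" case.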

------------------
Corwin the Brute

Corwin_Brute
12th May 2000, 06:48
Oh, and BTW, I found (in the argument you first mention) that 3dfx's marketing is on a rampage these days... They have the right to be cocky with their latest product (I know that many people here don't agree with me, but the VSA-100 is a very nice design), and they are at war. On the other hand, NVidia came out with a very, very disappointing product and are playing damage control... Interesting turn of events, indeed...

------------------
Corwin the Brute

Snake-Eyes
12th May 2000, 06:50
Having read the article and thought about the reply, I think what he says makes sense. While it is true that the cards CAN perform more transforms than they are asked to at 512x384, they never get the chance, because the processor is busy performing other tasks for the game, tasks that take the same amount of time no matter which card is installed (that's why he mentions AI, etc.).

So you have to take into account that the GeForce 1 was already able to process more transforms than it was being given (it has to wait for the processor to finish other tasks for the given scene, again AI, etc.), so the GTS, even with a higher capability, is still waiting at the same points, resulting in the same frame rates.

Now, as you increase the resolution, you increase the poly count without increasing the calculations needed for AI (a bot is still gonna do the same kinds of things, regardless of how clearly you can see him, right? ;) ). If the frame rates stay consistent over several resolutions, from the lowest up, it's probably safe to say that whatever added transform count is being done is still well within the overall capability of the card in question.

When you finally reach a resolution where the frame rate drops noticeably compared with the lower resolutions, you can assume you've hit some sort of limit in the video card (either throughput or T&L). Unfortunately, as he points out, the changes made to the GTS from the original GeForce chip make it hard to tell whether the drop in performance at this range is due to T&L or to throughput, although it is most likely a memory bandwidth problem (judging by the fact that the GTS got a resolution or so higher than the original GeForce before falling below its CPU-limited low-resolution scores).
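The method described here (find where the frame rate first falls off its low-resolution plateau) is easy to automate. A small Python sketch; `first_drop`, the 5% tolerance and the sample numbers are invented for illustration and are not benchmark results:

```python
def first_drop(results, tolerance=0.05):
    """Return the first resolution whose frame rate falls more than
    `tolerance` (a fraction) below the lowest-resolution result,
    or None if the curve stays flat.

    `results` is a list of (resolution_label, fps) pairs ordered from
    lowest to highest resolution.
    """
    if not results:
        return None
    baseline = results[0][1]  # CPU-limited plateau at the lowest resolution
    for label, fps in results:
        if fps < baseline * (1.0 - tolerance):
            return label
    return None

# Made-up numbers for illustration only.
sample = [("512x384", 119.0), ("800x600", 118.5), ("1024x768", 117.9),
          ("1280x1024", 95.0), ("1600x1200", 70.0)]
print(first_drop(sample))  # 1280x1024
```

As the post says, finding the knee only tells you that some card limit was hit; it does not say whether that limit is T&L or memory bandwidth.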

If memory bandwidth weren't a factor, the core clock increase alone should have let the GTS climb, very likely, to the highest resolutions without hitting the card's peak (since it is clocked around 66% faster). However, the memory is only 10% faster than the GeForce256 DDR's, so in many situations the memory limit will be hit before the chip's limit (the GeForce256 DDR was already limited by memory throughput, too). Maybe nVidia needs to talk to Matrox to get some ideas on how to increase their memory throughput without faster memory, hehe.
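The core-versus-memory argument is the same bottleneck rule applied inside the card. A Python sketch, assuming a frame's time is set by whichever of core work or memory traffic takes longer (full overlap); the 1.66 and 1.10 ratios are the ones quoted in the post, while `speedup` and the two workloads are hypothetical:

```python
def speedup(core_time: float, mem_time: float,
            core_ratio: float, mem_ratio: float) -> float:
    """Speedup of a frame whose time is the slower of core and memory work,
    when the core gets `core_ratio` times faster and memory `mem_ratio` times faster."""
    old = max(core_time, mem_time)
    new = max(core_time / core_ratio, mem_time / mem_ratio)
    return old / new

CORE_RATIO = 1.66  # "clocked around 66% faster"
MEM_RATIO = 1.10   # "only 10% faster"

# Hypothetical workloads in arbitrary time units: one core-bound, one memory-bound.
print(f"core-bound:   {speedup(1.0, 0.5, CORE_RATIO, MEM_RATIO):.2f}x")
print(f"memory-bound: {speedup(0.8, 1.0, CORE_RATIO, MEM_RATIO):.2f}x")
```

Under this model a workload that is already memory-bound only gets the 1.10x memory speedup, however much faster the core is clocked.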

What will really interest me is which benchmark they finally accept to test the true T&L limits of the cards, without the result being affected by either the CPU's speed on those miscellaneous functions or the card's bandwidth limit. As far as I have seen, the original GeForce chip has never been able to show its true performance potential because of the bandwidth problem, and the GTS is likely to be no exception. Seems like nVidia needs to start concentrating more on the bandwidth side of the house now if they want to keep improving their performance.

------------------
Ace

Himself
12th May 2000, 08:16
1. If the cpu isn't fast enough, the card won't be able to process data it doesn't have. The cpu has to send along all the data required for each frame in time for it to be rendered. If a 1GHz cpu executes one flop per second because you only ask it to do one every second, it's your fault. :)

2. The number of polygons doesn't necessarily relate to screen size. The number of vertices stays the same unless there is some LOD stuff going on. All that changes is the number of operations it takes to fill in the polygons, plus other per-pixel work.

3. There isn't enough bandwidth for the T&L unit at higher 32-bit resolutions and/or FSAA modes, and this will limit the card no matter what cpu you have. If the card is choked on pixel data, there isn't enough power left over to get 25MP/s of T&L.

4. You will never see all the hardware working at peak performance; you will not get bump mapping, FSAA and T&L performance all at the same time without extreme memory bandwidth and much higher T&L performance per clock. Just think of your cpu's performance when Windows decides to resize your swap file, or while you are defragging your 14GB partition. :)
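Points 2 and 3 can be put into rough numbers. A Python sketch, assuming no LOD (so the vertex count is fixed by the scene) and that each drawn fragment costs one Z read, one Z write and one color write, with texture fetches, blending and compression ignored; `frame_costs`, the 10,000-vertex scene and the overdraw of 3 are all hypothetical:

```python
def frame_costs(width: int, height: int, vertices: int,
                color_bytes: int, z_bytes: int, overdraw: float = 1.0):
    """Return (vertices to transform, frame-buffer bytes touched) for one frame.

    Assumes no LOD, so the vertex count depends only on the scene, and a cost
    of one Z read + one Z write + one color write per drawn fragment.
    """
    fragments = width * height * overdraw
    bytes_per_fragment = color_bytes + 2 * z_bytes
    return vertices, fragments * bytes_per_fragment

SCENE_VERTICES = 10_000  # hypothetical scene size
OVERDRAW = 3.0           # hypothetical average overdraw

for w, h in ((512, 384), (1600, 1200)):
    for color_bytes, z_bytes in ((2, 2), (4, 4)):  # 16-bit and 32-bit modes
        verts, fb_bytes = frame_costs(w, h, SCENE_VERTICES, color_bytes, z_bytes, OVERDRAW)
        print(f"{w}x{h} {8 * color_bytes}-bit: {verts} vertices, {fb_bytes / 1e6:.1f} MB")
```

The vertex count is the same in every row, while the frame-buffer bytes grow roughly tenfold from 512x384 to 1600x1200 and double again from 16-bit to 32-bit color. Multiplying by the frame rate gives bytes per second to compare against a card's memory bandwidth.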