Carmack weighs in...

John Carmack, the iconic developer behind the much-anticipated Doom III, today gave his view on the recent controversy surrounding 3DMark 2003 and the optimized drivers from each manufacturer. As a reminder: at the end of last week, FutureMark released a patch that cut GeForce FX performance by nearly 50% (see this news item)...

In a piece published by Slashdot, Carmack states today that rewriting a shader behind the back of the application that uses it, in a way that changes the output, is absolutely indefensible. On the other hand, rewriting a shader so that it does the same thing more efficiently is acceptable in his view, insofar as the optimization benefits all programs (Editor's note: which is what NVIDIA did with the Detonator FX drivers).

Carmack also returns to the architectural difference between the ATI and NVIDIA VPUs. NVIDIA's GeForce FX can work at 12-bit precision (integers only), 16-bit and 32-bit, whereas ATI's Radeon 9700 operates only at 24-bit (which is what the official Microsoft DirectX 9.0 specification calls for). On that basis, Carmack explains that there is no mode in which the two processors can be compared exactly. According to him, an NVIDIA GPU running at 16-bit floating point will be faster than an ATI chip, and slower when it runs at 32-bit floating point.
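Editor's note: to give a rough feel for what these precisions mean numerically, here is a minimal Python sketch that rounds a value to a given number of mantissa bits and prints the resulting relative error. The helper names are hypothetical, and the bit layouts are assumptions rather than something stated above: 10 and 23 mantissa bits follow the usual IEEE 754 half and single formats, and 16 mantissa bits is assumed for the 24-bit format. Exponent range, denormals and overflow are ignored.

```python
import math

# Explicit mantissa bits per format (assumed layouts, see the note above).
MANTISSA_BITS = {"fp16": 10, "fp24": 16, "fp32": 23}


def round_to_mantissa(x: float, bits: int) -> float:
    """Round x to the nearest value with `bits` explicit mantissa bits.

    Only the mantissa is modelled; exponent limits are ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)           # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (bits + 1)      # implicit leading bit + explicit bits
    return math.ldexp(round(m * scale) / scale, e)


def relative_error(x: float, bits: int) -> float:
    """Relative rounding error of x at the given mantissa width."""
    return abs(round_to_mantissa(x, bits) - x) / abs(x)


if __name__ == "__main__":
    x = 1.0 / 3.0
    for name, bits in MANTISSA_BITS.items():
        print(f"{name}: {round_to_mantissa(x, bits)!r} "
              f"(relative error {relative_error(x, bits):.2e})")
```

Each step up in width shrinks the rounding error by several orders of magnitude. That is why the same shader is not doing the same amount of work at 16, 24 and 32 bits.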

Finally, Carmack asserts that what NVIDIA may have done by converting 32-bit operations into 16- or 12-bit operations is entirely acceptable, as long as there is no loss of functionality or quality. Carmack's original text follows:

"Rewriting shaders behind an application's back in a way that changes the output under non-controlled circumstances is absolutely, positively wrong and indefensible.

Rewriting a shader so that it does exactly the same thing, but in a more efficient way, is generally acceptable compiler optimization, but there is a range of defensibility from completely generic instruction scheduling that helps almost everyone, to exact shader comparisons that only help one specific application. Full shader comparisons are morally grungy, but not deeply evil.

The significant issue that clouds current ATI / Nvidia comparisons is fragment shader precision. Nvidia can work at 12 bit integer, 16 bit float, and 32 bit float. ATI works only at 24 bit float. There isn't actually a mode where they can be exactly compared. DX9 and ARB_fragment_program assume 32 bit float operation, and ATI just converts everything to 24 bit. For just about any given set of operations, the Nvidia card operating at 16 bit float will be faster than the ATI, while the Nvidia operating at 32 bit float will be slower. When DOOM runs the NV30 specific fragment shader, it is faster than the ATI, while if they both run the ARB2 shader, the ATI is faster.

When the output goes to a normal 32 bit framebuffer, as all current tests do, it is possible for Nvidia to analyze data flow from textures, constants, and attributes, and change many 32 bit operations to 16 or even 12 bit operations with absolutely no loss of quality or functionality. This is completely acceptable, and will benefit all applications, but will almost certainly induce hard to find bugs in the shader compiler. You can really go overboard with this -- if you wanted every last possible precision savings, you would need to examine texture dimensions and track vertex buffer data ranges for each shader binding. That would be a really poor architectural decision, but benchmark pressure pushes vendors to such lengths if they avoid outright cheating. If really aggressive compiler optimizations are implemented, I hope they include a hint or pragma for "debug mode" that skips all the optimizations."
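Editor's note: Carmack's last point, that lower precision can be invisible once the result lands in a 32-bit framebuffer, can be illustrated with a small Python sketch. It assumes the framebuffer stores 8 bits per channel, and it uses the standard `struct` formats `e` and `f` to round values to IEEE 754 half and single precision. The function names are hypothetical, and this is only an illustration of the idea, not a description of how any driver actually performs the analysis.

```python
import struct


def as_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (16-bit float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def as_fp32(x: float) -> float:
    """Round x to IEEE 754 single precision (32-bit float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_byte(x: float) -> int:
    """Quantize a [0, 1] value to one 8-bit framebuffer channel."""
    return max(0, min(255, round(x * 255.0)))


def passthrough_mismatches() -> int:
    """Count 8-bit texels that change when carried through fp16."""
    return sum(1 for t in range(256) if to_byte(as_fp16(t / 255.0)) != t)


def modulate(a: int, b: int, rnd) -> int:
    """Multiply two 8-bit texels, rounding every step with `rnd`."""
    return to_byte(rnd(rnd(a / 255.0) * rnd(b / 255.0)))


def max_modulate_difference() -> int:
    """Largest 8-bit difference between the fp16 and fp32 products."""
    return max(abs(modulate(a, b, as_fp16) - modulate(a, b, as_fp32))
               for a in range(256) for b in range(256))


if __name__ == "__main__":
    print("pass-through mismatches:", passthrough_mismatches())
    print("max modulate difference:", max_modulate_difference())
```

Copying an 8-bit texel through a 16-bit float loses nothing, since half precision has more than enough mantissa bits to tell 256 levels apart. Chaining operations lets rounding errors accumulate, although a single multiply still stays within one framebuffer step of the 32-bit result. This is the kind of case-by-case reasoning the data-flow analysis Carmack describes would have to perform.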