Draw Call Performance, OpenCL Batching & Shader Questions

*blah-blah-blah maniac*
Posts: 17427
Joined: 27 Dec 2011, 08:53
Location: Rather not to say

Re: Draw Call Performance, OpenCL Batching & Shader Question

Per-vertex lighting would be converted to a normal-map G-buffer anyway, so the result would be equal to per-pixel lighting, without interpolation between vertices. So the answer is no: you can't get the same look as per-vertex lighting when using deferred rendering.
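
A rough illustration of the difference, as a sketch only. The snippet lights the midpoint of one edge two ways with a simple Lambert term. The function names, the light direction, and the two vertex normals are made up for the example and are not taken from the game's or ENB's shaders.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert(normal, light_dir):
    # Diffuse term max(N . L, 0); both vectors are expected to be unit length.
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))

def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def per_vertex_shade(n0, n1, light_dir, t):
    # Light at the two vertices, then interpolate the resulting intensities.
    i0 = lambert(n0, light_dir)
    i1 = lambert(n1, light_dir)
    return i0 + (i1 - i0) * t

def per_pixel_shade(n0, n1, light_dir, t):
    # Interpolate the normal, renormalize it, then light at the pixel.
    n = normalize(lerp(n0, n1, t))
    return lambert(n, light_dir)

# Hypothetical edge: vertex normals tilted 60 degrees to either side of the light.
light = (0.0, 0.0, 1.0)
tilt = math.radians(60)
n0 = normalize((math.sin(tilt), 0.0, math.cos(tilt)))
n1 = normalize((-math.sin(tilt), 0.0, math.cos(tilt)))

vertex_mid = per_vertex_shade(n0, n1, light, 0.5)
pixel_mid = per_pixel_shade(n0, n1, light, 0.5)
print(vertex_mid, pixel_mid)
```

With these made-up normals, the per-vertex path stays at about 0.5 along the whole edge, while the per-pixel path reaches about 1.0 in the middle. That is the visible difference in look: once the normals sit in a G-buffer, the lighting is evaluated per pixel and the flat interpolated result is gone.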
_________________
i9-9900k, 64Gb RAM, RTX 3060 12Gb, Win7

*sensei*
Posts: 316
Joined: 12 Aug 2013, 18:55
Location: Scotland

Argh, that's a ruddy shame. Are there any other rendering methods that'd work, for having the actual polys shaded whilst allowing for many lights?
_________________
Intel i7 6700k | AMD Vega 56 8GB | 2x16GB DDR4 @ 3000mhz | Windows 7 64bit | Creative Soundblaster X-Fi Titanium Fatal1ty Pro | Asus z170 Pro Gaming

*blah-blah-blah maniac*
Posts: 17427
Joined: 27 Dec 2011, 08:53
Location: Rather not to say

Everything is possible with extra tricks, but they are not very useful in practice, so nobody does per-vertex lighting for deferred rendering. Most games now do not have per-vertex lighting at all.

*sensei*
Posts: 316
Joined: 12 Aug 2013, 18:55
Location: Scotland

Ohohoho! Is it possible to get a quick summary of these tricks (or a url link that explains them)? I'd imagine that it's some sort of trick where you render all the objects, take the depth data, and perform trigonometry-based pixel lighting on the UV Maps of the mesh, and somehow superimpose that on the buffers?



On a different subject, I did a bit of talking with Zilav and Sheson about batching static objects (e.g., a boulder, a house, the rock objects used to make a mountain, etc.) that aren't referenced by scripts and aren't linked to any enable/disable functions, so as to reduce draw calls and increase performance when used in conjunction with TES5LOD's atlas code.

I'll copy paste what I sent to Sheson. Basically explains, in detail, how the batching would work:
Essentially, it would go as follows, using the information and pre-existing code used by LODGen:


Make a list of all object references in the game, sans the directly interactable objects. For example, the model of a house, a road sign, the cobbled roads, Whiterun's walls, etc., would make it into the list. This list would be organized by the objects in a cell (or make a new list per cell?).


What wouldn't make it into the list, would be things such as a bottle of mead (an item the player will "remove" from the game world when picking it up), a door to a house (since it's animated), a movable static skull (responds to player interaction), that sort of thing.


Then the objects in the list are scanned, to see if they are "used" in any shape or form, so as to prevent things such as a house object being batched, when that house is disabled/enabled by a script, since that'd royally mess things up.


Once those are expunged, the final list will be processed. The objects in a cell would have a mesh instance arranged relative to the center point of the cell, which is taken as 0x, 0y and 0z. The mesh instances will copy the position of the original object reference, with the positions changed to accommodate the center point in the new batched mesh; exactly the same way that the LOD meshes are generated.


Then the textures are atlased, again in exactly the same way that the LOD textures are atlased. Perhaps, rather than having a unique texture atlas for each cell, several master texture atlases are generated that the batched meshes will use? Might be of benefit in terms of GPU "device changes", as well as RAM/VRAM usage.


Once that's done, the batched mesh is placed in the appropriate cell, and the original objects are moved to a dummy cell, to prevent them from being rendered along with the new, batched mesh, as to have them being rendered would defeat the whole purpose of this.
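
The steps above can be sketched roughly as follows. This is only an illustration under heavy assumptions: `Ref`, `batch_cell` and `remap_uv` are hypothetical names, not LODGen or TES5LOD code, rotation and scale of the references are ignored, and the "used by a script" check is reduced to a flag.

```python
from dataclasses import dataclass

@dataclass
class Ref:
    # Hypothetical stand-in for a placed object reference.
    name: str
    position: tuple              # world-space (x, y, z)
    vertices: list               # model-space (x, y, z) tuples
    interactable: bool = False   # bottles, doors, movable statics, ...
    script_used: bool = False    # enabled/disabled or referenced by a script

def batchable(ref):
    return not (ref.interactable or ref.script_used)

def batch_cell(refs, cell_center):
    """Merge every batchable reference into one vertex list relative to cell_center."""
    merged, left_alone = [], []
    for ref in refs:
        if not batchable(ref):
            left_alone.append(ref.name)
            continue
        # Offset of this reference from the cell's center point (the new 0, 0, 0).
        offset = tuple(p - c for p, c in zip(ref.position, cell_center))
        for v in ref.vertices:
            merged.append(tuple(a + o for a, o in zip(v, offset)))
    return merged, left_alone

def remap_uv(uv, tile_origin, tile_size):
    """Map a 0..1 UV from the original texture into its tile of the atlas."""
    return (tile_origin[0] + uv[0] * tile_size[0],
            tile_origin[1] + uv[1] * tile_size[1])

refs = [
    Ref("house", (2100.0, 2050.0, 10.0), [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    Ref("mead_bottle", (2000.0, 2000.0, 0.0), [(0.0, 0.0, 0.0)], interactable=True),
    Ref("scripted_house", (2200.0, 2000.0, 0.0), [(0.0, 0.0, 0.0)], script_used=True),
]
merged, left_alone = batch_cell(refs, cell_center=(2048.0, 2048.0, 0.0))
atlas_uv = remap_uv((0.5, 0.5), tile_origin=(0.25, 0.0), tile_size=(0.25, 0.25))
print(merged, left_alone, atlas_uv)
```

Only the plain house ends up in the merged vertex list; the bottle and the scripted house are left as ordinary references. A real tool would also have to carry normals, indices and collision, and handle textures that tile, none of which is covered here.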


The main roadblock is that an object can only have four lights affecting it. Here's what he had to say on the matter:
The question would be if Boris or anyone else could fix the lighting engine limitation of 4 lights per shape. Without that, lights shining on these large meshes will flicker if there is more than 4 per shape - the larger they are the more likely that will happen.

Any insight on whether or not that can be resolved? Or is it too ingrained to be patched?
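
To show why a bigger batched mesh makes this worse, here is a small sketch. It assumes, purely for illustration, that the engine keeps the four lights nearest to the shape's center and drops the rest; how the game really picks its four is not stated in this thread. The light layout and all names are made up.

```python
import math

MAX_LIGHTS_PER_SHAPE = 4  # the per-shape limit Sheson describes

def lights_touching(shape_center, shape_radius, lights):
    """Names of lights whose radius overlaps the shape's bounding sphere, nearest first."""
    hits = []
    for name, pos, radius in lights:
        distance = math.dist(shape_center, pos)
        if distance <= shape_radius + radius:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]

def split_lights(shape_center, shape_radius, lights):
    """Return (lights that get used, lights that are dropped) for one shape."""
    touching = lights_touching(shape_center, shape_radius, lights)
    return touching[:MAX_LIGHTS_PER_SHAPE], touching[MAX_LIGHTS_PER_SHAPE:]

# Six hypothetical lights in a row, 100 units apart, each with a radius of 60.
lights = [(f"light{i}", (i * 100.0, 0.0, 0.0), 60.0) for i in range(6)]

# A single small object near the first light.
small_used, small_dropped = split_lights((0.0, 0.0, 0.0), 50.0, lights)

# One large batched mesh spanning the whole row.
big_used, big_dropped = split_lights((250.0, 0.0, 0.0), 300.0, lights)

print(small_used, small_dropped)
print(big_used, big_dropped)
```

The small object is touched by two lights and loses none. The large mesh is touched by all six, so two are dropped, and which two can change as lights or the shape's center move, which would show up as the flicker described above.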

*blah-blah-blah maniac*
Posts: 17427
Joined: 27 Dec 2011, 08:53
Location: Rather not to say

I can't add anything new to what was already said.

*sensei*
Posts: 316
Joined: 12 Aug 2013, 18:55
Location: Scotland

Might as well keep my questions in the same thread, to keep all this lovely info together.

*Achem*

Why is it that AMD CPUs are 3-4x worse than their Intel equivalents (in performance), at draw calls? Is it due to the CPU <-> Motherboard tech being relatively unchanged for legacy purposes, whereas Intel debuts new sockets for their architectures that break backwards compatibility?

Or has it always been like that, even in the days of Pentium 4?

*blah-blah-blah maniac*
Posts: 17427
Joined: 27 Dec 2011, 08:53
Location: Rather not to say

If I remember correctly, it was not the same in the Pentium 4 era; the Core architecture brought these improvements, and each generation usually does better. According to the tests, increasing the frequency of an AMD CPU has very little impact on draw call performance, while increasing the frequency of an Intel CPU affects it almost linearly. Different motherboards for Intel also play some role (up to 10-20%). Not just draw call performance but also triangle count per second (in DX9 at least) is limited on AMD CPUs. It's very stupid that an AMD Athlon X2 gives about the same bottleneck as the more recent releases I tested (that was 3 or 4 years ago). In DX11 things should be different, but I think the problem still exists.

*sensei*
Posts: 316
Joined: 12 Aug 2013, 18:55
Location: Scotland

Aye, I think DX11 isn't much better on AMD's current CPUs.

I think it was Marcurios that I tested Fallout 4 with, at the top of the Corvega factory (shadow distances maxed, resolution set to 600x480, UsePatchSpeedhackWithoutGraphics=true, etc.). I was pulling ~15 fps, and he was getting a solid 60. Phenom II x4 965 BE vs i7 920. Something like 10k draw calls were being issued.

Similar in number crunching performance, but with draw calls? Nah. Here's hoping that AMD's Zen processors will fix the deficit, otherwise they're dead in the water.
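
For a rough sense of scale, the Corvega figures can be turned into a per-draw-call CPU budget. This is back-of-the-envelope arithmetic only: it assumes the whole frame time went into issuing the ~10k draw calls, which overstates the cost, and the function name is made up.

```python
def cpu_time_per_draw_call_us(fps, draw_calls):
    """Upper bound on CPU microseconds per draw call, if the frame were nothing but draw calls."""
    frame_time_us = 1_000_000.0 / fps
    return frame_time_us / draw_calls

phenom_cost = cpu_time_per_draw_call_us(15, 10_000)  # Phenom II X4 965 BE at ~15 fps
i7_cost = cpu_time_per_draw_call_us(60, 10_000)      # i7 920 at a solid 60 fps
ratio = phenom_cost / i7_cost

print(round(phenom_cost, 2), round(i7_cost, 2), round(ratio, 2))
```

That works out to roughly 6.7 microseconds per call on the Phenom II against roughly 1.7 on the i7, a factor of about 4. If the "solid 60" was a frame cap rather than the i7's real limit, the true gap would be larger.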

*blah-blah-blah maniac*
Posts: 17427
Joined: 27 Dec 2011, 08:53
Location: Rather not to say

I don't believe anything will change. AMD has ignored this problem for so many years; they don't care, and only target GPU-on-CPU performance and good heating for Russians in Siberia.

*sensei*
Posts: 316
Joined: 12 Aug 2013, 18:55
Location: Scotland

I've managed to gain the interest of a few users at Anandtech, over AMD's draw call performance, and got a couple graphs. These stats are from that NVidia instancing demo you pointed out.

Image

Image

Are there any particular values you'd recommend testing, to get a better idea about AMD's draw call deficit?