
26 April, 2014

Smoothen your functions

Do you have an "if", "step" or such? Replace with a saturate(multiply-add(x)).
Do you have a mad-saturate? Replace with a smoothstep.
Do you have a smoothstep? Replace with smootherstep...

Ok, kidding, but only sort-of: I actually do often end up replacing steps with ramps (saturate/mad... the fairy dust of shading, I love sprinkling mads in shader code). I remember years ago turning a pretty much by-the-book Crysis 1-style SSAO into a much better SSAO just by "feathering" the hard in/out tests (which is kinda what line-sampling SSAO does, by the way).

If you think about it, it's a bit of a "code smell". What shading functions should be discontinuous? True, most lighting has a max or a saturate, right? But why? Really we're considering infinitesimal lights: a physically realistic light would have an area of emission, and that area would be fractionally shadowed by a surface, so even there the shadowing function wouldn't just be a step of the dot product. This might not be evident on diffuse, but already with half-angle based specular some care has to be taken when handling the transition to the "nightside".

And of course even where a step is reasonable, any "step" function (well, -any- function!) in a shader should be anti-aliased... And everybody knows what the convolution of a step with a box (the pixel footprint) is... Texturing and Modeling: A Procedural Approach is the canonical text for this, but it's funny, googling around one of the first hits is a Renderman documentation page on antialiasing whose slides are horribly aliased. The OpenGL Orange Book also has examples, and I really want to mention IQ's article on ray differentials even if it doesn't do analytic convolution...
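For instance, a minimal C++ sketch of the idea (the filter width w would typically come from the pixel footprint, e.g. fwidth of the input in a pixel shader; names are mine): the hard step next to the same step convolved with a box of width w, which is just a saturated linear ramp across the edge.

#include <algorithm>

// Hard step: aliases badly under minification and motion.
float hardStep(float edge, float x)
{
    return x < edge ? 0.0f : 1.0f;
}

// Step convolved with a box filter of width w centered on x:
// a linear ramp across the edge, clamped to 0..1.
float filteredStep(float edge, float x, float w)
{
    float t = (x - edge) / std::max(w, 1e-6f) + 0.5f;
    return std::clamp(t, 0.0f, 1.0f); // saturate
}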

Many times the continuity of derivatives is not that important (visible), which is why we can get away with saturated ramps (discontinuous in the first derivative) or saturated smoothsteps (discontinuous in the second). The big exception is manipulating inputs to specular shading: there, even second-derivative discontinuities can show very clearly, hence the need for the famous "smootherstep".

Anyhow. I usually have a bunch of functions around to help with ramps, triangle ramps, smoothsteps and so on; most of them are trivial and can be derived on paper in a second or so. Lately I had to use a few I didn't know before, so I'll write them down here.

Yes, all this introduction was useless. :)

- Smooth Min/Max

log(pow(pow(exp(x),s) + pow(exp(y),s),1/s))

This results in a smooth "min" between x and y for negative values of s (which controls the smoothness of the transition), and a smooth "max" for positive values.

For s=-1 this results in the "smoothest" min:

log(exp(x+y)/(exp(x)+exp(y)))

If you know that x,y are always positive a simpler formulation can be employed, as we don't need to go through the exponential mapping:

pow(pow(x,s) + pow(y,s),1/s)


Note also that if you need a soft minimum of more than two values, the nested expression simplifies: pow(pow(pow(pow(x,s) + pow(y,s),1/s),s) + pow(z,s),1/s) = pow(pow(x,s) + pow(y,s) + pow(z,s),1/s), and so on for more terms.

Note also the link between this soft max and the infinity norm (for positive inputs the p-norm formulation above tends to max(x,y) as s goes to infinity).
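Written out as code (a minimal sketch of the formulas above; function names are mine):

#include <cmath>

// Smooth min/max of two values: s < 0 behaves like a soft min, s > 0 like a
// soft max, and |s| controls how sharp the transition is.
// Same as log(pow(pow(exp(x),s) + pow(exp(y),s), 1/s)), just rearranged so the
// exponentials don't blow up before being raised to s.
float smoothMinMax(float x, float y, float s)
{
    return std::log(std::exp(s * x) + std::exp(s * y)) / s;
}

// If x and y are known to be positive, the exponential mapping can be skipped.
float smoothMinMaxPositive(float x, float y, float s)
{
    return std::pow(std::pow(x, s) + std::pow(y, s), 1.0f / s);
}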

- A few notes on smoothsteps

Deriving smoothstep and smootherstep is trivial: just create a polynomial of the right degree (cubic or quintic), impose f(0)=0, f(1)=1 and f'(0)=0, f'(1)=0 (and the same for f'' in the case of smootherstep), solve, and voilà.
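For reference, the polynomials that fall out of that derivation are the standard ones:

#include <algorithm>

// Cubic with f'(0)=f'(1)=0: 3x^2 - 2x^3 (the usual smoothstep on [0,1]).
float smoothstep01(float x)
{
    x = std::clamp(x, 0.0f, 1.0f);
    return x * x * (3.0f - 2.0f * x);
}

// Quintic with f''(0)=f''(1)=0 as well: 6x^5 - 15x^4 + 10x^3 (smootherstep).
float smootherstep01(float x)
{
    x = std::clamp(x, 0.0f, 1.0f);
    return x * x * x * (x * (x * 6.0f - 15.0f) + 10.0f);
}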


Once you do that, it's equally trivial to start toying around and derive polynomials with other properties. E.g. imposing derivatives only at one extreme:


You can have a "smoothstep" with non-zero derivatives at the extremes:


Or a quartic that shifts the midpoint:


It would seem that the more "properties" you need, the higher the degree of the polynomial you need to craft. Until you remember that you can do everything piecewise...
Which is basically making small, specialized splines. For example, a quadratic smoothstep can look like this:
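(The original plot is missing here, so as a stand-in, one possible piecewise quadratic, not necessarily the one originally pictured: two parabolas joined at x = 0.5, continuous in value and in the first derivative.)

#include <algorithm>

// A piecewise quadratic "smoothstep": C1 across the join at x = 0.5.
float quadraticStep01(float x)
{
    x = std::clamp(x, 0.0f, 1.0f);
    return x < 0.5f ? 2.0f * x * x
                    : 1.0f - 2.0f * (1.0f - x) * (1.0f - x);
}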


This is helpful also because there are certain tradeoffs based on the application, especially as having continuous derivatives doesn't automatically mean the result will be nice looking...
You can make functions that impose more and more derivatives (and did you know that smoothsteps can be chained? smoothstep(smoothstep(x))...) but that doesn't mean the derivatives will "behave": they can vary wildly over the domain and result in visible "wobbling" in shading.


Another thing that you might not have noticed is how close smoothstep is to a (shifted) cosine; I didn't until a coworker of mine, the all-knowing Paul Edelstein, mentioned it. Probably not too useful, but you never know, in certain situations it might be applicable and cheaper.
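For the curious, the comparison is roughly this (over [0,1] the two stay within about a hundredth of each other):

#include <cmath>

// Shifted cosine that closely tracks the cubic smoothstep 3x^2 - 2x^3 on [0,1].
float cosStep01(float x)
{
    return 0.5f - 0.5f * std::cos(3.14159265f * x);
}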


- Sigmoid functions

Another class of widely useful functions is the sigmoids, "S-shaped" functions:

Smooth Sigmoid: x/pow((pow(abs(x),s)+1),1/s)
Logistic: 1/(1+exp(-x))

Sigmoids are similar to smoothsteps, but they usually reach zero derivative at infinity instead of at the 0,1 endpoints.


They make nice "replacements" for "step", as they approach their limits nicely going towards infinity:


But also for saturated ramps, especially the smooth sigmoid, as it has f'(0)=1 as shown before.
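Both, written out as code (function names are mine):

#include <cmath>

// The "smooth sigmoid" above: f'(0) = 1, which is why it can stand in for a
// saturated ramp; s controls how fast it flattens towards its -1/+1 asymptotes.
float smoothSigmoid(float x, float s)
{
    return x / std::pow(std::pow(std::fabs(x), s) + 1.0f, 1.0f / s);
}

// Logistic sigmoid, ranging from 0 to 1 with its midpoint at x = 0.
float logisticSigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}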


Another sigmoid is the Gompertz function, which has nice and clear parameters:

asymptote*exp(-displacement*exp(-rate*x))

Beware though, it's not symmetric around its midpoint:


There are a ton more, but I'd say not as generic. If you look at the various tonemapping curves, most of them are sigmoids, albeit usually applied in exponential space and not symmetric.
In fact at some point I made tonemapping curves out of sigmoids, piecewise sigmoids or other weird things glued together :)



- Bias and Gain (thanks to Steve Worley for reminding me of these)

Bias pow(x, -log2(a))
Gain if x < 0.5 then 0.5*bias(2*x, a) else 1 - 0.5*bias(2-2*x, a)

Schlick's Bias x/((1/a-2)*(1-x)+1)
Schlick's Gain if x < 0.5 then SBias(2*x,a)*0.5 else 1 - 0.5*SBias(2-2*x,a)

Bias is just a power (the -log2(a) only remaps the 0...1 parameter to an exponent), and Gain glues one power to a mirrored copy around the midpoint, which is the easiest way to construct a piecewise sigmoid (without imposing conditions on the derivatives and so on).

Schlick's versions were published in Graphics Gems IV, and are not only an optimization of the original Bias/Gain formulas (credited to Perlin's Hypertexture paper), but are also symmetric over the diagonal, which is a nifty property (it also means that for parameter a the inverse curve is given by the same formula with 1-a).
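All four, written out (in each case a is in (0,1) and a = 0.5 gives the identity):

#include <cmath>

// Perlin-style bias/gain and Schlick's rational versions, as listed above.
float bias(float x, float a)        { return std::pow(x, -std::log2(a)); }
float gain(float x, float a)
{
    return x < 0.5f ? 0.5f * bias(2.0f * x, a)
                    : 1.0f - 0.5f * bias(2.0f - 2.0f * x, a);
}

float schlickBias(float x, float a) { return x / ((1.0f / a - 2.0f) * (1.0f - x) + 1.0f); }
float schlickGain(float x, float a)
{
    return x < 0.5f ? 0.5f * schlickBias(2.0f * x, a)
                    : 1.0f - 0.5f * schlickBias(2.0f - 2.0f * x, a);
}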



- Smooth Abs


Obviously if you have any "smoothstep" you can shift it around zero to create a "smoothsign" and multiply by the original value to get a smoothed absolute. The rational polynomial sigmoid works quite well for that:


SmoothAbsZero d*x*x/sqrt(1+d*d*x*x)

If you don't need to reach zero at x=0 then you can simply add an epsilon inside the square root of the square of your input, yielding this:


SmoothAbs sqrt(x*x+e)
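Both together as code, for reference:

#include <cmath>

// Smoothed absolute values, as above. d controls how tightly the corner at
// zero is rounded (larger d = closer to a true abs); this one is exactly 0 at x = 0.
float smoothAbsZero(float x, float d)
{
    // x times the rational "smoothsign" d*x/sqrt(1 + d*d*x*x).
    return d * x * x / std::sqrt(1.0f + d * d * x * x);
}

// Cheaper version that doesn't reach zero: it bottoms out at sqrt(e) at x = 0.
float smoothAbs(float x, float e)
{
    return std::sqrt(x * x + e);
}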


And that's all I have for now. If you've encountered other nifty functions for modelling and tinkering with procedurals and so on, let me know in the comments!
I'm always looking for functions that can be useful for sculpting mathematical shapes :)

- Bonus example: Soft conditional assignment
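The original example figure is missing here, so as a guess at what was meant: one way to soften a hard conditional assignment is to drive a lerp with a feathered step. The "width" parameter below is a hypothetical knob of mine, not something from the original post.

#include <algorithm>

// Soft replacement for: result = (x > threshold) ? a : b;
// "width" controls the size of the transition region around the threshold.
float softSelect(float a, float b, float x, float threshold, float width)
{
    float t = std::clamp((x - threshold) / (2.0f * width) + 0.5f, 0.0f, 1.0f);
    t = t * t * (3.0f - 2.0f * t);  // smoothstep the blend factor
    return b + (a - b) * t;         // lerp(b, a, t)
}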


Some links:

08 April, 2014

How to make a rendering engine

Today I was chatting on Twitter about an engine and some smart guys posted a few links that I want to record here for posterity. Or really to have a page I can point to every time someone uses a scene graph.

Remember kids, adding pointer indirections in your rendering loops makes kittens sad. More seriously, if in DirectX 11 and lower your rendering performance is not bound by the GPU driver, then your code probably sucks. On a related note, you should find that multithreaded command buffers in DX11 make your code slower, not faster (they can be used to improve the engine's parallelism, but they are currently only slower for the driver to consume, and your bottleneck should be the driver).


The links below are all about the idea of ditching state machines for rendering, encoding all state for each draw and using fixed strings of bits as the encoding. I won't describe the concept here, just check out these references:

Some notes / FAQ answers, because every time I write something about these systems people start asking the same things... I think because so many books still talk about "immediate" versus "retained" 3D graphics APIs, and the "retained" mode is usually some kind of scenegraph... Also, scenegraphs are soooo OOP and books love OOP. A small code sketch of the key idea follows at the end of the list.
  • Bits in the keys are usually either indices in arrays of grouped state (e.g. camera/viewport/rendertarget state, texture set, etc...) or direct pointers to the underlying 3d API data structures
    • So we are following pointers anyways, aren't we? Yes of course, but the magic is in the sort: it will not only help minimize state changes but also guarantee that all accesses in the arrays are (as-)linear(-as-possible)!
    • Of course if you, for example, sort strictly by depth (not in depth chunks), then you have to accept jumping between materials at each draw, and the accesses to their data might very well be random.
      • If that's the case try to avoid indirections for these and store the relevant data bits directly in the draw structure.
      • Another solution for this example case is to sort the material data in a way that is roughly depth-coherent, i.e. all materials in a room are stored near each other. In theory you could also dynamically sort and back-patch the pointers to the material data in the game code, but we're getting too complex now...
    • The same can't be guaranteed for resource pointers (GL, DX...): even if the pointers are linearly ordered, they might be far away in memory, and that's unavoidable. On consoles you have control over where resources are allocated even for GPU stuff, so you can pack them together; even more importantly, you can directly store the pointers that the GPU needs w/o intermediate CPU data structures.
  • You don't need to have a single array of keys and sort it!
    • Use buckets, i.e. some bits of the key index which bucket to use. Bucketing per rendertarget/pass is wise
    • "Buckets", a.k.a. separate lists. In other words don't be shy to have a list per subsystem, nobody says there should be one solution for all the draws in your engine.
      • This is usually a good idea also because draw-emitting jobs can and should be sequenced by pass, e.g. in a deferred renderer maybe we want first a rough depth-prepass, then g-buffer, then shadows... These can be pulled in pass-order from the visibility system
      • Doing the emission per pass means we can kick the GPU as soon as the first pass is done. Actually if we don't care for perfect sorting, and we really care about kicking draws as soon as possible, we can even divide each pass in segments and kick draws as soon as the first segment is done.
      • I shouldn't say it but just in case... These systems obviously allow you to generate draws in parallel, and also to sort in parallel and to generate GPU commands in parallel, quite easily. Just keep lists per thread, sort per thread, then merge them all (the only sync point), then split into chunks and per thread create GPU command lists (if you have an API where these are fast...)
  • You don't need to use the same encoding for all keys!
    • Some bits can decide what the other bits mean. Typically per rendertarget/pass you need to do very different things, e.g. a shadowmap render pass doesn't need to care about materials but might want to use some more bits as a z-key for depth sorting
    • Similarly, you can and should have specialized decoding loops
  • Not all the bits in the key need to be used for sorting
    • Bits of state that directly map to the GPU and don't incur overheads when set should not be part of the sorting; they will just slow it down.
  • Culling: make the keys be part of the visibility system
    • When a bounding primitive is finally deemed to be visible, it should add all the keys related to drawing its contents
    • At that point you want also to patch in the bits related to the projected depth, for depth sorting
  • Hierarchical transforms
    • Many scenegraphs are used as a transformation hierarchy. It's silly: on most engines only a tiny fraction of objects need that, mostly the animation/skinning system for its bones. Bones do express a graph, but that's not enough of a reason to base your -entire- rendering system on one.
  • Group state that is (almost) always set together in the same bits
    • E.G. instead of having separate bits (referring to separate state structures) for viewport, rendertarget, viewworldprojection constant data and so on, merge all that in a single state structure.
  • Won't I need other rendering "commands" in my list? Clears? Buffer copies? Waits on async CPU jobs? Compute shaders? Async compute shaders...
    • All of these can be part of "on bind" properties of certain parts of the state. E.g. when the bits pointing to the rendertarget/pass change, we look in that state structure to see if the newly set rendertarget has to be cleared.
    • In practice as you should "bucket" your keys into different arrays processed by different decode loops, these decode loops will know what to do (e.g. the shadowmap decode will make sure the CPU skinning jobs are finished before trying to draw and so on)
  • Are there other ways?
    • Yes but this is a very good starting point...
    • Depends on the game. A system like this is good when you don't know what draws you'll have, typically because they come from a visibility system which can't spit them out in the right order and/or because of parallel processing.
    • Games/systems where you can easily generate GPU commands in the right order and know exactly which state changes are needed can obviously sidestep all this architecture. E.g. FIFA, being a soccer game, doesn't need to do much visibility and knows exactly how each player is made in terms of materials, so the code can be written to process things in exactly the right order... Something like this would be reasonable for Frostbite, but you wouldn't use Frostbite for FIFA...
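And the promised sketch. This is a minimal, made-up example of the key idea (field sizes, names and layout are mine, not taken from any of the linked posts): pack the ordering state into a fixed-size key, collect the keys, sort them, then decode in a tight loop that only touches state when it actually changes.

#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawKey
{
    uint64_t bits = 0;
    void* payload = nullptr;       // e.g. pointer to per-draw constants

    // Highest bits sort first: pass/bucket, then material, then depth.
    void encode(uint32_t pass, uint32_t material, uint32_t depth16)
    {
        bits = (uint64_t(pass & 0xFF) << 56)
             | (uint64_t(material & 0xFFFFFF) << 32)
             | uint64_t(depth16 & 0xFFFF);
    }
};

void submit(std::vector<DrawKey>& keys)
{
    std::sort(keys.begin(), keys.end(),
              [](const DrawKey& a, const DrawKey& b) { return a.bits < b.bits; });

    uint32_t lastMaterial = ~0u;
    for (const DrawKey& k : keys)
    {
        uint32_t material = uint32_t(k.bits >> 32) & 0xFFFFFF;
        if (material != lastMaterial)
        {
            // bind material state here, only when it actually changes
            lastMaterial = material;
        }
        // issue the draw using k.payload
    }
}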