Mixed Resolution Rendering

March 26th, 2009

Here are the slides for my presentation ‘Mixed Resolution Rendering’ at GDC’09.

Here are a few relevant previous blog entries:

Keepin’ it Low Res

Let’s Have a Min/Max Party

Imperfect Shadow Maps

Geometry-Aware Framebuffer Level of Detail

GDC’09, ShaderX7

March 24th, 2009

I’m out at GDC now. The first two days have been pretty low-key. I spent the day yesterday in the Advanced D3D tutorial and today in the Insomniac PS3 programming tutorial. AD3D had some interesting tidbits; I think ATI and NVIDIA will have slides up shortly. Unfortunately I can’t access the slides that GDC has online for attendees for some reason. The Insomniac session had a really good introduction to Cell programming, how they do gameplay updates on SPUs, a talk on debugging the SPUs that went a bit over my weary head, and a talk on some of their PS3 graphics work for Resistance and Ratchet & Clank. They’re using a sort of halfway lighting pre-pass technique where they render out the normals and specular power, then compute deferred lighting into a buffer, then do a forward pass where they just grab the direct lighting from the previous pass’ results. It wasn’t exactly clear to me how they were doing this with MSAA. All the deferred lighting is computed once per pixel but their forward pass is MSAA. So inevitably their MSAA samples are going to be grabbing incorrect lighting information from the lighting buffer, or so it seems to me.

Overall it looks like attendance is way down this year. This is a shame because I think this year has the best content out of the three years I’ve gone. The lineup for the next three days is solid: Killzone 2 rendering, Gears of War 2 rendering, Larrabee talks, terrain rendering in Halo Wars, a few talks on PS3 programming, etc.

I got my copy of ShaderX7 in the mail the other day. There are lots of neat little articles packed within the monstrous 800-page book. Unfortunately, my co-authors and I were excluded from the bios and authors list due to some error, but the article found its way in (under the title “Deferred Occlusion from Analytic Surfaces”).

Mixed Resolution Rendering talk @ GDC 2009

March 3rd, 2009

Organizers finally scheduled my talk at GDC 2009. It will be happening Friday 27th from 10:30am — 10:50am in Room 130, North Hall. The description of the talk is here (mirrored link). I’ll be showing one or more short demos as time allows. Come out and say hi.

How Did I Forget About Humus?

March 1st, 2009

Somehow I haven’t checked Humus’ website for about six months. Not sure how that happened, as that guy always has great little demos and tricks to share. Of particular interest to me is his trick for detecting where to perform multisampled deferred shading in his Deferred Shading 2 app. By passing SV_Position to the pixel shader with centroid interpolation, you can detect if you are at an edge by examining if the centroid sampled position has moved from the pixel center. Awesome! Saves you an edge detection pass.

Also of particular interest to me are Shader Programming Tips #1, 2, and 3. It’s great to see someone talking about GPU Shader Analyzer. The program is super-handy and I use it almost every day. I can’t imagine optimizing without it.

His little experiment with Alpha to Coverage here is pretty cool too. I’ve always thought A2C was lousy. I never thought that maybe that was partially because the HW implementation was not so good.

Thanks Humus for adding an RSS feed so that this will never happen again!

DX11 is Swell

March 1st, 2009

Microsoft released the DX11 tech preview in their Nov 2008 SDK and I haven’t heard too much in the way of public developer reaction. Tessellation and Compute Shaders are cool and everything but I like a lot of the small improvements. Append and Consume buffers are a good example of this.

In DX11, shaders can now effectively ‘stream out’ variable amounts of data to special ‘append’ buffers, whereas previously you could only get this type of behavior from geometry shaders. The problem with geometry shaders is that the API puts restrictions on the streamed out data. For example, the output has to maintain the ordering of the input. But if the ordering of the output doesn’t matter to you, which in the majority of cases I’ve run into it doesn’t, you get a performance penalty because of the overhead of enforcing these restrictions.

The append and consume buffers can be RAW (byte address based) or structured (define arbitrary element structure). So effectively what you get is a two-pass Producer/Consumer model. You make an append view of your buffer, output to it, then use a consume view of your buffer to subsequently access that data. This is how we would have done our scene management in the Froblins demo rather than using successive stream out passes from the Geometry Shader, because the ordering does not matter. So if you’re simply trying to do variable feedback, append buffers are for you.

Other awesome little features in DX11 are:

  • Read-only depth buffers: You can sample your depth buffer while you’re using it for depth culling. Now your depth-based splatting techniques can use 3D proxies and have them culled against the depth buffer
  • Conservative depth: You can output depth from a pixel shader without trashing Early-Z. Basically you provide the limit on what depth you’ll write out and Early-Z will use that information to do early depth culling. This will be great for relief texture mapping and the like
  • DrawIndirect: You can write the amount of data streamed out from a shader and use that to invoke an instanced draw call. Previously you had to issue a stream out statistics query to know the instance count. Queries == bad. This is another improvement we could have used in Froblins
  • Coverage as PS input: You can get the coverage mask, woo hoo! Now you can figure out where to do per-sample shading in your deferred shading pipeline without a separate edge detection pass (amongst other uses, for sure)
  • Gather4 improvements: Specify which channel of a multi-channel texture to fetch from. Can also use programmable offsets

I don’t want to give the wrong impression here, I am quite enthused about compute shaders too. There was more than one talk at I3D that complained of high interop costs between CUDA and their graphics API. Having your compute capability as a part of your graphics API will be nice.

While I’m on the topic, I am also into the Feature Level support in DX11. What this does is allow the DX11 API to be used for non-DX11 hardware platforms. It can go all of the way back to DX9, SM2.0. This is going to be great, and a big reason why a lot of PC developers are going to go straight to DX11 engine development.

One thing that is still missing from DX: OR blending. Come on! OGL has had this forever.

I3D 2009 : Day 1

February 27th, 2009

Today was the first day of I3D 2009. It’s in Boston this year so I don’t have to travel far to attend. I3D is my favorite conference because of its small size and real-time focus. It was great to see many familiar faces and meet some new ones.

The conference starts early tomorrow so I’ll just post a few highlights. These are my high level understandings of the papers, I may be missing the point entirely :)

GigaVoxels: Ray-Guided Streaming for Efficient and Detailed Voxel Rendering

Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre, Elmar Eisemann

The GigaVoxels paper is a sparse voxel rendering technique that improves on some of the limitations of previous work in this area. For example, instead of raycasting the voxels directly (like in the Olick demo that was shown at SIGGRAPH08), 3D volumes are stored at nodes. This enables tri-linear filtering, fixing some of the aliasing previously present in rendering. They also have improvements in storage representation and memory layout (over Gobbetti et al’s work). Definitely want to give this one more of a read later on…

paper link

Multiresolution Splatting for Indirect Illumination

Greg Nichols, Chris Wyman

I’ve been wanting to write something about this paper for a while. I’ll probably write more later because I have an implementation of this working and have quite a few suggestions for anyone else who is interested in implementing it themselves.

Anyway, this method allows splatting techniques such as Splatting Indirect Illumination by Dachsbacher and Stamminger to operate at multiple resolutions, saving on fill-rate. The intuition here is that for diffuse indirect illumination, high frequency changes in illumination (in image-space) occur near geometric discontinuities. By generating min-max maps of depth and normals, discontinuities can be detected at any resolution, and illumination can be splatted at the lowest resolution possible. A semi-intelligent upsampling technique is used to combine these multi-res illumination images smoothly. I think this upsampling technique could be used in other algorithms with some tweaking. More on this later.
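
To make the min-max idea a bit more concrete, here is a minimal Python sketch (mine, not the paper's code) of picking the coarsest usable patch size from depth alone. It assumes a square power-of-two depth image, recomputes each block's min and max directly instead of reading them from a prebuilt min-max mip chain, and ignores normals entirely. The names `choose_patches` and `threshold` are made up for illustration.

```python
def choose_patches(depth, threshold):
    """Return (x, y, size) patches covering a square power-of-two depth image.

    A block stays coarse when its depth range (max - min) is within
    `threshold`; otherwise it is split into four children, quadtree style.
    """
    patches = []

    def refine(x, y, size):
        block = [depth[j][i] for j in range(y, y + size) for i in range(x, x + size)]
        if size == 1 or max(block) - min(block) <= threshold:
            patches.append((x, y, size))
            return
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                refine(x + dx, y + dy, half)

    refine(0, 0, len(depth))
    return patches
```

A flat 4x4 image comes back as a single 4x4 patch, while one depth outlier only forces refinement in the quadrant that contains it. A real implementation would presumably do this on the GPU and test normals as well as depth.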

paper link

Hair Self Shadowing and Transparency Depth Ordering Using Occupancy Maps

Erik Sintorn, Ulf Assarsson

Another great hair rendering paper from Sintorn and Assarsson. This one introduces Occupancy Maps, which is basically fast voxelization used in an intelligent manner for hair rendering. The occupancy map, in conjunction with something called a slab map, allows each rendered hair to figure out how many hairs are between it and the camera. If you assume that every hair has the same opacity, you can calculate the alpha value of the nth hair in a pixel based on the n-1 hairs in front of it, allowing them to be correctly composited for viewing. On the shadowing side, the same sort of operation can be performed from the light view to figure out a volumetric shadow value for every hair fragment.
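
The uniform-opacity argument boils down to a one-liner. This is my own sketch of standard front-to-back compositing under that assumption, not code from the paper: if every hair has opacity `alpha`, the nth hair is seen through n-1 hairs and so contributes with weight alpha * (1 - alpha)^(n-1). The function names are hypothetical.

```python
def hair_weight(n, alpha):
    """Compositing weight of the n-th hair (n = 1 is closest to the camera),
    assuming every hair has the same opacity `alpha`."""
    if n < 1:
        raise ValueError("n counts from 1")
    return alpha * (1.0 - alpha) ** (n - 1)


def composite(ranked_colors, alpha):
    """Sum scalar hair colors given (depth_rank, color) pairs.

    Because each weight depends only on the hair's rank, the pairs can be
    processed in any order, which is the point of knowing the rank up front.
    """
    return sum(color * hair_weight(rank, alpha) for rank, color in ranked_colors)
```

The weights of the first N hairs sum to 1 - (1 - alpha)^N, so whatever is left over is what the background shows through.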

paper link

Approximating Dynamic Global Illumination in Image Space

Tobias Ritschel, Thorsten Grosch, Hans-Peter Seidel

I have to be honest that when I first read this paper, I kind of didn’t give it a chance. At first glance, it appeared to just be an obvious extension of SSAO. And judging from some posts I’ve seen on message boards, I’m not the only one who did this. But the fact is that image-space indirect illumination is just a small part of this paper.

In my opinion, the best part of this paper is the directional occlusion information that you get per-pixel. This allows you to incorporate directional occlusion into environment lighting, which looks really nice. I actually had a similar idea, but in a much more expensive approach. When the NVIDIA guys started talking about Horizon-Based AO, it seemed really obvious to me that if you’re going to generate horizon information for AO, you might as well use that information to figure out the region of the hemisphere that is visible. So if you know the size of the “aperture” above a point, you can choose from a few different pre-convolved environment maps to get illumination from the right portion of the hemisphere. Also, calculating a bent normal from the same horizon info allows you to adjust where you sample that convolved environment map. But the HBAO technique is a bit expensive (computing a tangent frame, etc), while the technique in this paper is a little lighter weight, but probably more prone to artifacts. Definitely want to play around with this stuff.

paper link

I look forward to meeting more people tomorrow!

Imperfect Shadow Maps

December 19th, 2008

“Imperfect Shadow Maps for Efficient Computation of Indirect Illumination” by Ritschel et al., a real-time indirect lighting technique, can be summarized as follows: it solves the visibility problem present in the paper “Splatting Indirect Illumination” by Dachsbacher and Stamminger.

The splatting indirect illumination method works by rendering what the authors call a reflective shadow map. An RSM is a collection of images that capture information about surfaces visible from a light source. The RSM is then sampled to choose surfaces that will be used as Virtual Point Lights. Indirect lighting is then calculated as the sum of the direct lighting contribution of these VPLs. The idea of approximating radiosity with point lights was first described in the paper Instant Radiosity. In order to light the scene with each VPL, the method performs deferred shading by rendering some proxy geometry that bounds the influence of the light and effectively splats the illumination from that (indirect) light onto the scene.
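
As a rough illustration of "indirect lighting is the sum of the VPLs' direct lighting", here is a small Python sketch. It assumes diffuse surfaces, unit-length normals and a scalar flux per VPL; the function names and the clamp value are my own choices rather than anything from the papers. Note that there is no visibility term anywhere in it.

```python
import math


def vpl_contribution(p, n, vpl_pos, vpl_normal, vpl_flux):
    """Unshadowed diffuse contribution of one VPL at point p with unit normal n."""
    d = [vpl_pos[i] - p[i] for i in range(3)]
    dist2 = max(sum(c * c for c in d), 1e-6)   # clamp to avoid blowing up near the VPL
    inv_len = 1.0 / math.sqrt(dist2)
    w = [c * inv_len for c in d]               # direction from p toward the VPL
    cos_receiver = max(0.0, sum(n[i] * w[i] for i in range(3)))
    cos_emitter = max(0.0, -sum(vpl_normal[i] * w[i] for i in range(3)))
    return vpl_flux * cos_receiver * cos_emitter / dist2


def indirect_lighting(p, n, vpls):
    """Sum the contributions of all VPLs, given (position, normal, flux) tuples."""
    return sum(vpl_contribution(p, n, pos, nrm, flux) for pos, nrm, flux in vpls)
```

In the splatting version each VPL's term is evaluated only inside its proxy geometry instead of looping over every VPL at every pixel, but the quantity being accumulated is the same.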

The problem with this method is that the illumination is splatted onto the scene without any information about the visibility of that VPL. The surface being splatted upon could be completely obscured by an occluder, but would receive the full amount of bounced lighting. What you would really need here is a shadow map rendered for each VPL. But in order to get good indirect illumination you need hundreds or thousands of VPLs, which requires hundreds or thousands of shadow maps. Let’s face it, that ain’t happenin’ in real-time. First of all, you’d have to render your scene X number of times, which means you’d have to limit the complexity of your scene or use some kind of adaptive technique like progressive meshes. But on top of that you’d have X number of draw calls, which have their own amount of overhead.

So what Imperfect Shadow Maps does is figure out a way to render hundreds or thousands of shadow maps in one draw call and with dramatically reduced amounts of geometry.

The paper achieves this by rendering 1024 paraboloid shadow maps of a sparse point representation of the scene. During preprocessing, many points are distributed uniformly across the scene. Then, n sets of ~8k points are constructed, where n is the number of VPLs the algorithm will use at run-time. The number 8k is not mentioned in the paper but the author stated this number in his SIGGRAPH Asia presentation. The points in these sets are chosen randomly. At run-time, each of the n sets of points is rendered to its respective paraboloid depth map.

Ok, you’re rendering a bunch of sparse points to a low-res (128×128 or less) shadow map. As you may suspect, it’s going to look like garbage:

[Image: ism]

It’s a Cornell box, can’t you tell?

The authors get clever here and use pull-push upsampling to fill holes between the points, being smart and using some thresholding to make sure they don’t fill holes around depth discontinuities. Anyway, after the holes are filled the shadow maps still kind of look bad:

[Image: ism-filled]

But it doesn’t matter so much because the indirect illumination is smooth and you’re going to be adding the contribution of hundreds of these things at each pixel, so the incorrect visibility of each individual VPL gets smoothed out in the end.

That’s the basic idea.
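
For the curious, the hole-filling step can be sketched in a few lines of Python. This is a simplified pull-push of my own, not the paper's implementation: holes are `None`, the depth map is a square power-of-two 2D list, the push phase copies the coarse texel instead of interpolating, and the thresholding rule (ignore samples more than `threshold` behind the nearest sample in a block) is an assumption on my part.

```python
def pull(level, threshold):
    """Halve resolution: average the valid depths in each 2x2 block, ignoring
    samples more than `threshold` behind the block's nearest depth."""
    half = len(level) // 2
    coarse = [[None] * half for _ in range(half)]
    for y in range(half):
        for x in range(half):
            block = [level[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1)]
            valid = [d for d in block if d is not None]
            if valid:
                nearest = min(valid)
                close = [d for d in valid if d - nearest <= threshold]
                coarse[y][x] = sum(close) / len(close)
    return coarse


def push(fine, coarse):
    """Fill holes in `fine` with the covering texel from `coarse`."""
    size = len(fine)
    return [[fine[y][x] if fine[y][x] is not None else coarse[y // 2][x // 2]
             for x in range(size)] for y in range(size)]


def pull_push(depth, threshold=0.1):
    """Fill holes in a sparse depth map by going up the pyramid and back down."""
    levels = [depth]
    while len(levels[-1]) > 1:
        levels.append(pull(levels[-1], threshold))
    filled = levels[-1]
    for fine in reversed(levels[:-1]):
        filled = push(fine, filled)
    return filled
```

Original samples are never overwritten; only the holes pick up values from coarser levels, which is why the result still looks blocky.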

The authors present some other cool things in the paper, like how to adaptively choose VPLs from the RSMs, and they also use the trick from “Non-interleaved Deferred Shading of Interleaved Sample Patterns” (talked about here) and only process a subset of the VPLs at each pixel.

Also, there is a paper that just got accepted to I3D called “Multiresolution Splatting for Indirect Illumination” by Nichols and Wyman that is a perfect fit for this paper. I’ll probably post a bit about that tomorrow.

Imperfect Shadow Maps for Efficient Computation of Indirect Illumination

Tobias Ritschel, Thorsten Grosch, Min H. Kim, Hans-Peter Seidel, Carsten Dachsbacher, Jan Kautz ACM Trans. on Graphics (Proceedings SIGGRAPH Asia 2008), 27(5), 2008.

Splatting indirect illumination

Dachsbacher, C. and Stamminger, M. 2006. In Proceedings of the 2006 Symposium on interactive 3D Graphics and Games (Redwood City, California, March 14 – 17, 2006). I3D ’06. ACM, New York, NY, 93-100.

Let’s Have a Min/Max Party

December 11th, 2008

Today I was waiting for a session to begin at SIGGRAPH ASIA and began to think about how there are several cool papers that exploit min/max images. A min/max image is an image pyramid that is sort of like a quadtree. The bottom level of the hierarchy is the original image while the elements in each subsequent level of the hierarchy contain the minimum and maximum of four elements in the previous level. So it’s sort of like a mip map, but instead of averaging values, you store the min and max of the previous level. This min/max hierarchy can be generated quickly in log n passes and can be used for making conservative estimates for large regions of your image. Refer to the following papers:

Maximum Mipmaps for Fast, Accurate, and Scalable Dynamic Height Field Rendering by A. Tevs, I. Ihrke, H.-P. Seidel

- Uses min/max maps to ray trace height fields. I feel like this idea has been around for ages but here it is all packaged up with a neat little bow.

Fast GPU Ray Tracing of Dynamic Meshes using Geometry Images by Nathan Carr, Jared Hoberock, Keenan Crane, John C. Hart.

- Uses min/max hierarchies of Geometry Images to accelerate the ray tracing of meshes.

Real-time Soft Shadow Mapping by Backprojection by Gaël Guennebaud, Loïc Barthe, Mathias Paulin
High-Quality Adaptive Soft Shadow Mapping by Gaël Guennebaud, Loïc Barthe, Mathias Paulin

- I’ve ranted about these papers before. These works generate min/max hierarchies of shadow camera depth images to perform efficient blocker searches for soft shadow rendering, and also to determine penumbra regions for further optimization.

March of the Froblins SIGGRAPH course notes by Jeremy Shopf, Joshua Barczak, Christopher Oat, Natalya Tatarchuk

- Used a min/max hierarchy of the depth buffer to occlusion cull agents in our crowd simulation. Technically this only used the max portion of the hierarchy, but I didn’t want to title this Let’s have a Min/Max Party (Min is Optional).
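
Here is a small CPU-side Python sketch of the idea, as an illustration only and not the Froblins code: build the min/max pyramid from a square power-of-two depth image, then use just the max half for a conservative occlusion test. It assumes larger depth means farther away, and `build_minmax_pyramid` and `is_occluded` are names I invented for this example.

```python
def build_minmax_pyramid(image):
    """Return a list of levels; level 0 is the image, each texel is (min, max)."""
    level = [[(v, v) for v in row] for row in image]
    pyramid = [level]
    while len(level) > 1:
        half = len(level) // 2
        coarser = []
        for y in range(half):
            row = []
            for x in range(half):
                block = [level[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1)]
                row.append((min(lo for lo, _ in block), max(hi for _, hi in block)))
            coarser.append(row)
        level = coarser
        pyramid.append(level)
    return pyramid


def is_occluded(pyramid, x0, y0, x1, y1, nearest_depth):
    """Conservative test for a screen rectangle (inclusive level-0 texel bounds).

    Walk up until the rectangle covers at most 2x2 texels, then compare the
    object's nearest depth against the farthest depth stored there.
    """
    level = 0
    while (x1 >> level) - (x0 >> level) > 1 or (y1 >> level) - (y0 >> level) > 1:
        level += 1
    texels = pyramid[level]
    farthest = max(texels[y][x][1]
                   for y in range(y0 >> level, (y1 >> level) + 1)
                   for x in range(x0 >> level, (x1 >> level) + 1))
    return nearest_depth > farthest
```

The test can only err on the side of "visible": a single far texel anywhere under the rectangle keeps the object alive, which is exactly the kind of conservative estimate the hierarchy is good for.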

Anyway, I think it’s kind of neat. I’m going to make another post tomorrow night about an awesome paper that’s here at the conference but I don’t want to write about it until I have a chance to clear up some nebulous parts of the paper with the author.

In other news, I received official word that the GDC lecture I proposed was accepted so I guess I will be seeing some of you in San Francisco next year in March. I’m excited about this talk because it came directly out of a post on this blog. Turns out this isn’t a waste of time after all!

Pixel-Correct Shadow Maps with Temporal Reprojection …

November 6th, 2008

.. and Shadow Test Confidence!

Image from the paper depicting convergence on pixel-correct shadows

This paper is in the running for both the longest graphics paper title and neatest lil’ shadow rendering method in recent memory. It’s pretty cool in its own right but I’ve had a special affection for papers that exploit temporal coherence as of late. I’d like to implement this in the near future (wouldn’t think it would take much longer than writing two blog posts.. why do I do this again?!) to see how usable it is in practice.

So.. the basic idea here is that each rasterization of the scene used for a shadow map provides a limited amount of information about occluders from the view point of the light (due to its discrete nature). This is the source of spatial aliasing (blockiness). However, rasterizations of the scene over several frames provide much more information. This paper exploits this fact to generate pixel-correct shadows by “homing in” on the correct answer over many frames, relying on the human eye’s inability to notice the gradual convergence. This method is lightweight, simple and should fit right into an existing rendering pipeline.

Here are the salient points of the paper:

- The method uses a screen-space history buffer that maintains information about per-pixel visibility over the past few frames. This is similar to the reprojection cache (another awesome temporally-exploitive paper, links below).

- Algorithm consists of four steps:

1) Calculate current frame’s per-pixel visibility using traditional shadow mapping.

2) Transform each pixel to the history buffer by transforming the pixel’s position using the transformation matrices of the camera in the current and previous frame.

3) Update the history buffer using this frame’s visibility test results.

4) Shadow scene with updated history buffer.

- The history buffer is updated with exponential smoothing according to some confidence value that describes how close the sample is to a correct visibility result.

- The confidence value is calculated from the distance between a pixel’s position in shadow map space and the closest shadow map texel center. This makes sense.. a scene position that maps exactly onto a shadow map texel has a correct visibility test result.

- The shadow map must contain different rasterizations of the scene over time or no new information is added to the system. This is achieved by sub-pixel jittering in both translation and rotation of the shadow camera.
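
The confidence and history update from the points above can be sketched like this in Python. Treat it as my reading of the idea rather than the paper's exact formulas: positions are in shadow-map texel units, the mapping from distance to confidence (1 at a texel center, 0 at a texel edge) and the `power` parameter are assumptions, and the reprojection step that fetches `previous` from the history buffer is left out.

```python
import math


def confidence(u, v, power=1.0):
    """Confidence of a shadow test at (u, v) in shadow-map texel units.

    1.0 when the position lands exactly on a texel center, falling to 0.0
    at the texel edge. `power` sharpens the falloff (assumed knob).
    """
    du = abs(u - (math.floor(u) + 0.5))
    dv = abs(v - (math.floor(v) + 0.5))
    return (1.0 - 2.0 * max(du, dv)) ** power


def update_history(previous, current_visibility, conf):
    """Exponential smoothing: confident samples pull the history harder."""
    return conf * current_visibility + (1.0 - conf) * previous
```

With the shadow camera jittered every frame, each screen pixel eventually gets a sample that lands near a texel center, and that is the frame where the history value snaps toward the correct answer.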


READ!

Daniel Scherzer, Stefan Jeschke, Michael Wimmer. Pixel-Correct Shadow Maps with Temporal Reprojection and Shadow Test Confidence. In Rendering Techniques 2007 (Proceedings Eurographics Symposium on Rendering).

P. Sitthi-amorn, J. Lawrence, L. Yang, P. V. Sander, D. Nehab. An Improved Shading Cache for Modern GPUs. ACM SIGGRAPH Symposium on Graphics Hardware 2008.

D. Nehab, P. V. Sander, J. Lawrence, N. Tatarchuk, J. Isidoro. Accelerating Real-Time Shading with Reverse Reprojection Caching. ACM SIGGRAPH Symposium on Graphics Hardware 2007.

What’s going on here?!

October 17th, 2008

I haven’t been posting much lately. What I have been posting has been other people’s stuff rather than any original thoughts. I’ve been pursuing several different topics but I can’t post about them because they may end up being used at work. Anyway, if you’re looking for lots of interesting graphics links, you should start/keep reading the Real-time Rendering blog. You should read the Real-time Rendering book too (though I haven’t got a copy myself, yet). From here on out I think I’m just going to stick to posting things that I am doing myself and content related to it.

Screenshot of a real-time technique described in an article I wrote to be published in ShaderX7

Edit: I should add that the title “What’s going on here?!” is referring to what’s been going on with the blog. It’s not a request for people to guess what is going on in the picture (but you can do that too).