Archive for the ‘Papers’ Category

Odds and Ends

Wednesday, June 2nd, 2010

As of last month, I’ve been at Firaxis Games for a year. That time just flew right past. The transition to game development has been challenging and rewarding (and frustrating). Every aspect of creating “pretty pictures” becomes much more difficult when working on a game. I imagine more than a few of you reading this are internally commenting “No Shit, Sherlock”, but it cracks me up when I think about some of the things that I always took for granted when making demos at ATI/AMD. Things like shadowing, translucency, etc. are easily handled when you are making one-off demo apps over the course of a few weeks/months. Usually those apps have controlled camera angles, small scenes, a very specific technical goal and the privilege of supporting a limited range of hardware. It’s certainly changed my perspective on what is impressive as far as real-time graphics goes.

I’ve joined a few friends and former schoolmates at Firaxis. Actually, my graduate school adviser, Marc Olano, has been here for the last year on sabbatical. Some of his work on filtering specular highlights from normal maps for ocean rendering in Civilization V was published at I3D 2010 under the title LEAN Mapping. Speaking of, he’s been posting a number of graphics tricks over at the UMBC Games and Interactive Media blog. Trick 1 reformulates the computation of a normal from a heightfield and Trick 2 deals with calculating the size of a mip chain via closed form equations. He’ll be posting a few more in the future, so add the RSS feed and check back in a week or so.

Many of the blogs I used to read on the regular have either gone silent or their authors have primarily moved to Twitter. I read a few people’s tweets but mostly I just haven’t made (and don’t plan to make) that transition. I find it hard enough to post anything here without over-analyzing whether or not it’s even worth reading, let alone posting 8 times a day. And frankly, as a reader I find the signal:noise ratio difficult to manage. A few of the good old fashioned web blogs I’ve been reading lately are Miles Macklin’s blog, Game Angst by Adrian Stone, and Smash’s direct to video. On the less graphics-focused front, there’s The Witness blog starring Jon Blow and Ignacio Castaño, the Wolfire Games blog and Charles Bloom’s cbloom rants (as always). As a total hypocrite, I would like to call out and encourage Brian Karis, Tim Farrar and Christer Ericson to update their damn blogs.

Anyway, I’ve been reading more papers lately so maybe I’ll have some impetus to update this dumb blog. And if you still have my RSS feed in your reader, thanks!

As a side note, Civilization V was announced several weeks ago and has been available for pre-order on Steam. Go team!

Ambient Occlusion by the Bucket-Load

Thursday, February 4th, 2010

Volumetric Obscurance - Bradford Loos, Peter-Pike Sloan. I3D 2010

This paper presents two techniques, a simple line sampling technique and a more involved area sampling technique.

Line integral technique – Depth values are sampled in a uniform disc around each screen-space pixel. This disc has a radius that is constant in object space. For each sample, a line integral of the occupancy for that sample is computed. In plain speak, this is the ratio of stuff to non-stuff between the view-ray intersection of the unit sphere around the pixel and the aforementioned disc (see illustration). Therefore depth is considered analytically in the calculation of AO and as a result the computed AO should be less noisy during camera movement. The authors also posit that suitable results can be achieved with fewer samples because the samples are generated in screen-space vs. in object space, where object space sampling can result in multiple samples being close to each other in screen-space (which is where depth is ultimately sampled). This results in a more accurate consideration of nearby occluding surfaces.

Area integral technique – Very similar in spirit to Angelo Pesce’s “Variance Methods for Screen-Space Ambient Occlusion” in ShaderX7. A mip-map of depth average and variance is computed for the depth buffer. This statistical information is then used to compute an AO value without undersampling artifacts. The authors state that only one sample can be considered before the performance becomes prohibitive. I’m probably misunderstanding their technique because it seems to me that most of the overhead is in generating the hierarchical statistical information about the depth buffer. In the case of one sample, you figure out the screen-space area of your sampling disc and sample the mip-map level in which a texel covers about the same amount of screen-space. If you decide to use five samples instead of one, each sample would represent about 1/5th the amount of screen-space as a single sample, so you would in turn sample the mip-map at the level in which a texel is about 1/5th the screen-space area of the sampling disc. I’ll have to reread this section later.

The authors also present a thickness model that is supposed to treat surfaces in the depth buffer as if they aren’t part of a relief image, i.e. each surface has a certain thickness, so nearby pixels shouldn’t be occluded by them if they are a certain z-distance away. I’m pretty sure that just about every SSAO technique already does this. It’s usually covered by a world-space distance falloff or depth thresholding.

The paper also discusses computing AO at different frequencies and combining them, which I believe is what is done in the article “Multi-layered, Dual Resolution Ambient Occlusion” by the guys at NVIDIA.

Volumetric Ambient Occlusion – László Szirmay-Kalos, Tamás Umenhoffer, Balázs Tóth, László Szécsi, and Mateu Sbert. IEEE Computer Graphics and Applications. 2009.

Though this paper is a little dense, it boils down to a few simple ideas. The most important is a method for generating occlusion test samples that are above the tangent plane, without actually having to do tangent plane calculations (à la Horizon-Based Ambient Occlusion). This is done by sampling within a sphere of radius R/2, with its center R/2 units away from the shaded point along the normal. This is the largest sphere above the tangent plane and also contained within the original sampling sphere (of radius R). Occlusion is calculated via an integral similar to that used in the Volumetric Obscurance paper discussed above.
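To make the sample placement concrete, here is a minimal Python sketch of the "sphere of radius R/2 pushed R/2 along the normal" idea. It is not the paper's code: the function name `sample_upper_sphere`, the rejection sampling and the fixed random seed are all my own assumptions, and the normal is assumed to be unit length.

```python
import math
import random

def sample_upper_sphere(p, n, R, count, rng=None):
    """Return `count` points inside the sphere of radius R/2 centred at p + n * R/2.

    p is the shaded point, n its unit-length normal (assumed normalised by the caller).
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    r = R / 2.0
    centre = tuple(p[i] + n[i] * r for i in range(3))
    samples = []
    while len(samples) < count:
        # Rejection-sample a point inside the unit ball.
        v = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
        if sum(c * c for c in v) > 1.0:
            continue
        # Scale to radius R/2 and move to the offset centre.
        samples.append(tuple(centre[i] + v[i] * r for i in range(3)))
    return samples
```

Every sample lands on or above the tangent plane and within distance R of the shaded point, with no tangent frame involved.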

Ambient Occlusion Volumes – Morgan McGuire. Technical Report. 2009.

This tech report is a much more fully developed implementation of the ideas I discussed in the article “Deferred Occlusion by Analytic Surfaces” in ShaderX7. The occlusion due to a triangle is computed analytically and splatted onto the scene in a deferred manner. The same equation was used for computing form-factors in Baum’s 1989 radiosity paper and was previously used for AO by Hoberock and Jia in the GPU Gems 3 article “High-Quality Ambient Occlusion”. A 3D bounding volume of the influence of each primitive is generated via the Geometry Shader to compute the occlusion value to be splatted. The reported framerates are surprisingly good, though I believe it is mentioned that AO is computed at a drastically reduced resolution. At least I think that is what is being referred to by “15×15 subsampling”. There is probably a lot of room for performance improvements to this algorithm. The author states that all AO Volumes are precomputed (dynamically outputting geometry from the GS adds 25% more render time). I don’t think this is an unreasonable assumption; most scene geometry in games is static. But considering that all bounding geometry is essentially a cube, an instanced stream of cubes could be rendered, where each cube fetches the vertices of the primitive it is bounding and computes its own vertex positions in the vertex shader. This would eliminate the use of the GS. Also, doing an early out in the pixel shader based on depth would help with fill in situations where the volume is covering parts of the screen that are in the distance and couldn’t possibly be occluded.

Hybrid Ambient Occlusion – Christoph K. Reinbothe, Tamy Boubekeur and Marc Alexa. Eurographics 2009.

Computes ambient occlusion by raycasting a voxelized representation of the scene per-pixel and filtering the results. This uses everyone’s (especially mine) favorite scene voxelization technique. Didn’t read this one too in-depth, but it seems to be worth a closer look.

Hierarchical Image-Space Radiosity for Interactive Global Illumination

Wednesday, June 3rd, 2009
Image from the paper

Chris Wyman has posted a link to the preprint of the EGSR 2009 paper with Greg Nichols and myself. The paper builds on the multiresolution technique described in the I3D 2009 paper “Multiresolution Splatting for Indirect Illumination” by pairing it with two virtual point light clustering techniques. The first technique clusters VPLs that sample the same surface. This VPL similarity is determined using the same multiresolution discontinuity detection method from the last paper. The other clusters lights hierarchically and performs a traversal based on VPL contribution to the final image. Clustering VPLs has two advantages: performance scales with light-space scene complexity instead of the number of initial VPLs, and flickering in indirect illumination due to temporal incoherence in reflective shadow map sampling is reduced.

The paper also offers a stencil buffer based implementation of the multiresolution illumination calculation technique detailed in the previous paper that offers dramatic performance improvement over the original geometry shader implementation.

Greg Nichols, Jeremy Shopf, and Chris Wyman. “Hierarchical Image-Space Radiosity for Interactive Global Illumination.” Accepted to the Eurographics Symposium on Rendering. (June 2009)

Paper link

Related blog posts: Imperfect Shadow Maps, I3D 2009 Day 1 - Mixed Resolution Rendering

I3D 2009 : Day 1

Friday, February 27th, 2009

Today was the first day of I3D 2009. It’s in Boston this year so I don’t have to travel far to attend. I3D is my favorite conference because of its small size and real-time focus. It was great to see many familiar faces and meet some new ones.

The conference starts early tomorrow so I’ll just post a few highlights. These are my high level understandings of the papers; I may be missing the point entirely :)

GigaVoxels: Ray-Guided Streaming for Efficient and Detailed Voxel Rendering

Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre, Elmar Eisemann

The GigaVoxels paper is a sparse voxel rendering technique that improves on some of the limitations of previous work in this area. For example, instead of raycasting the voxels directly (like in the Olick demo that was shown at SIGGRAPH08), 3D volumes are stored at nodes. This enables tri-linear filtering, fixing some of the aliasing previously present in rendering. They also have improvements in storage representation and memory layout (over Gobbetti et al’s work). Definitely want to give this one more of a read later on…

paper link

Multiresolution Splatting for Indirect Illumination

Greg Nichols, Chris Wyman

I’ve been wanting to write something about this paper for a while. I’ll probably write more later because I have an implementation of this working and have quite a few suggestions for anyone else who is interested in implementing it themselves.

Anyway, this method allows splatting techniques such as Splatting Indirect Illumination by Dachsbacher and Stamminger to operate at multiple resolutions, saving on fill-rate. The intuition here is that for diffuse indirect illumination, high frequency changes in illumination (in image-space) occur near geometric discontinuities. By generating min-max maps of depth and normals, discontinuities can be detected at any resolution, and illumination can be splatted at the lowest resolution possible. A semi-intelligent upsampling technique is used to combine these multi-res illumination images smoothly. I think this upsampling technique could be used in other algorithms with some tweaking. More on this later.
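As a rough illustration of "splat at the lowest resolution possible", here is a small Python sketch that subdivides an image only where the depth range inside a block exceeds a threshold. It is a simplification and not the paper's implementation: the name `splat_resolutions` and the threshold test are my assumptions, normals are ignored, and the min/max is recomputed per block instead of being read from a prebuilt min-max mip chain.

```python
def splat_resolutions(depth, threshold):
    """Return (x, y, size) squares covering a square, power-of-two depth image.

    Each square is the coarsest block whose depth range (max - min) is <= threshold,
    i.e. a block with no detected depth discontinuity inside it.
    """
    n = len(depth)

    def refine(x, y, size):
        block = [depth[j][i] for j in range(y, y + size) for i in range(x, x + size)]
        if size == 1 or max(block) - min(block) <= threshold:
            return [(x, y, size)]  # smooth enough: shade once at this resolution
        half = size // 2
        out = []
        for ox, oy in ((0, 0), (half, 0), (0, half), (half, half)):
            out += refine(x + ox, y + oy, half)
        return out

    return refine(0, 0, n)
```

A flat image comes back as a single coarse square, while an image with a depth edge is refined only in the blocks that straddle the edge.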

paper link

Hair Self Shadowing and Transparency Depth Ordering Using Occupancy Maps

Erik Sintorn, Ulf Assarsson

Another great hair rendering paper from Sintorn and Assarsson. This one introduces Occupancy Maps, which is basically fast voxelization used in an intelligent manner for hair rendering. The occupancy map, in conjunction with something called a slab map, allows each rendered hair to figure out how many hairs are between it and the camera. If you assume that every hair has the same opacity, you can calculate the alpha value of the nth hair in a pixel based on the n-1 hairs in front of it, allowing them to be correctly composited for viewing. On the shadowing side, the same sort of operation can be performed from the light view to figure out a volumetric shadow value for every hair fragment.
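The uniform-opacity argument can be checked with a few lines of Python. This is only a sketch of the compositing arithmetic: the names `hair_weight`, `composite_by_count` and `composite_sorted` are hypothetical, the hair count n is assumed to be known exactly (in the paper it comes from the occupancy and slab maps), and colors are single grey values for brevity.

```python
def hair_weight(alpha, n):
    """Contribution of the n-th hair (1 = nearest) when n-1 equally opaque hairs are in front."""
    return alpha * (1.0 - alpha) ** (n - 1)

def composite_by_count(fragments, alpha):
    """Order-independent sum; fragments are (n, color) pairs in any order."""
    return sum(hair_weight(alpha, n) * color for n, color in fragments)

def composite_sorted(colors, alpha):
    """Reference: classic front-to-back blending of already sorted colors."""
    result, transmittance = 0.0, 1.0
    for color in colors:
        result += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return result
```

Because each fragment only needs its own n, the fragments can be drawn in any order and still add up to the sorted result.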

paper link

Approximating Dynamic Global Illumination in Image Space

Tobias Ritschel, Thorsten Grosch, Hans-Peter Seidel

I have to be honest that when I first read this paper, I kind of didn’t give it a chance. At first glance, it appeared to just be an obvious extension of SSAO. And judging from some posts I’ve seen on message boards, I’m not the only one who did this. But the fact is that image-space indirect illumination is just a small part of this paper.

In my opinion, the best part of this paper is the directional occlusion information that you get per-pixel. This allows you to incorporate directional occlusion into environment lighting, which looks really nice. I actually had a similar idea, but in a much more expensive approach. When the NVIDIA guys started talking about Horizon-Based AO, it seemed really obvious to me that if you’re going to generate horizon information for AO, you might as well use that information to figure out the region of the hemisphere that is visible. So if you know the size of the “aperture” above a point, you can choose from a few different pre-convolved environment maps to get illumination from the right portion of the hemisphere. Also, calculating a bent normal from the same horizon info allows you to adjust where you sample that convolved environment map. But the HBAO technique is a bit expensive (computing a tangent frame, etc), while the technique in this paper is a little lighter weight, but probably more prone to artifacts. Definitely want to play around with this stuff.

paper link

I look forward to meeting more people tomorrow!

Imperfect Shadow Maps

Friday, December 19th, 2008

“Imperfect Shadow Maps for Efficient Computation of Indirect Illumination” by Ritschel et al., a real-time indirect lighting technique, can be summarized as follows: it solves the visibility problem present in the paper “Splatting Indirect Illumination” by Dachsbacher and Stamminger.

The splatting indirect illumination method works by rendering what the authors call a reflective shadow map. An RSM is a collection of images that capture information about the surfaces visible from a light source. The RSM is then sampled to choose surfaces that will be used as Virtual Point Lights. Indirect lighting is then calculated as the sum of the direct lighting contribution of these VPLs. The idea of approximating radiosity with point lights was first described in the paper Instant Radiosity. In order to light the scene with each VPL, the method performs deferred shading by rendering some proxy geometry that bounds the influence of the light and effectively splats the illumination from that (indirect) light onto the scene.
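For readers who want to see the "sum of the direct lighting contribution of these VPLs" spelled out, here is a hedged Python sketch. The function name `vpl_irradiance`, the tuple layout of a VPL and the `min_dist` clamp are my assumptions, constant factors such as 1/π are left out, and there is deliberately no visibility term at all.

```python
import math

def vpl_irradiance(x, n, vpls, min_dist=1e-2):
    """Sum unshadowed diffuse contributions of VPLs at point x with unit normal n.

    Each VPL is (position, unit normal, flux). Constant factors are omitted.
    """
    total = 0.0
    for pos, normal, flux in vpls:
        d = [pos[i] - x[i] for i in range(3)]
        dist = math.sqrt(sum(c * c for c in d))
        if dist == 0.0:
            continue  # VPL sits exactly on the receiver; skip it
        w = [c / dist for c in d]  # unit direction from receiver to VPL
        cos_recv = max(0.0, sum(n[i] * w[i] for i in range(3)))
        cos_vpl = max(0.0, -sum(normal[i] * w[i] for i in range(3)))
        falloff = max(dist, min_dist) ** 2  # clamp to avoid the singularity near the VPL
        total += flux * cos_recv * cos_vpl / falloff
    return total
```

Nothing in this loop knows whether something sits between the VPL and the receiver.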

The problem with this method is that the illumination is splatted onto the scene without any information about the visibility of that VPL. The surface being splatted upon could be completely obscured by an occluder, but would receive the full amount of bounced lighting. What you would really need here is a shadow map rendered for each VPL. But in order to get good indirect illumination you need hundreds or thousands of VPLs, which requires hundreds or thousands of shadow maps. Let’s face it, that ain’t happenin’ in real-time. First of all, you’d have to render your scene X number of times, which means you’d have to limit the complexity of your scene or use some kind of adaptive technique like progressive meshes. But on top of that you’d have X number of draw calls, which have their own amount of overhead.

So what Imperfect Shadow Maps does is figure out a way to render hundreds or thousands of shadow maps in one draw call and with dramatically reduced amounts of geometry.

The paper achieves this by rendering 1024 paraboloid shadow maps of a sparse point representation of the scene. During preprocessing, many points are distributed uniformly across the scene. Then, n sets of ~8k points are constructed, where n is the number of VPLs the algorithm will use at run-time. The number 8k is not mentioned in the paper but the author stated this number in his SIGGRAPH Asia presentation. The points in these sets are chosen randomly. At run-time, each of the n sets of points is rendered to its respective paraboloid depth map.

Ok, you’re rendering a bunch of sparse points to a low-res (128×128 or less) shadow map. As you may suspect, it’s going to look like garbage:

[image: ism]

It’s a Cornell box, can’t you tell?

The authors get clever here and use pull-push upsampling to fill holes between the points, being smart and using some thresholding to make sure they don’t fill holes around depth discontinuities. Anyway, after the holes are filled the shadow maps still kind of look bad:
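Pull-push is easy to sketch on a tiny grid. The Python below is a bare-bones version under my own assumptions: holes are marked with `None`, the image is square with a power-of-two side, the push step copies the coarse value rather than interpolating, and the depth-discontinuity thresholding from the paper is left out entirely. The name `pull_push` is hypothetical.

```python
def pull_push(depth):
    """Fill None texels of a square, power-of-two depth image; valid texels are kept."""
    n = len(depth)
    if n == 1:
        return [row[:] for row in depth]
    half = n // 2
    # Pull: each coarse texel is the average of the valid texels in its 2x2 footprint.
    coarse = [[None] * half for _ in range(half)]
    for y in range(half):
        for x in range(half):
            vals = [depth[2 * y + dy][2 * x + dx]
                    for dy in (0, 1) for dx in (0, 1)
                    if depth[2 * y + dy][2 * x + dx] is not None]
            if vals:
                coarse[y][x] = sum(vals) / len(vals)
    # Holes that survive at this level are filled from even coarser levels first.
    coarse = pull_push(coarse)
    # Push: holes at this level take the value of the coarse texel covering them.
    return [[depth[y][x] if depth[y][x] is not None else coarse[y // 2][x // 2]
             for x in range(n)] for y in range(n)]
```

With only two valid texels in opposite corners of a 4×4 map, every hole still ends up with a plausible depth borrowed from the nearest coarse level that had data.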

[image: ism-filled]

But it doesn’t matter so much because the indirect illumination is smooth and you’re going to be adding the contribution of hundreds of these things at each pixel, so the incorrect visibility of each individual VPL gets smoothed out in the end.

That’s the basic idea.

The authors present some other cool things in the paper, like how to adaptively choose VPLs from the RSMs, and they also use the trick from “Non-interleaved Deferred Shading of Interleaved Sample Patterns” (talked about here) and only process a subset of the VPLs at each pixel.

Also, there is a paper that just got accepted to I3D called “Multiresolution Splatting for Indirect Illumination” by Nichols and Wyman that is a perfect fit for this paper. I’ll probably post a bit about that tomorrow.

Imperfect Shadow Maps for Efficient Computation of Indirect Illumination

Tobias Ritschel, Thorsten Grosch, Min H. Kim, Hans-Peter Seidel, Carsten Dachsbacher, Jan Kautz ACM Trans. on Graphics (Proceedings SIGGRAPH Asia 2008), 27(5), 2008.

Splatting indirect illumination

Dachsbacher, C. and Stamminger, M. 2006. In Proceedings of the 2006 Symposium on interactive 3D Graphics and Games (Redwood City, California, March 14 – 17, 2006). I3D ’06. ACM, New York, NY, 93-100.

Let’s Have a Min/Max Party

Thursday, December 11th, 2008

Today I was waiting for a session to begin at SIGGRAPH Asia and began to think about how there are several cool papers that exploit min/max images. A min/max image is an image pyramid that is sort of like a quadtree. The bottom level of the hierarchy is the original image while the elements in each subsequent level of the hierarchy contain the minimum and maximum of four elements in the previous level. So it’s sort of like a mip map, but instead of averaging values, you store the min and max of the previous level. This min/max hierarchy can be generated quickly in log n passes and can be used for making conservative estimations for large regions of your image. Refer to the following papers:
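Here is what building such a hierarchy looks like on the CPU, as a small Python sketch. On a GPU each loop iteration would be one render pass; the function name `build_min_max_pyramid` and the square power-of-two input are my assumptions.

```python
def build_min_max_pyramid(image):
    """Return a list of levels; level 0 stores (v, v) per texel of the input image.

    Each further level stores (min, max) over the 2x2 texels below it,
    so the last level is a single (global min, global max) pair.
    """
    level = [[(v, v) for v in row] for row in image]
    pyramid = [level]
    while len(level) > 1:
        half = len(level) // 2
        nxt = []
        for y in range(half):
            row = []
            for x in range(half):
                quad = [level[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1)]
                row.append((min(q[0] for q in quad), max(q[1] for q in quad)))
            nxt.append(row)
        level = nxt
        pyramid.append(level)
    return pyramid
```

A single fetch from a coarse level then gives conservative bounds for the whole region under that texel, which is what the papers below rely on.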

Maximum Mipmaps for Fast, Accurate, and Scalable Dynamic Height Field Rendering by A. Tevs, I. Ihrke, H.-P. Seidel

- Uses min/max maps to ray trace height fields. I feel like this idea has been around for ages but here it is all packaged up with a neat little bow.

Fast GPU Ray Tracing of Dynamic Meshes using Geometry Images by Nathan Carr, Jared Hoberock, Keenan Crane, John C. Hart.

- Uses min/max hierarchies of Geometry Images to accelerate the ray tracing of meshes.

Real-time Soft Shadow Mapping by Backprojection by Gaël Guennebaud, Loïc Barthe, Mathias Paulin
High-Quality Adaptive Soft Shadow Mapping by Gaël Guennebaud, Loïc Barthe, Mathias Paulin

- I’ve ranted about these papers before. These works generate min/max hierarchies of shadow camera depth images to perform efficient blocker searches for soft shadow rendering, and also to determine penumbra regions for further optimization.

March of the Froblins SIGGRAPH course notes by Jeremy Shopf, Joshua Barczak, Christopher Oat, Natalya Tatarchuk

- Used a min/max hierarchy of the depth buffer to occlusion cull agents in our crowd simulation. Technically this only used the max portion of the hierarchy, but I didn’t want to title this Let’s have a Min/Max Party (Min is Optional).

Anyway, I think it’s kind of neat. I’m going to make another post tomorrow night about an awesome paper that’s here at the conference but I don’t want to write about it until I have a chance to clear up some nebulous parts of the paper with the author.

In other news, I received official word that the GDC lecture I proposed was accepted so I guess I will be seeing some of you in San Francisco next year in March. I’m excited about this talk because it came directly out of a post on this blog. Turns out this isn’t a waste of time after all!

Pixel-Correct Shadow Maps with Temporal Reprojection …

Thursday, November 6th, 2008

.. and Shadow Test Confidence!

Image from the paper depicting convergence on pixel-correct shadows


This paper is in the running for both the longest graphics paper title and neatest lil’ shadow rendering method in recent memory. It’s pretty cool in its own right but I’ve had a special affection for papers that exploit temporal coherence as of late. I’d like to implement this in the near future (wouldn’t think it would take much longer than writing two blog posts.. why do I do this again?!) to see how usable it is in practice.

So.. the basic idea here is that each rasterization of the scene used for a shadow map provides a limited amount of information about occluders from the view point of the light (due to its discrete nature). This is the source of spatial aliasing (blockiness). However, rasterizations of the scene over several frames provide much more information. This paper exploits this fact to generate pixel-correct shadows by “homing in” on the correct answer over many frames and relying on the human eye being too slow to notice the convergence. This method is lightweight, simple and should fit right into an existing rendering pipeline.

Here are the salient points of the paper:

- The method uses a screen-space history buffer that maintains information about per-pixel visibility over the past few frames. This is similar to the reprojection cache (another awesome temporally-exploitive paper, links below).

- Algorithm consists of four steps:

1) Calculate current frame’s per-pixel visibility using traditional shadow mapping.

2) Transform each pixel to the history buffer by transforming the pixel’s position using the transformation matrices of the camera in the current and previous frame.

3) Update the history buffer using this frame’s visibility test results.

4) Shadow scene with updated history buffer.

- The history buffer is updated with exponential smoothing according to some confidence value that describes how close the sample is to a correct visibility result.

- The confidence value is calculated from the distance between a pixel’s position in shadow map space and the closest shadow map texel center: the smaller the distance, the higher the confidence. This makes sense.. a scene position that maps exactly onto a shadow map texel center has a correct visibility test result.

- The shadow map must contain different rasterizations of the scene over time or no new information is added to the system. This is achieved by sub-pixel jittering in both translation and rotation of the shadow camera.
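The confidence and smoothing bullets above can be sketched in a few lines of Python. Treat the exact formulas as my assumptions rather than the paper's: `texel_confidence` maps the distance to the texel center linearly to [0, 1], and `update_history` uses that value directly as the exponential smoothing weight.

```python
def texel_confidence(u, v, resolution):
    """Confidence of a shadow test at shadow-map coordinates (u, v) in [0, 1).

    Returns 1 when the sample hits a texel centre and 0 on a texel edge.
    """
    fx = (u * resolution) % 1.0  # position inside the texel, 0.5 = centre
    fy = (v * resolution) % 1.0
    return 1.0 - 2.0 * max(abs(fx - 0.5), abs(fy - 0.5))

def update_history(history, current, confidence):
    """Exponential smoothing: confident results replace more of the stored visibility."""
    return confidence * current + (1.0 - confidence) * history
```

With jittered shadow maps, each pixel eventually sees a frame where its confidence is high, and that frame dominates what is stored in the history buffer.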


READ!

Daniel Scherzer, Stefan Jeschke, Michael Wimmer. Pixel-Correct Shadow Maps with Temporal Reprojection and Shadow Test Confidence. In Rendering Techniques 2007 (Proceedings Eurographics Symposium on Rendering).

P. Sitthi-amorn, J. Lawrence, L. Yang, P. V. Sander, D. Nehab. An Improved Shading Cache for Modern GPUs. ACM SIGGRAPH Symposium on Graphics Hardware 2008.

D. Nehab, P. V. Sander, J. Lawrence, N. Tatarchuk, J. Isidoro. Accelerating Real-Time Shading with Reverse Reprojection Caching. ACM SIGGRAPH Symposium on Graphics Hardware 2007.

Larrabee paper and articles

Monday, August 4th, 2008

Amidst a flurry of articles from technical websites, Intel also released the paper (non-ACM link) on the Larrabee architecture that will be presented at SIGGRAPH next week.

Articles discussing some details that were released in a presentation by Larry Seiler:

http://www.pcper.com/article.php?aid=602

http://techgage.com/article/intel_opens_up_about_larrabee

http://www.hexus.net/content/item.php?item=14757

http://anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3367

Geometry-Aware Framebuffer Level of Detail

Sunday, July 6th, 2008
Comparison image from “Geometry-Aware Framebuffer Level of Detail”

Here’s a nice compact little paper from EGSR 2008 on scaling down shading computation in a rendered scene.

1) Render G-buffer (normals, depth) at a lower resolution (full res * some resizing factor r )

2) Render scene, using a bilateral filter (orig paper, SIGGRAPH course, bilateral upsampling) to upsample some or all of the shading components in a discontinuity-respecting manner. Expensive, lower frequency computations should be performed at the lower resolution while inexpensive high-frequency shading (such as specular) should be done at normal resolution.

3) Adjust r based on current framerate vs. some baseline desired framerate to maintain a more constant framerate.
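Step 3 is a small feedback loop. The Python below is a hypothetical controller, not the one from the paper: it assumes frame time is dominated by the low-resolution shading (so cost scales with r squared), and the name `adjust_resize_factor` and the clamp range are made up for illustration.

```python
def adjust_resize_factor(r, frame_ms, target_ms, r_min=0.25, r_max=1.0):
    """Return a new resize factor r given the last frame time and the desired frame time.

    Shaded pixel count scales with r * r, so r is scaled by the square root
    of the time ratio and then clamped to [r_min, r_max].
    """
    scaled = r * (target_ms / frame_ms) ** 0.5
    return max(r_min, min(r_max, scaled))
```

In practice you would probably smooth the measured frame time over several frames so r does not visibly oscillate.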

This method is limited by the fact that you have to render the scene geometry twice and highly detailed geometry may introduce artifacts when upsampling, but it is interesting and definitely worth a quick read.

Geometry-Aware Framebuffer Level of Detail
Lei Yang, Pedro V. Sander, Jason Lawrence
Eurographics Symposium on Rendering 2008

Non-interleaved Deferred Shading of Interleaved Sample Patterns

Wednesday, June 25th, 2008

Continuing with the theme of my last substantive post (it’s been a while, but it was Keepin’ It Low-Res), here’s a paper from Graphics Hardware 2006 that deals with computing lighting at a resolution lower than the screen resolution. The exact application in this case is Instant Radiosity style global illumination. As a refresher, instant radiosity approximates bounce lighting by tracing rays from the light sources and placing point lights where the rays intersect the scene to approximate reflected light. The radiance equation can then be approximated with a Monte Carlo integration of radiance computed by evaluating a set of point lights.

With a deferred rendering pipeline, you can reduce some of the shader bottlenecks you’d encounter evaluating hundreds of these placed point lights and shadow map comparisons by rendering proxies for each light source so that shading is computed only at relevant pixels. However, it would be even better if you only had to evaluate a subset of those lights per-pixel. Obviously, that is the goal of this paper. The devil is in the details though, as performing incoherent lighting calculations (as you may do in a naive implementation) doesn’t jibe so well with graphics hardware, where coherency is key.

Let’s first talk about the naive implementation mentioned in the last paragraph. How might you first go about selectively shading different pixels with a different set of lights? You could start out by using dynamic flow control. Something like:

if( ScreenSpacePos.x % 2 == 0 && ScreenSpacePos.y % 2 == 0 )
{
    for( int i = 0; i < nLightsSet1; i++ )
    {
        // Compute shading for light i, in set 1
    }
}
else if( ScreenSpacePos.x % 2 == 1 && ScreenSpacePos.y % 2 == 0 )
{
    for( int i = 0; i < nLightsSet2; i++ )
    {
        // Compute shading for light i, in set 2
    }
}
else if ... // The other two cases

This is obviously bad for coherency. Within a group of pixels assigned to one SIMD, you are guaranteed to follow all four paths and essentially all four paths will be executed for each pixel in that SIMD group. So you will be doing 4x more work per-pixel than if all pixels in that group were from VPL set 1. You could similarly try and use a stencil buffer to mask all pixels for a given light set, but given the alternating light set pattern the stencil mask is incoherent and you won’t get any of the benefits of Hi-Stencil. The same thing goes for using depth to mask pixels (Hi-Z is trashed). I’m pretty sure in the last few generations of hardware that there isn’t a performance difference between using stencil or depth to cull computation, FYI.

I think you’re getting the idea. All of the GBuffer texels for a given light set should be computed coherently, so ideally all of the aforementioned GBuffer texels should be organized next to each other in the GBuffer. As an aside, it is important to understand that textures are block allocated on the GPU. They are allocated in 2D blocks because texture fetches are spatially coherent in 2D. When you fetch one texel, you’re typically going to fetch another one in the 2D neighborhood of the last fetch. Therefore when I say that GBuffer texels for a given light set should be next to each other in the GBuffer I mean in blocks as depicted in Figure 1:

Figure 1 Left: Interleaved texels Center: Correctly coalesced texels Right: Non-cache friendly coalescence

In order to reorganize the GBuffer texels in such a fashion, the authors first suggest a one-pass approach. Say we have four VPL light sets. While we want them uniformly distributed across the viewport in a regular fashion so that we can interpolate the shaded values at the end, we want them organized in blocks to be coherent when shading. Texel (x,y) in the shuffled GBuffer should be mapped to texel ( (x % 2)*2 + x/2, (y%2)*2 + y/2) in the initial GBuffer. This will work fine, but the fetching of the initial GBuffer texels is incoherent because they’re being fetched from all over the GBuffer instead of in a smaller neighborhood.
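The one-pass mapping is easier to trust once you run it. The formula quoted above works out for a 4×4 buffer with a 2×2 interleave; the Python sketch below generalizes it to other sizes, and that generalization (along with the name `shuffled_to_initial` and its parameters) is my assumption, not something taken from the paper.

```python
def shuffled_to_initial(x, y, width, height, sets_x=2, sets_y=2):
    """Map texel (x, y) of the shuffled GBuffer to its source texel in the initial GBuffer.

    The shuffled buffer is split into sets_x * sets_y blocks, one per light set.
    Width and height are assumed to be multiples of sets_x and sets_y.
    """
    block_w = width // sets_x
    block_h = height // sets_y
    return ((x % block_w) * sets_x + x // block_w,
            (y % block_h) * sets_y + y // block_h)
```

Every texel inside one block of the shuffled buffer comes from the same interleave slot (same x % 2 and y % 2) of the initial buffer, which is exactly the "one light set per block" layout. Neighbouring texels in the shuffled buffer read from source texels that are two apart, and each block gathers from across the whole initial buffer.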

The authors suggest a two-pass approach instead. Rather than jumping straight to the shuffling described in the last paragraph, the shuffling is performed in cache-friendly blocks.

The shading is then performed on the reorganized GBuffer and then a de-swizzle or “gathering” step is performed to undo the mapping that was performed in previous steps. When the shaded pixels are de-swizzled, each pixel only has the shading for one subset of the overall set of lights in the scene. The authors discuss some clever little tricks to quickly filter these shading values in a discontinuity-respecting manner so that the computed irradiance is smooth across the image. I’ll leave that for you to parse from the paper.

Non-interleaved Deferred Shading of Interleaved Sample Patterns

Benjamin Segovia, Jean-Claude Iehl, Richard Mitanchey and Bernard Péroche. Proceedings of SIGGRAPH/Eurographics Workshop on Graphics Hardware 2006