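Original article: Reverse engineering the rendering of The Witcher 3: Index
This is the second installment, a translation of sections 13-15 of the original. Since the article is long and rather wordy, here is a short summary first.
Summary
13 explains how the Witcher Senses effect is implemented:
13.a - how a mask is obtained with stencil operations
13.b - how the outlines are drawn
13.c - how everything is combined, together with vignetting, trailing (ghosting) and fisheye effects
14 - cloud rendering, with cirrus clouds as the case study. The clouds are texture-based, and their lighting is computed from texture data
15 - fog. The fog is not fully volumetric, but it does use raymarching and is affected by lighting and height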
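13.a Witcher Senses
So far, almost none of the effects/techniques explained in this series have been specific to The Witcher 3. You can find things like tonemapping, vignetting or average luminance computation in almost every modern video game. Even the drunk effect is fairly common.
That's why I decided to take a closer look at how the "Witcher Senses" mechanic is rendered. Geralt is a witcher, so his senses are far sharper than an ordinary human's. He can see and hear more than others, which helps him a lot with investigations. The Witcher Senses mechanic lets the player visualize these traces.
Here is a demonstration of the effect:
As you can see, there are two kinds of objects: those Geralt can interact with (yellow outline) and traces related to an investigation (red outline). Once Geralt has examined a red trace, it can turn yellow. Note also that the whole screen becomes grayer and a fisheye effect is applied.
The effect is fairly complex, so I decided to split it into three posts.
In the first one I describe how objects are selected, in the second how the outlines are generated, and in the third how everything is put together.
Selecting objects
As I mentioned, there are two kinds of objects, so we need to tell them apart. In The Witcher 3 this is done with the stencil buffer. When the GBuffer is rendered, meshes to be marked as "traces" (red) are rendered with stencil = 8, and meshes to be marked in yellow as "interesting" are rendered with stencil = 4.
For example, the following two images show Witcher Senses and the corresponding stencil buffer: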


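Stencil buffer - a short recap
The stencil buffer is commonly used to tag drawn meshes by assigning the same ID to every mesh of a given category.
The idea is to use the Always comparison function with the Replace operation when the stencil test passes, and the Keep operation in all other cases.
Here is how this can be implemented in D3D11: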
D3D11_DEPTH_STENCIL_DESC depthstencilState;
// Set depth parameters....

// Enable stencil
depthstencilState.StencilEnable = TRUE;

// Read & write all bits
depthstencilState.StencilReadMask = 0xFF;
depthstencilState.StencilWriteMask = 0xFF;

// Stencil operator for front face
depthstencilState.FrontFace.StencilFunc = D3D11_COMPARISON_ALWAYS;
depthstencilState.FrontFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;
depthstencilState.FrontFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;
depthstencilState.FrontFace.StencilPassOp = D3D11_STENCIL_OP_REPLACE;

// Stencil operator for back face.
depthstencilState.BackFace.StencilFunc = D3D11_COMPARISON_ALWAYS;
depthstencilState.BackFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;
depthstencilState.BackFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;
depthstencilState.BackFace.StencilPassOp = D3D11_STENCIL_OP_REPLACE;

pDevice->CreateDepthStencilState( &depthstencilState, &m_pDS_AssignValue );
The value to be written is passed through the API as StencilRef:
// From now on, rendered pixels get stencil = 8
pDevCon->OMSetDepthStencilState( m_pDS_AssignValue, 8 );
...
pDevCon->DrawIndexed( ... );
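Rendering the intensity
The implementation uses a full-screen r11g11b10_float texture. Interesting objects are stored in its red channel and traces in its green channel.
Why do we need it? Geralt's sensing radius is limited, so objects get outlined only when the player is close enough to them.
Let's see it in action:

First, the intensity map is cleared to black.
Then come two full-screen draw calls: the first for "traces", the second for interesting objects:

The first draw call handles traces - green channel:

The second one handles interesting objects - red channel:

Fine, but how do we decide which pixels are taken into account? We have to use the stencil buffer!
During each call a stencil test is performed, and only pixels previously marked with "8" or "4" are accepted.
How is the test set up here? There is a good blog post on the basics of stencil testing.
The general stencil test formula is: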
if (StencilRef & StencilReadMask OP StencilValue & StencilReadMask)
    accept pixel
else
    discard pixel
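StencilRef - the value passed through the API call,
StencilReadMask - the mask used to read the stencil value (note that it appears on both the left and the right side),
OP - the comparison operator, also set through the API,
StencilValue - the stencil buffer value of the pixel currently being processed.
The operands are computed with a bitwise AND.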
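To make the formula concrete, here is a minimal CPU-side C++ sketch of the stencil comparison. The enum and function names are hypothetical. The enum only mirrors the comparison functions D3D11 offers, and the sketch assumes an 8-bit stencil buffer.

```cpp
#include <cassert>
#include <cstdint>

// Comparison functions modeled on D3D11_COMPARISON_FUNC (hypothetical enum for this sketch).
enum class CompareFunc { Never, Less, Equal, LessEqual, Greater, NotEqual, GreaterEqual, Always };

// (StencilRef & StencilReadMask) OP (StencilValue & StencilReadMask)
// Returns true if the pixel is accepted, false if it is discarded.
bool stencilTestPasses(std::uint8_t stencilRef, std::uint8_t readMask, CompareFunc op,
                       std::uint8_t stencilValue)
{
    const unsigned lhs = stencilRef & readMask;    // left operand, masked
    const unsigned rhs = stencilValue & readMask;  // right operand, masked
    switch (op) {
        case CompareFunc::Never:        return false;
        case CompareFunc::Less:         return lhs <  rhs;
        case CompareFunc::Equal:        return lhs == rhs;
        case CompareFunc::LessEqual:    return lhs <= rhs;
        case CompareFunc::Greater:      return lhs >  rhs;
        case CompareFunc::NotEqual:     return lhs != rhs;
        case CompareFunc::GreaterEqual: return lhs >= rhs;
        case CompareFunc::Always:       return true;
    }
    return false;  // not reached for valid enum values
}
```

Take StencilRef = 0, a read mask of 0x08 and the Less function. In that case the call accepts exactly the pixels whose bit 3 is set, so stencil = 8 and stencil = 12 pass while stencil = 4 is discarded.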
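With the basics in place, let's look at the settings used by these draw calls:
Stencil state for traces

Stencil state for interesting objects

As we can see, the read mask is the only difference. Let's plug these values into the stencil test formula: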
Let StencilReadMask = 0x08 and StencilRef = 0:

For a pixel with stencil = 8:
0 & 0x08 < 8 & 0x08
0 < 8
TRUE

For a pixel with stencil = 4:
0 & 0x08 < 4 & 0x08
0 < 0
FALSE
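Clever. As you can see, we are not comparing stencil values here. Instead, we check whether a particular bit of the stencil buffer is set. Each pixel of the stencil buffer is a uint8, so values lie in [0-255].
Side note: all the DrawIndexed(36) calls are related to rendering footprints as traces, so the final intensity map for this particular frame looks like this:

There is also a pixel shader that runs before the stencil test. Draw calls 28738 and 28748 both use the same pixel shader: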
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[2], immediateIndexed
dcl_constantbuffer cb3[8], immediateIndexed
dcl_constantbuffer cb12[214], immediateIndexed
dcl_sampler s15, mode_default
dcl_resource_texture2d (float,float,float,float) t15
dcl_input_ps_siv v0.xy, position
dcl_output o0.xyzw
dcl_output o1.xyzw
dcl_output o2.xyzw
dcl_output o3.xyzw
dcl_temps 2
0: mul r0.xy, v0.xyxx, cb0[1].zwzz
1: sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t15.xyzw, s15
2: mul r1.xyzw, v0.yyyy, cb12[211].xyzw
3: mad r1.xyzw, cb12[210].xyzw, v0.xxxx, r1.xyzw
4: mad r0.xyzw, cb12[212].xyzw, r0.xxxx, r1.xyzw
5: add r0.xyzw, r0.xyzw, cb12[213].xyzw
6: div r0.xyz, r0.xyzx, r0.wwww
7: add r0.xyz, r0.xyzx, -cb3[7].xyzx
8: dp3 r0.x, r0.xyzx, r0.xyzx
9: sqrt r0.x, r0.x
10: mul r0.y, r0.x, l(0.120000)
11: log r1.x, abs(cb3[6].y)
12: mul r1.xy, r1.xxxx, l(2.800000, 0.800000, 0.000000, 0.000000)
13: exp r1.xy, r1.xyxx
14: mad r0.zw, r1.xxxy, l(0.000000, 0.000000, 120.000000, 120.000000), l(0.000000, 0.000000, 1.000000, 1.000000)
15: lt r1.x, l(0.030000), cb3[6].y
16: movc r0.xy, r1.xxxx, r0.yzyy, r0.xwxx
17: div r0.x, r0.x, r0.y
18: log r0.x, r0.x
19: mul r0.x, r0.x, l(1.600000)
20: exp r0.x, r0.x
21: add r0.x, -r0.x, l(1.000000)
22: max r0.x, r0.x, l(0)
23: mul o0.xyz, r0.xxxx, cb3[0].xyzx
24: mov o0.w, cb3[0].w
25: mov o1.xyzw, cb3[1].xyzw
26: mov o2.xyzw, cb3[2].xyzw
27: mov o3.xyzw, cb3[3].xyzw
28: ret
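This pixel shader writes to only one render target, so lines 24-27 are redundant.
First, depth is sampled with a point-clamp sampler (line 1). The depth is then used to reconstruct the world position: a multiplication by a special matrix, followed by a perspective divide (lines 2-6).
Given Geralt's position (cb3[7].xyz; note that this is not the camera position!), we compute the distance from Geralt to this particular point (lines 7-9).
The important inputs of this shader are:
- cb3[0].rgb - output color. This can be float3(0,1,0) (traces) or float3(1,0,0) (interesting objects).
- cb3[6].y - distance scaling factor. It directly affects the radius and intensity of the final output.
Then come some tricky formulas that compute the intensity from the distance between Geralt and the object. I guess all the coefficients were chosen experimentally.
The final output is color * intensity.
The HLSL should look like this: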
struct FSInput
{
    float4 param0 : SV_Position;
};

struct FSOutput
{
    float4 param0 : SV_Target0;
    float4 param1 : SV_Target1;
    float4 param2 : SV_Target2;
    float4 param3 : SV_Target3;
};

float3 getWorldPos( float2 screenPos, float depth )
{
    float4 worldPos = float4(screenPos, depth, 1.0);
    worldPos = mul( worldPos, screenToWorld );

    return worldPos.xyz / worldPos.w;
}

FSOutput EditedShaderPS(in FSInput IN)
{
    // * Inputs
    // Directly affects radius of the effect
    float distanceScaling = cb3_v6.y;

    // Color of output at the end
    float3 color = cb3_v0.rgb;

    // Sample depth
    float2 uv = IN.param0.xy * cb0_v1.zw;
    float depth = texture15.Sample( sampler15, uv ).x;

    // Reconstruct world position
    float3 worldPos = getWorldPos( IN.param0.xy, depth );

    // Calculate distance from Geralt to world position of particular object
    float dist_geraltToWorld = length( worldPos - cb3_v7.xyz );

    // Calculate two squeezing params
    float t0 = 1.0 + 120*pow( abs(distanceScaling), 2.8 );
    float t1 = 1.0 + 120*pow( abs(distanceScaling), 0.8 );

    // Determine nominator and denominator
    float2 params;
    params = (distanceScaling > 0.03) ? float2(dist_geraltToWorld * 0.12, t0) : float2(dist_geraltToWorld, t1);

    // Distance Geralt <-> Object
    float nominator = params.x;

    // Hiding factor
    float denominator = params.y;

    // Raise to power of 1.6
    float param = pow( params.x / params.y, 1.6 );

    // Calculate final intensity
    float intensity = max(0.0, 1.0 - param );

    // * Final outputs.
    // *
    // * This PS outputs only one color, the rest
    // * is redundant. I just added this to keep 1-1 ratio with
    // * original assembly.
    FSOutput OUT = (FSOutput)0;
    OUT.param0.xyz = color * intensity;

    // == redundant ==
    OUT.param0.w = cb3_v0.w;
    OUT.param1 = cb3_v1;
    OUT.param2 = cb3_v2;
    OUT.param3 = cb3_v3;
    // ===============

    return OUT;
}
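The intensity formula is easier to experiment with outside a shader. Below is a minimal C++ transcription of the scalar part of the HLSL above. The function name is made up; dist stands for dist_geraltToWorld and distanceScaling for cb3[6].y. World-position reconstruction is not included.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// C++ transcription of the intensity part of the HLSL above (function name is hypothetical).
// dist            - distance from Geralt to the reconstructed world position (lines 7-9)
// distanceScaling - cb3[6].y
float witcherSensesIntensity(float dist, float distanceScaling)
{
    const float s  = std::fabs(distanceScaling);
    const float t0 = 1.0f + 120.0f * std::pow(s, 2.8f);  // lines 11-14
    const float t1 = 1.0f + 120.0f * std::pow(s, 0.8f);

    // lines 15-16: choose numerator and denominator depending on the scaling factor
    const bool  useT0       = distanceScaling > 0.03f;
    const float numerator   = useT0 ? dist * 0.12f : dist;
    const float denominator = useT0 ? t0 : t1;

    // lines 17-22: 1 - (numerator / denominator)^1.6, clamped at 0
    const float param = std::pow(numerator / denominator, 1.6f);
    return std::max(0.0f, 1.0f - param);
}
```

With distanceScaling = 0.5, for instance, the denominator t0 is roughly 18.2. The intensity therefore reaches zero once 0.12 * dist exceeds it, at about 152 world units from Geralt.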
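13.b Witcher Senses, part 2
In the first post, I showed how the "intensity map" is generated.
We have a full-resolution r11g11b10_float texture.
With it, we can move on to the next stage, which I call the "outline map".

It is a somewhat unusual 512×512 r16g16_float texture. Importantly, it is implemented as a ping-pong buffer: the outline map from the previous frame (together with the intensity map) is the input used to generate the new outline map in the current frame.
There are many ways to implement ping-pong buffers, but personally I like the following (pseudocode):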
// Declarations
Texture2D m_texOutlineMap[2];
uint m_outlineIndex = 0;

// Rendering
void Render()
{
    pDevCon->SetInputTexture( m_texOutlineMap[m_outlineIndex] );
    pDevCon->SetOutputTexture( m_texOutlineMap[!m_outlineIndex] );
    ...
    pDevCon->Draw(...);

    // after draw
    m_outlineIndex = !m_outlineIndex;
}
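With this approach, the input is always [m_outlineIndex] and the output is always [!m_outlineIndex], which generally gives good flexibility when applying post-process effects.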
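Here is the same index-swapping idea as a self-contained C++ sketch. Two CPU-side float buffers stand in for the two outline-map textures. The class name and the pass callback are hypothetical; a real renderer would bind GPU resources instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Two CPU buffers standing in for m_texOutlineMap[2] (hypothetical class for this sketch).
class PingPongBuffers {
public:
    explicit PingPongBuffers(std::size_t size)
        : m_buffers{{std::vector<float>(size, 0.0f), std::vector<float>(size, 0.0f)}} {}

    // One frame: read m_buffers[m_index], write m_buffers[!m_index], then swap roles.
    template <typename Pass>
    void render(Pass pass)
    {
        const std::vector<float>& input = m_buffers[m_index];
        std::vector<float>& output = m_buffers[!m_index];
        for (std::size_t i = 0; i < output.size(); ++i)
            output[i] = pass(input, i);
        m_index = !m_index;  // what was just written becomes next frame's input
    }

    // The buffer written by the most recent render() call.
    const std::vector<float>& latest() const { return m_buffers[m_index]; }

private:
    std::array<std::vector<float>, 2> m_buffers;
    unsigned m_index = 0;
};
```

Each frame reads what the previous frame wrote. A pass that returns prev[i] + 1.0f therefore leaves 3.0 in every element after three frames. In the outline pass, the callback would be the shader that mixes the previous outline map with the intensity map.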
Let's look at the pixel shader:
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb3[1], immediateIndexed
dcl_sampler s0, mode_default
dcl_sampler s1, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_input_ps linear v2.xy
dcl_output o0.xyzw
dcl_temps 4
0: add r0.xyzw, v2.xyxy, v2.xyxy
1: round_ni r1.xy, r0.zwzz
2: frc r0.xyzw, r0.xyzw
3: add r1.zw, r1.xxxy, l(0.000000, 0.000000, -1.000000, -1.000000)
4: dp2 r1.z, r1.zwzz, r1.zwzz
5: add r1.z, -r1.z, l(1.000000)
6: max r2.w, r1.z, l(0)
7: dp2 r1.z, r1.xyxx, r1.xyxx
8: add r3.xyzw, r1.xyxy, l(-1.000000, -0.000000, -0.000000, -1.000000)
9: add r1.x, -r1.z, l(1.000000)
10: max r2.x, r1.x, l(0)
11: dp2 r1.x, r3.xyxx, r3.xyxx
12: dp2 r1.y, r3.zwzz, r3.zwzz
13: add r1.xy, -r1.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)
14: max r2.yz, r1.xxyx, l(0, 0, 0, 0)
15: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r0.zwzz, t1.xyzw, s1
16: dp4 r1.x, r1.xyzw, r2.xyzw
17: add r2.xyzw, r0.zwzw, l(0.003906, 0.000000, -0.003906, 0.000000)
18: add r0.xyzw, r0.xyzw, l(0.000000, 0.003906, 0.000000, -0.003906)
19: sample_indexable(texture2d)(float,float,float,float) r1.yz, r2.xyxx, t1.zxyw, s1
20: sample_indexable(texture2d)(float,float,float,float) r2.xy, r2.zwzz, t1.xyzw, s1
21: add r1.yz, r1.yyzy, -r2.xxyx
22: sample_indexable(texture2d)(float,float,float,float) r0.xy, r0.xyxx, t1.xyzw, s1
23: sample_indexable(texture2d)(float,float,float,float) r0.zw, r0.zwzz, t1.zwxy, s1
24: add r0.xy, -r0.zwzz, r0.xyxx
25: max r0.xy, abs(r0.xyxx), abs(r1.yzyy)
26: min r0.xy, r0.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)
27: mul r0.xy, r0.xyxx, r1.xxxx
28: sample_indexable(texture2d)(float,float,float,float) r0.zw, v2.xyxx, t0.zwxy, s0
29: mad r0.w, r1.x, l(0.150000), r0.w
30: mad r0.x, r0.x, l(0.350000), r0.w
31: mad r0.x, r0.y, l(0.350000), r0.x
32: mul r0.yw, cb3[0].zzzw, l(0.000000, 300.000000, 0.000000, 300.000000)
33: mad r0.yw, v2.xxxy, l(0.000000, 150.000000, 0.000000, 150.000000), r0.yyyw
34: ftoi r0.yw, r0.yyyw
35: bfrev r0.w, r0.w
36: iadd r0.y, r0.w, r0.y
37: ishr r0.w, r0.y, l(13)
38: xor r0.y, r0.y, r0.w
39: imul null, r0.w, r0.y, r0.y
40: imad r0.w, r0.w, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
41: imad r0.y, r0.y, r0.w, l(146956042240.000000)
42: and r0.y, r0.y, l(0x7fffffff)
43: itof r0.y, r0.y
44: mad r0.y, r0.y, l(0.000000001), l(0.650000)
45: add_sat r1.xyzw, v2.xyxy, l(0.001953, 0.000000, -0.001953, 0.000000)
46: sample_indexable(texture2d)(float,float,float,float) r0.w, r1.xyxx, t0.yzwx, s0
47: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.zwzz, t0.xyzw, s0
48: add r0.w, r0.w, r1.x
49: add_sat r1.xyzw, v2.xyxy, l(0.000000, 0.001953, 0.000000, -0.001953)
50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t0.xyzw, s0
51: sample_indexable(texture2d)(float,float,float,float) r1.y, r1.zwzz, t0.yxzw, s0
52: add r0.w, r0.w, r1.x
53: add r0.w, r1.y, r0.w
54: mad r0.w, r0.w, l(0.250000), -r0.z
55: mul r0.w, r0.y, r0.w
56: mul r0.y, r0.y, r0.z
57: mad r0.x, r0.w, l(0.900000), r0.x
58: mad r0.y, r0.y, l(-0.240000), r0.x
59: add r0.x, r0.y, r0.z
60: mov_sat r0.z, cb3[0].x
61: log r0.z, r0.z
62: mul r0.z, r0.z, l(100.000000)
63: exp r0.z, r0.z
64: mad r0.z, r0.z, l(0.160000), l(0.700000)
65: mul o0.xy, r0.zzzz, r0.xyxx
66: mov o0.zw, l(0, 0, 0, 0)
67: ret
As you can see, the outline map is divided into four equal squares. This is the first thing we need to look at:
0: add r0.xyzw, v2.xyxy, v2.xyxy
1: round_ni r1.xy, r0.zwzz
2: frc r0.xyzw, r0.xyzw
3: add r1.zw, r1.xxxy, l(0.000000, 0.000000, -1.000000, -1.000000)
4: dp2 r1.z, r1.zwzz, r1.zwzz
5: add r1.z, -r1.z, l(1.000000)
6: max r2.w, r1.z, l(0)
7: dp2 r1.z, r1.xyxx, r1.xyxx
8: add r3.xyzw, r1.xyxy, l(-1.000000, -0.000000, -0.000000, -1.000000)
9: add r1.x, -r1.z, l(1.000000)
10: max r2.x, r1.x, l(0)
11: dp2 r1.x, r3.xyxx, r3.xyxx
12: dp2 r1.y, r3.zwzz, r3.zwzz
13: add r1.xy, -r1.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)
14: max r2.yz, r1.xxyx, l(0, 0, 0, 0)
We start by computing floor(TextureUV * 2.0), which gives:

To identify each square, a small function is used:
float getParams(float2 uv)
{
    float d = dot(uv, uv);
    d = 1.0 - d;
    d = max( d, 0.0 );

    return d;
}
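Note that this function returns 1.0 when the input is float2(0.0, 0.0). It returns 0.0 for any other input whose components are 0 or ±1.
So: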
float2 flooredTextureUV = floor( 2.0 * TextureUV );
...

float2 uv1 = flooredTextureUV;
float2 uv2 = flooredTextureUV + float2(-1.0, -0.0);
float2 uv3 = flooredTextureUV + float2( -0.0, -1.0);
float2 uv4 = flooredTextureUV + float2(-1.0, -1.0);

float4 mask;
mask.x = getParams( uv1 );
mask.y = getParams( uv2 );
mask.z = getParams( uv3 );
mask.w = getParams( uv4 );
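As a quick sanity check of the mask logic, here is a C++ port of getParams and the four mask components. The function names are hypothetical. The sketch assumes UVs in [0,1) with v growing downwards, as in D3D texture space.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// getParams from the HLSL: max(1 - dot(v, v), 0).
// For inputs with components in {-1, 0, 1} it is 1 only at (0, 0).
float getParams(float x, float y)
{
    return std::max(1.0f - (x * x + y * y), 0.0f);
}

// Mask for a texture UV in [0,1)^2 (lines 0-14); exactly one component ends up being 1.
// x: top-left, y: top-right, z: bottom-left, w: bottom-right (v grows downwards).
std::array<float, 4> quadrantMask(float u, float v)
{
    const float fu = std::floor(2.0f * u);
    const float fv = std::floor(2.0f * v);
    return { getParams(fu,        fv),           // uv1
             getParams(fu - 1.0f, fv),           // uv2
             getParams(fu,        fv - 1.0f),    // uv3
             getParams(fu - 1.0f, fv - 1.0f) };  // uv4
}
```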
Each component of mask is either 1 or 0 and is responsible for one of the four squares in the texture:


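With the mask ready, let's move on: line 15 samples the intensity map. Note that the intensity texture is r11g11b10_float, yet all of rgba is sampled. In this case .a is implicitly set to 1.0f.
The texture coordinates for this sample are computed as frac(TextureUV * 2.0). The result of this operation looks like this:

Look familiar?
The next step is a clever one, a dot product: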
16: dp4 r1.x, r1.xyzw, r2.xyzw
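This way, the top-left square contains only the red channel (so only interesting objects) and the top-right square only the green channel (only traces). The bottom-right square is entirely 1, because the .w component of the intensity sample is implicitly 1.0. The result:

With this master filter we can determine the outlines of objects, which is not as hard as one might think. The algorithm is very similar to the one used for sharpening: compute the maximum absolute difference!
Next, we sample the four texels around the one currently being processed (important: the texel size here is 1.0/256.0!). We then compute the maximum absolute differences for the red and green channels: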
float fTexel = 1.0 / 256;

float2 sampling1 = TextureUV + float2( fTexel, 0 );
float2 sampling2 = TextureUV + float2( -fTexel, 0 );
float2 sampling3 = TextureUV + float2( 0, fTexel );
float2 sampling4 = TextureUV + float2( 0, -fTexel );

float2 intensity_x0 = texIntensityMap.Sample( sampler1, sampling1 ).xy;
float2 intensity_x1 = texIntensityMap.Sample( sampler1, sampling2 ).xy;
float2 intensity_diff_x = intensity_x0 - intensity_x1;

float2 intensity_y0 = texIntensityMap.Sample( sampler1, sampling3 ).xy;
float2 intensity_y1 = texIntensityMap.Sample( sampler1, sampling4 ).xy;
float2 intensity_diff_y = intensity_y0 - intensity_y1;

float2 maxAbsDifference = max( abs(intensity_diff_x), abs(intensity_diff_y) );
maxAbsDifference = saturate(maxAbsDifference);
Now, if we multiply the filter by maxAbsDifference:

Simple and effective.
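The max-absolute-difference edge detector can be sketched on a tiny CPU grid. This is a simplification: it uses integer texel indices with clamp-to-edge addressing, where the shader uses UV offsets and a hardware sampler. The Grid type and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Single-channel texture on the CPU (hypothetical helper type).
struct Grid {
    int width;
    int height;
    std::vector<float> texels;

    // Clamp-to-edge fetch.
    float at(int x, int y) const
    {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// max(|right - left|, |down - up|), saturated, like maxAbsDifference in the HLSL.
float maxAbsDifference(const Grid& g, int x, int y)
{
    const float dx = g.at(x + 1, y) - g.at(x - 1, y);
    const float dy = g.at(x, y + 1) - g.at(x, y - 1);
    return std::min(std::max(std::fabs(dx), std::fabs(dy)), 1.0f);  // both terms are >= 0
}

// outlines = masterFilter * maxAbsDifference (one channel).
float outline(float masterFilter, const Grid& g, int x, int y)
{
    return masterFilter * maxAbsDifference(g, x, y);
}
```

On a one-row step such as {0, 0, 1, 1, 1}, only the two texels next to the step get a value of 1. Flat regions stay at 0, so only the silhouette of the intensity region survives.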
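Once we have the outlines, we sample the outline map from the previous frame.
Then, to produce the "ghosting" effect, some parameters are computed from values of the current pass and of the outline map.
Say hello to our old friend, integer noise. There is also an animation parameter in the cbuffer (cb3[0].zw) that changes over time.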
float2 outlines = masterFilter * maxAbsDifference;

// Sample outline map
float2 outlineMap = texOutlineMap.Sample( samplerLinearWrap, uv ).xy;

// I guess it's related with ghosting
float paramOutline = masterFilter*0.15 + outlineMap.y;
paramOutline += 0.35 * outlines.r;
paramOutline += 0.35 * outlines.g;

// input for integer noise
float2 noiseWeights = cb3_v0.zw;
float2 noiseInputs = 150.0*uv + 300.0*noiseWeights;
int2 iNoiseInputs = (int2) noiseInputs;

float noise0 = clamp( integerNoise( iNoiseInputs.x + reversebits(iNoiseInputs.y) ), -1, 1 ) + 0.65; // r0.y
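The noise itself can be reproduced on the CPU. The sketch below follows the assembly at lines 32-44. It differs from the HLSL above in one respect: the assembly shows no clamp, so none is applied here.

The multiplier 60493 is the literal 0x0000ec4d at line 40. The disassembler prints the other two literals as floats, and reading them as the integers 19990303 and 1376312589 is an assumption. (1376312589 does have the bit pattern 0x5208DD0D of the float 146956042240.0 at line 41.) Treating the 0.000000001 at line 44 as a rounded 1/1073741824 is also an assumption. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// bfrev: reverse the bit order of a 32-bit value.
std::uint32_t reverseBits(std::uint32_t v)
{
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

// Integer noise as read from lines 37-43 (assumed constants, see above).
// Multiplications use uint32_t so they wrap mod 2^32 like imul/imad, without signed-overflow UB.
float integerNoise(std::int32_t n)
{
    // ishr is an arithmetic shift; >> on a negative int32_t is arithmetic on
    // mainstream compilers and guaranteed since C++20.
    const std::uint32_t x = static_cast<std::uint32_t>((n >> 13) ^ n);
    std::uint32_t nn = x * (x * x * 60493u + 19990303u) + 1376312589u;
    nn &= 0x7fffffffu;                              // line 42
    return static_cast<float>(nn) / 1073741824.0f;  // lines 43-44, roughly in [0, 2]
}

// noise0 for one pixel: ftoi, bfrev and iadd (lines 32-36), then noise + 0.65 (line 44).
float noise0(float u, float v, float animX, float animY)
{
    const std::int32_t ix = static_cast<std::int32_t>(150.0f * u + 300.0f * animX);
    const std::int32_t iy = static_cast<std::int32_t>(150.0f * v + 300.0f * animY);
    const std::uint32_t sum = static_cast<std::uint32_t>(ix) + reverseBits(static_cast<std::uint32_t>(iy));
    return integerNoise(static_cast<std::int32_t>(sum)) + 0.65f;  // wrap-around cast, as in iadd
}
```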
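Then the outline map is sampled the same way the intensity map was earlier (this time with a texel size of 1.0/512.0), and the average of the .x component is computed: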
// sampling of outline map
fTexel = 1.0 / 512.0;

sampling1 = saturate( uv + float2( fTexel, 0 ) );
sampling2 = saturate( uv + float2( -fTexel, 0 ) );
sampling3 = saturate( uv + float2( 0, fTexel ) );
sampling4 = saturate( uv + float2( 0, -fTexel ) );

float outline_x0 = texOutlineMap.Sample( sampler0, sampling1 ).x;
float outline_x1 = texOutlineMap.Sample( sampler0, sampling2 ).x;
float outline_y0 = texOutlineMap.Sample( sampler0, sampling3 ).x;
float outline_y1 = texOutlineMap.Sample( sampler0, sampling4 ).x;
float averageOutline = (outline_x0+outline_x1+outline_y0+outline_y1) / 4.0;
Next, the difference between that average and the value at this particular pixel is computed and perturbed with integer noise:
// perturb with noise
float frameOutlineDifference = averageOutline - outlineMap.x;
frameOutlineDifference *= noise0;
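The next step perturbs the value from the "old" outline map with noise. This is the main source of the blocky look of the output texture.
A few more computations follow, and "damping" is computed at the very end: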
// the main place which gives the texture its blocky look
float newNoise = outlineMap.x * noise0;

float newOutline = frameOutlineDifference * 0.9 + paramOutline;
newOutline -= 0.24*newNoise;

// 59: add r0.x, r0.y, r0.z
float2 finalOutline = float2( outlineMap.x + newOutline, newOutline);

// * calculate damping
float dampingParam = saturate( cb3_v0.x );
dampingParam = pow( dampingParam, 100 );
float damping = 0.7 + 0.16*dampingParam;

// * final multiplication
float2 finalColor = finalOutline * damping;
return float4(finalColor, 0, 0);
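The last lines of the outline shader can be checked with a small C++ sketch; the names are hypothetical. The damping stays between 0.7 and 0.86, so it presumably keeps the frame-to-frame feedback through the ping-pong outline map from building up. As a result, the trails fade out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Damping factor from lines 60-64: 0.7 + 0.16 * saturate(cb3[0].x)^100.
float outlineDamping(float cb3_v0_x)
{
    const float p = std::pow(std::min(std::max(cb3_v0_x, 0.0f), 1.0f), 100.0f);
    return 0.7f + 0.16f * p;
}

struct Outline2 { float x, y; };

// Output of the outline pass (lines 59 and 65): (outlineMap.x + newOutline, newOutline) * damping.
Outline2 finalOutline(float prevOutlineX, float newOutline, float cb3_v0_x)
{
    const float d = outlineDamping(cb3_v0_x);
    return { (prevOutlineX + newOutline) * d, newOutline * d };
}
```

Because of the power of 100, the damping is practically 0.7 for any cb3[0].x below about 0.9. It only rises towards 0.86 when the parameter gets very close to 1.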
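13.c Witcher Senses, part 3
In the first part, we generated the full-screen intensity map, which encodes how strong the effect is depending on distance. In the second part, we examined in detail how the outline map determines the outlines and the ghosting effect.
This is the last stop, where we combine everything. The final step is a full-screen post-process pass whose inputs are the color buffer, the outline map and the intensity map.
Before:

After:

This video shows the effect:
As you can see, the objects Geralt can see or hear get outlined. On top of that, a fisheye effect is applied to the whole screen, and the whole screen (especially the corners) turns grayish. It feels like a real monster hunter in action.
The full pixel shader assembly: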
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[3], immediateIndexed
dcl_constantbuffer cb3[7], immediateIndexed
dcl_sampler s0, mode_default
dcl_sampler s2, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t2
dcl_resource_texture2d (float,float,float,float) t3
dcl_input_ps_siv v0.xy, position
dcl_output o0.xyzw
dcl_temps 7
0: div r0.xy, v0.xyxx, cb0[2].xyxx
1: mad r0.zw, r0.xxxy, l(0.000000, 0.000000, 2.000000, 2.000000), l(0.000000, 0.000000, -1.000000, -1.000000)
2: mov r1.yz, abs(r0.zzwz)
3: div r0.z, cb0[2].x, cb0[2].y
4: mul r1.x, r0.z, r1.y
5: add r0.zw, r1.xxxz, -cb3[2].xxxy
6: mul_sat r0.zw, r0.zzzw, l(0.000000, 0.000000, 0.555556, 0.555556)
7: log r0.zw, r0.zzzw
8: mul r0.zw, r0.zzzw, l(0.000000, 0.000000, 2.500000, 2.500000)
9: exp r0.zw, r0.zzzw
10: dp2 r0.z, r0.zwzz, r0.zwzz
11: sqrt r0.z, r0.z
12: min r0.z, r0.z, l(1.000000)
13: add r0.z, -r0.z, l(1.000000)
14: mov_sat r0.w, cb3[6].x
15: add_sat r1.xy, -r0.xyxx, l(0.030000, 0.030000, 0.000000, 0.000000)
16: add r1.x, r1.y, r1.x
17: add_sat r0.xy, r0.xyxx, l(-0.970000, -0.970000, 0.000000, 0.000000)
18: add r0.x, r0.x, r1.x
19: add r0.x, r0.y, r0.x
20: mul r0.x, r0.x, l(20.000000)
21: min r0.x, r0.x, l(1.000000)
22: add r1.xy, v0.xyxx, v0.xyxx
23: div r1.xy, r1.xyxx, cb0[2].xyxx
24: add r1.xy, r1.xyxx, l(-1.000000, -1.000000, 0.000000, 0.000000)
25: dp2 r0.y, r1.xyxx, r1.xyxx
26: mul r1.xy, r0.yyyy, r1.xyxx
27: mul r0.y, r0.w, l(0.100000)
28: mul r1.xy, r0.yyyy, r1.xyxx
29: max r1.xy, r1.xyxx, l(-0.400000, -0.400000, 0.000000, 0.000000)
30: min r1.xy, r1.xyxx, l(0.400000, 0.400000, 0.000000, 0.000000)
31: mul r1.xy, r1.xyxx, cb3[1].xxxx
32: mul r1.zw, r1.xxxy, cb0[2].zzzw
33: mad r1.zw, v0.xxxy, cb0[1].zzzw, -r1.zzzw
34: sample_indexable(texture2d)(float,float,float,float) r2.xyz, r1.zwzz, t0.xyzw, s0
35: mul r3.xy, r1.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)
36: sample_indexable(texture2d)(float,float,float,float) r0.y, r3.xyxx, t2.yxzw, s2
37: mad r3.xy, r1.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000), l(0.500000, 0.000000, 0.000000, 0.000000)
38: sample_indexable(texture2d)(float,float,float,float) r2.w, r3.xyxx, t2.yzwx, s2
39: mul r2.w, r2.w, l(0.125000)
40: mul r3.x, cb0[0].x, l(0.100000)
41: add r0.x, -r0.x, l(1.000000)
42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)
43: mov r3.yzw, l(0, 0, 0, 0)
44: mov r4.x, r0.y
45: mov r4.y, r2.w
46: mov r4.z, l(0)
47: loop
48: ige r4.w, r4.z, l(8)
49: breakc_nz r4.w
50: itof r4.w, r4.z
51: mad r4.w, r4.w, l(0.785375), -r3.x
52: sincos r5.x, r6.x, r4.w
53: mov r6.y, r5.x
54: mul r5.xy, r0.xxxx, r6.xyxx
55: mad r5.zw, r5.xxxy, l(0.000000, 0.000000, 0.125000, 0.125000), r1.zzzw
56: mul r6.xy, r5.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)
57: sample_indexable(texture2d)(float,float,float,float) r4.w, r6.xyxx, t2.yzwx, s2
58: mad r4.x, r4.w, l(0.125000), r4.x
59: mad r5.zw, r5.zzzw, l(0.000000, 0.000000, 0.500000, 0.500000), l(0.000000, 0.000000, 0.500000, 0.000000)
60: sample_indexable(texture2d)(float,float,float,float) r4.w, r5.zwzz, t2.yzwx, s2
61: mad r4.y, r4.w, l(0.125000), r4.y
62: mad r5.xy, r5.xyxx, r1.xyxx, r1.zwzz
63: sample_indexable(texture2d)(float,float,float,float) r5.xyz, r5.xyxx, t0.xyzw, s0
64: mad r3.yzw, r5.xxyz, l(0.000000, 0.125000, 0.125000, 0.125000), r3.yyzw
65: iadd r4.z, r4.z, l(1)
66: endloop
67: sample_indexable(texture2d)(float,float,float,float) r0.xy, r1.zwzz, t3.xyzw, s0
68: mad_sat r0.xy, -r0.xyxx, l(0.800000, 0.750000, 0.000000, 0.000000), r4.xyxx
69: dp3 r1.x, r3.yzwy, l(0.300000, 0.300000, 0.300000, 0.000000)
70: add r1.yzw, -r1.xxxx, r3.yyzw
71: mad r1.xyz, r0.zzzz, r1.yzwy, r1.xxxx
72: mad r1.xyz, r1.xyzx, l(0.600000, 0.600000, 0.600000, 0.000000), -r2.xyzx
73: mad r1.xyz, r0.wwww, r1.xyzx, r2.xyzx
74: mul r0.yzw, r0.yyyy, cb3[4].xxyz
75: mul r2.xyz, r0.xxxx, cb3[5].xyzx
76: mad r0.xyz, r0.yzwy, l(1.200000, 1.200000, 1.200000, 0.000000), r2.xyzx
77: mov_sat r2.xyz, r0.xyzx
78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
79: add r0.yzw, -r1.xxyz, r2.xxyz
80: mad o0.xyz, r0.xxxx, r0.yzwy, r1.xyzx
81: mov o0.w, l(1.000000)
82: ret
Let's start with the inputs:
// *** Inputs

// * Zoom amount, always 1
float zoomAmount = cb3_v1.x;

// Another value which affects fisheye effect
// but always set to float2(1.0, 1.0).
float2 amount = cb0_v2.zw;

// Elapsed time in seconds
float time = cb0_v0.x;

// Colors of witcher senses
float3 colorInteresting = cb3_v5.rgb;
float3 colorTraces = cb3_v4.rgb;

// Was always set to float2(0.0, 0.0).
// Setting this to higher values
// makes "grey corners" effect weaker.
float2 offset = cb3_v2.xy;

// Dimensions of fullscreen
float2 texSize = cb0_v2.xy;
float2 invTexSize = cb0_v1.zw;

// Main value which causes fisheye effect [0-1]
const float fisheyeAmount = saturate( cb3_v6.x );
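fisheyeAmount is the main variable. I guess it ramps from 0 to 1 when Geralt starts using his senses. The other values are mostly constant, but I guess some of them might differ if the user turns the fisheye effect off.
The first thing the shader computes is the vignette (the gray-corners mask):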
0: div r0.xy, v0.xyxx, cb0[2].xyxx
1: mad r0.zw, r0.xxxy, l(0.000000, 0.000000, 2.000000, 2.000000), l(0.000000, 0.000000, -1.000000, -1.000000)
2: mov r1.yz, abs(r0.zzwz)
3: div r0.z, cb0[2].x, cb0[2].y
4: mul r1.x, r0.z, r1.y
5: add r0.zw, r1.xxxz, -cb3[2].xxxy
6: mul_sat r0.zw, r0.zzzw, l(0.000000, 0.000000, 0.555556, 0.555556)
7: log r0.zw, r0.zzzw
8: mul r0.zw, r0.zzzw, l(0.000000, 0.000000, 2.500000, 2.500000)
9: exp r0.zw, r0.zzzw
10: dp2 r0.z, r0.zwzz, r0.zwzz
11: sqrt r0.z, r0.z
12: min r0.z, r0.z, l(1.000000)
13: add r0.z, -r0.z, l(1.000000)
In HLSL
// Main uv
float2 uv = PosH.xy / texSize;

// Scale at first from [0-1] to [-1;1], then calculate abs
float2 uv3 = abs( uv * 2.0 - 1.0);

// Aspect ratio
float aspectRatio = texSize.x / texSize.y;

// * Mask used to make corners grey
float mask_gray_corners;
{
    float2 newUv = float2( uv3.x * aspectRatio, uv3.y ) - offset;
    newUv = saturate( newUv / 1.8 );
    newUv = pow(newUv, 2.5);

    mask_gray_corners = 1-min(1.0, length(newUv) );
}
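To see how the mask behaves numerically, here is a direct C++ port of the HLSL above. The function name is hypothetical. offset defaults to the (0, 0) value observed in the game. The shader multiplies by 0.555556 instead of dividing by 1.8, which is the same up to rounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// C++ port of the gray-corners mask (lines 0-13). u, v in [0,1]; offsetX/Y stand for cb3[2].xy.
float grayCornersMask(float u, float v, float aspectRatio, float offsetX = 0.0f, float offsetY = 0.0f)
{
    auto saturate = [](float x) { return std::min(std::max(x, 0.0f), 1.0f); };

    // [0,1] -> [-1,1], then absolute value
    const float ax = std::fabs(u * 2.0f - 1.0f);
    const float ay = std::fabs(v * 2.0f - 1.0f);

    // aspect-ratio correction and offset, then squeeze: saturate(x / 1.8)^2.5
    const float nx = std::pow(saturate((ax * aspectRatio - offsetX) / 1.8f), 2.5f);
    const float ny = std::pow(saturate((ay - offsetY) / 1.8f), 2.5f);

    // 1 at the screen center, falling towards 0 in the corners
    return 1.0f - std::min(1.0f, std::sqrt(nx * nx + ny * ny));
}
```

On a 16:9 screen, for example, the mask is exactly 1 at the center and drops below 0.05 in the corners.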
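The UV is first mapped to [-1,1] and its absolute value is taken. It is then corrected for aspect ratio and squeezed. The final mask looks like this: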

Now I'll deliberately skip a few lines and take a closer look at the code responsible for the "zoom" effect.
22: add r1.xy, v0.xyxx, v0.xyxx
23: div r1.xy, r1.xyxx, cb0[2].xyxx
24: add r1.xy, r1.xyxx, l(-1.000000, -1.000000, 0.000000, 0.000000)
25: dp2 r0.y, r1.xyxx, r1.xyxx
26: mul r1.xy, r0.yyyy, r1.xyxx
27: mul r0.y, r0.w, l(0.100000)
28: mul r1.xy, r0.yyyy, r1.xyxx
29: max r1.xy, r1.xyxx, l(-0.400000, -0.400000, 0.000000, 0.000000)
30: min r1.xy, r1.xyxx, l(0.400000, 0.400000, 0.000000, 0.000000)
31: mul r1.xy, r1.xyxx, cb3[1].xxxx
32: mul r1.zw, r1.xxxy, cb0[2].zzzw
33: mad r1.zw, v0.xxxy, cb0[1].zzzw, -r1.zzzw
First, the UV coordinates are multiplied by two and one is subtracted:
float2 uv4 = 2 * PosH.xy;
uv4 /= cb0_v2.xy;
uv4 -= float2(1.0, 1.0);
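The coordinates become:

A dot product is computed, which gives a mask:

It is multiplied by the UV coordinates from before:

Important: values in the top-left corner are negative. They show up as black because the r11g11b10_float format used for this visualization has no sign bit, so negative values cannot be stored.
Next, an attenuation factor is computed: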
float attenuation = fisheyeAmount * 0.1;
uv4 *= attenuation;
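Then a clamp and a couple of multiplications produce the UV offset, which is subtracted from the base UV:
float2 offsetUV = clamp( uv4, -0.4, 0.4 ) * zoomAmount;     // lines 29-31
float2 colorUV = PosH.xy * invTexSize - offsetUV * amount;  // lines 32-33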
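Lines 22-33 together can be written as one C++ function to check the numbers. This is a sketch: the struct and function names are hypothetical, and it assumes the constants listed earlier (zoomAmount = cb3[1].x, amount = cb0[2].zw, invTexSize = 1 / texSize). It returns colorUV and, optionally, offsetUV.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Lines 22-33: fisheye-style UV offset for sampling the color buffer.
// posH: pixel position (SV_Position.xy), texSize: screen size (cb0[2].xy),
// fisheyeAmount: saturate(cb3[6].x), zoomAmount: cb3[1].x, amount: cb0[2].zw.
Vec2 fisheyeColorUV(Vec2 posH, Vec2 texSize, float fisheyeAmount, float zoomAmount,
                    Vec2 amount, Vec2* offsetUVOut = nullptr)
{
    // lines 22-24: pixel position -> [-1, 1]
    Vec2 uv4{ 2.0f * posH.x / texSize.x - 1.0f, 2.0f * posH.y / texSize.y - 1.0f };

    // lines 25-26: scale by the squared distance from the screen center
    const float d = uv4.x * uv4.x + uv4.y * uv4.y;
    uv4 = { uv4.x * d, uv4.y * d };

    // lines 27-31: attenuate, clamp to [-0.4, 0.4], apply zoom
    const float att = fisheyeAmount * 0.1f;
    const Vec2 offsetUV{ std::min(std::max(uv4.x * att, -0.4f), 0.4f) * zoomAmount,
                         std::min(std::max(uv4.y * att, -0.4f), 0.4f) * zoomAmount };
    if (offsetUVOut != nullptr)
        *offsetUVOut = offsetUV;

    // lines 32-33: colorUV = posH * invTexSize - offsetUV * amount
    return { posH.x / texSize.x - offsetUV.x * amount.x,
             posH.y / texSize.y - offsetUV.y * amount.y };
}
```

With fisheyeAmount = 1, a pixel in the bottom-right corner samples the color buffer at about (0.8, 0.8) instead of (1, 1). The image near the corners is therefore pulled towards the center (magnified), while the center itself is unchanged.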
Sampling the color buffer with colorUV gives the distorted corners:

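Outlines
The next step is to sample the outline map to find the outlines. This is easy: the UVs are first used to sample the outlines of "interesting objects", then those of "traces":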
// * Sample outline map

// interesting objects (upper left square)
float2 outlineUV = colorUV * 0.5;
float outlineInteresting = texture2.Sample( sampler2, outlineUV ).x; // r0.y

// traces (upper right square)
outlineUV = colorUV * 0.5 + float2(0.5, 0.0);
float outlineTraces = texture2.Sample( sampler2, outlineUV ).x; // r2.w

outlineInteresting /= 8.0; // r4.x
outlineTraces /= 8.0; // r4.y
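It's worth noting that only the .x channel of the outline map is sampled, and only from the upper two of the four squares.
Movement
To produce the trailing movement, a trick similar to the one in the drunk effect is used. A unit circle is introduced, and the outline map (for both interesting objects and traces) and the color buffer are each sampled 8 times along it.
Note that we have just divided the fetched outlines by 8.0.
Since we are in texture coordinate space [0-1]^2, a circle of radius 1 around a particular pixel would produce unacceptable artifacts.
So let's first look at how the radius is computed. To do that, we have to go back to lines 15-21, which we skipped earlier. A minor complication is that the radius computation is scattered across the shader: one part is in lines 15-21 and the other in lines 41-42: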
15: add_sat r1.xy, -r0.xyxx, l(0.030000, 0.030000, 0.000000, 0.000000)
16: add r1.x, r1.y, r1.x
17: add_sat r0.xy, r0.xyxx, l(-0.970000, -0.970000, 0.000000, 0.000000)
18: add r0.x, r0.x, r1.x
19: add r0.x, r0.y, r0.x
20: mul r0.x, r0.x, l(20.000000)
21: min r0.x, r0.x, l(1.000000)
...
41: add r0.x, -r0.x, l(1.000000)
42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)
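As you can see, only texels within 0.03 of each screen edge contribute: the values are summed, multiplied by 20 and clamped to 1. The result is:

After line 41:

Then, at line 42, this is multiplied by 0.03, which gives the circle radius over the whole screen. As you can see, the radius shrinks near the screen edges.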
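The radius logic of lines 15-21 and 41-42 fits in a few lines of C++. This sketch uses hypothetical names. It takes the screen UV, which is what r0.xy holds at that point (SV_Position.xy / texSize).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Lines 15-21 and 41-42: radius of the sampling circle, shrinking near the screen edges.
// u, v: screen UV in [0, 1].
float circleRadius(float u, float v)
{
    auto saturate = [](float x) { return std::min(std::max(x, 0.0f), 1.0f); };

    // lines 15-19: how deep the pixel lies inside the 0.03-wide band along each of the four edges
    float edge = saturate(0.03f - u) + saturate(0.03f - v)
               + saturate(u - 0.97f) + saturate(v - 0.97f);

    edge = std::min(edge * 20.0f, 1.0f);  // lines 20-21
    return (1.0f - edge) * 0.03f;         // lines 41-42
}
```

At the screen center the radius is the full 0.03. On the middle of the left edge it drops to 0.012, and in a corner it reaches 0.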
With that in mind, we can look at the assembly responsible for the trailing movement:
40: mul r3.x, cb0[0].x, l(0.100000)
41: add r0.x, -r0.x, l(1.000000)
42: mul r0.xy, r0.xyxx, l(0.030000, 0.125000, 0.000000, 0.000000)
43: mov r3.yzw, l(0, 0, 0, 0)
44: mov r4.x, r0.y
45: mov r4.y, r2.w
46: mov r4.z, l(0)
47: loop
48: ige r4.w, r4.z, l(8)
49: breakc_nz r4.w
50: itof r4.w, r4.z
51: mad r4.w, r4.w, l(0.785375), -r3.x
52: sincos r5.x, r6.x, r4.w
53: mov r6.y, r5.x
54: mul r5.xy, r0.xxxx, r6.xyxx
55: mad r5.zw, r5.xxxy, l(0.000000, 0.000000, 0.125000, 0.125000), r1.zzzw
56: mul r6.xy, r5.zwzz, l(0.500000, 0.500000, 0.000000, 0.000000)
57: sample_indexable(texture2d)(float,float,float,float) r4.w, r6.xyxx, t2.yzwx, s2
58: mad r4.x, r4.w, l(0.125000), r4.x
59: mad r5.zw, r5.zzzw, l(0.000000, 0.000000, 0.500000, 0.500000), l(0.000000, 0.000000, 0.500000, 0.000000)
60: sample_indexable(texture2d)(float,float,float,float) r4.w, r5.zwzz, t2.yzwx, s2
61: mad r4.y, r4.w, l(0.125000), r4.y
62: mad r5.xy, r5.xyxx, r1.xyxx, r1.zwzz
63: sample_indexable(texture2d)(float,float,float,float) r5.xyz, r5.xyxx, t0.xyzw, s0
64: mad r3.yzw, r5.xxyz, l(0.000000, 0.125000, 0.125000, 0.125000), r3.yyzw
65: iadd r4.z, r4.z, l(1)
66: endloop
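Hold on: at line 40 we have a time factor, elapsedTime * 0.1. At line 43, the accumulator for the color-buffer samples fetched inside the loop is set to zero.
r0.x (lines 41-42) is the circle radius, and r4.x (line 44) is the outline of interesting objects. r4.y (line 45) is the outline of traces; both outlines were previously divided by 8. r4.z (line 46) is the loop counter.
As expected, the loop runs 8 times. We first compute the angle in radians as i * pi/4, so the eight steps cover 2*pi, a full circle. The angle also changes over time.
Using sincos we determine the sampling point on the unit circle, and a multiplication adjusts the radius (line 54).
Then we go around the pixel and sample the outlines and the color. After the loop we have the average outlines and the average color (thanks to dividing by 8).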
float timeParam = time * 0.1;

// adjust circle radius
circle_radius = 1.0 - circle_radius;
circle_radius *= 0.03;

float3 color_circle_main = float3(0.0, 0.0, 0.0);

[loop]
for (int i=0; 8 > i; i++)
{
    // full 2*PI = 360 angles cycle
    const float angleRadians = (float) i * PI_4 - timeParam;

    // unit circle
    float2 unitCircle;
    sincos(angleRadians, unitCircle.y, unitCircle.x); // unitCircle.x = cos, unitCircle.y = sin

    // adjust radius
    unitCircle *= circle_radius;

    // * base texcoords (circle) - note we also scale radius here by 8
    // * probably because of dimensions of outline map.
    // line 55
    float2 uv_outline_base = colorUV + unitCircle / 8.0;

    // * interesting objects (circle)
    float2 uv_outline_interesting_circle = uv_outline_base * 0.5;
    float outline_interesting_circle = texture2.Sample( sampler2, uv_outline_interesting_circle ).x;
    outlineInteresting += outline_interesting_circle / 8.0;

    // * traces (circle)
    float2 uv_outline_traces_circle = uv_outline_base * 0.5 + float2(0.5, 0.0);
    float outline_traces_circle = texture2.Sample( sampler2, uv_outline_traces_circle ).x;
    outlineTraces += outline_traces_circle / 8.0;

    // * sample color texture (zooming effect) with perturbation
    float2 uv_color_circle = colorUV + unitCircle * offsetUV;
    float3 color_circle = texture0.Sample( sampler0, uv_color_circle ).rgb;
    color_circle_main += color_circle / 8.0;
}
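Sampling the color is very similar, but for the color UV the radius-scaled unit circle is multiplied by the offset (offsetUV) instead.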
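The loop boils down to an 8-tap average over a rotating circle. The C++ sketch below is generic and uses hypothetical names. The caller passes a sampling callback and an already-scaled scalar radius; for the outline map that radius is circle_radius / 8. The color taps scale x and y separately by offsetUV, which this scalar version does not model.

In the shader, the outline accumulators also start from the center sample divided by 8 (lines 39 and 42). Those sums therefore end up with a total weight of 9/8, and this sketch leaves that term to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// 8 taps on a circle that rotates with time (loop at lines 47-66), each weighted by 1/8.
// sample: scalar texture lookup, radius: circle radius in UV units, time: elapsed seconds.
float circleAverage(const std::function<float(float, float)>& sample,
                    float u, float v, float radius, float time)
{
    const float kStep = 0.785375f;        // constant from line 51, roughly pi/4
    const float timeParam = time * 0.1f;  // line 40
    float sum = 0.0f;
    for (int i = 0; i < 8; ++i) {
        const float angle = static_cast<float>(i) * kStep - timeParam;  // line 51
        const float cx = std::cos(angle) * radius;                      // sincos + line 54
        const float cy = std::sin(angle) * radius;
        sum += sample(u + cx, v + cy) / 8.0f;
    }
    return sum;
}
```

Since the eight directions are (almost exactly) evenly spaced, a linear gradient averages back to its value at the center. Only curvature and edges in the sampled texture produce the smeared, trailing look.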
Intensity
After the loop, we sample the intensity map and use it to adjust the final outlines:
HLSL:
// * Sample intensity map
float2 intensityMap = texture3.Sample( sampler0, colorUV ).xy;

float intensityInteresting = intensityMap.r;
float intensityTraces = intensityMap.g;

// * Adjust outlines
float mainOutlineInteresting = saturate( outlineInteresting - 0.8*intensityInteresting );
float mainOutlineTraces = saturate( outlineTraces - 0.75*intensityTraces );
Vignette and final composition
// * Greyish color
float3 color_greyish = dot( color_circle_main, float3(0.3, 0.3, 0.3) ).xxx;

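Then come two interpolations. The first combines the "circle-sampled" color described earlier with the grayish color, which makes the corners gray; the 0.6 factor also darkens the final image.
The second uses the fisheye amount to blend the regular color buffer with the result above. As fisheyeAmount rises, the corners turn gray and the screen gets darker.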
HLSL:
// * Determine main color.
// (1) At first, combine "circled" color with gray one.
// Now we have greyish corners here.
float3 mainColor = lerp( color_greyish, color_circle_main, mask_gray_corners ) * 0.6;

// (2) Then mix "regular" color with the above.
// Please note this operation makes corners gradually gray (because fisheyeAmount rises from 0 to 1)
// and gradually darker (because of 0.6 multiplier).
mainColor = lerp( color, mainColor, fisheyeAmount );
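The two interpolations (lines 69-73) can be sketched in C++ as follows. Color and the function names are hypothetical, and color stands for the plain color-buffer sample at colorUV (line 34).

```cpp
#include <cassert>
#include <cmath>

struct Color { float r, g, b; };

Color lerpColor(const Color& a, const Color& b, float t)
{
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t, a.b + (b.b - a.b) * t };
}

// Lines 69-73.
// color: plain color-buffer sample (line 34), circleColor: color_circle_main,
// grayMask: mask_gray_corners, fisheyeAmount: saturate(cb3[6].x).
Color composeMainColor(const Color& color, const Color& circleColor, float grayMask, float fisheyeAmount)
{
    // line 69: grayish value, dot with (0.3, 0.3, 0.3)
    const float gray = 0.3f * (circleColor.r + circleColor.g + circleColor.b);

    // lines 70-72: gray towards the corners, then darken by 0.6
    Color mainColor = lerpColor(Color{gray, gray, gray}, circleColor, grayMask);
    mainColor = { mainColor.r * 0.6f, mainColor.g * 0.6f, mainColor.b * 0.6f };

    // line 73: fade the whole effect in with the fisheye amount
    return lerpColor(color, mainColor, fisheyeAmount);
}
```

With fisheyeAmount = 0 the original color passes through unchanged. This matches the idea that the effect ramps in as Geralt activates his senses.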
Now we can look at the outlines.
Their colors (red and yellow) come from the cbuffer:
// * Determine color of witcher senses
float3 senses_traces = mainOutlineTraces * colorTraces;
float3 senses_interesting = mainOutlineInteresting * colorInteresting;
float3 senses_total = 1.2 * senses_traces + senses_interesting;

Finally, the outlines are blended in, and not by simple addition. First, a dot product is computed:
78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)

float dot_senses_total = saturate( dot(senses_total, float3(1.0, 1.0, 1.0) ) );
It looks like this:

It is used to interpolate between the main color and the Witcher Senses color:
76: mad r0.xyz, r0.yzwy, l(1.200000, 1.200000, 1.200000, 0.000000), r2.xyzx
77: mov_sat r2.xyz, r0.xyzx
78: dp3_sat r0.x, r0.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
79: add r0.yzw, -r1.xxyz, r2.xxyz
80: mad o0.xyz, r0.xxxx, r0.yzwy, r1.xyzx
81: mov o0.w, l(1.000000)
82: ret

float3 senses_total = 1.2 * senses_traces + senses_interesting;

// * Final combining
float3 senses_total_sat = saturate(senses_total);
float dot_senses_total = saturate( dot(senses_total, float3(1.0, 1.0, 1.0) ) );

float3 finalColor = lerp( mainColor, senses_total_sat, dot_senses_total );
return float4( finalColor, 1.0 );
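Here are lines 74-80 in C++ form. The names are hypothetical. The outline inputs are the adjusted mainOutlineTraces and mainOutlineInteresting, and the two colors stand in for cb3[4] and cb3[5].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Color { float r, g, b; };

float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Lines 74-80: blend the Witcher Senses outline colors over mainColor.
Color blendSenses(const Color& mainColor, float outlineTraces, float outlineInteresting,
                  const Color& colorTraces, const Color& colorInteresting)
{
    // lines 74-76: senses_total = 1.2 * traces + interesting
    const Color total{
        1.2f * outlineTraces * colorTraces.r + outlineInteresting * colorInteresting.r,
        1.2f * outlineTraces * colorTraces.g + outlineInteresting * colorInteresting.g,
        1.2f * outlineTraces * colorTraces.b + outlineInteresting * colorInteresting.b };

    // line 77: per-channel saturate; line 78: saturated channel sum as the blend weight
    const Color totalSat{ saturate(total.r), saturate(total.g), saturate(total.b) };
    const float t = saturate(total.r + total.g + total.b);

    // lines 79-80: lerp(mainColor, totalSat, t)
    return { mainColor.r + (totalSat.r - mainColor.r) * t,
             mainColor.g + (totalSat.g - mainColor.g) * t,
             mainColor.b + (totalSat.b - mainColor.b) * t };
}
```

The blend weight is the saturated sum of all three channels rather than a single channel, so fairly weak outlines already dominate a pixel. For example, an interesting-object outline of 0.5 with a yellow color of (1, 1, 0) gives a weight of 1.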

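14. Cirrus clouds
Outdoors, the sky is one of the factors that decide whether a game world is believable. Most of the time it covers 40-50% of the screen. The sky is more than a gradient: there are stars, the sun, the moon and, finally, clouds.
The current trend is clearly towards rendering volumetric clouds with raymarching, but the clouds in The Witcher 3 are entirely texture-based. I've been looking at them for a while, and they turned out to be more complex than I initially expected. If you've been following this series, you know there are differences between "Blood and Wine" and the base game. Guess what: the clouds in Blood and Wine have some changes too.
The Witcher 3 has several cloud layers. Depending on the current weather, we can see cirrus, altocumulus and maybe something from the stratus family (e.g. during a storm). Or nothing at all.
Some layers use different input textures and shaders, which affects the complexity and length of the pixel shader.
Despite all this variety, we can observe some common patterns in how The Witcher 3 renders clouds. First, all of them are rendered in the forward pass, which is absolutely the right choice. All of them use blending (see below), which makes it easier to control how much of the sky a particular layer covers.
More interestingly, some layers are rendered twice with the same settings.
After some evaluation, I picked the shortest shader I could find. This was to (1) maximize the chance of fully reverse engineering it and (2) be able to understand every aspect of it.
I'll take a closer look at the cirrus clouds from The Witcher 3: Blood and Wine.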
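Here is an example:
Before rendering:

After the first pass:

After the second pass:

Here the cirrus clouds are rendered twice, which increases their intensity.
Geometry and vertex shader
The cloud mesh looks similar to a typical sky dome:

All vertices are in [0-1], so a "scale + offset" transform is applied before the projection matrix to center the mesh around the point (0,0,0). For clouds, the mesh is stretched mostly in the XY plane so that it extends beyond the view frustum. The result: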

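The vertex shader also computes the TBN (tangent, bitangent, normal) basis, as well as per-vertex fog (color and intensity).
Pixel Shader
Here is the assembly: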
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[10], immediateIndexed
dcl_constantbuffer cb1[9], immediateIndexed
dcl_constantbuffer cb12[238], immediateIndexed
dcl_constantbuffer cb4[13], immediateIndexed
dcl_sampler s0, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_input_ps linear v0.xyzw
dcl_input_ps linear v1.xyzw
dcl_input_ps linear v2.w
dcl_input_ps linear v3.xyzw
dcl_input_ps linear v4.xyz
dcl_input_ps linear v5.xyz
dcl_output o0.xyzw
dcl_temps 4
0: mul r0.xyz, cb0[9].xyzx, l(1.000000, 1.000000, -1.000000, 0.000000)
1: dp3 r0.w, r0.xyzx, r0.xyzx
2: rsq r0.w, r0.w
3: mul r0.xyz, r0.wwww, r0.xyzx
4: mul r1.xy, cb0[0].xxxx, cb4[5].xyxx
5: mad r1.xy, v1.xyxx, cb4[4].xyxx, r1.xyxx
6: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0
7: add r1.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)
8: add r1.xyz, r1.xyzx, r1.xyzx
9: dp3 r0.w, r1.xyzx, r1.xyzx
10: rsq r0.w, r0.w
11: mul r1.xyz, r0.wwww, r1.xyzx
12: mul r2.xyz, r1.yyyy, v3.xyzx
13: mad r2.xyz, v5.xyzx, r1.xxxx, r2.xyzx
14: mov r3.xy, v1.zwzz
15: mov r3.z, v3.w
16: mad r1.xyz, r3.xyzx, r1.zzzz, r2.xyzx
17: dp3_sat r0.x, r0.xyzx, r1.xyzx
18: add r0.y, -cb4[2].x, cb4[3].x
19: mad r0.x, r0.x, r0.y, cb4[2].x
20: dp2 r0.y, -cb0[9].xyxx, -cb0[9].xyxx
21: rsq r0.y, r0.y
22: mul r0.yz, r0.yyyy, -cb0[9].xxyx
23: add r1.xyz, -v4.xyzx, cb1[8].xyzx
24: dp3 r0.w, r1.xyzx, r1.xyzx
25: rsq r1.z, r0.w
26: sqrt r0.w, r0.w
27: add r0.w, r0.w, -cb4[7].x
28: mul r1.xy, r1.zzzz, r1.xyxx
29: dp2_sat r0.y, r0.yzyy, r1.xyxx
30: add r0.y, r0.y, r0.y
31: min r0.y, r0.y, l(1.000000)
32: add r0.z, -cb4[0].x, cb4[1].x
33: mad r0.z, r0.y, r0.z, cb4[0].x
34: mul r0.x, r0.x, r0.z
35: log r0.x, r0.x
36: mul r0.x, r0.x, l(2.200000)
37: exp r0.x, r0.x
38: add r1.xyz, cb12[236].xyzx, -cb12[237].xyzx
39: mad r1.xyz, r0.yyyy, r1.xyzx, cb12[237].xyzx
40: mul r2.xyz, r0.xxxx, r1.xyzx
41: mad r0.xyz, -r1.xyzx, r0.xxxx, v0.xyzx
42: mad r0.xyz, v0.wwww, r0.xyzx, r2.xyzx
43: add r1.x, -cb4[7].x, cb4[8].x
44: div_sat r0.w, r0.w, r1.x
45: mul r1.x, r1.w, cb4[9].x
46: mad r1.y, -cb4[9].x, r1.w, r1.w
47: mad r0.w, r0.w, r1.y, r1.x
48: mul r1.xy, cb0[0].xxxx, cb4[11].xyxx
49: mad r1.xy, v1.xyxx, cb4[10].xyxx, r1.xyxx
50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t1.xyzw, s0
51: mad r1.x, r1.x, cb4[12].x, -cb4[12].x
52: mad_sat r1.x, cb4[12].x, v2.w, r1.x
53: mul r0.w, r0.w, r1.x
54: mul_sat r0.w, r0.w, cb4[6].x
55: mul o0.xyz, r0.wwww, r0.xyzx
56: mov o0.w, r0.w
57: ret
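There are two tileable input textures. One contains a normal map (RGB) and the cloud shape (alpha channel). The second is a noise texture used to perturb the shape.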



The main cbuffer with the cloud parameters is cb4. Its values are:

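Apart from that, the shader also reads values from other cbuffers. Don't worry, we will cover them too.
Inverted sunlight direction
The first thing the shader does is compute the normalized sunlight direction with its Z component flipped: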
0: mul r0.xyz, cb0[9].xyzx, l(1.000000, 1.000000, -1.000000, 0.000000)
1: dp3 r0.w, r0.xyzx, r0.xyzx
2: rsq r0.w, r0.w
3: mul r0.xyz, r0.wwww, r0.xyzx

float3 invertedSunlightDir = normalize(lightDir * float3(1, 1, -1) );
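As I mentioned earlier, Z is the up axis, and cb0[9] is the sunlight direction. This vector points toward the sun.
Sampling the cloud texture
The next step is to compute the UVs for sampling the cloud texture, then unpack the normal vector and normalize it.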
4: mul r1.xy, cb0[0].xxxx, cb4[5].xyxx
5: mad r1.xy, v1.xyxx, cb4[4].xyxx, r1.xyxx
6: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, r1.xyxx, t0.xyzw, s0
7: add r1.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)
8: add r1.xyz, r1.xyzx, r1.xyzx
9: dp3 r0.w, r1.xyzx, r1.xyzx
10: rsq r0.w, r0.w

// Calc sampling coords
float2 cloudTextureUV = Texcoords * textureScale + elapsedTime * speedFactors;

// Sample texture and get data from it
float4 cloudTextureValue = texture0.Sample( sampler0, cloudTextureUV ).rgba;
float3 normalMap = cloudTextureValue.xyz;
float cloudShape = cloudTextureValue.a;

// Unpack normal and normalize it
float3 unpackedNormal = (normalMap - 0.5) * 2.0;
unpackedNormal = normalize(unpackedNormal);
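To make the clouds move, we need the elapsed time in seconds (cb0[0].x), multiplied by speed factors that determine how fast the clouds travel across the sky (cb4[5].xy).
As I mentioned before, the UVs are stretched across the cloud dome mesh, so we also need texture scale factors (cb4[4].xy), which control the size of the clouds.
The final formula is: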
samplingUV = Input.TextureUV * textureScale + time * speedMultiplier;
After sampling all four channels, we have the normal map (RGB channels) and the cloud shape (A channel).
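The UV scrolling and normal unpacking above can be sketched outside the shader in plain C++. This is a minimal sketch, not the game's code: `Float2`, `Float3`, `scrollUV` and `unpackNormal` are hypothetical names, and the texture fetch itself is left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector types for this sketch.
struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// uv * textureScale + elapsedTime * speedFactors
// (textureScale ~ cb4[4].xy, speedFactors ~ cb4[5].xy, elapsedTime ~ cb0[0].x).
Float2 scrollUV(Float2 uv, Float2 textureScale, Float2 speedFactors, float elapsedTime) {
    return { uv.x * textureScale.x + elapsedTime * speedFactors.x,
             uv.y * textureScale.y + elapsedTime * speedFactors.y };
}

// Remap a normal stored in [0, 1] to [-1, 1] (asm lines 7-8), then normalize (lines 9-11).
Float3 unpackNormal(Float3 rgb) {
    Float3 n = { (rgb.x - 0.5f) * 2.0f, (rgb.y - 0.5f) * 2.0f, (rgb.z - 0.5f) * 2.0f };
    float lenSq = n.x * n.x + n.y * n.y + n.z * n.z;
    assert(lenSq > 0.0f && "a zero-length normal cannot be normalized");
    float invLen = 1.0f / std::sqrt(lenSq);  // the shader uses rsq for this
    return { n.x * invLen, n.y * invLen, n.z * invLen };
}
```

Because the time term is simply added, the cloud pattern slides at a constant rate; changing the scale changes the cloud size without affecting the direction of motion.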
Normal mapping
Normal mapping is done the usual way:
11: mul r1.xyz, r0.wwww, r1.xyzx
12: mul r2.xyz, r1.yyyy, v3.xyzx
13: mad r2.xyz, v5.xyzx, r1.xxxx, r2.xyzx
14: mov r3.xy, v1.zwzz
15: mov r3.z, v3.w
16: mad r1.xyz, r3.xyzx, r1.zzzz, r2.xyzx

// Perform bump mapping
float3 SkyTangent = Input.Tangent;
float3 SkyNormal = (float3( Input.Texcoords.zw, Input.param3.w ));
float3 SkyBitangent = Input.param3.xyz;

float3x3 TBN = float3x3(SkyTangent, SkyBitangent, SkyNormal);
float3 finalNormal = (float3)mul( unpackedNormal, (TBN) );
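Highlight intensity (1)
The next step is to compute NdotL, which affects how strongly a given pixel is highlighted.
Consider the following assembly: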
17: dp3_sat r0.x, r0.xyzx, r1.xyzx
18: add r0.y, -cb4[2].x, cb4[3].x
19: mad r0.x, r0.x, r0.y, cb4[2].x
Here is a visualization of NdotL for this frame:

It is used to interpolate between a minimum and a maximum intensity.
This way, the parts of the clouds facing the sun are brighter.
// Calculate cosine between normal and up-inv lightdir
float NdotL = saturate( dot(invertedSunlightDir, finalNormal) );

// Param 1, line 19, r0.x
float intensity1 = lerp( param1Min, param1Max, NdotL );
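Highlight intensity (2)
There is one more factor that affects the intensity of the clouds.
Clouds in the part of the sky where the sun is, as seen by the viewer, are brighter (translator's note: Mie scattering).
So a gradient is computed in the xy plane and used to interpolate between a min and a max value: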
20: dp2 r0.y, -cb0[9].xyxx, -cb0[9].xyxx
21: rsq r0.y, r0.y
22: mul r0.yz, r0.yyyy, -cb0[9].xxyx
23: add r1.xyz, -v4.xyzx, cb1[8].xyzx
24: dp3 r0.w, r1.xyzx, r1.xyzx
25: rsq r1.z, r0.w
26: sqrt r0.w, r0.w
27: add r0.w, r0.w, -cb4[7].x
28: mul r1.xy, r1.zzzz, r1.xyxx
29: dp2_sat r0.y, r0.yzyy, r1.xyxx
30: add r0.y, r0.y, r0.y
31: min r0.y, r0.y, l(1.000000)
32: add r0.z, -cb4[0].x, cb4[1].x
33: mad r0.z, r0.y, r0.z, cb4[0].x
34: mul r0.x, r0.x, r0.z
35: log r0.x, r0.x
36: mul r0.x, r0.x, l(2.200000)
37: exp r0.x, r0.x

// Calculate normalized -lightDir.xy (20-22)
float2 lightDirXY = normalize( -lightDir.xy );

// Calculate world to camera
float3 vWorldToCamera = ( CameraPos - WorldPos );
float worldToCamera_distance = length(vWorldToCamera);

// normalize vector
vWorldToCamera = normalize( vWorldToCamera );

float LdotV = saturate( dot(lightDirXY, vWorldToCamera.xy) );
float highlightedSkySection = saturate( 2*LdotV );
float intensity2 = lerp( param2Min, param2Max, highlightedSkySection );

float finalIntensity = pow( intensity2 *intensity1, 2.2);
Finally, we multiply the two intensities together and raise the result to the power of 2.2.
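Putting both highlight factors together, the intensity math can be sketched in C++ as follows. `saturate`, `lerp` and `cloudFinalIntensity` are hypothetical stand-ins for the HLSL intrinsics and shader code; the cbuffer mapping in the comments is read off the assembly above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

float saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }
float lerp(float a, float b, float t) { return a + (b - a) * t; }

// param1Min/Max ~ cb4[2].x / cb4[3].x, param2Min/Max ~ cb4[0].x / cb4[1].x.
struct CloudIntensityParams {
    float param1Min, param1Max;
    float param2Min, param2Max;
};

// ndotL: dot(invertedSunlightDir, finalNormal), not yet saturated.
// ldotV: dot(normalize(-lightDir.xy), normalize(cameraPos - worldPos).xy), not yet saturated.
float cloudFinalIntensity(float ndotL, float ldotV, const CloudIntensityParams& p) {
    float intensity1 = lerp(p.param1Min, p.param1Max, saturate(ndotL));      // lines 17-19
    float highlightedSkySection = std::min(2.0f * saturate(ldotV), 1.0f);    // lines 29-31
    float intensity2 = lerp(p.param2Min, p.param2Max, highlightedSkySection); // lines 32-33
    float product = intensity1 * intensity2;                                  // line 34
    assert(product >= 0.0f && "log/exp-based pow is undefined for negative bases");
    return std::pow(product, 2.2f);                                           // lines 35-37
}
```

Note that any ldotV of 0.5 or more already saturates the sky-section gradient, so the brightened region covers a fairly wide arc around the sun.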
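Cloud color
Computing the cloud color starts with two values from the cbuffer: the color of the clouds near the sun and the color of the clouds on the opposite side of the sky. They are interpolated using highlightedSkySection.
The result is then multiplied by finalIntensity.
Finally, the result is blended with fog (which, for performance reasons, is computed in the vertex shader).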
38: add r1.xyz, cb12[236].xyzx, -cb12[237].xyzx
39: mad r1.xyz, r0.yyyy, r1.xyzx, cb12[237].xyzx
40: mul r2.xyz, r0.xxxx, r1.xyzx
41: mad r0.xyz, -r1.xyzx, r0.xxxx, v0.xyzx
42: mad r0.xyz, v0.wwww, r0.xyzx, r2.xyzx

float3 cloudsColor = lerp( cloudsColorBack, cloudsColorFront, highlightedSkySection );
cloudsColor *= finalIntensity;
cloudsColor = lerp( cloudsColor, FogColor, FogAmount );
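Making the clouds more visible near the horizon
It's hard to see in this frame, but the clouds are actually more visible near the horizon than directly above Geralt's head.
You may have noticed that we computed the length of the world-to-camera vector while calculating the second intensity: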
23: add r1.xyz, -v4.xyzx, cb1[8].xyzx
24: dp3 r0.w, r1.xyzx, r1.xyzx
25: rsq r1.z, r0.w
26: sqrt r0.w, r0.w
Here is where it shows up next in the assembly:
26: sqrt r0.w, r0.w
27: add r0.w, r0.w, -cb4[7].x
...
43: add r1.x, -cb4[7].x, cb4[8].x
44: div_sat r0.w, r0.w, r1.x
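cb4[7].x and cb4[8].x are 2000.0 and 7000.0, respectively.
This turns out to be a function called linstep.
It takes three arguments: min/max (the range) and v (the value).
It works like this: if v lies in the range [min, max], it returns a linearly interpolated value in [0.0, 1.0]. If v is outside the range, linstep returns 0.0 or 1.0.
A simple example: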
linstep( 1000.0, 2000.0, 999.0) = 0.0
linstep( 1000.0, 2000.0, 1500.0) = 0.5
linstep( 1000.0, 2000.0, 2000.0) = 1.0
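So it is very similar to smoothstep in HLSL, except that it performs linear interpolation instead of Hermite interpolation.
linstep does not exist in HLSL, but it is very useful and well worth keeping in your toolbox.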
// linstep:
//
// Returns a linear interpolation between 0 and 1 if "v" is in the range [min, max]
// if "v" is <= min, the output is 0
// if "v" is >= max, the output is 1
float linstep( float min, float max, float v )
{
    return saturate( (v - min) / (max - min) );
}
Back to The Witcher 3:
Once we have this factor, which tells us how far a given piece of sky is from Geralt, we use it to attenuate the clouds:
45: mul r1.x, r1.w, cb4[9].x
46: mad r1.y, -cb4[9].x, r1.w, r1.w
47: mad r0.w, r0.w, r1.y, r1.x

float distanceAttenuation = linstep( fadeDistanceStart, fadeDistanceEnd, worldToCamera_distance );

float fadedCloudShape = closeCloudsHidingFactor * cloudShape;
cloudShape = lerp( fadedCloudShape, cloudShape, distanceAttenuation );
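cloudShape is the alpha channel of the first texture, and closeCloudsHidingFactor (cb4[9].x) is a cbuffer value that controls how visible the clouds directly above Geralt are. In every frame I tested it was 0.0, which means no clouds there at all. As distanceAttenuation approaches 1.0 (that is, as the distance from the camera to the sky dome increases), the clouds become more and more visible.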
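A sketch of this distance fade, assuming the values observed in the frames above (2000.0 and 7000.0 for the fade range, 0.0 for closeCloudsHidingFactor). `fadeCloudShape` is a hypothetical name, and `linstep` is repeated so the block stands alone.

```cpp
#include <algorithm>
#include <cassert>

float linstep(float minValue, float maxValue, float v) {
    assert(maxValue > minValue);
    return std::min(std::max((v - minValue) / (maxValue - minValue), 0.0f), 1.0f);
}

// fadeDistanceStart ~ cb4[7].x, fadeDistanceEnd ~ cb4[8].x,
// closeCloudsHidingFactor ~ cb4[9].x.
float fadeCloudShape(float cloudShape, float worldToCameraDistance,
                     float fadeDistanceStart, float fadeDistanceEnd,
                     float closeCloudsHidingFactor) {
    float distanceAttenuation =
        linstep(fadeDistanceStart, fadeDistanceEnd, worldToCameraDistance);  // lines 26-27, 43-44
    float fadedCloudShape = closeCloudsHidingFactor * cloudShape;            // line 45
    // lerp(fadedCloudShape, cloudShape, distanceAttenuation), lines 46-47
    return fadedCloudShape + (cloudShape - fadedCloudShape) * distanceAttenuation;
}
```

With a hiding factor of 0.0, everything closer than 2000 units is fully hidden and everything beyond 7000 units keeps its full shape.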
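Sampling the noise texture
The sampling coordinates for the noise texture are computed the same way as for the cloud texture, with a UV scale and a speed multiplier: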
48: mul r1.xy, cb0[0].xxxx, cb4[11].xyxx
49: mad r1.xy, v1.xyxx, cb4[10].xyxx, r1.xyxx
50: sample_indexable(texture2d)(float,float,float,float) r1.x, r1.xyxx, t1.xyzw, s0

// Calc sampling coords for noise
float2 noiseTextureUV = Texcoords * textureScaleNoise + elapsedTime * speedFactorsNoise;

// Sample texture and get data from it
float noiseTextureValue = texture1.Sample( sampler0, noiseTextureUV ).x;
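Putting it all together
Once we have the noise value, we need to combine it with cloudShape.
I had some trouble understanding "param2.w" (always 1.0) and noiseMult (set to 5.0, taken from the cbuffer).
In any case, the most important value here is generalCloudsVisibility, which controls the overall visibility of the clouds.
The final output color is cloudsColor multiplied by the final noise value, and the alpha channel receives that same value: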
51: mad r1.x, r1.x, cb4[12].x, -cb4[12].x
52: mad_sat r1.x, cb4[12].x, v2.w, r1.x
53: mul r0.w, r0.w, r1.x
54: mul_sat r0.w, r0.w, cb4[6].x
55: mul o0.xyz, r0.wwww, r0.xyzx
56: mov o0.w, r0.w
57: ret

// Sample noise texture and get data from it
float noiseTextureValue = texture1.Sample( sampler0, noiseTextureUV ).x;
noiseTextureValue = noiseTextureValue * noiseMult - noiseMult;

float noiseValue = saturate( noiseMult * Input.param2.w + noiseTextureValue);
noiseValue *= cloudShape;

float finalNoise = saturate( noiseValue * generalCloudsVisibility);

return float4( cloudsColor*finalNoise, finalNoise );
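15. Fog
Fog can be implemented in many ways. However, the days of simple distance fog are long gone: the programmable rendering pipeline has opened the door to physically correct and visually convincing solutions.
The current trend in fog rendering is to use compute shaders.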
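Even though such techniques had already been presented in 2014, and The Witcher 3 shipped in 2015/2016, the fog in Geralt's adventures is entirely screen-based, implemented as a typical post-process.
Here is the pixel shader assembly for the fog. Notably, it is identical throughout the whole game (the 2015 release and both DLCs):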
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb3[2], immediateIndexed
dcl_constantbuffer cb12[214], immediateIndexed
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_resource_texture2d (float,float,float,float) t2
dcl_input_ps_siv v0.xy, position
dcl_output o0.xyzw
dcl_temps 7
0: ftou r0.xy, v0.xyxx
1: mov r0.zw, l(0, 0, 0, 0)
2: ld_indexable(texture2d)(float,float,float,float) r1.x, r0.xyww, t0.xyzw
3: mad r1.y, r1.x, cb12[22].x, cb12[22].y
4: lt r1.y, r1.y, l(1.000000)
5: if_nz r1.y
6: utof r1.yz, r0.xxyx
7: mul r2.xyzw, r1.zzzz, cb12[211].xyzw
8: mad r2.xyzw, cb12[210].xyzw, r1.yyyy, r2.xyzw
9: mad r1.xyzw, cb12[212].xyzw, r1.xxxx, r2.xyzw
10: add r1.xyzw, r1.xyzw, cb12[213].xyzw
11: div r1.xyz, r1.xyzx, r1.wwww
12: ld_indexable(texture2d)(float,float,float,float) r2.xyz, r0.xyww, t1.xyzw
13: ld_indexable(texture2d)(float,float,float,float) r0.x, r0.xyzw, t2.xyzw
14: max r0.x, r0.x, cb3[1].x
15: add r0.yzw, r1.xxyz, -cb12[0].xxyz
16: dp3 r1.x, r0.yzwy, r0.yzwy
17: sqrt r1.x, r1.x
18: add r1.y, r1.x, -cb3[0].x
19: add r1.zw, -cb3[0].xxxz, cb3[0].yyyw
20: div_sat r1.y, r1.y, r1.z
21: mad r1.y, r1.y, r1.w, cb3[0].z
22: add r0.x, r0.x, l(-1.000000)
23: mad r0.x, r1.y, r0.x, l(1.000000)
24: div r0.yzw, r0.yyzw, r1.xxxx
25: mad r1.y, r0.w, cb12[22].z, cb12[0].z
26: add r1.x, r1.x, -cb12[22].z
27: max r1.x, r1.x, l(0)
28: min r1.x, r1.x, cb12[42].z
29: mul r1.z, r0.w, r1.x
30: mul r1.w, r1.x, cb12[43].x
31: mul r1.zw, r1.zzzw, l(0.000000, 0.000000, 0.062500, 0.062500)
32: dp3 r0.y, cb12[38].xyzx, r0.yzwy
33: add r0.z, r0.y, cb12[42].x
34: add r0.w, cb12[42].x, l(1.000000)
35: div_sat r0.z, r0.z, r0.w
36: add r0.w, -cb12[43].z, cb12[43].y
37: mad r0.z, r0.z, r0.w, cb12[43].z
38: mul r0.w, abs(r0.y), abs(r0.y)
39: mad_sat r2.w, r1.x, l(0.002000), l(-0.300000)
40: mul r0.w, r0.w, r2.w
41: lt r0.y, l(0), r0.y
42: movc r3.xyz, r0.yyyy, cb12[39].xyzx, cb12[41].xyzx
43: add r3.xyz, r3.xyzx, -cb12[40].xyzx
44: mad r3.xyz, r0.wwww, r3.xyzx, cb12[40].xyzx
45: movc r4.xyz, r0.yyyy, cb12[45].xyzx, cb12[47].xyzx
46: add r4.xyz, r4.xyzx, -cb12[46].xyzx
47: mad r4.xyz, r0.wwww, r4.xyzx, cb12[46].xyzx
48: ge r0.y, r1.x, cb12[48].y
49: if_nz r0.y
50: add r0.y, r1.y, cb12[42].y
51: mul r0.w, r0.z, r0.y
52: mul r1.y, r0.z, r1.z
53: mad r5.xyzw, r1.yyyy, l(16.000000, 15.000000, 14.000000, 13.000000), r0.wwww
54: max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)
55: add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
56: div_sat r5.xyzw, r1.wwww, r5.xyzw
57: add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
58: mul r1.z, r5.y, r5.x
59: mul r1.z, r5.z, r1.z
60: mul r1.z, r5.w, r1.z
61: mad r5.xyzw, r1.yyyy, l(12.000000, 11.000000, 10.000000, 9.000000), r0.wwww
62: max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)
63: add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
64: div_sat r5.xyzw, r1.wwww, r5.xyzw
65: add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
66: mul r1.z, r1.z, r5.x
67: mul r1.z, r5.y, r1.z
68: mul r1.z, r5.z, r1.z
69: mul r1.z, r5.w, r1.z
70: mad r5.xyzw, r1.yyyy, l(8.000000, 7.000000, 6.000000, 5.000000), r0.wwww
71: max r5.xyzw, r5.xyzw, l(0, 0, 0, 0)
72: add r5.xyzw, r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
73: div_sat r5.xyzw, r1.wwww, r5.xyzw
74: add r5.xyzw, -r5.xyzw, l(1.000000, 1.000000, 1.000000, 1.000000)
75: mul r1.z, r1.z, r5.x
76: mul r1.z, r5.y, r1.z
77: mul r1.z, r5.z, r1.z
78: mul r1.z, r5.w, r1.z
79: mad r5.xy, r1.yyyy, l(4.000000, 3.000000, 0.000000, 0.000000), r0.wwww
80: max r5.xy, r5.xyxx, l(0, 0, 0, 0)
81: add r5.xy, r5.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)
82: div_sat r5.xy, r1.wwww, r5.xyxx
83: add r5.xy, -r5.xyxx, l(1.000000, 1.000000, 0.000000, 0.000000)
84: mul r1.z, r1.z, r5.x
85: mul r1.z, r5.y, r1.z
86: mad r0.w, r1.y, l(2.000000), r0.w
87: max r0.w, r0.w, l(0)
88: add r0.w, r0.w, l(1.000000)
89: div_sat r0.w, r1.w, r0.w
90: add r0.w, -r0.w, l(1.000000)
91: mul r0.w, r0.w, r1.z
92: mad r0.y, r0.y, r0.z, r1.y
93: max r0.y, r0.y, l(0)
94: add r0.y, r0.y, l(1.000000)
95: div_sat r0.y, r1.w, r0.y
96: add r0.y, -r0.y, l(1.000000)
97: mad r0.y, -r0.w, r0.y, l(1.000000)
98: add r0.z, r1.x, -cb12[48].y
99: mul_sat r0.z, r0.z, cb12[48].z
100: else
101: mov r0.yz, l(0.000000, 1.000000, 0.000000, 0.000000)
102: endif
103: log r0.y, r0.y
104: mul r0.w, r0.y, cb12[42].w
105: exp r0.w, r0.w
106: mul r0.y, r0.y, cb12[48].x
107: exp r0.y, r0.y
108: mul r0.yw, r0.yyyw, r0.zzzz
109: mad_sat r1.xy, r0.wwww, cb12[189].xzxx, cb12[189].ywyy
110: add r5.xyz, -r3.xyzx, cb12[188].xyzx
111: mad r5.xyz, r1.xxxx, r5.xyzx, r3.xyzx
112: add r0.z, cb12[188].w, l(-1.000000)
113: mad r0.z, r1.y, r0.z, l(1.000000)
114: mul_sat r5.w, r0.z, r0.w
115: lt r0.z, l(0), cb12[192].x
116: if_nz r0.z
117: mad_sat r1.xy, r0.wwww, cb12[191].xzxx, cb12[191].ywyy
118: add r6.xyz, -r3.xyzx, cb12[190].xyzx
119: mad r3.xyz, r1.xxxx, r6.xyzx, r3.xyzx
120: add r0.z, cb12[190].w, l(-1.000000)
121: mad r0.z, r1.y, r0.z, l(1.000000)
122: mul_sat r3.w, r0.z, r0.w
123: add r1.xyzw, -r5.xyzw, r3.xyzw
124: mad r5.xyzw, cb12[192].xxxx, r1.xyzw, r5.xyzw
125: endif
126: mul r0.z, r0.x, r5.w
127: mul r0.x, r0.x, r0.y
128: dp3 r0.y, l(0.333000, 0.555000, 0.222000, 0.000000), r2.xyzx
129: mad r1.xyz, r0.yyyy, r4.xyzx, -r2.xyzx
130: mad r0.xyw, r0.xxxx, r1.xyxz, r2.xyxz
131: add r1.xyz, -r0.xywx, r5.xyzx
132: mad r0.xyz, r0.zzzz, r1.xyzx, r0.xywx
133: else
134: mov r0.xyz, l(0, 0, 0, 0)
135: endif
136: mov o0.xyz, r0.xyzx
137: mov o0.w, l(1.000000)
138: ret
Here is an example of a foggy sunset scene:

Let's look at the inputs.
As textures we have the depth buffer, AO and the HDR color buffer:



The result:

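The depth buffer is used to reconstruct the world position.
AO is used to darken shadowed areas.
The shader starts by checking whether the pixel is not part of the sky. If the pixel is in the sky (depth == 1.0), the shader returns black. If the pixel belongs to the scene (depth < 1.0), we reconstruct its world position from the depth buffer (lines 7-11) and run it through the fog calculations.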
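Lines 6-11 are a standard unprojection: the integer pixel coordinates (from `ftou` of SV_Position) and the raw depth are combined with a 4x4 matrix stored in cb12[210..213], followed by a divide by w. A minimal sketch, assuming those four registers are the rows of a screen-to-world matrix; `reconstructWorldPos`, `Float3` and `Float4` are hypothetical names.

```cpp
#include <cassert>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// rows[0..3] stand in for cb12[210..213] (assumed screen-to-world matrix rows).
Float3 reconstructWorldPos(unsigned pixelX, unsigned pixelY, float depth, const Float4 rows[4]) {
    const float px = static_cast<float>(pixelX);  // asm line 6: utof
    const float py = static_cast<float>(pixelY);
    // asm lines 7-10: px * row0 + py * row1 + depth * row2 + row3
    Float4 p = {
        px * rows[0].x + py * rows[1].x + depth * rows[2].x + rows[3].x,
        px * rows[0].y + py * rows[1].y + depth * rows[2].y + rows[3].y,
        px * rows[0].z + py * rows[1].z + depth * rows[2].z + rows[3].z,
        px * rows[0].w + py * rows[1].w + depth * rows[2].w + rows[3].w,
    };
    assert(p.w != 0.0f && "perspective divide by zero");
    return { p.x / p.w, p.y / p.w, p.z / p.w };  // asm line 11
}
```

Baking the viewport transform into the matrix like this lets the shader skip converting pixel coordinates to NDC explicitly.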
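Fog is applied shortly after the deferred lighting pass, which is why some forward-rendered elements are still missing from this scene.
The first thing to know about fog in The Witcher 3 is that it consists of two parts: "fog color" and "aerial color".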
struct FogResult
{
    float4 paramsFog;    // RGB: color, A: influence
    float4 paramsAerial; // RGB: color, A: influence
};
Each part has three colors: front, middle and back. So the cbuffer contains data such as "FogColorFront", "FogColorMiddle", "AerialColorBack" and so on. Here are the inputs:
// *** Inputs *** //
float3 FogSunDir = cb12_v38.xyz;
float3 FogColorFront = cb12_v39.xyz;
float3 FogColorMiddle = cb12_v40.xyz;
float3 FogColorBack = cb12_v41.xyz;

float4 FogBaseParams = cb12_v42;
float4 FogDensityParamsScene = cb12_v43;
float4 FogDensityParamsSky = cb12_v44;

float3 AerialColorFront = cb12_v45.xyz;
float3 AerialColorMiddle = cb12_v46.xyz;
float3 AerialColorBack = cb12_v47.xyz;
float4 AerialParams = cb12_v48;
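Before computing the final colors, we need a few vectors and dot products. The shader has access to the pixel's world position, the camera position and the fog/light direction, which lets us compute the dot product between the view vector and the fog direction: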
float3 frag_vec = fragPosWorldSpace.xyz - customCameraPos.xyz;
float frag_dist = length(frag_vec);

float3 frag_dir = frag_vec / frag_dist;

float dot_fragDirSunDir = dot(GlobalLightDirection.xyz, frag_dir);
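The squared absolute value of this dot product is used to compute a blend factor, which is then multiplied by a distance-dependent term: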
float3 curr_col_fog;
float3 curr_col_aerial;
{
    float _dot = dot_fragDirSunDir;

    float _dd = _dot;
    {
        const float _distOffset = -150;
        const float _distRange = 500;
        const float _mul = 1.0 / _distRange;
        const float _bias = _distOffset * _mul;

        _dd = abs(_dd);
        _dd *= _dd;
        _dd *= saturate( frag_dist * _mul + _bias );
    }

    curr_col_fog = lerp( FogColorMiddle.xyz, (_dot>0.0f ? FogColorFront.xyz : FogColorBack.xyz), _dd );
    curr_col_aerial = lerp( AerialColorMiddle.xyz, (_dot>0.0f ? AerialColorFront.xyz : AerialColorBack.xyz), _dd );
}
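The sign of the dot product between the view vector and the light direction selects between the "front" and "back" colors.
Here is a visualization of the final gradient (_dd):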

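Computing the aerial/fog influence factor is considerably more complex. It takes more parameters than just RGB colors, including the scene density. Raymarching is used to determine the fog amount and a scale factor:
Given the view vector, we divide it by 16 to get the raymarching step. As you can see below, only the .z component (height) is taken into account (curr_pos_z_step).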
float fog_amount = 1;
float fog_amount_scale = 0;
[branch]
if ( frag_dist >= AerialParams.y )
{
    float curr_pos_z_base = (customCameraPos.z + FogBaseParams.y) * density_factor;
    float curr_pos_z_step = frag_step.z * density_factor;

    [unroll]
    for ( int i=16; i>0; --i )
    {
        fog_amount *= 1 - saturate( density_sample_scale / (1 + max( 0.0, curr_pos_z_base + (i) * curr_pos_z_step ) ) );
    }

    fog_amount = 1 - fog_amount;
    fog_amount_scale = saturate( (frag_dist - AerialParams.y) * AerialParams.z );
}

FogResult ret;

ret.paramsFog = float4 ( curr_col_fog, fog_amount_scale * pow( abs(fog_amount), final_exp_fog ) );
ret.paramsAerial = float4 ( curr_col_aerial, fog_amount_scale * pow( abs(fog_amount), final_exp_aerial ) );
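The fog amount clearly depends on height (the .z component), and at the end it is raised to a power.
final_exp_fog and final_exp_aerial come from the cbuffer (cb12[42].w and cb12[48].x in the assembly); they control how the fog and aerial contributions respond as height increases.
Fog overrides
The shader I found does not contain this part of the assembly: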
109: mad_sat r1.xy, r0.wwww, cb12[189].xzxx, cb12[189].ywyy
110: add r5.xyz, -r3.xyzx, cb12[188].xyzx
111: mad r5.xyz, r1.xxxx, r5.xyzx, r3.xyzx
112: add r0.z, l(-1.000000), cb12[188].w
113: mad r0.z, r1.y, r0.z, l(1.000000)
114: mul_sat r5.w, r0.w, r0.z
115: lt r0.z, l(0.000000), cb12[192].x
116: if_nz r0.z
117: mad_sat r1.xy, r0.wwww, cb12[191].xzxx, cb12[191].ywyy
118: add r6.xyz, -r3.xyzx, cb12[190].xyzx
119: mad r3.xyz, r1.xxxx, r6.xyzx, r3.xyzx
120: add r0.z, l(-1.000000), cb12[190].w
121: mad r0.z, r1.y, r0.z, l(1.000000)
122: mul_sat r3.w, r0.w, r0.z
123: add r1.xyzw, -r5.xyzw, r3.xyzw
124: mad r5.xyzw, cb12[192].xxxx, r1.xyzw, r5.xyzw
125: endif
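As far as I understand, this looks like two overrides of the fog color and influence.
Most of the time only the first override is applied (cb12_v192.x is 0.0), but in this particular case its value is about 0.22, so the second override is applied as well.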
#ifdef OVERRIDE_FOG

// Override
float fog_influence = ret.paramsFog.w; // r0.w

float override1ColorScale = cb12_v189.x;
float override1ColorBias = cb12_v189.y;
float3 override1Color = cb12_v188.rgb;

float override1InfluenceScale = cb12_v189.z;
float override1InfluenceBias = cb12_v189.w;
float override1Influence = cb12_v188.w;

float override1ColorAmount = saturate(fog_influence * override1ColorScale + override1ColorBias);
float override1InfluenceAmount = saturate(fog_influence * override1InfluenceScale + override1InfluenceBias);

float4 paramsFogOverride;
paramsFogOverride.rgb = lerp(curr_col_fog, override1Color, override1ColorAmount ); // ***r5.xyz

float param1 = lerp(1.0, override1Influence, override1InfluenceAmount); // r0.z
paramsFogOverride.w = saturate(param1 * fog_influence ); // ** r5.w

const float extraFogOverride = cb12_v192.x;

[branch]
if (extraFogOverride > 0.0)
{
    float override2ColorScale = cb12_v191.x;
    float override2ColorBias = cb12_v191.y;
    float3 override2Color = cb12_v190.rgb;

    float override2InfluenceScale = cb12_v191.z;
    float override2InfluenceBias = cb12_v191.w;
    float override2Influence = cb12_v190.w;

    float override2ColorAmount = saturate(fog_influence * override2ColorScale + override2ColorBias);
    float override2InfluenceAmount = saturate(fog_influence * override2InfluenceScale + override2InfluenceBias);

    float4 paramsFogOverride2;
    paramsFogOverride2.rgb = lerp(curr_col_fog, override2Color, override2ColorAmount); // r3.xyz

    float ov_param1 = lerp(1.0, override2Influence, override2InfluenceAmount); // r0.z
    paramsFogOverride2.w = saturate(ov_param1 * fog_influence); // r3.w

    paramsFogOverride = lerp(paramsFogOverride, paramsFogOverride2, extraFogOverride);
}
ret.paramsFog = paramsFogOverride;

#endif
Here is our final scene without fog overrides (first image), with one override (second image) and with both overrides (third image, the final result):



Adjusting AO
The shader I found doesn't use AO at all either. Let's take another look at the AO texture:
