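For the original, see Reverse engineering the rendering of The Witcher 3: Index
This is the second post; it translates parts 9-12 of the original. The article is long and rather wordy, so here is a short summary first.
Summary
Part 9 covers the layout of the GBuffer and a few special operations, such as Best Fit Normals and desaturation. Surprisingly, The Witcher 3 does not use PBR; it is a 2015 game, after all.
Part 10 covers distant rain shafts, which are in fact a cylinder with scrolling noise.
Part 11 covers lightning: it is rendered from a tree-like mesh whose thickness can vary with distance, with some randomness added.
Part 12 covers sky rendering, including atmospheric scattering, the sun and the stars; the stars are a rotating skybox with random twinkling.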
9 GBuffer
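This is part 9 of my series.
In this part I will reveal some details of the gbuffer in The Witcher 3.
I assume you know the basics of deferred shading. A quick recap: instead of computing the final lighting and shading right away, the work is split into two stages. The first stage (the geometry pass) fills the gbuffer with surface data (albedo, normals, specular color and so on); the second stage (the lighting pass) combines everything and computes the lighting.
Deferred shading is popular because it lets lighting be computed in a single full-screen pass, and combined with techniques such as tiled deferred shading it greatly improves performance.
Put simply, the gbuffer is a set of textures holding geometry properties, and designing its layout well is very important. For a real-world example, see the Crysis 3 rendering techniques presentation.
With that said, let's look at a frame from The Witcher 3.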

The gbuffer consists of three render targets in R8G8B8A8_UNORM format plus a depth + stencil buffer in D24_UNORM_S8_UINT format.







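Of course this is not the whole story: the lighting pass also uses reflection probes and other buffers, but they are not the subject of this post. Before we start, a few general observations.
General observations
1. The only buffer that gets cleared is depth/stencil
If you inspect the frame in a frame analyzer you may be a little surprised: apart from the depth buffer, no clear call is issued for any other buffer. So in reality RenderTarget1 looks like this; note the "blurry" pixels in the distance.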

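This is a simple and practical optimization: ClearRenderTargetView calls have a cost, so they should be used only when needed.
2. Reversed-Z
Many articles have discussed the precision of floating-point depth buffers. The Witcher 3 uses reversed-Z, a natural choice for an open world with long draw distances.
In DirectX this is not complicated:
a) Clear the depth buffer with 0 instead of 1, because the far plane is now 0 rather than 1.
b) Swap the near and far clip values when computing the projection matrix.
c) Change the depth test from Less to Greater.
In OpenGL it takes a bit more work, but it is worth it.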
Pixel Shader
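I want to show how the pixel shader feeds surface data into the gbuffer. We already know it stores at least albedo, normals and specular, but it is not quite as simple as you might think.
The catch is that the pixel shader comes in many variants, which differ in the textures and parameters they use. Take this barrel, for example.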

Here are its textures:

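We have albedo, a normal map and a specular color texture: pretty standard. Before we begin, a few notes. The geometry carries position, texcoord, normal and tangent attributes, and the vertex shader outputs at least the texcoords and the normalized tangent/bitangent/normal (TBN) vectors. For more complex materials, for example ones with two albedo maps, the vertex shader may output additional data. Here is a simple example: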
    ps_5_0
    dcl_globalFlags refactoringAllowed
    dcl_constantbuffer cb4[3], immediateIndexed
    dcl_sampler s0, mode_default
    dcl_sampler s13, mode_default
    dcl_resource_texture2d (float,float,float,float) t0
    dcl_resource_texture2d (float,float,float,float) t1
    dcl_resource_texture2d (float,float,float,float) t2
    dcl_resource_texture2d (float,float,float,float) t13
    dcl_input_ps linear v0.zw
    dcl_input_ps linear v1.xyzw
    dcl_input_ps linear v2.xyz
    dcl_input_ps linear v3.xyz
    dcl_input_ps_sgv v4.x, isfrontface
    dcl_output o0.xyzw
    dcl_output o1.xyzw
    dcl_output o2.xyzw
    dcl_temps 3
    0: sample_indexable(texture2d)(float,float,float,float) r0.xyzw, v1.xyxx, t1.xyzw, s0
    1: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t0.xyzw, s0
    2: add r1.w, r1.y, r1.x
    3: add r1.w, r1.z, r1.w
    4: mul r2.x, r1.w, l(0.333300)
    5: add r2.y, l(-1.000000), cb4[1].x
    6: mul r2.y, r2.y, l(0.500000)
    7: mov_sat r2.z, r2.y
    8: mad r1.w, r1.w, l(-0.666600), l(1.000000)
    9: mad r1.w, r2.z, r1.w, r2.x
    10: mul r2.xzw, r1.xxyz, cb4[0].xxyz
    11: mul_sat r2.xzw, r2.xxzw, l(1.500000, 0.000000, 1.500000, 1.500000)
    12: mul_sat r1.w, abs(r2.y), r1.w
    13: add r2.xyz, -r1.xyzx, r2.xzwx
    14: mad r1.xyz, r1.wwww, r2.xyzx, r1.xyzx
    15: max r1.w, r1.z, r1.y
    16: max r1.w, r1.w, r1.x
    17: lt r1.w, l(0.220000), r1.w
    18: movc r1.w, r1.w, l(-0.300000), l(-0.150000)
    19: mad r1.w, v0.z, r1.w, l(1.000000)
    20: mul o0.xyz, r1.wwww, r1.xyzx
    21: add r0.xyz, r0.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)
    22: add r0.xyz, r0.xyzx, r0.xyzx
    23: mov r1.x, v0.w
    24: mov r1.yz, v1.zzwz
    25: mul r1.xyz, r0.yyyy, r1.xyzx
    26: mad r1.xyz, v3.xyzx, r0.xxxx, r1.xyzx
    27: mad r0.xyz, v2.xyzx, r0.zzzz, r1.xyzx
    28: uge r1.x, l(0), v4.x
    29: if_nz r1.x
    30:   dp3 r1.x, v2.xyzx, r0.xyzx
    31:   mul r1.xyz, r1.xxxx, v2.xyzx
    32:   mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx
    33: endif
    34: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t2.xyzw, s0
    35: max r1.w, r1.z, r1.y
    36: max r1.w, r1.w, r1.x
    37: lt r1.w, l(0.200000), r1.w
    38: movc r2.xyz, r1.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)
    39: add r2.xyz, -r1.xyzx, r2.xyzx
    40: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx
    41: lt r1.x, r0.w, l(0.330000)
    42: mul r1.y, r0.w, l(0.950000)
    43: movc r1.x, r1.x, r1.y, l(0.330000)
    44: add r1.x, -r0.w, r1.x
    45: mad o1.w, v0.z, r1.x, r0.w
    46: dp3 r0.w, r0.xyzx, r0.xyzx
    47: rsq r0.w, r0.w
    48: mul r0.xyz, r0.wwww, r0.xyzx
    49: max r0.w, abs(r0.y), abs(r0.x)
    50: max r0.w, r0.w, abs(r0.z)
    51: lt r1.xy, abs(r0.zyzz), r0.wwww
    52: movc r1.yz, r1.yyyy, abs(r0.zzyz), abs(r0.zzxz)
    53: movc r1.xy, r1.xxxx, r1.yzyy, abs(r0.yxyy)
    54: lt r1.z, r1.y, r1.x
    55: movc r1.xy, r1.zzzz, r1.xyxx, r1.yxyy
    56: div r1.z, r1.y, r1.x
    57: div r0.xyz, r0.xyzx, r0.wwww
    58: sample_l(texture2d)(float,float,float,float) r0.w, r1.xzxx, t13.yzwx, s13, l(0)
    59: mul r0.xyz, r0.wwww, r0.xyzx
    60: mad o1.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)
    61: mov o0.w, cb4[2].x
    62: mov o2.w, l(0)
    63: ret
This shader consists of several steps; I will describe each one separately. First, the values in the cbuffer:

Albedo
We start with the hard part. It is not simply a texture sample: after sampling, a desaturation step is applied.
    float3 albedoColorFilter( in float3 color, in float desaturationFactor, in float3 desaturationValue )
    {
        float sumColorComponents = color.r + color.g + color.b;

        float averageColorComponentValue = 0.3333 * sumColorComponents;
        float oneMinusAverageColorComponentValue = 1.0 - averageColorComponentValue;

        float factor = 0.5 * (desaturationFactor - 1.0);

        float avgColorComponent = lerp(averageColorComponentValue, oneMinusAverageColorComponentValue, saturate(factor));
        float3 desaturatedColor = saturate(color * desaturationValue * 1.5);

        float mask = saturate( avgColorComponent * abs(factor) );

        float3 finalColor = lerp( color, desaturatedColor, mask );
        return finalColor;
    }
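For most objects this code simply returns the original texture color. That is achieved through the material cbuffer values: when cb4_v1.x is 1.0, the mask becomes 0.0 and lerp returns the input color unchanged.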
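To check this behaviour numerically, here is a plain C++ port of the filter. It is only a sketch: the `Float3` type and the helper names are made up for the example, and the arithmetic mirrors the HLSL above.

```cpp
#include <algorithm>
#include <cmath>

struct Float3 { float r, g, b; };

static float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }
static float lerpf(float a, float b, float t) { return a + (b - a) * t; }

// C++ port of albedoColorFilter from the HLSL above.
Float3 albedoColorFilter(Float3 color, float desaturationFactor, Float3 desaturationValue) {
    float avg = 0.3333f * (color.r + color.g + color.b);
    float factor = 0.5f * (desaturationFactor - 1.0f);

    // Blend between the average and its complement, driven by the factor.
    float avgComponent = lerpf(avg, 1.0f - avg, saturate(factor));
    float mask = saturate(avgComponent * std::fabs(factor));

    Float3 desaturated = {
        saturate(color.r * desaturationValue.r * 1.5f),
        saturate(color.g * desaturationValue.g * 1.5f),
        saturate(color.b * desaturationValue.b * 1.5f),
    };

    // mask == 0 returns the input color, mask == 1 the tinted color.
    return { lerpf(color.r, desaturated.r, mask),
             lerpf(color.g, desaturated.g, mask),
             lerpf(color.b, desaturated.b, mask) };
}
```

With a factor of 1.0 the term `factor` is 0, so the mask is 0 and the input color comes back untouched. With a factor of 4.0, a mid-grey input and a tint of (0.25, 0.3, 0.45), the mask is about 0.75 and the red channel drops from 0.5 to roughly 0.27.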
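There are some exceptions, though.
The highest desaturation factor I found is 4. The desaturation color depends on the material; it can be something like (0.2, 0.3, 0.4), but there are no strict rules. I could not resist reimplementing it. Here is the result with a desaturation color of (0.25, 0.3, 0.45):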




I am quite sure this only applies material properties and is not the end of the albedo part. Lines 15-20 are the final steps:
    15: max r1.w, r1.z, r1.y
    16: max r1.w, r1.w, r1.x
    17: lt r1.w, l(0.220000), r1.w
    18: movc r1.w, r1.w, l(-0.300000), l(-0.150000)
    19: mad r1.w, v0.z, r1.w, l(1.000000)
    20: mul o0.xyz, r1.wwww, r1.xyzx
v0.z comes from the vertex shader, and it is 0. Remember it, because v0.z will be used several more times later.
The coefficients look as if they were meant to darken the albedo slightly, but as long as v0.z is 0 the color is left unchanged.
    /* ALBEDO */
    // optional desaturation filter
    float3 albedoColor = albedoColorFilter( colorTex, cb4_v1.x, cb4_v0.rgb );
    float albedoMaxComponent = getMaxComponent( albedoColor );

    // I do not know what this is; in most cases it is 0
    float paramZ = Input.out0.z; // note, mostly 0

    // Note that 0.70 and 0.85 do not appear in the assembly output.
    // I wanted to use lerp here, so I had to adjust them manually.
    float param = (albedoMaxComponent > 0.22) ? 0.70 : 0.85;
    float mulParam = lerp(1, param, paramZ);

    // Output
    pout.RT0.rgb = albedoColor * mulParam;
    pout.RT0.a = cb4_v2.x;
As for RT0.a, as you can see it comes from the material cbuffer, but since the shader has no debug information it is hard to say what it is. Perhaps translucency?
Normals
We start by unpacking the normal map and then do the normal mapping as usual; nothing special:
    /* NORMALS */
    float3 sampledNormal = ((normalTex.xyz - 0.5) * 2);

    // Data to construct TBN matrix
    float3 Tangent = Input.TangentW.xyz;
    float3 Normal = Input.NormalW.xyz;
    float3 Bitangent;
    Bitangent.x = Input.out0.w;
    Bitangent.yz = Input.out1.zw;

    // Remove this saturate in real code. It is only a hack to make the
    // normal-TBN multiplication compile to mad instructions instead of mov.
    Bitangent = saturate(Bitangent);

    float3x3 TBN = float3x3(Tangent, Bitangent, Normal);
    float3 normal = mul( sampledNormal, TBN );
Now look at lines 28-33:
    28: uge r1.x, l(0), v4.x
    29: if_nz r1.x
    30:   dp3 r1.x, v2.xyzx, r0.xyzx
    31:   mul r1.xyz, r1.xxxx, v2.xyzx
    32:   mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx
    33: endif
This can be written roughly as:
    [branch] if (bIsFrontFace <= 0)
    {
        float cosTheta = dot(Input.NormalW, normal);
        float3 invNormal = cosTheta * Input.NormalW;
        normal = normal - 2*invNormal;
    }
I am not sure this is the proper way to write it. If you know what this mathematical operation is, please let me know. (Translator's note: it looks like a reflect operation, but this is the first time I have seen it applied to the back side of a model.)
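We can see that the pixel shader uses SV_IsFrontFace.
According to the documentation, it specifies whether a triangle is front facing. For lines and points it is always true. The exception is lines drawn from triangles (wireframe mode), where the value is set the same way as when the triangle is rasterized in solid mode. It can be written by the geometry shader and read by the pixel shader.
I wanted to check this myself, and indeed the effect is only visible in wireframe mode. I believe the purpose is to compute normals correctly in wireframe mode. Here is a comparison. (Translator's note: it may also be meant for double-sided materials, where the back face gets a special normal computation as a trick to fix up the lighting.)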




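Did you notice that the render target format is R8G8B8A8_UNORM? That means each component has only 256 possible values. Is that enough for normals?
Storing high-quality normals with limited resources is a known problem, but fortunately there is plenty of material to learn from.
Perhaps you have already noticed which technique is used here. I should mention that one extra texture is bound throughout the whole geometry pass: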

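The Witcher 3 uses a technique called Best Fit Normals (BFN). I will not explain it in detail here; it was developed by Crytek around 2009-2010 and became public when CryEngine was open-sourced.
BFN is what gives the normals texture its grainy look. After scaling the normal for the best fit, we remap it from the [-1, 1] range to [0, 1].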
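The sketch below is one possible C++ reading of lines 46-60 of the assembly listed earlier. The content of the extra texture (t13) is not shown in this post, so it is represented by a caller-supplied function, `fitLookup`, which is purely a placeholder; the `Vec3` type, the function name and the division-by-zero guard are also assumptions of this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

// fitLookup(u, v) stands in for sampling the extra texture bound as t13.
Vec3 encodeBestFitNormal(Vec3 n, const std::function<float(float, float)>& fitLookup) {
    // Lines 46-48: normalize.
    float invLen = 1.0f / std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    n = { n.x * invLen, n.y * invLen, n.z * invLen };

    // Lines 49-55: largest absolute component, then the other two ordered
    // as (larger, smaller). Sorting gives the same result as the select chain.
    std::array<float, 3> a = { std::fabs(n.x), std::fabs(n.y), std::fabs(n.z) };
    std::sort(a.begin(), a.end(), std::greater<float>());
    float maxAbs = a[0];

    // Line 56: lookup coordinates (the guard is an addition of this sketch).
    float u = a[1];
    float v = (a[1] > 0.0f) ? a[2] / a[1] : 0.0f;

    // Lines 57-59: project onto the unit cube, then apply the fitted length.
    float fit = fitLookup(u, v) / maxAbs;
    n = { n.x * fit, n.y * fit, n.z * fit };

    // Line 60: remap from [-1, 1] to [0, 1] for the UNORM render target.
    return { n.x * 0.5f + 0.5f, n.y * 0.5f + 0.5f, n.z * 0.5f + 0.5f };
}
```

Passing a lookup that always returns 1.0 isolates the cube projection: the direction (1, 2, 2) is stored as (0.75, 1, 1). In the game, the sampled value would shorten the vector so that the 8-bit quantized result lies closer to the true direction.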
Specular
We start from line 34 and sample the specular texture:
    34: sample_indexable(texture2d)(float,float,float,float) r1.xyz, v1.xyxx, t2.xyzw, s0
    35: max r1.w, r1.z, r1.y
    36: max r1.w, r1.w, r1.x
    37: lt r1.w, l(0.200000), r1.w
    38: movc r2.xyz, r1.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)
    39: add r2.xyz, -r1.xyzx, r2.xyzx
    40: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx
As you can see, there is a "darkening" filter similar to the one for albedo: compute the max component, then the darkened value, and lerp it with the original specular color.
    /* SPECULAR */
    float3 specularTex = texture2.Sample( samplerAnisoWrap, Texcoords ).rgb;

    // Same idea as in albedo: take the max component and compare it with a threshold
    float specularMaxComponent = getMaxComponent( specularTex );
    float3 specB = (specularMaxComponent > 0.2) ? specularTex : float3(0.12, 0.12, 0.12);
    float3 finalSpec = lerp(specularTex, specB, paramZ);
    pout.RT2.xyz = finalSpec;

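I do not know whether "reflectivity" is the proper name for this parameter, because I do not know how it affects the lighting pass. It is simply the alpha channel of the input normal map.
Assembly:
    41: lt r1.x, r0.w, l(0.330000)
    42: mul r1.y, r0.w, l(0.950000)
    43: movc r1.x, r1.x, r1.y, l(0.330000)
    44: add r1.x, -r0.w, r1.x
    45: mad o1.w, v0.z, r1.x, r0.w
Say hello to our old friend v0.z; it plays the same role as in albedo and specular:
    /* REFLECTIVITY */
    float reflectivity = normalTex.a;
    float reflectivity2 = (reflectivity < 0.33) ? (reflectivity * 0.95) : 0.33;

    float finalReflectivity = lerp(reflectivity, reflectivity2, paramZ);
    pout.RT1.a = finalReflectivity;
That wraps up the first pixel shader variant.
Pixel shader - albedo + normal variant
Let me show you another variant, this time with only albedo and normal maps and no specular texture: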
    ps_5_0
    dcl_globalFlags refactoringAllowed
    dcl_constantbuffer cb4[8], immediateIndexed
    dcl_sampler s0, mode_default
    dcl_sampler s13, mode_default
    dcl_resource_texture2d (float,float,float,float) t0
    dcl_resource_texture2d (float,float,float,float) t1
    dcl_resource_texture2d (float,float,float,float) t13
    dcl_input_ps linear v0.zw
    dcl_input_ps linear v1.xyzw
    dcl_input_ps linear v2.xyz
    dcl_input_ps linear v3.xyz
    dcl_input_ps_sgv v4.x, isfrontface
    dcl_output o0.xyzw
    dcl_output o1.xyzw
    dcl_output o2.xyzw
    dcl_temps 4
    0: mul r0.x, v0.z, cb4[0].x
    1: sample_indexable(texture2d)(float,float,float,float) r1.xyzw, v1.xyxx, t1.xyzw, s0
    2: sample_indexable(texture2d)(float,float,float,float) r0.yzw, v1.xyxx, t0.wxyz, s0
    3: add r2.x, r0.z, r0.y
    4: add r2.x, r0.w, r2.x
    5: add r2.z, l(-1.000000), cb4[2].x
    6: mul r2.yz, r2.xxzx, l(0.000000, 0.333300, 0.500000, 0.000000)
    7: mov_sat r2.w, r2.z
    8: mad r2.x, r2.x, l(-0.666600), l(1.000000)
    9: mad r2.x, r2.w, r2.x, r2.y
    10: mul r3.xyz, r0.yzwy, cb4[1].xyzx
    11: mul_sat r3.xyz, r3.xyzx, l(1.500000, 1.500000, 1.500000, 0.000000)
    12: mul_sat r2.x, abs(r2.z), r2.x
    13: add r2.yzw, -r0.yyzw, r3.xxyz
    14: mad r0.yzw, r2.xxxx, r2.yyzw, r0.yyzw
    15: max r2.x, r0.w, r0.z
    16: max r2.x, r0.y, r2.x
    17: lt r2.x, l(0.220000), r2.x
    18: movc r2.x, r2.x, l(-0.300000), l(-0.150000)
    19: mad r0.x, r0.x, r2.x, l(1.000000)
    20: mul o0.xyz, r0.xxxx, r0.yzwy
    21: add r0.xyz, r1.xyzx, l(-0.500000, -0.500000, -0.500000, 0.000000)
    22: add r0.xyz, r0.xyzx, r0.xyzx
    23: mov r1.x, v0.w
    24: mov r1.yz, v1.zzwz
    25: mul r1.xyz, r0.yyyy, r1.xyzx
    26: mad r0.xyw, v3.xyxz, r0.xxxx, r1.xyxz
    27: mad r0.xyz, v2.xyzx, r0.zzzz, r0.xywx
    28: uge r0.w, l(0), v4.x
    29: if_nz r0.w
    30:   dp3 r0.w, v2.xyzx, r0.xyzx
    31:   mul r1.xyz, r0.wwww, v2.xyzx
    32:   mad r0.xyz, -r1.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), r0.xyzx
    33: endif
    34: add r0.w, -r1.w, l(1.000000)
    35: log r1.xyz, cb4[3].xyzx
    36: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
    37: exp r1.xyz, r1.xyzx
    38: mad r0.w, r0.w, cb4[4].x, cb4[5].x
    39: mul_sat r1.xyz, r0.wwww, r1.xyzx
    40: log r1.xyz, r1.xyzx
    41: mul r1.xyz, r1.xyzx, l(0.454545, 0.454545, 0.454545, 0.000000)
    42: exp r1.xyz, r1.xyzx
    43: max r0.w, r1.z, r1.y
    44: max r0.w, r0.w, r1.x
    45: lt r0.w, l(0.200000), r0.w
    46: movc r2.xyz, r0.wwww, r1.xyzx, l(0.120000, 0.120000, 0.120000, 0.000000)
    47: add r2.xyz, -r1.xyzx, r2.xyzx
    48: mad o2.xyz, v0.zzzz, r2.xyzx, r1.xyzx
    49: lt r0.w, r1.w, l(0.330000)
    50: mul r1.x, r1.w, l(0.950000)
    51: movc r0.w, r0.w, r1.x, l(0.330000)
    52: add r0.w, -r1.w, r0.w
    53: mad o1.w, v0.z, r0.w, r1.w
    54: lt r0.w, l(0), cb4[7].x
    55: and o2.w, r0.w, l(0.064706)
    56: dp3 r0.w, r0.xyzx, r0.xyzx
    57: rsq r0.w, r0.w
    58: mul r0.xyz, r0.wwww, r0.xyzx
    59: max r0.w, abs(r0.y), abs(r0.x)
    60: max r0.w, r0.w, abs(r0.z)
    61: lt r1.xy, abs(r0.zyzz), r0.wwww
    62: movc r1.yz, r1.yyyy, abs(r0.zzyz), abs(r0.zzxz)
    63: movc r1.xy, r1.xxxx, r1.yzyy, abs(r0.yxyy)
    64: lt r1.z, r1.y, r1.x
    65: movc r1.xy, r1.zzzz, r1.xyxx, r1.yxyy
    66: div r1.z, r1.y, r1.x
    67: div r0.xyz, r0.xyzx, r0.wwww
    68: sample_l(texture2d)(float,float,float,float) r0.w, r1.xzxx, t13.yzwx, s13, l(0)
    69: mul r0.xyz, r0.wwww, r0.xyzx
    70: mad o1.xyz, r0.xyzx, l(0.500000, 0.500000, 0.500000, 0.000000), l(0.500000, 0.500000, 0.500000, 0.000000)
    71: mov o0.w, cb4[6].x
    72: ret
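Differences from the previous variant:
a) Lines 0 and 19: the interpolation parameter v0.z is multiplied by cb4[0].x from the cbuffer, and the product is used only for the albedo interpolation at line 19. Everywhere else the plain v0.z is used.
b) Lines 54-55: o2.w is set depending on whether cb4[7].x > 0.0. We have already met this compare-then-AND pattern where luminance was computed earlier in the series. It can be written as:
    pout.RT2.w = (cb4_v7.x > 0.0) ? (16.5/255.0) : 0.0;
c) Lines 34-42: a completely different way of computing specular.
There is no specular texture here. This is the assembly responsible for that part:
    34: add r0.w, -r1.w, l(1.000000)
    35: log r1.xyz, cb4[3].xyzx
    36: mul r1.xyz, r1.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
    37: exp r1.xyz, r1.xyzx
    38: mad r0.w, r0.w, cb4[4].x, cb4[5].x
    39: mul_sat r1.xyz, r0.wwww, r1.xyzx
    40: log r1.xyz, r1.xyzx
    41: mul r1.xyz, r1.xyzx, l(0.454545, 0.454545, 0.454545, 0.000000)
    42: exp r1.xyz, r1.xyzx
Note that (1 - reflectivity) is used here:
    float oneMinusReflectivity = 1.0 - normalTex.a;
    float3 specularTex = pow(cb4_v3.rgb, 2.2);
    oneMinusReflectivity = oneMinusReflectivity * cb4_v4.x + cb4_v5.x;
    specularTex = saturate(specularTex * oneMinusReflectivity);
    specularTex = pow(specularTex, 1.0/2.2);

    // proceed as in the first variant...
    float specularMaxComponent = getMaxComponent( specularTex );
    ...
The cbuffer is slightly larger in this variant; the extra values are used to emulate the specular color.
The rest of the shader is the same as in the previous variant.
To sum up: things that look simple often turn out to be complicated in practice, and the gbuffer of The Witcher 3 is no exception. I have shown only a few of the simpler pixel shader variants, along with some general observations.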
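10 Distant rain shafts
This time we look at an interesting atmospheric phenomenon: distant rain shafts near the horizon. The easiest place to encounter them is the Skellige Islands.
I personally really like this effect and was curious how CDPR's graphics programmers implemented it. Let's find out.
Here are two screenshots, before and after the rain shafts are applied: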


Geometry

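The first step is the geometry: a small cylinder is used.
Its positions in local space are small, in the 0-1 range. The input data for the draw call looks like this: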

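The important parts here are Texcoord and Instance_Transform. The UVs are simple enough that the mesh could even be generated procedurally.
Starting from the cylinder in local space, we multiply it by the world matrix supplied through Instance_Transform.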



It looks scary, so let's decompose it:
    XMMATRIX mat( -227.7472, 159.8043, 374.0736, -116.4951,
                  -194.7577, -173.3836, -494.4982, 238.6908,
                  -14.16466, -185.4743, 784.564, -1.45565,
                  0.0, 0.0, 0.0, 1.0 );

    mat = XMMatrixTranspose( mat );

    XMVECTOR vScale;
    XMVECTOR vRotateQuat;
    XMVECTOR vTranslation;
    XMMatrixDecompose( &vScale, &vRotateQuat, &vTranslation, mat );

    // Rotation matrix...
    XMMATRIX matRotate = XMMatrixRotationQuaternion( vRotateQuat );
The results are interesting:
Rotation quaternion: (0.0925, -0.3149, 0.883412, -0.33446)
Scale: (299.99, 300, 1000)
Translation: (-116.495, 238.691, -1.456)
It is important to know that the camera position in this frame is (-116.533, 234.869, 2.09).
As you can see, the cylinder is scaled up a lot, moved to the camera position and rotated.
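The scale and translation can also be read off the matrix without DirectXMath. In the matrix as printed (before the transpose), the fourth column is the translation and the length of each of the first three columns is the scale along that local axis. A minimal C++ sketch, with constant and function names chosen for this example:

```cpp
#include <array>
#include <cmath>

// The instance transform exactly as printed in the capture: three rows,
// with the translation in the fourth column.
const double kInstanceTransform[3][4] = {
    { -227.7472,  159.8043,  374.0736, -116.4951 },
    { -194.7577, -173.3836, -494.4982,  238.6908 },
    { -14.16466, -185.4743,  784.564,   -1.45565 },
};

// Scale along each local axis = length of the matching column of the 3x3 part.
std::array<double, 3> extractScale(const double (&m)[3][4]) {
    std::array<double, 3> s{};
    for (int c = 0; c < 3; ++c) {
        s[c] = std::sqrt(m[0][c] * m[0][c] + m[1][c] * m[1][c] + m[2][c] * m[2][c]);
    }
    return s;
}

// Translation = fourth column.
std::array<double, 3> extractTranslation(const double (&m)[3][4]) {
    return { m[0][3], m[1][3], m[2][3] };
}
```

The column lengths come out at roughly 300, 300 and 1000, in line with the values reported above.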

Vertex Shader
The input geometry and the vertex shader depend on each other:
    vs_5_0
    dcl_globalFlags refactoringAllowed
    dcl_constantbuffer cb1[7], immediateIndexed
    dcl_constantbuffer cb2[6], immediateIndexed
    dcl_input v0.xyz
    dcl_input v1.xy
    dcl_input v4.xyzw
    dcl_input v5.xyzw
    dcl_input v6.xyzw
    dcl_input v7.xyzw
    dcl_output o0.xyz
    dcl_output o1.xyzw
    dcl_output_siv o2.xyzw, position
    dcl_temps 2
    0: mov o0.xy, v1.xyxx
    1: mul r0.xyzw, v5.xyzw, cb1[6].yyyy
    2: mad r0.xyzw, v4.xyzw, cb1[6].xxxx, r0.xyzw
    3: mad r0.xyzw, v6.xyzw, cb1[6].zzzz, r0.xyzw
    4: mad r0.xyzw, cb1[6].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    5: mad r1.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx
    6: mov r1.w, l(1.000000)
    7: dp4 o0.z, r1.xyzw, r0.xyzw
    8: mov o1.xyzw, v7.xyzw
    9: mul r0.xyzw, v5.xyzw, cb1[0].yyyy
    10: mad r0.xyzw, v4.xyzw, cb1[0].xxxx, r0.xyzw
    11: mad r0.xyzw, v6.xyzw, cb1[0].zzzz, r0.xyzw
    12: mad r0.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    13: dp4 o2.x, r1.xyzw, r0.xyzw
    14: mul r0.xyzw, v5.xyzw, cb1[1].yyyy
    15: mad r0.xyzw, v4.xyzw, cb1[1].xxxx, r0.xyzw
    16: mad r0.xyzw, v6.xyzw, cb1[1].zzzz, r0.xyzw
    17: mad r0.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    18: dp4 o2.y, r1.xyzw, r0.xyzw
    19: mul r0.xyzw, v5.xyzw, cb1[2].yyyy
    20: mad r0.xyzw, v4.xyzw, cb1[2].xxxx, r0.xyzw
    21: mad r0.xyzw, v6.xyzw, cb1[2].zzzz, r0.xyzw
    22: mad r0.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    23: dp4 o2.z, r1.xyzw, r0.xyzw
    24: mul r0.xyzw, v5.xyzw, cb1[3].yyyy
    25: mad r0.xyzw, v4.xyzw, cb1[3].xxxx, r0.xyzw
    26: mad r0.xyzw, v6.xyzw, cb1[3].zzzz, r0.xyzw
    27: mad r0.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    28: dp4 o2.w, r1.xyzw, r0.xyzw
    29: ret
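Besides passing through the texcoords (line 0) and Instance_LOD_Params (line 8), the shader outputs two more things: SV_Position and the height (the .z component) of the position. Here the scale is (4, 4, 2) and the bias is (-2, -2, -1).
The pattern you can see between lines 9 and 28 is a multiplication of two row-major matrices.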
    cbuffer cbPerFrame : register (b1)
    {
        row_major float4x4 g_viewProjMatrix;
        row_major float4x4 g_rainShaftsViewProjMatrix;
    }

    cbuffer cbPerObject : register (b2)
    {
        float4x4 g_mtxWorld;
        float4 g_modelScale;
        float4 g_modelBias;
    }

    struct VS_INPUT
    {
        float3 PositionW : POSITION;
        float2 Texcoord : TEXCOORD;
        float3 NormalW : NORMAL;
        float3 TangentW : TANGENT;
        float4 InstanceTransform0 : INSTANCE_TRANSFORM0;
        float4 InstanceTransform1 : INSTANCE_TRANSFORM1;
        float4 InstanceTransform2 : INSTANCE_TRANSFORM2;
        float4 InstanceLODParams : INSTANCE_LOD_PARAMS;
    };

    struct VS_OUTPUT
    {
        float3 TexcoordAndZ : Texcoord0;
        float4 LODParams : LODParams;
        float4 PositionH : SV_Position;
    };

    VS_OUTPUT RainShaftsVS( VS_INPUT Input )
    {
        VS_OUTPUT Output = (VS_OUTPUT)0;

        // simple data passing
        Output.TexcoordAndZ.xy = Input.Texcoord;
        Output.LODParams = Input.InstanceLODParams;

        // world space
        float3 meshScale = g_modelScale.xyz; // float3( 4, 4, 2 );
        float3 meshBias = g_modelBias.xyz;   // float3( -2, -2, -1 );
        float3 PositionL = Input.PositionW * meshScale + meshBias;

        // Manually build instanceWorld matrix from float4s:
        float4x4 matInstanceWorld = float4x4(Input.InstanceTransform0, Input.InstanceTransform1,
                                             Input.InstanceTransform2, float4(0, 0, 0, 1) );

        // World-space Height (.z)
        float4x4 matWorldInstanceLod = mul( g_rainShaftsViewProjMatrix, matInstanceWorld );
        Output.TexcoordAndZ.z = mul( float4(PositionL, 1.0), transpose(matWorldInstanceLod) ).z;

        // SV_Position
        float4x4 matModelViewProjection = mul(g_viewProjMatrix, matInstanceWorld );
        Output.PositionH = mul( float4(PositionL, 1.0), transpose(matModelViewProjection) );

        return Output;
    }
Pixel Shader
Two textures are used: a noise texture and the depth buffer.


Values from the cbuffers:




The pixel shader assembly:
    ps_5_0
    dcl_globalFlags refactoringAllowed
    dcl_constantbuffer cb0[8], immediateIndexed
    dcl_constantbuffer cb2[3], immediateIndexed
    dcl_constantbuffer cb12[23], immediateIndexed
    dcl_constantbuffer cb4[8], immediateIndexed
    dcl_sampler s0, mode_default
    dcl_sampler s15, mode_default
    dcl_resource_texture2d (float,float,float,float) t0
    dcl_resource_texture2d (float,float,float,float) t15
    dcl_input_ps linear v0.xyz
    dcl_input_ps linear v1.w
    dcl_input_ps_siv v2.xy, position
    dcl_output o0.xyzw
    dcl_temps 1
    0: mul r0.xy, cb0[0].xxxx, cb4[5].xyxx
    1: mad r0.xy, v0.xyxx, cb4[4].xyxx, r0.xyxx
    2: sample_indexable(texture2d)(float,float,float,float) r0.x, r0.xyxx, t0.xyzw, s0
    3: add r0.y, -cb4[2].x, cb4[3].x
    4: mad_sat r0.x, r0.x, r0.y, cb4[2].x
    5: mul r0.x, r0.x, v0.y
    6: mul r0.x, r0.x, v1.w
    7: mul r0.x, r0.x, cb4[1].x
    8: mul r0.yz, v2.xxyx, cb0[1].zzwz
    9: sample_l(texture2d)(float,float,float,float) r0.y, r0.yzyy, t15.yxzw, s15, l(0)
    10: mad r0.y, r0.y, cb12[22].x, cb12[22].y
    11: mad r0.y, r0.y, cb12[21].x, cb12[21].y
    12: max r0.y, r0.y, l(0.000100)
    13: div r0.y, l(1.000000, 1.000000, 1.000000, 1.000000), r0.y
    14: add r0.y, r0.y, -v0.z
    15: mul_sat r0.y, r0.y, cb4[6].x
    16: mul_sat r0.x, r0.y, r0.x
    17: mad r0.y, cb0[7].y, r0.x, -r0.x
    18: mad r0.x, cb4[7].x, r0.y, r0.x
    19: mul r0.xyz, r0.xxxx, cb4[0].xyzx
    20: log r0.xyz, r0.xyzx
    21: mul r0.xyz, r0.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
    22: exp r0.xyz, r0.xyzx
    23: mul r0.xyz, r0.xyzx, cb2[2].xyzx
    24: mul o0.xyz, r0.xyzx, cb2[2].wwww
    25: mov o0.w, l(0)
    26: ret
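It looks like a lot, but it is actually not that bad.
What is going on here? First, animated UVs are computed using the elapsed time from the cbuffer. These UVs are used to sample the noise texture.
Once we have the noise value, we use it to interpolate between a min and a max value, perform a few multiplications, and obtain the intensity mask.
Notice that distant objects have disappeared: the cylinder is depth-tested (without writing depth) and is drawn on top of them. Next, a "distant objects mask" is computed.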

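It can be computed like this:
    farObjectsMask = saturate( (FrustumDepth - CylinderWorldSpaceHeight) * 0.001 );

Personally I think this could be done more cheaply: skip the world-space height and simply multiply the frustum depth by a smaller value.
Multiplying the two masks gives the final mask: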

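With the final mask in hand, we perform one more interpolation (which does practically nothing), multiply by the shaft color and apply gamma correction. The output is written with an alpha of 0, and the blend equation used is:
    FinalColor = SourceColor * 1.0 + (1.0 - SourceAlpha) * DestColor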
    struct VS_OUTPUT
    {
        float3 TexcoordAndWorldspaceHeight : Texcoord0;
        float4 LODParams : LODParams; // float4(1,1,1,1)
        float4 PositionH : SV_Position;
    };

    float getFrustumDepth( in float depth )
    {
        // from [1-0] to [0-1]
        float d = depth * cb12_v22.x + cb12_v22.y;

        // special coefficients
        d = d * cb12_v21.x + cb12_v21.y;

        // return frustum depth
        return 1.0 / max(d, 1e-4);
    }

    float4 EditedShaderPS( in VS_OUTPUT Input ) : SV_Target0
    {
        // * Input from Vertex Shader
        float2 InputUV = Input.TexcoordAndWorldspaceHeight.xy;
        float WorldHeight = Input.TexcoordAndWorldspaceHeight.z;
        float LODParam = Input.LODParams.w;

        // * Inputs
        float elapsedTime = cb0_v0.x;
        float2 uvAnimation = cb4_v5.xy;
        float2 uvScale = cb4_v4.xy;
        float minValue = cb4_v2.x; // 0.0
        float maxValue = cb4_v3.x; // 1.0
        float3 shaftsColor = cb4_v0.rgb; // RGB( 147, 162, 173 )

        float3 finalColorFilter = cb2_v2.rgb; // float3( 1.175, 1.296, 1.342 );
        float finalEffectIntensity = cb2_v2.w;

        float2 invViewportSize = cb0_v1.zw;

        float depthScale = cb4_v6.x; // 0.001

        // sample noise
        float2 uvOffsets = elapsedTime * uvAnimation;
        float2 uv = InputUV * uvScale + uvOffsets;
        float disturb = texture0.Sample( sampler0, uv ).x;

        // * Intensity mask
        float intensity = saturate( lerp(minValue, maxValue, disturb) );
        intensity *= InputUV.y; // transition from (0, 1)
        intensity *= LODParam;  // usually 1.0
        intensity *= cb4_v1.x;  // 1.0

        // Sample depth
        float2 ScreenUV = Input.PositionH.xy * invViewportSize;
        float hardwareDepth = texture15.SampleLevel( sampler15, ScreenUV, 0 ).x;
        float frustumDepth = getFrustumDepth( hardwareDepth );

        // * Calculate mask covering distant objects behind cylinder.
        // Seems that the input really is world-space height (.z component, see vertex shader)
        float depth = frustumDepth - WorldHeight;
        float distantObjectsMask = saturate( depth * depthScale );

        // * calculate final mask
        float finalEffectMask = saturate( intensity * distantObjectsMask );

        // cb0_v7.y and cb4_v7.x are set to 1.0 so I didn't bother with naming them :)
        float paramX = finalEffectMask;
        float paramY = cb0_v7.y * finalEffectMask;
        float effectAmount = lerp(paramX, paramY, cb4_v7.x);

        // color of shafts comes from constant buffer
        float3 effectColor = effectAmount * shaftsColor;

        // gamma correction
        effectColor = pow(effectColor, 2.2);

        // The listing was cut off here; the remaining steps follow lines 23-25 of the assembly.
        effectColor *= finalColorFilter;
        effectColor *= finalEffectIntensity;

        // zero alpha, see the blend equation above
        return float4( effectColor, 0.0 );
    }
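11 Lightning
In this part I describe how lightning is rendered in The Witcher 3.
Lightning is rendered after the rain shafts, also in the forward rendering pass.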


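It lasts only a very short time, so it is best watched at 0.25x speed. You can then see that it is not a static image: its intensity varies slightly over time.
It has a lot in common with the rain shafts, such as the blend state and depth testing.
Its geometry is a tree-like mesh; this particular lightning bolt uses the mesh below. The mesh has UVs and normals, which are used in the vertex shader.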

Vertex Shader
    vs_5_0
    dcl_globalFlags refactoringAllowed
    dcl_constantbuffer cb1[9], immediateIndexed
    dcl_constantbuffer cb2[6], immediateIndexed
    dcl_input v0.xyz
    dcl_input v1.xy
    dcl_input v2.xyz
    dcl_input v4.xyzw
    dcl_input v5.xyzw
    dcl_input v6.xyzw
    dcl_input v7.xyzw
    dcl_output o0.xy
    dcl_output o1.xyzw
    dcl_output_siv o2.xyzw, position
    dcl_temps 3
    0: mov o0.xy, v1.xyxx
    1: mov o1.xyzw, v7.xyzw
    2: mul r0.xyzw, v5.xyzw, cb1[0].yyyy
    3: mad r0.xyzw, v4.xyzw, cb1[0].xxxx, r0.xyzw
    4: mad r0.xyzw, v6.xyzw, cb1[0].zzzz, r0.xyzw
    5: mad r0.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    6: mov r1.w, l(1.000000)
    7: mad r1.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx
    8: dp4 r2.x, r1.xyzw, v4.xyzw
    9: dp4 r2.y, r1.xyzw, v5.xyzw
    10: dp4 r2.z, r1.xyzw, v6.xyzw
    11: add r2.xyz, r2.xyzx, -cb1[8].xyzx
    12: dp3 r1.w, r2.xyzx, r2.xyzx
    13: rsq r1.w, r1.w
    14: div r1.w, l(1.000000, 1.000000, 1.000000, 1.000000), r1.w
    15: mul r1.w, r1.w, l(0.000001)
    16: mad r2.xyz, v2.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000)
    17: mad r1.xyz, r2.xyzx, r1.wwww, r1.xyzx
    18: mov r1.w, l(1.000000)
    19: dp4 o2.x, r1.xyzw, r0.xyzw
    20: mul r0.xyzw, v5.xyzw, cb1[1].yyyy
    21: mad r0.xyzw, v4.xyzw, cb1[1].xxxx, r0.xyzw
    22: mad r0.xyzw, v6.xyzw, cb1[1].zzzz, r0.xyzw
    23: mad r0.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    24: dp4 o2.y, r1.xyzw, r0.xyzw
    25: mul r0.xyzw, v5.xyzw, cb1[2].yyyy
    26: mad r0.xyzw, v4.xyzw, cb1[2].xxxx, r0.xyzw
    27: mad r0.xyzw, v6.xyzw, cb1[2].zzzz, r0.xyzw
    28: mad r0.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    29: dp4 o2.z, r1.xyzw, r0.xyzw
    30: mul r0.xyzw, v5.xyzw, cb1[3].yyyy
    31: mad r0.xyzw, v4.xyzw, cb1[3].xxxx, r0.xyzw
    32: mad r0.xyzw, v6.xyzw, cb1[3].zzzz, r0.xyzw
    33: mad r0.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r0.xyzw
    34: dp4 o2.w, r1.xyzw, r0.xyzw
    35: ret
It is very similar to the rain shafts vertex shader, so I will not repeat the common parts. The main difference is in lines 11-18:
    11: add r2.xyz, r2.xyzx, -cb1[8].xyzx
    12: dp3 r1.w, r2.xyzx, r2.xyzx
    13: rsq r1.w, r1.w
    14: div r1.w, l(1.000000, 1.000000, 1.000000, 1.000000), r1.w
    15: mul r1.w, r1.w, l(0.000001)
    16: mad r2.xyz, v2.xyzx, l(2.000000, 2.000000, 2.000000, 0.000000), l(-1.000000, -1.000000, -1.000000, 0.000000)
    17: mad r1.xyz, r2.xyzx, r1.wwww, r1.xyzx
    18: mov r1.w, l(1.000000)
    19: dp4 o2.x, r1.xyzw, r0.xyzw
cb1[8].xyz is the camera position and r2.xyz is the world-space position, so line 11 computes the vector from the camera to the vertex. Lines 12-15 then compute its length, scaled by 0.000001:
    length( worldPos - cameraPos ) * 0.000001
v2.xyz is the normal vector; line 16 expands it from [0, 1] to [-1, 1]. Finally the displaced position is computed:
    finalWorldPos = worldPos + length( worldPos - cameraPos ) * 0.000001 * normalVector
    ...
    // final world-space position
    float3 vNormal = Input.NormalW * 2.0 - 1.0;
    float lencameratoworld = length( PositionL - g_cameraPos.xyz) * 0.000001;
    PositionL += vNormal*lencameratoworld;

    // SV_Position
    float4x4 matModelViewProjection = mul(g_viewProjMatrix, matInstanceWorld );
    Output.PositionH = mul( float4(PositionL, 1.0), transpose(matModelViewProjection) );

    return Output;
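The displacement can be tried out on the CPU with a few lines of C++. This is a sketch of the formula above only; the `Vec3` type, the function name and the default constant argument are choices made for the example.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// position + length(position - cameraPos) * k * normal,
// where the normal arrives packed in [0, 1] and is expanded to [-1, 1].
Vec3 extrudeAlongNormal(Vec3 position, Vec3 cameraPos, Vec3 packedNormal,
                        float k = 0.000001f) {
    Vec3 n = { packedNormal.x * 2.0f - 1.0f,
               packedNormal.y * 2.0f - 1.0f,
               packedNormal.z * 2.0f - 1.0f };

    float dx = position.x - cameraPos.x;
    float dy = position.y - cameraPos.y;
    float dz = position.z - cameraPos.z;
    float offset = std::sqrt(dx * dx + dy * dy + dz * dz) * k;

    return { position.x + n.x * offset,
             position.y + n.y * offset,
             position.z + n.z * offset };
}
```

Because the offset is proportional to the distance from the camera, the extra thickness grows the farther away the vertex is. A plausible reading is that this keeps the bolt from becoming too thin at a distance, though the original article does not state the intent.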
This way the mesh swells slightly along its normals. Here I replaced 0.000001 with a few other values:




Pixel shader
    ps_5_0
    dcl_globalFlags refactoringAllowed
    dcl_constantbuffer cb0[1], immediateIndexed
    dcl_constantbuffer cb2[3], immediateIndexed
    dcl_constantbuffer cb4[5], immediateIndexed
    dcl_input_ps linear v0.x
    dcl_input_ps linear v1.w
    dcl_output o0.xyzw
    dcl_temps 1
    0: mad r0.x, cb0[0].x, cb4[4].x, v0.x
    1: add r0.y, r0.x, l(-1.000000)
    2: round_ni r0.y, r0.y
    3: ishr r0.z, r0.y, l(13)
    4: xor r0.y, r0.y, r0.z
    5: imul null, r0.z, r0.y, r0.y
    6: imad r0.z, r0.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
    7: imad r0.y, r0.y, r0.z, l(146956042240.000000)
    8: and r0.y, r0.y, l(0x7fffffff)
    9: round_ni r0.z, r0.x
    10: frc r0.x, r0.x
    11: add r0.x, -r0.x, l(1.000000)
    12: ishr r0.w, r0.z, l(13)
    13: xor r0.z, r0.z, r0.w
    14: imul null, r0.w, r0.z, r0.z
    15: imad r0.w, r0.w, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
    16: imad r0.z, r0.z, r0.w, l(146956042240.000000)
    17: and r0.z, r0.z, l(0x7fffffff)
    18: itof r0.yz, r0.yyzy
    19: mul r0.z, r0.z, l(0.000000001)
    20: mad r0.y, r0.y, l(0.000000001), -r0.z
    21: mul r0.w, r0.x, r0.x
    22: mul r0.x, r0.x, r0.w
    23: mul r0.w, r0.w, l(3.000000)
    24: mad r0.x, r0.x, l(-2.000000), r0.w
    25: mad r0.x, r0.x, r0.y, r0.z
    26: add r0.y, -cb4[2].x, cb4[3].x
    27: mad_sat r0.x, r0.x, r0.y, cb4[2].x
    28: mul r0.x, r0.x, v1.w
    29: mul r0.yzw, cb4[0].xxxx, cb4[1].xxyz
    30: mul r0.xyzw, r0.xyzw, cb2[2].wxyz
    31: mul o0.xyz, r0.xxxx, r0.yzwy
    32: mov o0.w, r0.x
    33: ret
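The good news is that the shader is not long. The bad news: what on earth is this?
    3: ishr r0.z, r0.y, l(13)
    4: xor r0.y, r0.y, r0.z
    5: imul null, r0.z, r0.y, r0.y
    6: imad r0.z, r0.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
    7: imad r0.y, r0.y, r0.z, l(146956042240.000000)
    8: and r0.y, r0.y, l(0x7fffffff)
To be honest, this is not the first time I have seen it, but the first time I did I was completely baffled.
In fact it shows up in several shaders of the game. The answer is that it is integer noise:
    // For more details see: http://libnoise.sourceforge.net/noisegen/
    float integerNoise( int n )
    {
        n = (n >> 13) ^ n;
        int nn = (n * (n * n * 60493 + 19990303) + 1376312589) & 0x7fffffff;
        return ((float)nn / 1073741824.0);
    }
As you can see, it is executed twice in the pixel shader. On that website you can also find more on how to build smooth noise.
Look at line 0, where the animation value is computed:
    animation = elapsedTime * animationSpeed + TextureUV.x
After flooring, this value is used as the input to the integer noise. Overall, two noise values are computed and then interpolated to give the final result.
So this is integer noise, yet everything before it is a float and no ftoi instruction is used anywhere. My guess is that the CDPR programmers used an intrinsic such as asint, which reinterprets the bits of a float as an integer.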
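The interpolation weight between the two values is computed at lines 10-11:
    interpolationWeight = 1.0 - frac( animation );
This lets us interpolate between the two values over time. To make the result smooth, the weight is then passed through an S-curve function:
    float s_curve( float x )
    {
        float x2 = x * x;
        float x3 = x2 * x;

        // -2x^3 + 3x^2
        return -2.0*x3 + 3.0*x2;
    }

This is the smoothstep polynomial, but as you can see from the assembly, it is not the built-in HLSL smoothstep. The intrinsic clamps its input to keep the result valid, but since we already know the weight lies in [0, 1], those checks can safely be skipped.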
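Computing the final value involves a few multiplications. Note that the output alpha varies with the noise value. That is convenient, because it changes the opacity of the lightning, just as in real life.
The final pixel shader: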
    cbuffer cbPerFrame : register (b0)
    {
        float4 cb0_v0;
        float4 cb0_v1;
        float4 cb0_v2;
        float4 cb0_v3;
    }

    cbuffer cbPerFrame : register (b2)
    {
        float4 cb2_v0;
        float4 cb2_v1;
        float4 cb2_v2;
        float4 cb2_v3;
    }

    cbuffer cbPerFrame : register (b4)
    {
        float4 cb4_v0;
        float4 cb4_v1;
        float4 cb4_v2;
        float4 cb4_v3;
        float4 cb4_v4;
    }

    struct VS_OUTPUT
    {
        float2 Texcoords : Texcoord0;
        float4 InstanceLODParams : INSTANCE_LOD_PARAMS;
        float4 PositionH : SV_Position;
    };

    // Shaders in TW3 use integer noise.
    // For more details see: http://libnoise.sourceforge.net/noisegen/
    float integerNoise( int n )
    {
        n = (n >> 13) ^ n;
        int nn = (n * (n * n * 60493 + 19990303) + 1376312589) & 0x7fffffff;
        return ((float)nn / 1073741824.0);
    }

    float s_curve( float x )
    {
        float x2 = x * x;
        float x3 = x2 * x;

        // -2x^3 + 3x^2
        return -2.0*x3 + 3.0*x2;
    }

    float4 Lightning_TW3_PS( in VS_OUTPUT Input ) : SV_Target
    {
        // * Inputs
        float elapsedTime = cb0_v0.x;
        float animationSpeed = cb4_v4.x;

        float minAmount = cb4_v2.x;
        float maxAmount = cb4_v3.x;

        float colorMultiplier = cb4_v0.x;
        float3 colorFilter = cb4_v1.xyz;
        float3 lightningColorRGB = cb2_v2.rgb;

        // Animation using time and X texcoord
        float animation = elapsedTime * animationSpeed + Input.Texcoords.x;

        // Input parameters for Integer Noise.
        // They are floored, and please note that asint is used here.
        // That might be an optimization to avoid "ftoi" instructions.
        int intX0 = asint( floor(animation) );
        int intX1 = asint( floor(animation-1.0) );

        float n0 = integerNoise( intX0 );
        float n1 = integerNoise( intX1 );

        // We interpolate "backwards" here.
        float weight = 1.0 - frac(animation);

        // Following the instructions from libnoise, we perform
        // smooth interpolation here with cubic s-curve function.
        float noise = lerp( n0, n1, s_curve(weight) );

        // Make sure we are in [0.0 - 1.0] range.
        float lightningAmount = saturate( lerp(minAmount, maxAmount, noise) );
        lightningAmount *= Input.InstanceLODParams.w; // 1.0
        lightningAmount *= cb2_v2.w; // 1.0

        // Calculate final lightning color
        float3 lightningColor = colorMultiplier * colorFilter;
        lightningColor *= lightningColorRGB;

        float3 finalLightningColor = lightningColor * lightningAmount;
        return float4( finalLightningColor, lightningAmount );
    }
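12 Stupid sky tricks
This part is a little different from the previous ones: I will show only some aspects of the sky shader.
Why "stupid tricks" rather than the whole shader? There are a few reasons. First, the sky shader is huge: the 2015 version is 267 lines long and the Blood and Wine version is 385 lines.
Second, it takes many parameters that are not needed for reverse engineering but make it much harder.
So I decided to show only a few tricks. If I discover more, I will add them later.
The differences between the 2015 version and Blood and Wine are quite significant: for instance, the way stars and their twinkling are computed, and a different approach to rendering the sun. Blood and Wine also renders the Milky Way at night.
Let's start with the basics: a clear night sky.
Basics
Like many modern games, The Witcher 3 uses a sky dome. Take a look at the hemisphere used for it: its bounding box ranges from [0, 0, 0] to [1, 1, 1], and it has smoothly distributed UVs.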

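A sky dome works much like a skybox: in the vertex shader the dome is moved along with the observer, which creates the illusion that the sky is very far away.
If you have read the earlier parts of this series, you know that The Witcher 3 uses reversed depth, where the far plane is 0. To place the sky dome on the far plane, the minimum and maximum depth of the viewport are set to the same value.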
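Why this works follows from the Direct3D viewport transform, which maps normalized device depth to `MinDepth + z * (MaxDepth - MinDepth)`. When both values are equal, every fragment of the dome ends up at that one depth. The C++ sketch below assumes the shared value is 0, the reversed-Z far plane; the exact viewport values are not shown in the post, and the type and function names are made up for the example.

```cpp
struct ViewportDepthRange { float minDepth, maxDepth; };

// Direct3D viewport depth mapping for a normalized device z in [0, 1].
float viewportDepth(float ndcZ, ViewportDepthRange vp) {
    return vp.minDepth + ndcZ * (vp.maxDepth - vp.minDepth);
}
```

With `{0, 0}` the result is 0 for any input, so the dome always sits exactly on the far plane; with the usual `{0, 1}` the depth passes through unchanged.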

Vertex Shader
The vertex shader from The Witcher 3 (2015) looks like this:
    vs_5_0
    dcl_globalFlags refactoringAllowed
    dcl_constantbuffer cb1[4], immediateIndexed
    dcl_constantbuffer cb2[6], immediateIndexed
    dcl_input v0.xyz
    dcl_input v1.xy
    dcl_output o0.xy
    dcl_output o1.xyz
    dcl_output_siv o2.xyzw, position
    dcl_temps 2
    0: mov o0.xy, v1.xyxx
    1: mad r0.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx
    2: mov r0.w, l(1.000000)
    3: dp4 o1.x, r0.xyzw, cb2[0].xyzw
    4: dp4 o1.y, r0.xyzw, cb2[1].xyzw
    5: dp4 o1.z, r0.xyzw, cb2[2].xyzw
    6: mul r1.xyzw, cb1[0].yyyy, cb2[1].xyzw
    7: mad r1.xyzw, cb2[0].xyzw, cb1[0].xxxx, r1.xyzw
    8: mad r1.xyzw, cb2[2].xyzw, cb1[0].zzzz, r1.xyzw
    9: mad r1.xyzw, cb1[0].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw
    10: dp4 o2.x, r0.xyzw, r1.xyzw
    11: mul r1.xyzw, cb1[1].yyyy, cb2[1].xyzw
    12: mad r1.xyzw, cb2[0].xyzw, cb1[1].xxxx, r1.xyzw
    13: mad r1.xyzw, cb2[2].xyzw, cb1[1].zzzz, r1.xyzw
    14: mad r1.xyzw, cb1[1].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw
    15: dp4 o2.y, r0.xyzw, r1.xyzw
    16: mul r1.xyzw, cb1[2].yyyy, cb2[1].xyzw
    17: mad r1.xyzw, cb2[0].xyzw, cb1[2].xxxx, r1.xyzw
    18: mad r1.xyzw, cb2[2].xyzw, cb1[2].zzzz, r1.xyzw
    19: mad r1.xyzw, cb1[2].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw
    20: dp4 o2.z, r0.xyzw, r1.xyzw
    21: mul r1.xyzw, cb1[3].yyyy, cb2[1].xyzw
    22: mad r1.xyzw, cb2[0].xyzw, cb1[3].xxxx, r1.xyzw
    23: mad r1.xyzw, cb2[2].xyzw, cb1[3].zzzz, r1.xyzw
    24: mad r1.xyzw, cb1[3].wwww, l(0.000000, 0.000000, 0.000000, 1.000000), r1.xyzw
    25: dp4 o2.w, r0.xyzw, r1.xyzw
    26: ret
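In this case the vertex shader outputs only the texcoords and the world-space position. The Blood and Wine version also outputs normals. For simplicity we will use the 2015 version.
Let's look at the cbuffer: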

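Here is the world matrix. Nothing special there. cb2_v4 and cb2_v5 are the scale/bias factors that transform the position from [0,1] to [-1,1]. Note, however, that the factors for the Z (up) axis squash the dome along that axis.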

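We have already seen similar vertex shaders. The general algorithm is: pass the texcoords through, apply the scale/bias factors to get the position, compute PositionW, and finally compute the clip-space position using the world and view-projection matrices.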
struct InputStruct
{
    float3 param0 : POSITION;
    float2 param1 : TEXCOORD;
    float3 param2 : NORMAL;
    float4 param3 : TANGENT;
};

struct OutputStruct
{
    float2 param0 : TEXCOORD0;
    float3 param1 : TEXCOORD1;
    float4 param2 : SV_Position;
};

OutputStruct EditedShaderVS(in InputStruct IN)
{
    OutputStruct OUT = (OutputStruct)0;

    // Simple texcoords passing
    OUT.param0 = IN.param1;

    // * Manually construct world and viewProj matrices from float4s:
    row_major matrix matWorld = matrix(cb2_v0, cb2_v1, cb2_v2, float4(0,0,0,1) );
    matrix matViewProj = matrix(cb1_v0, cb1_v1, cb1_v2, cb1_v3);

    // * Some optional fun with worldMatrix
    // a) Scale
    //matWorld._11 = matWorld._22 = matWorld._33 = 0.225f;

    // b) Translate
    // X Y Z
    //matWorld._14 = 520.0997;
    //matWorld._24 = 74.4226;
    //matWorld._34 = 113.9;

    // Local space - note the scale+bias here!
    //float3 meshScale = float3(2.0, 2.0, 2.0);
    //float3 meshBias = float3(-1.0, -1.0, -0.4);
    float3 meshScale = cb2_v4.xyz;
    float3 meshBias = cb2_v5.xyz;

    float3 Position = IN.param0 * meshScale + meshBias;

    // World space
    float4 PositionW = mul(float4(Position, 1.0), transpose(matWorld) );
    OUT.param1 = PositionW.xyz;

    // Clip space - original approach from The Witcher 3
    matrix matWorldViewProj = mul(matViewProj, matWorld);
    OUT.param2 = mul( float4(Position, 1.0), transpose(matWorldViewProj) );

    return OUT;
}
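A nice thing about RenderDoc is that it lets you inject your own shader without affecting the rest of the pipeline. As you can see, I changed the scale and translation of the mesh and got some interesting results.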

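Optimizing the vertex shader
Did you spot the problem in the vertex shader? The per-vertex matrix-matrix multiplication is unnecessary. I have found it in many shaders. We can multiply PositionW by the view-projection matrix directly, replacing this: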
// Clip space - original approach from The Witcher 3
matrix matWorldViewProj = mul(matViewProj, matWorld);
OUT.param2 = mul( float4(Position, 1.0), transpose(matWorldViewProj) );
with this:
// Clip space - optimized version
OUT.param2 = mul( matViewProj, PositionW );
The optimized version compiles to:
vs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB1[4], immediateIndexed
dcl_constantbuffer CB2[6], immediateIndexed
dcl_input v0.xyz
dcl_input v1.xy
dcl_output o0.xy
dcl_output o1.xyz
dcl_output_siv o2.xyzw, position
dcl_temps 2
0: mov o0.xy, v1.xyxx
1: mad r0.xyz, v0.xyzx, cb2[4].xyzx, cb2[5].xyzx
2: mov r0.w, l(1.000000)
3: dp4 r1.x, r0.xyzw, cb2[0].xyzw
4: dp4 r1.y, r0.xyzw, cb2[1].xyzw
5: dp4 r1.z, r0.xyzw, cb2[2].xyzw
6: mov o1.xyz, r1.xyzx
7: mov r1.w, l(1.000000)
8: dp4 o2.x, cb1[0].xyzw, r1.xyzw
9: dp4 o2.y, cb1[1].xyzw, r1.xyzw
10: dp4 o2.z, cb1[2].xyzw, r1.xyzw
11: dp4 o2.w, cb1[3].xyzw, r1.xyzw
12: ret
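As you can see, the shader went from 26 instructions to 12. I don't understand why this problem is so common. You can inject my optimized shader in RenderDoc and the visuals do not change at all. Honestly, I don't know why CDPR performs a matrix-matrix multiplication here.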
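The optimization relies on matrix multiplication being associative: (ViewProj x World) x p equals ViewProj x (World x p). The Python sketch below checks this on the CPU. The matrices and the position are hypothetical values chosen only for illustration, and `mat_mul` / `mat_vec` are small helpers written for this example.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# Hypothetical matrices, chosen only for illustration.
world = [[0.5, 0.0, 0.0, 10.0],
         [0.0, 0.5, 0.0, 20.0],
         [0.0, 0.0, 0.5, 30.0],
         [0.0, 0.0, 0.0, 1.0]]
view_proj = [[1.0, 0.0, 0.0, 0.0],
             [0.0, 2.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.25],
             [0.0, 0.0, 1.0, 0.0]]
position = [0.25, -0.5, 1.0, 1.0]

# Original approach: build the combined matrix for every vertex,
# then transform the position.
clip_original = mat_vec(mat_mul(view_proj, world), position)

# Optimized approach: reuse the world-space position that is needed anyway.
position_w = mat_vec(world, position)
clip_optimized = mat_vec(view_proj, position_w)

print(clip_original)
print(clip_optimized)
```

Both paths give the same clip-space position, but the second one replaces a full matrix-matrix product with a single extra matrix-vector product.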
The sun
The Witcher 3 uses two draw calls to compute atmospheric scattering and the sun:



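Rendering the sun is very similar to rendering the moon in terms of geometry and blend/state setup.
In Blood and Wine, on the other hand, the sun and the sky are rendered in a single pass.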


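However you render the sun, at some point you need the sunlight direction. An intuitive way to get it is with spherical coordinates: you only need two angles, phi and theta. Assuming r = 1, it can be computed as: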
float3 vSunDir;
vSunDir.x = sin(fTheta)*cos(fPhi);
vSunDir.y = sin(fTheta)*sin(fPhi);
vSunDir.z = cos(fTheta);
vSunDir = normalize(vSunDir);
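The same conversion can be sketched on the CPU. The Python snippet below assumes that theta is measured from the +Z (up) axis, which is what the formula implies. The angles are hypothetical and `sun_direction` is a helper written for this example.

```python
import math

def sun_direction(theta, phi):
    """Unit vector from spherical angles; theta is measured from the +Z axis."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

# Hypothetical angles: sun 60 degrees from the zenith, azimuth 45 degrees.
d = sun_direction(math.radians(60.0), math.radians(45.0))
length = math.sqrt(sum(c * c for c in d))
print(d, length)
```

With r = 1 the result already has unit length, so the final normalize in the HLSL version only guards against rounding error.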
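Usually you would compute the sun direction in the application and pass it in through a cbuffer. With the direction in hand, we can dig into the pixel shader: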
...
100: add r1.xyw, -r0.xyxz, cb12[0].xyxz
101: dp3 r2.x, r1.xywx, r1.xywx
102: rsq r2.x, r2.x
103: mul r1.xyw, r1.xyxw, r2.xxxx
104: mov_sat r2.xy, cb12[205].yxyy
105: dp3 r2.z, -r1.xywx, -r1.xywx
106: rsq r2.z, r2.z
107: mul r1.xyw, -r1.xyxw, r2.zzzz
...
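First, cb12[0].xyz is the camera position, and r0.xyz holds the vertex position. So line 100 computes the world-to-camera vector. Lines 105-107 then compute normalize(-worldToCamera), which is the normalized camera-to-world vector.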
120: dp3_sat r1.x, cb12[203].yzwy, r1.xywx
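Next, the dot product of the camera-to-world vector and the sun direction is computed. Remember that both vectors must be normalized. The result is saturated to the [0,1] range.
Once we have the dot product: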
152: log r1.x, r1.x
153: mul r1.x, r1.x, cb12[203].x
154: exp r1.x, r1.x
155: mul r1.x, r2.y, r1.x
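The log/mul/exp sequence is an exponentiation: the dot product is raised to a power. The reason is that this produces a gradient that imitates the glow around the sun. Line 155 additionally scales the result by a saturated constant from cb12[205]. With this gradient we can interpolate between the sky color and the sun color.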


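Note that the same approach can also be used to imitate a solar eclipse, in which case the moon direction vector is needed as well.
The final code: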
float3 vCamToWorld = normalize( PosW - CameraPos );
float cosTheta = saturate( dot(vSunDir, vCamToWorld) );
float sunGradient = pow( cosTheta, sunExponent );

float3 color = lerp( skyColor, sunColor, sunGradient );
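To get a feel for how the exponent shapes the glow, here is a CPU-side Python sketch of the same formula. The positions, colors and exponent are hypothetical values for illustration, and the helper names are not taken from the game.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def sun_gradient(pos_w, camera_pos, sun_dir, sun_exponent):
    """pow(saturate(dot(sunDir, camToWorld)), exponent)."""
    cam_to_world = normalize(tuple(p - c for p, c in zip(pos_w, camera_pos)))
    cos_theta = sum(a * b for a, b in zip(sun_dir, cam_to_world))
    cos_theta = min(max(cos_theta, 0.0), 1.0)  # saturate
    return cos_theta ** sun_exponent

def lerp3(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

# Hypothetical inputs, for illustration only.
camera = (0.0, 0.0, 0.0)
sun_dir = (0.0, 0.0, 1.0)
sky_color = (0.1, 0.2, 0.4)
sun_color = (1.0, 0.9, 0.7)

for point in [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (5.0, 0.0, 0.0)]:
    g = sun_gradient(point, camera, sun_dir, sun_exponent=64.0)
    print(point, round(g, 4), lerp3(sky_color, sun_color, g))
```

Looking straight at the sun gives a gradient of 1, and the value falls off quickly as the view direction moves away. A larger exponent makes the glow tighter.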
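Moving stars
If you make a time-lapse of a clear night sky, you will see that the stars are not static: they slowly move across the sky.
Let's start with the stars themselves. They are stored in a 1024x1024x6 cubemap. If you think about it, a cubemap is very easy to sample: all you need is a direction.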
159: add r1.xyz, -v1.xyzx, cb1[8].xyzx
160: dp3 r0.w, r1.xyzx, r1.xyzx
161: rsq r0.w, r0.w
162: mul r1.xyz, r0.wwww, r1.xyzx
163: mul r2.xyz, cb12[204].zwyz, l(0.000000, 0.000000, 1.000000, 0.000000)
164: mad r2.xyz, cb12[204].yzwy, l(0.000000, 1.000000, 0.000000, 0.000000), -r2.xyzx
165: mul r4.xyz, r2.xyzx, cb12[204].zwyz
166: mad r4.xyz, r2.zxyz, cb12[204].wyzw, -r4.xyzx
167: dp3 r4.x, r1.xyzx, r4.xyzx
168: dp2 r4.y, r1.xyxx, r2.yzyy
169: dp3 r4.z, r1.xyzx, cb12[204].yzwy
170: dp3 r0.w, r4.xyzx, r4.xyzx
171: rsq r0.w, r0.w
172: mul r2.xyz, r0.wwww, r4.xyzx
173: sample_indexable(texturecube)(float,float,float,float) r4.xyz, r2.xyzx, t0.xyzw, s0
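To compute the sampling direction, we start with the normalized world-to-camera vector (lines 159-162). Then two cross products involving the moon direction are computed (lines 163-166), and finally three dot products give the sampling vector (lines 167-169), which is normalized before the cubemap is sampled (lines 170-173).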
float3 vWorldToCamera = normalize( g_CameraPos.xyz - Input.PositionW.xyz );
float3 vMoonDirection = cb12_v204.yzw;

float3 vStarsSamplingDir = cross( vMoonDirection, float3(0, 0, 1) );
float3 vStarsSamplingDir2 = cross( vStarsSamplingDir, vMoonDirection );

float dirX = dot( vWorldToCamera, vStarsSamplingDir2 );
float dirY = dot( vWorldToCamera, vStarsSamplingDir );
float dirZ = dot( vWorldToCamera, vMoonDirection);
float3 dirXYZ = normalize( float3(dirX, dirY, dirZ) );

float3 starsColor = texNightStars.Sample( samplerAnisoWrap, dirXYZ ).rgb;
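I need to investigate this more rigorously. Dear readers, if you know more about it, please let me know. (Translator's note: this is a change of basis. A coordinate frame is first built with the moon direction as its zenith axis, and the view direction is then projected onto that frame to sample the stars. In other words, the star map is defined relative to the moon's frame.)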
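The change-of-basis reading can be tried out on the CPU. The Python sketch below ports the HLSL above and is only one plausible interpretation of the technique. The moon and view directions are hypothetical, and the helper names were written for this example.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def stars_sampling_dir(world_to_camera, moon_dir):
    """Express the view vector in a frame whose third axis is the moon direction."""
    axis_y = cross(moon_dir, (0.0, 0.0, 1.0))
    axis_x = cross(axis_y, moon_dir)
    return normalize((dot(world_to_camera, axis_x),
                      dot(world_to_camera, axis_y),
                      dot(world_to_camera, moon_dir)))

# Hypothetical moon directions at two different times of night.
moon_early = normalize((1.0, 0.0, 1.0))
moon_late = normalize((0.0, 1.0, 1.0))
view = normalize((0.3, 0.2, 0.9))

print(stars_sampling_dir(view, moon_early))
print(stars_sampling_dir(view, moon_late))
# A vector pointing along the moon direction always maps to the frame's +Z axis.
print(stars_sampling_dir(moon_early, moon_early))
```

The same view vector produces a different cubemap direction as the moon moves, which would explain why the stars drift across the sky together with the moon.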
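Twinkling stars
Another trick is the twinkling of the stars. If you wander around the outskirts of Novigrad on a clear night, you will notice that the stars twinkle. I was curious how this is implemented. The 2015 version and Blood and Wine differ a lot here, so for simplicity we will look at the 2015 version.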
174: mul r0.w, v0.x, l(100.000000)
175: round_ni r1.w, r0.w
176: mad r2.w, v0.y, l(50.000000), cb0[0].x
177: round_ni r4.w, r2.w
178: bfrev r4.w, r4.w
179: iadd r5.x, r1.w, r4.w
180: ishr r5.y, r5.x, l(13)
181: xor r5.x, r5.x, r5.y
182: imul null, r5.y, r5.x, r5.x
183: imad r5.y, r5.y, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
184: imad r5.x, r5.x, r5.y, l(146956042240.000000)
185: and r5.x, r5.x, l(0x7fffffff)
186: itof r5.x, r5.x
187: mad r5.y, v0.x, l(100.000000), l(-1.000000)
188: round_ni r5.y, r5.y
189: iadd r4.w, r4.w, r5.y
190: ishr r5.z, r4.w, l(13)
191: xor r4.w, r4.w, r5.z
192: imul null, r5.z, r4.w, r4.w
193: imad r5.z, r5.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
194: imad r4.w, r4.w, r5.z, l(146956042240.000000)
195: and r4.w, r4.w, l(0x7fffffff)
196: itof r4.w, r4.w
197: add r5.z, r2.w, l(-1.000000)
198: round_ni r5.z, r5.z
199: bfrev r5.z, r5.z
200: iadd r1.w, r1.w, r5.z
201: ishr r5.w, r1.w, l(13)
202: xor r1.w, r1.w, r5.w
203: imul null, r5.w, r1.w, r1.w
204: imad r5.w, r5.w, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
205: imad r1.w, r1.w, r5.w, l(146956042240.000000)
206: and r1.w, r1.w, l(0x7fffffff)
207: itof r1.w, r1.w
208: mul r1.w, r1.w, l(0.000000001)
209: iadd r5.y, r5.z, r5.y
210: ishr r5.z, r5.y, l(13)
211: xor r5.y, r5.y, r5.z
212: imul null, r5.z, r5.y, r5.y
213: imad r5.z, r5.z, l(0x0000ec4d), l(0.0000000000000000000000000000000000001)
214: imad r5.y, r5.y, r5.z, l(146956042240.000000)
215: and r5.y, r5.y, l(0x7fffffff)
216: itof r5.y, r5.y
217: frc r0.w, r0.w
218: add r0.w, -r0.w, l(1.000000)
219: mul r5.z, r0.w, r0.w
220: mul r0.w, r0.w, r5.z
221: mul r5.xz, r5.xxzx, l(0.000000001, 0.000000, 3.000000, 0.000000)
222: mad r0.w, r0.w, l(-2.000000), r5.z
223: frc r2.w, r2.w
224: add r2.w, -r2.w, l(1.000000)
225: mul r5.z, r2.w, r2.w
226: mul r2.w, r2.w, r5.z
227: mul r5.z, r5.z, l(3.000000)
228: mad r2.w, r2.w, l(-2.000000), r5.z
229: mad r4.w, r4.w, l(0.000000001), -r5.x
230: mad r4.w, r0.w, r4.w, r5.x
231: mad r5.x, r5.y, l(0.000000001), -r1.w
232: mad r0.w, r0.w, r5.x, r1.w
233: add r0.w, -r4.w, r0.w
234: mad r0.w, r2.w, r0.w, r4.w
235: mad r2.xyz, r0.wwww, l(0.000500, 0.000500, 0.000500, 0.000000), r2.xyzx
236: sample_indexable(texturecube)(float,float,float,float) r2.xyz, r2.xyzx, t0.xyzw, s0
237: log r4.xyz, r4.xyzx
238: mul r4.xyz, r4.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
239: exp r4.xyz, r4.xyzx
240: log r2.xyz, r2.xyzx
241: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
242: exp r2.xyz, r2.xyzx
243: mul r2.xyz, r2.xyzx, r4.xyzx
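Let's walk through this assembly.
After starsColor is sampled at line 173, an offset value is computed. This offset perturbs the sampling direction, the cubemap is sampled a second time, both samples are gamma-corrected, and the results are multiplied.
Is it that simple? Not quite. The offset has to vary across the sky dome, otherwise all the stars would twinkle in the same way.
To make the offset as varied as possible, it is derived from the sky dome's UVs and the elapsed time. If the scary-looking ishr/xor/and sequence is unfamiliar, see the lightning part, which covers integer noise in more detail. As you can see, the noise is evaluated four times here. Unlike in the lightning shader, the integer inputs are summed and bit-reversed (bfrev) to make the result look more random. The odd-looking literals l(0.0000000000000000000000000000000000001) and l(146956042240.000000) appear to be the integer constants 19990303 and 1376312589 of the noise function, printed by the disassembler as floats.
OK, let's begin: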
int getInt( float x )
{
    return asint( floor(x) );
}

int getReverseInt( float x )
{
    return reversebits( getInt(x) );
}

// * Inputs - UV and elapsed time in seconds
float2 starsUV;
starsUV.x = 100.0 * Input.TextureUV.x;
starsUV.y = 50.0 * Input.TextureUV.y + g_fTime;

// * Iteration 1
int iStars1_A = getReverseInt( starsUV.y );
int iStars1_B = getInt( starsUV.x );

float fStarsNoise1 = integerNoise( iStars1_A + iStars1_B );

// * Iteration 2
int iStars2_A = getReverseInt( starsUV.y );
int iStars2_B = getInt( starsUV.x - 1.0 );

float fStarsNoise2 = integerNoise( iStars2_A + iStars2_B );

// * Iteration 3
int iStars3_A = getReverseInt( starsUV.y - 1.0 );
int iStars3_B = getInt( starsUV.x );

float fStarsNoise3 = integerNoise( iStars3_A + iStars3_B );

// * Iteration 4
int iStars4_A = getReverseInt( starsUV.y - 1.0 );
int iStars4_B = getInt( starsUV.x - 1.0 );

float fStarsNoise4 = integerNoise( iStars4_A + iStars4_B );
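The results of the four iterations end up in these registers: iteration 1 in r5.x, iteration 2 in r4.w, iteration 3 in r1.w, iteration 4 in r5.y.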
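For experimenting outside a shader, the four noise lookups can be emulated on the CPU. The Python sketch below assumes 32-bit wrap-around integer arithmetic and treats asint as a bit reinterpretation of a 32-bit float, as in HLSL. The UV and time values are hypothetical, and the function names are ports written for this example rather than engine code.

```python
import math
import struct

def to_int32(n):
    """Wrap a Python integer to signed 32-bit, like shader integer registers."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n & 0x80000000 else n

def asint(x):
    """Reinterpret the bits of a 32-bit float as a signed integer (HLSL asint)."""
    return struct.unpack("<i", struct.pack("<f", x))[0]

def reversebits(n):
    """Reverse the order of the 32 bits (HLSL reversebits, bfrev in assembly)."""
    return to_int32(int(format(n & 0xFFFFFFFF, "032b")[::-1], 2))

def integer_noise(n):
    """Port of integerNoise(); returns a value in [0, 2)."""
    n = to_int32(n)
    n = (n >> 13) ^ n  # Python's >> on negative ints is arithmetic, like ishr
    nn = (n * (n * n * 60493 + 19990303) + 1376312589) & 0x7FFFFFFF
    return nn / 1073741824.0

def get_int(x):
    return asint(float(math.floor(x)))

def get_reverse_int(x):
    return reversebits(get_int(x))

def star_noise_corners(u, v, time):
    """Four noise values around the scaled UV cell, in iteration order 1..4."""
    sx = 100.0 * u
    sy = 50.0 * v + time
    return [integer_noise(get_reverse_int(sy - dy) + get_int(sx - dx))
            for dy in (0.0, 1.0) for dx in (0.0, 1.0)]

print(star_noise_corners(0.37, 0.52, 12.5))
```

Because the time is added to the V coordinate before flooring, the set of noise values changes every second, which is what makes the offset vary over time.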
Next we have:
217: frc r0.w, r0.w
218: add r0.w, -r0.w, l(1.000000)
219: mul r5.z, r0.w, r0.w
220: mul r0.w, r0.w, r5.z
221: mul r5.xz, r5.xxzx, l(0.000000001, 0.000000, 3.000000, 0.000000)
222: mad r0.w, r0.w, l(-2.000000), r5.z
223: frc r2.w, r2.w
224: add r2.w, -r2.w, l(1.000000)
225: mul r5.z, r2.w, r2.w
226: mul r2.w, r2.w, r5.z
227: mul r5.z, r5.z, l(3.000000)
228: mad r2.w, r2.w, l(-2.000000), r5.z
This part computes the S-curve weights from the fractional parts of the scaled UVs:
float s_curve( float x )
{
    float x2 = x * x;
    float x3 = x2 * x;

    // -2x^3 + 3x^2
    return -2.0*x3 + 3.0*x2;
}

...

// lines 217-222
float weightX = 1.0 - frac( starsUV.x );
weightX = s_curve( weightX );

// lines 223-228
float weightY = 1.0 - frac( starsUV.y );
weightY = s_curve( weightY );
As you might expect, these weights are used to interpolate the noise values and produce the final offset:
229: mad r4.w, r4.w, l(0.000000001), -r5.x
230: mad r4.w, r0.w, r4.w, r5.x
float noise0 = lerp( fStarsNoise1, fStarsNoise2, weightX );

231: mad r5.x, r5.y, l(0.000000001), -r1.w
232: mad r0.w, r0.w, r5.x, r1.w
float noise1 = lerp( fStarsNoise3, fStarsNoise4, weightX );

233: add r0.w, -r4.w, r0.w
234: mad r0.w, r2.w, r0.w, r4.w
float offset = lerp( noise0, noise1, weightY );

235: mad r2.xyz, r0.wwww, l(0.000500, 0.000500, 0.000500, 0.000000), r2.xyzx
236: sample_indexable(texturecube)(float,float,float,float) r2.xyz, r2.xyzx, t0.xyzw, s0
float3 starsPerturbedDir = dirXYZ + offset * 0.0005;
float3 starsColorDisturbed = texNightStars.Sample( samplerAnisoWrap, starsPerturbedDir ).rgb;
(A short visualization of the offset computation.)
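The blend itself is a bilinear interpolation with S-curve weights. The Python sketch below isolates that step. The four corner noises are hypothetical values in the [0, 2) range produced by integerNoise(), and `twinkle_offset` is a helper written for this example.

```python
import math

def s_curve(x):
    """Cubic S-curve: -2x^3 + 3x^2."""
    return -2.0 * x ** 3 + 3.0 * x ** 2

def lerp(a, b, t):
    return a + (b - a) * t

def twinkle_offset(noise, sx, sy):
    """Blend four corner noises (iteration order 1..4) with S-curve weights."""
    n1, n2, n3, n4 = noise
    weight_x = s_curve(1.0 - (sx - math.floor(sx)))
    weight_y = s_curve(1.0 - (sy - math.floor(sy)))
    noise0 = lerp(n1, n2, weight_x)
    noise1 = lerp(n3, n4, weight_x)
    return lerp(noise0, noise1, weight_y)

# Hypothetical corner noises in [0, 2).
corners = (0.2, 1.4, 0.9, 1.8)

# At an exact cell corner frac() is 0, so both weights are 1 and the
# result is (up to rounding) the fourth noise value.
print(twinkle_offset(corners, 3.0, 7.0))
# Just before the next cell frac() is close to 1, the weights approach 0,
# and the result approaches the first noise value.
print(twinkle_offset(corners, 3.999999, 7.999999))
```

The S-curve has zero slope at both ends of the interval, so the offset changes smoothly from one noise cell to the next instead of jumping.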
Once we have starsColorDisturbed, the hardest part is over.
The next step is to gamma-correct both samples and multiply them:
starsColor = pow( starsColor, 2.2 );
starsColorDisturbed = pow( starsColorDisturbed, 2.2 );

float3 starsFinal = starsColor * starsColorDisturbed;
Stars, the final step
We now have starsFinal in r1.xyz. The last bit of processing is:
256: log r1.xyz, r1.xyzx
257: mul r1.xyz, r1.xyzx, l(2.500000, 2.500000, 2.500000, 0.000000)
258: exp r1.xyz, r1.xyzx
259: min r1.xyz, r1.xyzx, l(1.000000, 1.000000, 1.000000, 0.000000)
260: add r0.w, -cb0[9].w, l(1.000000)
261: mul r1.xyz, r0.wwww, r1.xyzx
262: mul r1.xyz, r1.xyzx, l(10.000000, 10.000000, 10.000000, 0.000000)
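This is much simpler than the previous part. The color is raised to the power of 2.5, which controls the density of the stars, and is then clamped so that it does not exceed (1,1,1).
cb0[9].w controls visibility: it is 1 during the day and 0 at night, so the factor (1 - cb0[9].w) at lines 260-261 hides the stars in daylight. Line 262 then scales the result by 10.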
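Assembly lines 256-262 translate into a few operations per channel. The Python sketch below mirrors them. `day_factor` is a hypothetical name for cb0[9].w, and the input color is made up for illustration.

```python
def finalize_stars(stars_final, day_factor):
    """Mirror lines 256-262: pow 2.5, clamp to 1, fade out by day, scale by 10."""
    visibility = 1.0 - day_factor
    return tuple(min(c ** 2.5, 1.0) * visibility * 10.0 for c in stars_final)

# Hypothetical star color after the gamma/multiply step.
stars = (0.8, 0.8, 0.6)
print(finalize_stars(stars, day_factor=0.0))  # night: stars fully visible
print(finalize_stars(stars, day_factor=1.0))  # day: stars hidden
```

The power pushes dim texels toward zero, so only the brightest ones survive as visible stars, which is how the exponent acts as a density control.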
References
Reverse engineering the rendering of The Witcher 3: Index, parts 9-12