RSX shaders are usually used in bytecode form. Because the PS3 hardware is fixed, there is no need for compilation and linking steps at runtime. There are only two shader types: fragment and vertex shaders. The computational cores are not unified, and the two shader types use different assembly languages.

The fragment and vertex shaders are sufficiently different to warrant almost completely separate handling in the emulator.

The diagram below shows the process of transforming bytecode to a complete OpenGL program.

[Diagram: the stages of transforming shader bytecode into a complete OpenGL program]

Cg Shaders

PS3 shaders are usually written in a high-level language called Cg. It was developed by Nvidia, and its compiler has many different backends, including the one that targets the RSX GPU in PS3 consoles.

The language is similar to GLSL with mostly minor differences in types and syntax.

To illustrate the steps involved in processing a shader, the following shaders will be used.

The vertex shader:

void main
(
    float3 position : POSITION,
    float3 normal   : NORMAL,
    float4 color    : COLOR,

    uniform float4x4 ViewProjMatrix,
    uniform float4x4 ModelMatrix,
    uniform float4   clipPlane,

    out float4 ePosition : POSITION,
    out float3 oNormal   : TEXCOORD0,
    out float4 oColor    : TEXCOORD1,
    out float3 oPosition : TEXCOORD2,
    out float oClip0     : CLP0
)
{
    float4 vertexEye;
    ViewProjMatrix = transpose(ViewProjMatrix);
    ModelMatrix = transpose(ModelMatrix);
    vertexEye = mul(ModelMatrix, float4(position,1.f));
    ePosition = mul(mul(ViewProjMatrix, ModelMatrix), float4(position,1.f));
    oNormal = normal;
    oColor = color;
    oPosition = position;
    oClip0 = dot(vertexEye, clipPlane);
}

And the fragment shader:

void main
(
    float3 normal   : TEXCOORD0,
    float4 color    : TEXCOORD1,
    float3 position : TEXCOORD2,

    uniform float3    lightPosLocal,
    uniform float3    eyePosLocal,

    out float4 oColor : COLOR
)
{
    float shininess =  7.8954f;   //arbitrary shininess value
    float3 ambient  = float3( 0.7f, 0.7f, 0.7f );

    // basic diffuse lighting calculation
    float3 lightDirection = normalize(lightPosLocal - position);
    float  diffuseLight   = max( dot(normal, lightDirection),0.f );

    // Compute the specular lighting term
    float3 eyeDirection = normalize(eyePosLocal - position);
    float3 halfAngle = normalize(lightDirection + eyeDirection);
    float specularLight = pow( max(dot(halfAngle, normal), 0.f), shininess);

    // Compute the final fragment color
    float3 out_color = (color.rgb * (ambient + diffuseLight)) + specularLight;
    oColor = float4( out_color, 1.f );
}

Notice that the inputs and outputs of the shaders are not matched by variable name. They are matched by special semantic names such as TEXCOORD0 and COLOR. Some of the inputs and outputs are reserved and supplied by the pipeline, while others are available to the programmer and can be used to pass data from the vertex shader to the fragment shader.
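The matching rule can be sketched in a few lines of Python. The two tables below are copied by hand from the Cg listings above; the function name `match_by_semantic` is made up for this illustration and is not part of any real tool. The sketch simply skips fragment inputs that have no vertex counterpart, which is a simplification of how pipeline-supplied inputs behave.

```python
# Semantic -> variable name, copied from the two Cg listings above.
VERTEX_OUTPUTS = {
    "POSITION": "ePosition",
    "TEXCOORD0": "oNormal",
    "TEXCOORD1": "oColor",
    "TEXCOORD2": "oPosition",
    "CLP0": "oClip0",
}
FRAGMENT_INPUTS = {
    "TEXCOORD0": "normal",
    "TEXCOORD1": "color",
    "TEXCOORD2": "position",
}


def match_by_semantic(vertex_outputs, fragment_inputs):
    """Pair each fragment input with the vertex output sharing its semantic."""
    links = {}
    for semantic, fragment_name in fragment_inputs.items():
        # Inputs without a vertex counterpart are ignored in this sketch.
        if semantic in vertex_outputs:
            links[fragment_name] = vertex_outputs[semantic]
    return links


links = match_by_semantic(VERTEX_OUTPUTS, FRAGMENT_INPUTS)
for fragment_name, vertex_name in links.items():
    print(f"{vertex_name} -> {fragment_name}")
```

Note that `oColor` in the vertex shader feeds `color` in the fragment shader only because both are tagged TEXCOORD1; the variable names play no role.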

Bytecode

Cg shaders are compiled into bytecode, which is uploaded to the RSX separately for each shader. The vertex outputs are assumed to match the fragment inputs.

Both types of shaders can be parameterized with uniforms, also known as constants. The two shader types differ in an important way in how constant uploading is implemented.

The vertex shader occupies two memory regions: one is reserved for code and the other for constants. Uploading constants is therefore just a matter of rewriting the memory region that contains them. No code modification is necessary.
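A minimal Python sketch of this separation follows. The class name, the slot count of 468 (just enough to cover c[467] from the listing further down), the big-endian float layout, and the sample values are all assumptions made for illustration; they are not taken from the RSX documentation.

```python
import struct

CONSTANT_SLOTS = 468   # assumption: enough float4 slots to reach c[467]
SLOT_SIZE = 16         # one float4 = 4 x 32-bit floats


class VertexConstantBank:
    """Constants live in their own buffer, separate from the code."""

    def __init__(self):
        self.data = bytearray(CONSTANT_SLOTS * SLOT_SIZE)

    def upload(self, first_slot, vectors):
        """Write consecutive float4 constants starting at first_slot."""
        for i, vec in enumerate(vectors):
            # Byte order is an assumption of this sketch.
            struct.pack_into(">4f", self.data, (first_slot + i) * SLOT_SIZE, *vec)

    def read(self, slot):
        return struct.unpack_from(">4f", self.data, slot * SLOT_SIZE)


code = bytes(688)            # stand-in for the 688-byte vertex program
code_before = bytes(code)

bank = VertexConstantBank()
identity = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
bank.upload(256, identity)                 # a 4x4 matrix in c[256..259]
bank.upload(467, [(0.0, 1.0, 0.0, 0.5)])   # a plane equation in c[467]
```

Updating a uniform touches only `bank.data`; the `code` buffer is never written.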

The fragment shader doesn’t have a separate area for constants. The constants are encoded alongside the instructions as immediate operands. This means that the programmer needs to overwrite the program itself to update any uniforms. This doesn’t mean the RSX will need to flush its instruction buffers, but it forces the programmer to keep track of the positions of the uniforms inside the code. This information is preserved by the Cg compiler.

The size of the bytecode is 688 bytes for the vertex shader (without constants) and 416 bytes for the fragment shader (including uniforms).
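The patching step can be sketched as follows. This is only an illustration: the function name, the offset table, and the raw big-endian float layout are assumptions, and the real hardware encoding of immediates may differ. The offsets 32 and 80 are a guess that each numbered slot in a fragment disassembly is 16 bytes wide, which at least agrees with 416 bytes being 26 slots.

```python
import struct


def patch_fragment_constants(program, offsets, values):
    """Return a copy of the program with float4 immediates overwritten.

    offsets maps a uniform name to every byte offset where it is embedded;
    values maps a uniform name to the float4 to write there.
    """
    patched = bytearray(program)
    for name, vec in values.items():
        for offset in offsets[name]:
            # Layout of the immediate is an assumption of this sketch.
            struct.pack_into(">4f", patched, offset, *vec)
    return bytes(patched)


program = bytes(416)  # stand-in for the 416-byte fragment program

# Hypothetical offset table, as the compiler's metadata might describe it.
offsets = {"lightPosLocal": [32], "eyePosLocal": [80]}

patched = patch_fragment_constants(
    program,
    offsets,
    {
        "lightPosLocal": (1.0, 2.0, 3.0, 0.0),
        "eyePosLocal": (0.0, 0.0, 5.0, 0.0),
    },
)
```

A uniform that is used by several instructions would have several entries in its offset list, and each one has to be rewritten on every update.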

Assembly

Below is a disassembly listing of the vertex shader:

000: MOV o[8], v[3];
001: MOV o[7].xyz, v[2].xyzx;
002: MOV o[9].xyz, v[0].xyzx;
003: MOV R1.x, c[260].x;
004: MOV R1.y, c[261].x;
005: MOV R1.z, c[262].x;
006: MOV R1.w, c[263].x;
007: MOV R2.x, c[260].z;
008: MOV R2.y, c[261].z;
009: MOV R2.z, c[262].z;
010: MOV R2.w, c[263].z;
011: MOV R3.x, c[260].w;
012: MOV R3.y, c[261].w;
013: MOV R3.z, c[262].w;
014: MOV R3.w, c[263].w;
015: MOV R4.x, c[260].y;
016: MOV R4.y, c[261].y;
017: MOV R4.z, c[262].y;
018: MOV R4.w, c[263].y;
019: DPH R0.y, v[0].xyzx, R4;
020: MUL R6, R4, c[257].z;
021: MUL R7, R4, c[257].w;
022: MUL R5, R4, c[257].y;
023: MUL R4, R4, c[257].x;
024: DPH R0.w, v[0].xyzx, R3;
025: DPH R0.z, v[0].xyzx, R2;
026: DPH R0.x, v[0].xyzx, R1;
027: MAD R4, R1, c[256].x, R4;
028: MAD R5, R1, c[256].y, R5;
029: MAD R7, R1, c[256].w, R7;
030: MAD R1, R1, c[256].z, R6;
031: MAD R1, R2, c[258].z, R1;
032: MAD R6, R2, c[258].w, R7;
033: MAD R5, R2, c[258].y, R5;
034: MAD R2, R2, c[258].x, R4;
035: DP4 o[5].y, R0, c[467];
036: MAD R0, R3, c[259].x, R2;
037: MAD R2, R3, c[259].y, R5;
038: MAD R4, R3, c[259].w, R6;
039: MAD R1, R3, c[259].z, R1;
040: DPH o[0].w, v[0].xyzx, R4;
041: DPH o[0].z, v[0].xyzx, R1;
042: DPH o[0].y, v[0].xyzx, R2;
043: DPH o[0].x, v[0].xyzx, R0;

In addition to the R registers, several arrays are used. v are inputs, o are outputs, and c are constants. No vertex textures are used in this shader, so no samplers are referenced.
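Two opcodes carry most of the work in this listing: DPH and MAD. The Python sketch below shows what they compute, following the way they are translated in the GLSL listing later in this article (DPH becomes a dot product with the first operand's w replaced by 1). The function names are chosen for this illustration only.

```python
def dph(a, b):
    """Homogeneous dot product: a is treated as (a.x, a.y, a.z, 1)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + b[3]


def mad(a, b, c):
    """Component-wise multiply-add: a * b + c."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))


position = (1.0, 2.0, 3.0, 1.0)   # the w component is ignored by dph
row = (4.0, 5.0, 6.0, 7.0)

dot_result = dph(position, row)
mad_result = mad((1.0, 2.0, 3.0, 4.0), (2.0, 2.0, 2.0, 2.0), (1.0, 1.0, 1.0, 1.0))
```

This is why the shader can feed the three-component `position` input straight into DPH: the instruction supplies the implicit 1 that the Cg source writes as `float4(position,1.f)`.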

The fragment shader looks like this:

000: MOVR R1.xyz, f[TEX2];
001: ADDR H0.xyz, -R1, {0x00000000(0), 0x00000000(0), 0x00000000(0), 0x00000000(0)};
003: NRMH H0.xyz, H0_n;
004: ADDR H1.xyz, -R1, {0x00000000(0), 0x00000000(0), 0x00000000(0), 0x00000000(0)};
006: NRMH H1.xyz, H1_n;
007: ADDR H1.xyz, H0, H1;
008: MOVR R2.xyz, f[TEX0];
009: DP3H H0.x, R2, H0;
010: NRMH H1.xyz, H1_n;
011: DP3R R0.z, R2, H1;
012: MAXR R0.w, R0.z, {0x00000000(0), 0x00000000(0), 0x00000000(0), 0x00000000(0)}.x;
014: MAXH H0.x, H0, {0x00000000(0), 0x00000000(0), 0x00000000(0), 0x00000000(0)}.x;
016: LG2R R0.y, R0.w;
017: ADDH H0.x, H0, {0x00000000(0), 0x00000000(0), 0x00000000(0), 0x00000000(0)}.x;
019: MULR R0.w, R0.y, {0x00000000(0), 0x00000000(0), 0x00000000(0), 0x00000000(0)}.x;
021: EX2R H0.y, R0.w;
022: MOVH H1.xyz, f[TEX1];
023: MADH R0.xyz, H1, H0.x, H0.y;
024: MOVH R0.w, {0x00000000(0), 0x00000000(0), 0x00000000(0), 0x00000000(0)}.x; # last instruction

Notice that the instruction set is different. There are two types of registers, R and H, which differ in precision. They share the underlying hardware and might conflict with each other when used without care.

The constants are embedded in the code (they appear as zeros in the listing) and will change at runtime.

The TEX* references have nothing to do with textures. They are just parameter names that correspond to the outputs of the vertex shader. No textures are used in this shader.

AST

Once the disassembly is available, it can be converted into an AST. Its format is general enough to handle both types of shaders. Most vertex and fragment instructions map onto GLSL functions or combinations of them. Any flow control instructions are also converted into GLSL control flow statements at this point.
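As a rough illustration of what such a tree node might hold, here is a small Python sketch that builds one from a line of the vertex disassembly. The `Operand` and `Instruction` types and the text-based parser are hypothetical; the emulator itself works from bytecode, and its actual AST layout is not shown in this article.

```python
import re
from dataclasses import dataclass


@dataclass
class Operand:
    bank: str      # "R" (temporary), "v" (input), "o" (output) or "c" (constant)
    index: int
    swizzle: str   # "" when no swizzle or write mask is given


@dataclass
class Instruction:
    opcode: str
    dest: Operand
    sources: list


_OPERAND = re.compile(r"(R|v|o|c)\[?(\d+)\]?(?:\.([xyzw]+))?$")


def parse_operand(text):
    match = _OPERAND.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported operand: {text!r}")
    return Operand(match.group(1), int(match.group(2)), match.group(3) or "")


def parse_instruction(line):
    """Parse a line such as '027: MAD R4, R1, c[256].x, R4;'."""
    body = line.split(":", 1)[1].strip().rstrip(";")
    opcode, rest = body.split(None, 1)
    operands = [parse_operand(part) for part in rest.split(",")]
    return Instruction(opcode, operands[0], operands[1:])


inst = parse_instruction("027: MAD R4, R1, c[256].x, R4;")
```

Only the operand forms that appear in the vertex listing are handled; fragment operands such as `f[TEX2]` or embedded constants would need additional node types.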

Partial GLSL

The AST closely corresponds to the resulting GLSL code and the transformation is purely mechanical.
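The mechanical nature of this step can be seen in a short Python sketch that turns a few instructions into GLSL text. The names `emit`, `emit_source`, `BANK_NAMES` and `EXPRESSIONS` are invented for this example, and only three opcodes are covered. The rules it applies (a single-component source swizzle is broadcast to four components, and the destination write mask is applied to both sides of the assignment) were inferred from the listings in this article.

```python
# Register bank in the disassembly -> array name in the generated GLSL.
BANK_NAMES = {"R": "r", "v": "v_in", "o": "v_out", "c": "constants.c"}

# Opcode -> GLSL expression template over its source operands.
EXPRESSIONS = {
    "MOV": "{0}",
    "MUL": "({0} * {1})",
    "MAD": "(({0} * {1}) + {2})",
}


def emit_source(bank, index, swizzle):
    ref = f"{BANK_NAMES[bank]}[{index}]"
    if not swizzle:
        return ref
    if len(swizzle) == 1:
        swizzle = swizzle * 4   # a scalar swizzle such as .x means .xxxx
    return f"({ref}.{swizzle})"


def emit(opcode, dest, sources):
    """Emit one GLSL statement; operands are (bank, index, swizzle) tuples."""
    expr = EXPRESSIONS[opcode].format(*(emit_source(*src) for src in sources))
    bank, index, mask = dest
    target = f"{BANK_NAMES[bank]}[{index}]"
    if mask:
        return f"{target}.{mask} = {expr}.{mask};"
    return f"{target} = {expr};"


mov_line = emit("MOV", ("R", 1, "y"), [("c", 261, "x")])
mad_line = emit("MAD", ("R", 4, ""), [("R", 1, ""), ("c", 256, "x"), ("R", 4, "")])
print(mov_line)
print(mad_line)
```

Applying the write mask to the right-hand side as well is what produces the slightly redundant `.xxxx).y` pattern: the value is first broadcast, then the component matching the destination is picked.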

The vertex shader:

v_out[8] = v_in[3];
v_out[7].xyz = (v_in[2].xyzx).xyz;
v_out[9].xyz = (v_in[0].xyzx).xyz;
r[1].x = (constants.c[260].xxxx).x;
r[1].y = (constants.c[261].xxxx).y;
r[1].z = (constants.c[262].xxxx).z;
r[1].w = (constants.c[263].xxxx).w;
r[2].x = (constants.c[260].zzzz).x;
r[2].y = (constants.c[261].zzzz).y;
r[2].z = (constants.c[262].zzzz).z;
r[2].w = (constants.c[263].zzzz).w;
r[3].x = (constants.c[260].wwww).x;
r[3].y = (constants.c[261].wwww).y;
r[3].z = (constants.c[262].wwww).z;
r[3].w = (constants.c[263].wwww).w;
r[4].x = (constants.c[260].yyyy).x;
r[4].y = (constants.c[261].yyyy).y;
r[4].z = (constants.c[262].yyyy).z;
r[4].w = (constants.c[263].yyyy).w;
r[0].y = dot(vec4((v_in[0].xyzx).xyz, 1), r[4]);
r[6] = (r[4] * (constants.c[257].zzzz));
r[7] = (r[4] * (constants.c[257].wwww));
r[5] = (r[4] * (constants.c[257].yyyy));
r[4] = (r[4] * (constants.c[257].xxxx));
r[0].w = dot(vec4((v_in[0].xyzx).xyz, 1), r[3]);
r[0].z = dot(vec4((v_in[0].xyzx).xyz, 1), r[2]);
r[0].x = dot(vec4((v_in[0].xyzx).xyz, 1), r[1]);
r[4] = ((r[1] * (constants.c[256].xxxx)) + r[4]);
r[5] = ((r[1] * (constants.c[256].yyyy)) + r[5]);
r[7] = ((r[1] * (constants.c[256].wwww)) + r[7]);
r[1] = ((r[1] * (constants.c[256].zzzz)) + r[6]);
r[1] = ((r[2] * (constants.c[258].zzzz)) + r[1]);
r[6] = ((r[2] * (constants.c[258].wwww)) + r[7]);
r[5] = ((r[2] * (constants.c[258].yyyy)) + r[5]);
r[2] = ((r[2] * (constants.c[258].xxxx)) + r[4]);
v_out[5].y = dot(r[0], constants.c[467]);
r[0] = ((r[3] * (constants.c[259].xxxx)) + r[2]);
r[2] = ((r[3] * (constants.c[259].yyyy)) + r[5]);
r[4] = ((r[3] * (constants.c[259].wwww)) + r[6]);
r[1] = ((r[3] * (constants.c[259].zzzz)) + r[1]);
v_out[0].w = dot(vec4((v_in[0].xyzx).xyz, 1), r[4]);
v_out[0].z = dot(vec4((v_in[0].xyzx).xyz, 1), r[1]);
v_out[0].y = dot(vec4((v_in[0].xyzx).xyz, 1), r[2]);
v_out[0].x = dot(vec4((v_in[0].xyzx).xyz, 1), r[0]);

The fragment shader:

r[1].xyz = f_TEX2.xyz;
h[0].xyz = ((-r[1]) + fconst.c[0]).xyz;
h[0].xyz = normalize(h[0]).xyz;
h[1].xyz = ((-r[1]) + fconst.c[1]).xyz;
h[1].xyz = normalize(h[1]).xyz;
h[1].xyz = (h[0] + h[1]).xyz;
r[2].xyz = f_TEX0.xyz;
h[0].x = dot(r[2].xyz, h[0].xyz);
h[1].xyz = normalize(h[1]).xyz;
r[0].z = dot(r[2].xyz, h[1].xyz);
r[0].w = max((r[0].zzzz), (fconst.c[2].xxxx)).w;
h[0].x = max(h[0], (fconst.c[3].xxxx)).x;
r[0].y = log2(((r[0].wwww).xxxx)).y;
h[0].x = (h[0] + (fconst.c[4].xxxx)).x;
r[0].w = ((r[0].yyyy) * (fconst.c[5].xxxx)).w;
h[0].y = exp2(((r[0].wwww).xxxx)).y;
h[1].xyz = f_TEX1.xyz;
r[0].xyz = ((h[1] * (h[0].xxxx)) + (h[0].yyyy)).xyz;
r[0].w = (fconst.c[6].xxxx).w;

The code isn’t optimized in any way. The host GPU’s driver should take care of that.

Complete GLSL

For the GLSL listing above to actually compile, more work is needed. The arrays must be declared and the variables initialized. If textures are used, the samplers must be set up and texture wrapping emulated.

The resulting listing contains many details that are not immediately related to the shader rewriting, so I’m not including it here. gcmviz can show the bytecode, the disassembly and the complete GLSL listing.
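To give an idea of the kind of scaffolding involved, the following Python sketch wraps a translated vertex body in the declarations it relies on. Everything here is an assumption made for illustration: the function name, the array sizes, the GLSL version, the uniform block name, and the choice to zero-initialize the arrays. The generated text has not been checked against a real driver, and it covers none of the sampler or texture-wrapping work.

```python
def wrap_vertex_body(body_lines, temps=8, inputs=16, outputs=16, constants=468):
    """Surround translated statements with declarations and initialization."""
    lines = [
        "#version 330 core",
        f"layout(location = 0) in vec4 v_in[{inputs}];",
        f"out vec4 v_out[{outputs}];",
        f"layout(std140) uniform VertexConstants {{ vec4 c[{constants}]; }} constants;",
        "void main() {",
        f"    vec4 r[{temps}];",
        # Masked writes leave some components untouched, so start from zero.
        f"    for (int i = 0; i < {temps}; ++i) r[i] = vec4(0.0);",
        f"    for (int i = 0; i < {outputs}; ++i) v_out[i] = vec4(0.0);",
    ]
    lines += ["    " + line for line in body_lines]
    # Output 0 receives the clip-space position in the listing above.
    lines += ["    gl_Position = v_out[0];", "}"]
    return "\n".join(lines)


source = wrap_vertex_body([
    "v_out[8] = v_in[3];",
    "v_out[7].xyz = (v_in[2].xyzx).xyz;",
])
print(source)
```

The array names match the ones used in the partial listing (`v_in`, `v_out`, `r`, `constants.c`), so the translated statements can be pasted in unchanged.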