mid's site

ARB assembly shader programming

[Figure: Serene Plains. Gentle water stream using displacement.]

Demonstration available on Itch

Introduction

The realm of shader programming today is dominated by GLSL, but the road to where we are was long and loopy.

Shader programs came about as a natural evolution of texture combination, another form of programmability found in hardware as late as the Wii (2006). However, texture combination in OpenGL is inherently more limited: some features, such as texture coordinate displacement, cannot be worked around, and extensions such as NV_texture_shader were never pulled in. At some point, texture combination was left behind.

In 2001 EXT_vertex_shader and ATI_fragment_shader were released, allowing the user to insert shader operations one by one with functions such as glShaderOp...EXT and glColorFragmentOp...ATI. Mesa supports the latter, yet not the former — seemingly inconsistent, when you consider the usual stance on such issues.

The two had little time in the sun, as the Architecture Review Board slammed down ARB_vertex_program and ARB_fragment_program, sealing the paradigm from then on: send all instructions at once in a textual form. This marked the beginning of what is termed ARB assembly.

This article exists thanks to my dissatisfaction with introductory ARB assembly literature. Writing it required filling in many blanks, so I can't guarantee correctness. Always read the specs!

Integration

Unlike GLSL, where vertex and fragment shaders are separately compiled then linked together, ARB shaders are actually separate programs coming in separate extensions: ARB_vertex_program and ARB_fragment_program. It is possible for an OpenGL implementation to provide both, one or neither. Additionally, it is possible — and has happened — that an implementation supports one in hardware, and simulates another in software.
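Since either extension may be missing, it is worth checking for each one before loading any program. Here is a minimal C sketch, assuming the extension list is the space-separated string returned by glGetString(GL_EXTENSIONS). The helper name has_extension is made up for this example and is not part of OpenGL.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in `list`.
 * `list` is assumed to come from glGetString(GL_EXTENSIONS). */
static int has_extension(const char *list, const char *name)
{
    size_t len;
    const char *p = list;

    assert(list != NULL && name != NULL);
    len = strlen(name);
    if (len == 0 || strchr(name, ' ') != NULL)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        /* A plain substring match is not enough: GL_ARB_fragment_program
         * is also a prefix of GL_ARB_fragment_program_shadow. */
        int starts_token = (p == list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len;
    }
    return 0;
}
```

Calling it once with "GL_ARB_vertex_program" and once with "GL_ARB_fragment_program" covers the both, one or neither cases.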

Like a GLSL shader, an ARB program replaces its corresponding part of the fixed-function pipeline. Thus replacing, say, the vertex program, means you lose the built-in Gouraud shading that may be available in silicon, and you will have to implement it manually.

ARB programs are easier to set up than GLSL programs, as practically everything needed is in the following:

GLuint program;

glGenProgramsARB(1, &program);
glBindProgramARB(GL_VERTEX_PROGRAM_ARB, program);

glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB, strlen(source), source);

if(glGetError() == GL_INVALID_OPERATION) {
	puts("Error during program compilation:");
	// glGetString returns const GLubyte*, so cast it for puts.
	puts((const char*) glGetString(GL_PROGRAM_ERROR_STRING_ARB));
}

// Actually use for rendering
glEnable(GL_VERTEX_PROGRAM_ARB);

For fragment programs replace GL_VERTEX_PROGRAM_ARB with GL_FRAGMENT_PROGRAM_ARB.

Parameters are similar to GLSL uniforms except they are always 4-component vectors and lack textual names. They are passed using the glProgramEnvParameter...ARB and glProgramLocalParameter...ARB set of functions.

Environment parameters are shared by all programs of the same kind, whereas local parameters belong to a single program.

// Set environment parameter 42 for all vertex programs.
glProgramEnvParameter4fARB(GL_VERTEX_PROGRAM_ARB, 42, 0.32550048828125, 0.255126953125, 0.29421997070312, 0.32421875);

// Set local parameter 3 of the currently bound fragment program.
glProgramLocalParameter4fvARB(GL_FRAGMENT_PROGRAM_ARB, 3, (float[4]) {1, 2, 3, 4});

Matrix state is passed as built-in parameters, including their inverses, transpositions and inverse transpositions (see Appendix B).

Vertex attributes may be passed through the usual glColor..., glTexCoord... or gl...Pointer sets, but generic attributes like in GLSL are also supported (glVertexAttrib...ARB, glVertexAttribPointerARB, glEnableVertexAttribArrayARB, etc.).

The Language

Despite common notions on assembly programming, ARB assembly is meant to be usable as a source language, as in written by humans. No graphics accelerator interprets ARB assembly itself as no binary form was ever standardized.

The language features only 4-component vectors as variables, and each variable is of one of six types:

  • PARAM: used to name constants or program parameters
  • ATTRIB: used for aliasing vertex attributes
  • ADDRESS: for array indexing, this is the only integer vector, and only the first component is accessible (vertex program only)
  • TEMP: used for intermediate computation (i.e. temporary expressions)
  • ALIAS: provides another name to a variable
  • OUTPUT: used for aliasing return variables, passed to the next stages

ATTRIB and OUTPUT are in reality aliases too, and are only for readability. Defining custom inputs and outputs is impossible. Passing information between vertex and fragment programs must be done through existing channels, e.g. the texture coordinate array.

By convention, variable declarations other than TEMPs go between the header and the instructions, though the parsing rules allow them anywhere.

The following are the simplest useful vertex and fragment programs:

!!ARBvp1.0

# This is a comment.

# This is an attribute alias.
ATTRIB theColor = vertex.color;

# Multiply by the model-view-projection matrix to get the clip-space position.
# ARB assembly does not support matrix multiplication, thus 4 dot products.
DP4 result.position.x, state.matrix.mvp.row[0], vertex.position;
DP4 result.position.y, state.matrix.mvp.row[1], vertex.position;
DP4 result.position.z, state.matrix.mvp.row[2], vertex.position;
DP4 result.position.w, state.matrix.mvp.row[3], vertex.position;

# Copy the color and texture coordinate attributes directly.
MOV result.color, theColor;
MOV result.texcoord[0], vertex.texcoord;

END

!!ARBfp1.0

# This is a comment.

OUTPUT col = result.color;

# Directly copy interpolated color.
MOV col, fragment.color;

END

A program begins with either the !!ARBvp1.0 header for a vertex program, or !!ARBfp1.0 for a fragment program, designating the version.

Instructions are written destination first, followed by the sources, and feature something rarely seen in assembly languages: source modifiers. Each source operand may have an optional - sign attached to negate the value. ARB assembly also features swizzling in source operands.

If a scalar is passed as a vector operand, that scalar is replicated across all four components of the input vector (e.g. foo.x becomes foo.xxxx.) Likewise, if an instruction returns a scalar, it replicates said value to all components of the destination.

Destinations support syntax similar to swizzling, but it is not the same thing: it acts as a write mask! This is a common gotcha for those coming from GLSL-like languages. A destination such as a.xyw merely leaves the z component intact, whereas a.xwy is invalid, because the components are out of order.
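To make the difference concrete, here is a small C model of the two mechanisms. It is only a sketch of the semantics described above. The names vec4, swizzle, write_masked and the MASK_* constants are invented for the example.

```c
#include <assert.h>

typedef struct { float c[4]; } vec4;   /* c[0..3] = x, y, z, w */

/* Source swizzle: every slot may pick any component, in any order,
 * e.g. "wzyx" reverses the vector and "xxxx" replicates a scalar. */
static vec4 swizzle(vec4 s, const char *sel)
{
    vec4 r;
    int i;
    for (i = 0; i < 4; i++) {
        int idx = (sel[i] == 'x') ? 0 : (sel[i] == 'y') ? 1 :
                  (sel[i] == 'z') ? 2 : (sel[i] == 'w') ? 3 : -1;
        assert(idx >= 0);   /* sel must be four characters out of xyzw */
        r.c[i] = s.c[idx];
    }
    return r;
}

/* Destination write mask: bit i set means component i is overwritten.
 * Components never move, and unmasked ones keep their old value. */
enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8 };

static vec4 write_masked(vec4 dst, vec4 value, unsigned mask)
{
    int i;
    for (i = 0; i < 4; i++)
        if (mask & (1u << i))
            dst.c[i] = value.c[i];
    return dst;
}
```

A destination of a.xyw corresponds to MASK_X | MASK_Y | MASK_W here. There is no way to express a.xwy with a bit mask, which matches the rule that it is invalid.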

Using a constant vector or scalar (immediate in Assembly speak) is defined as actually creating a nameless PARAM variable, and duplicate PARAMs are coalesced if they are deemed close enough.

Example usage of constants:

PARAM a = {1, 2, 3, 4};
PARAM b[] = { {0, 1, 0.0, 1.0}, {0, 5.2, 0, 3} };
PARAM c[3] = { {0, 0, 0, 0}, program.env[0], {123, 555, 3e5, 11} };
PARAM d[] = { program.local[0..5] };

TEMP e;
ADD e, 0, 5;
ADD e, e, {1, 2, 3, 4};

# The following actually adds 1 to the x and y components of e.
SUB e.xy, e, -{0, 0, 0, 1}.w;

MUL e, e, d[0];

Onto the meat and potatoes, here is the common instruction list:

Instruction | Operation
ABS d, s | d ← (|s.x|, |s.y|, |s.z|, |s.w|)
ADD d, s1, s2 | d ← s1 + s2
DP3 d, s1, s2 | d ← s1.xyz · s2.xyz
DP4 d, s1, s2 | d ← s1 · s2
DPH d, s1, s2 | d ← (s1.xyz, 1.0) · s2
DST d, s1, s2 | d ← (1.0, s1.y · s2.y, s1.z, s2.w)
EX2 d, s | d ← 2^s
FLR d, s | d ← (⌊s.x⌋, ⌊s.y⌋, ⌊s.z⌋, ⌊s.w⌋)
FRC d, s | d ← s - (⌊s.x⌋, ⌊s.y⌋, ⌊s.z⌋, ⌊s.w⌋)
LG2 d, s | d ← log₂(s)
LIT d, s | d ← (1.0, max(s.x, 0.0), s.x > 0.0 ? 2^(s.w · log₂(s.y)) : 0.0, 1.0)
MAD d, s1, s2, s3 | d ← s1 ⊙ s2 + s3
MAX d, s1, s2 | d ← max(s1, s2)
MIN d, s1, s2 | d ← min(s1, s2)
MOV d, s | d ← s
MUL d, s1, s2 | d ← s1 ⊙ s2
POW d, s1, s2 | d ← s1^s2
RCP d, s | d ← 1.0 / s
RSQ d, s | d ← 1.0 / √s
SGE d, s1, s2 | d ← (s1.x >= s2.x, s1.y >= s2.y, s1.z >= s2.z, s1.w >= s2.w)
SLT d, s1, s2 | d ← (s1.x < s2.x, s1.y < s2.y, s1.z < s2.z, s1.w < s2.w)
SUB d, s1, s2 | d ← s1 - s2
SWZ d, s, i, i, i, i | Elaborated below
XPD d, s1, s2 | d ← (s1.xyz ⨯ s2.xyz, undefined)

The following have non-intuitive use cases:

DST

DST does absolutely nothing like its name suggests, and figuring out its purpose and workings gave me quite a headache, despite both being clearly laid out in the extension specifications.

The reason lies in my wrong assumption: this instruction does not compute a distance. Rather, given the vectors (_, d², d², _) and (_, d⁻¹, _, d⁻¹), it computes a vector of distance powers (d⁰, d¹, d², d⁻¹), meant to then be dotted with a vector of attenuation factors (a_c, a_l, a_q, a_i), where a_c is the constant attenuation factor, a_l the linear one, a_q the quadratic one and a_i, presumably, some inverse attenuation factor.

The intention is to find d² and d⁻¹ via DP3 and RSQ respectively, prior to calling DST.
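The data flow can be modelled in a few lines of C. This is a sketch that follows the DST definition from the instruction list, (1.0, s1.y · s2.y, s1.z, s2.w). The function names and the choice to pass d² and d⁻¹ in directly, instead of emulating DP3 and RSQ, are my own assumptions.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* DST d, s1, s2  ->  (1.0, s1.y * s2.y, s1.z, s2.w) */
static vec4 dst(vec4 s1, vec4 s2)
{
    vec4 d;
    d.x = 1.0f;
    d.y = s1.y * s2.y;
    d.z = s1.z;
    d.w = s2.w;
    return d;
}

/* DP4 d, s1, s2  ->  four-component dot product */
static float dp4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* dist_sq is what DP3 of the light vector with itself would give,
 * inv_dist is what RSQ of that value would give. */
static float attenuation_sum(float dist_sq, float inv_dist, vec4 factors)
{
    vec4 s1 = { 0.0f, dist_sq, dist_sq, 0.0f };    /* (_, d^2, d^2, _) */
    vec4 s2 = { 0.0f, inv_dist, 0.0f, inv_dist };  /* (_, 1/d, _, 1/d) */
    assert(dist_sq > 0.0f);
    /* (1, d, d^2, 1/d) dotted with the attenuation factors */
    return dp4(dst(s1, s2), factors);
}
```

For d = 2 the intermediate DST result is (1, 2, 4, 0.5), so a single DP4 against the factor vector yields the whole weighted sum.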

LIT

LIT computes ambient, diffuse and specular lighting coefficients, and is intended to take input of a specific form: x holds the diffuse dot product (surface normal dot light direction), y holds the specular dot product (surface normal dot half-vector), z can be anything, and w holds the specular exponent between -128 and 128 inclusive.

Definitions of the individual dot products are described in vivid detail in the OpenGL specification's fixed-function lighting section (2.13.1 in version 1.3).

SWZ

SWZ provides more flexible swizzling of vectors, at the slightest performance cost on the oldest generations.

The full syntax is as follows:

SWZ d, s, i, i, i, i

where each i is either 0, 1, x, y, z or w, and each may be prepended with either - for negation or + for a no-op.

# Let foo = (0.0, 1.0, 2.0, 3.0).

TEMP bar;
SWZ bar, foo, 1, -z, +y, -0;

# Now bar = (1.0, -2.0, 1.0, -0.0).
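The same example can be replayed with a small C model of the extended swizzle. This is a sketch: swz_sel and swz are hypothetical names, and a selector is represented as a character plus a negate flag.

```c
#include <assert.h>

typedef struct { float c[4]; } vec4;   /* c[0..3] = x, y, z, w */

/* One extended-swizzle selector: '0', '1', 'x', 'y', 'z' or 'w',
 * with negate set to 1 for a leading - sign. */
typedef struct { char sel; int negate; } swz_sel;

static vec4 swz(vec4 s, const swz_sel sel[4])
{
    vec4 d;
    int i;
    for (i = 0; i < 4; i++) {
        float v = 0.0f;
        switch (sel[i].sel) {
        case '0': v = 0.0f; break;
        case '1': v = 1.0f; break;
        case 'x': v = s.c[0]; break;
        case 'y': v = s.c[1]; break;
        case 'z': v = s.c[2]; break;
        case 'w': v = s.c[3]; break;
        default: assert(!"invalid selector"); break;
        }
        d.c[i] = sel[i].negate ? -v : v;
    }
    return d;
}
```

With foo = (0, 1, 2, 3) and the selectors 1, -z, +y, -0, this returns (1, -2, 1, -0), as in the assembly example.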

Exclusive features

Vertex programs and fragment programs each have exclusive instructions, an artifact of the limited shading model available when they were developed. It's well known that texture sampling used to be unavailable for vertex programs, but there's more to it.

I'd like the reader to keep in mind this excerpt from ARB_fragment_program:

The differences between the ARB_vertex_program instruction set and the ARB_fragment_program instruction set are minimal.

Indexing in vertex programs

ARB_vertex_program supports a primitive relative addressing with one index and one constant base.

Only ADDRESS variables may be used as indices, and they can only be written with ARL.

As an example:

PARAM array[3] = { {0.2, 0.3, 0.4, 1.0}, program.env[0..1] };

ADDRESS bar;

ARL bar.x, vertex.attrib[2].x;
MOV result.color, array[bar.x + 1];

Writing bar.x is necessary for forward compatibility.

The extension defines an ADDRESS variable as supporting values between -64 and 63 inclusive.
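In C terms the addressing mode boils down to the following sketch. I am assuming that ARL rounds its operand down to an integer, which is how I read the extension. arl and load_relative are made-up names, and the bounds assert stands in for whatever the hardware does on an out-of-range access.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* ARL: load floor(s) into the integer address register. */
static int arl(float s)
{
    int t = (int)s;              /* truncates toward zero */
    return (t > s) ? t - 1 : t;  /* step down for negative non-integers */
}

/* array[addr + offset]: one register index plus one constant base. */
static vec4 load_relative(const vec4 *array, int count, int addr, int offset)
{
    int index = addr + offset;
    assert(index >= 0 && index < count);
    return array[index];
}
```

The line MOV result.color, array[bar.x + 1] then corresponds to load_relative(array, 3, bar, 1).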

Partial-precision exp and log in vertex programs

EXP and LOG perform less accurate but faster versions of EX2 and LG2, and return results in the z component. Additionally, both return 1 in w, and return values in x and y that may be combined to refine the approximation.

Specifically, EXP returns 2^⌊α⌋ in x and α - ⌊α⌋ in y, and the refinement is x · f(y), where f(y) itself approximates 2^y in the domain [0.0; 1.0).

Similarly, LOG returns ⌊log₂(α)⌋ in x and α · 2^(-⌊log₂(α)⌋) in y, and the refinement is x + f(y), where f(y) itself approximates log₂(y) in the domain [1.0; 2.0).
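To see the EXP refinement in action, here is a C port of the EXP snippet from Appendix C. It is a sketch: only the x and y outputs of EXP are modelled, the coefficients are copied verbatim from that snippet (they approximate 2^(-y), hence the reciprocal), and the helper names are mine.

```c
#include <assert.h>

typedef struct { float x, y; } exp_xy;

static float floor_f(float v)
{
    float t = (float)(long)v;         /* truncates toward zero */
    return (t > v) ? t - 1.0f : t;
}

/* x and y outputs of EXP: 2^floor(a) and the fractional part of a. */
static exp_xy exp_parts(float a)
{
    exp_xy r;
    float fl = floor_f(a);
    int n = (int)fl;
    r.x = 1.0f;
    while (n > 0) { r.x *= 2.0f; n--; }
    while (n < 0) { r.x *= 0.5f; n++; }
    r.y = a - fl;
    return r;
}

static float exp_refined(float a)
{
    /* p0.w, p0.z, p0.y, p0.x, p1.w, p1.z, p1.y, p1.x from Appendix C */
    static const float k[8] = {
        -1.08635004e-05f, 1.47491097e-04f, -1.32823968e-03f, 9.61597636e-03f,
        -5.55036440e-02f, 2.40226462e-01f, -6.93147182e-01f, 1.00000000e+00f
    };
    exp_xy t = exp_parts(a);
    float p = k[0];
    int i;
    assert(t.y >= 0.0f && t.y < 1.0f);
    for (i = 1; i < 8; i++)
        p = p * t.y + k[i];           /* the chain of MADs */
    return t.x * (1.0f / p);          /* RCP, then MUL by 2^floor(a) */
}
```

For a = 2.5 the parts are x = 4 and y = 0.5, and the refined result lands close to 2^2.5 ≈ 5.657.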

An implementation is allowed to compute EXP and LOG at the same precision as EX2 and LG2 underneath.

Appendix C contains examples of refinement, though I cannot think of a practical case. I also couldn't find any use of these instructions anywhere. In an Nvidia patent from 2002, it is stated that EX2 and LG2 shouldn't be used, so these instructions are strange to say the least.

Position-invariant vertex programs

Perhaps your vertex program does nothing special to the position, compared to the fixed-function pipeline. In this case you can defer all vertex transformation to OpenGL by writing the following line before any statements.

OPTION ARB_position_invariant;

Upon use result.position becomes inaccessible, and there is a potential speedup depending on the hardware.

Trigonometry in fragment programs

Oh, you thought.

Vertex programs were originally forced to compute sin and cos manually, and one implementation each is included in Appendix C.

For fragment programs, there's SIN, COS with a full-range domain, and the return value in all components.

SCS computes both as long as the angle is within [-π; +π], placing the cosine in x, the sine in y, and leaving z and w undefined.

TEMP a;

SIN a, 3.1415926.x;
COS a, a.x;

SCS a, a.x;

# a.x is the cosine
# a.y is the sine
# a.z and a.w are undefined

In Appendix C is an example of reducing the angle to the range [-π; +π].
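For reference, the reduction is short enough to check by hand. Below is a C transcription of that Appendix C snippet, as a sketch. The constants are the ones from the snippet, while frc and reduce_angle are names I made up.

```c
#include <assert.h>

/* FRC: v - floor(v), always in [0, 1) */
static float frc(float v)
{
    float t = (float)(long)v;          /* truncates toward zero */
    if (t > v)
        t -= 1.0f;
    return v - t;
}

static float reduce_angle(float a)
{
    const float inv_two_pi = 0.1591549430919f;
    const float two_pi = 6.2831853071796f;
    const float pi = 3.1415926535898f;
    float t;

    t = a * inv_two_pi + 0.5f;         /* MAD t, a, p.x, p.w;   */
    t = frc(t);                        /* FRC t, t;             */
    t = t * two_pi - pi;               /* MAD t, t, p.y, -p.z;  */
    assert(t >= -3.1416f && t <= 3.1416f);
    return t;
}
```

The 0.5 offset shifts the wrap point from 0 to -π, so 7.0 comes out as 7 - 2π ≈ 0.717 and -4.0 as -4 + 2π ≈ 2.283.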

Texture instructions in fragment programs

TEX, TXP and TXB perform sampling, given texture coordinates, the unit to sample from and the target of the unit, whether 1D, 2D, 3D, CUBE or RECT.

TEX performs vanilla sampling. TXP interprets the texture coordinates as homogeneous, and divides the x, y and z values by w prior to sampling. TXB biases the LoD prior to sampling using w, with weighting equal to that of GL_TEXTURE_LOD_BIAS.

TEMP col;
TEX col, fragment.texcoord[0], texture[0], 2D;

Sampling an incomplete texture will give (0.0, 0.0, 0.0, 1.0).

There's an important caveat to make note of. Each sampling with a computed coordinate needs for that computation to first occur. Such sequences are limited in number, and they are called "texture indirections". Texture samplings that do not depend on each other can be parallelized, and so belong to the same texture indirection. Going over the limit, even without exceeding the instruction limit, will cause either an error or a switch to software rendering.

Despite this, the ARB settled on a very liberal definition of a texture indirection. One occurs when:

  • the coordinate is a TEMP that has been written to after the previous texture indirection, or
  • the result is a TEMP that has been used after the previous texture indirection

The first texture indirection is the beginning of the program, therefore a program always has at least one texture indirection, even if there are no texture instructions. Passing a PARAM or a fragment attribute such as fragment.texcoord is not a texture indirection.

While hardware may analyze the source to minimize false indirections, it's not forced to.
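The counting rule is easy to get wrong, so here is a C sketch of it, applied literally to a simplified instruction list. Everything about the representation is my assumption: instr, NO_TEMP and count_indirections are invented for the example, and "used" is taken to mean read or written.

```c
#include <assert.h>

#define MAX_TEMPS 32
#define NO_TEMP (-1)

typedef struct {
    int is_tex;   /* 1 for TEX/TXP/TXB, 0 for arithmetic */
    int dst;      /* TEMP index written, or NO_TEMP */
    int src[3];   /* TEMP indices read, or NO_TEMP;
                     src[0] is the coordinate of a texture instruction */
} instr;

static int count_indirections(const instr *prog, int n)
{
    int written[MAX_TEMPS] = {0};
    int used[MAX_TEMPS] = {0};
    int count = 1;              /* the program start counts as the first one */
    int i, j;

    for (i = 0; i < n; i++) {
        const instr *in = &prog[i];
        assert(in->dst < MAX_TEMPS);
        if (in->is_tex) {
            int coord = in->src[0];
            int dependent = (coord != NO_TEMP && written[coord]) ||
                            (in->dst != NO_TEMP && used[in->dst]);
            if (dependent) {
                count++;        /* a new indirection starts here */
                for (j = 0; j < MAX_TEMPS; j++)
                    written[j] = used[j] = 0;
            }
        }
        for (j = 0; j < 3; j++) {
            assert(in->src[j] < MAX_TEMPS);
            if (in->src[j] != NO_TEMP)
                used[in->src[j]] = 1;
        }
        if (in->dst != NO_TEMP) {
            written[in->dst] = 1;
            used[in->dst] = 1;
        }
    }
    return count;
}
```

Two samples from fragment.texcoord followed by an ADD and a third sample at the computed coordinate count as two indirections. Sampling into a TEMP that was already touched also starts a new one, even when no data actually depends on it.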

Because of this, make sure to group as many TEX instructions together as possible. Another trick is to never reuse TEMP variables, although too many TEMPs are known to slow down things on relevant Nvidia hardware.

Discarding in fragment programs

KIL is a conditional version of the modern discard statement. Given an input vector, it discards the fragment if and only if any component of the input is negative.

KIL is a texture instruction, making it count towards the texture indirection limit!

Linear interpolation in fragment programs

LRP performs component-wise linear interpolation between the second and third inputs, using the first as the blend factor: a factor of 1 selects the second input and a factor of 0 selects the third.

TEMP t;
LRP t, {0.5, 0, 1, 0.6666666}, {3, 3, 2, 3}, {1, 2, 3, 0};
# Now t is {2, 2, 2, 2}

RGBA components in fragment programs

Fragment programs are allowed to use the r, g, b, a symbols to specify vector components.

Saturation arithmetic in fragment programs

Any instruction in a fragment program, be it texture, arithmetic or even MOV and CMP, may be suffixed with _SAT causing each destination component to be clamped between 0 and 1.

TEMP t;
ADD_SAT t, 0, 5;
# Now t is {1, 1, 1, 1}
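The three small fragment-only features above (KIL, LRP and the _SAT suffix) fit in one C sketch. The function names are hypothetical. The LRP formula is my reading of the extension, d = s1 · s2 + (1 - s1) · s3, so the operand order is the opposite of GLSL's mix.

```c
#include <assert.h>

typedef struct { float c[4]; } vec4;   /* c[0..3] = x, y, z, w */

/* KIL: returns 1 (discard the fragment) if any component is negative. */
static int kil(vec4 s)
{
    int i;
    for (i = 0; i < 4; i++)
        if (s.c[i] < 0.0f)
            return 1;
    return 0;
}

/* LRP d, s1, s2, s3  ->  s1 * s2 + (1 - s1) * s3, per component */
static vec4 lrp(vec4 s1, vec4 s2, vec4 s3)
{
    vec4 d;
    int i;
    for (i = 0; i < 4; i++)
        d.c[i] = s1.c[i] * s2.c[i] + (1.0f - s1.c[i]) * s3.c[i];
    return d;
}

static float clamp(float v, float lo, float hi)
{
    assert(lo <= hi);
    return (v < lo) ? lo : (v > hi) ? hi : v;
}

/* _SAT: clamp every destination component to [0, 1]. */
static vec4 saturate(vec4 v)
{
    int i;
    for (i = 0; i < 4; i++)
        v.c[i] = clamp(v.c[i], 0.0f, 1.0f);
    return v;
}
```

An instruction such as LRP_SAT would then be saturate(lrp(s1, s2, s3)) in this model.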

Paragon of Virtue, Nvidia

Now I know you're thinking just as me: "Wow, this is the greatest thing since sliced apples, and I'd love to delve even deeper." Well, Nvidia took it upon themselves to continue and update ARB assembly specifications to this day, right to the geometry shaders, compute shaders and even tessellation shaders, extending it with every modern feature there is.

In reality, this is because ARB assembly is used within Nvidia's shader infrastructure, but I'm not complaining. That and no other vendor really supports any of these. As for me, this is really the only thing that would push me to get an external card. Folk wisdom states: only Nvidia has the cool extensions. Having these at my disposal allows me to actually test my software's compatibility range.

If I ever make a next part, I shall detail the additions and the timeline of their introduction.

Conclusion

If you look around or ask questions about this piece of tech, you're often met with resistance. Such people deem ARB assembly "useless", but only really because they were told to think so. Technology can't just "lose" its use, but that doesn't stop people from screaming it over and over.

Funnily enough, we've come back around to the portable assembly concept with SPIR-V, which allows its modules to specify required "capabilities". Each defined instruction must state the capability it depends on, right down to the most basic things taken for granted today, such as dynamic addressing. This suggests SPIR-V was built also with limited hardware in mind, but how in practice it works — or could work — I cannot say, as I am not sure of its coverage in the area. We'll see; after all, there's too much hardware for it to go anywhere.


I leave the grueling details last for those who intend to actually make use of this information.

Appendix Z: Additional Resources

There's not much. If there were resources, this article wouldn't exist :).

Appendix A: Limits

Both extensions define some of the same enums, with different minimum limits. In this case, you should probably take the higher of whichever you're supporting.

Getter | Enum | Minimum limit | Description | Extension
glGetProgramivARB | GL_MAX_PROGRAM_ENV_PARAMETERS_ARB | 96 | Max environment parameters | ARB_vertex_program
glGetProgramivARB | GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB | 96 | Max local parameters | ARB_vertex_program
glGetProgramivARB | GL_MAX_PROGRAM_INSTRUCTIONS_ARB | 128 | Max instructions | ARB_vertex_program
glGetProgramivARB | GL_MAX_PROGRAM_TEMPORARIES_ARB | 12 | Max temporaries | ARB_vertex_program
glGetProgramivARB | GL_MAX_PROGRAM_PARAMETERS_ARB | 96 | Max parameters | ARB_vertex_program
glGetProgramivARB | GL_MAX_PROGRAM_ATTRIBS_ARB | 16 | Max attributes | ARB_vertex_program
glGetProgramivARB | GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB | 1 | Max address variables | ARB_vertex_program
glGetIntegerv | GL_MAX_PROGRAM_MATRICES_ARB | 8 | Max program matrices | ARB_vertex_program & ARB_fragment_program
glGetIntegerv | GL_MAX_PROGRAM_MATRIX_STACK_DEPTH_ARB | 1 | Program matrix stack depth | ARB_vertex_program & ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB | ? | Max native instructions | ARB_vertex_program & ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_NATIVE_TEMPORARIES_ARB | ? | Max native temporaries | ARB_vertex_program & ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_NATIVE_PARAMETERS_ARB | ? | Max native parameters | ARB_vertex_program & ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_NATIVE_ATTRIBS_ARB | ? | Max native attributes | ARB_vertex_program & ARB_fragment_program
glGetIntegerv | GL_MAX_TEXTURE_COORDS_ARB | 2 | Max texture coordinate sets | ARB_fragment_program
glGetIntegerv | GL_MAX_TEXTURE_IMAGE_UNITS_ARB | 2 | Max accessible texture units | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_ENV_PARAMETERS_ARB | 24 | Max environment parameters | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB | 24 | Max local parameters | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_INSTRUCTIONS_ARB | 72 | Max instructions | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB | 48 | Max arithmetic instructions | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB | 24 | Max texture instructions | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB | 4 | Max texture indirections | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_PARAMETERS_ARB | 24 | Max parameters | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_ATTRIBS_ARB | 10 | Max attributes | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB | ? | Max native arithmetic instructions | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB | ? | Max native texture instructions | ARB_fragment_program
glGetProgramivARB | GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB | ? | Max native texture indirections | ARB_fragment_program

Appendix B: Built-in state, inputs & outputs

Vertex input | Use | Mutually exclusive to (cannot be bound at once with)
vertex | Vertex information |
vertex.position | Its position | vertex.attrib[0]
vertex.weight | Its weights from 0 to 4 | vertex.attrib[1]
vertex.weight[n] | Its weights from n to n + 4 |
vertex.normal | Its normal | vertex.attrib[2]
vertex.color | Its primary color | vertex.attrib[3]
vertex.color.primary | Its primary color | vertex.attrib[3]
vertex.color.secondary | Its secondary color | vertex.attrib[4]
vertex.fogcoord | Its fog coordinate in the form (f, 0, 0, 1) | vertex.attrib[5]
vertex.texcoord | Its texture coordinate for unit 0 | vertex.attrib[8]
vertex.texcoord[n] | Its texture coordinate for unit n | vertex.attrib[8 + n]
vertex.matrixindex | Its matrix indices from 0 to 4 |
vertex.matrixindex[n] | Its matrix indices from n to n + 4 |
vertex.attrib[n] | Generic attribute for passing custom information |

Vertex output | Use
result.position | Vertex position in clip space
result.color | Vertex front-facing primary color
result.color.primary | Vertex front-facing primary color
result.color.secondary | Vertex front-facing secondary color
result.color.front | Vertex front-facing primary color
result.color.front.primary | Vertex front-facing primary color
result.color.front.secondary | Vertex front-facing secondary color
result.color.back | Vertex back-facing primary color
result.color.back.primary | Vertex back-facing primary color
result.color.back.secondary | Vertex back-facing secondary color
result.fogcoord | Fog position (in x component)
result.pointsize | Point size (in x component)
result.texcoord | Texture coordinates for unit 0
result.texcoord[n] | Texture coordinates for unit n

You read correctly. Built-in vertex attributes are incompatible with certain generic attribute indices. A program should fail to load if incompatible ones are bound.

Fragment input | Use
fragment.color | Interpolated primary color
fragment.color.primary | Interpolated primary color
fragment.color.secondary | Interpolated secondary color
fragment.texcoord | Texture coordinates for unit 0
fragment.texcoord[n] | Texture coordinates for unit n
fragment.fogcoord | (f, 0, 0, 1) where f is the fog distance
fragment.position | Position (x, y, z, 1 / w) of the fragment in the window

Fragment output | Use
result.color | Fragment color
result.depth | Fragment depth (in z)

Built-in | Use
state.material.ambient | Front ambient color
state.material.diffuse | Front diffuse color
state.material.specular | Front specular color
state.material.emission | Front emissive color
state.material.shininess | Front shininess in the form (s, 0, 0, 1)
state.material.front.ambient | Front ambient color
state.material.front.diffuse | Front diffuse color
state.material.front.specular | Front specular color
state.material.front.emission | Front emissive color
state.material.front.shininess | Front shininess in the form (s, 0, 0, 1)
state.material.back.ambient | Back ambient color
state.material.back.diffuse | Back diffuse color
state.material.back.specular | Back specular color
state.material.back.emission | Back emissive color
state.material.back.shininess | Back shininess in the form (s, 0, 0, 1)

Built-in | Use
state.light[n].ambient | Light ambient color
state.light[n].diffuse | Light diffuse color
state.light[n].specular | Light specular color
state.light[n].position | Light position
state.light[n].attenuation | Light attenuation vector (a_c, a_l, a_q, e), where e is the spotlight exponent
state.light[n].spot.direction | Spotlight direction in x, y, z; cutoff angle cosine in w
state.light[n].half | Light infinite half-angle
state.lightmodel.ambient | Scene ambient color
state.lightmodel.scenecolor | Scene front color
state.lightmodel.front.scenecolor | Scene front color
state.lightmodel.back.scenecolor | Scene back color
state.lightprod[n].ambient | Product of light ambient color and front material ambient color
state.lightprod[n].diffuse | Product of light diffuse color and front material diffuse color
state.lightprod[n].specular | Product of light specular color and front material specular color
state.lightprod[n].front.ambient | Product of light ambient color and front material ambient color
state.lightprod[n].front.diffuse | Product of light diffuse color and front material diffuse color
state.lightprod[n].front.specular | Product of light specular color and front material specular color
state.lightprod[n].back.ambient | Product of light ambient color and back material ambient color
state.lightprod[n].back.diffuse | Product of light diffuse color and back material diffuse color
state.lightprod[n].back.specular | Product of light specular color and back material specular color

Built-in | Use
state.texgen[n].eye.s | s coord of TexGen eye linear planes
state.texgen[n].eye.t | t coord of TexGen eye linear planes
state.texgen[n].eye.r | r coord of TexGen eye linear planes
state.texgen[n].eye.q | q coord of TexGen eye linear planes
state.texgen[n].object.s | s coord of TexGen object linear planes
state.texgen[n].object.t | t coord of TexGen object linear planes
state.texgen[n].object.r | r coord of TexGen object linear planes
state.texgen[n].object.q | q coord of TexGen object linear planes

Built-in | Use
state.fog.color | Fog color
state.fog.params | (f_d, f_s, f_e, 1 / (f_e - f_s)), where f_d is fog density, f_s is the linear fog start, f_e is the linear fog end

Built-in | Use
state.clip[n].plane | Clip plane coefficients

Built-in | Use
state.point.size | (s, n, x, f), where s is the point size, n is the minimum size clamp, x is the maximum size clamp, and f is the fade threshold
state.point.attenuation | Attenuation coefficients (a, b, c, 1)

Built-in | Use
state.matrix.modelview[n] | n-th modelview matrix
state.matrix.projection | Projection matrix
state.matrix.mvp | Modelview-projection matrix
state.matrix.texture[n] | n-th texture matrix
state.matrix.palette[n] | n-th modelview palette matrix
state.matrix.program[n] | n-th program matrix

All matrices have accessible .row[m] suffixes, as well as .inverse, .transpose, .invtrans which are self-explanatory.

Appendix C: Snippets

Some of the following snippets were borrowed from Matthias Wloka.

Divide a.x by b.x

TEMP t;
RCP t.x, b.x;
MUL t.x, t.x, a.x;

Square root of a.x

TEMP t;
RSQ t, a.x;
MUL t, t, a.x;

Clamping to [0; 1]

PARAM p = {0, 1};
MAX a, a, p.x;
MIN a, a, p.y;

Linear interpolation in vertex programs

TEMP t;
ADD t, b, -a;
MAD t, weight, t, a;

Reduce a to [-π; +π]

PARAM p = {0.1591549430919, 6.2831853071796, 3.1415926535898, 0.5};
TEMP t;
MAD t, a, p.x, p.w;
FRC t, t;
MAD t, t, p.y, -p.z;

High precision sine of a.x into t2

PARAM p0 = {0.25, -9, 0.75, 0.1591549430919};
PARAM p1 = {24.9808039603, -24.9808039603, -60.1458091736, 60.1458091736};
PARAM p2 = {85.4537887573, -85.4537887573, -64.9393539429, 64.9393539429};
PARAM p3 = {19.7392082214, -19.7392082214, -1, 1};
TEMP t0;
TEMP t1;
TEMP t2;
MAD t0, a.x, p0.w, p0.x;
FRC t0, t0;
SLT t1.x, t0, p0;
SGE t1.yz, t0, p0;
DP3 t1.y, t1, p3.zwzw;
ADD t2.xyz, -t0.y, {0, 0.5, 1, 0};
MUL t2, t2, t2;
MAD t0, p1.xyxy, t2, p1.zwzw;
MAD t0, t0, t2, p2.xyxy;
MAD t0, t0, t2, p2.zwzw;
MAD t0, t0, t2, p3.xyxy;
MAD t0, t0, t2, p3.zwzw;
DP3 t2, t0, t1;

High precision cosine of a.x into t2

PARAM p0 = {0.25, -9, 0.75, 0.1591549430919};
PARAM p1 = {24.9808039603, -24.9808039603, -60.1458091736, 60.1458091736};
PARAM p2 = {85.4537887573, -85.4537887573, -64.9393539429, 64.9393539429};
PARAM p3 = {19.7392082214, -19.7392082214, -1, 1};
TEMP t0;
TEMP t1;
TEMP t2;
MUL t0, a.x, p0.w;
FRC t0, t0;
SLT t1.x, t0, p0;
SGE t1.yz, t0, p0;
DP3 t1.y, t1, p3.zwzw;
ADD t2.xyz, -t0.y, {0, 0.5, 1, 0};
MUL t2, t2, t2;
MAD t0, p1.xyxy, t2, p1.zwzw;
MAD t0, t0, t2, p2.xyxy;
MAD t0, t0, t2, p2.zwzw;
MAD t0, t0, t2, p3.xyxy;
MAD t0, t0, t2, p3.zwzw;
DP3 t2, t0, t1;

Example EXP refinement

PARAM p0 = {9.61597636e-03, -1.32823968e-03, 1.47491097e-04, -1.08635004e-05};
PARAM p1 = {1.00000000e+00, -6.93147182e-01, 2.40226462e-01, -5.55036440e-02};
TEMP t;
EXP t, a.x;
MAD t.w, p0.w, t.y, p0.z;
MAD t.w, t.w, t.y, p0.y;
MAD t.w, t.w, t.y, p0.x;
MAD t.w, t.w, t.y, p1.w;
MAD t.w, t.w, t.y, p1.z;
MAD t.w, t.w, t.y, p1.y;
MAD t.w, t.w, t.y, p1.x;
RCP t.w, t.w;
MUL t, t.w, t.x;

Example LOG refinement

PARAM p0 = {2.41873696e-01, -1.37531206e-01, 5.20646796e-02, -9.31049418e-03};
PARAM p1 = {1.44268966e+00, -7.21165776e-01, 4.78684813e-01, -3.47305417e-01};
TEMP t;
LOG t, a.x;
ADD t.y, t.y, -1;
MAD t.w, p0.w, t.y, p0.z;
MAD t.w, t.w, t.y, p0.y;
MAD t.w, t.w, t.y, p0.x;
MAD t.w, t.w, t.y, p1.w;
MAD t.w, t.w, t.y, p1.z;
MAD t.w, t.w, t.y, p1.y;
MAD t.w, t.w, t.y, p1.x;
MAD t, t.w, t.y, t.x;

Appendix D: Additional trivia

  • GLSL programs override ARB ones. Formally, any low-level programs are ignored if any high-level program (set by glUseProgram and co.) is in use, even if the GL_VERTEX_PROGRAM_ARB or GL_FRAGMENT_PROGRAM_ARB states are enabled.
  • There exist driver vendors that support certain Nvidia instruction set extensions, despite lacking the appropriate OPTIONs necessary to legally enable them. This is, however, only done to appease broken software.