mid's site

Journey into OpenGL: Transformations


  1. the Series
  2. the First Triangle
  3. the Framebuffer and Depth Buffer
  4. Transformations
  5. Spaces
  6. the Cube?
  7. Vertex Arrays
  8. Index Arrays
  9. ...

As a preface, let us first learn homogeneous coordinates. They are similar to our familiar Cartesian coordinates, except they also allow us to denote points at infinity. You get a homogeneous coordinate system by simply adding another dimension: for a two-dimensional plane it is Z, and for three-dimensional space it is W. For any point given in Cartesian coordinates (x, y, z), there are infinitely many corresponding homogeneous coordinates: (x*w, y*w, z*w, w) for any non-zero w. Going the other way, you divide the first three components by w. When w = 0, that division is no longer possible, and the coordinate instead denotes a point at infinity.

While any w gives a valid homogeneous coordinate, in computer graphics you usually only come across two values: 0 and 1. When w is zero, we in practice treat the coordinate as a vector (a direction), and when it is one, it is a point in the regular Cartesian sense. I won't go over every use of homogeneous coordinates (though I will mention that the perspective projection of a point is basically (x/z, y/z, z/z)). Here is one application where you will be using them a lot.
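A minimal Python sketch of moving between the two coordinate systems, assuming the usual convention that multiplying by w lifts a Cartesian point and dividing by w brings it back. The function names are made up for illustration.

```python
def to_homogeneous(point, w=1.0):
    """Lift a Cartesian point (x, y, z) to homogeneous (x*w, y*w, z*w, w)."""
    if w == 0:
        raise ValueError("w = 0 denotes a direction, not a finite point")
    x, y, z = point
    return (x * w, y * w, z * w, w)


def to_cartesian(h):
    """Project homogeneous (X, Y, Z, w) back to Cartesian by dividing by w."""
    X, Y, Z, w = h
    if w == 0:
        raise ValueError("a point at infinity has no Cartesian equivalent")
    return (X / w, Y / w, Z / w)


p = (1.0, 2.0, 3.0)
# Two different homogeneous tuples that describe the same Cartesian point.
a = to_cartesian(to_homogeneous(p, w=1.0))
b = to_cartesian(to_homogeneous(p, w=4.0))
```

Both `a` and `b` come back as the original point, which is the "infinitely many corresponding coordinates" property in action.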

When you draw a model, typically you want the model to move somehow. It might move on the X, Y, Z axes, or it might rotate this or that way. It might even inflate in size, who knows. All of these transformations can be done by changing the positions of individual vertices. This is intuitive. After all, if you shift all vertices of a model by a unit, it will appear as though the entire model shifted by a unit.

A translation operation can be stored as a simple 3-dimensional vector. We can write down this operation in pseudo-code like so:

for each vertex v:
	v <= v + translation

Each operator has an operand which has no effect, called the identity. For translation, it is the zero vector (0, 0, 0).
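The same loop as a small runnable Python sketch, with vertices stored as plain tuples (an assumption made for brevity; a real renderer would keep them in a vertex buffer). The name `translate` is hypothetical.

```python
def translate(vertices, translation):
    """Return a new vertex list with `translation` added to every vertex."""
    tx, ty, tz = translation
    return [(x + tx, y + ty, z + tz) for (x, y, z) in vertices]


triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = translate(triangle, (2.0, 0.0, -1.0))
# Translating by the identity (the zero vector) changes nothing.
unchanged = translate(triangle, (0.0, 0.0, 0.0))
```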

Scaling is also simple. Here you multiply (component-wise) the vertex positions by a scaling vector. The identity for scaling should be obvious: (1, 1, 1).

for each vertex v:
	v <= v * scaling

It is important to note that scaling occurs with respect to a center point, which is unchanged by the operation. In the above pseudo-code, the center is the origin point (0, 0, 0). If you wish to scale about a different center point, you must first translate the model so that this center lands on the origin, apply the scaling operator, then translate it back. How this would look in pseudo-code is left as an exercise. Models, however, are usually made with their center already at the origin. When that is the case, you can skip the translations.
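For readers who want to check their answer to the exercise, here is one possible version in Python. It is only a sketch: `scale_about` and its `center` parameter are names invented here, and vertices are again plain tuples.

```python
def scale_about(vertices, scaling, center=(0.0, 0.0, 0.0)):
    """Scale vertices component-wise while keeping `center` fixed."""
    sx, sy, sz = scaling
    cx, cy, cz = center
    result = []
    for (x, y, z) in vertices:
        # Move the center to the origin, scale, then move back.
        result.append(((x - cx) * sx + cx,
                       (y - cy) * sy + cy,
                       (z - cz) * sz + cz))
    return result


square = [(1.0, 1.0, 0.0), (3.0, 1.0, 0.0), (3.0, 3.0, 0.0), (1.0, 3.0, 0.0)]
# Double the square in X and Y around its own middle, (2, 2, 0).
doubled = scale_about(square, (2.0, 2.0, 1.0), center=(2.0, 2.0, 0.0))
```

With the default `center`, the function reduces to the plain scaling loop above.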

Rotation is trickier. In a two-dimensional scene a single angle is enough to describe rotation, but in 3D we need three components. The most common ones you find in user interfaces are Tait-Bryan angles (often called Euler angles, but those are different), which are equivalent to three individual rotations around three different axes. This representation certainly seems simple, but that is a facade: for one, the result depends on the order in which the three rotations are applied. Additionally, Tait-Bryan angles are prone to what is known as gimbal lock, where the model loses a whole degree of rotational freedom. Such cases should be treated as catastrophic. We shall throw this one in the bin, and go over two other rotational structures.

First is the quaternion. Though capable of storing some scaling information, in practice it is used only for rotation. To be honest, I'm not sure how to explain this one. There isn't really any intuition you can use to understand it other than starting from the beginning, which is out of scope for this series. This is one of those times I'll make an exception in my philosophy. As such, have this very nicely documented code:

for each vertex v:
	v <= Quaternion.rotate(v, q)
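For the curious, here is a rough Python sketch of what a call like `Quaternion.rotate` might do internally. It assumes a unit quaternion stored as (w, x, y, z) and uses the standard formula q * (0, v) * conjugate(q); the function names are invented here and are not taken from any particular library.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) rotating by `angle` radians about `axis`."""
    ax, ay, az = axis
    length = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / length
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)


def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_rotate(v, q):
    """Rotate the 3D vector v by the unit quaternion q."""
    w, x, y, z = q
    conjugate = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), conjugate)
    return (rx, ry, rz)


# A quarter turn about the Z axis takes the X axis onto the Y axis.
q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
rotated = quat_rotate((1.0, 0.0, 0.0), q)
```

Up to floating-point rounding, `rotated` is (0, 1, 0).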

Second is the rotation matrix. It is a triplet of 3D vectors, one for each axis.

/ Rx1 Rx2 Rx3 \
| Ry1 Ry2 Ry3 |
\ Rz1 Rz2 Rz3 /

The intuition behind this is that a matrix effectively changes a vertex's basis, from the familiar Cartesian (1, 0, 0), (0, 1, 0), (0, 0, 1) to the matrix's columns (Rx1, Ry1, Rz1), (Rx2, Ry2, Rz2), (Rx3, Ry3, Rz3). The vertex keeps the same coordinates, but they are now measured along the new basis vectors instead of the old ones, so when the new basis consists of mutually perpendicular vectors of length 1 (with the same handedness as the old one), this becomes equivalent to rotation.

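An example is easier to show in two dimensions, where rotation can be described with one angle. Here is the generic 2D rotation matrix defined by the angle α, and to its right an example when α = 180°. (With the signs placed as shown and the matrix multiplying a column vector, a positive α turns clockwise; swapping the signs of the two sin terms gives the counter-clockwise form.)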

/  cos(α) sin(α) \   / -1  0 \
\ -sin(α) cos(α) /,  \  0 -1 /

The matrix changes the basis from (1, 0), (0, 1) to (-1, 0), (0, -1), which mirrors both the X and Y axes, and that is exactly what rotating by 180 degrees does. A visualization is to the right.
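The 180° case can be checked numerically. This Python sketch builds the 2D matrix with the same sign placement as the layout above and multiplies it with a column vector; `rotation_2d` and `apply_2d` are hypothetical helper names.

```python
import math


def rotation_2d(alpha):
    """2D rotation matrix laid out as in the text: [[cos, sin], [-sin, cos]]."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s],
            [-s, c]]


def apply_2d(m, v):
    """Multiply the 2x2 matrix m with the column vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


half_turn = rotation_2d(math.pi)
# Both components change sign, up to floating-point rounding.
flipped = apply_2d(half_turn, (2.0, 1.0))
# With this sign placement a quarter turn takes (1, 0) to (0, -1).
quarter = apply_2d(rotation_2d(math.pi / 2.0), (1.0, 0.0))
```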

The identity matrix is also the identity element of the rotation operator.

Like scaling, rotation also occurs relative to the origin, again making translation necessary if you need a different center point.

The above structures are all important and have their uses, but they are toys in comparison to the mother of them all: the Transformation Matrix. Why, you may ask? I'm excited to show you. First, let us see what it is capable of. Can we write down the translation operator with a matrix? We need a matrix M for which M * v = v + translation. As it turns out, such a thing does exist if we use homogeneous coordinates, specifically when v = (Vx, Vy, Vz, 1) and translation = (Tx, Ty, Tz, 0) (bear with me here).

/ 1 0 0 Tx \
| 0 1 0 Ty | * v = v + (Tx, Ty, Tz, 0) = (Vx + Tx, Vy + Ty, Vz + Tz, 1)
| 0 0 1 Tz |
\ 0 0 0 1  /

This is easy to prove if you know matrix multiplication, which you should.
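If you would rather see it than prove it, here is a small Python sketch of the multiplication, with matrices stored as nested lists in row-major order (a choice made only for this example). It also shows why w matters: a point (w = 1) is moved, while a vector (w = 0) is left alone.

```python
def translation_matrix(tx, ty, tz):
    """4x4 translation matrix with the offset in the last column."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def mat_vec(m, v):
    """Multiply the 4x4 matrix m with the 4-component column vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


T = translation_matrix(5.0, 0.0, -2.0)
point = mat_vec(T, (1.0, 2.0, 3.0, 1.0))      # w = 1: gets translated
direction = mat_vec(T, (1.0, 2.0, 3.0, 0.0))  # w = 0: stays the same
```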

Rotation is possible by simply extending the rotation matrix to 4x4. I should note that the first three columns can be viewed as three vectors in homogeneous coordinates (their w is 0).

/ Rx1 Rx2 Rx3 0 \
| Ry1 Ry2 Ry3 0 | * v = (Rx1 * Vx + Rx2 * Vy + Rx3 * Vz, Ry1 * Vx + Ry2 * Vy + Ry3 * Vz, Rz1 * Vx + Rz2 * Vy + Rz3 * Vz, 1)
| Rz1 Rz2 Rz3 0 |
\  0   0   0  1 /

Scaling is also possible, like so:

/ Sx 0  0  0 \
| 0  Sy 0  0 | * v = ...
| 0  0  Sz 0 |
\ 0  0  0  1 /

This one is also left as an exercise.

Now for the grand finale. What makes the transformation matrix so powerful is that it can accumulate transformations: if you multiply two matrices, the product applies the effects of both. This also implies the matrix's ability to store all three operations in one! Just one matrix to describe any transformation one might need. In fact, a transformation matrix built from a translation, a rotation and a scaling (multiplied in that order, so that scaling is applied first) looks like this:

/ Rx1*Sx Rx2*Sy Rx3*Sz Tx \
| Ry1*Sx Ry2*Sy Ry3*Sz Ty |
| Rz1*Sx Rz2*Sy Rz3*Sz Tz |
\   0      0      0    1  /

You can see it for yourself by, say, setting some parts of the transformation to identity. And if you set all parts to identity, you will get the 4x4 identity matrix.
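The layout above can also be confirmed numerically. The Python sketch below multiplies a translation, a rotation about Z and a scaling matrix, in that order, and the result matches the combined form entry by entry. The helper names are hypothetical, and the counter-clockwise sign convention used in `rotation_z` is an assumption; any rotation matrix would do.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0], [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


T = translation(4.0, 5.0, 6.0)
R = rotation_z(math.radians(30.0))
S = scaling(2.0, 3.0, 0.5)
# Scale first, then rotate, then translate.
M = mat_mul(T, mat_mul(R, S))
```

Each rotation column of `M` ends up multiplied by the matching scale factor, and the last column holds the translation.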

Not only that, but the inverse of a transformation matrix also gives the inverse effect: it undoes whatever the original matrix did! It's beautiful, really.
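A small Python sketch of this, under the assumption that the matrix is a rotation followed by a translation. Instead of a general matrix inverse, it builds the inverse by hand (translate back, then rotate back, using the fact that a pure rotation's inverse is its transpose). Multiplying the two returns the identity matrix. All helper names are made up for this example.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def transpose(m):
    return [[m[c][r] for c in range(4)] for r in range(4)]


angle = math.radians(40.0)
# Rotate, then translate.
M = mat_mul(translation(1.0, 2.0, 3.0), rotation_z(angle))
# Undo the steps in the opposite order: translate back, then rotate back.
M_inv = mat_mul(transpose(rotation_z(angle)), translation(-1.0, -2.0, -3.0))
round_trip = mat_mul(M_inv, M)
```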

Matrix multiplication is non-commutative, which is to say the order of operations is important! To the right is an example of this. Furthermore, matrix products have a certain quirk: the effect produced by the product of transformation matrices A, B, C, etc. appears as though the effects occur in reverse (..., C then B then A). In other words, though the orange cube is the result of rotation before translation, when laid out with matrices, it is a translation matrix times a rotation matrix.
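To make the ordering concrete, here is a Python sketch that applies the same translation and rotation to one point in both orders. The helpers are hypothetical, and the rotation uses the counter-clockwise convention about Z, which is an assumption of this example.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, v):
    """Multiply the 4x4 matrix m with the 4-component column vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


T = translation(3.0, 0.0, 0.0)
R = rotation_z(math.pi / 2.0)
p = (1.0, 0.0, 0.0, 1.0)

# T * R: the rightmost matrix acts first, so this rotates and then translates.
rotate_then_translate = mat_vec(mat_mul(T, R), p)
# R * T: this translates first and then rotates the moved point.
translate_then_rotate = mat_vec(mat_mul(R, T), p)
```

Up to rounding, the first result is (3, 1, 0, 1) and the second is (0, 4, 0, 1): same ingredients, different outcome.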

Lastly, all three of these operations are affine. This means if you apply any combination of these operators onto a line, it will result in another line. It is impossible to get a curve from a line, or a line from a curve. This makes some effects impossible to perform with transformation matrices, such as a fisheye distortion effect. Nonetheless, they remain powerful for most uses.
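One way to see the line-preserving property is to transform a few points along a segment and check that they still lie on the segment between the transformed endpoints. The matrix in this Python sketch is an arbitrary example in the combined layout (a quarter turn about Z, a scale of (2, 3, 1) and a translation of (1, -2, 4)); the helper names are invented here.

```python
def mat_vec(m, v):
    """Multiply the 4x4 matrix m with the 4-component column vector v."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def lerp(a, b, t):
    """Point at parameter t on the segment from a to b."""
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))


M = [[0.0, -3.0, 0.0, 1.0],
     [2.0, 0.0, 0.0, -2.0],
     [0.0, 0.0, 1.0, 4.0],
     [0.0, 0.0, 0.0, 1.0]]

a = (0.0, 0.0, 0.0, 1.0)
b = (2.0, 4.0, 6.0, 1.0)

# Transforming a point on the segment gives the matching point on the
# transformed segment, so the image of the line is still a line.
on_line = [mat_vec(M, lerp(a, b, t)) for t in (0.25, 0.5, 0.75)]
expected = [lerp(mat_vec(M, a), mat_vec(M, b), t) for t in (0.25, 0.5, 0.75)]
```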

Understanding transformations will assist us greatly in our adventure onward. We will use them in almost every subsequent episode, especially matrices, thanks to the graphics accelerator's ability to multiply every vertex position by several of them.

Which data structures store translation?

Translation vectors & transformation matrices.

Which data structures store rotation?

Quaternions, rotation matrices & transformation matrices.

Which data structures store scaling?

Scaling vectors, quaternions (to some extent) & transformation matrices.

What is an affine transformation?

One which maps straight lines only to straight lines.

Why are Tait-Bryan angles and Euler angles discouraged?

They're a real dang mess

What is the advantage of a transformation matrix?

It is simple to combine or invert operations.