Transformations#
In graphics applications, it is common to apply transformations such as scaling, rotation, and translation to objects before displaying them on the screen. For example, consider a scenario where you want to move an object to a different position in the scene. To achieve this, you need to move all the vertices of the object by applying a transformation to them. Matrices are the mathematical entities we use to represent transformations. This means that if we treat vertex positions as vectors in a 3D space, we can multiply them by a matrix to change their position, orientation, and magnitude in the scene. In the example given, we would perform a matrix multiplication between each vertex position \(\mathbf{v}=(x,y,z)\) stored in the vertex buffer and a translation matrix \(T\) in order to change its coordinates.

\[\mathbf{w}=\mathbf{v}T\]

Here, \(\mathbf{w}\) represents the new position of the vertex \(\mathbf{v}\) after the transformation \(T\).
Now, it’s interesting how we can visualize this in two different ways. Obviously, we can move (translate) \(\mathbf{v}\) until its coordinates match those of \(\mathbf{w}\). Alternatively, we can translate the entire frame to achieve the same result. In Vectors, we established that points (and therefore positions) are vectors bound to the origin of a frame, which can be described by the coordinates of the corresponding arrowheads. In the illustration below, on the left side, we translate the arrowhead of \(\mathbf{v}\) by changing its coordinates, resulting in the vector/position \(\mathbf{w}\). On the right side of the illustration, we apply the same translation to the origin of the frame. As you can observe, the coordinates of \(\mathbf{v}\) with respect to the transformed frame A remain the same because the vector shares the same fate (transformation) as its frame of reference. However, the original frame B now “sees” \(\mathbf{v}\) as being applied to the origin of frame A. Consequently, the coordinates of \(\mathbf{v}\) with respect to the original frame B are the same as those of \(\mathbf{w}\). Thus, we can consider the transformed frame A as the starting frame, and the original frame B as the new frame of reference.
The same concept applies when rotating or scaling a vertex position. As we will explore in this tutorial, vector transformations and changes of coordinate systems are mathematically equivalent. In other words, transforming an object is equivalent to transforming its frame of reference, and vice versa.
For a more extensive and detailed presentation on transformations, you may find valuable information in [FH21] and [SE02].
Linear transformations#
A transformation \(L\) is linear if it satisfies the following two conditions for any two vectors \(\mathbf{u}\) and \(\mathbf{v}\), and any scalar \(k\):

\[L(\mathbf{u}+\mathbf{v})=L(\mathbf{u})+L(\mathbf{v})\]
\[L(k\mathbf{v})=kL(\mathbf{v})\]
In Vectors, it was demonstrated that a generic bound vector \(\mathbf{v}=(x,y,z)\) can be expressed as a linear combination as follows:

\[\mathbf{v}=x\mathbf{i}+y\mathbf{j}+z\mathbf{k}\]

Here, \(\mathbf{i}\), \(\mathbf{j}\), and \(\mathbf{k}\) are the basis vectors that describe a frame, and \(x\), \(y\), and \(z\) are the coefficients that determine the magnitude and direction of \(\mathbf{v}\) along each basis vector. Therefore, following the linearity conditions illustrated above, we can express the transformation of \(\mathbf{v}\) as:

\[L(\mathbf{v})=L(x\mathbf{i}+y\mathbf{j}+z\mathbf{k})=xL(\mathbf{i})+yL(\mathbf{j})+zL(\mathbf{k})=x\mathbf{f}+y\mathbf{g}+z\mathbf{h}\]
Here, \(\mathbf{f}\), \(\mathbf{g}\), and \(\mathbf{h}\) represent the transformed standard basis vectors. These vectors correspond to the standard basis vectors of the transformed frame. It’s important to note that the coordinates of \(\mathbf{f}\), \(\mathbf{g}\), and \(\mathbf{h}\) are expressed with respect to the original frame. For example, in the illustration below, a linear transformation \(L\) is applied to the standard basis vectors \(\mathbf{i}=(1,0)\) and \(\mathbf{j}=(0,1)\) of a frame B. This creates a new frame A (since the transformed standard basis vectors will represent the standard basis vectors in a new coordinate system). However, observe that, in the transformed frame A, the coordinates of the transformed \(\mathbf{i}\) remain \((1,0)\) ─ the same applies to \(L(\mathbf{j})\) in frame A. On the other hand, we have \(L(\mathbf{i})=\mathbf{f}=(x',y')\) in frame B, which represents the original frame ─ again, the same applies to \(L(\mathbf{j})\) in frame B. Therefore, we can consider the transformed frame A as our initial starting point, where \(\mathbf{i}\) and \(\mathbf{j}\) were \((1,0)\) and \((0,1)\), while frame B represents the final result after applying the transformation \(L\) to both \(\mathbf{i}\) and \(\mathbf{j}\).
Then, we can conclude that applying a linear transformation to a vector \(\mathbf{v}\) is equivalent to applying the same transformation to the initial frame, represented by its standard basis vectors. This is crucial because it means that transforming a vector \(\mathbf{v}\) with a linear transformation can be achieved by transforming the standard basis vectors of the frame in which the vector is defined.
Furthermore, it has been shown that the coordinates of the transformed vector \(\mathbf{w}\) can be obtained as a linear combination of the transformed basis vectors, with the components of \(\mathbf{v}\) used as coefficients. From Matrices, we know that a linear combination can be expressed as the product of a row vector and a column vector, which is related to the concept of dot product. Therefore, we can express the transformation as follows:

\[\mathbf{w}=x\mathbf{f}+y\mathbf{g}+z\mathbf{h}=\left\lbrack\matrix{x&y&z}\right\rbrack\left\lbrack\matrix{\mathbf{f}\cr \mathbf{g}\cr \mathbf{h}}\right\rbrack=\mathbf{v}\cdot\mathbf{m}=\mathbf{vM} \tag{1}\label{eq:ATransforms1}\]
where \(\mathbf{v}=(x, y, z)\), \(\mathbf{m}=(\mathbf{f},\mathbf{g},\mathbf{h})\), \(\mathbf{f}=\left\lbrack\matrix{M_{00}&M_{01}&M_{02}}\right\rbrack\), \(\mathbf{g}=\left\lbrack\matrix{M_{10}&M_{11}&M_{12}}\right\rbrack\) and \(\mathbf{h}=\left\lbrack\matrix{M_{20}&M_{21}&M_{22}}\right\rbrack\).
That’s exactly what we have seen in Matrices. Indeed, if we perform the multiplication between the row vector \(\mathbf{v}\) and the matrix \(\mathbf{M}\), we have

\[\mathbf{vM}=\left\lbrack\matrix{x&y&z}\right\rbrack\left\lbrack\matrix{M_{00}&M_{01}&M_{02}\cr M_{10}&M_{11}&M_{12}\cr M_{20}&M_{21}&M_{22}}\right\rbrack=\big(xM_{00}+yM_{10}+zM_{20},\ \ xM_{01}+yM_{11}+zM_{21},\ \ xM_{02}+yM_{12}+zM_{22}\big)=x\mathbf{f}+y\mathbf{g}+z\mathbf{h}\]

We just found that to transform a vector \(\mathbf{v}\) we need to multiply it by a matrix \(\mathbf{M}\) whose rows are the transformed standard basis vectors. In other words, we can express this relationship as:

\[\mathbf{w}=\mathbf{vM}\]
where \(\mathbf{M}\) is the matrix that transforms a vector \(\mathbf{v}\) into another vector \(\mathbf{w}\).
However, we can also interpret \(\mathbf{M}\) as the matrix to go from a frame A (the starting one) to a frame B (the new one), which allows a vector \(\mathbf{v}\) bound to the origin of frame A to be expressed with respect to frame B. We can demonstrate this intricate conclusion with a simple illustration.
On the left side of the image above, we have a vector \(\mathbf{p}_A=(3,2)\) applied to the origin of a frame A. Recall that a vector can be expressed as a linear combination of the standard basis vectors, using its components as coefficients. Thus, in frame A, we have:

\[\mathbf{p}_A=3\mathbf{i}+2\mathbf{j}=3(1,0)+2(0,1)=(3,2)\]
This representation corresponds to the diagonal of the parallelogram formed by scaling the vectors \(\mathbf{i}\) and \(\mathbf{j}\) using the components of \(\mathbf{p}_A\) as coefficients.
Now, imagine scaling frame A by a factor of two along both the x- and y-axes, followed by a rotation around the origin. This transformation results in the vector \(\mathbf{p}_A\) being doubled in size and rotated with respect to the original frame, which can therefore be considered a new frame B, because in the transformed frame A the coordinates of \(\mathbf{p}_A\) are still \((3,2)\). As stated earlier, \(L(\mathbf{i})\) and \(L(\mathbf{j})\) are expressed in coordinates with respect to the original frame B. Now, our goal is to demonstrate that, using these transformed basis vectors as rows of a matrix, we can compute the coordinates of \(\mathbf{p}_A\) with respect to frame B. In other words, we aim to show that this matrix enables us to transition from frame A to frame B. As depicted on the right side of the illustration above, we have that

\[\mathbf{p}_B=3L(\mathbf{i})+2L(\mathbf{j})=3\mathbf{f}+2\mathbf{g}\]
That is, \(\mathbf{p}_A\) with respect to the frame B is the diagonal of the transformed standard basis vectors, scaled using the components of \(\mathbf{p}_A\). In other words, the coordinates of \(\mathbf{p}_A\) with respect to frame B can be expressed as a linear combination of the transformed basis vectors, using the components of \(\mathbf{p}_A\) as coefficients. This is exactly what we formally proved in equation \(\eqref{eq:ATransforms1}\), where we used the transformed basis vectors as rows of the matrix \(\mathbf{M}\).
Thus, we have just demonstrated that \(\mathbf{M}\) is the matrix to express the coordinates of a vector in a frame A with respect to a frame B. That is, we have shown that \(\mathbf{M}\) allows us to go from frame A (the original one) to frame B (the transformed one). At this point, it should come as no surprise that the inverse matrix \(\mathbf{M}^{-1}\) represents the transformation to go from frame B back to frame A.
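To make this change-of-frame reading concrete, here is a minimal DirectXMath sketch (the values of \(\mathbf{f}\) and \(\mathbf{g}\) are illustrative, not taken from the figures above) that builds a matrix whose rows are the transformed basis vectors and uses it to re-express a vector:

#include <DirectXMath.h>
#include <cstdio>
using namespace DirectX;

int main()
{
    // Transformed basis vectors, expressed in the original frame B:
    // f = L(i) = (0, 2, 0) and g = L(j) = (-2, 0, 0), i.e. a uniform
    // scaling by 2 followed by a 90-degree rotation in the xy-plane.
    XMMATRIX M = XMMatrixSet( 0.0f, 2.0f, 0.0f, 0.0f,   // row 0: f
                             -2.0f, 0.0f, 0.0f, 0.0f,   // row 1: g
                              0.0f, 0.0f, 1.0f, 0.0f,   // row 2: h (unchanged)
                              0.0f, 0.0f, 0.0f, 1.0f);  // row 3: no translation

    XMVECTOR pA = XMVectorSet(3.0f, 2.0f, 0.0f, 0.0f);  // p_A = (3, 2, 0) in frame A

    // Row vector times matrix: the same point expressed in frame B.
    XMFLOAT3 pB;
    XMStoreFloat3(&pB, XMVector3Transform(pA, M));
    std::printf("p_B = (%g, %g, %g)\n", pB.x, pB.y, pB.z);  // (-4, 6, 0) = 3f + 2g
}

XMVector3Transform multiplies the row vector by the matrix, which is exactly the \(\mathbf{w}=\mathbf{vM}\) product derived above.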
Scaling#
In a 3D Cartesian coordinate system, vectors can be scaled independently in three directions by scaling their respective components. That is, given a scaling \(S\) and a vector \(\mathbf{v}=(x,y,z)\) we have

\[S(\mathbf{v})=(s_x x,\ \ s_y y,\ \ s_z z)\]
Here, \(s_x\), \(s_y\), and \(s_z\) represent the scaling factors associated with the x, y, and z directions, respectively. This allows us to stretch or shrink vectors in each direction independently, altering their overall size and proportions.
The scaling is uniform if the scaling factors are equal \((s_x=s_y=s_z)\), otherwise it’s non-uniform. We can formally prove that scaling is a linear transformation by showing that it satisfies the two conditions of linearity.

Proof

\[S(\mathbf{u}+\mathbf{v})=\big(s_x(u_x+v_x),\ \ s_y(u_y+v_y),\ \ s_z(u_z+v_z)\big)=(s_xu_x,\ s_yu_y,\ s_zu_z)+(s_xv_x,\ s_yv_y,\ s_zv_z)=S(\mathbf{u})+S(\mathbf{v})\]
\[S(k\mathbf{v})=(s_xkv_x,\ \ s_ykv_y,\ \ s_zkv_z)=k(s_xv_x,\ s_yv_y,\ s_zv_z)=kS(\mathbf{v})\]
We know that a linear transformation can be associated with a matrix whose rows are the transformed standard basis vectors. In 3D Cartesian coordinate systems, we have that

\[S(\mathbf{i})=(s_x\cdot 1,\ \ s_y\cdot 0,\ \ s_z\cdot 0)=(s_x,\ 0,\ 0)\]
\[S(\mathbf{j})=(0,\ s_y,\ 0)\qquad\qquad S(\mathbf{k})=(0,\ 0,\ s_z)\]

Then, the matrix \(\mathbf{S}\) associated with a scaling is

\[\mathbf{S}=\left\lbrack\matrix{s_x&0&0\cr 0&s_y&0\cr 0&0&s_z}\right\rbrack\]
Consequently, we can apply a scaling to any 3D vector by multiplying it with the scaling matrix \(\mathbf{S}\). If the intention is to restore the vector to its original size, we can accomplish this by multiplying the scaled vector by the inverse of the scaling matrix:

\[\mathbf{S}^{-1}=\left\lbrack\matrix{1/s_x&0&0\cr 0&1/s_y&0\cr 0&0&1/s_z}\right\rbrack\]
Given a minimum point \(\mathbf{p}=(-2,0,-2)\) and a maximum point \(\mathbf{q}=(2,0,2)\) defining a square, if you want to scale it by a factor of \(2\) along the x-axis, \(0.5\) along the z-axis, and leave the y-coordinate unchanged, you can use the corresponding scaling matrix:

\[\mathbf{S}=\left\lbrack\matrix{2&0&0\cr 0&1&0\cr 0&0&0.5}\right\rbrack\]

To scale the square, we need to multiply both \(\mathbf{p}\) and \(\mathbf{q}\) by \(\mathbf{S}\).

\[\mathbf{pS}=\left\lbrack\matrix{-2&0&-2}\right\rbrack\left\lbrack\matrix{2&0&0\cr 0&1&0\cr 0&0&0.5}\right\rbrack=(-4,\ 0,\ -1)\]
\[\mathbf{qS}=\left\lbrack\matrix{2&0&2}\right\rbrack\left\lbrack\matrix{2&0&0\cr 0&1&0\cr 0&0&0.5}\right\rbrack=(4,\ 0,\ 1)\]
The following illustration shows the result of these transformations.
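Beyond the illustration, we can verify the result numerically with a small DirectXMath sketch (XMMatrixScaling and XMVector3TransformCoord are only introduced later in this tutorial; they are used here purely as a sanity check):

#include <DirectXMath.h>
#include <cstdio>
using namespace DirectX;

int main()
{
    // Scale by 2 along x, leave y unchanged, scale by 0.5 along z.
    XMMATRIX S = XMMatrixScaling(2.0f, 1.0f, 0.5f);

    XMVECTOR p = XMVectorSet(-2.0f, 0.0f, -2.0f, 1.0f);  // minimum point
    XMVECTOR q = XMVectorSet( 2.0f, 0.0f,  2.0f, 1.0f);  // maximum point

    XMFLOAT3 ps, qs;
    XMStoreFloat3(&ps, XMVector3TransformCoord(p, S));   // treats p as a point (w = 1)
    XMStoreFloat3(&qs, XMVector3TransformCoord(q, S));
    std::printf("pS = (%g, %g, %g)\n", ps.x, ps.y, ps.z);  // (-4, 0, -1)
    std::printf("qS = (%g, %g, %g)\n", qs.x, qs.y, qs.z);  // ( 4, 0,  1)
}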
Rotation#
A rotation is a linear transformation, but we won’t provide a formal proof of this. It is simpler to observe, in the illustration below, that rotating the sum of two vectors (that is, the diagonal of the parallelogram defined by the two vectors) is equivalent to summing the rotations of the two vectors (that is, taking the diagonal of the rotated parallelogram). At the same time, rotating a uniformly scaled vector is equivalent to first rotating the vector and then applying the uniform scaling.
Finding the matrix associated with a rotation can be slightly more challenging compared to scaling. However, for the purpose of our discussion, we will consider \(R_\mathbf{n}(\mathbf{v})\) as a clockwise rotation of a vector \(\mathbf{v}\) around a unit vector \(\mathbf{n}\) by an angle \(\theta\). That is, you will see the vector \(\mathbf{v}\) rotating clockwise when looking down the positive direction of the unit vector \(\mathbf{n}\), which determines the axis of rotation. Obviously, the result of \(R_\mathbf{n}(\mathbf{v})\) is the rotated version of the vector \(\mathbf{v}\) passed as an argument.
If a counterclockwise rotation is desired, the angle \(\theta\) can simply be negated. Additionally, if \(\mathbf{n}\) is not a unit vector, we can always normalize it, allowing us to extend our results to the more general case.
Now, let’s consider the left side of the illustration below.
From Vectors, we know that \(\mathbf{n}\times\mathbf{v}\) is orthogonal to the plane defined by \(\mathbf{n}\) and \(\mathbf{v}\). During the rotation, the tip of the vector \(\mathbf{v}\) traces out an arc of a circle whose plane is also orthogonal to the plane defined by \(\mathbf{n}\) and \(\mathbf{v}\). Therefore, the vector \(\mathbf{n}\times\mathbf{v}\), applied to the center of that circle, lies in the same plane as the circle drawn by \(\mathbf{v}\).
Now, let’s consider the projection of the vector \(\mathbf{v}\) onto \(\mathbf{n}\). It can be expressed as the difference between two vectors; this result can also be derived from Gram-Schmidt Orthogonalization.

\[\text{proj}_{\mathbf{n}}(\mathbf{v})=(\mathbf{n}\cdot\mathbf{v})\,\mathbf{n}=\mathbf{v}-\mathbf{v}_\bot\]

where \(\mathbf{v}_\bot\) is the orthogonal component of \(\mathbf{v}\) with respect to \(\mathbf{n}\). Therefore, we have that

\[\mathbf{v}_\bot=\mathbf{v}-(\mathbf{n}\cdot\mathbf{v})\,\mathbf{n}\]
The vector \(\mathbf{v}_\bot\) lies in the plane defined by \(\mathbf{n}\) and \(\mathbf{v}\) (since it’s the difference between \(\mathbf{v}\) and a scalar multiple of \(\mathbf{n}\)), and therefore it’s orthogonal to \(\mathbf{n}\times\mathbf{v}\). \(R_\mathbf{n}(\mathbf{v}_\bot)\) is the orthogonal component of \(R_\mathbf{n}(\mathbf{v})\) with respect to \(\mathbf{n}\). Moreover, observe that

\[|\mathbf{n}\times\mathbf{v}|=|\mathbf{n}||\mathbf{v}|\sin\alpha=|\mathbf{v}|\sin\alpha=|\mathbf{v}_\bot|\]

where \(\alpha\) is the angle between \(\mathbf{n}\) and \(\mathbf{v}\).
That is, \(\mathbf{n}\times\mathbf{v}\) and \(\mathbf{v}_\bot\) have the same length, and are orthogonal to each other. So, they can define a 2D frame (just like the standard basis vectors \(\mathbf{i}\) and \(\mathbf{j}\)) where we can compute the coordinates of \(R_\mathbf{n}(\mathbf{v}_\bot)\), which can be considered the rotation of the basis vector \(\mathbf{v}_\bot\) about \(\mathbf{n}\). This means that \(R_\mathbf{n}(\mathbf{v}_\bot)\) has the same length as both \(\mathbf{n}\times\mathbf{v}\) and \(\mathbf{v}_\bot\). As you can see in the image above (on the right), the orthogonal projection of \(R_\mathbf{n}(\mathbf{v}_\bot)\) onto \(\mathbf{v}_\bot\) is

\[\text{proj}_{\mathbf{v}_\bot}\big(R_\mathbf{n}(\mathbf{v}_\bot)\big)=\cos\theta\ \mathbf{v}_\bot\]

In a similar way, the orthogonal projection of \(R_\mathbf{n}(\mathbf{v}_\bot)\) onto \(\mathbf{n}\times\mathbf{v}\) is

\[\text{proj}_{\mathbf{n}\times\mathbf{v}}\big(R_\mathbf{n}(\mathbf{v}_\bot)\big)=\sin\theta\ (\mathbf{n}\times\mathbf{v})\]

Therefore, we have that

\[R_\mathbf{n}(\mathbf{v}_\bot)=\cos\theta\ \mathbf{v}_\bot+\sin\theta\ (\mathbf{n}\times\mathbf{v})\]
Now, we can compute \(R_\mathbf{n}(\mathbf{v})\) as the sum of \(\text{proj}_\mathbf{n}(\mathbf{v})\) and \(R_\mathbf{n}(\mathbf{v}_\bot)\).

\[\begin{split}R_\mathbf{n}(\mathbf{v})&=\text{proj}_\mathbf{n}(\mathbf{v})+R_\mathbf{n}(\mathbf{v}_\bot)\\&=(\mathbf{n}\cdot\mathbf{v})\,\mathbf{n}+\cos\theta\ \mathbf{v}_\bot+\sin\theta\ (\mathbf{n}\times\mathbf{v})\\&=(\mathbf{n}\cdot\mathbf{v})\,\mathbf{n}+\cos\theta\ \big(\mathbf{v}-(\mathbf{n}\cdot\mathbf{v})\,\mathbf{n}\big)+\sin\theta\ (\mathbf{n}\times\mathbf{v})\\&=\cos\theta\ \mathbf{v}+(1-\cos\theta)(\mathbf{n}\cdot\mathbf{v})\,\mathbf{n}+\sin\theta\ (\mathbf{n}\times\mathbf{v})\end{split}\]
As you can see, the orthogonal component \(\mathbf{v}_\bot\) has disappeared from the equation, so we can compute any rotation by knowing only the vector to rotate \((\mathbf{v})\), the angle of rotation \((\theta)\), and the unit vector \((\mathbf{n})\) indicating the axis of rotation.
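The formula also translates directly into code. The following helper (RotateAboutAxis is a hypothetical name, not a DirectXMath function) implements it with dot and cross products, and compares the result against the library’s XMMatrixRotationNormal, examined later in this tutorial:

#include <DirectXMath.h>
#include <cstdio>
using namespace DirectX;

// R_n(v) = cos(t) v + (1 - cos(t)) (n . v) n + sin(t) (n x v),
// with n a unit vector; a direct transcription of the formula above.
XMVECTOR RotateAboutAxis(FXMVECTOR v, FXMVECTOR n, float theta)
{
    float s, c;
    XMScalarSinCos(&s, &c, theta);
    XMVECTOR nDotV = XMVector3Dot(n, v);  // (n . v) replicated in all components
    return XMVectorAdd(
        XMVectorAdd(XMVectorScale(v, c),
                    XMVectorScale(XMVectorMultiply(nDotV, n), 1.0f - c)),
        XMVectorScale(XMVector3Cross(n, v), s));
}

int main()
{
    XMVECTOR n = XMVector3Normalize(XMVectorSet(1.0f, 1.0f, 0.0f, 0.0f));
    XMVECTOR v = XMVectorSet(0.0f, 0.0f, 3.0f, 0.0f);

    XMFLOAT3 a, b;
    XMStoreFloat3(&a, RotateAboutAxis(v, n, XM_PIDIV4));
    XMStoreFloat3(&b, XMVector3TransformNormal(v, XMMatrixRotationNormal(n, XM_PIDIV4)));
    std::printf("formula: (%g, %g, %g)\n", a.x, a.y, a.z);
    std::printf("matrix:  (%g, %g, %g)\n", b.x, b.y, b.z);  // should match
}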
At this point, we can transform the standard basis vectors to compute the row vectors that make up the rotation matrix \(\mathbf{R}_\mathbf{n}\).

\[R_\mathbf{n}(\mathbf{i})=\cos\theta\ \mathbf{i}+(1-\cos\theta)(\mathbf{n}\cdot\mathbf{i})\,\mathbf{n}+\sin\theta\ (\mathbf{n}\times\mathbf{i})\]
\[R_\mathbf{n}(\mathbf{j})=\cos\theta\ \mathbf{j}+(1-\cos\theta)(\mathbf{n}\cdot\mathbf{j})\,\mathbf{n}+\sin\theta\ (\mathbf{n}\times\mathbf{j})\]
\[R_\mathbf{n}(\mathbf{k})=\cos\theta\ \mathbf{k}+(1-\cos\theta)(\mathbf{n}\cdot\mathbf{k})\,\mathbf{n}+\sin\theta\ (\mathbf{n}\times\mathbf{k})\]
where \(\mathbf{n}=(x, y, z)\), \(\mathbf{i}=(1,0,0)\), \(\mathbf{j}=(0,1,0)\), and \(\mathbf{k}=(0,0,1)\).
Therefore, we have that

\[\mathbf{R}_\mathbf{n}=\left\lbrack\matrix{c+(1-c)x^2 & (1-c)xy+sz & (1-c)xz-sy \cr (1-c)xy-sz & c+(1-c)y^2 & (1-c)yz+sx \cr (1-c)xz+sy & (1-c)yz-sx & c+(1-c)z^2}\right\rbrack\]

where \(c=\cos\theta\) and \(s=\sin\theta\).
For example, if we want to rotate about the x-, y- and z-axes, we need to set \(\mathbf{n}=\mathbf{i}=(1,0,0)\), \(\mathbf{n}=\mathbf{j}=(0,1,0)\), and \(\mathbf{n}=\mathbf{k}=(0,0,1)\), respectively. So, the corresponding rotation matrices are

\[\mathbf{R}_x=\left\lbrack\matrix{1&0&0\cr 0&c&s\cr 0&-s&c}\right\rbrack\qquad\mathbf{R}_y=\left\lbrack\matrix{c&0&-s\cr 0&1&0\cr s&0&c}\right\rbrack\qquad\mathbf{R}_z=\left\lbrack\matrix{c&s&0\cr -s&c&0\cr 0&0&1}\right\rbrack\]
Given a minimum point \(\mathbf{p}=(-2,0,-2)\) and a maximum point \(\mathbf{q}=(2,0,2)\) of a square, suppose you want to rotate it \(-45°\) clockwise (that is, \(45°\) counterclockwise) about the y-axis. The corresponding rotation matrix is:

\[\mathbf{R}_y=\left\lbrack\matrix{\frac{\sqrt{2}}{2}&0&\frac{\sqrt{2}}{2}\cr 0&1&0\cr -\frac{\sqrt{2}}{2}&0&\frac{\sqrt{2}}{2}}\right\rbrack\]

To rotate the square we need to multiply \(\mathbf{p}\) and \(\mathbf{q}\) by the rotation matrix \(\mathbf{R}_y\).

\[\mathbf{p}\mathbf{R}_y=\left\lbrack\matrix{-2&0&-2}\right\rbrack\mathbf{R}_y=(0,\ 0,\ -2\sqrt{2})\]
\[\mathbf{q}\mathbf{R}_y=\left\lbrack\matrix{2&0&2}\right\rbrack\mathbf{R}_y=(0,\ 0,\ 2\sqrt{2})\]
The following illustration shows the result of these transformations.
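Again, the result can be double-checked in code with a minimal sketch (XMMatrixRotationY is examined later in this tutorial):

#include <DirectXMath.h>
#include <cstdio>
using namespace DirectX;

int main()
{
    // -45 degrees about the y-axis (a 45-degree counterclockwise rotation
    // in the convention adopted above).
    XMMATRIX Ry = XMMatrixRotationY(-XM_PIDIV4);

    XMVECTOR p = XMVectorSet(-2.0f, 0.0f, -2.0f, 1.0f);
    XMVECTOR q = XMVectorSet( 2.0f, 0.0f,  2.0f, 1.0f);

    XMFLOAT3 pr, qr;
    XMStoreFloat3(&pr, XMVector3TransformCoord(p, Ry));
    XMStoreFloat3(&qr, XMVector3TransformCoord(q, Ry));
    std::printf("pR = (%g, %g, %g)\n", pr.x, pr.y, pr.z);  // about (0, 0, -2.83)
    std::printf("qR = (%g, %g, %g)\n", qr.x, qr.y, qr.z);  // about (0, 0,  2.83)
}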
You can easily verify that the rows of \(\mathbf{R}_x\) are unit vectors and orthogonal to each other (i.e., they make up an orthonormal set). The same applies to the rows of \(\mathbf{R}_y\), \(\mathbf{R}_z\) and the general rotation matrix \(\mathbf{R}_\mathbf{n}\) (although this last one is a little trickier to prove). A matrix whose rows compose an orthonormal set is called orthogonal. Orthogonal matrices are of particular interest since their inverse is equal to their transpose. That is, we have that

\[\mathbf{M}\mathbf{M}^T=\mathbf{M}^T\mathbf{M}=\mathbf{I}\qquad\Longrightarrow\qquad\mathbf{M}^{-1}=\mathbf{M}^T\]
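A quick numerical check of this property (a sketch, not a proof) is to compare the transpose of a rotation matrix against its inverse:

#include <DirectXMath.h>
#include <algorithm>
#include <cmath>
#include <cstdio>
using namespace DirectX;

int main()
{
    XMMATRIX R = XMMatrixRotationZ(XM_PI / 3.0f);  // any rotation will do

    XMFLOAT4X4 t, inv;
    XMStoreFloat4x4(&t, XMMatrixTranspose(R));
    XMStoreFloat4x4(&inv, XMMatrixInverse(nullptr, R));

    // For an orthogonal matrix the two must coincide (up to rounding).
    float maxDiff = 0.0f;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            maxDiff = std::max(maxDiff, std::fabs(t.m[i][j] - inv.m[i][j]));
    std::printf("max |R^T - R^-1| = %g\n", maxDiff);  // ~0
}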
Affine transformations#
Now that we know how to scale and rotate vectors, we’d also like to move them. However, we face two challenges when it comes to moving vectors:
We certainly can use translation to move bound vectors (points, e.g., vertex positions) to relocate 3D objects in the scene. However, for free vectors, this transformation doesn’t make any sense since direction and magnitude don’t change after a translation (the point of application for free vectors is irrelevant). Therefore, we need a way to distinguish between points (bound vectors) and (free) vectors.
Moving a point can’t be expressed as a linear combination of the standard basis vectors. In other words, translation is not a linear transformation, so we can’t associate a \(3\times 3\) matrix with it. Fortunately, we can still find a way to incorporate translations into our matrix equation \(\mathbf{w}=\mathbf{vM}\).
Affine transformations extend linear ones by adding translations. The next section will explore this type of transformation, aiming to resolve the aforementioned issues.
Translation#
To move a bound vector (point) \(\mathbf{p}=(p_x, p_y, p_z)\) using a translation \(T\), we simply add a displacement vector \(\mathbf{t}=(t_x, t_y, t_z)\) to the point.

\[T(\mathbf{p})=\mathbf{p}+\mathbf{t}=(p_x+t_x,\ \ p_y+t_y,\ \ p_z+t_z) \tag{2}\label{eq:ATransforms2}\]
For example, let’s consider a 2D point \(\mathbf{p}=(3, 3)\). If we want to translate it by \(+5\) units along the x-axis and \(+2\) units along the y-axis, we can add the displacement vector \(\mathbf{t}=(5, 2)\) to \(\mathbf{p}\), as shown in the following illustration.
At the same time, we can translate the frame using the same displacement vector \(\mathbf{t}\), which now specifies the offset of the origin of the translated frame A (the starting one) with respect to the original frame B (the new one). In frame A, the coordinates of the bound vector (point) \(\mathbf{p}\) are still the same, while in frame B the coordinates of \(\mathbf{p}\) can be seen as the sum of \(\mathbf{p}\) and \(\mathbf{t}\).
So, it seems that translation of vectors and translation of coordinate systems are mathematically equivalent, just like with linear transformations (in the next section we will formally prove that this is true in general for affine transformations). This means that we can translate a frame to apply the same transformation to vectors defined in that frame. So far so good, but the question is: can we find a \(3\times 3\) matrix \(\mathbf{T}\) to associate with translations in 3D spaces? Unfortunately, the answer is no.
To understand why, just take a look again at the definition of \(T(\mathbf{p})\) provided in equation \(\eqref{eq:ATransforms2}\). It’s a simple sum of vectors. On the other hand, in a vector-matrix multiplication, we have the dot product of two vectors (the row vector and a column of the matrix). Therefore, there is no way we can find a \(3\times 3\) matrix \(\mathbf{T}\) so that \(\mathbf{w}=\mathbf{p}+\mathbf{t}=\mathbf{pT}\). This is due to the fact that translations are not linear transformations.
Proof

\[T(\mathbf{u}+\mathbf{v})=\mathbf{u}+\mathbf{v}+\mathbf{t}\]
\[T(\mathbf{u})+T(\mathbf{v})=(\mathbf{u}+\mathbf{t})+(\mathbf{v}+\mathbf{t})=\mathbf{u}+\mathbf{v}+2\mathbf{t}\]
Therefore, \(T(\mathbf{u}+\mathbf{v}) \neq T(\mathbf{u})+T(\mathbf{v})\), indicating that the first condition of linearity does not hold.
However, we can still find a way to incorporate translations into our matrix equation \(\mathbf{w}=\mathbf{vM}\) by employing homogeneous coordinates. This allows us to retain the form \(\mathbf{w}=\mathbf{p}+\mathbf{t}=\mathbf{pT}\), where \(\mathbf{T}\) is a translation matrix.
Homogeneous coordinates#
As explained in Vectors, homogeneous coordinates introduce an “extra” coordinate. This means that, starting from a 3D Cartesian system, we step into the fourth dimension. Fortunately, we just pop in to pick up what we need, and leave immediately after.
A point in 3D space can be represented in homogeneous coordinates by the tuple \((x,y,z,w)\), where the related 3D Cartesian coordinates are \((x/w,\ y/w,\ z/w)\). So, the generic 3D Cartesian coordinates \((x, y, z)\) can be written as \((x, y, z, 1)\) in homogeneous coordinates. This way, we can use the first three components as regular 3D Cartesian coordinates, and the last component to differentiate between points and vectors.
Points: \(\ (x, y, z, 1)\)
Vectors: \(\ (x, y, z, 0)\)
We need this distinction between points and vectors as we want to apply translations to points without affecting vectors. Now, since we are using four components, we can try to find a \(4\times 4\) matrix \(\mathbf{T}\) associated with translations in 3D spaces. Let’s start with the identity matrix, using the offset of the translated frame \(\mathbf{t}=(t_x, t_y, t_z, 1)\) as the last row.

\[\mathbf{T}=\left\lbrack\matrix{1&0&0&0\cr 0&1&0&0\cr 0&0&1&0\cr t_x&t_y&t_z&1}\right\rbrack\]

Let’s see what happens if we multiply a generic point \(\mathbf{p}=(p_x, p_y, p_z, 1)\) by \(\mathbf{T}\).

\[\mathbf{pT}=\left\lbrack\matrix{p_x&p_y&p_z&1}\right\rbrack\left\lbrack\matrix{1&0&0&0\cr 0&1&0&0\cr 0&0&1&0\cr t_x&t_y&t_z&1}\right\rbrack=(p_x+t_x,\ \ p_y+t_y,\ \ p_z+t_z,\ \ 1)\]

That’s exactly the definition of \(T(\mathbf{p})\) in equation \(\eqref{eq:ATransforms2}\), but expressed in homogeneous coordinates (that is, with the last component equal to \(1\)). Now, let’s see what happens if we multiply a generic vector \(\mathbf{v}=(v_x, v_y, v_z, 0)\) by \(\mathbf{T}\).

\[\mathbf{vT}=\left\lbrack\matrix{v_x&v_y&v_z&0}\right\rbrack\left\lbrack\matrix{1&0&0&0\cr 0&1&0&0\cr 0&0&1&0\cr t_x&t_y&t_z&1}\right\rbrack=(v_x,\ v_y,\ v_z,\ 0)=\mathbf{v}\]

The vector \(\mathbf{v}\) is not affected by the translation. It looks like we just found a matrix associated with translation of points, but that does not affect directions. Of course, the inverse of the translation matrix is

\[\mathbf{T}^{-1}=\left\lbrack\matrix{1&0&0&0\cr 0&1&0&0\cr 0&0&1&0\cr -t_x&-t_y&-t_z&1}\right\rbrack\]
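The following sketch shows this behavior in DirectXMath (XMMatrixTranslation and the two transform helpers are presented later in this tutorial; note that XMVector3TransformCoord and XMVector3TransformNormal implicitly use \(w=1\) and \(w=0\), respectively, regardless of the w component supplied):

#include <DirectXMath.h>
#include <cstdio>
using namespace DirectX;

int main()
{
    XMMATRIX T = XMMatrixTranslation(5.0f, 2.0f, 0.0f);

    XMVECTOR u = XMVectorSet(3.0f, 3.0f, 0.0f, 0.0f);

    XMFLOAT3 p, v;
    XMStoreFloat3(&p, XMVector3TransformCoord(u, T));   // treated as a point (w = 1)
    XMStoreFloat3(&v, XMVector3TransformNormal(u, T));  // treated as a vector (w = 0)
    std::printf("point:  (%g, %g, %g)\n", p.x, p.y, p.z);  // (8, 5, 0): translated
    std::printf("vector: (%g, %g, %g)\n", v.x, v.y, v.z);  // (3, 3, 0): unchanged
}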
However, we’d like to also include linear transformations (scaling and rotation). For this purpose, we can try to embed the \(3\times 3\) matrix \(\mathbf{A}\) associated with linear transformations in the upper left position of \(\mathbf{T}\) as follows:

\[\mathbf{M}=\left\lbrack\matrix{A_{00}&A_{01}&A_{02}&0\cr A_{10}&A_{11}&A_{12}&0\cr A_{20}&A_{21}&A_{22}&0\cr t_x&t_y&t_z&1}\right\rbrack\]

Now, if we multiply a generic vector \(\mathbf{v}=(x, y, z, 0)\) by \(\mathbf{M}\) we have

\[\mathbf{vM}=x\mathbf{f}+y\mathbf{g}+z\mathbf{h}+0\,\mathbf{t}=x\mathbf{f}+y\mathbf{g}+z\mathbf{h}\]
where \(\mathbf{f}=(A_{00}, A_{01}, A_{02}, 0)\), \(\mathbf{g}=(A_{10}, A_{11}, A_{12}, 0)\) and \(\mathbf{h}=(A_{20}, A_{21}, A_{22}, 0)\) are the linearly transformed standard basis vectors \(\mathbf{i}=(1,0,0,0)\), \(\mathbf{j}=(0,1,0,0)\) and \(\mathbf{k}=(0,0,1,0)\), while \(\mathbf{t}=(t_x, t_y, t_z, 1)\) is still the position of the translated frame (the starting one) with respect to the original frame (the new one).
As you can see, the translation doesn’t affect the vector, and the result is the same we showed in equation \(\eqref{eq:ATransforms1}\): \(\mathbf{w}=\mathbf{vM}\). Indeed, when you transform vectors with \(\mathbf{M}\), these can only be affected by the linear transformation embedded within the matrix. Moreover, the result is still a vector: the last component is zero because it’s the result of the dot product between \(\mathbf{v}=(v_x, v_y, v_z, 0)\) and \((0,0,0,1)\), the last column of \(\mathbf{M}\).
On the other hand, if we multiply a generic point \(\mathbf{q}=(x, y, z, 1)\) by \(\mathbf{M}\) we have

\[\mathbf{qM}=x\mathbf{f}+y\mathbf{g}+z\mathbf{h}+\mathbf{t} \tag{3}\label{eq:ATransforms3}\]
Here, we transform \(\mathbf{q}\) with a linear transformation, which results in an intermediate vector \(\mathbf{q}_1=x\mathbf{f}+y\mathbf{g}+z\mathbf{h}\) to which we add the translation \(\mathbf{t}\). That’s exactly what we stated in equation \(\eqref{eq:ATransforms2}\): a translation is the sum of a point and a vector. However, in equation \(\eqref{eq:ATransforms3}\) the point is \(\mathbf{t}\), as it specifies the origin of the starting frame with respect to the new one. Also, the result is still a point: the last component is \(1\) because it’s the result of the dot product between \(\mathbf{q}=(x, y, z, 1)\) and \((0, 0, 0, 1)\), the last column of \(\mathbf{M}\).
Furthermore, the difference of two points is the vector from one point to the other. Indeed, from equation \(\eqref{eq:ATransforms3}\), given two points \(\mathbf{q}_1=(x_1,y_1,z_1,1)\) and \(\mathbf{q}_2=(x_2,y_2,z_2,1)\), we have that

\[\mathbf{q}_1\mathbf{M}-\mathbf{q}_2\mathbf{M}=(x_1-x_2)\,\mathbf{f}+(y_1-y_2)\,\mathbf{g}+(z_1-z_2)\,\mathbf{h}\]

which is a vector, as the translations cancel out and the last component is zero, so the result is not bound to the origin of the frame of reference (see the illustration below, on the left side). On the other hand, the sum of two points is a point as well, but it doesn’t make any sense geometrically (unlike the sum of two vectors, which gives the resultant force, direction, or speed).
In conclusion, the matrix \(\mathbf{M}\) we built in this section can be used to transform both vectors and points with affine transformations, as it embeds a linear transformation plus a translation. The first three rows represent the transformed standard basis vectors, while the fourth is a displacement that specifies the origin of the starting frame with respect to a new one. Most importantly, \(\mathbf{M}\) still allows us to express the coordinates of a vector in a starting frame with respect to a new one. As explained at the beginning of the tutorial, this implies that \(\mathbf{M}\) is a matrix that allows us to go from one frame to another, which means that change of coordinates and change of coordinate system are still mathematically equivalent in the context of affine transformations.
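As a sketch of this layout in code, an affine matrix can be assembled by writing the displacement into the fourth row of the matrix holding the linear part (MakeAffine is a hypothetical helper name, not a DirectXMath function):

#include <DirectXMath.h>
using namespace DirectX;

// Combine a linear transformation (rows 0-2: the transformed basis vectors)
// with a translation stored in row 3, as described above.
XMMATRIX MakeAffine(FXMMATRIX linear, FXMVECTOR translation)
{
    XMMATRIX M = linear;
    M.r[3] = XMVectorSetW(translation, 1.0f);  // row 3: t = (tx, ty, tz, 1)
    return M;
}

int main()
{
    XMMATRIX M = MakeAffine(XMMatrixRotationY(XM_PIDIV4),
                            XMVectorSet(10.0f, 0.0f, 5.0f, 0.0f));
    (void)M;  // use M to transform points with, e.g., XMVector3TransformCoord
}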
Composition of transformations#
Imagine you want to scale, rotate and translate a vector \(\mathbf{v}\), obtaining a vector \(\mathbf{w}\). Mathematically, this can be expressed in matrix form as follows:

\[\mathbf{w}=((\mathbf{vS})\mathbf{R})\mathbf{T} \tag{4}\label{eq:ATransforms4}\]
The vector \(\mathbf{v}\) and the resultant vector \(\mathbf{w}\) can be considered row vectors of dimension \(1\times n\), while the matrices \(\mathbf{S}\), \(\mathbf{R}\) and \(\mathbf{T}\) are all \(n\times n\).
In vector-matrix multiplication, we need to perform \((2n-1)\) operations for each component of the resultant vector: \((n-1)\) sums and \(n\) products. So, to obtain the total operations involved, we must multiply \((2n-1)\) by the \(n\) components of the resultant vector. Therefore, we have \(n(2n-1)\) operations to perform each vector-matrix multiplication. In equation \(\eqref{eq:ATransforms4}\), we need to perform three of these multiplications, so the total cost is \(3n(2n-1)=6n^2-3n=O(n^2)\).
However, thanks to the associative property of matrix multiplication we can re-write equation \(\eqref{eq:ATransforms4}\) as follows:

\[\mathbf{w}=\mathbf{v}(\mathbf{SRT}) \tag{5}\label{eq:ATransforms5}\]
Now, we have two matrix multiplications and a vector-matrix multiplication.
Each matrix multiplication needs \((2n-1)\) operations for each element of the resultant matrix. So, to obtain the total operations involved, we must multiply \((2n-1)\) by the \(n^2\) elements of the resultant matrix. Therefore, we have \((2n-1)n^2\) operations to perform a matrix multiplication. In equation \(\eqref{eq:ATransforms5}\), we have two of these multiplications, plus a vector-matrix multiplication, so the total cost is \(2(2n^3-n^2)+n(2n-1)=4n^3-n=O(n^3)\).
It seems that we need more operations to perform if we first multiply the matrices. However, imagine you want to transform 10 thousand vectors by the same matrices \(\mathbf{S}\), \(\mathbf{R}\) and \(\mathbf{T}\). In the first case we need to perform \(10000(3n(2n-1))\) operations, while in the second case we only need \(10000(n(2n-1))+2(2n^3-n^2)\) operations, as the double matrix multiplication \((\mathbf{SRT})\) must be computed only once. That is, we can reuse the result of \((\mathbf{SRT})\) as a matrix to transform every vector. For example, if \(n=4\), we have \(10000(3n(2n-1))=840\text{k}\), while \(10000(n(2n-1))+2(2n^3-n^2)=280\text{k}\).
In conclusion, it is strongly recommended to transform vectors with a single matrix that is the composition of all the transformations you want to apply to them.
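For example, with DirectXMath the composed matrix is computed once and then reused for every vertex (a minimal sketch with illustrative vertex data):

#include <DirectXMath.h>
#include <vector>
#include <cstdio>
using namespace DirectX;

int main()
{
    // Compose scaling, rotation and translation once (order matters: S, then R, then T).
    XMMATRIX S = XMMatrixScaling(2.0f, 2.0f, 2.0f);
    XMMATRIX R = XMMatrixRotationY(XM_PIDIV4);
    XMMATRIX T = XMMatrixTranslation(10.0f, 0.0f, 5.0f);
    XMMATRIX M = S * R * T;  // XMMATRIX overloads * as matrix multiplication

    // Reuse M for every vertex instead of applying S, R and T separately.
    std::vector<XMFLOAT3> vertices = { {-2.0f, 0.0f, -2.0f}, {2.0f, 0.0f, 2.0f} };
    for (XMFLOAT3& v : vertices)
    {
        XMVECTOR pos = XMLoadFloat3(&v);
        XMStoreFloat3(&v, XMVector3TransformCoord(pos, M));
        std::printf("(%g, %g, %g)\n", v.x, v.y, v.z);
    }
}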
Transformations in DirectX#
DirectXMath offers convenient helper functions for constructing \(4\times 4\) matrices to transform (i.e., scale, rotate and translate) both points (e.g., vertex positions) and vectors (e.g., light directions) in homogeneous coordinates. These functions often come in two flavors:
No-Intrinsics implementation: Written in standard C++, utilizing scalar operations. It works on any CPU regardless of its architecture, but may be slower than alternative options.
SSE-Intrinsics implementation: Leverages the power of Single Instruction, Multiple Data (SIMD) instructions available on modern CPUs, exposed through intrinsic functions provided by the Microsoft C++ compiler. This allows the CPU to perform calculations on multiple data elements simultaneously, resulting in significant performance gains.
Scaling#
XMMatrixScaling returns a matrix associated with a scaling operation, similar to the one examined in Scaling.
inline XMMATRIX XM_CALLCONV XMMatrixScaling
(
    float ScaleX,
    float ScaleY,
    float ScaleZ
) noexcept
{
#if defined(_XM_NO_INTRINSICS_)
    XMMATRIX M;
    M.m[0][0] = ScaleX;
    M.m[0][1] = 0.0f;
    M.m[0][2] = 0.0f;
    M.m[0][3] = 0.0f;
    M.m[1][0] = 0.0f;
    M.m[1][1] = ScaleY;
    M.m[1][2] = 0.0f;
    M.m[1][3] = 0.0f;
    M.m[2][0] = 0.0f;
    M.m[2][1] = 0.0f;
    M.m[2][2] = ScaleZ;
    M.m[2][3] = 0.0f;
    M.m[3][0] = 0.0f;
    M.m[3][1] = 0.0f;
    M.m[3][2] = 0.0f;
    M.m[3][3] = 1.0f;
    return M;
#elif defined(_XM_SSE_INTRINSICS_)
    XMMATRIX M;
    M.r[0] = _mm_set_ps(0, 0, 0, ScaleX);
    M.r[1] = _mm_set_ps(0, 0, ScaleY, 0);
    M.r[2] = _mm_set_ps(0, ScaleZ, 0, 0);
    M.r[3] = g_XMIdentityR3.v;
    return M;
#endif
}
In the previous code snippet, g_XMIdentityR3 is an XMVECTORF32 global constant defined as {0.0f, 0.0f, 0.0f, 1.0f}. The _mm_set_ps intrinsic function returns a __m128 value initialized (in a single instruction) with the floating-point values provided as arguments. Below, you’ll find the declaration of _mm_set_ps, along with a concise explanation of how its parameters map to the bits of the return value (denoted as dst). In particular, observe that the last parameter e0 initializes the least significant 32 bits of the returned __m128 value, corresponding to the first component of a 4-component vector of floating-point values.
__m128 _mm_set_ps (float e3, float e2, float e1, float e0);
dst[31:0] = e0
dst[63:32] = e1
dst[95:64] = e2
dst[127:96] = e3
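A tiny standalone sketch (plain SSE, independent of DirectXMath) makes this ordering visible:

#include <xmmintrin.h>
#include <cstdio>

int main()
{
    // Arguments go from most to least significant element, so e0 (the last
    // argument) ends up in the first float of the vector.
    __m128 v = _mm_set_ps(3.0f /*e3*/, 2.0f /*e2*/, 1.0f /*e1*/, 0.5f /*e0*/);

    float out[4];
    _mm_storeu_ps(out, v);
    std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  // 0.5 1 2 3
}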
Rotation#
XMMatrixRotationNormal returns a matrix that represents a rotation around an axis specified by a unit vector. The resultant matrix is similar to the one examined in Rotation.
 1  inline XMMATRIX XM_CALLCONV XMMatrixRotationNormal
 2  (
 3      FXMVECTOR NormalAxis,
 4      float Angle
 5  ) noexcept
 6  {
 7  #if defined(_XM_NO_INTRINSICS_)
 8
 9      float fSinAngle;
10      float fCosAngle;
11      XMScalarSinCos(&fSinAngle, &fCosAngle, Angle);
12
13      XMVECTOR A = XMVectorSet(fSinAngle, fCosAngle, 1.0f - fCosAngle, 0.0f);
14
15      XMVECTOR C2 = XMVectorSplatZ(A);
16      XMVECTOR C1 = XMVectorSplatY(A);
17      XMVECTOR C0 = XMVectorSplatX(A);
18
19      XMVECTOR N0 = XMVectorSwizzle<XM_SWIZZLE_Y, XM_SWIZZLE_Z, XM_SWIZZLE_X, XM_SWIZZLE_W>(NormalAxis);
20      XMVECTOR N1 = XMVectorSwizzle<XM_SWIZZLE_Z, XM_SWIZZLE_X, XM_SWIZZLE_Y, XM_SWIZZLE_W>(NormalAxis);
21
22      XMVECTOR V0 = XMVectorMultiply(C2, N0);
23      V0 = XMVectorMultiply(V0, N1);
24
25      XMVECTOR R0 = XMVectorMultiply(C2, NormalAxis);
26      R0 = XMVectorMultiplyAdd(R0, NormalAxis, C1);
27
28      XMVECTOR R1 = XMVectorMultiplyAdd(C0, NormalAxis, V0);
29      XMVECTOR R2 = XMVectorNegativeMultiplySubtract(C0, NormalAxis, V0);
30
31      V0 = XMVectorSelect(A, R0, g_XMSelect1110.v);
32      XMVECTOR V1 = XMVectorPermute<XM_PERMUTE_0Z, XM_PERMUTE_1Y, XM_PERMUTE_1Z, XM_PERMUTE_0X>(R1, R2);
33      XMVECTOR V2 = XMVectorPermute<XM_PERMUTE_0Y, XM_PERMUTE_1X, XM_PERMUTE_0Y, XM_PERMUTE_1X>(R1, R2);
34
35      XMMATRIX M;
36      M.r[0] = XMVectorPermute<XM_PERMUTE_0X, XM_PERMUTE_1X, XM_PERMUTE_1Y, XM_PERMUTE_0W>(V0, V1);
37      M.r[1] = XMVectorPermute<XM_PERMUTE_1Z, XM_PERMUTE_0Y, XM_PERMUTE_1W, XM_PERMUTE_0W>(V0, V1);
38      M.r[2] = XMVectorPermute<XM_PERMUTE_1X, XM_PERMUTE_1Y, XM_PERMUTE_0Z, XM_PERMUTE_0W>(V0, V2);
39      M.r[3] = g_XMIdentityR3.v;
40      return M;
41
42  #elif defined(_XM_SSE_INTRINSICS_)
43      float fSinAngle;
44      float fCosAngle;
45      XMScalarSinCos(&fSinAngle, &fCosAngle, Angle);
46
47      XMVECTOR C2 = _mm_set_ps1(1.0f - fCosAngle);
48      XMVECTOR C1 = _mm_set_ps1(fCosAngle);
49      XMVECTOR C0 = _mm_set_ps1(fSinAngle);
50
51      XMVECTOR N0 = XM_PERMUTE_PS(NormalAxis, _MM_SHUFFLE(3, 0, 2, 1));
52      XMVECTOR N1 = XM_PERMUTE_PS(NormalAxis, _MM_SHUFFLE(3, 1, 0, 2));
53
54      XMVECTOR V0 = _mm_mul_ps(C2, N0);
55      V0 = _mm_mul_ps(V0, N1);
56
57      XMVECTOR R0 = _mm_mul_ps(C2, NormalAxis);
58      R0 = _mm_mul_ps(R0, NormalAxis);
59      R0 = _mm_add_ps(R0, C1);
60
61      XMVECTOR R1 = _mm_mul_ps(C0, NormalAxis);
62      R1 = _mm_add_ps(R1, V0);
63      XMVECTOR R2 = _mm_mul_ps(C0, NormalAxis);
64      R2 = _mm_sub_ps(V0, R2);
65
66      V0 = _mm_and_ps(R0, g_XMMask3);
67      XMVECTOR V1 = _mm_shuffle_ps(R1, R2, _MM_SHUFFLE(2, 1, 2, 0));
68      V1 = XM_PERMUTE_PS(V1, _MM_SHUFFLE(0, 3, 2, 1));
69      XMVECTOR V2 = _mm_shuffle_ps(R1, R2, _MM_SHUFFLE(0, 0, 1, 1));
70      V2 = XM_PERMUTE_PS(V2, _MM_SHUFFLE(2, 0, 2, 0));
71
72      R2 = _mm_shuffle_ps(V0, V1, _MM_SHUFFLE(1, 0, 3, 0));
73      R2 = XM_PERMUTE_PS(R2, _MM_SHUFFLE(1, 3, 2, 0));
74
75      XMMATRIX M;
76      M.r[0] = R2;
77
78      R2 = _mm_shuffle_ps(V0, V1, _MM_SHUFFLE(3, 2, 3, 1));
79      R2 = XM_PERMUTE_PS(R2, _MM_SHUFFLE(1, 3, 0, 2));
80      M.r[1] = R2;
81
82      V2 = _mm_shuffle_ps(V2, V0, _MM_SHUFFLE(3, 2, 1, 0));
83      M.r[2] = V2;
84      M.r[3] = g_XMIdentityR3.v;
85      return M;
86  #endif
87  }
Regarding the No-Intrinsics implementation:
XMScalarSinCos calculates the sine and cosine of an angle ─ in this case, the rotation angle.
XMVectorSet returns an XMVECTOR with components initialized using the arguments passed to the function. So, we have \(A = \{\sin\theta,\ \cos\theta,\ (1-\cos\theta),\ 0\}\).
XMVectorSplat[X/Y/Z] returns an XMVECTOR with components all initialized using the [X/Y/Z] component of the vector passed to the function. Thus, \(C0 = \{\sin\theta,\ \sin\theta,\ \sin\theta,\ 0\}\), \(C1 = \{\cos\theta,\ \cos\theta,\ \cos\theta,\ 0\}\) and \(C2 = \{(1-\cos\theta),\ (1-\cos\theta),\ (1-\cos\theta),\ 0\}\).
XMVectorSwizzle returns an XMVECTOR with components specified by swizzling the vector passed to the function. In this case, we simply reorder the components of the vector \(\mathbf{n}\), specifying the rotation axis, so that \(N0 = \{y, z, x, w\}\) and \(N1 = \{z, x, y, w\}\).
XMVectorMultiply returns an XMVECTOR obtained by multiplying (component-wise) the two XMVECTORs passed to the function. In this case, writing \(c=\cos\theta\), \(s=\sin\theta\), and \(\mathbf{n}=(x,y,z,0)\) for the rotation axis, after line 23 of the listing above we have

\[V0=\big((1-c)yz,\ \ (1-c)zx,\ \ (1-c)xy,\ \ 0\big)\]
XMVectorMultiplyAdd is similar to XMVectorMultiply but, after multiplying the first two vectors, it adds the third vector passed to the function to the result. In this case, after line 26, we have

\[R0=\big((1-c)x^2+c,\ \ (1-c)y^2+c,\ \ (1-c)z^2+c,\ \ c\big)\]
XMVectorNegativeMultiplySubtract multiplies the first two vectors passed to the function, negates the product, and adds the third vector to it. That is, it returns the third argument minus the product of the first two. In this case, after line 29 (and the analogous XMVectorMultiplyAdd at line 28), we have

\[R1=\big(sx+(1-c)yz,\ \ sy+(1-c)zx,\ \ sz+(1-c)xy,\ \ 0\big)\]
\[R2=\big((1-c)yz-sx,\ \ (1-c)zx-sy,\ \ (1-c)xy-sz,\ \ 0\big)\]
XMVectorSelect returns an XMVECTOR with components selected between the two vectors passed to the function, based on the mask vector passed as the third argument. If a component of the mask vector is zero, the corresponding component of the first vector will be selected. If a component of the mask has all its bits set (0xFFFFFFFF), the corresponding component of the second vector will be selected. g_XMSelect1110 is a constant vector defined as {uint(-1), uint(-1), uint(-1), 0}. So, after line 31, we have

\[V0=\big((1-c)x^2+c,\ \ (1-c)y^2+c,\ \ (1-c)z^2+c,\ \ 0\big)\]
XMVectorPermute returns an XMVECTOR with components selected from the two vectors passed to the function based on the values of the four template parameters. Each of these parameters is an index from 0 to 7, indicating where the corresponding component of the new vector should be copied from. In fact, the components of the two vectors passed to the function are treated as a contiguous array of 8 elements. The constants XM_PERMUTE_0[X-W] are in the range \([0,3]\) and select components from the first vector, while the constants XM_PERMUTE_1[X-W] are in the range \([4,7]\) and select components from the second vector. So, after line 33, we have

\[V1=(R1_z,\ R2_y,\ R2_z,\ R1_x)=\big(sz+(1-c)xy,\ \ (1-c)zx-sy,\ \ (1-c)xy-sz,\ \ sx+(1-c)yz\big)\]
\[V2=(R1_y,\ R2_x,\ R1_y,\ R2_x)=\big(sy+(1-c)zx,\ \ (1-c)yz-sx,\ \ sy+(1-c)zx,\ \ (1-c)yz-sx\big)\]
Using XMVectorPermute again, along with the vectors \(V0\), \(V1\) and \(V2\), XMMatrixRotationNormal builds the rotation matrix around the NormalAxis vector. Compare the result with the rotation matrix presented in Rotation.
Regarding the SSE-Intrinsics implementation, the result is the same but achieved in a faster and more efficient way. Refer to [Mica] and [Micb] for implementation details.
Additionally, DirectXMath provides other rotation methods such as XMMatrixRotationX, XMMatrixRotationY, and XMMatrixRotationZ to rotate around the X-, Y-, and Z-axes of the coordinate system.
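As a usage sketch, rotating about the unit y-axis with the general helper should produce the same matrix as the dedicated one (an illustrative comparison, not taken from the DirectXMath documentation):

#include <DirectXMath.h>
#include <cstdio>
using namespace DirectX;

int main()
{
    const float angle = XM_PIDIV4;

    XMMATRIX Rn = XMMatrixRotationNormal(XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f), angle);
    XMMATRIX Ry = XMMatrixRotationY(angle);

    XMFLOAT4X4 a, b;
    XMStoreFloat4x4(&a, Rn);
    XMStoreFloat4x4(&b, Ry);
    for (int i = 0; i < 4; ++i)
        std::printf("row %d: (%.3f, %.3f, %.3f, %.3f) vs (%.3f, %.3f, %.3f, %.3f)\n", i,
                    a.m[i][0], a.m[i][1], a.m[i][2], a.m[i][3],
                    b.m[i][0], b.m[i][1], b.m[i][2], b.m[i][3]);
}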
Translation#
XMMatrixTranslation returns a translation matrix built from the specified offsets (components of the displacement vector). The resultant matrix is similar to the one examined in Translation.
inline XMMATRIX XM_CALLCONV XMMatrixTranslation
(
    float OffsetX,
    float OffsetY,
    float OffsetZ
) noexcept
{
#if defined(_XM_NO_INTRINSICS_)
    XMMATRIX M;
    M.m[0][0] = 1.0f;
    M.m[0][1] = 0.0f;
    M.m[0][2] = 0.0f;
    M.m[0][3] = 0.0f;
    M.m[1][0] = 0.0f;
    M.m[1][1] = 1.0f;
    M.m[1][2] = 0.0f;
    M.m[1][3] = 0.0f;
    M.m[2][0] = 0.0f;
    M.m[2][1] = 0.0f;
    M.m[2][2] = 1.0f;
    M.m[2][3] = 0.0f;
    M.m[3][0] = OffsetX;
    M.m[3][1] = OffsetY;
    M.m[3][2] = OffsetZ;
    M.m[3][3] = 1.0f;
    return M;
#elif defined(_XM_SSE_INTRINSICS_)
    XMMATRIX M;
    M.r[0] = g_XMIdentityR0.v;
    M.r[1] = g_XMIdentityR1.v;
    M.r[2] = g_XMIdentityR2.v;
    M.r[3] = XMVectorSet(OffsetX, OffsetY, OffsetZ, 1.f);
    return M;
#endif
}
In DirectXMath, g_XMIdentityR0 is a constant vector defined as {1.0f, 0.0f, 0.0f, 0.0f}, while g_XMIdentityR1 and g_XMIdentityR2 are defined as {0.0f, 1.0f, 0.0f, 0.0f} and {0.0f, 0.0f, 1.0f, 0.0f}, respectively.
DirectXMath provides many other helper functions to perform transformations on matrices and vectors. See [Mica] for a reference guide to these functions.
References#
[FH21] Gerald E. Farin and Dianne Hansford. Practical Linear Algebra: A Geometry Toolbox. CRC Press, 4th edition, 2021.

[Mica] Microsoft. DirectXMath. https://learn.microsoft.com/en-us/windows/win32/dxmath/directxmath-portal.

[Micb] Microsoft. x64 (amd64) intrinsics list. https://learn.microsoft.com/en-us/cpp/intrinsics/x64-amd64-intrinsics-list.

[SE02] Philip Schneider and David H. Eberly. Geometric Tools for Computer Graphics. Morgan Kaufmann, 1st edition, 2002.