Spaces
The magic of computer graphics lies in the ability to transform and manipulate objects within Cartesian coordinate systems, also known as spaces for brevity. Indeed, a fundamental concept that forms the backbone of many graphics pipelines is the transformation of 3D objects across various spaces in order to bring a virtual scene onto the screen.
In Transformations, we showed that to transform points and vectors we can transform the basis vectors representing the starting frame, so that we can express the coordinates with respect to a new space. Building upon this foundation, it's now interesting to look at the spaces commonly employed in the graphics pipeline, and at how to go from one space to another in order to project a 3D scene onto a 2D surface before showing the result on the screen.
From the initial local space where each object is defined, to the all-encompassing world space, and the camera space that offers a unique perspective, each space plays a crucial role in the intricate process of rendering. For a more extensive and detailed presentation on spaces used in computer graphics and transformations to go from one space to another, you may find valuable information in [MS22].
Object space
The object space, also known as local space, is the frame in which 3D objects (meshes) are defined. When creating 3D objects, graphic artists often work in a convenient space that simplifies vertex modeling by providing symmetry with respect to the origin of the coordinate system.

For instance, consider the modeling of a sphere. It is much easier to place all the vertices at an equal distance from the origin, rather than using an arbitrary point as the sphere's center. This intuitive choice is not only practical but can also be mathematically justified.
Note
The local space is the frame where the vertices of a mesh are defined in the first place. These vertices are often stored in a file on disk and can be loaded into memory to create the vertex buffer, which is subsequently sent to the input assembler. Within this buffer, the vertices remain in their local space representation until the graphics pipeline performs the necessary transformations to convert the 3D objects they represent into a 2D representation to show on the screen.
Note
Throughout this tutorial series, we'll use a left-handed coordinate system where the y-axis points upwards to represent the object space. As explained in Vectors, this is completely arbitrary and you can choose any configuration (z-up or y-up) and handedness that suits your needs.
World space and World matrix
When the input assembler sends its output to the next stage (the vertex shader), we have vertices in local space that we want to place in a 3D global scene shared by all objects. The space of the global scene is called world space, and the transformation to go from local to world space is called world transformation. To represent the world space, we will use the same convention as the object space: left-handed system with the y-axis pointing upwards.

As we know, to go from one frame to another, we need to express the basis vectors of the starting frame with respect to the new frame. So, we can build a matrix
Important
We rarely place every object at the same location in world space, so

Thus, we can define
where the first three rows of
Example 4
Given a cube in local space, suppose you want to double its size, rotate it by
The image below shows a 2D cross-section of the scene, obtained by looking down along the positive y-axis. Note that the first three rows of the
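As a rough illustration of how a world matrix can be assembled, the sketch below composes a uniform scale, a rotation about the y-axis, and a translation into a single matrix, using the row-vector convention (v' = v * M) adopted in this tutorial. The names `Mat4`, `Vec4`, `Transform`, and `MakeWorld` are hypothetical helpers introduced only for this example, and the values used in the usage note are arbitrary rather than those of Example 4. The rotation layout matches the one produced by DirectXMath's XMMatrixRotationY.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major, used with row vectors

// Row vector times matrix: v' = v * M.
Vec4 Transform(const Vec4& v, const Mat4& M)
{
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[c] += in[r] * M.m[r][c];
    return { out[0], out[1], out[2], out[3] };
}

// World matrix W = S * R_y * T (scale first, then rotate, then translate).
Mat4 MakeWorld(float scale, float angleY, float tx, float ty, float tz)
{
    assert(scale != 0.0f);   // a zero scale would collapse the object
    const float c = scale * std::cos(angleY);
    const float s = scale * std::sin(angleY);
    return {{
        {    c,  0.0f,   -s, 0.0f },    // scaled and rotated local x-axis
        { 0.0f, scale, 0.0f, 0.0f },    // scaled local y-axis
        {    s,  0.0f,    c, 0.0f },    // scaled and rotated local z-axis
        {   tx,    ty,   tz, 1.0f } }}; // local origin in world coordinates
}
```

For instance, with `MakeWorld(2.0f, pi / 2, 10.0f, 0.0f, 0.0f)`, the local point (1, 0, 0, 1) ends up at (10, 0, -2, 1) in world space, while the vector (1, 0, 0, 0) becomes (0, 0, -2, 0) because its zero w-coordinate ignores the translation row.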

View space and View matrix
Once we apply the world transformation, all objects are in world space. However, we still require a specific viewpoint to observe the 3D scene. This space is typically called view space, or camera space, and to represent it, we'll adopt a left-handed system similar to the ones used for local and world spaces, where the y-axis points upwards. To transition objects from world space to view space, we need to apply another transformation: the view transformation. This involves applying a view matrix (denoted by

Unlike the world transformation, where each object typically requires a separate transformation, the view transformation usually employs the same view matrix for all objects in the global scene. This is because we generally aim for a consistent viewpoint to observe the entire 3D scene. In other words, it's as if we considered the entire scene, encompassing all the objects, as a single large object that needs to be transformed from world space to view space.
Now, to build the view matrix, we can start considering the camera as an ordinary object we can place in world space. So, we can use a world matrix
Indeed, remember that the inverse of a rotation matrix is equivalent to its transpose (as explained in Matrices). Consequently, the view matrix
It's interesting to note that, since
because both
Now, we need to calculate
Observe that since this is the difference between two points in world coordinates, the result is a vector in world coordinates as well.
To compute

Therefore, we can calculate
As explained above, the vector
Finally, to compute
Note
Both
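The construction described in this section can be sketched in plain C++ as follows. This is only a minimal illustration under some assumptions: the names `Vec3`, `Mat4`, `LookAtLH`, and `TransformPoint`, as well as the local variables `u`, `v`, and `w` for the camera axes, are hypothetical and chosen for this example; the row-vector convention (v' = v * M) and a left-handed frame are assumed.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };   // row-major, used with row vectors

Vec3 Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
Vec3 Normalize(Vec3 a)
{
    const float len = std::sqrt(Dot(a, a));
    assert(len > 0.0f);   // the zero vector has no direction
    return { a.x / len, a.y / len, a.z / len };
}

// View matrix for a camera at 'pos' aimed at 'target' (world coordinates).
Mat4 LookAtLH(Vec3 pos, Vec3 target, Vec3 up)
{
    const Vec3 w = Normalize(Sub(target, pos)); // camera z-axis (forward)
    const Vec3 u = Normalize(Cross(up, w));     // camera x-axis (right)
    const Vec3 v = Cross(w, u);                 // camera y-axis (up), already unit length
    // Inverse of the camera's world matrix: transposed rotation in the upper-left
    // block, and the camera position projected onto the camera axes (negated).
    return {{
        { u.x, v.x, w.x, 0.0f },
        { u.y, v.y, w.y, 0.0f },
        { u.z, v.z, w.z, 0.0f },
        { -Dot(pos, u), -Dot(pos, v), -Dot(pos, w), 1.0f } }};
}

// Transform a point (implicit w = 1) by a matrix whose last column is (0, 0, 0, 1).
Vec3 TransformPoint(Vec3 p, const Mat4& M)
{
    return {
        p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
        p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
        p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2] };
}
```

As a quick sanity check, a camera placed at (0, 0, -5) and aimed at the world origin maps its own position to the origin of view space, and maps the world origin to (0, 0, 5), that is, five units in front of the camera along the view z-axis.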
View matrix in DirectX
DirectXMath provides the helper function XMMatrixLookAtLH to build a view matrix similar to the one we discussed in this section (i.e., for transitioning from world to camera spaces defined as left-handed systems). You need to pass the camera position and target point as arguments to this function, which returns the related view matrix.
// pos: position (in world coordinates) of the (origin of the) view space.
// target: position (in world coordinates) the camera is aimed at.
// up == j (unit basis vector which points up).
XMVECTOR pos = XMVectorSet(x, y, z, 1.0f);
XMVECTOR target = XMVectorZero();
XMVECTOR up = XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f);
// Compute the View matrix.
XMMATRIX V = XMMatrixLookAtLH(pos, target, up);
XMVectorSet and XMVectorZero are also helper functions. They allow us to initialize an XMVECTOR variable using a single SIMD instruction (if SIMD instructions are supported by the CPU).
Note
As explained in Vectors, XMVECTOR is an alias for __m128, so we should avoid initializing it with a simple assignment or the usual array initialization, because these methods may require multiple instructions, which is inefficient. Instead, XMVectorSet and XMVectorZero offer a dual implementation (No-Intrinsics and SSE-Intrinsics, as detailed in Transformations) that allows the CPU to leverage SIMD instructions (if supported) to load four values into a 16-byte aligned __m128 variable in a single instruction, significantly improving performance.
The implementation of the XMMatrixLookAtLH function should be relatively straightforward to understand, given the concepts we have discussed in this section and in Transformations.
inline XMMATRIX XM_CALLCONV XMMatrixLookAtLH
(
    FXMVECTOR EyePosition,
    FXMVECTOR FocusPosition,
    FXMVECTOR UpDirection
) noexcept
{
    XMVECTOR EyeDirection = XMVectorSubtract(FocusPosition, EyePosition);
    return XMMatrixLookToLH(EyePosition, EyeDirection, UpDirection);
}

inline XMMATRIX XM_CALLCONV XMMatrixLookToLH
(
    FXMVECTOR EyePosition,
    FXMVECTOR EyeDirection,
    FXMVECTOR UpDirection
) noexcept
{
    assert(!XMVector3Equal(EyeDirection, XMVectorZero()));
    assert(!XMVector3IsInfinite(EyeDirection));
    assert(!XMVector3Equal(UpDirection, XMVectorZero()));
    assert(!XMVector3IsInfinite(UpDirection));

    XMVECTOR R2 = XMVector3Normalize(EyeDirection);

    XMVECTOR R0 = XMVector3Cross(UpDirection, R2);
    R0 = XMVector3Normalize(R0);

    XMVECTOR R1 = XMVector3Cross(R2, R0);

    XMVECTOR NegEyePosition = XMVectorNegate(EyePosition);

    XMVECTOR D0 = XMVector3Dot(R0, NegEyePosition);
    XMVECTOR D1 = XMVector3Dot(R1, NegEyePosition);
    XMVECTOR D2 = XMVector3Dot(R2, NegEyePosition);

    XMMATRIX M;
    M.r[0] = XMVectorSelect(D0, R0, g_XMSelect1110.v);
    M.r[1] = XMVectorSelect(D1, R1, g_XMSelect1110.v);
    M.r[2] = XMVectorSelect(D2, R2, g_XMSelect1110.v);
    M.r[3] = g_XMIdentityR3.v;

    M = XMMatrixTranspose(M);

    return M;
}
EyePosition, FocusPosition and UpDirection are the origin, target and up direction of the camera, expressed in world coordinates.
NDC space and Projection matrix
Once we have all objects in camera space, the next step is to project them onto a plane to obtain a 2D representation of the 3D scene. To achieve this, we can ideally place a plane in front of the camera and trace rays from the camera to each vertex of the mesh, as illustrated in the image below. The intersection between these rays and the plane gives us a 2D representation of the corresponding 3D vertices. Note that if the projection rays are parallel to each other and orthogonal to the projection plane, the camera's position becomes irrelevant.

In the first case, where the projection rays converge towards a focal point, distant objects appear smaller. This replicates the way human vision works in real life and we commonly refer to this type of projection as perspective.
On the other hand, if the projection rays are parallel to each other, the perspective effect is lost, and the size of objects becomes independent of their distance from the camera. This type of projection is known as orthographic.
To better understand the difference, consider the illustration provided below. It depicts two segments of equal size placed at different distances from the camera. In the perspective projection, the closer segment appears longer when projected onto the projection plane, emphasizing the depth perception effect.

Fortunately, the intricacies of the projection process are almost transparent to the programmer, who is primarily responsible for defining the portion of the 3D scene to be projected onto the projection plane. Indeed, in most cases, capturing the entire scene is not necessary or desired. Depending on the type of projection being used, different bounding geometries define the region of interest.
In orthographic projections, the region of interest is represented by a box. This box encompasses the portion of the scene that will be projected onto the 2D plane. While we can use any plane in front of the camera as the projection plane, typically the box face closest to the camera is used as the projection window where the 3D scene is projected.
In perspective projections, the region of interest is defined by a frustum. A frustum is the volume enclosed between two parallel planes that intersect a pyramid. The apex of the pyramid corresponds to the camera's position. The plane closer to the camera is called the near plane, while the farther plane is called the far plane. We can obtain a projection window by intersecting the pyramid between the camera and the near plane, with another plane parallel to the near one. Alternatively, the upper face of the frustum, the intersection between the near plane and the pyramid, can also be used as the projection window. In computer graphics literature, the terms "near plane" and "far plane" are commonly used to refer to the corresponding windows as well.

The illustration below clearly demonstrates the differences between perspective and orthographic projections. In both projections, the green ball lies outside the defined region of interest and therefore is not projected onto the projection window.
In the orthographic projection, the red and yellow balls appear the same size, regardless of their distance from the camera. This is because the projection rays are parallel and do not converge towards a focal point, resulting in a lack of perspective distortion.
On the other hand, in the perspective projection, the red ball appears smaller compared to the yellow ball. This is due to the converging projection rays that mimic the behavior of human vision in real life. As objects move further away from the camera, they appear smaller, resulting in the size difference observed in the perspective projection.
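The difference between the two kinds of projection can be checked numerically with a small sketch based on similar triangles. It assumes a camera at the origin looking down the positive z-axis and a projection plane at distance `d`; the function names are hypothetical and introduced only for this example.

```cpp
#include <cassert>
#include <cmath>

// Perspective: a point at height y and depth z lands on the plane z = d
// at height y * d / z (similar triangles with the apex at the camera).
float ProjectPerspective(float y, float z, float d)
{
    assert(z > 0.0f && d > 0.0f);   // the point must be in front of the camera
    return y * d / z;
}

// Orthographic: rays are parallel to the z-axis, so the height is unchanged
// regardless of the depth of the point.
float ProjectOrthographic(float y)
{
    return y;
}
```

For example, with `d = 1`, two points of height 1 at depths 2 and 4 project to heights 0.5 and 0.25 under the perspective projection, whereas both keep height 1 under the orthographic one.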

To define a frustum (for perspective projections) or a box (for orthographic projections), we need to specify the distances from the camera to the near and far planes. For convenience, we typically define this bounding geometry in view space, where the camera position is located at the origin. Additionally, we need to determine the dimensions of the projection window. With this information, we can construct a projection matrix. This matrix transforms 3D vertices from view space to a space called Normalized Device Coordinates (NDC).
In perspective projection, the frustum defined in view space becomes a box in NDC space. The origin of this box is located at the center of its front face, which corresponds to the transformed near plane/window. A significant aspect of interest is that the objects contained within the box in NDC space (previously within the frustum/box in view space) will have vertex coordinates falling within the following ranges:
The illustration below depicts the frustum in view space (left) and the corresponding box in NDC space (right) after a perspective transformation. In DirectX, the NDC space is a left-handed system, with the y-axis that points upwards. The z-axis is always perpendicular to both the front and back faces of the box in NDC space and passes through their centers. While this arrangement also holds in view space, it is not an absolute requirement. Indeed, the z-axis in view space can be non-perpendicular to both the near and far planes, and it may pass through a point other than their centers.

Now, you might be wondering what's the point of this transformation. The following illustration shows a 2D representation from the top that explains what happens if you transform a frustum to a box. The objects inside the frustum are transformed accordingly, and the projection rays become parallel to each other. This way, we can orthographically project the mesh vertices onto a projection window (typically the front face of the box in NDC space) to mimic the perspective vision we are used to in real life, where objects like roads seem to converge in the distance, and where near objects appear bigger than distant ones.

Note
Interestingly, once we are in NDC space, there is no actual need for explicit projection onto the window plane. Indeed, as mentioned earlier, in NDC space the projection rays are parallel, and the z-axis is orthogonal to the front face of the NDC box, passing through its center (which is the origin of the NDC space). This means that the x- and y-coordinates of vertices remain constant along the projection rays in NDC space, only the z-coordinate varies. Consequently, the x- and y-coordinates of a vertex in NDC space are identical both inside the NDC box and when projected onto the front face (which lies in the

Usually, that's all we need to know in order to write applications that render 3D objects on the screen using a perspective or orthographic effect. However, as graphics programmers, we are expected to know how things work under the hood. In particular, knowing how to build a projection matrix might come in useful in the future.
As stated in the note box above, once we go from view space to NDC space, we implicitly get a 2D representation of 3D vertex positions. So, this transformation is definitely related to the concept of projection. Indeed, the associated matrix is called the projection matrix, and it varies depending on the type of projection we are interested in. We will start with a couple of matrices associated with the perspective projection, and then we will show the matrix associated with the orthographic projection.
Perspective projection
While DirectX offers convenient helper functions for constructing projection matrices, in this section we will explore the process of manually creating a couple of projection matrices based on frustum information. Our first objective is to derive NDC coordinates from view coordinates. Then, we will attempt to express the resulting equations in matrix form, with the goal of finding a projection matrix to go from the view space to the NDC space. Consider the following illustration.

Fig. 14 Frustum in view space
To construct a projection matrix, we must first define a frustum (in view space) that provides the necessary information. Regarding the projection window, as explained in NDC space and Projection matrix, we can opt for the intersection between the pyramid formed by the camera and the near plane, with any plane parallel to the near one. For our purposes, let's conveniently choose a plane at a distance
The angle
However, the horizontal FOV
Observe that the z-axis is orthogonal to the projection window and passes through its center, dividing the height of the projection window into two parts of unit lengths. Also, the near and far planes are located at a distance of
Since the z-axis is orthogonal to the projection window and passes through its center, any 3D vertex projected onto its surface will have the y-coordinate already in NDC space (i.e., within the range

Let's start by examining
Also, we know that
If you want to compute the horizontal FOV
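As a small aid, the relationship between the two angles can be sketched in code. The example assumes the usual definition of aspect ratio as width divided by height of the projection window, so that the half-width equals the aspect ratio times the half-height at any given distance; the function name `HorizontalFov` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Horizontal FOV (radians) from the vertical FOV (radians) and the aspect
// ratio r = width / height of the projection window:
//   tan(fovX / 2) = r * tan(fovY / 2)
float HorizontalFov(float fovY, float aspect)
{
    assert(fovY > 0.0f && fovY < 3.14159265f && aspect > 0.0f);
    return 2.0f * std::atan(aspect * std::tan(0.5f * fovY));
}
```

With a square window (aspect ratio 1) the two angles coincide, while a wider window yields a horizontal FOV larger than the vertical one.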
As for
Observe that a vertex in view space
where
As we know, a vertex position is a point, so the w-coordinate is always 1 regardless of the coordinate space. As for
However, before deriving
Observe that if we multiply the NDC coordinates by
Note
The rasterizer expects to receive primitives with vertices in clip coordinates as input. Therefore, the last stage before the rasterizer must output vertices in clip space. Typically, if no optional stage is enabled, the last stage before the rasterizer is the vertex shader. Otherwise, it can be either the geometry shader or the domain shader.
With the perspective division automatically performed by the rasterizer, we are able to transform the coordinates of a vertex from clip to NDC space. Now, we need to find a matrix form to go from view space to clip space. To do this, we must first multiply equations
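The perspective division itself is a simple operation, which the following sketch reproduces on the CPU for illustration only (in practice the rasterizer performs it for us). The types `Vec4` and `Vec3` and the function `PerspectiveDivide` are hypothetical names used just for this example.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// Clip space -> NDC space: divide the first three coordinates by w.
Vec3 PerspectiveDivide(const Vec4& clip)
{
    assert(clip.w != 0.0f);   // vertices with w == 0 cannot be divided
    return { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w };
}
```

For instance, the clip coordinates (2, 4, 6, 2) correspond to the NDC coordinates (1, 2, 3).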
Remember that we still need to derive
Then, to get the NDC coordinates, we simply need to divide all the components of
We can now focus on deriving a formula for
Note
As already mentioned several times in this section, we can intersect any plane with the pyramid between the camera and the near plane to obtain the projection window. The result represents the same projection window but at different distances from the camera. This difference in distance does not affect the x- and y-coordinates, as previously explained. However, it does impact the z-coordinate, which needs to be handled separately, as discussed in the current explanation.
Observe that
Note
The equation above uses
Consequently, the matrix in equation
because the last two entries in the third column are the only ones that can scale and translate the third coordinate of
The coordinates of
However, in this case we know that
We also know that for a vertex in view space that lies in the far plane we have
where we used
However, in this case we know that
Substituting this into equation
We just found the values of
Matrix
Matrix
However, that's not what we set out to find at the start of this section (the matrix to go from view to NDC space). Still, since we get the perspective division for free during the rasterizer stage, we can actually consider
General case
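To make the result more tangible, here is a minimal sketch of a perspective projection matrix built from the vertical FOV, the aspect ratio, and the near and far distances. It assumes the row-vector convention (v' = v * M), a left-handed view space, and an NDC depth range of [0, 1]; the entries follow the layout used by DirectXMath's XMMatrixPerspectiveFovLH, but the names `Vec4`, `Mat4`, `Transform`, and `PerspectiveFovLH` are hypothetical helpers defined only for this example.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major, used with row vectors

// Row vector times matrix: v' = v * M.
Vec4 Transform(const Vec4& v, const Mat4& M)
{
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[c] += in[r] * M.m[r][c];
    return { out[0], out[1], out[2], out[3] };
}

// View space -> clip space for a frustum symmetric about the z-axis.
Mat4 PerspectiveFovLH(float fovY, float aspect, float nearZ, float farZ)
{
    assert(nearZ > 0.0f && farZ > nearZ && aspect > 0.0f);
    const float yScale = 1.0f / std::tan(0.5f * fovY); // distance of the unit-height window
    const float xScale = yScale / aspect;              // compensates for the window width
    const float range  = farZ / (farZ - nearZ);        // depth scale
    return {{
        { xScale, 0.0f,   0.0f,           0.0f },
        { 0.0f,   yScale, 0.0f,           0.0f },
        { 0.0f,   0.0f,   range,          1.0f },   // the 1 copies view z into clip w
        { 0.0f,   0.0f,   -range * nearZ, 0.0f } }};
}
```

With a vertical FOV of 90 degrees, an aspect ratio of 2, and near and far planes at 1 and 100, the top-right corner of the near window (2, 1, 1) maps to NDC (1, 1, 0) after dividing by w, and the corresponding corner of the far window (200, 100, 100) maps to NDC (1, 1, 1).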
We built the perspective projection matrix

Deriving a perspective projection matrix for the general case won't be overly difficult given our exploration of the specific case in the previous section. Indeed, after projecting the 3D vertices onto the projection window, we need to translate the projection window so that the z-axis goes through its center again, bringing us back to the specific case. However, before proceeding, some initial observations are necessary.
In the general case, the frustum is not necessarily symmetrical with respect to the z-axis, so we can't use the vertical FOV and aspect ratio to define the size of the projection window. Instead, we need to set its width and height by specifying the view coordinates of its top, bottom, left, and right sides.
We will project 3D vertices onto the projection window that lies on the near plane (meaning
). This isn't really a limitation because we can project onto any projection window between the camera (exclusive) and near plane (inclusive).
Thus, in the general case, a vertex
where
Therefore, we need to translate the first two coordinates of
Observe that we used the mid-point formula to subtract the x- and y- center coordinates of the projection window from
Now that we are back to the specific case, we can substitute equation
Similarly, we can substitute equation
With equations
If we omit the perspective division
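The general case can be sketched in the same way. The example below assumes the projection window lies on the near plane and is described by the view-space coordinates of its left, right, bottom, and top sides; the entries follow the layout used by DirectXMath's XMMatrixPerspectiveOffCenterLH, while the helper names (`Vec4`, `Mat4`, `Transform`, `PerspectiveOffCenterLH`) are hypothetical and the row-vector convention is assumed.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major, used with row vectors

// Row vector times matrix: v' = v * M.
Vec4 Transform(const Vec4& v, const Mat4& M)
{
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[c] += in[r] * M.m[r][c];
    return { out[0], out[1], out[2], out[3] };
}

// View space -> clip space for a frustum that may be off-center.
// l, r, b, t: sides of the projection window on the near plane (view coordinates).
Mat4 PerspectiveOffCenterLH(float l, float r, float b, float t, float n, float f)
{
    assert(r > l && t > b && n > 0.0f && f > n);
    const float invW  = 1.0f / (r - l);
    const float invH  = 1.0f / (t - b);
    const float range = f / (f - n);
    return {{
        { 2.0f * n * invW, 0.0f,            0.0f,       0.0f },
        { 0.0f,            2.0f * n * invH, 0.0f,       0.0f },
        { -(l + r) * invW, -(t + b) * invH, range,      1.0f }, // re-centers the window
        { 0.0f,            0.0f,            -range * n, 0.0f } }};
}
```

With l = -1, r = 3, b = -2, t = 2, n = 1, and f = 10, the window corners (3, 2, 1) and (-1, -2, 1) map to NDC (1, 1, 0) and (-1, -1, 0) after the perspective division, even though the z-axis does not pass through the center of the window.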
Perspective division and clipping
After the perspective division by the w-component, the vertices inside the NDC box are the ones with NDC coordinates falling within the following ranges
This means that the vertices inside the frustum were the ones with homogeneous coordinates falling within the following ranges
That is, the vertices inside the frustum are the ones bounded by the following homogeneous planes (that is, 4D planes expressed in clip coordinates).
The following illustration shows a 2D representation of the frustum in the homogeneous xw-plane.

If
We have

As you can see in the image above, a clipped primitive might no longer be a triangle. Therefore, the rasterizer also needs to triangulate clipped primitives and re-insert them in the pipeline.
Depth buffer precision
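The inside test that drives clipping can be written directly on clip coordinates, before any division takes place. The sketch below assumes the DirectX conventions, where NDC x and y range over [-1, 1] and NDC z over [0, 1], which translate to the homogeneous bounds -w <= x <= w, -w <= y <= w, and 0 <= z <= w. The names `Vec4` and `IsInsideClipVolume` are hypothetical; real hardware clips whole primitives rather than testing single vertices, so this is only an illustration of the bounds.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };

// True if a vertex in clip coordinates lies inside the view volume.
// Working in clip space avoids dividing by a zero or negative w.
bool IsInsideClipVolume(const Vec4& c)
{
    assert(std::isfinite(c.w));   // NaN or infinite coordinates are not meaningful here
    return c.w > 0.0f &&
           -c.w <= c.x && c.x <= c.w &&
           -c.w <= c.y && c.y <= c.w &&
           0.0f <= c.z && c.z <= c.w;
}
```

For example, (0, 0, 0.5, 1) is inside the volume, whereas (2, 0, 0.5, 1) lies beyond the right plane and (0, 0, -0.1, 1) lies in front of the near plane.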
Whatever perspective projection matrix you decide to use (either
If you set
The following graph shows what happens if you set

This can represent a serious problem because if a far object A is in front of another object B, but A is rendered after B, then A could be considered at the same distance as B with respect to the camera, and discarded from the pipeline if the depth test is enabled. We will delve into depth testing in a subsequent tutorial.
To mitigate the problem, we can set
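The non-linear distribution of depth values can be explored with a short sketch. It assumes the standard left-handed perspective mapping with NDC depth in [0, 1], where a view-space depth z between the near distance n and the far distance f maps to f (z - n) / (z (f - n)); the function name `NdcDepth` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// NDC depth of a vertex at view-space depth z, for near and far distances n and f.
// Equivalent to (z * f / (f - n) - n * f / (f - n)) / z, i.e. clip z divided by clip w.
float NdcDepth(float z, float n, float f)
{
    assert(n > 0.0f && f > n && z > 0.0f);
    return f * (z - n) / (z * (f - n));
}
```

With n = 1 and f = 100, a vertex at z = 2 already has an NDC depth of about 0.505, and one at z = 10 about 0.909, so most of the [0, 1] range is spent on the small portion of the frustum closest to the camera. Calling the function with different values of n is a simple way to see how strongly the near distance affects this distribution.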
Orthographic projection
Similar to the general case of perspective projection, in orthographic projections, we aim to align the z-axis with the center of the projection window. However, unlike perspective projections, the location of the projection window along the z-axis is irrelevant in orthographic projections. This unique property will allow us to simplify the derivation of an equation for

Indeed, we can reuse equations
With an orthographic projection, we can derive an equation for
Observe that, with an orthographic projection, we can't substitute
The result is that we no longer have the
In conclusion, the matrix above allows us to go straight from view space to NDC space, without passing through the homogeneous clip space. However, the rasterizer still expects vertices in clip coordinates. Therefore, we need a way to make the rasterizer believe we are passing clip coordinates, while also avoiding the perspective division. As you can see in the fourth column of the orthographic projection matrix
Projection matrices in DirectX
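An orthographic projection matrix can be sketched as follows, under the same assumptions as before (row-vector convention, left-handed view space, NDC depth in [0, 1]). The entries follow the layout used by DirectXMath's XMMatrixOrthographicOffCenterLH; the helper names are hypothetical and defined only for this example.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major, used with row vectors

// Row vector times matrix: v' = v * M.
Vec4 Transform(const Vec4& v, const Mat4& M)
{
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[c] += in[r] * M.m[r][c];
    return { out[0], out[1], out[2], out[3] };
}

// View space -> NDC space for a box-shaped region of interest.
Mat4 OrthographicOffCenterLH(float l, float r, float b, float t, float n, float f)
{
    assert(r > l && t > b && f > n);
    const float invW = 1.0f / (r - l);
    const float invH = 1.0f / (t - b);
    const float invD = 1.0f / (f - n);
    return {{
        { 2.0f * invW,     0.0f,            0.0f,      0.0f },
        { 0.0f,            2.0f * invH,     0.0f,      0.0f },
        { 0.0f,            0.0f,            invD,      0.0f },
        { -(l + r) * invW, -(t + b) * invH, -n * invD, 1.0f } }}; // w stays 1
}
```

Since the fourth column is (0, 0, 0, 1), the w-coordinate of a transformed point remains 1, so the perspective division performed by the rasterizer leaves the coordinates unchanged. With l = -2, r = 2, b = -1, t = 1, n = 1, and f = 11, the box corners (-2, -1, 1) and (2, 1, 11) map to (-1, -1, 0) and (1, 1, 1).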
DirectXMath provides many useful functions for building different types of projection matrices, depending on the type of projection and the handedness of the frame. However, in this tutorial series we will only work with left-handed coordinate systems. Therefore, to build a perspective projection matrix we can use the helper function XMMatrixPerspectiveFovLH.
XMMATRIX XMMatrixPerspectiveFovLH(
    float FovAngleY,
    float AspectRatio,
    float NearZ,
    float FarZ
);
As you can see, we only need to pass the vertical FOV, the aspect ratio, and the distances of the near and far planes. This means that with this function we can build the matrix
As for the general case of a perspective projection, we can use the helper function XMMatrixPerspectiveOffCenterLH.
XMMATRIX XMMatrixPerspectiveOffCenterLH(
    float ViewLeft,
    float ViewRight,
    float ViewBottom,
    float ViewTop,
    float NearZ,
    float FarZ
);
To build an orthographic projection matrix, we can use the helper function XMMatrixOrthographicOffCenterLH.
XMMATRIX XMMatrixOrthographicOffCenterLH(
    float ViewLeft,
    float ViewRight,
    float ViewBottom,
    float ViewTop,
    float NearZ,
    float FarZ
);
Refer to the DirectXMath library's source code to verify that these projection matrices are implemented according to the definitions presented in this tutorial.
DirectXMath also provides XMMatrixPerspectiveLH. Please refer to the official API documentation for more details.
Render target space and Viewport
After the perspective division, all vertices are in NDC space, and if we only consider the first two NDC coordinates, we also have their 2D representations. However, we are in a normalized 2D space (the

The rasterizer automatically transforms the vertices from NDC space to render target space by using the viewport information we set with ID3D12GraphicsCommandList::RSSetViewports. Once in render target space, it can generate pixels covered by primitives. However, if the render target coordinates of a pixel fall outside the specified render target size, the pixel will be discarded and won't be processed by any subsequent stage of the pipeline.
In Hello Window, we briefly mentioned that a viewport can be seen as a rectangular region within the back buffer space where rendering operations take place. Now, we can be more specific in stating that a viewport is a structure that holds the necessary information for the rasterizer to construct a matrix that transforms vertices from NDC space to a specific rectangle within the render target space. In other words, it defines the mapping of the projection window onto a selected area of the render target.

Since we might find it useful in the future, let's see how we can manually build this matrix to go from NDC space to render target space from the viewport information. Suppose we want to draw on a selected
to the following render target ranges
Starting with the x-coordinate, we need to map
As for the y-coordinate, we need to consider the change of direction between NDC and render target space. That is,
As for the z-coordinate, we only need to scale
At this point, we only need to translate the resulting coordinates to shift the origin of the
Now, we can derive our render target coordinates
Note
If the resulting render target coordinates fall outside the render target size, then the corresponding pixel generated by the rasterizer will be discarded. That is, it won't be processed by subsequent stages of the pipeline.
In matrix form this becomes
However, most of the time, we don't want to rescale the NDC z-coordinate, so we have
Tip
To prevent stretching in the final image on the screen, it's recommended to set
Once mesh vertices are in render target space, the rasterizer can identify the texels covered by the primitives, and emit pixels at the corresponding positions to be consumed by the pixel shader.
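The mapping from NDC space to render target space can be sketched as a plain function, which may be easier to follow than the matrix form. The `Viewport` struct below is a hypothetical stand-in whose fields mirror those of D3D12_VIEWPORT (top-left corner, size, and depth range); the function name `NdcToRenderTarget` is also an assumption made for this example. Note the flip of the y-coordinate, since the y-axis points upwards in NDC space and downwards in render target space.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in mirroring the fields of D3D12_VIEWPORT.
struct Viewport
{
    float topLeftX, topLeftY;   // top-left corner of the rectangle, in pixels
    float width, height;        // size of the rectangle, in pixels
    float minDepth, maxDepth;   // depth range, usually 0 and 1
};

// NDC space -> render target space.
Vec3 NdcToRenderTarget(const Vec3& ndc, const Viewport& vp)
{
    assert(vp.width > 0.0f && vp.height > 0.0f);
    return {
        (ndc.x + 1.0f) * 0.5f * vp.width  + vp.topLeftX,   // [-1, 1] -> [left, left + width]
        (1.0f - ndc.y) * 0.5f * vp.height + vp.topLeftY,   // [-1, 1] -> [top + height, top]
        vp.minDepth + ndc.z * (vp.maxDepth - vp.minDepth) }; // [0, 1] -> [minDepth, maxDepth]
}
```

For an 800x600 viewport placed at the top-left corner of the render target with the default depth range, the NDC point (-1, 1, 0) maps to (0, 0, 0), the center (0, 0, 0.5) maps to (400, 300, 0.5), and (1, -1, 1) maps to (800, 600, 1).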

References
[MS22] S. Marschner and P. Shirley. Fundamentals of Computer Graphics. CRC Press, 4th edition, 2022.