> | # | CPU/GPU | Computation: Precomputed/Dynamic | Mapping: Forward/Backward | Interpolation: Built-in/Manual |
> | - | ------- | -------------------------------- | ------------------------- | ------------------------------ |
> | 1 | C       | P                                | F                         | B                              |
> | 2 | C       | P                                | B                         | B                              |
> | 3 | G       | D                                | F                         | B                              |
> | 4 | G       | D                                | B                         | B                              |
> | 5 | G       | D                                | B                         | M                              |
>
> Table 4.1: Implementation strategies {#tab-strategies}

### 4.2.1 Strategy 1: CPFB

The first strategy computes the transformation on the CPU (C), precomputes the transformation before use (P), employs forward mapping (F), and uses OpenGL ES' built-in interpolation (B). This is done by loading the image as a texture which is projected onto a transformed grid mesh. The crucial insight is that the mesh need only be transformed once, and can be reused many times (e.g., when transforming a video sequence). If the image size is known in advance, the mesh need not even be computed when running the program; it can be stored as precomputed data.

The strategy uses forward mapping, since the shape of the mesh is transformed directly. We can think of it like this: we create a grid mesh of, say, $10 \times 10$ vertices. Each vertex is mapped to a corresponding coordinate in texture space: the bottom-left vertex is mapped to the bottom-left corner of the texture, the upper-right vertex is mapped to the upper-right corner, and so on. Then the positions of the vertices are transformed by means of the chosen model. In consequence, the texture is warped. As illustrated in figure [4.2](#fig-gridmapping), the transformation is a mapping from regular rectangles (the undistorted grid) to quadrilaterals (the distorted grid). Although the vertex positions have changed, the texture coordinates still map to the undistorted grid.

![Figure 4.2: Transformation grid](gridmapping.svg){#fig-gridmapping width=400}

The strategy leaves interpolation to OpenGL ES. The smoothest results are achieved by OpenGL ES' `GL_LINEAR` option.
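The grid construction just described can be sketched on the CPU. The following is an illustrative C++ sketch, not the thesis' `Grid` class (the names `Vertex`, `makeGrid` and `forwardMap` are assumptions): each vertex position in $[-1, 1]$ is paired with a texture coordinate in $[0, 1]$, and forward mapping transforms only the positions.

```cpp
#include <vector>

struct Vertex {
    float x, y;   // vertex position in [-1, 1]
    float s, t;   // texture coordinate in [0, 1]
};

// Build a rows x cols grid; each vertex position is paired with the
// texture coordinate of the same (undistorted) grid point.
std::vector<Vertex> makeGrid(int rows, int cols) {
    std::vector<Vertex> grid;
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            float s = float(j) / (cols - 1);  // texture space [0, 1]
            float t = float(i) / (rows - 1);
            grid.push_back({2.0f * s - 1.0f, 2.0f * t - 1.0f, s, t});
        }
    }
    return grid;
}

// Forward mapping: transform the vertex positions with the chosen model F,
// holding the texture coordinates constant.
template <typename F>
void forwardMap(std::vector<Vertex>& grid, F f) {
    for (Vertex& v : grid) f(v.x, v.y);
}
```

Once transformed, such a grid can be reused for every frame of a video sequence.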
Some versions also support multisampled anti-aliasing, although this is considerably more expensive. However, since the strategy is very efficient in the first place, the impact may be negligible.

### 4.2.2 Strategy 2: CPBB

The second strategy is like the first, but employs backward mapping instead (B). This is done by transforming the other end of the mapping between vertex positions and texture coordinates. That is, while the forward mapping strategy transformed the vertex positions and held the texture coordinates constant, the backward mapping strategy transforms the texture coordinates and holds the vertex positions constant ([McReynolds and Blythe 2005](#mcreynolds05-advanced)).

To explain this transformation in detail, we need to give an overview of OpenGL ES' coordinate spaces. By itself, OpenGL ES contains a mapping between *two* coordinate spaces: vertex positions $[x, y]$ in the range $[-1, 1]$ and texture coordinates $[s, t]$ in the range $[0, 1]$. The default mapping is given by the equation:

$$\label{eq:mapping} \left[\begin{matrix} s\\ t \end{matrix}\right] = \left[\begin{matrix} x/2 + 1/2\\ y/2 + 1/2 \end{matrix}\right]$$

This is an affine transformation, where vertex space is centered around $[0, 0]$. It is translated and scaled to texture space, which is centered around $[1/2, 1/2]$.
Using the conventions established in chapter [2](#chap-trans), we can express the translation in matrix form:

$$\label{eq:affinemapping} \left[\begin{matrix} s\\ t\\ 1 \end{matrix}\right] = \left[\begin{matrix} 1/2 & 0 & 1/2\\ 0 & 1/2 & 1/2\\ 0 & 0 & 1 \end{matrix}\right] \left[\begin{matrix} x\\ y\\ 1 \end{matrix}\right]$$

Or in tabular form:

$$\label{eq:table} \left[\begin{matrix} s\\ t \end{matrix}\right] = \begin{cases} [0, 0] & \text{if $[x, y] = [-1, -1]$}\\ [0, 1] & \text{if $[x, y] = [-1, 1]$}\\ [1, 0] & \text{if $[x, y] = [1, -1]$}\\ [1, 1] & \text{if $[x, y] = [1, 1]$}\\ [1/2, 1/2] & \text{if $[x, y] = [0, 0]$}\\ \dots & \\ [x/2 + 1/2, y/2 + 1/2] & \end{cases}$$

The inverse relationship is of course given by:

$$\label{eq:inversemapping} \left[\begin{matrix} x\\ y \end{matrix}\right] = \left[\begin{matrix} 2(s - 1/2)\\ 2(t - 1/2) \end{matrix}\right]$$

The utility of equations (\ref{eq:mapping}--\ref{eq:inversemapping}) is evident when we want to transform texture space. The transformation equations assume that the coordinate space is centered around $[0, 0]$, *not* $[1/2, 1/2]$. Therefore we must translate the coordinates before and after conversion, as shown in algorithm [2](#alg-coord).

> Algorithm 2: Coordinate conversion
>
> ```python
> [x, y]   <- [2(s - 1/2), 2(t - 1/2)]   # convert to vertex space
> [x', y'] <- F([x, y])                  # transform coordinate
> [s', t'] <- [x'/2 + 1/2, y'/2 + 1/2]   # convert to texture space
> ```

{#alg-coord}

Aside from these caveats, the implementation is straightforward and yields results comparable to the first strategy. The additional computations introduced by the coordinate conversion in algorithm [2](#alg-coord) are negligible and do not contribute much to the overall cost.

Which is the better choice? It depends. If all necessary parameters are known beforehand, so that the whole mesh can be stored as precomputed data, it doesn't really matter whether we choose one or the other.
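The conversion steps of algorithm 2 can be sketched in C++. The names below (`Coord`, `texToVertex`, `vertexToTex`, `transformTexCoord`) are illustrative, not part of the thesis code:

```cpp
#include <utility>

using Coord = std::pair<double, double>;

// [0, 1] texture space -> [-1, 1] vertex space
Coord texToVertex(Coord st) {
    return {2.0 * (st.first - 0.5), 2.0 * (st.second - 0.5)};
}

// [-1, 1] vertex space -> [0, 1] texture space
Coord vertexToTex(Coord xy) {
    return {xy.first / 2.0 + 0.5, xy.second / 2.0 + 0.5};
}

// Algorithm 2: convert to vertex space, apply a vertex-space model F,
// convert back to texture space.
template <typename F>
Coord transformTexCoord(Coord st, F f) {
    return vertexToTex(f(texToVertex(st)));
}
```

With an identity model `F`, the round trip leaves a texture coordinate unchanged, which is a convenient sanity check on the conversions.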
However, if the image size is not known and the mesh needs to be computed at least once when the program is initialized, then backward mapping may be preferable to forward mapping *if the chosen model is faster in that direction*. As we saw in chapter [3](#chap-models), models differ in this regard: a model well-suited for a forward mapping strategy may be a poor choice for a backward mapping strategy, or vice versa. If there is an "impedance mismatch" between model and strategy, we should change one or the other.

### 4.2.3 Strategy 3: GDFB

The previous strategies have left little work to be performed by the GPU. For anything to be rendered, OpenGL ES requires us to at least write a simple vertex shader that passes the vertex data through unchanged (assuming that the input coordinates match the screen coordinates), and a fragment shader that samples the color of the texture at the given coordinate. We can move the transformation into these very shaders.

As before, we first create a regular grid mesh of, say, $10 \times 10$ vertices. Then we pass each vertex to the vertex shader, letting the vertex shader compute the transformed position. The end result is the same as when performing forward mapping on the CPU, but the transformation is expressed in shader code instead.

If this approach is applied to a video sequence, the consequence is that the transformation is re-computed for each frame. To avoid this, the grid may be transformed by a separate vertex shader that is run only once and whose results are saved for later use. OpenGL ES 3.0 allows the output of a shader to be captured in a buffer object. The rendering itself is performed by a simple pass-through shader, as in the CPU case.

### 4.2.4 Strategy 4: GDBB {#sec-implementationgdbb}

Backward mapping can also be done on the GPU. Whereas forward mapping was performed by the vertex shader, backward mapping is more appropriately performed by the fragment shader.
The reason is that in this case, we don't need a $10 \times 10$ grid mesh -- a simple quadrilateral of four vertices will suffice. The task of the fragment shader, which is run once per output pixel in this case, is to transform the corresponding texture coordinate before sampling the texture at that coordinate. As in the CPU case, we must perform coordinate conversion before and after the transformation. We must also check if the transformation returns a position that is out of range: if so, we return a blank color (e.g., white). Section [5.7](#strategy-4-gdbb) describes the implementation of such a coordinate check.

### 4.2.5 Strategy 5: GDBM {#sec-gdbm}

All the previous strategies have relied on built-in interpolation. However, the last strategy, which performs backward mapping on the GPU, can be combined with supersampling as a custom interpolation step. Instead of sampling a *single* color value, which will have been interpolated for us by OpenGL ES' `GL_LINEAR` option, we can sample multiple values and blend them together ourselves. If we know that the width and height of an output pixel is $l$, then the positions of the corners are given as follows:

$$\begin{aligned} \label{eq:ul} [x, y]_{ul} &= [x - \tfrac{1}{2}l, y + \tfrac{1}{2}l] \\ [x, y]_{ur} &= [x + \tfrac{1}{2}l, y + \tfrac{1}{2}l] \\ [x, y]_{ll} &= [x - \tfrac{1}{2}l, y - \tfrac{1}{2}l] \\ [x, y]_{lr} &= [x + \tfrac{1}{2}l, y - \tfrac{1}{2}l] \end{aligned}$$

These four coordinates plus the original coordinate can then be transformed to input coordinates that are sampled to compute an average of five values. Alternatively, we can overlay a grid of $3 \times 3$ positions over the pixel and sample the input at nine different positions.
This gives us four more positions to compute:

$$\begin{aligned} [x, y]_{um} &= [x, y + \tfrac{1}{2}l] \\ [x, y]_{ml} &= [x - \tfrac{1}{2}l, y] \\ [x, y]_{mr} &= [x + \tfrac{1}{2}l, y] \\ \label{eq:lm} [x, y]_{lm} &= [x, y - \tfrac{1}{2}l] \end{aligned}$$

An unweighted average produces a smooth, if somewhat fuzzy, result. We can achieve a different filtering effect by adjusting the coefficients -- that is, by computing a weighted average. A weighted average can be expressed as a two-dimensional dot product with a weight matrix (or "kernel") divided by a normalization value ([Bjorke 2004](#bjorke04-filter)):

$$\label{eq:kernel} \frac{1}{d} \left[\begin{matrix} w_{ul} & w_{um} & w_{ur}\\ w_{ml} & w_{mm} & w_{mr}\\ w_{ll} & w_{lm} & w_{lr} \end{matrix}\right] \cdot \left[\begin{matrix} c_{ul} & c_{um} & c_{ur}\\ c_{ml} & c_{mm} & c_{mr}\\ c_{ll} & c_{lm} & c_{lr} \end{matrix}\right] = \tfrac{1}{d}(w_{ul}c_{ul} + w_{um}c_{um} + \dotsb + w_{lr}c_{lr})$$

A brief discussion of a few very simple filters follows. For example, the previously mentioned average of nine samples can be expressed as a matrix of $1$'s divided by $9$ (figure [4.4b](#fig-kernels)). This is also known as a box filter, since it weights the neighboring samples equally (figure [4.3a](#fig-boxfilterandgaussianfilter)). Such a filter removes much of the noise in the original image by averaging samples together; but it also removes a significant amount of detail that isn't noise.

![Figure 4.3: Filtering methods](boxfilterandgaussianfilter.svg){#fig-boxfilterandgaussianfilter width=400}

In this regard, better results are typically achieved by a Gaussian filter (figure [4.3b](#fig-boxfilterandgaussianfilter)). Filtering with such a kernel produces a blurring effect similar to that of the box filter, but the priority is on the central sample and the samples adjacent to it.
The Gaussian kernel (figure [4.4c](#fig-kernels)) gives most weight to the center position, less weight to the closest neighbors, and least weight to the corners. Thus, the filter keeps more of the differences between one sample and the next. As such, the Gaussian filter is a compromise between the unfiltered image and the heavy blurring effect produced by the box filter.

Instead of reducing the difference between samples, we can also accentuate it. A sharpening filter works in the opposite way to the previous filters: instead of adding the values of the neighboring samples, the sharpening filter subtracts them from the central sample (figure [4.3c](#fig-boxfilterandgaussianfilter)). If the central sample has the same value as the surrounding samples, then the filtered value is equal to the original (figure [4.4d](#fig-kernels)). But if the central sample has a greater value than the surrounding samples, then the filtered value is magnified significantly. Note that as a sharpening filter magnifies noise along with other details in the image, it can be a good idea to apply a blurring filter, such as the Gaussian filter, prior to sharpening.

![Figure 4.4: Interpolation kernels](kernels.svg){#fig-kernels width=300}

These are only some of the image effects that can be expressed in terms of $3 \times 3$ filter kernels. As we will see in section [5.8](#sec-implementationgdbm), one benefit of such filters is that they are easy to represent with OpenGL ES data structures. Since their dimensionality is the same, it is easy to swap out one filter for another under an adaptive strategy. More sophisticated filters can be implemented, but fall outside the scope of this thesis. An examination of current film renderers performed by [Wexler and Enderton (2005)](#wexler05-rasterization) showed a wide variety of supported filters (sinc, Gaussian, Catmull-Rom, Lanczos, Mitchell and more) and default filter radii of 2 to 5 samples.
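The kernel machinery of equation \ref{eq:kernel} can be checked on the CPU. This C++ sketch (illustrative only; the function and kernel names are assumptions, not the shader code) computes the nine sample offsets for a pixel of side length $l$ and a normalized weighted average for a single channel; the box and Gaussian kernels match figures 4.4b and 4.4c:

```cpp
#include <array>

// Nine sample offsets (ul, um, ur, ml, mm, mr, ll, lm, lr) for a pixel
// of side length l centered at (x, y).
std::array<std::array<double, 2>, 9> offsets(double x, double y, double l) {
    double h = l / 2.0;
    return {{{x - h, y + h}, {x, y + h}, {x + h, y + h},
             {x - h, y},     {x, y},     {x + h, y},
             {x - h, y - h}, {x, y - h}, {x + h, y - h}}};
}

// Weighted average of nine single-channel samples with a 3x3 kernel,
// flattened row-major and normalized by div.
double filter9(const std::array<double, 9>& c,
               const std::array<double, 9>& w, double div) {
    double sum = 0.0;
    for (int i = 0; i < 9; ++i) sum += w[i] * c[i];
    return sum / div;
}

// Box filter: equal weights, normalized by 9.
const std::array<double, 9> boxKernel = {1, 1, 1, 1, 1, 1, 1, 1, 1};

// Gaussian filter: center-weighted, normalized by 16.
const std::array<double, 9> gaussianKernel = {1, 2, 1, 2, 4, 2, 1, 2, 1};
```

Note that both kernels leave a constant region unchanged, since their weights sum to the normalization value; a sharpening kernel has the same property but negative off-center weights.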
If the supersampling approach is *combined* with OpenGL ES' `GL_LINEAR` option, the result is a "two-step" interpolation method. Each color value is interpolated by OpenGL ES when it is sampled, and then it is blended together with neighbor samples by the fragment shader.

Adaptive supersampling is also possible: for example, we could compute the final value on the basis of one, four, five or nine samples depending on the distance to the center and the resulting sampling density. In this way, we could improve the efficiency of the shader. More interestingly, the kernels listed in figure [4.4](#fig-kernels) can be interchanged adaptively in order to improve interpolation quality. Recall from chapter [3](#chap-models) that as barrel undistortion produces a "pincushion" effect, the center of the image has the highest frequency, while the outer areas are often blurry because the samples are spread apart. This suggests that while a blurring kernel should be employed in the center to mitigate aliasing effects, it may be swapped out for a sharpening kernel when the outer parts of the image are dealt with.

Summary
-------

We have outlined five different implementation strategies for accelerating image transformation with OpenGL ES. Two of the strategies perform the transformation on the CPU; the others perform it on the GPU. Two strategies precompute the transformation, while the others compute it dynamically. Two strategies use forward mapping and three strategies use backward mapping. Finally, four strategies use OpenGL ES' built-in bilinear interpolation, while one strategy performs manual interpolation in the shader.

The strategies vary in complexity. For the CPU strategies, the transformation is done prior to any rendering, and the OpenGL ES pipeline is very simple. For the GPU strategies, transformation is either done by means of a forward mapping vertex shader, or a backward mapping fragment shader.
The latter can be augmented by increasingly complex and sophisticated methods for custom interpolation, such as supersampling. Our supersampling strategy makes use of $3 \times 3$ interpolation kernels, which, although simple, offer a variety of effects and are easy to implement in OpenGL ES. They can be replaced with more sophisticated filters, but that falls outside the scope of this thesis. In the next chapter, we will take a detailed look at the implementation of these strategies.

5 Implementation with OpenGL ES {#chap-opengl}
===============================

In this chapter, we describe our implementation of the strategies outlined in chapter [4](#chap-strategies). We have written a test program that runs each strategy multiple times and measures the execution time. We describe the general structure of this program, the data flow of the shaders, and our implementation of the grid mesh and the interface for transforming it. Then we turn to transformation on the GPU, illustrated by snippets of shader code. For instructions on how to obtain and compile the code, refer to appendix [A](#chap-code).

5.1 Qt
------

The OpenGL ES API does not specify how a rendering context is created or attached to the native windowing system. There are various frameworks for creating an OpenGL ES application -- e.g., EGL, GLUT, and Qt (which builds upon EGL). For our implementation, we have chosen Qt, which is written in C++. This affords us the opportunity to structure the code in an object-oriented way.

Qt is a cross-platform application framework maintained by the Qt Company. Applications written in Qt can be compiled and run on Windows, Linux and OS X, as well as mobile platforms. Qt provides the `QGLWidget` class as an abstraction layer for OpenGL ES ([Qt 2015](#qt15-qglwidget)). By subclassing this class, we can draw onto the current rendering context by invoking standard OpenGL ES functions.
OpenGL ES is for the most part a subset of desktop OpenGL, with the exception of a few precision qualifiers (`highp`, `mediump` and `lowp`). In fact, most OpenGL ES code can easily be run on desktop OpenGL by prefixing the shaders with definitions of these qualifiers, and avoiding variable names with special meanings. Qt encourages the writing of such "portable shaders", and so do we: all the code is backward compatible with desktop OpenGL.

![Figure 5.1: UML diagram](uml.svg){#fig-uml width=600}

The basic structure of the application is shown in figure [5.1](#fig-uml). A `QApplication` is instantiated with a `Window` (`main.cpp`). `Window` instantiates a `GLWidget` (`window.cpp`). `GLWidget` is a subclass of `QGLWidget`, and it sets up and runs OpenGL ES code in its `initializeGL()` and `paintGL()` functions (`glwidget.cpp`). The OpenGL ES code is structured in a number of "strategies" inheriting from a base class (`strategy.cpp`), using the Template Method pattern ([Gamma et al. 1995](#erich95-patterns)).

5.2 Shaders
-----------

Each strategy invokes a vertex shader and a fragment shader. Since OpenGL ES lacks a fixed-function pipeline, we must at the very least specify a pass-through vertex shader and a texture sampling fragment shader in order to render anything at all. Both shaders take a number of data entries as input and return a number of data entries as output. The vertex shader (figure [5.2a](#fig-shaders)) takes a number of *input attributes* and outputs a number of *varyings*. The fragment shader (figure [5.2b](#fig-shaders)) takes the varyings as input and outputs a *color*.

![Figure 5.2: OpenGL Shaders](shaders.svg){#fig-shaders width=500}

Constant data used by shaders are globally available as *uniforms*. A special type of uniform is the *sampler*, which represents a texture. It is used by the fragment shader for looking up the color of a texture coordinate.
As of OpenGL ES 3.0, texture lookup operations are also possible in the vertex shader ([Munshi et al. 2014](#aaftab14-opengl-es)).

In the CPU-computed strategies, a pass-through vertex shader takes the texture coordinate and position of each vertex and outputs them unchanged as a varying and the built-in OpenGL ES variable `gl_Position`. A texture sampling fragment shader takes the texture coordinate as input and outputs the color of the texture at that position. In the GPU-accelerated strategies, transformation is performed in the shaders. In the case of the vertex shader, the vertex position is transformed before it is output to `gl_Position`. In the case of the fragment shader, the texture coordinate is transformed before sampling the texture.

5.3 Compilation
---------------

When the implementation is initialized, the shaders are read from a file on disk, parsed and compiled into a *shader object* in memory. The shader objects are then attached to and linked into a *program object* that performs the rendering (figure [5.3](#fig-compilation)). At this stage, vertex positions, texture coordinates and textures can be allocated and bound to the program object.

![Figure 5.3: Shader compilation](compilation.svg){#fig-compilation width=300}

OpenGL ES does not specify a binary format for program objects. This is left to the vendor, which means that the format may change from one driver version to another. If the `glGetProgramBinary()` and `glProgramBinary()` functions are available, then a binary representation can be saved to the file system to be reused later. In this way, the cost of online compilation is avoided. Our implementation does not cache the compilation step. Instead, time measurements are postponed until *after* OpenGL ES has been successfully initialized.

5.4 Strategy 1: CPFB
--------------------

When computing the transformation on the CPU, we need to create a grid mesh of $M \times N$ vertices -- a higher number means a more precise transformation.
This task is handled by the `Grid` class (`grid.cpp`). The class encapsulates three arrays: an array of vertex positions, an array of associated texture coordinates, and an array of indices for drawing the triangles of the mesh. For example, a grid of ten rows and ten columns is instantiated with `Grid(10, 10)`, while a simple square of four corners is created with `Grid(2, 2)`. The mesh is constructed as a `GL_TRIANGLE_STRIP`, with alternating orientations of the inner triangles so that the whole strip can be drawn in one pass (figure [5.4a](#fig-grid)).

Observe that each point has a *pair* of coordinates associated with it: a vertex position in the range $[-1, 1]$ and a texture coordinate in the range $[0, 1]$. The grid may be transformed by transforming the vertex positions and holding the texture coordinates constant, or by transforming the texture coordinates and holding the vertex positions constant.

![Figure 5.4: Grid class](grid.svg){#fig-grid width=300}

To perform a forward mapping transformation, we iterate through each vertex position in the grid, transform it, and update the position. To this end, the `Grid` class provides an abstract interface (figure [5.4b](#fig-grid)). The `transform()` method takes a *functor class* as input. A functor class, in this context, is merely a wrapper class for a function on vertex positions.^[Our functor classes are implemented as *function objects*, that is, they overload the function call operator `operator()`. C++11 also provides support for anonymous functions in the form of lambda expressions ([Stroustrup 2013](#stroustrup13-cpp-lang)), but these are difficult to compose in the way outlined in section [5.5](#sec-implementationcpbb).] By implementing the transformation as such a class, we can pass it to the `Grid` class to perform transformation. (The vertex positions themselves are represented by a `Point` class with an `x` attribute and a `y` attribute.)
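The functor idea can be sketched as follows. The names `Barrel` and `transformAll`, and the radial scaling used as a stand-in distortion model, are illustrative assumptions, not the thesis code:

```cpp
#include <vector>

struct Point { double x, y; };

// A functor class wrapping a transformation on vertex positions.
// The radial scaling below is a stand-in for a real distortion model.
struct Barrel {
    double k;  // distortion coefficient (illustrative)
    Point operator()(Point p) const {
        double r2 = p.x * p.x + p.y * p.y;
        double s = 1.0 + k * r2;
        return {p.x * s, p.y * s};
    }
};

// Counterpart of Grid::transform(): apply a functor to every position.
template <typename F>
void transformAll(std::vector<Point>& points, F f) {
    for (Point& p : points) p = f(p);
}
```

Because `transformAll` is templated on the functor type, any class exposing `operator()(Point)` can be passed in, which is what makes the composition described in section 5.5 possible.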
After the grid has been transformed, it is ready for use and can be textured by the input image. Its vertex positions and texture coordinates are loaded as vertex attribute arrays, and become available as input attributes for the vertex shader. Since we are not using the GPU to accelerate the transformation, a simple pass-through vertex shader (listing [5.1](#lst-vertexshader)) and a texture sampling fragment shader (listing [5.2](#lst-fragmentshader)) suffice.

In OpenGL ES, if no default precision is specified, then the default precision is `highp` (the highest precision). It is possible that the shaders may run faster or with better power efficiency at a lower precision. However, OpenGL ES does not require the vendor to support multiple precisions, so it is perfectly valid for an implementation to ignore all qualifiers and perform the calculations at the highest precision level.

> Listing 5.1: Pass-through vertex shader
>
> ```c
> attribute vec2 a_texcoord;
> attribute vec4 a_position;
> varying vec2 v_texcoord;
>
> void main() {
>     gl_Position = a_position;
>     v_texcoord = a_texcoord;
> }
> ```

{#lst-vertexshader}

> Listing 5.2: Texture sampling fragment shader
>
> ```c
> varying vec2 v_texcoord;
> uniform sampler2D s_texture;
>
> void main() {
>     gl_FragColor = texture2D(s_texture, v_texcoord);
> }
> ```

{#lst-fragmentshader}

5.5 Strategy 2: CPBB {#sec-implementationcpbb}
--------------------

In the second CPU-computed strategy, we perform backward mapping instead of forward mapping. That is, we hold the vertex positions constant, and transform the texture coordinates instead. To this end, the `Grid` class provides the `iTransform()` method, which iterates through the grid's texture coordinates. Recall that texture space's range of $[0, 1]$ differs from vertex space's range of $[-1, 1]$, so the method implements algorithm [2](#alg-coord) and converts between coordinate spaces before and after transformation.
In this regard, our `Functor` interface comes in handy. Coordinate conversion, after all, is just another transformation, and can be encapsulated in a functor class of its own. By defining a `chain()` method for composing functor classes, we can build more complex transformations out of simpler ones. Transformation in texture space, for example, can be defined as the composition of coordinate conversion and vertex transformation, composed with the reverse coordinate conversion (figure [5.5](#fig-chain)). Likewise, when we initialize the grid, we employ functor classes to normalize vertex positions within the $[-1, 1]$ range and texture coordinates within the $[0, 1]$ range.

![Figure 5.5: Functor composition](chain.svg){#fig-chain width=500}

As before, once the grid has been transformed, it stays transformed. The shaders are the same as for the previous strategy (listings [5.1](#lst-vertexshader)--[5.2](#lst-fragmentshader)).

5.6 Strategy 3: GDFB
--------------------

The third strategy is a modification of the first strategy, but the transformation is performed on the GPU instead. As before, we instantiate an object of the `Grid` class, but we *don't* invoke the `transform()` method. Instead, we leave it to the vertex shader to map each vertex to the transformed position. Listing [5.3](#lst-vertexshadertransform) outlines the structure of the vertex shader. The `transform()` function implements our choice of distortion model. Note that the texture coordinates are passed through unchanged.

> Listing 5.3: Transformation in the vertex shader
>
> ```c
> attribute vec2 a_texcoord;
> attribute vec4 a_position;
> varying vec2 v_texcoord;
>
> // fish-eye transformation
> vec4 transform(vec4 pos) {
>     ...
> }
>
> void main() {
>     gl_Position = transform(a_position);
>     v_texcoord = a_texcoord;
> }
> ```

{#lst-vertexshadertransform}

5.7 Strategy 4: GDBB {#strategy-4-gdbb}
--------------------

In the backward mapping case, we transform the texture coordinates instead.
This step can be performed in the fragment shader prior to sampling the texture. Since the fragment shader's texture coordinate is the interpolated value between the vertices surrounding the fragment's position, we don't need to create a detailed mesh beforehand -- a "grid" consisting of four corners suffices. Listing [5.4](#lst-fragmentshadertransform) shows how the texture coordinate is passed to `transform()` before `texture2D()`.

> Listing 5.4: Transformation in the fragment shader
>
> ```c
> varying vec2 v_texcoord;
> uniform sampler2D s_texture;
>
> // texture coordinates to vertex positions
> vec2 texcoordtopos(vec2 tex) {
>     ...
> }
>
> // vertex positions to texture coordinates
> vec2 postotexcoord(vec2 pos) {
>     ...
> }
>
> // fish-eye transformation
> vec2 fisheye(vec2 pos) {
>     ...
> }
>
> // transformation function
> vec2 transform(vec2 tex) {
>     return postotexcoord(fisheye(texcoordtopos(tex)));
> }
>
> void main() {
>     gl_FragColor = texture2D(s_texture, transform(v_texcoord));
> }
> ```

{#lst-fragmentshadertransform}

As in the CPU case, we must perform coordinate conversion before and after transformation. This is done by the functions `texcoordtopos()` and `postotexcoord()`. The `transform()` function encapsulates the invocation of these functions and the `fisheye()` function, which performs the actual transformation. Depending on the model parameters, it may be wise to check if the transformed coordinate is within the $[0, 1]$ range, and return a blank color if it isn't. This can be done by substituting a custom `color()` function for the direct invocation of `texture2D()` (listing [5.5](#lst-color)).
> Listing 5.5: Texture sampling function
>
> ```c
> vec4 color(sampler2D texture, vec2 pos) {
>     if(pos.x < 0.0 || pos.y < 0.0 ||
>        pos.x > 1.0 || pos.y > 1.0) {
>         return vec4(1.0, 1.0, 1.0, 1.0); // white
>     } else {
>         return texture2D(texture, pos);
>     }
> }
>
> void main() {
>     gl_FragColor = color(s_texture, transform(v_texcoord));
> }
> ```

{#lst-color}

5.8 Strategy 5: GDBM {#sec-implementationgdbm}
--------------------

The final strategy builds upon the previous one to include custom interpolation in the form of supersampling. This is done by sampling the texture not once, but multiple times, and blending the values together. The first task of the `main()` method is to compute the neighbor coordinates according to equations (\ref{eq:ul}--\ref{eq:lm}):

> Listing 5.6: Neighbor coordinates
>
> ```c
> vec2 v0 = vec2(v_texcoord.x - px/2.0,
>                v_texcoord.y + px/2.0);
> vec2 v1 = vec2(v_texcoord.x,
>                v_texcoord.y + px/2.0);
> ...
> vec2 v8 = vec2(v_texcoord.x + px/2.0,
>                v_texcoord.y - px/2.0);
> ```

{#lst-neighbor}

Here, `px` is the normalized size of a fragment (i.e., $1$ divided by the number of fragment rows or columns). The next step is to transform these coordinates:

> Listing 5.7: Transformed coordinates
>
> ```c
> v0 = transform(v0);
> v1 = transform(v1);
> ...
> v8 = transform(v8);
> ```

{#lst-transformed}

We sample the texture at the transformed positions:

> Listing 5.8: Sampled colors
>
> ```c
> vec4 c0 = color(s_texture, v0);
> vec4 c1 = color(s_texture, v1);
> ...
> vec4 c8 = color(s_texture, v8);
> ```

{#lst-sample}

Then we can compute our interpolated value by blending these colors together. For example, if `blend9()` is a custom function computing an unweighted average, then we can call `blend9(c0, c1, ..., c8)`. We can also calculate a *weighted* average, expressed in terms of the kind of $3 \times 3$ kernel described in section [4.2.5](#sec-gdbm). The `mat3` data type is well suited to representing such a kernel.
First we write a general `filter9()` function which takes a set of colors, a kernel, and a divisor, and returns a weighted average:

> Listing 5.9: Filtering function
>
> ```c
> vec4 filter9(vec4 c1, vec4 c2, ..., vec4 c9,
>              mat3 kernel, float div) {
>     return (c1 * kernel[0][0] +
>             c2 * kernel[0][1] +
>             ...
>             c9 * kernel[2][2]) / div;
> }
> ```

{#lst-filtering}

To this function we can for example pass a Gaussian kernel (figure [4.4c](#fig-kernels)):

> Listing 5.10: Gaussian filter
>
> ```c
> vec4 gaussian9(vec4 c1, vec4 c2, ..., vec4 c9) {
>     mat3 kernel = mat3(1.0, 2.0, 1.0,
>                        2.0, 4.0, 2.0,
>                        1.0, 2.0, 1.0);
>     return filter9(c1, c2, ..., c9, kernel, 16.0);
> }
> ```

{#lst-gaussian}

This will typically produce a crisper result than the unweighted average. The choice of kernel can be determined adaptively by specifying a way to classify the image into concentric regions. For example, classifying on the radius describes a circular area, while classifying on the maximum transformed width or height describes a "pincushion"-shaped area (illustrated in figure [6.8](#fig-distance)):

> Listing 5.11: Distance measure
>
> ```c
> // radius
> float distance() {
>     return length(texcoordtopos(v_texcoord));
> }
>
> // maximum transformed width/height
> float distance2() {
>     vec2 vx = abs(fisheye(texcoordtopos(v_texcoord)));
>     return max(vx.x, vx.y);
> }
> ```

{#lst-distance}

The appropriate kernel can be selected according to a given threshold:

> Listing 5.12: Adaptive interpolation
>
> ```c
> float r = distance();
> if(r > 0.8) {
>     gl_FragColor = sharpen9(c0, c1, ..., c8);
> } else {
>     gl_FragColor = gaussian9(c0, c1, ..., c8);
> }
> ```

{#lst-adaptive}

See section [6.6](#sec-adaptive) for the visual results of adaptive interpolation.

Summary
-------

Our implementation uses Qt as its surrounding framework, and is a mixture of C++ and shader code. The CPU strategies leverage the object orientation of C++ in order to structure the code.
Both the grid mesh and the transformation code are encapsulated in composable classes. The GPU strategies rely on shader code, which is not as structured. Forward mapping is done in the vertex shader and backward mapping is done in the fragment shader. In the latter case, no vertex grid is necessary and a simple rectangle of four corners suffices. When performing manual supersampling in the fragment shader, OpenGL ES' matrix data types can be used to represent filtering kernels. By measuring the distance from the center, the image can be divided into regions which are interpolated differently. In the next chapter, we will measure the efficiency of these strategies and compare their visual output.

6 Results {#chap-results}
=========

In this chapter, we compare the efficiency of the implementations. We also compare built-in interpolation against manual interpolation.

6.1 Setup
---------

The code was executed on a MacBook Air 13" 2011 model with a 1.7 GHz Intel Core i5 processor, 4 GB 1333 MHz DDR3 RAM, and an Intel HD Graphics 3000 graphics card with 384 MB memory, running OS X Yosemite 10.10.4. Our test image is a photograph of $700 \times 700$ pixels (figure [6.1](#fig-testimage)).^[Figure [6.1](#fig-testimage) is an excerpt from the picture "Many Lovers" by Thomas Hawk, which is released under Creative Commons at