Ray tracing rendering practice: OpenGL ray tracing, accelerating computing with GPU!

preface

Ray tracing on GPU!

About half a year ago, I was Raytrace rendering practice Simple ray tracing is discussed and implemented in. It is the simplest calculation on cpu with c + + and multithreading, but only supports simple triangles and balls as primitives

Over time, it is no longer novel to manipulate these exquisite spheres. In order to traverse a large number of triangles efficiently, this article Ray tracing accelerated traversal structure This paper studies and discusses the BVH data structure, and gives the simple implementation and visual code of c + +

If the first two blogs are wood pickaxes and stone pickaxes in mc, today we'll knock an iron pickaxe and bake the GPU on the fire:

Well, let's talk about the idea: the fragment shader of OpenGL is used to calculate the light chase pixel by pixel, and then the triangle, BVH structure, material and other information are encoded, stuffed into the texture and passed to the shader, and then, um, it's finished

The core part of hhh is very simple, that is, transplant the original c + + code and run it in GLSL. emmm this is not a classic elephant stuffed into the refrigerator. In fact, I spend a lot of time solving all kinds of bug s

Fortunately, the difficult period has passed. Over this mountain... Ok... Stop gossiping and get ready. We will enter the world of waves and particles again!

0. Pre knowledge

Note:
This part does not involve any principle, code explanation
Only by dabbling in the pre knowledge listed in this section can you read this blog smoothly, so that you won't be confused when you see the code hhh

environment

visual studio 2019 uses glm for math library, glew for OpenGL, freeglut for window management, and SOIL2 for image reading. For how to install a third-party library on vs, you can use vcpkg for package management. For specific usage, refer to My previous blog

OpenGL

Because it is based on GLSL, you must have a certain understanding of OpenGL first. Common concepts, such as drawing process, shader, coordinate transformation, etc. Common operations, such as buffer object, passing texture, passing uniform consistent variable and so on. The most classic tutorial about OpenGL is Learn OpenGL Yes. Of course, I also sell my own special column , although it's ferra

GLSL

Then, the clip shader is not simply outputting the vertex color after centroid interpolation. We need to write nearly 114514e1919 lines of code in the fragment shader for light tracing, so we need to understand the syntax specification, function interface, uniform variable, texture, sampling and other operations of GLSL

Simple geometry

Mathematical and geometric principles such as intersection with triangles, axis aligned bounding box AABB, cuboid box intersection, triangle barycenter interpolation, etc. Principle of intersection with triangle, intersection with AABB

Path tracking

The color of a point is not determined by magic, but by integral solution of rendering equation. Each integration recursively solves the light path pixel by pixel until it touches the light source. Principle and code reference: Raytrace rendering practice

1. Layout canvas

The fragment shader will calculate each pixel once and is parallel on the GPU, which just meets our requirements: parallel and pixel by pixel calculation! We use it to calculate the color of ray tracing!

Using the slice shader to calculate each pixel, first we draw a square covering the whole screen as our canvas:

Without any transformation matrix, the six vertices are directly fixed to the NDC space, that is, the range of [- 1,1]. The code of c + + is very simple:

std::vector<vec3> square = { 
	vec3(-1, -1, 0), vec3(1, -1, 0), vec3(-1, 1, 0), 
	vec3(1, 1, 0), vec3(-1, 1, 0), vec3(1, -1, 0) 
};

// Generate VAO, VBO, transmit data, draw

...

The vertex shader directly passes the NDC coordinates of the vertex as the slice coordinates of the pixel, the range of the pix variable [- 1, 1], and tries to output the pixel coordinates pix as the color in the fragment shader. When you see the following image, it indicates that the canvas is working normally:

2. Transfer triangle data to shader

Note:
This part of the code is quite boring. We are preparing data for the shader
If you are interested in the GLSL implementation of path tracing, you can skip this section

Unlike previous ray tracing programs, the calculation takes place on the GPU side. There are many ways to transfer data to the shader, such as uniform array, uniform buffer, etc., but these methods have a number limit and can not transfer a large amount of data

Recall how we transferred pictures to the GPU. Yes, texture! We read the model data into memory through c + +, then transfer the data to video memory in the form of texture, and then sample the texture in the shader to read out the original data!

Our data is usually transmitted in the form of arrays, such as triangle array, BVH binary tree array, material array and so on. These arrays are one-dimensional to facilitate access and sampling with subscript pointers

However, it is well known that the general image texture is two-dimensional, sampling through a uv coordinate in the range of 0 ~ 1 instead of subscript. Not to mention the trick of folding a one-dimensional array into a two-dimensional texture, the hardware bilinear interpolation filter welded in the graphics card circuit will also destroy the accuracy of the data

For these reasons, OpenGL provides a more primitive transfer method called Buffer Texture, which uses texture as a general data buffer. It allows us to directly move the binary data in memory to video memory, and then access it through a special sampler, samplerBuffer.

Different from the general sampler2D, samplerBuffer regards the texture content (i.e. the original data in the video memory) as a one-dimensional array, which can directly index the data through subscripts, and does not use any filters, which just meets our needs!

For the usage of Buffer Texture, please refer to OpenGL Wiki , or the following simple example. Suppose we have an array to pass:

int n;	// Array size
float triangles[];

First, we need to create a buffer object called texture buffer object, or tbo for short, which can be compared to opening up a space in video memory:

GLuint tbo;
glGenBuffers(1, &tbo);
glBindBuffer(GL_TEXTURE_BUFFER, tbo);

Then insert the data into the buffer:

glBufferData(GL_TEXTURE_BUFFER, n * sizeof(float), &your_data[0], GL_STATIC_DRAW);

Then create a texture. Note that the texture type should be GL_TEXTURE_BUFFER this means that we are not developing an image texture, but a data buffer texture:

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_BUFFER, tex);

Then use glTexBuffer to associate the data in tbo to the texture buffer. Here, we use GL_RGB32F format, such an access can take out the data of a vec3 vector. The return value of the sampler has three RGB channels, and each channel is a 32-bit floating-point number:

glTexBuffer(GL_TEXTURE_BUFFER, GL_RGB32F, tbo);

Finally, pass texture 0 to shader:

glActiveTexture(GL_TEXTURE0);
glUniform1i(glGetUniformLocation(program, "triangles"), 0);

Then use texelFetch and an integer subscript index on the shader side to query the texture of samplerBuffer type:

uniform samplerBuffer triangles;

...

int index = xxx
vec3 data = texelFetch(triangles, index).xyz;

Note the data format here_ rgb32f refers to how much data can be read by a subscript (one sampling), that is, the unit of a grid of data. A subscript will index three 32-bit floating-point numbers and return a vec4, but only the rgb component is valid. Their mapping relationship with memory data is as follows:

So far, we have mastered the method of accessing 12 byte (RGB32) memory data on GPU, but our structures are not necessarily aligned with 12 bytes

Note:
You can also use GL_R32F to read a 32-bit floating-point number at a time, which can organize data more flexibly
But obviously reading one vec3 at a time is more efficient

Taking triangles and their materials as an example, their definitions in c + + code on CPU are as follows:

// Object surface material definition
struct Material {
    vec3 emissive = vec3(0, 0, 0);  // The luminous color when used as a light source
    vec3 baseColor = vec3(1, 1, 1);
    float subsurface = 0.0;
    float metallic = 0.0;
    float specular = 0.0;
    float specularTint = 0.0;
    float roughness = 0.0;
    float anisotropic = 0.0;
    float sheen = 0.0;
    float sheenTint = 0.0;
    float clearcoat = 0.0;
    float clearcoatGloss = 0.0;
    float IOR = 1.0;
    float transmission = 0.0;
};

// Triangle definition
struct Triangle {
    vec3 p1, p2, p3;    // Vertex coordinates
    vec3 n1, n2, n3;    // Vertex normal
    Material material;  // texture of material
};

Note:
Sell it here
Forget it, the PBR material parameters defined by Disney are used here. In the next chapter, we will implement Disney principle's BRDF, which is simply standardized physics based rendering

Encoding triangle data, we create the structure Triangle_encoded. Their data types are composed of vec3 and meet the 12 byte alignment:

struct Triangle_encoded {
    vec3 p1, p2, p3;    // Vertex coordinates
    vec3 n1, n2, n3;    // Vertex normal
    vec3 emissive;      // Self luminous parameters
    vec3 baseColor;     // colour
    vec3 param1;        // (subsurface, metallic, specular)
    vec3 param2;        // (specularTint, roughness, anisotropic)
    vec3 param3;        // (sheen, sheenTint, clearcoat)
    vec3 param4;        // (clearcoatGloss, IOR, transmission)
};

Then prepare to encode the data. This part of the c + + code is quite boring, just tossing the data around:

// Read triangle
std::vector<Triangle> triangles;
readObj()
int nTriangles = triangles.size();

...

// Coded triangle, material
std::vector<Triangle_encoded> triangles_encoded(nTriangles);
for (int i = 0; i < nTriangles; i++) {
    Triangle& t = triangles[i];
    Material& m = t.material;
    // Vertex Position 
    triangles_encoded[i].p1 = t.p1;
    triangles_encoded[i].p2 = t.p2;
    triangles_encoded[i].p3 = t.p3;
    // Vertex normal
    triangles_encoded[i].n1 = t.n1;
    triangles_encoded[i].n2 = t.n2;
    triangles_encoded[i].n3 = t.n3;
    // texture of material
    triangles_encoded[i].emissive = m.emissive;
    triangles_encoded[i].baseColor = m.baseColor;
    triangles_encoded[i].param1 = vec3(m.subsurface, m.metallic, m.specular);
    triangles_encoded[i].param2 = vec3(m.specularTint, m.roughness, m.anisotropic);
    triangles_encoded[i].param3 = vec3(m.sheen, m.sheenTint, m.clearcoat);
    triangles_encoded[i].param4 = vec3(m.clearcoatGloss, m.IOR, m.transmission);
}

Then use the texture buffer to transfer it to the shader. Here, create a texture buffer object, import the data into tbo, create a texture, and bind tbo to the texture:

GLuint trianglesTextureBuffer;
GLuint tbo0;
glGenBuffers(1, &tbo0);
glBindBuffer(GL_TEXTURE_BUFFER, tbo0);
glBufferData(GL_TEXTURE_BUFFER, triangles_encoded.size() * sizeof(Triangle_encoded), &triangles_encoded[0], GL_STATIC_DRAW);
glGenTextures(1, &trianglesTextureBuffer);
glBindTexture(GL_TEXTURE_BUFFER, trianglesTextureBuffer);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGB32F, tbo0);

Then decode the data through textelfetch in the shader and restore it to the original structure. This part of the code is also quite boring, which is also the inverted data. Just note that the texture is accessed in vec3 units, and the encoded triangle contains 12 vec3 vectors, so the offset corresponding to each subscript should be calculated through the subscript. The GLSL code is as follows:

#define SIZE_TRIANGLE   12

uniform samplerBuffer triangles;

...

// Gets the triangle with the i-th subscript
Triangle getTriangle(int i) {
    int offset = i * SIZE_TRIANGLE;
    Triangle t;

    // Vertex coordinates
    t.p1 = texelFetch(triangles, offset + 0).xyz;
    t.p2 = texelFetch(triangles, offset + 1).xyz;
    t.p3 = texelFetch(triangles, offset + 2).xyz;
    // normal
    t.n1 = texelFetch(triangles, offset + 3).xyz;
    t.n2 = texelFetch(triangles, offset + 4).xyz;
    t.n3 = texelFetch(triangles, offset + 5).xyz;

    return t;
}

// Gets the material of the triangle with subscript i
Material getMaterial(int i) {
    Material m;

    int offset = i * SIZE_TRIANGLE;
    vec3 param1 = texelFetch(triangles, offset + 8).xyz;
    vec3 param2 = texelFetch(triangles, offset + 9).xyz;
    vec3 param3 = texelFetch(triangles, offset + 10).xyz;
    vec3 param4 = texelFetch(triangles, offset + 11).xyz;
    
    m.emissive = texelFetch(triangles, offset + 6).xyz;
    m.baseColor = texelFetch(triangles, offset + 7).xyz;
    m.subsurface = param1.x;
    m.metallic = param1.y;
    m.specular = param1.z;
    m.specularTint = param2.x;
    m.roughness = param2.y;
    m.anisotropic = param2.z;
    m.sheen = param3.x;
    m.sheenTint = param3.y;
    m.clearcoat = param3.z;
    m.clearcoatGloss = param4.x;
    m.IOR = param4.y;
    m.transmission = param4.z;

    return m;
}

Then try to transfer a triangle and its material, and output its color, vertex information, or any attributes in the shader to verify. If the color (or other data, such as coordinates) is successfully read, the data transmission is correct

3. Triangle intersection in shader

The first is the definition of light:

// light
struct Ray {
    vec3 startPoint;
    vec3 direction;
};

Then we also need an intersection result, which contains some necessary information, such as intersection position, distance and surface material:

// Ray intersection result
struct HitResult {
    bool isHit;             // Hit
    bool isInside;          // Whether to hit from inside
    float distance;         // Distance from intersection
    vec3 hitPoint;          // Light midpoint
    vec3 normal;            // Hit point normal
    vec3 viewDir;           // The direction of the light hitting the point
    Material material;      // Surface material of hit point
};

It is not difficult to find the intersection of triangles. First, solve the distance t between the light and the plane where the triangle is located. With the distance, find the intersection P. the idea is as follows:

After finding the intersection, judge whether the intersection is in the triangle. Here, it is judged by whether the direction of cross multiplication and normal phase are the same. If all three cross products are in the same direction as N, P is in the triangle:

The code is also very simple. The only thing worth noting is that if the line of sight direction d is the same as the normal direction N of the triangle, flip N to ensure that we can get the correct normal vector no matter whether we hit the triangle from the front or back. The code is as follows:

#define INF             114514.0

// Intersection of light and triangle 
HitResult hitTriangle(Triangle triangle, Ray ray) {
    HitResult res;
    res.distance = INF;
    res.isHit = false;
    res.isInside = false;

    vec3 p1 = triangle.p1;
    vec3 p2 = triangle.p2;
    vec3 p3 = triangle.p3;

    vec3 S = ray.startPoint;    // Ray starting point
    vec3 d = ray.direction;     // Ray direction
    vec3 N = normalize(cross(p2-p1, p3-p1));    // Normal vector

    // Hit from behind the triangle (inside the model)
    if (dot(N, d) > 0.0f) {
        N = -N;   
        res.isInside = true;
    }

    // If the line of sight is parallel to the triangle
    if (abs(dot(N, d)) < 0.00001f) return res;

    // distance
    float t = (dot(N, p1) - dot(S, N)) / dot(d, N);
    if (t < 0.0005f) return res;    // If the triangle is on the back of the light

    // Intersection calculation
    vec3 P = S + d * t;

    // Determine whether the intersection is in the triangle
    vec3 c1 = cross(p2 - p1, P - p1);
    vec3 c2 = cross(p3 - p2, P - p2);
    vec3 c3 = cross(p1 - p3, P - p3);
    bool r1 = (dot(c1, N) > 0 && dot(c2, N) > 0 && dot(c3, N) > 0);
    bool r2 = (dot(c1, N) < 0 && dot(c2, N) < 0 && dot(c3, N) < 0);

    // Hit, encapsulate the returned result
    if (r1 || r2) {
        res.isHit = true;
        res.hitPoint = P;
        res.distance = t;
        res.normal = N;
        res.viewDir = d;
        // Interpolate vertex normals according to the intersection position
        float alpha = (-(P.x-p2.x)*(p3.y-p2.y) + (P.y-p2.y)*(p3.x-p2.x)) / (-(p1.x-p2.x-0.00005)*(p3.y-p2.y+0.00005) + (p1.y-p2.y+0.00005)*(p3.x-p2.x+0.00005));
        float beta  = (-(P.x-p3.x)*(p1.y-p3.y) + (P.y-p3.y)*(p1.x-p3.x)) / (-(p2.x-p3.x-0.00005)*(p1.y-p3.y+0.00005) + (p2.y-p3.y+0.00005)*(p1.x-p3.x+0.00005));
        float gama  = 1.0 - alpha - beta;
        vec3 Nsmooth = alpha * triangle.n1 + beta * triangle.n2 + gama * triangle.n3;
        res.normal = (res.isInside) ? (-Nsmooth) : (Nsmooth);
    }

    return res;
}

Then we write a function to traverse the triangle array for intersection and return the nearest intersection:

#define INF             114514.0

// Brutally traverse the array subscript range [l, r] to find the nearest intersection
HitResult hitArray(Ray ray, int l, int r) {
    HitResult res;
    res.isHit = false;
    res.distance = INF;
    for(int i=l; i<=r; i++) {
        Triangle triangle = getTriangle(i);
        HitResult r = hitTriangle(triangle, ray);
        if(r.isHit && r.distance<res.distance) {
            res = r;
            res.material = getMaterial(i);
        }
    }
    return res;
}

Then we configure the camera. The camera is located in vec3(0, 0, 4), looks at the negative direction of the z axis, and projects rays according to the NDC coordinates of canvas pixels. Here, the length and width of the projection plane are 2.0, while zNear is 2.0, which ensures a field angle of view of about 50 °

The code is as follows:

Ray ray;
ray.startPoint = vec3(0, 0, 4);
vec3 dir = vec3(pix.xy, 2) - ray.startPoint;
ray.direction = normalize(dir);

Then use hitArray written just now to traverse the array, find the intersection and return the color of the surface:

uniform int nTriangles;

...

HitResult res = hitArray(ray, 0, nTriangles-1);
if(res.isHit) fragColor = vec4(res.material.color, 1);

The triangle... Has been seen enough times

Read an obj model of Stanford Bunny and try again:

Ohhh although it traverses more than 4000 triangles violently, we can maintain 4 ~ 5 FPS because of the powerful parallel ability of GPU

4. Linearized BVH tree

We successfully traverse triangles, but we need to traverse more efficiently. BVH tree is a good choice. However, there is no concept of pointer in GLSL. We need to change the tree structure using pointer to a linear binary tree using array subscript as pointer. I believe that children's shoes who have learned data structure are not unfamiliar with this

This is the original BVH node structure. The content is divided into three parts: left and right children, AABB collision box and leaf node information, in which AA is the minimum point and BB is the maximum point. Because we can't use pointers, we can only use array subscripts. We change the structure to:

// BVH tree node
struct BVHNode {
    int left, right;    // Left and right subtree index
    int n, index;       // Leaf node information               
    vec3 AA, BB;        // Collision box
};

A small change is also introduced here: a leaf node can store multiple triangles, n represents the number of triangles of the leaf node, index represents the first triangle of the node, and the index in the triangles array:

Linearizing a binary tree is also very simple. You only need to change new Node() to push every time you create a node_ Back () inserts the array, and the index of the subscript is the same as usual.

Here, we allow a leaf to contain n triangles and directly push each node created_ Back, and then indexed by subscript. Use the simplest dichotomy to create a triangle array half at a time! The code for creating BVH is as follows:

Note:
Only the common binary tree code is given here
The code for the SAH optimized version can be found in the "complete code" section below

// Build BVH
int buildBVH(std::vector<Triangle>& triangles, std::vector<BVHNode>& nodes, int l, int r, int n) {
    if (l > r) return 0;

    // Note:
    // It cannot be operated by pointer, reference, etc. it must be operated by nodes[id]
    // Because STD:: vector < > will be copied to larger memory during capacity expansion, the address will change
    // The pointer and reference point to the original memory, so an error will occur
    nodes.push_back(BVHNode());
    int id = nodes.size() - 1;   // Note: save the index first
    
    nodes[id] Property initialization for ...

    // Calculate AABB
    for (int i = l; i <= r; i++) {
        ...		// Traversal triangle calculation AABB
    }

    // No more than n triangles return leaf nodes
    if ((r - l + 1) <= n) {
        nodes[id].n = r - l + 1;
        nodes[id].index = l;
        return id;
    }

    // Otherwise, recursive tree building
    // Divide the array by x,y,z
    std::sort(...)

    // recursion
    int mid = (l + r) / 2;
    int left = buildBVH(triangles, nodes, l, mid, n);
    int right = buildBVH(triangles, nodes, mid + 1, r, n);

    nodes[id].left = left;
    nodes[id].right = right;

    return id;
}

So far, we have a linearized binary tree. Next, we are ready to send it to the GPU

5. BVH data transfer to shader

Note:
This part of the code is also quite boring. Skip it

Similar to triangle data, encode and then transmit:

struct BVHNode_encoded {
    vec3 childs;        // (left, right, reserved)
    vec3 leafInfo;      // (n, index, reserved)
    vec3 AA, BB;        
};

The coding process is too boring to copy the code. Here is the code for decoding BVHNode in the shader:

#define SIZE_BVHNODE    4

uniform samplerBuffer nodes;

// Gets the BVHNode object of the ith subscript
BVHNode getBVHNode(int i) {
    BVHNode node;

    // Left and right subtree
    int offset = i * SIZE_BVHNODE;
    ivec3 childs = ivec3(texelFetch(nodes, offset + 0).xyz);
    ivec3 leafInfo = ivec3(texelFetch(nodes, offset + 1).xyz);
    node.left = int(childs.x);
    node.right = int(childs.y);
    node.n = int(leafInfo.x);
    node.index = int(leafInfo.y);

    // Bounding box
    node.AA = texelFetch(nodes, offset + 2).xyz;
    node.BB = texelFetch(nodes, offset + 3).xyz;

    return node;
}

Read a rabbit, and then traverse all nodes. For each leaf node, traverse all triangles it contains to see whether the rabbit is complete to verify whether the BVH transmission is correct:

Note:
This step is still a violent traversal. There is no binary traversal bvh, just to verify whether the data is correct

throw light 

...

for(int i=0; i<nNodes; i++) {
    BVHNode node = getBVHNode(i);
    if(node.n>0) {
        int L = node.index;
        int R = node.index + node.n - 1;
        HitResult res = hitArray(ray, L, R);
        if(res.isHit) fragColor = vec4(res.material.color, 1);
    }
}

If the same result as hitArray is returned, it means that we have no problem with BVH data transmission:

In addition, before traversal, it is recommended to output the left pointer of node 1. No accident, it should be 2 and put node Left / 3 is output as pixel color. If gray indicates that it is correct, be sure to ensure that the integer subscript is transmitted correctly!

6. Cross with AABB box

For axis aligned bounding boxes, there are three groups of penetration points and penetration points when the light enters and exits xoy, xoz and yoz planes. If a group of penetration points and penetration points are found, so that the distance between the starting point of the light and the penetration point is less than the distance between the starting point of the light and the penetration point, that is, t0 < T1, it indicates a hit

Let's take the minimum distance in out as t1 and the maximum distance in in as T0, and then see if t1 > t0. If the equation is satisfied, it indicates that the hit:

The corresponding GLSL codes are as follows:

// Intersection with aabb box. If there is no intersection, return - 1
float hitAABB(Ray r, vec3 AA, vec3 BB) {
    vec3 invdir = 1.0 / r.direction;

    vec3 f = (BB - r.startPoint) * invdir;
    vec3 n = (AA - r.startPoint) * invdir;

    vec3 tmax = max(f, n);
    vec3 tmin = min(f, n);

    float t1 = min(tmax.x, min(tmax.y, tmax.z));
    float t0 = max(tmin.x, max(tmin.y, tmin.z));

    return (t1 >= t0) ? ((t0 > 0.0) ? (t0) : (t1)) : (-1);
}

Note:
n is the near intersection, that is, in
f is the far intersection far, which is out
Because in and out are keywords in GLSL, they cannot be used as variable names

Next, test it. For the root node (node 1) of BVH, we intersect with its left and right subtrees respectively. If the left subtree hits, it returns red, the right subtree hits, it returns green, and both hit, it returns yellow:

...

BVHNode node = getBVHNode(1);
BVHNode left = getBVHNode(node.left);
BVHNode right = getBVHNode(node.right);

float r1 = hitAABB(ray, left.AA, left.BB);  
float r2 = hitAABB(ray, right.AA, right.BB);  

vec3 color;
if(r1>0) color = vec3(1, 0, 0);
if(r2>0) color = vec3(0, 1, 0);
if(r1>0 && r2>0) color = vec3(1, 1, 0);

...

The results are as follows:

The above results show that there are no problems in the construction, transmission and intersection of BVH

7. Non recursive traversal of BVH tree

There is no stack concept on GPU, nor can we execute recursive programs. We need to manually write code for binary tree traversal. For BVH tree, after intersecting with the root node, we always find its left and right subtrees, which is equivalent to the preorder traversal of binary tree, which is relatively easy to implement

Maintain a stack to hold nodes. First put the tree root on the stack, and then loop through while(!stack.empty()):

Pop the node root from the stack
If the right tree is not empty, push the right subtree of root into the stack
If the left tree is not empty, push the left subtree of root into the stack

It's very simple, but pay attention to the nodes accessed first and then put into the stack, because the access order of the stack is opposite, so as to ensure that the next top element of the stack must be the node accessed first. Here is a stack state diagram during non recursive traversal:

If you don't know if there are bug s in the code, you can test it in question 144 of leetcode. Here is Portal

So far, we have mastered the method of non recursive traversal of binary trees. There is no STL stack in GLSL, so we simulate the stack with an array and an sp pointer. The following is the GLSL code for traversing the nearest triangle of BVH query:

// Traversal BVH intersection
HitResult hitBVH(Ray ray) {
    HitResult res;
    res.isHit = false;
    res.distance = INF;

    // Stack
    int stack[256];
    int sp = 0;

    stack[sp++] = 1;
    while(sp>0) {
        int top = stack[--sp];
        BVHNode node = getBVHNode(top);
        
        // Is the leaf node, traverse the triangle and find the nearest intersection
        if(node.n>0) {
            int L = node.index;
            int R = node.index + node.n - 1;
            HitResult r = hitArray(ray, L, R);
            if(r.isHit && r.distance<res.distance) res = r;
            continue;
        }
        
        // Intersection with left and right box AABB
        float d1 = INF; // Left box distance
        float d2 = INF; // Right box distance
        if(node.left>0) {
            BVHNode leftNode = getBVHNode(node.left);
            d1 = hitAABB(ray, leftNode.AA, leftNode.BB);
        }
        if(node.right>0) {
            BVHNode rightNode = getBVHNode(node.right);
            d2 = hitAABB(ray, rightNode.AA, rightNode.BB);
        }

        // Search in the nearest box
        if(d1>0 && d2>0) {
            if(d1<d2) { // D1 < D2, left first
                stack[sp++] = node.right;
                stack[sp++] = node.left;
            } else {    // D2 < D1, right first
                stack[sp++] = node.left;
                stack[sp++] = node.right;
            }
        } else if(d1>0) {   // Hit left only
            stack[sp++] = node.left;
        } else if(d2>0) {   // Hit right only
            stack[sp++] = node.right;
        }
    }

    return res;
}

Here, by judging the distance of the intersection, the priority is to find the nearest box, which can be greatly accelerated. Replace the hitArray of the original violence search with a new hitBVH function, and then look at the effect again:

The number of frames is stable at about 100, which is more than 20 times more efficient than the previous five frames

8. Start ray tracing

8.1. Principle review

Review the rendering equation again:

Because the optical path is reversible, the energy of the light emitted into point p along the wi direction is equal to the energy of the light emitted along the wi direction from point q:

L i ( p , w i ) = L o ( q , w i ) L_i(p, w_i)=L_o(q, w_i) Li(p,wi)=Lo(q,wi)

Therefore, a recursive formula is formed:

L o ( p , w o ) = L e ( p , w o ) + ∫ Ω + L o ( q , w i ) f r ( p , w i , w o ) ( n ⋅ w i ) d w i L_o(p, w_o)=L_e(p, w_o)+\int_{\Omega^+}L_o(q, w_i) \ f_r(p, w_i, w_o) \ (n\cdot w_i) {\rm d} w_i Lo(p,wo)=Le(p,wo)+∫Ω+Lo(q,wi) fr(p,wi,wo) (n⋅wi)dwi

So the pseudo code is:

Note:
Because there is a random wi in the normal hemisphere at point p, the direction of wi is emission, which is opposite to the defined injection, so the negative sign is taken

The return result of each recursion is multiplied by f_r * cosine / pdf, but for no recursion in the shader, we use a loop instead. Maintain a variable history to record each recursion and return the cumulative multiplication of the result.

It is not consistent with the parameter entry of the above pseudo code. We give the surface information of a point p, that is, HitResult structure, an incident light direction viewDir and a maximum ejection times, and then solve the color of point p through pathTracing function:

throw light

...

// primary hit
HitResult firstHit = hitBVH(ray);
vec3 color;

if(!firstHit.isHit) {
    color = vec3(0);
} else {
    vec3 Le = firstHit.material.emissive;
    int maxBounce = 2;
    vec3 Li = pathTracing(firstHit, maxBounce);
    color = Le + Li;
}

fragColor = vec4(color, 1.0);

We just finished writing the main function, but pathTracing hasn't started yet. But we have to do some groundwork

8.2. Help function

Write three help functions:

Function of random numbers with uniform distribution of 0 ~ 1
Generating a function of a random vector uniformly distributed in the hemisphere
Function of any vector projected onto the normal hemisphere

The first is the random number with uniform distribution of 0 ~ 1. This part of the code is quoted from Casual Shadertoy Path Tracing Part 1 , and a uniform uint variable frameCounter (frame counter) is required as random seed, as well as width, height and NDC coordinate pix variables of current screen pixels:

uniform uint frameCounter;

uint seed = uint(
    uint((pix.x * 0.5 + 0.5) * width)  * uint(1973) + 
    uint((pix.y * 0.5 + 0.5) * height) * uint(9277) + 
    uint(frameCounter) * uint(26699)) | uint(1);

uint wang_hash(inout uint seed) {
    seed = uint(seed ^ uint(61)) ^ uint(seed >> uint(16));
    seed *= uint(9);
    seed = seed ^ (seed >> 4);
    seed *= uint(0x27d4eb2d);
    seed = seed ^ (seed >> 15);
    return seed;
}
 
float rand() {
    return float(wang_hash(seed)) / 4294967296.0;
}

The second is hemispherical uniform distribution, and this part of the code is quoted from PBRT 13.6 , where ξ 1 , ξ 2 \xi_1, \xi_2 ξ 1, ξ 2 ￠ is a random number evenly distributed from 0 to 1:

The code is as follows:

// Hemispherical uniform sampling
vec3 SampleHemisphere() {
    float z = rand();
    float r = max(0, sqrt(1.0 - z*z));
    float phi = 2.0 * PI * rand();
    return vec3(r * cos(phi), r * sin(phi), z);
}

Then it is projected to the normal hemisphere, and the code of this part is quoted from GPU Path Tracing in Unity – Part 2 , the GLSL implementation is different from the text:

// Project the vector v onto the normal hemisphere of N
vec3 toNormalHemisphere(vec3 v, vec3 N) {
    vec3 helper = vec3(1, 0, 0);
    if(abs(N.x)>0.999) helper = vec3(0, 0, 1);
    vec3 tangent = normalize(cross(N, helper));
    vec3 bitangent = normalize(cross(N, tangent));
    return v.x * tangent + v.y * bitangent + v.z * N;
}

8.3. Implementation of pathtracing

This is followed by the implementation of pathTracing. Here we only implement diffuse reflection

Hemispherical area 2 π 2\pi 2 π, here we take the probability density function pdf of diffuse reflection as 1 / 2 π 1/2\pi 1 / 2 π, in addition, about f_r here we take the surface color divided by π \pi π. Let's sell it first. Let's take it as a constant

Note:
Here f_r is actually BRDF, that is, bidirectional reflection distribution function
The value of the function BRDF(p, wi, wo) describes how much light can be emitted from Wo after scattering when light enters point p from wi
One conclusion is that the BRDF of diffuse reflection is the color value divided by pi
Remember here, my subsequent blog will have a detailed derivation

The specific pathTracing code is as follows:

// Path tracking
vec3 pathTracing(HitResult hit, int maxBounce) {

    vec3 Lo = vec3(0);      // Final color
    vec3 history = vec3(1); // Recursive accumulated color

    for(int bounce=0; bounce<maxBounce; bounce++) {
        // Random exit direction wi
        vec3 wi = toNormalHemisphere(SampleHemisphere(), hit.normal);

        // Diffuse: emitting rays randomly
        Ray randomRay;
        randomRay.startPoint = hit.hitPoint;
        randomRay.direction = wi;
        HitResult newHit = hitBVH(randomRay);

        float pdf = 1.0 / (2.0 * PI);                                   // Hemispherical uniform sampling probability density
        float cosine_o = max(0, dot(-hit.viewDir, hit.normal));         // Cosine of angle between incident light and normal
        float cosine_i = max(0, dot(randomRay.direction, hit.normal));  // Cosine of angle between outgoing light and normal
        vec3 f_r = hit.material.baseColor / PI;                         // Diffuse BRDF

        // Miss
        if(!newHit.isHit) {
            break;
        }
        
        // Hit light accumulation color
        vec3 Le = newHit.material.emissive;
        Lo += history * Le * f_r * cosine_i / pdf;
        
        // Recursion (step)
        hit = newHit;
        history *= f_r * cosine_i / pdf;  // Cumulative color
    }
    
    return Lo;
}

Run code:

The result is very noisy. This is because we want to accumulate the results of each frame as the integral value instead of taking each discrete sample separately. Therefore, we need to mix the rendering results of multiple frames

8.4. Multi frame mixing and post-processing

In order to obtain the result of the previous frame, we need to maintain a texture lastFrame to save the image of the previous frame. At the same time, in order to post process the output (such as gamma correction and tone mapping), we need to implement a simple pipeline:

pass1 is raytraced, mixed with the previous frame and output. pass2 frame buffer has only one color attachment, which is associated with lastFrame. The value of lastFrame can be updated by directly outputting the color, while pass3 is post-processing and final output.

For pipeline and frame caching, you should be familiar with the children's shoes of defer render or Minecraft shader. You can also refer to my blog: OpenGL learning (XI): delayed rendering pipeline , here we just change the first geometry pass to raytracing pass

Note:
We have to deal with geometry pass later

A RenderPass class is encapsulated here, where colorAttachments is the texture id to be passed into the next pass, and these textures will be used as color attachments for frame buffer. Then each pass directly calls draw, where texPassArray is the colorAttachments of the previous pass. For the specific c + + code implementation, please refer to the [complete] code section below:

class RenderPass {
public:
    std::vector<GLuint> colorAttachments;
    // More properties
    
    void bindData(bool finalPass = false) {
        
    }
    void draw(std::vector<GLuint> texPassArray = {}) {
        
    }
};

After we implement the pipeline, we add multi frame mixing in the code of the chip shader of pass1:

uniform sampler2D lastFrame;

...

// Blend with previous frame
vec3 lastColor = texture2D(lastFrame, pix.xy*0.5+0.5).rgb;
color = mix(lastColor, color, 1.0/float(frameCounter+1));

Again, the motion picture is compressed, and there is a little more noise:

Yeah! So far, we have realized light tracing on GPU, and the fitting process is still visual

9. HDR environment map

add a beautiful thing to a contrasting beautiful thing.

HDR picture is a panorama. Different from ordinary soap flakes, HDR picture can represent a wider range of brightness. The contrast between general pictures and HDR pictures when brightness changes is as follows:

Note:
Picture quoted from How to Create Your Own HDR Environment Maps
This is a blog about photography. In a sense, photography is also a process of sampling (or integrating) the real world. When the light gate of the camera photosensitive chip opens, the capacitor begins to accumulate photons (charge), and then read the value of the capacitor to obtain the pixel color
(Chunyun, I don't promise. Yes, I can't take photos, and I don't have money to buy a camera Orz

The general picture brightness is 255 when it is full, but the HDR brightness is the whole floating-point number range, which can better represent the real light, so it is used for environment mapping

First, you can ploy heaven Download HDR map above:

Then we need to read the HDR picture. Obviously, sol cannot be read (in fact, there is A pseudo HDR, which is realized through RGBE, RGBdivA and RGBdivA2, but it seems that there is A BUG with channel A always 128, so it can't be used...

Here we choose a lightweight Library: HDR Image Loader. It does not need to be installed, but only needs to be include d. Its code is in here

After successfully include, read the code of HDR diagram as follows:

#include "lib/hdrloader.h"

...

// hdr panorama
HDRLoaderResult hdrRes;
bool r = HDRLoader::load("./skybox/sunset.hdr", hdrRes);
GLuint hdrMap = Create a texture ()
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB32F, hdrRes.width, hdrRes.height, 0, GL_RGB, GL_FLOAT, hdrRes.cols);

Then in the shader, we directly output the following:

There are two questions:

The image is a little dark because there is no gamma correction
The image is reversed. Just flip y when sampling later
The image is distorted: we'll use spherical coord sampling later

Then we give a vector v and convert it into the texture coordinate uv of the sampled HDR image, which is the code reference of this part stack overflow , the difference from the original is that I flip the y coordinate:

// Convert the 3D vector v to the texture coordinate uv of HDR map
vec2 SampleSphericalMap(vec3 v) {
    vec2 uv = vec2(atan(v.z, v.x), asin(v.y));
    uv /= vec2(2.0 * PI, PI);
    uv += 0.5;
    uv.y = 1.0 - uv.y;
    return uv;
}

At this point, we can sample the HDR diagram:

// Get HDR ambient color
vec3 sampleHdr(vec3 v) {
    vec2 uv = SampleSphericalMap(normalize(v));
    vec3 color = texture2D(hdrMap, uv).rgb;
    //color = min(color, vec3(10));
    return color;
}

Note:
There is a sentence in the above code that min claps the HDR brightness to less than 10. Strictly speaking, this is wrong, and I also commented it out
However, if this is not done, many firefly images will appear, and the pictures will be noisy, as shown in the figure:

This is because there are points with too high brightness and small area in HDR, which makes a lot of samples occasionally accumulate a lot of energy (it may also be related to my random number generation algorithm? We use the pseudo-random of linear congruence. Try Sobol Sequence another day to see the effect...

Then, in the main function, in the processing of miss of primary ray, color = vec3(0) is replaced by:

color = sampleHdr(ray.direction);

In addition, in pathTracing, ray miss also needs to be processed:

// Miss
if(!newHit.isHit) {
    vec3 skyColor = sampleHdr(randomRay.direction);
    Lo += history * skyColor * f_r * cosine_i / pdf;
    break;
}

The maximum value of HDR in the following figure is clamp ed to within 10. You can see that the effect is still good:

If you don't clamp, you can cast a brighter shadow, but the disadvantage is that it is firefly. Later, you may try to solve it with importance sampling (escape)

10. Complete code

10.1. c + + code

#include <iostream>
#include <iomanip>
#include <string>
#include <fstream>
#include <vector>
#include <sstream>
#include <iostream>
#include <algorithm>
#include <ctime>

#include <GL/glew.h>
#include <GL/freeglut.h>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>
#include <SOIL2/SOIL2.h>


#include "lib/hdrloader.h"

#define INF 114514.0

using namespace glm;

// ----------------------------------------------------------------------------- //

// Object surface material definition
struct Material {
    vec3 emissive = vec3(0, 0, 0);  // The luminous color when used as a light source
    vec3 baseColor = vec3(1, 1, 1);
    float subsurface = 0.0;
    float metallic = 0.0;
    float specular = 0.0;
    float specularTint = 0.0;
    float roughness = 0.0;
    float anisotropic = 0.0;
    float sheen = 0.0;
    float sheenTint = 0.0;
    float clearcoat = 0.0;
    float clearcoatGloss = 0.0;
    float IOR = 1.0;
    float transmission = 0.0;
};

// Triangle definition
struct Triangle {
    vec3 p1, p2, p3;    // Vertex coordinates
    vec3 n1, n2, n3;    // Vertex normal
    Material material;  // texture of material
};

// BVH tree node
struct BVHNode {
    int left, right;    // Left and right subtree index
    int n, index;       // Leaf node information               
    vec3 AA, BB;        // Collision box
};

// ----------------------------------------------------------------------------- //

struct Triangle_encoded {
    vec3 p1, p2, p3;    // Vertex coordinates
    vec3 n1, n2, n3;    // Vertex normal
    vec3 emissive;      // Self luminous parameters
    vec3 baseColor;     // colour
    vec3 param1;        // (subsurface, metallic, specular)
    vec3 param2;        // (specularTint, roughness, anisotropic)
    vec3 param3;        // (sheen, sheenTint, clearcoat)
    vec3 param4;        // (clearcoatGloss, IOR, transmission)
};

struct BVHNode_encoded {
    vec3 childs;        // (left, right, reserved)
    vec3 leafInfo;      // (n, index, reserved)
    vec3 AA, BB;        
};

// ----------------------------------------------------------------------------- //

class RenderPass {
public:
    GLuint FBO = 0;
    GLuint vao, vbo;
    std::vector<GLuint> colorAttachments;
    GLuint program;
    int width = 512;
    int height = 512;
    void bindData(bool finalPass = false) {
        if (!finalPass) glGenFramebuffers(1, &FBO);
        glBindFramebuffer(GL_FRAMEBUFFER, FBO);

        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        std::vector<vec3> square = { vec3(-1, -1, 0), vec3(1, -1, 0), vec3(-1, 1, 0), vec3(1, 1, 0), vec3(-1, 1, 0), vec3(1, -1, 0) };
        glBufferData(GL_ARRAY_BUFFER, sizeof(vec3) * square.size(), NULL, GL_STATIC_DRAW);
        glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(vec3) * square.size(), &square[0]);

        glGenVertexArrays(1, &vao);
        glBindVertexArray(vao);
        glEnableVertexAttribArray(0);   // layout (location = 0) 
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (GLvoid*)0);
        // If it is not finalPass, the color attachment of frame buffer is generated
        if (!finalPass) {
            std::vector<GLuint> attachments;
            for (int i = 0; i < colorAttachments.size(); i++) {
                glBindTexture(GL_TEXTURE_2D, colorAttachments[i]);
                glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, colorAttachments[i], 0);// Bind color texture to color i attachment
                attachments.push_back(GL_COLOR_ATTACHMENT0 + i);
            }
            glDrawBuffers(attachments.size(), &attachments[0]);
        }
        
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
    }
    void draw(std::vector<GLuint> texPassArray = {}) {
        glUseProgram(program);
        glBindFramebuffer(GL_FRAMEBUFFER, FBO);
        glBindVertexArray(vao);
        // Transmit the frame buffer color attachment of the previous frame
        for (int i = 0; i < texPassArray.size(); i++) {
            glActiveTexture(GL_TEXTURE0+i);
            glBindTexture(GL_TEXTURE_2D, texPassArray[i]);
            std::string uName = "texPass" + std::to_string(i);
            glUniform1i(glGetUniformLocation(program, uName.c_str()), i);
        }
        glViewport(0, 0, width, height);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glDrawArrays(GL_TRIANGLES, 0, 6);

        glBindVertexArray(0);
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        glUseProgram(0);
    }
};

// ----------------------------------------------------------------------------- //

GLuint trianglesTextureBuffer;
GLuint nodesTextureBuffer;
GLuint lastFrame;
GLuint hdrMap;

RenderPass pass1;
RenderPass pass2;
RenderPass pass3;

// Camera parameters
float upAngle = 0.0;
float rotatAngle = 0.0;
float r = 4.0;

// ----------------------------------------------------------------------------- //

// Sort by triangle center -- comparison function
bool cmpx(const Triangle& t1, const Triangle& t2) {
    vec3 center1 = (t1.p1 + t1.p2 + t1.p3) / vec3(3, 3, 3);
    vec3 center2 = (t2.p1 + t2.p2 + t2.p3) / vec3(3, 3, 3);
    return center1.x < center2.x;
}
bool cmpy(const Triangle& t1, const Triangle& t2) {
    vec3 center1 = (t1.p1 + t1.p2 + t1.p3) / vec3(3, 3, 3);
    vec3 center2 = (t2.p1 + t2.p2 + t2.p3) / vec3(3, 3, 3);
    return center1.y < center2.y;
}
bool cmpz(const Triangle& t1, const Triangle& t2) {
    vec3 center1 = (t1.p1 + t1.p2 + t1.p3) / vec3(3, 3, 3);
    vec3 center2 = (t2.p1 + t2.p2 + t2.p3) / vec3(3, 3, 3);
    return center1.z < center2.z;
}

// ----------------------------------------------------------------------------- //

// Reads the file and returns a long string representing the file content
std::string readShaderFile(std::string filepath) {
    std::string res, line;
    std::ifstream fin(filepath);
    if (!fin.is_open())
    {
        std::cout << "file " << filepath << " Open failed" << std::endl;
        exit(-1);
    }
    while (std::getline(fin, line))
    {
        res += line + '\n';
    }
    fin.close();
    return res;
}

GLuint getTextureRGB32F(int width, int height) {
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    return tex;
}

// Get shader object
GLuint getShaderProgram(std::string fshader, std::string vshader) {
    // Read shader source file
    std::string vSource = readShaderFile(vshader);
    std::string fSource = readShaderFile(fshader);
    const char* vpointer = vSource.c_str();
    const char* fpointer = fSource.c_str();

    // fault-tolerant
    GLint success;
    GLchar infoLog[512];

    // Create and compile vertex shaders
    GLuint vertexShader = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vertexShader, 1, (const GLchar**)(&vpointer), NULL);
    glCompileShader(vertexShader);
    glGetShaderiv(vertexShader, GL_COMPILE_STATUS, &success);   // Error detection
    if (!success)
    {
        glGetShaderInfoLog(vertexShader, 512, NULL, infoLog);
        std::cout << "Vertex shader compilation error\n" << infoLog << std::endl;
        exit(-1);
    }

    // Create and compile clip shaders
    GLuint fragmentShader = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fragmentShader, 1, (const GLchar**)(&fpointer), NULL);
    glCompileShader(fragmentShader);
    glGetShaderiv(fragmentShader, GL_COMPILE_STATUS, &success);   // Error detection
    if (!success)
    {
        glGetShaderInfoLog(fragmentShader, 512, NULL, infoLog);
        std::cout << "Fragment shader compilation error\n" << infoLog << std::endl;
        exit(-1);
    }

    // Link two shaders to the program object
    GLuint shaderProgram = glCreateProgram();
    glAttachShader(shaderProgram, vertexShader);
    glAttachShader(shaderProgram, fragmentShader);
    glLinkProgram(shaderProgram);

    // Delete shader object
    glDeleteShader(vertexShader);
    glDeleteShader(fragmentShader);

    return shaderProgram;
}

// ----------------------------------------------------------------------------- //

// Model transformation matrix
mat4 getTransformMatrix(vec3 rotateCtrl, vec3 translateCtrl, vec3 scaleCtrl) {
    glm::mat4 unit(    // Identity matrix
        glm::vec4(1, 0, 0, 0),
        glm::vec4(0, 1, 0, 0),
        glm::vec4(0, 0, 1, 0),
        glm::vec4(0, 0, 0, 1)
    );
    mat4 scale = glm::scale(unit, scaleCtrl);
    mat4 translate = glm::translate(unit, translateCtrl);
    mat4 rotate = unit;
    rotate = glm::rotate(rotate, glm::radians(rotateCtrl.x), glm::vec3(1, 0, 0));
    rotate = glm::rotate(rotate, glm::radians(rotateCtrl.y), glm::vec3(0, 1, 0));
    rotate = glm::rotate(rotate, glm::radians(rotateCtrl.z), glm::vec3(0, 0, 1));

    mat4 model = translate * rotate * scale;
    return model;
}

// Read obj
void readObj(std::string filepath, std::vector<Triangle>& triangles, Material material, mat4 trans, bool smoothNormal) {

    // Vertex position, index
    std::vector<vec3> vertices;
    std::vector<GLuint> indices;

    // Open file stream
    std::ifstream fin(filepath);
    std::string line;
    if (!fin.is_open()) {
        std::cout << "file " << filepath << " Open failed" << std::endl;
        exit(-1);
    }

    // Calculate the AABB box and normalize the model size
    float maxx = -11451419.19;
    float maxy = -11451419.19;
    float maxz = -11451419.19;
    float minx = 11451419.19;
    float miny = 11451419.19;
    float minz = 11451419.19;

    // Read by line
    while (std::getline(fin, line)) {
        std::istringstream sin(line);   // Parse and read a row of data as a string stream
        std::string type;
        GLfloat x, y, z;
        int v0, v1, v2;
        int vn0, vn1, vn2;
        int vt0, vt1, vt2;
        char slash;

        // Count the number of inclined rods and read them in different formats
        int slashCnt = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line[i] == '/') slashCnt++;
        }

        // Read obj file
        sin >> type;
        if (type == "v") {
            sin >> x >> y >> z;
            vertices.push_back(vec3(x, y, z));
            maxx = max(maxx, x); maxy = max(maxx, y); maxz = max(maxx, z);
            minx = min(minx, x); miny = min(minx, y); minz = min(minx, z);
        }
        if (type == "f") {
            if (slashCnt == 6) {
                sin >> v0 >> slash >> vt0 >> slash >> vn0;
                sin >> v1 >> slash >> vt1 >> slash >> vn1;
                sin >> v2 >> slash >> vt2 >> slash >> vn2;
            }
            else if (slashCnt == 3) {
                sin >> v0 >> slash >> vt0;
                sin >> v1 >> slash >> vt1;
                sin >> v2 >> slash >> vt2;
            }
            else {
                sin >> v0 >> v1 >> v2;
            }
            indices.push_back(v0 - 1);
            indices.push_back(v1 - 1);
            indices.push_back(v2 - 1);
        }
    }

    // Model size normalization
    float lenx = maxx - minx;
    float leny = maxy - miny;
    float lenz = maxz - minz;
    float maxaxis = max(lenx, max(leny, lenz));
    for (auto& v : vertices) {
        v.x /= maxaxis;
        v.y /= maxaxis;
        v.z /= maxaxis;
    }

    // Coordinate transformation through matrix
    for (auto& v : vertices) {
        vec4 vv = vec4(v.x, v.y, v.z, 1);
        vv = trans * vv;
        v = vec3(vv.x, vv.y, vv.z);
    }

    // Generate Normals 
    std::vector<vec3> normals(vertices.size(), vec3(0, 0, 0));
    for (int i = 0; i < indices.size(); i += 3) {
        vec3 p1 = vertices[indices[i]];
        vec3 p2 = vertices[indices[i + 1]];
        vec3 p3 = vertices[indices[i + 2]];
        vec3 n = normalize(cross(p2 - p1, p3 - p1));
        normals[indices[i]] += n;
        normals[indices[i + 1]] += n;
        normals[indices[i + 2]] += n;
    }

    // Building an array of Triangle objects
    int offset = triangles.size();  // Incremental update
    triangles.resize(offset + indices.size() / 3);
    for (int i = 0; i < indices.size(); i += 3) {
        Triangle& t = triangles[offset + i / 3];
        // Pass vertex attributes
        t.p1 = vertices[indices[i]];
        t.p2 = vertices[indices[i + 1]];
        t.p3 = vertices[indices[i + 2]];
        if (!smoothNormal) {
            vec3 n = normalize(cross(t.p2 - t.p1, t.p3 - t.p1));
            t.n1 = n; t.n2 = n; t.n3 = n;
        }
        else {
            t.n1 = normalize(normals[indices[i]]);
            t.n2 = normalize(normals[indices[i + 1]]);
            t.n3 = normalize(normals[indices[i + 2]]);
        }

        // Transmission material
        t.material = material;
    }
}

// Build BVH
int buildBVH(std::vector<Triangle>& triangles, std::vector<BVHNode>& nodes, int l, int r, int n) {
    if (l > r) return 0;

    // Note:
    // It cannot be operated by pointer, reference, etc. it must be operated by nodes[id]
    // Because STD:: vector < > will be copied to larger memory during capacity expansion, the address will change
    // The pointer and reference point to the original memory, so an error will occur
    nodes.push_back(BVHNode());
    int id = nodes.size() - 1;   // Note: save the index first
    nodes[id].left = nodes[id].right = nodes[id].n = nodes[id].index = 0;
    nodes[id].AA = vec3(1145141919, 1145141919, 1145141919);
    nodes[id].BB = vec3(-1145141919, -1145141919, -1145141919);

    // Calculate AABB
    for (int i = l; i <= r; i++) {
        // Minimum point AA
        float minx = min(triangles[i].p1.x, min(triangles[i].p2.x, triangles[i].p3.x));
        float miny = min(triangles[i].p1.y, min(triangles[i].p2.y, triangles[i].p3.y));
        float minz = min(triangles[i].p1.z, min(triangles[i].p2.z, triangles[i].p3.z));
        nodes[id].AA.x = min(nodes[id].AA.x, minx);
        nodes[id].AA.y = min(nodes[id].AA.y, miny);
        nodes[id].AA.z = min(nodes[id].AA.z, minz);
        // Maximum point BB
        float maxx = max(triangles[i].p1.x, max(triangles[i].p2.x, triangles[i].p3.x));
        float maxy = max(triangles[i].p1.y, max(triangles[i].p2.y, triangles[i].p3.y));
        float maxz = max(triangles[i].p1.z, max(triangles[i].p2.z, triangles[i].p3.z));
        nodes[id].BB.x = max(nodes[id].BB.x, maxx);
        nodes[id].BB.y = max(nodes[id].BB.y, maxy);
        nodes[id].BB.z = max(nodes[id].BB.z, maxz);
    }

    // No more than n triangles return leaf nodes
    if ((r - l + 1) <= n) {
        nodes[id].n = r - l + 1;
        nodes[id].index = l;
        return id;
    }

    // Otherwise, recursive tree building
    float lenx = nodes[id].BB.x - nodes[id].AA.x;
    float leny = nodes[id].BB.y - nodes[id].AA.y;
    float lenz = nodes[id].BB.z - nodes[id].AA.z;
    // Divided by x
    if (lenx >= leny && lenx >= lenz)
        std::sort(triangles.begin() + l, triangles.begin() + r + 1, cmpx);
    // Divided by y
    if (leny >= lenx && leny >= lenz)
        std::sort(triangles.begin() + l, triangles.begin() + r + 1, cmpy);
    // Divided by z
    if (lenz >= lenx && lenz >= leny)
        std::sort(triangles.begin() + l, triangles.begin() + r + 1, cmpz);
    // recursion
    int mid = (l + r) / 2;
    int left = buildBVH(triangles, nodes, l, mid, n);
    int right = buildBVH(triangles, nodes, mid + 1, r, n);

    nodes[id].left = left;
    nodes[id].right = right;

    return id;
}

// SAH optimized construction of BVH
int buildBVHwithSAH(std::vector<Triangle>& triangles, std::vector<BVHNode>& nodes, int l, int r, int n) {
    if (l > r) return 0;

    nodes.push_back(BVHNode());
    int id = nodes.size() - 1;   
    nodes[id].left = nodes[id].right = nodes[id].n = nodes[id].index = 0;
    nodes[id].AA = vec3(1145141919, 1145141919, 1145141919);
    nodes[id].BB = vec3(-1145141919, -1145141919, -1145141919);

    // Calculate AABB
    for (int i = l; i <= r; i++) {
        // Minimum point AA
        float minx = min(triangles[i].p1.x, min(triangles[i].p2.x, triangles[i].p3.x));
        float miny = min(triangles[i].p1.y, min(triangles[i].p2.y, triangles[i].p3.y));
        float minz = min(triangles[i].p1.z, min(triangles[i].p2.z, triangles[i].p3.z));
        nodes[id].AA.x = min(nodes[id].AA.x, minx);
        nodes[id].AA.y = min(nodes[id].AA.y, miny);
        nodes[id].AA.z = min(nodes[id].AA.z, minz);
        // Maximum point BB
        float maxx = max(triangles[i].p1.x, max(triangles[i].p2.x, triangles[i].p3.x));
        float maxy = max(triangles[i].p1.y, max(triangles[i].p2.y, triangles[i].p3.y));
        float maxz = max(triangles[i].p1.z, max(triangles[i].p2.z, triangles[i].p3.z));
        nodes[id].BB.x = max(nodes[id].BB.x, maxx);
        nodes[id].BB.y = max(nodes[id].BB.y, maxy);
        nodes[id].BB.z = max(nodes[id].BB.z, maxz);
    }

    // No more than n triangles return leaf nodes
    if ((r - l + 1) <= n) {
        nodes[id].n = r - l + 1;
        nodes[id].index = l;
        return id;
    }

    // Otherwise, recursive tree building
    float Cost = INF;
    int Axis = 0;
    int Split = (l + r) / 2;
    for (int axis = 0; axis < 3; axis++) {
        // Sort by x, y and z axes respectively
        if (axis == 0) std::sort(&triangles[0] + l, &triangles[0] + r + 1, cmpx);
        if (axis == 1) std::sort(&triangles[0] + l, &triangles[0] + r + 1, cmpy);
        if (axis == 2) std::sort(&triangles[0] + l, &triangles[0] + r + 1, cmpz);

        // Leftmax [i]: the maximum xyz value in [l, I]
        // Leftmin [i]: the smallest xyz value in [l, I]
        std::vector<vec3> leftMax(r - l + 1, vec3(-INF, -INF, -INF));
        std::vector<vec3> leftMin(r - l + 1, vec3(INF, INF, INF));
        // Calculate prefix note i-l to align to subscript 0
        for (int i = l; i <= r; i++) {
            Triangle& t = triangles[i];
            int bias = (i == l) ? 0 : 1;  // Special treatment of the first element

            leftMax[i - l].x = max(leftMax[i - l - bias].x, max(t.p1.x, max(t.p2.x, t.p3.x)));
            leftMax[i - l].y = max(leftMax[i - l - bias].y, max(t.p1.y, max(t.p2.y, t.p3.y)));
            leftMax[i - l].z = max(leftMax[i - l - bias].z, max(t.p1.z, max(t.p2.z, t.p3.z)));

            leftMin[i - l].x = min(leftMin[i - l - bias].x, min(t.p1.x, min(t.p2.x, t.p3.x)));
            leftMin[i - l].y = min(leftMin[i - l - bias].y, min(t.p1.y, min(t.p2.y, t.p3.y)));
            leftMin[i - l].z = min(leftMin[i - l - bias].z, min(t.p1.z, min(t.p2.z, t.p3.z)));
        }

        // Rightmax [i]: the largest xyz value in [I, R]
        // Rightmin [i]: the smallest xyz value in [I, R]
        std::vector<vec3> rightMax(r - l + 1, vec3(-INF, -INF, -INF));
        std::vector<vec3> rightMin(r - l + 1, vec3(INF, INF, INF));
        // Calculate suffix note i-l to align to subscript 0
        for (int i = r; i >= l; i--) {
            Triangle& t = triangles[i];
            int bias = (i == r) ? 0 : 1;  // Special treatment of the first element

            rightMax[i - l].x = max(rightMax[i - l + bias].x, max(t.p1.x, max(t.p2.x, t.p3.x)));
            rightMax[i - l].y = max(rightMax[i - l + bias].y, max(t.p1.y, max(t.p2.y, t.p3.y)));
            rightMax[i - l].z = max(rightMax[i - l + bias].z, max(t.p1.z, max(t.p2.z, t.p3.z)));

            rightMin[i - l].x = min(rightMin[i - l + bias].x, min(t.p1.x, min(t.p2.x, t.p3.x)));
            rightMin[i - l].y = min(rightMin[i - l + bias].y, min(t.p1.y, min(t.p2.y, t.p3.y)));
            rightMin[i - l].z = min(rightMin[i - l + bias].z, min(t.p1.z, min(t.p2.z, t.p3.z)));
        }

        // Traversal search segmentation
        float cost = INF;
        int split = l;
        for (int i = l; i <= r - 1; i++) {
            float lenx, leny, lenz;
            // Left [l, i]
            vec3 leftAA = leftMin[i - l];
            vec3 leftBB = leftMax[i - l];
            lenx = leftBB.x - leftAA.x;
            leny = leftBB.y - leftAA.y;
            lenz = leftBB.z - leftAA.z;
            float leftS = 2.0 * ((lenx * leny) + (lenx * lenz) + (leny * lenz));
            float leftCost = leftS * (i - l + 1);

            // Right [i+1, r]
            vec3 rightAA = rightMin[i + 1 - l];
            vec3 rightBB = rightMax[i + 1 - l];
            lenx = rightBB.x - rightAA.x;
            leny = rightBB.y - rightAA.y;
            lenz = rightBB.z - rightAA.z;
            float rightS = 2.0 * ((lenx * leny) + (lenx * lenz) + (leny * lenz));
            float rightCost = rightS * (r - i);

            // Record the minimum answer for each split
            float totalCost = leftCost + rightCost;
            if (totalCost < cost) {
                cost = totalCost;
                split = i;
            }
        }
        // Record the best answer for each axis
        if (cost < Cost) {
            Cost = cost;
            Axis = axis;
            Split = split;
        }
    }

    // Split by best axis
    if (Axis == 0) std::sort(&triangles[0] + l, &triangles[0] + r + 1, cmpx);
    if (Axis == 1) std::sort(&triangles[0] + l, &triangles[0] + r + 1, cmpy);
    if (Axis == 2) std::sort(&triangles[0] + l, &triangles[0] + r + 1, cmpz);

    // recursion
    int left  = buildBVHwithSAH(triangles, nodes, l, Split, n);
    int right = buildBVHwithSAH(triangles, nodes, Split + 1, r, n);

    nodes[id].left = left;
    nodes[id].right = right;

    return id;
}

// ----------------------------------------------------------------------------- //

// draw
clock_t t1, t2;
double dt, fps;
unsigned int frameCounter = 0;
void display() {
    
    // Frame timing
    t2 = clock();
    dt = (double)(t2 - t1) / CLOCKS_PER_SEC;
    fps = 1.0 / dt;
    std::cout << "\r";
    std::cout << std::fixed << std::setprecision(2) << "FPS : " << fps << "    Number of iterations: " << frameCounter;
    t1 = t2;

    // Camera parameters
    vec3 eye = vec3(-sin(radians(rotatAngle)) * cos(radians(upAngle)), sin(radians(upAngle)), cos(radians(rotatAngle)) * cos(radians(upAngle)));
    eye.x *= r; eye.y *= r; eye.z *= r;
    mat4 cameraRotate = lookAt(eye, vec3(0, 0, 0), vec3(0, 1, 0));  // The camera looks at the origin
    cameraRotate = inverse(cameraRotate);   // The inverse matrix of lookat converts the light direction

    // Pass uniform to pass1
    glUseProgram(pass1.program);
    glUniform3fv(glGetUniformLocation(pass1.program, "eye"), 1, value_ptr(eye));
    glUniformMatrix4fv(glGetUniformLocation(pass1.program, "cameraRotate"), 1, GL_FALSE, value_ptr(cameraRotate));
    glUniform1ui(glGetUniformLocation(pass1.program, "frameCounter"), frameCounter++);// Transmit counter as random seed

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_BUFFER, trianglesTextureBuffer);
    glUniform1i(glGetUniformLocation(pass1.program, "triangles"), 0);

    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_BUFFER, nodesTextureBuffer);
    glUniform1i(glGetUniformLocation(pass1.program, "nodes"), 1);

    glActiveTexture(GL_TEXTURE2);
    glBindTexture(GL_TEXTURE_2D, lastFrame);
    glUniform1i(glGetUniformLocation(pass1.program, "lastFrame"), 2);

    glActiveTexture(GL_TEXTURE3);
    glBindTexture(GL_TEXTURE_2D, hdrMap);
    glUniform1i(glGetUniformLocation(pass1.program, "hdrMap"), 3);

    // draw
    pass1.draw();
    pass2.draw(pass1.colorAttachments);
    pass3.draw(pass2.colorAttachments);

    glutSwapBuffers();
}

// Every frame
void frameFunc() {
    glutPostRedisplay();
}

// Mouse motion function
double lastX = 0.0, lastY = 0.0;
void mouse(int x, int y) {
    frameCounter = 0;
    // Adjust rotation
    rotatAngle += 150 * (x - lastX) / 512;
    upAngle += 150 * (y - lastY) / 512;
    upAngle = min(upAngle, 89.0f);
    upAngle = max(upAngle, -89.0f);
    lastX = x, lastY = y;
    glutPostRedisplay();    // Repaint 
}

// Mouse down
void mouseDown(int button, int state, int x, int y) {
    if (button == GLUT_LEFT_BUTTON && state == GLUT_DOWN) {
        lastX = x, lastY = y;
    }
}

// Mouse wheel function
void mouseWheel(int wheel, int direction, int x, int y) {
    frameCounter = 0;
    r += -direction * 0.5;
    glutPostRedisplay();    // Repaint 
}

// ----------------------------------------------------------------------------- //

int main(int argc, char** argv) {

    glutInit(&argc, argv);              // glut initialization
    glutInitDisplayMode(GLUT_RGBA | GLUT_DEPTH);
    glutInitWindowSize(512, 512);// Window size
    glutInitWindowPosition(350, 50);
    glutCreateWindow("Path Tracing GPU"); // Create OpenGL context
    glewInit();

    // ----------------------------------------------------------------------------- //

    // scene config
    std::vector<Triangle> triangles;

    Material m;
    m.baseColor = vec3(0, 1, 1);
    readObj("models/Stanford Bunny.obj", triangles, m, getTransformMatrix(vec3(0, 0, 0), vec3(0.3, -1.6, 0), vec3(1.5, 1.5, 1.5)),true);

    m.baseColor = vec3(0.725, 0.71, 0.68);
    readObj("models/quad.obj", triangles, m, getTransformMatrix(vec3(0, 0, 0), vec3(0, -1.4, 0), vec3(18.83, 0.01, 18.83)), false);

    m.baseColor = vec3(1, 1, 1);
    m.emissive = vec3(20, 20, 20);
    readObj("models/quad.obj", triangles, m, getTransformMatrix(vec3(0, 0, 0), vec3(0.0, 1.38, -0.0), vec3(0.7, 0.01, 0.7)), false);

    int nTriangles = triangles.size();
    std::cout << "Model read complete: common " << nTriangles << " Three triangles" << std::endl;

    // Establish bvh
    BVHNode testNode;
    testNode.left = 255;
    testNode.right = 128;
    testNode.n = 30;
    testNode.AA = vec3(1, 1, 0);
    testNode.BB = vec3(0, 1, 0);
    std::vector<BVHNode> nodes{ testNode };
    //buildBVH(triangles, nodes, 0, triangles.size() - 1, 8);
    buildBVHwithSAH(triangles, nodes, 0, triangles.size() - 1, 8);
    int nNodes = nodes.size();
    std::cout << "BVH Build complete: common " << nNodes << " Nodes" << std::endl;

    // Coded triangle, material
    std::vector<Triangle_encoded> triangles_encoded(nTriangles);
    for (int i = 0; i < nTriangles; i++) {
        Triangle& t = triangles[i];
        Material& m = t.material;
        // Vertex Position 
        triangles_encoded[i].p1 = t.p1;
        triangles_encoded[i].p2 = t.p2;
        triangles_encoded[i].p3 = t.p3;
        // Vertex normal
        triangles_encoded[i].n1 = t.n1;
        triangles_encoded[i].n2 = t.n2;
        triangles_encoded[i].n3 = t.n3;
        // texture of material
        triangles_encoded[i].emissive = m.emissive;
        triangles_encoded[i].baseColor = m.baseColor;
        triangles_encoded[i].param1 = vec3(m.subsurface, m.metallic, m.specular);
        triangles_encoded[i].param2 = vec3(m.specularTint, m.roughness, m.anisotropic);
        triangles_encoded[i].param3 = vec3(m.sheen, m.sheenTint, m.clearcoat);
        triangles_encoded[i].param4 = vec3(m.clearcoatGloss, m.IOR, m.transmission);
    }

    // Code BVHNode, aabb
    std::vector<BVHNode_encoded> nodes_encoded(nNodes);
    for (int i = 0; i < nNodes; i++) {
        nodes_encoded[i].childs = vec3(nodes[i].left, nodes[i].right, 0);
        nodes_encoded[i].leafInfo = vec3(nodes[i].n, nodes[i].index, 0);
        nodes_encoded[i].AA = nodes[i].AA;
        nodes_encoded[i].BB = nodes[i].BB;
    }

    // ----------------------------------------------------------------------------- //

    // Generate texture

    // Triangle array
    GLuint tbo0;
    glGenBuffers(1, &tbo0);
    glBindBuffer(GL_TEXTURE_BUFFER, tbo0);
    glBufferData(GL_TEXTURE_BUFFER, triangles_encoded.size() * sizeof(Triangle_encoded), &triangles_encoded[0], GL_STATIC_DRAW);
    glGenTextures(1, &trianglesTextureBuffer);
    glBindTexture(GL_TEXTURE_BUFFER, trianglesTextureBuffer);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGB32F, tbo0);

    // BVHNode array
    GLuint tbo1;
    glGenBuffers(1, &tbo1);
    glBindBuffer(GL_TEXTURE_BUFFER, tbo1);
    glBufferData(GL_TEXTURE_BUFFER, nodes_encoded.size() * sizeof(BVHNode_encoded), &nodes_encoded[0], GL_STATIC_DRAW);
    glGenTextures(1, &nodesTextureBuffer);
    glBindTexture(GL_TEXTURE_BUFFER, nodesTextureBuffer);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGB32F, tbo1);

    // hdr panorama
    HDRLoaderResult hdrRes;
    bool r = HDRLoader::load("./HDR/sunset.hdr", hdrRes);
    hdrMap = getTextureRGB32F(hdrRes.width, hdrRes.height);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB32F, hdrRes.width, hdrRes.height, 0, GL_RGB, GL_FLOAT, hdrRes.cols);

    // ----------------------------------------------------------------------------- //

    // Pipeline configuration

    pass1.program = getShaderProgram("./shaders/fshader.fsh", "./shaders/vshader.vsh");
    //pass1.width = pass1.height = 256;
    pass1.colorAttachments.push_back(getTextureRGB32F(pass1.width, pass1.height));
    pass1.colorAttachments.push_back(getTextureRGB32F(pass1.width, pass1.height));
    pass1.colorAttachments.push_back(getTextureRGB32F(pass1.width, pass1.height));
    pass1.bindData();
   
    glUseProgram(pass1.program);
    glUniform1i(glGetUniformLocation(pass1.program, "nTriangles"), triangles.size());
    glUniform1i(glGetUniformLocation(pass1.program, "nNodes"), nodes.size());
    glUniform1i(glGetUniformLocation(pass1.program, "width"), pass1.width);
    glUniform1i(glGetUniformLocation(pass1.program, "height"), pass1.height);
    glUseProgram(0);

    pass2.program = getShaderProgram("./shaders/pass2.fsh", "./shaders/vshader.vsh");
    lastFrame = getTextureRGB32F(pass2.width, pass2.height);
    pass2.colorAttachments.push_back(lastFrame);
    pass2.bindData();

    pass3.program = getShaderProgram("./shaders/pass3.fsh", "./shaders/vshader.vsh");
    pass3.bindData(true);


    // ----------------------------------------------------------------------------- //

    std::cout << "start..." << std::endl << std::endl;

    glEnable(GL_DEPTH_TEST);  // Opening depth test
    glClearColor(0.0, 0.0, 0.0, 1.0);   // Background color - Black

    glutDisplayFunc(display);   // Set display callback function
    glutIdleFunc(frameFunc);    // Refresh when idle
    glutMotionFunc(mouse);      // Mouse drag
    glutMouseFunc(mouseDown);   // Press the left mouse button
    glutMouseWheelFunc(mouseWheel); // Wheel zoom
    glutMainLoop();

    return 0;
}

10.2. Slice shader (pass1, raytrace)

#version 330 core

in vec3 pix;
out vec4 fragColor;

// ----------------------------------------------------------------------------- //

uniform uint frameCounter;
uniform int nTriangles;
uniform int nNodes;
uniform int width;
uniform int height;

uniform samplerBuffer triangles;
uniform samplerBuffer nodes;

uniform sampler2D lastFrame;
uniform sampler2D hdrMap;

uniform vec3 eye;
uniform mat4 cameraRotate;

// ----------------------------------------------------------------------------- //

#define PI              3.1415926
#define INF             114514.0
#define SIZE_TRIANGLE   12
#define SIZE_BVHNODE    4

// ----------------------------------------------------------------------------- //

// Triangle data format
struct Triangle {
    vec3 p1, p2, p3;    // Vertex coordinates
    vec3 n1, n2, n3;    // Vertex normal
};

// BVH tree node
struct BVHNode {
    int left;           // Left subtree
    int right;          // Right subtree
    int n;              // Contains the number of triangles
    int index;          // Triangle index
    vec3 AA, BB;        // Collision box
};

// Object surface material definition
struct Material {
    vec3 emissive;          // The luminous color when used as a light source
    vec3 baseColor;
    float subsurface;
    float metallic;
    float specular;
    float specularTint;
    float roughness;
    float anisotropic;
    float sheen;
    float sheenTint;
    float clearcoat;
    float clearcoatGloss;
    float IOR;
    float transmission;
};

// light
struct Ray {
    vec3 startPoint;
    vec3 direction;
};

// Ray intersection result
struct HitResult {
    bool isHit;             // Hit
    bool isInside;          // Whether to hit from inside
    float distance;         // Distance from intersection
    vec3 hitPoint;          // Light midpoint
    vec3 normal;            // Hit point normal
    vec3 viewDir;           // The direction of the light hitting the point
    Material material;      // Surface material of hit point
};

// ----------------------------------------------------------------------------- //

/*
 * Generate random vectors, depending on the frameCounter frame counter
 * Code source: https://blog.demofox.org/2020/05/25/casual-shadertoy-path-tracing-1-basic-camera-diffuse-emissive/
*/

uint seed = uint(
    uint((pix.x * 0.5 + 0.5) * width)  * uint(1973) + 
    uint((pix.y * 0.5 + 0.5) * height) * uint(9277) + 
    uint(frameCounter) * uint(26699)) | uint(1);

uint wang_hash(inout uint seed) {
    seed = uint(seed ^ uint(61)) ^ uint(seed >> uint(16));
    seed *= uint(9);
    seed = seed ^ (seed >> 4);
    seed *= uint(0x27d4eb2d);
    seed = seed ^ (seed >> 15);
    return seed;
}
 
float rand() {
    return float(wang_hash(seed)) / 4294967296.0;
}

// ----------------------------------------------------------------------------- //

// Hemispherical uniform sampling
vec3 SampleHemisphere() {
    float z = rand();
    float r = max(0, sqrt(1.0 - z*z));
    float phi = 2.0 * PI * rand();
    return vec3(r * cos(phi), r * sin(phi), z);
}

/*
vec3 toNormalHemisphere(vec3 v, vec3 N) {
    vec3 tangent = vec3(0);
    if(N.yz==vec2(0)) tangent = vec3(0, 0, -N.x);
    else if(N.xz==vec2(0)) tangent = vec3(0, 0, N.y);
    else if(N.xy==vec2(0)) tangent = vec3(-N.z, 0, 0);
    else if(abs(N.x)>abs(N.y)) tangent = normalize(vec3(0, N.z, -N.y));
    else tangent = normalize(vec3(-N.z, 0, N.x)); 
    vec3 bitangent = cross(N, tangent);
    return normalize(v.x * tangent + v.y * bitangent + v.z * N);
}
*/

// Project the vector v onto the normal hemisphere of N
vec3 toNormalHemisphere(vec3 v, vec3 N) {
    vec3 helper = vec3(1, 0, 0);
    if(abs(N.x)>0.999) helper = vec3(0, 0, 1);
    vec3 tangent = normalize(cross(N, helper));
    vec3 bitangent = normalize(cross(N, tangent));
    return v.x * tangent + v.y * bitangent + v.z * N;
}

// ----------------------------------------------------------------------------- //

// Convert the 3D vector v to the texture coordinate uv of HDR map
vec2 SampleSphericalMap(vec3 v) {
    vec2 uv = vec2(atan(v.z, v.x), asin(v.y));
    uv /= vec2(2.0 * PI, PI);
    uv += 0.5;
    uv.y = 1.0 - uv.y;
    return uv;
}

// Get HDR ambient color
vec3 sampleHdr(vec3 v) {
    vec2 uv = SampleSphericalMap(normalize(v));
    vec3 color = texture2D(hdrMap, uv).rgb;
    color = min(color, vec3(10));
    return color;
}

// ----------------------------------------------------------------------------- //

// Gets the triangle with the i-th subscript
Triangle getTriangle(int i) {
    int offset = i * SIZE_TRIANGLE;
    Triangle t;

    // Vertex coordinates
    t.p1 = texelFetch(triangles, offset + 0).xyz;
    t.p2 = texelFetch(triangles, offset + 1).xyz;
    t.p3 = texelFetch(triangles, offset + 2).xyz;
    // normal
    t.n1 = texelFetch(triangles, offset + 3).xyz;
    t.n2 = texelFetch(triangles, offset + 4).xyz;
    t.n3 = texelFetch(triangles, offset + 5).xyz;

    return t;
}

// Gets the material of the triangle with subscript i
Material getMaterial(int i) {
    Material m;

    int offset = i * SIZE_TRIANGLE;
    vec3 param1 = texelFetch(triangles, offset + 8).xyz;
    vec3 param2 = texelFetch(triangles, offset + 9).xyz;
    vec3 param3 = texelFetch(triangles, offset + 10).xyz;
    vec3 param4 = texelFetch(triangles, offset + 11).xyz;
    
    m.emissive = texelFetch(triangles, offset + 6).xyz;
    m.baseColor = texelFetch(triangles, offset + 7).xyz;
    m.subsurface = param1.x;
    m.metallic = param1.y;
    m.specular = param1.z;
    m.specularTint = param2.x;
    m.roughness = param2.y;
    m.anisotropic = param2.z;
    m.sheen = param3.x;
    m.sheenTint = param3.y;
    m.clearcoat = param3.z;
    m.clearcoatGloss = param4.x;
    m.IOR = param4.y;
    m.transmission = param4.z;

    return m;
}

// Gets the BVHNode object of the ith subscript
BVHNode getBVHNode(int i) {
    BVHNode node;

    // Left and right subtree
    int offset = i * SIZE_BVHNODE;
    ivec3 childs = ivec3(texelFetch(nodes, offset + 0).xyz);
    ivec3 leafInfo = ivec3(texelFetch(nodes, offset + 1).xyz);
    node.left = int(childs.x);
    node.right = int(childs.y);
    node.n = int(leafInfo.x);
    node.index = int(leafInfo.y);

    // Bounding box
    node.AA = texelFetch(nodes, offset + 2).xyz;
    node.BB = texelFetch(nodes, offset + 3).xyz;

    return node;
}

// ----------------------------------------------------------------------------- //

// Intersection of light and triangle 
HitResult hitTriangle(Triangle triangle, Ray ray) {
    HitResult res;
    res.distance = INF;
    res.isHit = false;
    res.isInside = false;

    vec3 p1 = triangle.p1;
    vec3 p2 = triangle.p2;
    vec3 p3 = triangle.p3;

    vec3 S = ray.startPoint;    // Ray starting point
    vec3 d = ray.direction;     // Ray direction
    vec3 N = normalize(cross(p2-p1, p3-p1));    // Normal vector

    // Hit from behind the triangle (inside the model)
    if (dot(N, d) > 0.0f) {
        N = -N;   
        res.isInside = true;
    }

    // If the line of sight is parallel to the triangle
    if (abs(dot(N, d)) < 0.00001f) return res;

    // distance
    float t = (dot(N, p1) - dot(S, N)) / dot(d, N);
    if (t < 0.0005f) return res;    // If the triangle is on the back of the light

    // Intersection calculation
    vec3 P = S + d * t;

    // Determine whether the intersection is in the triangle
    vec3 c1 = cross(p2 - p1, P - p1);
    vec3 c2 = cross(p3 - p2, P - p2);
    vec3 c3 = cross(p1 - p3, P - p3);
    bool r1 = (dot(c1, N) > 0 && dot(c2, N) > 0 && dot(c3, N) > 0);
    bool r2 = (dot(c1, N) < 0 && dot(c2, N) < 0 && dot(c3, N) < 0);

    // Hit, encapsulate the returned result
    if (r1 || r2) {
        res.isHit = true;
        res.hitPoint = P;
        res.distance = t;
        res.normal = N;
        res.viewDir = d;
        // Interpolate vertex normals according to the intersection position
        float alpha = (-(P.x-p2.x)*(p3.y-p2.y) + (P.y-p2.y)*(p3.x-p2.x)) / (-(p1.x-p2.x-0.00005)*(p3.y-p2.y+0.00005) + (p1.y-p2.y+0.00005)*(p3.x-p2.x+0.00005));
        float beta  = (-(P.x-p3.x)*(p1.y-p3.y) + (P.y-p3.y)*(p1.x-p3.x)) / (-(p2.x-p3.x-0.00005)*(p1.y-p3.y+0.00005) + (p2.y-p3.y+0.00005)*(p1.x-p3.x+0.00005));
        float gama  = 1.0 - alpha - beta;
        vec3 Nsmooth = alpha * triangle.n1 + beta * triangle.n2 + gama * triangle.n3;
        res.normal = (res.isInside) ? (-Nsmooth) : (Nsmooth);
    }

    return res;
}

// Intersection with aabb box. If there is no intersection, return - 1
float hitAABB(Ray r, vec3 AA, vec3 BB) {
    vec3 invdir = 1.0 / r.direction;

    vec3 f = (BB - r.startPoint) * invdir;
    vec3 n = (AA - r.startPoint) * invdir;

    vec3 tmax = max(f, n);
    vec3 tmin = min(f, n);

    float t1 = min(tmax.x, min(tmax.y, tmax.z));
    float t0 = max(tmin.x, max(tmin.y, tmin.z));

    return (t1 >= t0) ? ((t0 > 0.0) ? (t0) : (t1)) : (-1);
}

// ----------------------------------------------------------------------------- //

// Brutally traverse the array subscript range [l, r] to find the nearest intersection
HitResult hitArray(Ray ray, int l, int r) {
    HitResult res;
    res.isHit = false;
    res.distance = INF;
    for(int i=l; i<=r; i++) {
        Triangle triangle = getTriangle(i);
        HitResult r = hitTriangle(triangle, ray);
        if(r.isHit && r.distance<res.distance) {
            res = r;
            res.material = getMaterial(i);
        }
    }
    return res;
}

// Traversal BVH intersection
HitResult hitBVH(Ray ray) {
    HitResult res;
    res.isHit = false;
    res.distance = INF;

    // Stack
    int stack[256];
    int sp = 0;

    stack[sp++] = 1;
    while(sp>0) {
        int top = stack[--sp];
        BVHNode node = getBVHNode(top);
        
        // Is the leaf node, traverse the triangle and find the nearest intersection
        if(node.n>0) {
            int L = node.index;
            int R = node.index + node.n - 1;
            HitResult r = hitArray(ray, L, R);
            if(r.isHit && r.distance<res.distance) res = r;
            continue;
        }
        
        // Intersection with left and right box AABB
        float d1 = INF; // Left box distance
        float d2 = INF; // Right box distance
        if(node.left>0) {
            BVHNode leftNode = getBVHNode(node.left);
            d1 = hitAABB(ray, leftNode.AA, leftNode.BB);
        }
        if(node.right>0) {
            BVHNode rightNode = getBVHNode(node.right);
            d2 = hitAABB(ray, rightNode.AA, rightNode.BB);
        }

        // Search in the nearest box
        if(d1>0 && d2>0) {
            if(d1<d2) { // D1 < D2, left first
                stack[sp++] = node.right;
                stack[sp++] = node.left;
            } else {    // D2 < D1, right first
                stack[sp++] = node.left;
                stack[sp++] = node.right;
            }
        } else if(d1>0) {   // Hit left only
            stack[sp++] = node.left;
        } else if(d2>0) {   // Hit right only
            stack[sp++] = node.right;
        }
    }

    return res;
}

// ----------------------------------------------------------------------------- //

// Path tracking
vec3 pathTracing(HitResult hit, int maxBounce) {

    vec3 Lo = vec3(0);      // Final color
    vec3 history = vec3(1); // Recursive accumulated color

    for(int bounce=0; bounce<maxBounce; bounce++) {
        // Random exit direction wi
        vec3 wi = toNormalHemisphere(SampleHemisphere(), hit.normal);

        // Diffuse: emitting rays randomly
        Ray randomRay;
        randomRay.startPoint = hit.hitPoint;
        randomRay.direction = wi;
        HitResult newHit = hitBVH(randomRay);

        float pdf = 1.0 / (2.0 * PI);                                   // Hemispherical uniform sampling probability density
        float cosine_o = max(0, dot(-hit.viewDir, hit.normal));         // Cosine of angle between incident light and normal
        float cosine_i = max(0, dot(randomRay.direction, hit.normal));  // Cosine of angle between outgoing light and normal
        vec3 f_r = hit.material.baseColor / PI;                         // Diffuse BRDF

        // Miss
        if(!newHit.isHit) {
            vec3 skyColor = sampleHdr(randomRay.direction);
            Lo += history * skyColor * f_r * cosine_i / pdf;
            break;
        }
        
        // Hit light accumulation color
        vec3 Le = newHit.material.emissive;
        Lo += history * Le * f_r * cosine_i / pdf;
        
        // Recursion (step)
        hit = newHit;
        history *= f_r * cosine_i / pdf;  // Cumulative color
    }
    
    return Lo;
}

// ----------------------------------------------------------------------------- //

void main() {

    // throw light
    Ray ray;
    
    ray.startPoint = eye;
    vec2 AA = vec2((rand()-0.5)/float(width), (rand()-0.5)/float(height));
    vec4 dir = cameraRotate * vec4(pix.xy+AA, -1.5, 0.0);
    ray.direction = normalize(dir.xyz);

    // primary hit
    HitResult firstHit = hitBVH(ray);
    vec3 color;
    
    if(!firstHit.isHit) {
        color = vec3(0);
        color = sampleHdr(ray.direction);
    } else {
        vec3 Le = firstHit.material.emissive;
        vec3 Li = pathTracing(firstHit, 2);
        color = Le + Li;
    }  
    
    // Blend with previous frame
    vec3 lastColor = texture2D(lastFrame, pix.xy*0.5+0.5).rgb;
    color = mix(lastColor, color, 1.0/float(frameCounter+1));

    // output
    gl_FragData[0] = vec4(color, 1.0);
}

10.3. Slice shader (pass2, pass3, gamma correction and tone mapping)

pass2:

#version 330 core

in vec3 pix;
out vec4 fragColor;

uniform sampler2D texPass0;
uniform sampler2D texPass1;
uniform sampler2D texPass2;
uniform sampler2D texPass3;
uniform sampler2D texPass4;
uniform sampler2D texPass5;
uniform sampler2D texPass6;

void main() {
    gl_FragData[0] = vec4(texture2D(texPass0, pix.xy*0.5+0.5).rgb, 1.0);
}

pass3:

#version 330 core

in vec3 pix;
out vec4 fragColor;

uniform sampler2D texPass0;
uniform sampler2D texPass1;
uniform sampler2D texPass2;
uniform sampler2D texPass3;
uniform sampler2D texPass4;
uniform sampler2D texPass5;
uniform sampler2D texPass6;

vec3 toneMapping(in vec3 c, float limit) {
    float luminance = 0.3*c.x + 0.6*c.y + 0.1*c.z;
    return c * 1.0 / (1.0 + luminance / limit);
}

void main() {
    vec3 color = texture2D(texPass0, pix.xy*0.5+0.5).rgb;
    color = toneMapping(color, 1.5);
    color = pow(color, vec3(1.0 / 2.2));

    fragColor = vec4(color, 1.0);
}

10.4. Vertex shader (shared)

#version 330 core

layout (location = 0) in vec3 vPosition;  // Vertex coordinates passed in by cpu

out vec3 pix;

void main() {
    gl_Position = vec4(vPosition, 1.0);
    pix = vPosition;
}

Summary & postscript

The characteristic of pixel by pixel parallel computing makes GPU very friendly to light tracing. Today, we have realized simple ray tracing on GPU:

Only two bounces are used here, and the 512 x 512 resolution is enough to drain the GPU resources of my broken computer. However, for relatively simple scenes, there can be 30 ~ 40 frames for partially missed primary ray or relatively uniform BVH. The picture above is a little more than 40 frames

In short... It took me a lot of effort to paint each pixel with its correct color. At first, I thought it was a simple migration code. Later, I found that there were too many pits. From texture format and data transfer to rendering equation, I stepped on almost all pits

Debugging on the GPU is less intuitive than that on the CPU. I have scolded the fragmented images on the screen more than once, and scratched my head late into the night because of the data transmission between textures and different pass es. However, when I finally succeeded in lighting up the classic cornell and watched the scene light up gradually, these difficulties had become a thing of the past

Despite the small sense of achievement, it is true that our algorithm has many problems, such as only supporting diffuse reflection, more noise and slow fitting. For these problems, my subsequent blog will solve them, but the space is limited, so it's inconvenient to start here

forget it... Here's the spoilers:

More reflections: Disney BRDF & BSDF
Slow fitting & more noise: importance s amp ling

Update when you have time... No, just send it

Keywords: Computer Graphics OpenGL

Added by kovudalion on Tue, 28 Dec 2021 07:40:47 +0200

Programming VIP