Using point cloud data to generate an aerial view

Relevant axes of the aerial view

In order to create an aerial view image, the relevant axes in the point cloud data will be the x and y axes.

However, when doing so we must be careful and consider the following:

  • The x and y axes swap meaning: the LIDAR x axis corresponds to the image's y axis, and vice versa.
  • Each axis also points in the opposite direction to its image counterpart.
  • The values must be shifted so that (0,0) is the smallest value in the image (see the sketch after this list).
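
To make the mapping concrete, here is a minimal sketch for a single hypothetical point (the shift to a new origin is covered later):

import numpy as np

# Hypothetical point: 12 m forward, 3 m to the left, 1.5 m below the sensor
point = np.array([12.0, 3.0, -1.5])

col = -point[1]   # image x (column direction) is -y in LIDAR coordinates
row = -point[0]   # image y (row direction) is -x in LIDAR coordinates
print(col, row)   # -3.0 -12.0 -> still negative; shifted later so (0,0) is the minimum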

Restricting the viewing rectangle

It is often useful to focus only on specific areas of the point cloud. Therefore, we want to create a filter that retains only the points in the area of interest.

Since we are looking at the data from above and want to convert it into an image, I will use directions that are more consistent with the image axes. I then specify the range of values, relative to the origin, that I want to focus on. Anything to the left of the origin will be treated as negative, and anything to the right as positive. The x axis of the point cloud is interpreted as the forward direction (this will become the upward direction in our aerial view image).

The following code sets the rectangle of interest to span 10m on both sides of the origin and 20m in front of it.

side_range = (-10, 10)     # left-most to right-most
fwd_range = (0, 20)        # back-most to forward-most

Next, we create a filter that retains only the points that actually lie within the rectangle we specify.

import numpy as np

# Extract the points for each axis
x_points = points[:, 0]
y_points = points[:, 1]
z_points = points[:, 2]

# Filter - returns only the indices of the points inside the desired rectangle
# Two filters: front-to-back and side-to-side (heights are clipped later)
# Note that the left side is the positive y axis in LIDAR coordinates
f_filt = np.logical_and((x_points > fwd_range[0]), (x_points < fwd_range[1]))
s_filt = np.logical_and((y_points > -side_range[1]), (y_points < -side_range[0]))
filter = np.logical_and(f_filt, s_filt)
indices = np.argwhere(filter).flatten()

# KEEPERS - keep only the points that passed the filter
x_points = x_points[indices]
y_points = y_points[indices]
z_points = z_points[indices]

Map point positions to pixel positions

At present, we have a bunch of points with real-valued coordinates. To map them onto an image, we need to convert these values to integer pixel positions. We could naively cast all x and y values to integers, but we would lose a lot of resolution. For example, if the points are measured in meters, each pixel would represent a 1x1 meter square of the point cloud, and any detail smaller than that would be lost. If you have a point cloud of something like a mountain view, this may not be a problem. But if you want to capture finer details and identify people, cars, or even smaller objects, this method won't work.
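
To see the loss of resolution concretely, here is a small sketch with hypothetical values measured in meters:

import numpy as np

# Three hypothetical points roughly 0.3 m apart along one axis
y_points = np.array([4.1, 4.4, 4.7])

# Naive cast to integers: all three collapse into the same 1x1 m pixel,
# so any detail smaller than a meter is lost
print(y_points.astype(np.int32))   # [4 4 4]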

However, the above method can be modified slightly so that we can obtain the required resolution level. We can scale the data before converting the type to an integer. For example, if the unit of measurement is meters and we want a resolution of 5cm, we can do the following:

res = 0.05
# CONVERT TO PIXEL POSITION VALUES - Based on resolution
x_img = (-y_points / res).astype(np.int32)  # x axis is -y in LIDAR
y_img = (-x_points / res).astype(np.int32)  # y axis is -x in LIDAR

You'll notice that the x and y axes are swapped and their directions reversed, so that we are now working in image coordinates.

Shift to a new origin

The x and y data are still not quite ready to be mapped to an image: we may still have negative x and y values. So we need to shift the data so that (0,0) is the minimum.

# SHIFT PIXELS TO HAVE MINIMUM BE (0,0)
# floor and ceil used to prevent anything being rounded to below 0 after shift
x_img -= int(np.floor(side_range[0] / res))
y_img += int(np.ceil(fwd_range[1] / res))

We can explore the data to prove to ourselves that these values are now positive, for example:

>>> x_img.min()
7
>>> x_img.max()
199
>>> y_img.min()
1
>>> y_img.max()
199

Pixel values

So far we have used the point data to specify the x and y positions in the image. What we need to do now is choose the values to fill in at these pixel positions. One possibility is to populate them with the height data.
But two things to remember are:

  • Pixel value should be an integer.
  • Pixel value should be between 0-255.

We could get the minimum and maximum height values from the data and rescale that range to fit 0-255. The method used here instead is to set a range of height values we want to focus on; any value above or below this range is clipped to the maximum or minimum value. This is useful because it allows us to get the maximum amount of detail from the region of interest.

In the following code, we set the range to 2 meters below the origin and half a meter above the origin.

height_range = (-2, 0.5)  # bottom-most to upper-most

# CLIP HEIGHT VALUES - to between min and max heights
pixel_values = np.clip(a=z_points,
                       a_min=height_range[0],
                       a_max=height_range[1])

Next, we rescale these values to 0-255 and convert the type to an integer.

def scale_to_255(a, min, max, dtype=np.uint8):
    """ Scales an array of values from specified min, max range to 0-255
        Optionally specify the data type of the output (default is uint8)
    """
    return (((a - min) / float(max - min)) * 255).astype(dtype)

# RESCALE THE HEIGHT VALUES - to be between the range 0-255
pixel_values  = scale_to_255(pixel_values, min=height_range[0], max=height_range[1])

Create image array

Now we are ready to actually create the image. We just need to initialize an array whose dimensions depend on the range of values covered by the rectangle and the resolution we chose. We then use the x and y point values that were converted to pixel positions as indices into the array, and assign the pixel values selected in the previous section to those indices.

# INITIALIZE EMPTY ARRAY - of the dimensions we want
x_max = 1+int((side_range[1] - side_range[0])/res)
y_max = 1+int((fwd_range[1] - fwd_range[0])/res)
im = np.zeros([y_max, x_max], dtype=np.uint8)

# FILL PIXEL VALUES IN IMAGE ARRAY
im[y_img, x_img] = pixel_values

Viewing the image

Currently, the image is stored as a numpy array. If we want to visualize it, we can convert it to a PIL image and view it.

# CONVERT FROM NUMPY ARRAY TO A PIL IMAGE
from PIL import Image
im2 = Image.fromarray(im)
im2.show()

Even though the greyscale image drawn by PIL may not look like much to us, it encodes the same amount of information, so a machine learning algorithm can still distinguish height differences even if we humans cannot see these differences very clearly.
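
If the greyscale rendering is hard to inspect by eye, one option is to apply a colormap instead. The snippet below is only a sketch of an alternative visualization and assumes matplotlib is installed; it is not part of the conversion itself:

# Optional: render the same array with a colormap so height differences
# are easier to see by eye (assumes matplotlib is available)
import matplotlib.pyplot as plt

plt.imshow(im, cmap="nipy_spectral", vmin=0, vmax=255)
plt.title("Aerial view (height encoded as colour)")
plt.show()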

Complete code

For convenience, I have put all of the above code into a function that returns the aerial view as a numpy array. You can then visualize it using any method you like, or feed the numpy array into a machine learning algorithm.

import numpy as np


# ==============================================================================
#                                                                   SCALE_TO_255
# ==============================================================================
def scale_to_255(a, min, max, dtype=np.uint8):
    """ Scales an array of values from specified min, max range to 0-255
        Optionally specify the data type of the output (default is uint8)
    """
    return (((a - min) / float(max - min)) * 255).astype(dtype)


# ==============================================================================
#                                                         POINT_CLOUD_2_BIRDSEYE
# ==============================================================================
def point_cloud_2_birdseye(points,
                           res=0.1,
                           side_range=(-10., 10.),  # left-most to right-most
                           fwd_range=(-10., 10.),   # back-most to forward-most
                           height_range=(-2., 2.),  # bottom-most to upper-most
                           ):
    """ Creates an 2D birds eye view representation of the point cloud data.

    Args:
        points:     (numpy array)
                    N rows of points data
                    Each point should be specified by at least 3 elements x,y,z
        res:        (float)
                    Desired resolution in metres to use. Each output pixel will
                    represent a square region res x res in size.
        side_range: (tuple of two floats)
                    (-left, right) in metres
                    left and right limits of rectangle to look at.
        fwd_range:  (tuple of two floats)
                    (-behind, front) in metres
                    back and front limits of rectangle to look at.
        height_range: (tuple of two floats)
                    (min, max) heights (in metres) relative to the origin.
                    All height values will be clipped to this min and max value,
                    such that anything below min will be truncated to min, and
                    the same for values above max.
    Returns:
        2D numpy array representing an image of the birds eye view.
    """
    # EXTRACT THE POINTS FOR EACH AXIS
    x_points = points[:, 0]
    y_points = points[:, 1]
    z_points = points[:, 2]

    # FILTER - To return only indices of points within desired rectangle
    # Two filters: front-to-back and side-to-side (heights are clipped later)
    # Note left side is positive y axis in LIDAR coordinates
    f_filt = np.logical_and((x_points > fwd_range[0]), (x_points < fwd_range[1]))
    s_filt = np.logical_and((y_points > -side_range[1]), (y_points < -side_range[0]))
    filter = np.logical_and(f_filt, s_filt)
    indices = np.argwhere(filter).flatten()

    # KEEPERS
    x_points = x_points[indices]
    y_points = y_points[indices]
    z_points = z_points[indices]

    # CONVERT TO PIXEL POSITION VALUES - Based on resolution
    x_img = (-y_points / res).astype(np.int32)  # x axis is -y in LIDAR
    y_img = (-x_points / res).astype(np.int32)  # y axis is -x in LIDAR

    # SHIFT PIXELS TO HAVE MINIMUM BE (0,0)
    # floor & ceil used to prevent anything being rounded to below 0 after shift
    x_img -= int(np.floor(side_range[0] / res))
    y_img += int(np.ceil(fwd_range[1] / res))

    # CLIP HEIGHT VALUES - to between min and max heights
    pixel_values = np.clip(a=z_points,
                           a_min=height_range[0],
                           a_max=height_range[1])

    # RESCALE THE HEIGHT VALUES - to be between the range 0-255
    pixel_values = scale_to_255(pixel_values,
                                min=height_range[0],
                                max=height_range[1])

    # INITIALIZE EMPTY ARRAY - of the dimensions we want
    x_max = 1 + int((side_range[1] - side_range[0]) / res)
    y_max = 1 + int((fwd_range[1] - fwd_range[0]) / res)
    im = np.zeros([y_max, x_max], dtype=np.uint8)

    # FILL PIXEL VALUES IN IMAGE ARRAY
    im[y_img, x_img] = pixel_values

    return im
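
As an example of calling the function, the sketch below assumes the point cloud is stored as a KITTI-style .bin file of float32 values laid out as x, y, z, reflectance per point; the file path is hypothetical and the loading step should be adapted to your own data format.

# Example usage (hypothetical file path, KITTI-style float32 layout assumed)
points = np.fromfile("velodyne/000000.bin", dtype=np.float32).reshape(-1, 4)

im = point_cloud_2_birdseye(points,
                            res=0.1,
                            side_range=(-10., 10.),
                            fwd_range=(0., 20.),
                            height_range=(-2., 0.5))

print(im.shape)   # (201, 201) for these ranges at 0.1 m resolution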

References

http://ronny.rest/tutorials/module/pointclouds_01/point_cloud_birdseye/
