Relevant axes for the aerial view
In order to create an aerial view image, the relevant axes in the point cloud data will be the x and y axes.
However, as can be seen from the above figure, we must be careful and consider the following:
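- The x and y axes are swapped: the point cloud's x axis runs along the image's vertical axis, and its y axis runs along the image's horizontal axis.
- The x and y axes point in opposite directions: the point cloud's x axis points forward (up in the image) and its y axis points left, whereas image coordinates increase to the right and downwards.
- The values must be shifted so that (0,0) is the smallest possible value in the image.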
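The three points above can be sketched for a single point as follows. This is only an illustration: the helper name `lidar_to_image_coords` and the sample ranges are assumptions of mine, and the result is still in meters, not yet in pixels.

```python
def lidar_to_image_coords(x, y, side_range=(-10.0, 10.0), fwd_range=(0.0, 20.0)):
    """Hypothetical helper: map LIDAR (x, y) in meters to image (col, row) in meters.

    LIDAR x points forward and LIDAR y points left; image columns grow to
    the right and image rows grow downwards.
    """
    col = -y - side_range[0]  # swap and flip, then shift so the left edge is 0
    row = -x + fwd_range[1]   # swap and flip, then shift so the far edge is 0
    return col, row


# A point 5 m ahead of the sensor and 2 m to its left
col, row = lidar_to_image_coords(5.0, 2.0)
print(col, row)  # 8.0 15.0
```

The point ends up 8 m from the left edge and 15 m from the top edge of the 20 m x 20 m rectangle, which matches what we expect for something slightly left of centre and a quarter of the way up from the bottom.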
Restricting the view to a rectangle
It is often useful to focus only on specific areas of the point cloud. Therefore, we want to create a filter that retains only the points in the area of interest.
Since we are looking at the data from the top and want to convert it to an image, I will use directions that are more consistent with the image axes. Below, I specify the range of values, relative to the origin, that I want to focus on. Anything to the left of the origin is treated as negative, and anything to the right as positive. The x axis of the point cloud is interpreted as the forward direction (which becomes the upward direction of our aerial view image).
The following code sets the rectangle of interest to span 10m on both sides of the origin and 20m in front of it.
    side_range = (-10, 10)  # left-most to right-most
    fwd_range = (0, 20)     # back-most to forward-most
Next, we create a filter that retains only the points that actually lie within the rectangle we specify.
    # EXTRACT THE POINTS FOR EACH AXIS
    # (points is a numpy array with one row per point: x, y, z, ...)
    x_points = points[:, 0]
    y_points = points[:, 1]
    z_points = points[:, 2]

    # FILTER - keep only the indices of points inside the desired rectangle
    # Two filters: front-to-back and side-to-side
    # Note that the left side is the positive y axis in LIDAR coordinates
    f_filt = np.logical_and((x_points > fwd_range[0]), (x_points < fwd_range[1]))
    s_filt = np.logical_and((y_points > -side_range[1]), (y_points < -side_range[0]))
    filter = np.logical_and(f_filt, s_filt)
    indices = np.argwhere(filter).flatten()

    # KEEPERS
    x_points = x_points[indices]
    y_points = y_points[indices]
    z_points = z_points[indices]
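To see the filter in action, here is a minimal sketch on four made-up points. The sample coordinates are hypothetical, and I combine the boolean masks with `&` and index with the mask directly, which should be equivalent to the `np.logical_and` and `np.argwhere` version.

```python
import numpy as np

# Hypothetical sample points (x, y, z) in meters, LIDAR coordinates
points = np.array([
    [5.0, 2.0, -1.0],    # ahead, slightly left: inside
    [5.0, 12.0, -1.0],   # too far to the left: outside
    [-3.0, 0.0, 0.0],    # behind the sensor: outside
    [19.0, -9.0, 0.3],   # ahead, to the right: inside
])

side_range = (-10, 10)
fwd_range = (0, 20)

x_points = points[:, 0]
y_points = points[:, 1]

# Boolean masks can be combined with & and used directly as an index
f_filt = (x_points > fwd_range[0]) & (x_points < fwd_range[1])
s_filt = (y_points > -side_range[1]) & (y_points < -side_range[0])
kept = points[f_filt & s_filt]

print(kept.shape[0])  # 2
```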
Map point positions to pixel positions
At present, we have a bunch of points with real-number values. To map them onto an image, we need to convert these values to integer positions. We could naively cast all x and y values to integers, but we would lose a lot of resolution. For example, if the points are measured in meters, each pixel would represent a 1x1 meter square in the point cloud, and we would lose any detail smaller than this. If you have a point cloud of something like a mountain view, this may not be a problem. But if you want to capture finer details and identify people, cars, or even smaller things, this method won't work.
However, the above method can be modified slightly so that we can obtain the required resolution level. We can scale the data before converting the type to an integer. For example, if the unit of measurement is meters and we want a resolution of 5cm, we can do the following:
    res = 0.05
    # CONVERT TO PIXEL POSITION VALUES - Based on resolution
    x_img = (-y_points / res).astype(np.int32)  # x axis is -y in LIDAR
    y_img = (-x_points / res).astype(np.int32)  # y axis is -x in LIDAR
You'll notice that the x and y axes are swapped and the direction is reversed, so we can now start working on the image coordinates.
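The effect of the resolution can be checked on a handful of made-up values. In this sketch (the sample points are hypothetical), four points spaced 0.3 m apart collapse into a single pixel when cast directly to integers, but stay separate at a resolution of 5cm.

```python
import numpy as np

# Hypothetical points 0.3 m apart along one axis
x_points = np.array([1.0, 1.3, 1.6, 1.9])

coarse = x_points.astype(np.int32)          # 1 m per pixel
fine = (x_points / 0.05).astype(np.int32)   # 5 cm per pixel

print(np.unique(coarse).size)  # 1
print(np.unique(fine).size)    # 4
```

Note that `astype(np.int32)` truncates toward zero rather than rounding, which is why the shift in the next step uses floor and ceil.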
Shifting to a new origin
The x and y data are not fully ready to map to the image. We may still have negative x and y values. So we need to move the data to make (0,0) the minimum.
    # SHIFT PIXELS TO HAVE MINIMUM BE (0,0)
    # floor and ceil used to prevent anything being rounded to below 0 after shift
    x_img -= int(np.floor(side_range[0] / res))
    y_img += int(np.ceil(fwd_range[1] / res))
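As a quick sanity check of the shift, here is a small sketch using two hypothetical points placed just inside opposite corners of the rectangle. It repeats the conversion and shift steps from above on those two points only.

```python
import numpy as np

res = 0.05
side_range = (-10, 10)
fwd_range = (0, 20)

# Hypothetical points just inside two opposite corners (LIDAR x and y, in meters)
x_points = np.array([0.01, 19.99])
y_points = np.array([9.99, -9.99])

# Convert to pixel positions, swapping and flipping the axes
x_img = (-y_points / res).astype(np.int32)
y_img = (-x_points / res).astype(np.int32)

# Shift so that the smallest possible position is (0,0)
x_img -= int(np.floor(side_range[0] / res))
y_img += int(np.ceil(fwd_range[1] / res))

print(x_img.min() >= 0, y_img.min() >= 0)  # True True
```

The near-left point lands at the bottom-left of the image (small x, large y) and the far-right point at the top-right, with no negative indices.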
We can explore the data to prove to ourselves that these values are no longer negative, for example:
    >>> x_img.min()
    7
    >>> x_img.max()
    199
    >>> y_img.min()
    1
    >>> y_img.max()
    199
Pixel value
So far we have used the point data to specify the x and y positions in the image. What we need to do now is specify the values we want to fill in at these pixel positions. One possibility is to populate them with height data.
But two things to remember are:
- Pixel value should be an integer.
- Pixel value should be between 0-255.
One option is to take the minimum and maximum height values from the data and rescale that range to fit 0-255. The approach used here instead is to set the range of height values we want to focus on. Any value above or below this range is clipped to the maximum or minimum value. This is useful because it lets us get the maximum amount of detail from the region of interest.
In the following code, we set the range to 2 meters below the origin and half a meter above the origin.
    height_range = (-2, 0.5)  # bottom-most to upper-most

    # CLIP HEIGHT VALUES - to between min and max heights
    pixel_values = np.clip(a=z_points,
                           a_min=height_range[0],
                           a_max=height_range[1])
Next, we rescale these values to 0-255 and convert the type to an integer.
    def scale_to_255(a, min, max, dtype=np.uint8):
        """ Scales an array of values from specified min, max range to 0-255
            Optionally specify the data type of the output (default is uint8)
        """
        return (((a - min) / float(max - min)) * 255).astype(dtype)

    # RESCALE THE HEIGHT VALUES - to be between the range 0-255
    pixel_values = scale_to_255(pixel_values,
                                min=height_range[0],
                                max=height_range[1])
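To illustrate why clipping to a range of interest helps, here is a small comparison on made-up heights, one of which is an outlier. The sample values are hypothetical; the sketch only contrasts rescaling the full data range with clipping first.

```python
import numpy as np

# Hypothetical heights in meters; the last one is an outlier (say, a tall building)
z = np.array([-1.8, -1.5, -1.2, 30.0])

# Option 1: stretch the full data range over 0-255
full = ((z - z.min()) / (z.max() - z.min()) * 255).astype(np.uint8)

# Option 2: clip to a range of interest first, then stretch that range
lo, hi = -2.0, 0.5
clipped = ((np.clip(z, lo, hi) - lo) / (hi - lo) * 255).astype(np.uint8)

# Spread in grey levels between the lowest and highest of the three low points
print(int(full[2]) - int(full[0]))
print(int(clipped[2]) - int(clipped[0]))
```

With the full range, the three low points are squeezed into a handful of grey levels because the outlier takes up almost the whole scale. After clipping, the same three points are spread over a much wider part of the 0-255 range.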
Create image array
We are now ready to actually create the image. We just need to initialize an array whose dimensions depend on the range of values we want in the rectangle and on the resolution we chose. We then use the x and y point values that were converted to pixel positions as indices into the array, and assign to those indices the pixel values we computed in the previous section.
    # INITIALIZE EMPTY ARRAY - of the dimensions we want
    x_max = 1 + int((side_range[1] - side_range[0]) / res)
    y_max = 1 + int((fwd_range[1] - fwd_range[0]) / res)
    im = np.zeros([y_max, x_max], dtype=np.uint8)

    # FILL PIXEL VALUES IN IMAGE ARRAY
    im[y_img, x_img] = pixel_values
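One thing worth being aware of is that several points can land on the same pixel. With plain index assignment only one of their values survives, and which one depends on the order of the points. If you would rather keep the highest point in each pixel, one possible alternative (not what the code above does) is `np.maximum.at`. The pixel positions and values below are made up for illustration.

```python
import numpy as np

# Hypothetical pixel positions: the first two points share pixel (row 0, col 1)
y_img = np.array([0, 0, 2])
x_img = np.array([1, 1, 0])
pixel_values = np.array([200, 50, 120], dtype=np.uint8)

im = np.zeros((3, 3), dtype=np.uint8)

# Unbuffered element-wise maximum: each pixel ends up with the largest value
# of all the points that map to it, regardless of their order in the array
np.maximum.at(im, (y_img, x_img), pixel_values)

print(im[0, 1], im[2, 0])  # 200 120
```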
Viewing
Currently, the image is stored as a numpy array. If we want to visualize it, we can convert it to a PIL image and view it.
    # CONVERT FROM NUMPY ARRAY TO A PIL IMAGE
    from PIL import Image
    im2 = Image.fromarray(im)
    im2.show()
However the image is drawn, the array still encodes the same height information, so a machine learning algorithm can still distinguish height differences, even if we humans can't see them very clearly in shades of grey.
Complete code
For convenience, I put all the above code in a function that returns the aerial view as a numpy array. You can then choose to visualize it using any method you like, or feed the numpy array into a machine learning algorithm.
    import numpy as np


    # ==============================================================================
    #                                                                   SCALE_TO_255
    # ==============================================================================
    def scale_to_255(a, min, max, dtype=np.uint8):
        """ Scales an array of values from specified min, max range to 0-255
            Optionally specify the data type of the output (default is uint8)
        """
        return (((a - min) / float(max - min)) * 255).astype(dtype)


    # ==============================================================================
    #                                                         POINT_CLOUD_2_BIRDSEYE
    # ==============================================================================
    def point_cloud_2_birdseye(points,
                               res=0.1,
                               side_range=(-10., 10.),  # left-most to right-most
                               fwd_range=(-10., 10.),   # back-most to forward-most
                               height_range=(-2., 2.),  # bottom-most to upper-most
                               ):
        """ Creates a 2D birds eye view representation of the point cloud data.

        Args:
            points:     (numpy array)
                        N rows of points data
                        Each point should be specified by at least 3 elements x,y,z
            res:        (float)
                        Desired resolution in metres to use. Each output pixel will
                        represent a square region res x res in size.
            side_range: (tuple of two floats)
                        (-left, right) in metres
                        left and right limits of rectangle to look at.
            fwd_range:  (tuple of two floats)
                        (-behind, front) in metres
                        back and front limits of rectangle to look at.
            height_range: (tuple of two floats)
                        (min, max) heights (in metres) relative to the origin.
                        All height values will be clipped to this min and max value,
                        such that anything below min will be truncated to min, and
                        the same for values above max.
        Returns:
            2D numpy array representing an image of the birds eye view.
        """
        # EXTRACT THE POINTS FOR EACH AXIS
        x_points = points[:, 0]
        y_points = points[:, 1]
        z_points = points[:, 2]

        # FILTER - To return only indices of points within desired rectangle
        # Two filters for: Front-to-back and side-to-side ranges
        # (heights are clipped further down rather than filtered)
        # Note left side is positive y axis in LIDAR coordinates
        f_filt = np.logical_and((x_points > fwd_range[0]), (x_points < fwd_range[1]))
        s_filt = np.logical_and((y_points > -side_range[1]), (y_points < -side_range[0]))
        filter = np.logical_and(f_filt, s_filt)
        indices = np.argwhere(filter).flatten()

        # KEEPERS
        x_points = x_points[indices]
        y_points = y_points[indices]
        z_points = z_points[indices]

        # CONVERT TO PIXEL POSITION VALUES - Based on resolution
        x_img = (-y_points / res).astype(np.int32)  # x axis is -y in LIDAR
        y_img = (-x_points / res).astype(np.int32)  # y axis is -x in LIDAR

        # SHIFT PIXELS TO HAVE MINIMUM BE (0,0)
        # floor & ceil used to prevent anything being rounded to below 0 after shift
        x_img -= int(np.floor(side_range[0] / res))
        y_img += int(np.ceil(fwd_range[1] / res))

        # CLIP HEIGHT VALUES - to between min and max heights
        pixel_values = np.clip(a=z_points,
                               a_min=height_range[0],
                               a_max=height_range[1])

        # RESCALE THE HEIGHT VALUES - to be between the range 0-255
        pixel_values = scale_to_255(pixel_values,
                                    min=height_range[0],
                                    max=height_range[1])

        # INITIALIZE EMPTY ARRAY - of the dimensions we want
        x_max = 1 + int((side_range[1] - side_range[0]) / res)
        y_max = 1 + int((fwd_range[1] - fwd_range[0]) / res)
        im = np.zeros([y_max, x_max], dtype=np.uint8)

        # FILL PIXEL VALUES IN IMAGE ARRAY
        im[y_img, x_img] = pixel_values

        return im
References
http://ronny.rest/tutorials/module/pointclouds_01/point_cloud_birdseye/