Computer Vision: Advanced Lane Marking Through Thresholding
Lane Detection using Gradient, Colour Thresholding and sliding window algorithm
In my earlier post, I talked about finding lane lines using edge detection and Hough transforms. While Canny edge detection is great at finding edges, it returns a lot of edges in the picture, not all of which are relevant to lane finding.
In this post, I describe how to build a pipeline that finds lane markings in a video using better algorithms than in the last post. The pipeline marks the lane, projects the marked lane onto the video, and reports the curvature of the road and the position of the vehicle within that lane. I reuse some of the concepts I described before, such as camera calibration and perspective transform, and introduce a few new ones, such as thresholding and the sliding window search.
To begin with, the images were corrected for camera distortion using the algorithm described in my previous post. The next task was to identify the pixels in the picture that belong to the lane markings, for which I used gradient and colour thresholding.
Gradient Threshold
In Canny edge detection, we took the overall gradient, which helped us detect the regions with a sharp change in intensity or colour. For this, Canny edge detection uses the Sobel operator, which approximates the derivative of the image in a given direction. The operator consists of a pair of convolution kernels, one for the x direction and one for the y direction.
The magnitude of the overall gradient is given by the formula:
$$|G| = \sqrt{G_x^2 + G_y^2}$$
while the direction of the gradient is
$$\theta = \arctan\left(\frac{G_y}{G_x}\right)$$
where $G_x$ and $G_y$ are the Sobel gradients in the x and y directions.
Instead of taking the overall gradient, let’s separate out the magnitude and the direction of the gradient, which can work better in some cases. If the lanes are not too curved, the lane lines in an image are close to vertical, so a gradient in the x direction makes more sense than one in the y direction. Taking the individual x and y gradients, the magnitude of the gradient, or just the direction of the gradient can each have its advantages. We can apply a different threshold to each to arrive at the desired outcome.
Sobel X, Y threshold
OpenCV has a Sobel function (cv2.Sobel) that takes the gradient in the x or y direction. The results can also be used to create magnitude-only and direction-only thresholds using the formulas above. Converting the image to grayscale first is not strictly necessary, but it gives better visuals. Thresholding is simply a way to create a binary image in which every pixel that meets the condition is set to 1 and all other pixels are set to 0.
import numpy as np
import cv2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pickle
# Read in an image (matplotlib reads it in RGB order)
image = mpimg.imread('straight_lines1.jpg')
# Define a function that applies Sobel x or y,
# then takes an absolute value and applies a threshold.
def abs_sobel_thresh(img, orient='x', thresh_min=0, thresh_max=255):
    # 1) Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # 2) Take the derivative in x or y given orient = 'x' or 'y'
    #    (the booleans are cast to int, which cv2.Sobel expects
    #    for the derivative orders)
    sobel = cv2.Sobel(gray, cv2.CV_64F, int(orient == 'x'), int(orient == 'y'))
    # 3) Take the absolute value of the derivative or gradient
    abs_sobel = np.absolute(sobel)
    # 4) Scale to 8-bit (0 - 255) then convert to type = np.uint8
    scaled_sobel = np.uint8(255*abs_sobel/np.max(abs_sobel))
    # 5) Create a mask of 1's where the scaled gradient
    #    is >= thresh_min and <= thresh_max
    binary_output = np.zeros_like(scaled_sobel)
    binary_output[(scaled_sobel >= thresh_min) & (scaled_sobel <= thresh_max)] = 1
    # 6) Return this mask as the binary output image
    return binary_output
# Run the function
grad_binary_x = abs_sobel_thresh(image, orient='x', thresh_min=20, thresh_max=100)
grad_binary_y = abs_sobel_thresh(image, orient='y', thresh_min=20, thresh_max=100)
# Plot the result
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 9))
f.tight_layout()
ax1.imshow(image)
ax1.set_title('Original Image', fontsize=30)
ax2.imshow(grad_binary_x, cmap='gray')
ax2.set_title('Thresholded Gradient in X', fontsize=30)
ax3.imshow(grad_binary_y, cmap='gray')
ax3.set_title('Thresholded Gradient in Y', fontsize=30)
plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
The output of the code above shows the difference between the two thresholds. Notice how the x-gradient threshold suits our needs better here.
Similarly, thresholding the magnitude of the overall gradient can combine some of the features of the individual x and y gradients.
We can also apply a threshold to the direction of the gradient. As you can see, the lane lines are somewhere in the 45 to 60 degree range in these figures. Appropriate tan values could be used to cover that angle range.
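As a rough sketch of the magnitude and direction thresholds described above, the two functions below work on precomputed x and y gradients. The function names, the default threshold values and the tiny synthetic gradients are my own assumptions for illustration. In the real pipeline the gradients would come from cv2.Sobel on the grayscale image.

```python
import numpy as np

def mag_threshold(sobelx, sobely, thresh=(30, 100)):
    """Binary mask where the scaled gradient magnitude lies within thresh."""
    magnitude = np.sqrt(sobelx ** 2 + sobely ** 2)
    max_val = magnitude.max()
    if max_val == 0:
        # Flat image: no gradient anywhere, so nothing passes the threshold
        return np.zeros(magnitude.shape, dtype=np.uint8)
    # Scale to 0-255 so the thresholds do not depend on image contrast
    scaled = np.uint8(255 * (magnitude / max_val))
    mask = np.zeros_like(scaled)
    mask[(scaled >= thresh[0]) & (scaled <= thresh[1])] = 1
    return mask

def dir_threshold(sobelx, sobely, thresh=(0.7, 1.3)):
    """Binary mask where the gradient direction (in radians) lies within thresh."""
    # Absolute values fold the direction into the range [0, pi/2]
    direction = np.arctan2(np.absolute(sobely), np.absolute(sobelx))
    mask = np.zeros(direction.shape, dtype=np.uint8)
    mask[(direction >= thresh[0]) & (direction <= thresh[1])] = 1
    return mask

# Hypothetical gradients standing in for the output of cv2.Sobel
sobelx = np.array([[0.0, 10.0], [10.0, 3.0]])
sobely = np.array([[0.0, 0.0], [10.0, 4.0]])
mag_binary = mag_threshold(sobelx, sobely, thresh=(50, 255))
dir_binary = dir_threshold(sobelx, sobely, thresh=(0.7, 1.3))
```

Note that the direction threshold here is expressed in radians rather than as tan values. Either works as long as the comparison is consistent.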
Colour Spaces
Colour spaces are a very useful tool for analysing images. There are various colour space models that can be used to define the colours in an image. The simplest, the RGB (Red Green Blue) model, defines colours in terms of their red, green and blue components. Each component can take a value between 0 and 255, where [0,0,0] represents black and [255,255,255] represents white. RGB is considered an “additive” colour space, in which colours can be thought of as different combinations of red, green and blue. OpenCV has multiple functions for working with different colour spaces. One more thing to note is that OpenCV reads an image in BGR order by default, which can then be converted to RGB.
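To make the BGR versus RGB point concrete, here is a minimal sketch using a hypothetical two-pixel image. Reversing the last axis with NumPy gives the same channel order as cv2.cvtColor(img, cv2.COLOR_BGR2RGB). The pixel values are made up for illustration.

```python
import numpy as np

# A hypothetical 1x2 image in OpenCV's BGR order:
# one pure-blue pixel and one yellow pixel
bgr = np.array([[[255, 0, 0], [0, 255, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels
rgb = bgr[:, :, ::-1]

# Split into single-channel images, each viewable with a grey colour map
r_channel = rgb[:, :, 0]
g_channel = rgb[:, :, 1]
b_channel = rgb[:, :, 2]
```

In this toy image the yellow pixel is at full brightness in the red channel and at zero in the blue channel.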
Notice how, in the blue channel, the yellow lane lines are not visible, while they are brightest in the red channel. So here the red channel is the most useful of the three for finding lane lines. Please note that I have used a greyscale colour map to show the individual colour channels. Apart from RGB, there are several other colour space models such as CMYK, HLS, HSV and LAB. HSV stands for hue, saturation and value, and HLS stands for hue, lightness and saturation. Both are particularly useful for identifying contrast in images.
Hue is the colour itself, saturation is how intense the colour is, and value (or lightness) is how bright it is. You can try out different colour spaces and colour channels to see what works for your application. Once you know the right colour space and channel, you can apply thresholding. For my purpose, I found the S channel in the HLS colour space to be the best suited.
I applied colour thresholding on that using the following code:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2
# Read in an image, you can also try test1.jpg or test4.jpg
image = mpimg.imread('straight_lines1.jpg')
# Define a function that thresholds the S-channel of HLS
# Use exclusive lower bound (>) and inclusive upper (<=)
def hls_select(img, thresh=(0, 255)):
    # 1) Convert to HLS color space
    hls = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
    # 2) Apply a threshold to the S channel (index 2 in HLS)
    s_channel = hls[:,:,2]
    binary_output = np.zeros_like(s_channel)
    binary_output[(s_channel > thresh[0]) & (s_channel <= thresh[1])] = 1
    # 3) Return a binary image of threshold result
    return binary_output
hls_binary = hls_select(image, thresh=(180, 255))
# Plot the result
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(24, 9))
f.tight_layout()
ax1.imshow(image)
ax1.set_title('Original Image', fontsize=50)
ax2.imshow(hls_binary, cmap='gray')
ax2.set_title('Thresholded S', fontsize=50)
plt.subplots_adjust(left=0., right=1, top=0.9, bottom=0.)
It is not always easy to arrive at the right threshold values. One way to do it is to use a 3D scatter plot: plot the individual channels of the picture against each other and then approximate the range of values you are interested in.
Once you know which gradient, colour space and channel to use, you can combine the various thresholds. For this particular project, I combined the x-direction gradient threshold with the S channel threshold in the HLS colour space.
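One simple way to combine two binary images is a logical OR, so that a pixel is kept if either threshold fires. This is only a sketch under that assumption. The function name and the toy arrays are hypothetical, and an AND or a more selective rule may suit other footage better.

```python
import numpy as np

def combine_binary(grad_binary, colour_binary):
    """Keep a pixel if it passes the gradient OR the colour threshold."""
    combined = np.zeros_like(grad_binary)
    combined[(grad_binary == 1) | (colour_binary == 1)] = 1
    return combined

# Toy 2x2 masks standing in for the x-gradient and S-channel outputs
grad_binary = np.array([[1, 0], [0, 0]], dtype=np.uint8)
colour_binary = np.array([[0, 0], [1, 0]], dtype=np.uint8)
combined_binary = combine_binary(grad_binary, colour_binary)
```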
A perspective transform (as described in the previous post) was applied to the resulting binary image to get a bird's-eye view. In 2D images, objects appear smaller the farther they are from the viewpoint. It is therefore better to apply a perspective transform to the undistorted, thresholded image to obtain a bird's-eye view of the lane lines, so that the curves can later be fitted through them accurately.
The colours look different in the picture because matplotlib and OpenCV read images differently (RGB vs BGR). The next step was to fit curves along the lane lines.
Line Finding Method: Peaks in a Histogram
After applying calibration, thresholding, and a perspective transform to a road image, you should have a binary image where the lane lines stand out clearly. However, you still need to decide explicitly which pixels are part of the lines, and which of those belong to the left line and which to the right. One potential solution is to take a histogram of where the binary activations occur across the image, summing along all the columns in the lower half of the image.
The two most prominent peaks in this histogram will be good indicators of the x-position of the base of the lane lines. We can use that as a starting point for where to search for the lines. From that point, we can use a sliding window, placed around the line centers, to find and follow the lines up to the top of the frame.
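The histogram and its two peaks can be sketched in a few lines of NumPy. The function name and the synthetic image with two vertical stripes are assumptions for illustration. A real input would be the warped binary image from the previous step.

```python
import numpy as np

def find_lane_bases(binary_warped):
    """Return the column histogram of the lower half and the two peak x positions."""
    h = binary_warped.shape[0]
    # Sum the activations in each column of the lower half of the image
    histogram = np.sum(binary_warped[h // 2:, :], axis=0)
    midpoint = histogram.shape[0] // 2
    # Strongest peak on each side of the image centre
    leftx_base = int(np.argmax(histogram[:midpoint]))
    rightx_base = int(np.argmax(histogram[midpoint:]) + midpoint)
    return histogram, leftx_base, rightx_base

# Synthetic bird's-eye image with two vertical stripes as lane lines
binary_warped = np.zeros((720, 1280), dtype=np.uint8)
binary_warped[:, 300:310] = 1
binary_warped[:, 980:990] = 1
histogram, leftx_base, rightx_base = find_lane_bases(binary_warped)
```

Splitting at the midpoint assumes there is one lane line on each side of the image centre, which holds when the car is roughly centred in its lane.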
Sliding window Algorithm
The following algorithm was used:
1- All the non-zero pixels in the image are identified.
2- A sliding window is defined at the x position of the base of the lane line, and all the non-zero pixels inside the window are identified.
3- The sliding window is moved in the y direction to find more non-zero pixels, and it is shifted in x to the mean position of those pixels whenever more than a set number of them are found.
4- Once all the good pixel candidates for the lane line have been collected across the entire image, a second-degree polynomial f(y) = Ay² + By + C is fitted through them.
5- The steps are repeated for the left and right lane lines separately.
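The steps above can be sketched for a single lane line as follows. This is a simplified illustration rather than the exact code from the project. The function name and the parameters nwindows, margin and minpix are assumed values, and the input is a synthetic curved line.

```python
import numpy as np

def sliding_window_fit(binary_warped, x_base, nwindows=9, margin=100, minpix=50):
    """Follow one lane line upwards with sliding windows and fit x = A*y^2 + B*y + C."""
    h = binary_warped.shape[0]
    window_height = h // nwindows
    # Step 1: coordinates of every non-zero pixel
    nonzeroy, nonzerox = binary_warped.nonzero()
    x_current = x_base
    lane_inds = []
    for window in range(nwindows):
        # Step 2: window bounds, starting at the bottom of the image
        win_y_low = h - (window + 1) * window_height
        win_y_high = h - window * window_height
        in_window = ((nonzeroy >= win_y_low) & (nonzeroy < win_y_high) &
                     (nonzerox >= x_current - margin) &
                     (nonzerox < x_current + margin)).nonzero()[0]
        lane_inds.append(in_window)
        # Step 3: recentre the next window on the mean x of the pixels found
        if len(in_window) > minpix:
            x_current = int(np.mean(nonzerox[in_window]))
    lane_inds = np.concatenate(lane_inds)
    # Step 4: second-degree polynomial in y through all selected pixels
    return np.polyfit(nonzeroy[lane_inds], nonzerox[lane_inds], 2)

# Synthetic bird's-eye image containing the curve x = 1e-4 * y^2 + 300
binary_warped = np.zeros((720, 1280), dtype=np.uint8)
ys = np.arange(720)
xs = np.round(1e-4 * ys ** 2 + 300).astype(int)
binary_warped[ys, xs] = 1
fit = sliding_window_fit(binary_warped, x_base=350)
```

Step 5 then amounts to calling the function twice, once with the left base position and once with the right. The sketch does not guard against the case where no pixels are found at all, which a real pipeline would need to handle.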
Once you know where the lines are, you have a fit! In the next frame of video, you don’t need to do a blind search again, but instead you can just search in a margin around the previous line position.
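A minimal sketch of that targeted search is shown below. It keeps only the non-zero pixels within a margin of the previous polynomial and refits through them. The function name, the margin value and the fallback to the old fit are my own assumptions.

```python
import numpy as np

def search_around_fit(binary_warped, prev_fit, margin=100):
    """Refit a lane line using only pixels near the previous frame's polynomial."""
    nonzeroy, nonzerox = binary_warped.nonzero()
    # x position predicted by the previous fit at the row of each pixel
    predicted_x = np.polyval(prev_fit, nonzeroy)
    near_line = np.abs(nonzerox - predicted_x) < margin
    if near_line.sum() < 3:
        # Too few pixels for a quadratic fit: keep the previous one
        return np.asarray(prev_fit, dtype=float)
    return np.polyfit(nonzeroy[near_line], nonzerox[near_line], 2)

# Synthetic frame: the lane line sits at x = 320, with unrelated pixels far away
binary_warped = np.zeros((720, 1280), dtype=np.uint8)
binary_warped[:, 320] = 1
binary_warped[100:200, 900:910] = 1
new_fit = search_around_fit(binary_warped, prev_fit=[0.0, 0.0, 300.0])
```

If the refit fails or looks implausible, falling back to the full sliding window search is a reasonable recovery strategy.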
Measuring Curvature
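Once the polynomial is fitted through the lane lines, its radius of curvature is calculated using the Curvdist() function. The radius of curvature at a point is the radius of the circle that most closely fits the curve around that point.
The formula for the radius of curvature at any point x for the curve y = f(x) is given by
$$R = \frac{\left[1 + \left(\frac{dy}{dx}\right)^2\right]^{3/2}}{\left|\frac{d^2y}{dx^2}\right|}$$
For the lane fit, x is a function of y, so the roles of x and y are swapped. With f(y) = Ay² + By + C this gives
$$R = \frac{\left[1 + (2Ay + B)^2\right]^{3/2}}{|2A|}$$
To convert the pixel values into road units, the following conversion is used: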
ym_per_pix = 30/720
xm_per_pix = 3.7/700
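where both values are in metres per pixel: 30 metres are assumed to span 720 pixels in the y direction, and 3.7 metres to span 700 pixels in the x direction.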
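As an illustration of the curvature calculation, the sketch below refits the lane pixels in metres and evaluates the radius formula for a second-degree polynomial x = Ay² + By + C. It is not the Curvdist() function from the project. The function name and the synthetic test curve are assumptions.

```python
import numpy as np

ym_per_pix = 30 / 720   # metres per pixel in the y direction
xm_per_pix = 3.7 / 700  # metres per pixel in the x direction

def radius_of_curvature_m(ys_pix, xs_pix, y_eval_pix):
    """Radius of curvature in metres at the image row y_eval_pix."""
    # Refit in world units so that A and B are expressed in metres
    fit_m = np.polyfit(ys_pix * ym_per_pix, xs_pix * xm_per_pix, 2)
    A, B = fit_m[0], fit_m[1]
    y_eval_m = y_eval_pix * ym_per_pix
    # R = (1 + (2Ay + B)^2)^(3/2) / |2A|
    return (1 + (2 * A * y_eval_m + B) ** 2) ** 1.5 / np.abs(2 * A)

# Synthetic lane line: x = y^2 / 1000 + 1 in metres, converted to pixels.
# Its radius of curvature at y = 0 is 1 / (2 * 0.001) = 500 m.
ys_pix = np.arange(720, dtype=float)
ys_m = ys_pix * ym_per_pix
xs_pix = (ys_m ** 2 / 1000 + 1.0) / xm_per_pix
radius = radius_of_curvature_m(ys_pix, xs_pix, y_eval_pix=0)
```

For a perfectly straight line A is zero and the radius is infinite, so a real pipeline may want to cap or special-case very large values.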
To calculate the distance from the centre, the camera is assumed to be mounted at the centre of the car. The average of the left and right lane line positions is taken at the bottom of the image and then subtracted from the centre of the image. This distance is then multiplied by xm_per_pix to convert it into metres.
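The offset calculation can be sketched as below, assuming polynomial fits in pixel units for both lane lines. The function name and the example coefficients are hypothetical.

```python
import numpy as np

xm_per_pix = 3.7 / 700  # metres per pixel in the x direction

def vehicle_offset_m(left_fit, right_fit, img_width, img_height):
    """Offset of the image centre from the lane centre, in metres."""
    y_bottom = img_height - 1
    # x positions of both lane lines at the bottom row of the image
    left_x = np.polyval(left_fit, y_bottom)
    right_x = np.polyval(right_fit, y_bottom)
    lane_centre = (left_x + right_x) / 2
    image_centre = img_width / 2
    # Lane centre subtracted from the image centre, then converted to metres
    return (image_centre - lane_centre) * xm_per_pix

# Two straight vertical lane lines at x = 300 and x = 1000 pixels
offset = vehicle_offset_m([0.0, 0.0, 300.0], [0.0, 0.0, 1000.0], 1280, 720)
```

With this sign convention, a negative value means the image centre, and therefore the car, sits to the left of the lane centre.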
Once the lane lines are identified, the full lane is warped back onto the original image using the inverse of the matrix calculated in the perspective transform step.
Finally, the steps are repeated for every frame to identify the lane lines in the video.
The output marks the lane, and the text in the upper left corner shows the lane curvature and the position of the vehicle in that lane.
This pipeline worked well for the given video. However, it struggled in cases where the lane curvature is larger. To address this, it might be a good idea to store the coefficients of the fits as a history from frame to frame and look for any significant departures. It might also be useful to adapt the sliding window to take large curvatures into account.
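One possible shape for such a history is sketched below. It was not part of the pipeline described in this post. The class name, the number of frames kept and the pixel tolerance are all assumptions, and the rejection rule (maximum shift in x between the new fit and the running average) is just one of several reasonable choices.

```python
from collections import deque
import numpy as np

class FitHistory:
    """Keep the last few polynomial fits and reject fits that jump too far."""

    def __init__(self, max_frames=5, max_shift_px=50.0, img_height=720):
        self.fits = deque(maxlen=max_frames)
        self.max_shift_px = max_shift_px
        # Rows at which the new and the averaged fits are compared
        self.sample_ys = np.linspace(0, img_height - 1, 10)

    def smoothed(self):
        """Average of the stored coefficients, or None if nothing is stored."""
        if not self.fits:
            return None
        return np.mean(list(self.fits), axis=0)

    def update(self, new_fit):
        """Add new_fit unless it departs too far; return the smoothed fit."""
        current = self.smoothed()
        if current is not None:
            shift = np.abs(np.polyval(new_fit, self.sample_ys) -
                           np.polyval(current, self.sample_ys))
            if shift.max() > self.max_shift_px:
                return current  # significant departure: ignore this frame
        self.fits.append(np.asarray(new_fit, dtype=float))
        return self.smoothed()

history = FitHistory()
first = history.update([0.0, 0.0, 300.0])
second = history.update([0.0, 0.0, 310.0])   # small change: accepted
third = history.update([0.0, 0.0, 500.0])    # large jump: rejected
```

A real implementation would also need a way to reset the history after several consecutive rejections, otherwise a genuine lane change would be ignored forever.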
More details and the actual code for this project can be found at my GitHub repo.
Thanks to Udacity for guiding me through this project.
Written while listening to Fleet Foxes