I See You: Computer Vision Fundamentals

My introduction to Computer Vision happened in 2017 when I was doing the Self-Driving Car Nanodegree from Udacity. The first semester was mainly about Computer Vision and Deep Learning, which sparked my interest in the subject. This post covers a basic introduction to Computer Vision as well as camera calibration and affine transformations.

The goal of computer vision is to help machines see and understand the content of digital images; it deals with perceiving and understanding the world around you through images. Each digital image is made up of pixels, the smallest building blocks of an image. Mathematically, each pixel holds values for different features, typically colors. A simple example is an image in the RGB color scheme, where every pixel contains values for Red, Green, and Blue. The image can then be seen as a matrix whose values can be used by different algorithms. A video stream is just a sequence of 2D images played over time.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/
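
To make the pixel/matrix idea concrete, here is a minimal sketch (the file name is hypothetical) showing how OpenCV exposes an image as an array of pixel values:

import cv2

img = cv2.imread('example.jpg')   # hypothetical file; loaded as a NumPy array
print(img.shape)                  # e.g. (height, width, 3): one value per pixel per colour channel
print(img[0, 0])                  # the Blue, Green, Red values of the top-left pixel (OpenCV uses BGR order)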

Different algorithms can be used to extract information from images and videos. These algorithms might look at different features in the image and apply different techniques:

  • Colour detection: different colours are encoded with different numeric values.
  • Edge detection: helps the computer distinguish between different object shapes, sizes, etc.
  • Masking/unmasking: using only a specified area of interest. For example, if you are looking for lane lines in a dashcam image, you might only want to look at the lower half of the frame (see the short sketch after this list).
  • Shape and feature extraction: using colours and shapes to identify objects.
  • Machine/deep learning: models can also learn about different objects on their own from such features.
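
As a rough illustration of edge detection plus masking (the dashcam file name and the Canny thresholds below are assumptions for illustration, not values from this post), a short OpenCV sketch could look like:

import cv2
import numpy as np

img = cv2.imread('dashcam.jpg')                  # hypothetical dashcam frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)                 # edge detection with assumed thresholds
mask = np.zeros_like(edges)
h, w = edges.shape
mask[h//2:, :] = 255                             # keep only the lower half of the image
masked_edges = cv2.bitwise_and(edges, mask)      # masked area of interest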

How do you apply these techniques and algorithms? There are different libraries, but OpenCV is one of the most versatile and widely used. It's open-source, was originally developed by Intel, and supports various programming languages like Python and C++. It is well documented, and thanks to its large user base, online help is readily available.

These algorithms and techniques can then be used for various Computer Vision tasks such as:

  • Object Classification: What broad category of object is in this image?
  • Object Identification: Which type of a given object is in this image?
  • Object Verification: Is the object in the image?
  • Object Detection: Where are the objects in the image?
  • Object Landmark Detection: What are the key points for the object in the image?
  • Object Segmentation: What pixels belong to the object in the image?
  • Object Recognition: What objects are in this image and where are they?

But before these algorithms can be applied, some image processing is required. Image processing is an integral part of computer vision. Images are preprocessed:

  • to prepare the image in the form the algorithm expects
  • to clean up an image or a dataset for the algorithm to use
  • to generate new images for machine/deep learning to use
  • to better understand the scene, for example by using a perspective transform

Camera Model

The image coming out of any camera must first be undistorted. Image distortion occurs when a camera looks at 3D objects in the real world and transforms them into a 2D image; this transformation isn't perfect. Distortion changes the apparent shape and size of these 3D objects. So, the first step in analysing camera images is to undo this distortion so that you can get correct and useful information out of them.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

In a pinhole camera, a 2D image is formed when the light from 3D objects in the real world passes through the aperture and falls on the screen. The image formed is inverted, as shown in the figure. 3D points are then mapped to 2D image points using the camera matrix.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/
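
As a rough sketch of what the camera matrix does (the numbers below are made up for illustration, not from a real calibration), a 3D point in camera coordinates is projected to pixel coordinates like this:

import numpy as np

# Hypothetical camera matrix: focal lengths fx, fy and optical center cx, cy
K = np.array([[1000.,    0.,  640.],
              [   0., 1000.,  360.],
              [   0.,    0.,    1.]])
P = np.array([2.0, 1.0, 10.0])       # a 3D point (X, Y, Z) in camera coordinates
p = K @ P                            # project through the camera matrix
u, v = p[0] / p[2], p[1] / p[2]      # divide by depth Z to get pixel coordinates
print(u, v)                          # 840.0 460.0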

However, most cameras don't just use a pinhole. They use lenses, which introduce distortion.

Types of Distortion

Radial Distortion

Real cameras use curved lenses to form an image, and light rays often bend a little too much or too little at the edges of these lenses. This creates an effect that distorts the edges of images, so that lines or objects appear more or less curved than they actually are. This is called radial distortion, and it’s the most common type of distortion.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

Tangential Distortion

Another type of distortion is tangential distortion. This occurs when a camera's lens is not aligned perfectly parallel to the imaging plane, where the camera film or sensor is. This makes an image look tilted so that some objects appear farther away or closer than they actually are.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

Distortion Coefficients and Correction

The first step for distortion correction is finding the distortion coefficients. There are three coefficients needed to correct for radial distortion: k1, k2, and k3, and two for tangential distortion: p1 and p2.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

To correct the appearance of radially distorted points in an image, one can use a correction formula:

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

where (x, y) is a point in the distorted image, k1, k2, and k3 are the radial distortion coefficients of the lens, and r² = x² + y².

To undistort these points, OpenCV calculates r, the known distance between a point in the undistorted (corrected) image and the center of the image distortion. This center is often the center of the image and is sometimes referred to as the distortion center.
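
Expressed as a small sketch in code (on normalised coordinates; this is the standard radial model, not code from this project), the relation between ideal and distorted points is roughly:

def apply_radial_distortion(x, y, k1, k2, k3):
    # x, y are ideal (undistorted) normalised coordinates
    r2 = x**2 + y**2
    factor = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    return x * factor, y * factor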

Similarly, the tangential distortion correction can be applied as:

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/
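
A similar sketch of the tangential correction (again on normalised coordinates, not code from this project):

def apply_tangential_correction(x, y, p1, p2):
    r2 = x**2 + y**2
    x_corr = x + (2 * p1 * x * y + p2 * (r2 + 2 * x**2))
    y_corr = y + (p1 * (r2 + 2 * y**2) + 2 * p2 * x * y)
    return x_corr, y_corr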

The corrected coordinates x and y are then converted to normalised image coordinates. Normalised image coordinates are calculated from pixel coordinates by translating to the optical center and dividing by the focal length in pixels.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

where fx and fy are the camera focal lengths (in pixels) and cx, cy are the coordinates of the optical center.
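
In code, the normalisation step could look like this (the focal lengths and optical center below are placeholder numbers, not values from this calibration):

fx, fy = 1000.0, 1000.0    # hypothetical focal lengths in pixels
cx, cy = 640.0, 360.0      # hypothetical optical center

def normalise(u, v):
    # translate to the optical center and divide by the focal length
    return (u - cx) / fx, (v - cy) / fy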

The distortion coefficient k3 is required to accurately reflect major radial distortion (like in wide angle lenses). However, for minor radial distortion, which most regular camera lenses have, k3 has a value close to or equal to zero and is negligible. So, in OpenCV, you can choose to ignore this coefficient; this is why it appears at the end of the distortion values array: [k1, k2, p1, p2, k3].

Methodology

For distortion correction, the most common approach is to use chessboard images. The process involves mapping distorted points to undistorted points in order to measure the amount of distortion. A chessboard is a great starting point as it has multiple checkpoints (corners) that can be used to identify distortion at various locations in the image. It's better to do this for multiple images to get a full picture of the distortion; the general recommendation is to use more than 20 images.

1 – Get a chessboard and take pictures (>20) from different angles to have a starting set. The basic idea is to find the corners in the distorted chessboard images and map them to the undistorted corners in the real world.

2 – Start by preparing the object points (objp), which are the undistorted coordinates of the chessboard corners in the real world. The chessboard is assumed to be fixed on the (x, y) plane at z=0, so the object points are the same for each calibration image. Thus, objp is just a replicated array of coordinates, and objpoints is appended with a copy of it for every successful detection.

3 – The chessboard corners in each calibration image are detected using the OpenCV function findChessboardCorners() and visualised using drawChessboardCorners().

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

4 – imgpoints, which are the corners of the distorted image in 2D, are appended with the (x, y) pixel position of each corner in the image plane for every successful chessboard detection.

5 – The resulting objpoints and imgpoints are used to compute the camera matrix and distortion coefficients using the cv2.calibrateCamera() function.

6 – The distortion correction is then applied to a test image using the cv2.undistort() function, giving the following result:

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

After that, the camera matrix and distortion coefficients are written to wide_dist_pickle.p so they can be reused later to undistort camera images.
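
Loading these coefficients back later could look like this sketch (the frame path is hypothetical):

import pickle
import cv2

with open("wide_dist_pickle.p", "rb") as f:
    calib = pickle.load(f)
mtx, dist = calib["mtx"], calib["dist"]

frame = cv2.imread("camera_frame.jpg")                    # hypothetical image to correct
undistorted = cv2.undistort(frame, mtx, dist, None, mtx)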

Functions for distortion correction

Finding chessboard corners (for an 8×6 board):

ret, corners = cv2.findChessboardCorners(gray, (8,6), None)

Drawing detected corners on an image:

img = cv2.drawChessboardCorners(img, (8,6), corners, ret)

Camera calibration, given object points, image points, and the shape of the grayscale image:

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

Undistorting a test image:

dst = cv2.undistort(img, mtx, dist, None, mtx)

The shape of the image, which is passed into the calibrateCamera function, is just the width and height of the image. One way to retrieve these values is from the grayscale image shape array, reversed: gray.shape[::-1].

Another way to retrieve the image shape is to take the first two values of the color image shape array in reverse order using img.shape[1::-1]. Grayscale images only have two dimensions, while color images have three: height, width, and depth.
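
A small sketch of both options (assuming a calibration image in the working directory):

import cv2

img = cv2.imread('calibration2.jpg')            # colour image: shape (h, w, 3)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # greyscale image: shape (h, w)
print(gray.shape[::-1])                         # (w, h) from the greyscale image
print(img.shape[1::-1])                         # (w, h) directly from the colour image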

Full Code

import numpy as np
import cv2
import glob
import matplotlib.pyplot as plt
import pickle

# Prepare object points, like (0,0,0), (1,0,0), (2,0,0) ...., (8,5,0)
objp = np.zeros((6*9, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

# Arrays to store object points and image points from all the images
objpoints = []  # 3d points in real world space
imgpoints = []  # 2d points in image plane

# Make a list of calibration images
images = glob.glob('/Users/architrastogi/Documents/camera_cal/calibration*.jpg')

# Step through the list and search for chessboard corners
for idx, fname in enumerate(images):
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Find the chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (9,6), None)
    # If found, add object points, image points
    if ret == True:
        objpoints.append(objp)
        imgpoints.append(corners)
        # Draw the corners and save an annotated copy for inspection
        cv2.drawChessboardCorners(img, (9,6), corners, ret)
        write_name = 'corners_found' + str(idx) + '.jpg'
        print(write_name)
        cv2.imwrite(write_name, img)

# Test undistortion on an image
img = cv2.imread('/Users/architrastogi/Documents/camera_cal/calibration2.jpg')
img_size = (img.shape[1], img.shape[0])

# Do camera calibration given object points and image points
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, img_size, None, None)

# Apply the undistortion to the image based on the calibration
dst = cv2.undistort(img, mtx, dist, None, mtx)

# Write out the undistorted image
cv2.imwrite('/Users/architrastogi/Documents/output_images/test_undist.jpg', dst)

# Save the camera calibration result for later use (we won't worry about rvecs / tvecs)
dist_pickle = {}
dist_pickle["mtx"] = mtx
dist_pickle["dist"] = dist
pickle.dump(dist_pickle, open("/Users/architrastogi/Documents/camera_cal/wide_dist_pickle.p", "wb"))

# dst = cv2.cvtColor(dst, cv2.COLOR_BGR2RGB)
# Visualize undistortion
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
ax1.imshow(img)
ax1.set_title('Original Image', fontsize=30)
ax2.imshow(dst)
ax2.set_title('Undistorted Image', fontsize=30)
plt.show()

Affine Transformations

Once the distortion correction is done, the images can be used for Computer Vision work. Geometric transformations can be applied to them for various purposes; the most common are affine transformations. Affine transformations are those that can be expressed as a matrix multiplication (linear transformation) followed by a vector addition (translation). The reasons you might want to apply transformations include:

  • To enhance the dataset: sometimes machine/deep learning algorithms need a bigger dataset than is available. In those cases one can augment the dataset by applying these transformations to the images.
  • To extract particular information: you might only be interested in rotated figures, or need a bird's-eye view for your algorithm.

Different Affine Transformations and Their Implementation

OpenCV provides two transformation functions, cv2.warpAffine and cv2.warpPerspective. cv2.warpAffine takes a 2×3 transformation matrix as input, while cv2.warpPerspective takes a 3×3 matrix.

Scaling

Scaling is a linear transformation that enlarges or shrinks objects by a scale factor that is the same in all directions. Scaling is just resizing of the image. OpenCV comes with a function cv2.resize() for this purpose.
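
A short sketch of cv2.resize (the file path and scale factors are arbitrary examples):

import cv2

img = cv2.imread('/Users/architrastogi/Documents/blog/michigan.jpeg')
# Enlarge by a factor of 2 in both directions
big = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
# Or resize to an explicit (width, height)
small = cv2.resize(img, (300, 200), interpolation=cv2.INTER_AREA)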

Translation

A translation is a function that moves every point by a constant distance in a specified direction. Mathematically, the transformation matrix M can be represented as

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

where tx and ty are the translations in x and y.

If the original picture looks like this:

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/
Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

A sample code to achieve this could look like:

img = cv2.imread('/Users/architrastogi/Documents/blog/michigan.jpeg',0)
rows,cols = img.shape
M = np.float32([[1,0,100],[0,1,50]])
dst = cv2.warpAffine(img,M,(cols,rows))
cv2.imwrite('/Users/architrastogi/Documents/blog/michigan_trans.jpeg', dst)
cv2.imshow('img',dst)
cv2.waitKey(0)
cv2.destroyAllWindows()

Rotation

Rotation is a circular transformation around a point or an axis; we can specify the angle of rotation by which to rotate our image around that point or axis.

The rotation transformation matrix can be defined as

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

where theta is the angle of rotation

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

The sample code to achieve this:

img = cv2.imread('/Users/architrastogi/Documents/blog/michigan.jpeg',0)
rows,cols = img.shape
M = cv2.getRotationMatrix2D((cols/2,rows/2),90,1)
dst = cv2.warpAffine(img,M,(cols,rows))
cv2.imwrite('/Users/architrastogi/Documents/blog/michigan_rot.jpeg', dst)
cv2.imshow('img',dst)
cv2.waitKey(0)
cv2.destroyAllWindows()

Perspective transform

A perspective transform maps the points in a given image to different, desired image points with a new perspective. One of the most common uses of a perspective transform is to convert an image to a bird's-eye view.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

Aside from creating a bird's-eye view representation of an image, a perspective transform can also be used for all kinds of different viewpoints.

Image Courtesy: https://theauthorhelper.com/rl/readerlinks-help-page/understanding-pixels/

The difference between camera calibration and a perspective transform is that a perspective transform maps image points to different image points, while calibration maps object points to image points. OpenCV provides tailored functions for perspective transforms.

Compute the perspective transform, M, given source and destination points:

M = cv2.getPerspectiveTransform(src, dst)

Compute the inverse perspective transform:

Minv = cv2.getPerspectiveTransform(dst, src)

Warp an image using the perspective transform, M:

warped = cv2.warpPerspective(img, M, img_size, flags=cv2.INTER_LINEAR)

You can either pick the source points manually or detect them using specific programs.
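
For example, a hand-picked set of source and destination points for a bird's-eye view could look like this sketch (the image name and the point coordinates are placeholders, not values from this project):

import cv2
import numpy as np

img = cv2.imread('road.jpg')                 # hypothetical road image
img_size = (img.shape[1], img.shape[0])

# Hypothetical source points (a trapezoid around the lane in the original view)
src = np.float32([[580, 460], [700, 460], [1040, 680], [260, 680]])
# Hypothetical destination points (a rectangle in the bird's-eye view)
dst = np.float32([[260, 0], [1040, 0], [1040, 720], [260, 720]])

M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M, img_size, flags=cv2.INTER_LINEAR)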

That's all for now.

Written while listening to Father John Misty