MP2: Optical Flow
In this lab, you'll calculate optical flow from a low-res video, then use the optical flow field to interpolate high-res images to make a high-res video.
To make sure everything works, you may want to run the following at the command line:
pip install -r requirements.txt
This will install the modules that are used on the autograder, including numpy, h5py, and the gradescope utilities.
Part 1: Loading the video and image files
First, let's load the low-res video. This video was posted by NairobiPapel(Kamaa) at https://commons.wikimedia.org/wiki/File:Cat_Play.webm under a Creative Commons Attribution-ShareAlike license.
from IPython.display import Video
Video("cat.webm")
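The high-res images are provided once per 30 frames (once per second). There are four of them, corresponding to frames $30s$ for $s\in\{0,\ldots,3\}$. Let's load them all as grayscale by summing the three color channels. The array has room for all 91 frames, but only these four are filled in; the rest stay zero for now.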
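As a small illustration of the grayscale conversion described above, the sketch below sums the color channels of a tiny made-up RGB array. The array `rgb` and its values are hypothetical stand-ins for one JPEG frame, not data from the lab.

```python
import numpy as np

# Hypothetical 2x2 RGB image (rows, columns, colors), standing in for one JPEG frame.
rgb = np.array([[[10, 20, 30], [0, 0, 0]],
                [[255, 255, 255], [1, 2, 3]]], dtype=np.uint8)

# Convert to float before summing, so the uint8 values cannot overflow.
gray = np.sum(rgb.astype(float), axis=2)

# Frame indices of the four high-res images: 30*s for s = 0..3.
frame_indices = [30 * s for s in range(4)]

print(gray.shape)
print(frame_indices)
```

Because the channels are summed rather than averaged, the grayscale values range from 0 to 765 instead of 0 to 255.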
import matplotlib.image
import numpy as np
highres = np.zeros((91,270,480),dtype=float)
for s in range(4):
highres[30*s,:,:] = np.sum(matplotlib.image.imread('highres/cat%4.4d.jpg'%(30*s)).astype(float), axis=2)
print(highres.dtype)
print(highres.shape)
float64
(91, 270, 480)
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 5))
plt.imshow(highres[0,:,:], cmap='gray')
<matplotlib.image.AxesImage at 0x7fb6633f8ca0>
Extracting frames from the video requires ffmpeg, which you should probably install. In case you haven't, the extracted frames are provided in the lowres directory.
import numpy as np
lowres = np.zeros((91,135,240),dtype=float)
for t in range(91):
lowres[t,:,:] = np.sum(matplotlib.image.imread('lowres/cat%4.4d.jpg'%(t)).astype(float), axis=2)
print(lowres.dtype)
print(lowres.shape)
float64
(91, 135, 240)
plt.figure(figsize=(14, 5))
plt.imshow(lowres[0,:,:], cmap='gray')
<matplotlib.image.AxesImage at 0x7fb662dfb850>
Part 2: Further Smooth the Low-Res Image
First, load submitted.py.
import submitted
import importlib
importlib.reload(submitted)
print(submitted.__doc__)
This is the module you'll submit to the autograder.
There are several function definitions, here, that raise RuntimeErrors. You should replace
each "raise RuntimeError" line with a line that performs the function specified in the
function's docstring.
Next, in order to make the gradient estimation smoother, we'll smooth all of the low-res images.
help(submitted.smooth_video)
Help on function smooth_video in module submitted:
smooth_video(x, sigma, L)
y = smooth_video(x, sigma, L)
Smooth the video using a sampled-Gaussian smoothing kernel.
x (TxRxC) - a video with T frames, R rows, C columns
sigma (scalar) - standard deviation of the Gaussian smoothing kernel
L (scalar) - length of the Gaussian smoothing kernel
y (TxRxC) - the same video, smoothed in the row and column directions.
The Gaussian smoothing kernel is: $$h[n] = \left\{\begin{array}{ll} \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{n-(L-1)/2}{\sigma}\right)^2} & 0\le n\le L-1\\0 & \mbox{otherwise}\end{array}\right.$$
You should implement this as a separable filter, i.e., convolve in both the row and column directions: $$z[r,c] = h[r]\ast_r x[r,c]$$ $$y[r,c] = h[c]\ast_c z[r,c]$$ where $\ast_r$ means convolution across rows (in the $r$ direction), and $\ast_c$ means convolution across columns.
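The kernel and the two-pass separable filtering can be sketched on a single toy image as follows. This is only an illustration under stated assumptions: the helper names `gaussian_kernel` and `smooth_image` are hypothetical, and the edge handling (`mode='same'`, which zero-pads and keeps the output the same size as the input) is an assumption that may not match what the autograder expects.

```python
import numpy as np

def gaussian_kernel(sigma, L):
    """Sampled Gaussian h[n] for 0 <= n <= L-1, centered at (L-1)/2."""
    n = np.arange(L)
    return np.exp(-0.5 * ((n - (L - 1) / 2) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)

def smooth_image(x, sigma, L):
    """Smooth one RxC image with a separable filter (assumes R >= L and C >= L)."""
    h = gaussian_kernel(sigma, L)
    # First pass: convolve in the row direction (along axis 0, one column at a time).
    z = np.apply_along_axis(lambda v: np.convolve(v, h, mode='same'), 0, x)
    # Second pass: convolve in the column direction (along axis 1, one row at a time).
    return np.apply_along_axis(lambda v: np.convolve(v, h, mode='same'), 1, z)

# Toy input: a single bright pixel in the middle of a 9x9 image.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0

h = gaussian_kernel(sigma=1.0, L=5)
y = smooth_image(impulse, sigma=1.0, L=5)
print(y.shape)
```

Smoothing an impulse reproduces the 2D kernel, which for a separable filter is the outer product of `h` with itself. A video version would apply the same two passes to every frame.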
importlib.reload(submitted)
smoothed = submitted.smooth_video(lowres, sigma=1.5, L=7)
print(smoothed.shape)
(91, 135, 240)
plt.figure(figsize=(14, 5))
plt.imshow(smoothed[0,:,:], cmap='gray')
<matplotlib.image.AxesImage at 0x7fb6629f6b20>
Part 3: Calculating the Image Gradient
Now that we have the smoothed images, let's find their gradient. Use a central difference filter: $$h[n] = 0.5\delta[n+1]-0.5\delta[n-1]$$ Convolving a signal $x[n]$ with this filter gives $0.5\left(x[n+1]-x[n-1]\right)$.
You will need to compute three different gradients: the column gradient $g_c$, row gradient $g_r$, and frame gradient $g_t$, defined as: $$g_t[t,r,c] = h[t] \ast_t x[t,r,c]$$ $$g_r[t,r,c] = h[r] \ast_r x[t,r,c]$$ $$g_c[t,r,c] = h[c] \ast_c x[t,r,c]$$ where $x[t,r,c]$ should be the smoothed video.
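A minimal sketch of a central difference along one axis is shown below, taking the difference as $0.5(x[n+1]-x[n-1])$ and leaving the first and last samples along that axis at zero. The helper name `central_difference` and the toy video are hypothetical; they are not part of submitted.py.

```python
import numpy as np

def central_difference(x, axis):
    """Return 0.5*(x[n+1] - x[n-1]) along `axis`, with zeros at both ends of that axis."""
    g = np.zeros(x.shape, dtype=float)
    # Move the chosen axis to the front so the same slicing works for any axis.
    xv = np.moveaxis(x, axis, 0)
    gv = np.moveaxis(g, axis, 0)  # a view: writing to gv fills g
    gv[1:-1] = 0.5 * (xv[2:] - xv[:-2])
    return g

# Toy video with a known slope in each direction: x[t,r,c] = 1*t + 2*r + 3*c.
t, r, c = np.meshgrid(np.arange(4), np.arange(5), np.arange(6), indexing='ij')
x = 1.0 * t + 2.0 * r + 3.0 * c

gt = central_difference(x, axis=0)  # frame (time) gradient
gr = central_difference(x, axis=1)  # row (vertical) gradient
gc = central_difference(x, axis=2)  # column (horizontal) gradient
print(gt.shape, gr.shape, gc.shape)
```

On this linear ramp the interior gradients equal the slopes (1, 2, and 3), which makes it a convenient sanity check before running on real frames.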
importlib.reload(submitted)
help(submitted.gradients)
Help on function gradients in module submitted:
gradients(x)
gt, gr, gc = gradients(x)
Compute gradients using a first-order central finite difference.
x (TxRxC) - a video with T frames, R rows, C columns
gt (TxRxC) - gradient in the time direction
gr (TxRxC) - gradient in the vertical direction
gc (TxRxC) - gradient in the horizontal direction
In order to avoid weird numerical problems, please set
gt[0,:,:]=0, gt[-1,:,:]=0,
gr[:,0,:]=0, gr[:,-1,:]=0,
gc[:,:,0]=0, and gc[:,:,-1]=0.
All of the samples of $x[t,r,c]$ are nonnegative, but the three gradient images take both positive and negative values. Matplotlib automatically scales the color map to the range of each image, but it's useful to put a colorbar on each image, so we can see which gradient values correspond to which colors.
importlib.reload(submitted)
gt, gr, gc = submitted.gradients(smoothed)
print(gt.shape)
(91, 135, 240)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2,figsize=(14, 10))
plt1 = ax1.imshow(gt[1,:,:])
plt.colorbar(plt1, ax=ax1)
ax1.set_title('Time Gradient')
plt2 = ax2.imshow(gr[1,:,:])
plt.colorbar(plt2, ax=ax2)
ax2.set_title('Vertical Gradient')
plt3 = ax3.imshow(gc[1,:,:])
plt.colorbar(plt3, ax=ax3)
ax3.set_title('Horizontal Gradient')
ax4.imshow(lowres[1,:,:],cmap='gray')
<matplotlib.image.AxesImage at 0x7fb662bf8e80>