$30
Lab 5: The DFT and FFT
In the last lab, we explored the Discrete-Time Fourier Transform (DTFT) and the frequency response of LSI systems. However, we know that it is impossible to hold an entire DTFT in computer memory since the DTFT of a signal has infinitely many points! We will discuss the Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT) in this lab as the practical implementations of the DTFT and consider some of the limitations and considerations when using the DFT.
Discrete Fourier Transform
The Discrete Fourier Transform (DFT) is the discretized version of the DTFT. Sounds weird, right? Isn't the DTFT already discrete? It's in the name! But recall that the DTFT is a continuous function: it has infinitely many points to represent frequency content from −π to π. The DFT is just a resampling of the DTFT. In other words, the DFT picks a finite number of equally spaced points in the DTFT for representation.
More concretely, the DFT is computed as follows:
X[k]=
N−1
∑
n=0
x[n]e−j
2πk
N
n,0≤k≤N−1.
And the inverse DFT is given by:
x[n]=
1
N
N−1
∑
k=0
X[k]ej
2πk
N
n,0≤n≤N−1
The DFT has N "frequency bins" differentiated by our frequency index k. These are the samples we take from the DTFT. Perhaps a more intuitive formulation with respect to the DTFT makes these bins more obvious:
ωk=
2πk
N
,
X[k]=
N−1
∑
n=0
x[n]e−jωkn,0≤k≤N−1.
This representation explicitly shows that we try to capture N equally spaced frequencies between 0 to 2π (same as −π to π by periodicity of DTFT) with the DFT.
The Linear Algebra Intuition
The DFT gives an important insight into how we can use linear algebra or vectorize our systems to perform common signal processing operations. Consider our second formulation of the DFT. Suppose we fix k to be one value k0. The second formulation tells us that
X[k0]=
N−1
∑
n=0
x[n]e−jωk0n.
There is helpful notation that can be introduced here using something call the "twiddle factor" (great name, right?). Different conventions exist for the twiddle factor, but we will opt for the one most consistent with your ECE 310 textbook. Consider the following notation for the twiddle factor W:
W=e−j2π
WN=e−j
2π
N
W
k
N
=e−j
2πk
N
If we return to our expression for X[k0], we see that all we are doing at frequency ωk0 is summing the product of each entry in the signal and some complex exponential. This is the same thing as taking an inner product (dot product) between the signal and the complex exponential rotating at frequency wk0! Thus,
e−jωk0n=[e−jωk0∗(0),e−jωk0∗(1),…,e−jωk0∗(N−1)]=[W
k0∗0
N
,W
k0∗1
N
,W
k0∗2
N
,…,W
k0∗(N−1)
N
]
X[k0]=⟨W
−k0n
N
,x[n]⟩=
N−1
∑
n=0
x[n]e−jωk0n.
Note that when we take the inner product over complex numbers, we conjugate-transpose the first term by convention. For example, ⟨x,y⟩=x∗y, where x∗ is the transpose of x and each element is complex conjugated. Thus,
X[k0]=⟨W
−k0n
N
,x[n]⟩=
N−1
∑
i=0
W
k0i
N
x[i]
as desired.
Now, recall that if the dot product between two vectors is large, they are similar. The same is true here! For each frequency, we take the dot product between our signal and a vector represented by a complex exponential rotating at a fixed frquency. The result tells us how much that fixed frequency is found in our signal. Wow. We can take one final step to make the entire DFT a matrix-vector product:
X[k]=W
→
x
X[k]=
[
W
0
N
W
0
N
⋯ W
0
N
W
0
N
W
1
N
⋯ W
N−1
N
⋮ ⋮ ⋱ ⋮
W
0
N
W
N−1
N
⋯ W
(N−1)(N−1)
N
]
⏟
W
[
x[0]
x[1]
⋮
x[N−1]
]
⏟
→
x
=[
⟨W
−0∗n
N
,
→
x
⟩
⟨W
−1∗n
N
,
→
x
⟩
⋮
⟨W
−k∗n
N
,
→
x
⟩
⋮
⟨W
−(N−1)∗n
N
,
→
x
⟩
]
In the above W matrix, each row represents a frequency vector rotating at a certain frequency. More concretely, row k in W represents the frequency vector rotating with frequency ωk=
2π
N
k. This kind of intuition is powerful in signal processing. If this confuses or (hopefully not) scares you, do not worry! It takes time to be comfortable with combining signal processing and linear algebra. Read the above explanation a couple times, ask your TA questions, check out your textbook's coverage of this, or hang out with some friends and chat about it!
And finally the FFT!
We will not focus on the math of the FFT since you have covered it in ECE 310. For now, we should acknowledge the computational efficiency of the FFT. The previous two views of the DFT we have discussed - summation and vectorized versions - all require O(n2) multiply-add operations. Conversely, the FFT is a divide-and-conquer algorithm that can perform the same computation in O(nlogn) multiply-add operations. Keep this in mind when completing Excercise 1.
#import libraries
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Audio
from skimage.io import imread
from scipy import signal
from scipy.io import wavfile
from linear_convolution import linear_convolution
%matplotlib inline
Excercise 1: Implementing the DFT
There are multiple different implementations we can use to represent the DFT. In this exercise, you will compare three ways to compute the DFT.
a. First, fill in the below function dfl_dft() to create a double for-loop implementation of the DFT. Hint: this will work a lot like your DTFT function from Lab 4!
b. Next, fill in dft_matrix() and vectorized_dft() to create a vectorized implementation of the DFT.
c. Finally, let's test our double for-loop and vectorized methods against the np.fft.fft() implementation. Run the provided code that benchmarks the running time of eahc method and verifies if your methods accurately compute the DFT of a randomly generated signal. Note that we do not benchmark the time to build the W matrix since this could be precomputed in a practical context. For example, if we compute 1000 length-500 DFTs, we would only need to construct "W" once. Comment on the results. Which method is fastest? Slowest? Why? Does anything surprise you?
#Function to complete for 1.a
def dfl_dft(x):
N = len(x)
dft = np.zeros(N, dtype=np.complex64)
#for loop to iterate over 'k'
for k in range(N):
dft_sum = []
#for loop to iterate over 'n'
for n in range(N):
dft_sum.append(x[n]*np.exp(-1j*((2*np.pi*k)/N)*n))
dft[k] = sum(dft_sum)
return dft
#Functions to complete for 1.b
#Construct the matrix W
def dft_matrix(N):
W = np.zeros((N, N),dtype=np.complex64)
#fill in W however you see fit!
for i in range(N):
for j in range(N):
W[i][j] = np.exp(-1j*2*np.pi*i*j/N)
return W
#Take the DFT of signal x using the matrix W
def vectorized_dft(W, x):
#this should only take one line...
dft = W*x
#refer to the above math if you are unsure!
#dft = None
return dft
#Provided code for 1.c
#Generate test signal
N = 500
x = np.random.uniform(size=N)
#Test double for-loop implementation
print('Double For Loop Time:')
%timeit dfl_dft(x)
print('')
#Test vectorized implementation
#don't time W matrix construction since this is precomputation that could be practically stored
W = dft_matrix(N)
print('Vectorized Time')
%timeit vectorized_dft(W,x)
print('')
#Test numpy's fft implementation
print('Numpy Time')
%timeit np.fft.fft(x)
print('')
#Test if results are equivalent
print('Double for-loop and numpy fft are equivalent:',np.allclose(dfl_dft(x),np.fft.fft(x)))
print('Vectorized DFT and numpy fft are equivalent:',np.allclose(vectorized_dft(W,x),np.fft.fft(x)))
Double For Loop Time:
1.05 s ± 62.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Vectorized Time
656 µs ± 3.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Numpy Time
9.75 µs ± 31.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Double for-loop and numpy fft are equivalent: True
Vectorized DFT and numpy fft are equivalent: False
Answer for Part 1.c: The fastest method is the numpy fft method, and the slowest method is the double for loop method. This makes sense since python's numpy package should run at the quickest rate or else python would not use this package. The double for loop method is the slowest because it has run time O(N^2) while the numpy fft implementation runs at O(NlogN). Our vectorized version has a run time in the middle of the two which makes sense since it is O(N) since we first take N multiplications then we take the sum.
Windowing and Zero-Padding
We will now briefly discuss the motivation for windowing and zero-padding. Exercises 2 and 3 will demonstrate and reinforce this theory.
Recall that the DFT works under the practical assumption that we only have a finite number of samples for our signal. Unlike the DTFT which uses infinitely points, we must use a finite collection to capture the frequency content of our signal. In the simplest case, we use a rectangular window that removes the N points of our signal we would like to inspect for the DFT. Mathematically, we have,
xwindowed=x[n](u[n−n0]−u[n−(n0+N)]),
where n0 is the start-point of our signal's window. This window signal is the same as a shifted rectangle function. We see that windowing is multiplying in the time domain, which by convolution theory must correspond to multiplication in the frequency domain. Thus, for a window function w[n],
x[n]w[n]
F
↔
1
2π
X(ω)∗W(ω).
Exercise 2 will explore the consequences of using the rectangular window and what other windows may be used. Next, let's explain the use of zero-padding. When zero-padding a signal, we are simply appending some number of zeros to the end of the original signal. What effect does this have? Suppose I have original signal x[n] of length N and a zero-padded signal xzp[n] of length M>N that has been padded with M−N zeros.
X[k] =
N−1
∑
n=0
x[n]e−j
2πk
N
n,0≤k≤N−1
Xzp[k] =
M−1
∑
n=0
xzp[n]e−j
2πk
M
n,0≤k≤M−1
xzp[n]=0 for all n≥N⟹Xzp[k] =
N−1
∑
n=0
xzp[n]e−j
2πk
M
n
Xzp[k] =
N−1
∑
n=0
x[n]e−j
2πk
M
n,0≤n≤M−1
We see that the zero-padded signal's DFT will use the exact same signal values, and thus have the exact same frequency content. The difference is that the spacing of our frequency sampling is tighter!
2π
M
<
2π
N
Consequently, we add no "information" to our signal (only zeros) and gain higher frequency resolution at the cost of storing some zeros. This is what implictly happens when you use np.fft.fft() or np.fft.rfft() and specify a number of points greater than the length of the signal. Exercise 3 will give a brief example of how zero-padding can affect our ability to resolve different frequency components or notes in audio.
Excercise 2: Windowing Effects
In this exercise, we want to investigate the effects of different windowing methods. When we are computing the DFT of a finite-length signal, it is assumed that there is periodic extension of this finite segment (more on this in Exercise 5 and related background). If the segment precisely captures full cycles of all the frequencies, then there will be no problem since we have no periodic interruptions and the transition between periodic copies of the signal will be seamless. However, if the segment does not capture full cycles of some frequencies, then there will be periodic interruptions or discontinuities as the signal is periodically extended. This can lead to changes in our ideal frequency content. This problem is referred to as spectral leakage. This is illustrated in the example below:
Suppose that we are sampling a single sine wave, consider the two cases we have discussed:
We sample a window of the signal that perfectly captures an integer number of signal periods.
We sample a window of the signal that captures a fractional number of signal periods.
Below, our ideal signal is sin(
π
4
n). Thus, we need eight points to capture a full period of the signal.
#Make longer signal that we will window in order to analyze a smaller portion
x = np.array([np.sin(np.pi/4*n) for n in range(200)])
# Case 1
N1 = 80 #10 full periods
x1 = x[:N1]
x1_fft = np.fft.rfft(x1)
omega_1 = np.linspace(0,np.pi,len(x1_fft))
# case 2
N2 = 78 # 7 full period and one fractional period
x2 = x[:N2]
x2_fft = np.fft.rfft(x2)
omega_2 = np.linspace(0,np.pi,len(x2_fft))
plt.figure(figsize=(10,5))
plt.subplot(121)
plt.stem(x1)
plt.title('Case 1 Signal')
plt.subplot(122)
plt.stem(x2)
plt.title('Case 2 Signal')
plt.figure(figsize=(10,5))
plt.subplot(121)
plt.stem(omega_1, abs(x1_fft))
plt.title('FFT for Case 1')
plt.xlabel('$\omega$')
plt.subplot(122)
plt.stem(omega_2, abs(x2_fft))
plt.title('FFT for case 2')
plt.xlabel('$\omega$')
/Users/ryanyoseph/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:17: UserWarning: In Matplotlib 3.3 individual lines on a stem plot will be added as a LineCollection instead of individual lines. This significantly improves the performance of a stem plot. To remove this warning and switch to the new behaviour, set the "use_line_collection" keyword argument to True.
/Users/ryanyoseph/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:20: UserWarning: In Matplotlib 3.3 individual lines on a stem plot will be added as a LineCollection instead of individual lines. This significantly improves the performance of a stem plot. To remove this warning and switch to the new behaviour, set the "use_line_collection" keyword argument to True.
/Users/ryanyoseph/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:25: UserWarning: In Matplotlib 3.3 individual lines on a stem plot will be added as a LineCollection instead of individual lines. This significantly improves the performance of a stem plot. To remove this warning and switch to the new behaviour, set the "use_line_collection" keyword argument to True.
/Users/ryanyoseph/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:29: UserWarning: In Matplotlib 3.3 individual lines on a stem plot will be added as a LineCollection instead of individual lines. This significantly improves the performance of a stem plot. To remove this warning and switch to the new behaviour, set the "use_line_collection" keyword argument to True.
Text(0.5, 0, '$\\omega$')
From the above example, we can see that although we are sampling the same sine wave signal, we obtain different frequency spectra for the two cases. The mismatch in sampling length for case 2 leads to the spectral leakage witnessed above. Notice how the main peak at
π
4
is lower in case 2 since the energy of this frequency has spread out to adjacent frequencies. Recall that the mismatch in sampling size creates a discontinuity between the periodically extended copies of our finite window. This introduces other frequencies into the spectrum to compensate for this discontinuity. In practice, spectral leakage is unavoidable since many real-world signals have much richer frequency content. This is where windowing comes in.
We would like windowing to smooth out the discontinuity and decrease the impact of spectral leakage. There are a many possible windows we may consider, but we will restrict ourselves to examining the following three:
Rectangular or Boxcar Window
Hamming Window
Kaiser (β=3)
a. Plot the time domain representation of each of these windows (using the N2=78 from above). Hint: use scipy.signal.hamming() and scipy.signal.kaiser().
b. Plot the magnitude spectrum of each window on the same plot on a dB scale. Please specify 512 points and use np.fft.rfft(). You may use plt.plot() instead of plt.stem() since we have many points here. We have provided a function sig2db that converts a magnitude response to dB scale. Comment on the differences in the magnitude spectrum of each window.
c. Apply each window to the sine wave signal we sampled with a fractional number of periods (x2) and plot the magnitude spectrum of each windowed result. Comment on the resulting spectra. How do the main lobe widths and side lobe heights of the sinusoid's peak differ?
N2 = 78 # number of points we use to create our window
x2 = x[:N2]
# quick function for converting a magnitude response to dB
def sig2db(mag_spec):
return 20*np.log10(mag_spec)
#Code for 2.a:
N = 78
ham = np.hamming(N)
kai = np.kaiser(N,3)
rect = np.ones(N)
plt.figure(figsize=(20,8))
plt.subplot(131)
plt.plot(range(len(ham)),ham)
plt.title('Hamming')
plt.xlabel('$\omega$')
plt.subplot(132)
plt.plot(range(len(kai)), kai)
plt.title('Kaiser')
plt.xlabel('$\omega$')
plt.subplot(133)
plt.plot(range(len(rect)), rect)
plt.title('Rect')
plt.xlabel('$\omega$')
#Code for 2.b:
plt.figure(figsize=(10,10))
ham_fft = abs(np.fft.rfft(ham,n=512))
w_1 = np.linspace(0,np.pi,len(ham_fft))
plt.plot(w_1,sig2db(ham_fft),label="Hamming")
kai_fft = abs(np.fft.rfft(kai,n=512))
w_2 = np.linspace(0,np.pi,len(kai_fft))
plt.plot(w_2,sig2db(kai_fft),label="Kaiser")
rect_fft = abs(np.fft.rfft(rect,n=512))
w_3 = np.linspace(0,np.pi,len(rect_fft))
plt.plot(w_3,sig2db(rect_fft),label="Rect")
plt.title('DTFT of the windows')
plt.xlabel('$\omega$')
plt.ylabel('Magnitude response (dB)')
plt.legend()
#Code for 2.c:
RECT = np.fft.rfft(rect*x2,n=512)
HAM = np.fft.rfft(ham*x2, n=512)
KAI = np.fft.rfft(kai*x2, n=512)
plt.figure(figsize=(20,8))
plt.subplot(131)
plt.plot(range(len(HAM)) ,abs(HAM))
plt.title('Hamming')
plt.xlabel('$\omega$')
plt.subplot(132)
plt.plot(np.linspace(0,np.pi,len(KAI)), abs(KAI))
plt.title('Kaiser')
plt.xlabel('$\omega$')
plt.subplot(133)
plt.plot(np.linspace(0,np.pi,len(RECT)), abs(RECT))
plt.title('Rect')
plt.xlabel('$\omega$')
/Users/ryanyoseph/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log10
"""
/Users/ryanyoseph/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log10
"""
Text(0.5, 0, '$\\omega$')
Comments for part 2.b: Rect has the highest magnitude, followed by Kaiser then Hamming. If we compare this result to the plots in time domain, the magnitude spectrum matches the average value of the different windows in time.
Comments for part 2.c: Rect has the highest magnitude for all the lobes, the side lobes are also comparatively highest amongst the three windows. Kaiser is the second highest. This is can reasoned with the fact that all the windows have different attenuation at different points. Rect does not do anything to the signal in terms of magnitude, Hamming attenuates slight more than Kaiser as it curves down on both sides, which is why its lobes are so small.
Exercise 3: Zero-Padding00000
Now let's consider the problem of examining frequency content in a piece of music and how zero-padding affects our ability to do so. Load and listen to the below audio clip.
fs, music_stereo = wavfile.read('Hallelujah_16k.wav') # Import the sound file
music_mono = music_stereo[:,0] # To obtain mono sound track
Audio(music_mono, rate = fs)
/Users/ryanyoseph/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: WavFileWarning: Chunk (non-data) not understood, skipping it.
"""Entry point for launching an IPython kernel.
There's a lot going on for such a short audio clip! What if we only look at the frequency content present in the first 256 samples?
N = 256
music_mono_short = music_mono[:N] # if we are only given the first 256 samples
fft_short = np.fft.rfft(music_mono_short)
omega = np.linspace(0,np.pi,len(fft_short))
plt.figure(figsize =(10,6))
plt.plot(omega, abs(fft_short))
plt.title('DFT for length-256 audio signal')
plt.xlabel(r'$\omega$')
plt.ylabel('Magnitude Response')
Text(0, 0.5, 'Magnitude Response')
a. Zero-pad our length-256 clip to length 2048. Plot the new magnitude spectrum of our zero-padded signal.
b. Comment on the differences (quantitative or qualitative) that you observe. Has zero-padding improved our ability to distinguish different frequencies?
c. Now, if we are only given the first 16 samples instead of 256. Do you think zero-padding to length 2048 will also give us the ability to identify all the peaks like in the previous case? Why or why not?
# Code for 3a:
new_out = np.zeros(2048)
for i in range(len(music_mono_short)):
new_out[i] = music_mono_short[i]
fft_new = np.fft.rfft(new_out)
omega = np.linspace(0,np.pi,len(fft_new))
plt.figure(figsize =(10,6))
plt.plot(omega, abs(fft_new))
plt.title('DFT for length-256 audio signal')
plt.xlabel(r'$\omega$')
plt.ylabel('Magnitude Response')
Text(0, 0.5, 'Magnitude Response')
Comments for 3b: Zero padding improves the resolution of the DFT as the graph shows more peaks within the same length. This helps to distinguish the different frequencies. With a greater N, we have a higher frequency resolution.
Comments for 3c: No, since if we decrease the number of samples, we are getting rid of the actual data. Hence, improving the resolution by doing zero padding does not allow greater ability to identify all the peaks.
Exercise 4: Spectrograms
Now let's look at the spectrogram for an audio signal. A spectrogram can be thought of as a two-dimensional signal with both time and frequency axes. Thus, the value of a spectrogram at a particular pair of time and frequency indicates how much energy we have of that frequency at the given time. Let visualize some examples to make this more concrete. Specifically, we have three audio files that contain three different speech sounds or utterances.
Vowel "a": specifically, we hear the sound "ah"
Consonant "r": speaker repeats the sound "rah"
Constant "b": speaker repeats the sound "bah"
Let's listen to them and visualize their spectrograms!
fs_a, vowel_a = wavfile.read('a.wav')
fs_r, cons_r = wavfile.read('r.wav')
fs_b, cons_b = wavfile.read('b.wav')
Audio(vowel_a, rate = fs_a)
Audio(cons_r, rate = fs_r)
Audio(cons_b, rate = fs_b)
nfft = 512
f_a, t_a, S_a = signal.spectrogram(vowel_a, fs_a, nperseg = nfft, noverlap = int(nfft/2), nfft = nfft)
f_r, t_r, S_r = signal.spectrogram(cons_r, fs_r, nperseg = nfft, noverlap = int(nfft/2), nfft = nfft)
f_b, t_b, S_b = signal.spectrogram(cons_b, fs_b, nperseg = nfft, noverlap = int(nfft/2), nfft = nfft)
plt.figure(figsize=(20,5))
plt.subplot(131)
plt.pcolormesh(t_a, f_a, sig2db(S_a))
plt.title('Spectrogram for "a"')
plt.ylim([0, 3000])
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.colorbar()
plt.subplot(132)
plt.pcolormesh(t_r, f_r, sig2db(S_r))
plt.title('Spectrogram for "r"')
plt.ylim([0, 4000])
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.colorbar()
plt.subplot(133)
plt.pcolormesh(t_b, f_b, sig2db(S_b))
plt.title('Spectrogram for "b"')
plt.ylim([0, 3000])
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7fb2d9103438>
a. Comment on the differences in the the above spectrograms. Which utterances have higher frequency content? Which have lower frequency content? Try thinking about the pitch of your voice when forming these sounds. Remember that the "r" and "b" clips both have the "ah" sound of the first clip, so some frequency content will be shared between all three clips
b. We have included a sound file robot.wav in the lab folder. This file contains a person saying the word "robot". Load the sound file and plot its spectrogram. Refer to the above spectrogram examples and the signal.spectrogram() documentation if you are unsure how to do this.
c. We know that this word contains the two consonants "r" and "b". From the example spectrograms provided above, can you tell where these consonants appear in the word's spectrogram (i.e. what time do they start)? If so, where do the consonants "r" and "b" begin?
#Code for 4.b here:
fs_robot, vowel_robot = wavfile.read('robot.wav')
f_robot, t_robot, S_robot = signal.spectrogram(vowel_robot, fs_robot, nperseg = nfft, noverlap = int(nfft/2), nfft = nfft)
plt.figure(figsize=(20,5))
plt.pcolormesh(t_robot, f_robot, sig2db(S_robot))
plt.title('Spectrogram for "Robot"')
plt.ylim([0, 3000])
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.colorbar()
/Users/ryanyoseph/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in log10
"""
<matplotlib.colorbar.Colorbar at 0x7fb2d8cd36d8>
Comments for 4.a here: The area with a certain colour corrspound to the amount of energy at a specific frequency at a time. As the graphs indicate, "r" has the biggest area at higher frequency regions. On the other hand, "a" has the lower freqency content for the same reason.
Comments for 4.c here: "r" starts at t =0, "b" starts at t slightly before t = 0.4 where the 'vertical line' as the graph from part a suggests.
Circular Convolution and Related DFT Properties
A key assumption in the computation of the DFT is the periodic extension of the signal in the time domain. This periodicity also relates to the 2π periodicity of the DTFT and DFT.
x[n]=x[n+N],∀k
X[k]=X[k+N],∀k
Please refer to the ECE 310 textbook for further mathematical background here. The important takeaway here is that any shifts in time must be circular. Consequently, the convolution that is induced in the time domain by multiplication in the DFT domain is circular. More concisely, multiplication in the DFT domain is circular convolution in the time domain. Thus, for a signal x[n] and system h[n] with DFTs X[k] and H[k], respectively, we have:
X[k]H[k]⟹
N−1
∑
n=0
x[m−n]Nh[m]=
N−1
∑
n=0
x[m]h[m−n]N,
where [k]N≡(kmodN). Notice that circular convolution between two sequences produces a very different result from the ordinary linear convolution. Thus, we must be careful to make the circular convolution we perform equivalent to the linear convolution of the two sequences. This will guarantee that multiplying in the DFT domain will produce a result consistent with multiplication in the DTFT domain. How do we accomplish this? With zero-padding!
Recall that the linear convolution between length L and length M sequences lead to a length L+M−1 result. Conversely, the circular convolution of these sequences would be of length max{L,M}; although, the two sequences are typically the same length by convention. Intuitively, if we want our circular conovlution to be the same as our linear convolution, we should guarantee the result is of length L+M−1. And this will work! In order to make our circular convolution equivalent to the linear convolution result, we do the following:
length(x)=L,length(h)=M
x⟹pad M−1 zeros
h⟹pad L−1 zeros.
This procedure will guarantee our convolution is the correct length and will also prevent aliasing due to the circular modulation of the shifting sequence. Now, why are we concerned with using the DFT and bothering with circular convolution? Consider that linearly convolving two sequences requires O(n2) multiply-add operations. On the other hand, computing the DFT via the FFT takes O(nlogn) operations. Thus, if we take the DFT of both our signal and filter, multiply in the DFT domain, and perform the inverse DFT via the IFFT, we will have the following computational complexity:
O(nlogn)+O(nlogn)
⏟
DFT of signal/filter
+
O(n)
⏟
X[k]H[k]
+
O(nlogn)
⏟
Inverse DFT
=O(3nlogn)+O(n)⟹O(nlogn).
These computational savings are especially impressive when n becomes large for something like an audio signal. Thus, we can use the DFT to perform fast linear convolution for our LSI systems.
Exercise 5: Fast Linear Convolution
In this exercise, you will create your own implementation of fast linear convolution via the DFT.
a. Fill in the function fast_convolution() to implement fast linear convolution via the DFT. Refer to the above background if you need help with the necessary steps. Verify the results of your function against signal.convolve() with the provided sample input and filter by printing the output of your function and signal.convolve(). Hint: use np.fft.ifft() to take the inverse DFT.
Next, we want to compare the running time of your fast linear convolution function against regular linear convolution via summation. Unfortunately, numpy and scipy's implementations of convolution via summation are too efficient (likely written in C code via Cython) to be compared against our fast convolution. As such, we have provided a function for you called linear_convolution(x1,x2) that implements convolution by summation in Python. This will be a fairer test of the efficiency of your fast linear convolution function.
We have provided a testing framework that builds a random signal and filter of length L, calls your function and the linear convolution function, and outputs the running time.
b. Test the two functions for L=26. Report the average running time in the below Markdown cell.
c. Test the two functions for L=210. Report the average running time.
d. Test the two functions for L=214. Report the average running time. (You may use 212 or 213 here if your laptop does not have enough memory for such large sequences.)
e. Comment on the results for each length. How do your results compare to the theoretical background given above?
#Code for 5.a:
#Fill in this function!
def fast_convolution(x,h):
lent = len(x)+len(h)-1
X = np.fft.fft(x,lent)
H = np.fft.fft(h,lent)
Y = X*H
y = np.fft.ifft(Y)
return y
#Test signals
x1 = [1,3,-2,-1]
x2 = [0,2,4,1]
#Testing framework for parts 5.b-5.d:
L = 2**14 #Change this for each part!
x = np.random.uniform(size=L)
h = np.random.uniform(size=L)
print('Regular Linear Convolution Results:')
%timeit linear_convolution(x,h)
print('Fast Linear Convolution Results:')
%timeit fast_convolution(x,h)
Regular Linear Convolution Results:
2.34 s ± 154 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Fast Linear Convolution Results:
5.15 ms ± 33.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Results for 5.b: Regular Linear Convolution Results: 30 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Fast Linear Convolution Results: 46.8 µs ± 47.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Results for 5.c: Regular Linear Convolution Results: 3.02 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) Fast Linear Convolution Results: 380 µs ± 657 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Results for 5.d: Regular Linear Convolution Results: 2.55 s ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) Fast Linear Convolution Results: 1.32 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Comments for 5.e: The result matches the theoretical background. FFT is relatively slower for small samples but much faster for bigger samples.
Exercise 6: Where's Waldo Going?
We will conclude this lab with an interesting (and cute) demonstration of DFT properties. The notion of the DFT is not limited to one-dimensional signals: we may go to arbitrary numbers of dimensions. For this exercise, we will work with the (two-dimensional) DFT of an image. We have provided the DFT of the test image.
a. For this part, we will apply a linear phase to each row of the image's DFT. Mathematically, this means you will multiply each row r in the image's DFT by the following complex number
Phase at row r=e−j
2πn0
R
r,
where R is the number of rows in the image and n0 is the phase offset we would like. Notice that each row is multiplied by its own constant complex number. This phase scales linearly from row-to-row. Apply linear phase along the rows of the test image for n0=100. Take the inverse two-dimensional DFT of the resulting DFT and plot the real part of the resulting image. (Hint: use np.fft.ifft2() and np.real() here.)
b. Apply linear phase to each column of the image's DFT for n0=100. Take the inverse two-dimensional DFT of the resulting DFT and plot the real part of the resulting image. Note that you should divide by the number of columns in the above complex exponential for this case.
c. Apply linear phase to the rows and columns of the image's DFT. You may apply the phase along the rows then columns or columns then rows. Choose any non-zero offsets for the rows and columns, respectively.
d. Describe what is happening in the previous three parts. Why is this happening? Think about the DFT properties!
#Load and plot original image
img = imread('test-image.jpg')
n_rows = img.shape[0]
n_cols = img.shape[1]
plt.figure(figsize = (10,6))
plt.imshow(img,'gray')
plt.title('Original Image')
#Compute 2D-DFT
fft2 = np.fft.fft2(img)
row = fft2.shape[0]
col = fft2.shape[1]
#Code for part 6.a:
phase = np.zeros((row,col), dtype='complex128')
for i in range(row):
r = np.exp(-1j*2*np.pi*100*i/row)
phase[i,:]=fft2[i,:]* r
plt.figure(figsize = (10,6))
plt.imshow(np.real(np.fft.ifft2(phase)),'gray')
plt.title('Row Shifted Image')
#Code for part 6.b:
phase_b = np.zeros((row,col), dtype='complex128')
for i in range(col):
r_b = np.exp(-1j*2*np.pi*100*i/col)
phase_b[:,i]=fft2[:,i]* r_b
plt.figure(figsize = (10,6))
plt.imshow(np.real(np.fft.ifft2(phase_b)),'gray')
plt.title('Col Shifted Image')
#Code for part 6.c:
#Apply linear phase to the rows and columns of the image's DFT.
#You may apply the phase along the rows then columns or columns then rows.
#Choose any non-zero offsets for the rows and columns, respectively
fft2 = np.fft.fft2(img)
row = fft2.shape[0]
col = fft2.shape[1]
phase_c = np.zeros((row,col), dtype = 'complex128')
for i in range(row):
r_c1 = np.exp(-1j*2*np.pi*100*i/row)
phase_c[i,:]=fft2[i,:]* r_c1
for i in range(col):
r_c2 = np.exp(-1j*2*np.pi*100*i/col)
phase_c[:,i]=phase_c[:,i]* r_c2
plt.figure(figsize = (10,6))
plt.imshow(np.real(np.fft.ifft2(phase_c)),'gray')
plt.title('Row Shifted Image')
Text(0.5, 1.0, 'Row Shifted Image')
Answer for part 6.d: Multiplying by an exponential corrspound the circular shift proporty of DFT. Since it is circularly shifted, shifting with N for x[0] for example would bring it back to the same place. With this in mind, pixels that are shifted out of bound wrap around when shifted.
Submission Instructions
Please rename this notebook to "netid_Lab5" and submit a zip file including all the supplied files for this lab to Compass. Please also name your zip file submission "netid_Lab5".