MP1: LPC
In this lab, you'll use linear predictive coding (LPC) to analyze and then resynthesize a speech sound.
To make sure everything works, you may want to go to the command line and run
pip install -r requirements.txt
This will install the modules that are used on the autograder, including soundfile, numpy, h5py, and the gradescope utilities.
Part 1: Plotting and understanding the spectrogram
First, let's load a speech waveform. This is an extract from Nicholas James Bridgewater's reading of the Universal Declaration of Human Rights (https://librivox.org/universal-declaration-of-human-rights-volume-03-by-united-nations/).
LPC-10 synthesis was designed for an 8kHz sampling rate, so this file has been resampled to 8kHz.
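The provided humanrights.wav has already been resampled for you, but in case it's useful to see how such a conversion could be done, here is a minimal sketch using scipy.signal.resample_poly (the original sampling rate of 22050 Hz here is a made-up stand-in, and the sine wave stands in for real speech):

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

orig_rate = 22050                             # hypothetical original rate
target_rate = 8000
t = np.arange(orig_rate) / orig_rate          # one second of audio
x = np.sin(2 * np.pi * 220 * t)               # stand-in for a real waveform

# resample_poly low-pass filters and resamples by the integer ratio up/down
g = gcd(target_rate, orig_rate)
y = resample_poly(x, target_rate // g, orig_rate // g)
print(len(y))                                 # one second at 8 kHz = 8000 samples
```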
import soundfile as sf
speech, samplerate = sf.read('humanrights.wav')
import numpy as np
time_axis = np.arange(len(speech))/samplerate
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 5))
plt.plot(time_axis,speech)
plt.xlabel('Time (seconds)')
plt.title('Speech sample')
Let's zoom in on part of it.
plt.figure(figsize=(14, 5))
plt.plot(time_axis[:samplerate],speech[:samplerate])
plt.xlabel('Time (seconds)')
plt.title('Speech sample')
Let's also look at a spectrogram of the first 1.5 seconds or so, using a window length that's much shorter than one pitch period, so you can see the vertical striations corresponding to glottal closure instants. For this we'll use librosa. Note: librosa is not available on the autograder, so don't get too dependent on it.
import librosa, librosa.display
S = librosa.stft(speech[:int(1.5*samplerate)], hop_length=int(0.002*samplerate), win_length=int(0.005*samplerate))
Sdb = librosa.amplitude_to_db(abs(S))
plt.figure(figsize=(14, 5))
librosa.display.specshow(Sdb, sr=samplerate, hop_length=int(0.002*samplerate), x_axis='time', y_axis='hz')
print(Sdb.shape)
(1025, 751)
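Since the rest of this MP is about LPC analysis, here is a minimal sketch of the autocorrelation method on a single frame. The frame, predictor order, and solver are illustrative assumptions, not necessarily what the autograder expects: it fits a 10th-order predictor (as in LPC-10) by solving the normal equations, then computes the prediction residual.

```python
import numpy as np

samplerate = 8000
# synthetic stand-in frame: a windowed 120 Hz sinusoid, 30 ms at 8 kHz
frame = np.sin(2 * np.pi * 120 * np.arange(240) / samplerate) * np.hanning(240)

order = 10                                  # LPC-10 uses a 10th-order model
# autocorrelation r[0..order], taken from the full cross-correlation
r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
# Toeplitz system R a = r[1:order+1] (the Yule-Walker normal equations)
R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
a = np.linalg.solve(R, r[1:order + 1])      # predictor coefficients

# prediction residual: e[n] = s[n] - sum_k a[k] s[n-k-1]
pred = np.zeros_like(frame)
for k in range(order):
    pred[k + 1:] += a[k] * frame[:len(frame) - k - 1]
residual = frame - pred
```

For a nearly periodic frame like this one, the residual energy should be far smaller than the frame energy, which is exactly why LPC is a good speech coder: you transmit the coefficients plus a low-energy excitation instead of the waveform itself.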