MP1: LPC

In this lab, you'll use linear predictive coding (LPC) to analyze and then resynthesize a speech sound.
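As a preview of the analysis step, here is a minimal sketch of estimating LPC coefficients by the autocorrelation method. The function name, the toy AR(1) test signal, and the use of a plain linear solve are all illustrative assumptions, not part of the MP; a real LPC-10 coder works frame by frame and typically uses Levinson-Durbin recursion instead.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Estimate LPC coefficients via the autocorrelation method.

    Solves the normal equations R a = r with a generic linear solver
    (a full LPC-10 implementation would use Levinson-Durbin).
    """
    # Autocorrelation of the frame at lags 0..order
    r = np.array([np.dot(frame[:len(frame)-k], frame[k:])
                  for k in range(order + 1)])
    # Toeplitz system: R[i, j] = r[|i - j|]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return a  # prediction: s[n] ~ sum_k a[k] * s[n-1-k]

# Toy check on a synthetic AR(1) signal s[n] = 0.9 s[n-1] + noise:
rng = np.random.default_rng(0)
s = np.zeros(4000)
for n in range(1, len(s)):
    s[n] = 0.9 * s[n-1] + rng.standard_normal()
a = lpc_coefficients(s, order=2)
# The first coefficient should come out close to 0.9.
```

The resynthesis step then runs this predictor as an all-pole filter driven by an excitation signal (pulses for voiced frames, noise for unvoiced).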

To make sure everything works, go to the command line and run:

pip install -r requirements.txt

This will install the modules that are used on the autograder, including soundfile, numpy, h5py, and the gradescope utilities.

Part 1: Plotting and understanding the spectrogram
First, let's load a speech waveform. This is an extract from Nicholas James Bridgewater's reading of the Universal Declaration of Human Rights (https://librivox.org/universal-declaration-of-human-rights-volume-03-by-united-nations/).

LPC-10 synthesis was designed for an 8kHz sampling rate, so this file has been resampled to an 8kHz sampling rate.
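The file you load below has already been resampled, so you don't need to do this yourself, but for reference, rate conversion to 8 kHz can be sketched with `scipy.signal.resample_poly`, which low-pass filters (anti-aliasing) before changing the rate. The 48 kHz source rate and the noise signal here are stand-ins for illustration only.

```python
import numpy as np
from scipy.signal import resample_poly

fs_orig = 48000  # hypothetical original sampling rate
fs_new = 8000    # LPC-10 target rate
# One second of noise as a stand-in for a real recording
x = np.random.default_rng(1).standard_normal(fs_orig)
# resample_poly filters for anti-aliasing, then resamples by up/down
y = resample_poly(x, up=fs_new, down=fs_orig)
# y now holds one second of audio at 8000 samples/second
```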

import numpy as np
import matplotlib.pyplot as plt
import soundfile as sf

# Load the waveform and build a time axis in seconds
speech, samplerate = sf.read('humanrights.wav')
time_axis = np.arange(len(speech))/samplerate

plt.figure(figsize=(14, 5))
plt.plot(time_axis, speech)
plt.xlabel('Time (seconds)')
plt.title('Speech sample')

Let's zoom in on the first second.

plt.figure(figsize=(14, 5))
plt.plot(time_axis[:samplerate], speech[:samplerate])
plt.xlabel('Time (seconds)')
plt.title('Speech sample')

Let's also look at a spectrogram of the first 1.5 seconds or so, using a window length that's much shorter than one pitch period, so you can see the vertical striations corresponding to glottal closure instants. For this we'll use librosa. Note: librosa is not available on the autograder, so don't get too dependent on it.

import librosa, librosa.display
S = librosa.stft(speech[:int(1.5*samplerate)], hop_length=int(0.002*samplerate), win_length=int(0.005*samplerate))
Sdb = librosa.amplitude_to_db(abs(S))
plt.figure(figsize=(14, 5))
librosa.display.specshow(Sdb, sr=samplerate, hop_length=int(0.002*samplerate), x_axis='time', y_axis='hz')
print(Sdb.shape)
(1025, 751)
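Since librosa is not available on the autograder, a similar short-window spectrogram can be computed with `scipy.signal.stft`. This sketch mirrors the 5 ms window and 2 ms hop above, using a synthetic sine as a stand-in for the speech segment. Note that scipy's FFT size defaults to the window length, so you get far fewer frequency bins than librosa's zero-padded 1025.

```python
import numpy as np
from scipy.signal import stft

fs = 8000
t = np.arange(int(1.5 * fs)) / fs
x = np.sin(2 * np.pi * 440 * t)  # stand-in for speech[:int(1.5*samplerate)]

nperseg = int(0.005 * fs)             # 5 ms window = 40 samples
noverlap = nperseg - int(0.002 * fs)  # 2 ms hop = 16 samples
f, times, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)

# Convert magnitude to dB, with a floor to avoid log(0)
Sdb = 20 * np.log10(np.maximum(np.abs(Z), 1e-10))
# f runs from 0 to fs/2 in nperseg//2 + 1 = 21 bins
```

You could then display `Sdb` with `plt.pcolormesh(times, f, Sdb)` instead of `librosa.display.specshow`.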
