Engineering Tripos Part IIB, 4F11: Speech & Language Processing, 2014-15

Module Leader

Prof P Woodland


Prof P Woodland and Prof B Byrne

Timing and Structure

Lent term. 14 lectures + 2 examples classes. Assessment: 100% exam.


3F1 and 3F3 useful


The aims of the course are to:

  • introduce major techniques for recognising and synthesising speech signals, and for statistical machine translation.


As specific objectives, by the end of the course students should be able to:

  • understand techniques for speech analysis, recognition and synthesis.
  • understand techniques for statistical machine translation.
  • apply these techniques in order to build simple speech recognition, synthesis and machine translation systems.
  • show awareness of the current state of the art in speech recognition and synthesis and in machine translation technology.


Lecture 1: Overview/Introduction

  • Speech production mechanisms, types of speech sound, source-filter model, applications of speech and text processing.

Lectures 2-3: Acoustic Analysis

  • FFT-based methods, Mel scale, cepstral analysis, all-pole filter models, calculation of LP coefficients, formant and voicing analysis. Front-end analysis for speech recognition.

Lectures 4-5: ASR Introduction and Isolated Word Recognition

  • Statistical speech recognition, task complexity. Hidden Markov models. Continuous density HMM parameter estimation, Baum-Welch algorithm, Viterbi algorithm, Gaussian mixture models for HMMs.

Lecture 6: Sub-word Acoustic Models

  • Large vocabulary speech recognition, limitations of word models, context-dependent phones, parameter tying.

Lecture 7: Language Models

  • Perplexity, N-gram language models, discounting, interpolation.

Lecture 8: ASR Search Issues

  • Continuous speech recognition. Pruning. Integrating context-dependent HMMs and N-gram language models.

Lectures 9-10: Weighted Finite State Transducers for Speech and Language Processing

  • Efficient realization of probabilistic models for sequence processing. Transduction, composition, determinization, minimum-cost search. WFSTs in ASR search and other language processing applications.

Lecture 11: Introduction to Statistical Machine Translation

  • Statistical pattern processing approaches to translation. Automatic evaluation of translation quality.

Lecture 12: SMT - Alignment

  • Parallel text as training data. Models of word and phrase alignment in translation. Model estimation procedures.

Lecture 13: SMT - Translation

  • Phrase-based translation systems. Implementation via WFSTs.

Lecture 14: Text-to-Speech Synthesis

  • Introduction to TTS. Data-driven synthesis and Hidden Markov Model approaches to TTS.


