Create Your First Speech-to-Text Model with Connectionist Temporal Classification

Total time needed: ~4 hours
Learning Objectives
Learn about methods of speech recognition and create your own model that recognizes speech on the fly
Potential Use Cases
People creating a speech recognition model, or people using an existing speech recognition model who want to understand and tune it better
Target Audience
BEGINNER: Data scientists new to speech recognition
Go through the following annotated items in order:
ARTICLE 1. Speech Recognition — Deep Speech, CTC, Listen, Attend, and Spell
  • What are the different methods of speech recognition?
25 minutes
ARTICLE 2. Sequence Modeling with CTC
  • What is the mathematical background of connectionist temporal classification?
15 minutes
VIDEO 3. Real-time Speech to Text with DeepSpeech - Getting Started on Windows and Transcribe Microphone Free
  • How do you run the Deep Speech speech recognition model, which was trained with connectionist temporal classification?
40 minutes
ARTICLE 4. Train Your Own Speech Recognition Model in 5 Simple Steps
  • How do you tune an existing connectionist temporal classification speech recognition model?
30 minutes
ARTICLE 5. Building an End-to-End Speech Recognition Model in PyTorch
  • How do you build a speech recognition model from scratch with a connectionist temporal classification loss function?
120 minutes

Concepts Covered