[P] Comparing 11 Speech-to-Text models using Tensorflow

Here I compare 11 speech-to-text models implemented in TensorFlow, all as simplified Jupyter notebooks. Accuracy is measured by character position.
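As a rough illustration of what accuracy by character position could mean, here is a minimal sketch in plain Python. It is an assumption about the metric, not the code from the notebooks: the function name `char_position_accuracy` and the handling of length mismatches are hypothetical choices made for this example.

```python
def char_position_accuracy(reference: str, prediction: str) -> float:
    """Fraction of reference positions where the predicted character matches.

    Assumption: positions missing from a shorter prediction count as wrong,
    and extra characters past the end of the reference are ignored.
    """
    if not reference:
        # Edge case: an empty reference is only matched by an empty prediction.
        return 1.0 if not prediction else 0.0
    # zip stops at the shorter string, so only aligned positions are compared.
    matches = sum(1 for r, p in zip(reference, prediction) if r == p)
    return matches / len(reference)


if __name__ == "__main__":
    # Hypothetical transcript pair, one wrong character out of 17.
    print(char_position_accuracy("say the word back", "say the ward back"))
```

Note that a position-wise metric like this is strict about alignment: a single inserted or dropped character shifts everything after it, so it can score lower than an edit-distance metric on the same output.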

80% of the dataset is used for training and 20% for testing.
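An 80/20 split like this can be sketched as follows. This is only an illustration: whether the notebooks shuffle before splitting, and which seed they use, are assumptions here, and `split_80_20` is a hypothetical helper name.

```python
import random


def split_80_20(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the items, then hold out test_fraction of them for testing."""
    shuffled = list(items)
    # A fixed seed (an arbitrary choice here) keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test items become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical file names standing in for the audio clips.
    files = [f"clip_{i}.wav" for i in range(10)]
    train, test = split_80_20(files)
    print(len(train), len(test))
```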

  1. Tacotron, test accuracy 77.09%
  2. BiRNN LSTM, test accuracy 84.66%
  3. BiRNN Seq2Seq + Luong Attention + Cross Entropy, test accuracy 87.86%
  4. BiRNN Seq2Seq + Bahdanau Attention + Cross Entropy, test accuracy 89.28%
  5. BiRNN Seq2Seq + Bahdanau Attention + CTC, test accuracy 86.35%
  6. BiRNN Seq2Seq + Luong Attention + CTC, test accuracy 80.30%
  7. CNN RNN + Bahdanau Attention, test accuracy 80.23%
  8. Dilated CNN RNN, test accuracy 31.60%
  9. Wavenet, test accuracy 75.11%
  10. Deep Speech 2, test accuracy 81.40%
  11. Wav2Vec Transfer learning BiRNN LSTM, test accuracy 83.24%

Link to repository, https://github.com/huseinzol05/NLP-Models-Tensorflow#speech-to-text

Link to dataset, https://tspace.library.utoronto.ca/handle/1807/24487. The repository also includes a notebook showing how to download the dataset.

Discussion

  1. The dataset is not very big, only 286 MB.
  2. The Wav2Vec transfer learning accuracy is not that high; it may need more data.
  3. I used my own hyperparameters for Wav2Vec. Using the original hyperparameters caused a GPU sync problem for me because the sequence is too long.
  4. I need to use a bigger dataset.

submitted by /u/huseinzol05