[P] SpecAugment experiments using tensor2tensor

This is an implementation of SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition.


  • The paper introduces three techniques for augmenting speech data in speech recognition.
  • They come from the observation that spectrograms, which are often used as input, can be treated as images, so various image augmentation methods can be applied.
  • I find the idea interesting.
  • It covers three methods: time warping, frequency masking, and time masking.
  • Details are clearly explained in the paper.
  • While the first one, time warping, looks like the most salient, Daniel, the first author, told me that the other two are in fact much more important, so time warping can be dropped if necessary. (Thanks for the advice, Daniel!)
  • I found that implementing time warping with TensorFlow is tricky because the relevant functions rely on the static shape of the mel spectrogram tensor, which is hard to get from the pre-defined graph.
  • I test frequency and time masking on Tensor2Tensor’s LibriSpeech Clean Small task.
  • The paper used the LAS model, but I stick to the Transformer.
  • To measure the effect of SpecAugment, I also run a base model without augmentation.
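
To make the two masking methods above concrete, here is a minimal NumPy sketch. It is not the code from this repository or from the paper. It assumes the spectrogram is a 2-D array laid out as (time, frequency), that masked regions are filled with zeros, and that the helper names and default widths are purely illustrative.

```python
import numpy as np


def mask_along_axis(spec, max_width, axis, num_masks=1, rng=None):
    """Zero out `num_masks` random contiguous bands along `axis`.

    Hypothetical helper: each band has a width drawn uniformly from
    [0, max_width] and a start position drawn uniformly so the band fits.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's array untouched
    size = out.shape[axis]
    for _ in range(num_masks):
        width = int(rng.integers(0, min(max_width, size) + 1))
        start = int(rng.integers(0, size - width + 1))
        index = [slice(None)] * out.ndim
        index[axis] = slice(start, start + width)
        out[tuple(index)] = 0.0  # assumption: mask value is zero
    return out


def frequency_mask(spec, max_width=27, num_masks=1, rng=None):
    """Mask bands of consecutive frequency channels (axis 1 by assumption)."""
    return mask_along_axis(spec, max_width, axis=1, num_masks=num_masks, rng=rng)


def time_mask(spec, max_width=100, num_masks=1, rng=None):
    """Mask bands of consecutive time steps (axis 0 by assumption)."""
    return mask_along_axis(spec, max_width, axis=0, num_masks=num_masks, rng=rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.normal(size=(300, 80))  # fake (time, frequency) features
    augmented = time_mask(frequency_mask(mel, rng=rng), rng=rng)
    print(augmented.shape)
```

Both functions return a new array of the same shape, so they can be chained in either order before the features are fed to the model. In a TensorFlow input pipeline the same idea would have to be written with graph ops rather than NumPy indexing.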

submitted by /u/longinglove

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.