[D] Audio/Digital Signal Processing/Recurrent NN – Need help understanding and reproducing this paper in Python

Hello everyone!

I am trying to reproduce this paper in Python: A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement, by Jean-Marc Valin.

Additionally, there is a blog post by the author explaining the paper differently: RNNoise: Learning Noise Suppression, and a GitHub repository with the code for training the proposed network.

However, I have difficulty understanding how the input data should be prepared for training and prediction. Can someone give me practical pointers on how to do this?
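
As a starting point for data preparation: the paper processes full-band 48 kHz audio in 20 ms windows with 50% overlap (a 10 ms hop), and all features are computed per window. Below is a minimal framing sketch under those settings; the function name `frames` is hypothetical, and each frame is assumed to be windowed and passed through an FFT afterwards.

```python
SAMPLE_RATE = 48_000                  # full-band audio, as in the paper
WINDOW = SAMPLE_RATE * 20 // 1000     # 20 ms window -> 960 samples
HOP = WINDOW // 2                     # 50% overlap -> 10 ms hop -> 480 samples


def frames(signal, window=WINDOW, hop=HOP):
    """Yield overlapping frames of `window` samples; a trailing partial frame is dropped."""
    for start in range(0, len(signal) - window + 1, hop):
        yield signal[start:start + window]


one_second = [0.0] * SAMPLE_RATE
chunks = list(frames(one_second))
print(len(chunks), len(chunks[0]))  # 99 960
```

Each frame then yields one feature vector, so one second of audio gives roughly 100 training examples for the recurrent network.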

Some questions I have about Section II:

  1. The paper and blog post first compute the energy in 22 bands, then apply a DCT to the log spectrum, resulting in 22 Bark-frequency cepstral coefficients (BFCCs), which are closely related to Mel-Frequency Cepstral Coefficients (MFCCs). What does this mean, and how does it work?

  2. The author also includes the first and second temporal derivatives of the first six Bark-frequency cepstral coefficients across frames. What does this mean?

  3. In formula (5), the pitch correlation is calculated for every band. The author then computes the DCT of the pitch correlation across frequency bands and includes the first six coefficients. I assume the DCT returns a finite set of coefficients, so only the first 6 coefficients are used per band, correct?

  4. The author mentions including the pitch period as well as a spectral non-stationarity metric. What does this mean?
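
Regarding question 1: a "cepstrum" here is simply a DCT of log band energies. The sketch below assumes a 960-sample frame (20 ms at 48 kHz), a Hann window in place of the paper's window, a naive DFT for readability, and triangular bands whose 22 corner bins (`BAND_CENTERS`) loosely follow an Opus-style layout. Those corner values, the `1e-2` floor inside the log, and all function names are assumptions for illustration, not code from the RNNoise repository. MFCCs follow the same recipe, only with mel-spaced bands.

```python
import cmath
import math

FRAME = 960  # 20 ms at 48 kHz -> 481 FFT bins, 50 Hz apart
# Assumed: 22 triangular band corners in FFT bins (0 Hz ... 20 kHz), Opus-style spacing.
BAND_CENTERS = [4 * b for b in (0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16,
                                20, 24, 28, 34, 40, 48, 60, 78, 100)]


def dct_ii(x):
    """Orthonormal DCT-II (same convention as scipy.fft.dct(x, norm='ortho'))."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(x))
        for k in range(n)
    ]


def power_spectrum(frame):
    """|X[k]|^2 for bins 0..N/2 of a Hann-windowed frame (naive DFT, readability only)."""
    n = len(frame)
    xw = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(xw))) ** 2
            for k in range(n // 2 + 1)]


def band_energies(power, centers):
    """Triangular bands: each bin between two adjacent corners is split linearly between them."""
    energies = [0.0] * len(centers)
    for b in range(len(centers) - 1):
        lo, hi = centers[b], centers[b + 1]
        for k in range(lo, hi):
            frac = (k - lo) / (hi - lo)
            energies[b] += (1 - frac) * power[k]
            energies[b + 1] += frac * power[k]
    return energies


def bfcc(frame, centers=BAND_CENTERS, floor=1e-2):
    """22 band energies -> log -> DCT: one cepstral coefficient per band."""
    energies = band_energies(power_spectrum(frame), centers)
    return dct_ii([math.log10(floor + e) for e in energies])


tone = [math.sin(2 * math.pi * 440 * n / 48_000) for n in range(FRAME)]
coeffs = bfcc(tone)
print(len(coeffs))  # 22
```

The low-order coefficients describe the coarse shape of the spectral envelope (overall level, tilt, broad peaks), which is why a handful of them already carries most of the useful information.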
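
Regarding question 2: a "temporal derivative" describes how each coefficient changes from one frame to the next, rather than its value in a single frame. The sketch below assumes simple finite differences over the current and two previous frames; the exact difference formula in the reference code may differ, and `temporal_deltas` is a hypothetical name.

```python
from collections import deque


def temporal_deltas(history, n=6):
    """history: the last three BFCC vectors, oldest first.
    Returns first and second differences of the first n coefficients."""
    c2, c1, c0 = history[-3], history[-2], history[-1]
    d1 = [c0[i] - c2[i] for i in range(n)]               # slope across two frame steps
    d2 = [c0[i] - 2 * c1[i] + c2[i] for i in range(n)]   # curvature (second difference)
    return d1, d2


# Streaming use: keep a rolling buffer of the three most recent BFCC vectors.
history = deque(maxlen=3)
for t in range(5):
    history.append([float(t * t)] * 22)  # fake BFCCs that grow quadratically over time
d1, d2 = temporal_deltas(history)
print(d1[0], d2[0])  # 12.0 2.0
```

Only the first six coefficients get derivatives, which adds 6 + 6 values per frame instead of 22 + 22.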
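
Regarding question 3: the text reads as if the DCT is taken across the 22 per-band correlation values, giving one vector of 22 coefficients per frame, of which only the first 6 are kept (6 values per frame, not 6 per band). The paper's total of 42 input features (22 + 6 + 6 + 6 + 1 + 1) is consistent with that reading. The sketch below assumes it, uses rectangular bands instead of triangular ones, computes a normalized correlation in the spirit of formula (5), and takes `X` (spectrum of the current frame) and `P` (spectrum of the pitch-delayed signal) as given; all names and the toy band edges are hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II (same convention as scipy.fft.dct(x, norm='ortho'))."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(x))
        for k in range(n)
    ]


def band_pitch_correlation(X, P, edges):
    """Normalized Re{X P*} per band; rectangular bands between consecutive edges."""
    corr = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        num = sum((X[k] * P[k].conjugate()).real for k in range(lo, hi))
        ex = sum(abs(X[k]) ** 2 for k in range(lo, hi))
        ep = sum(abs(P[k]) ** 2 for k in range(lo, hi))
        corr.append(num / math.sqrt(ex * ep + 1e-12))  # small constant avoids 0/0 in silent bands
    return corr


def pitch_corr_features(X, P, edges, n_keep=6):
    """DCT across the bands, then keep the first n_keep coefficients for the whole frame."""
    return dct_ii(band_pitch_correlation(X, P, edges))[:n_keep]


EDGES = list(range(0, 46, 2))  # 23 edges -> 22 toy bands of 2 bins each (hypothetical)
X = [complex(k + 1, 1) for k in range(45)]
feats = pitch_corr_features(X, X, EDGES)  # P == X: perfectly correlated in every band
print(len(feats))  # 6
```

When every band is equally correlated, all the information lands in the first (DC) coefficient; the higher coefficients only become non-zero when the correlation varies across frequency.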
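
Regarding question 4: the pitch period is the lag, in samples, after which a voiced speech waveform approximately repeats itself, and a spectral non-stationarity metric measures how quickly the spectrum is changing (steady noise changes little, speech changes a lot). The sketch below assumes a plain normalized-autocorrelation pitch search (the reference code uses a more elaborate pitch search) and a hypothetical non-stationarity measure: the mean squared distance from each of the last few BFCC vectors to its nearest neighbour among them. The lag limits and function names are assumptions.

```python
import math


def pitch_period(x, min_lag, max_lag):
    """Lag (in samples, min_lag >= 1) that maximizes the normalized autocorrelation of x."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = x[lag:], x[:-lag]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b)) + 1e-12
        score = num / den
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def spectral_nonstationarity(ceps_history):
    """Mean squared distance from each recent BFCC vector to its nearest other vector."""
    total = 0.0
    for i, ci in enumerate(ceps_history):
        total += min(sum((a - b) ** 2 for a, b in zip(ci, cj))
                     for j, cj in enumerate(ceps_history) if j != i)
    return total / len(ceps_history)


# 200 Hz tone at 48 kHz -> period of 240 samples; search lags 100..400 (hypothetical limits).
tone = [math.sin(2 * math.pi * 200 * n / 48_000) for n in range(960)]
period = pitch_period(tone, 100, 400)
steady = spectral_nonstationarity([[1.0, 2.0]] * 8)                       # identical frames
changing = spectral_nonstationarity([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])  # spectrum moving
print(period, steady, changing)  # 240 0.0 2.0
```

Both values are single numbers per frame, which accounts for the last two of the network's input features.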

Some background: I have mostly worked with visual data and convolutional neural networks, so I know almost nothing about digital signal processing. Please bear with me.

Thanks in advance!

submitted by /u/VividFee