Skip to main content

Blog

Learn About Our Meetup

5000+ Members

MEETUPS

LEARN, CONNECT, SHARE

Join our meetup, learn, connect, share, and get to know your Toronto AI community. 

JOB POSTINGS

INDEED POSTINGS

Browse through the latest deep learning, ai, machine learning postings from Indeed for the GTA.

CONTACT

CONNECT WITH US

Are you looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

Author: torontoai

[D] Classification of Misinformation and Scaled Sentiment Analysis

I do not know if this title best captures my idea for this post but I will nonetheless continue. Recently, I have been concerned with the extent of misinformation present in the media in the form of unsubstantiated claims and conspiracies as well as deepfakes and other similar things. This, along with human biases and unconscious judgements towards certain believes or topics, creates a climate where a tremendous amount of attention and force can be maliciously and wrongly directed towards some person or other entity and there is no “truth” or authority to set things right or to educate people correctly. My topic for discussion here, given this opening, is what measures can be taken, employing different ML techniques, especially those for sentiment analysis, to prevent digital mobs from unjustly releasing their wrath and to undo the effects of users having been exposed to unsubstantiated and potentially dangerous false information? What would you enact if you worked at some the company of some web browser or social media company to ensure that misinformation didn’t dominate conversation or proliferate further? Some topics I have in mind that I have yet to examine deeply are YouTube’s algorithm’s progressive recommendations of more radical content or Instagram failing to qwell networks that propogate anti-vaccination and other conspiracy rhetoric. There seems to me to be an erosion of evidenced information and a reversion to a chaotic environment where people are guided by the simplest, most tribalist claims. What are your thoughts?

submitted by /u/unsupervisedmodeler
[link] [comments]

Assessing the Quality of Long-Form Synthesized Speech

Automatically generated speech is everywhere, from directions being read out aloud while you are driving, to virtual assistants on your phone or smart speaker devices at home. While much research is being done to try to make synthesized speech sound as natural as possible—such as generating speech for low-resource languages and creating human-like speech with Tacotron 2—how does one evaluate if the generated speech sounds natural or not? The best way to find out is to ask people, who are very good at telling if something sounds natural or not.

In the field of speech synthesis, subjects are routinely asked to listen to samples of synthesized speech and rate their quality. Yet, until now, evaluation of synthesized speech has been done on a sentence by sentence basis. But often one wants to know the quality of a series of sentences that belong together, such as a paragraph in a news article or a turn in a conversation. This is where it gets interesting, as there is more than one way of evaluating sentences that naturally occur in a sequence, and, surprisingly, a rigorous comparison of these different methods has not been carried out. This in turn can hinder research progress in developing products that rely on generated speech.

To address this challenge, we present “Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs”, a publication to appear at SSW10 in which we compare several ways of evaluating synthesized speech for multi-line texts. We find that when a sentence is evaluated as part of a longer text involving several sentences, the outcome is influenced by the way in which the audio sample is presented to the people evaluating it. For example, when the sentence is presented by itself, without any context, the rating people give on average is substantially different from the rating they give when they listen to the same sentence with some context (while the context doesn’t have to be rated).

Evaluating Automatically Generated Speech
To determine the quality of speech signals, it is common practice to ask several human raters to give their opinion for a particular sample, on a 1-to-5 scale. This sample can be automatically generated, but it can also be natural speech (i.e., an actual person saying a sentence out loud), which serves as a control. The scores of all reviewers rating a particular speech sample are averaged to get a Mean Opinion Score (MOS).

Until now, MOS ratings were typically collected per sentence, i.e., raters listened to sentences in isolation to form their opinion. Instead of this typical approach, we consider three different ways of presenting speech samples to raters—both with and without context—and we show that each approach yields different results. The first, presenting the sentence in isolation, is the default method commonly used in the field. An alternative method is to provide the full context for the sentence. In this case, the entire paragraph to which the sentence belongs is included and the ensemble is rated. The final approach is to provide a context-stimulus pair. Here, rather than providing full context, only some context is provided, such as the preceding sentence(s) from the original paragraph.

Interestingly, these three different approaches for presenting speech give different results even when applied to natural speech. This is demonstrated in the figure below, where the MOS scores are presented for natural speech samples rated using the three different methods of presentation. Even though the sentences being rated are identical across the three different settings, the scores are different on average, depending on the context in which they were presented.

MOS results for natural speech from a dataset consisting of news articles. Though the differences appear small, they are significant between all conditions (two-tailed t-test with α=0.05).

Examination of the figure above reveals that raters rarely give top scores (a five) even to recorded human speech, which may be surprising. However, this is a typical result seen in sentence evaluation studies and probably has to do with a more generic pattern of behavior, that people tend to avoid using the extreme ends of a scale, regardless of the task or setting.

When evaluated synthesized speech, the differences are more pronounced.

MOS results for synthesized speech on the same news article dataset used above. All lines are synthesized speech, unless indicated otherwise.

To see if the way context is presented makes a difference, we tried several different ways of providing it: one or two sentences leading up to the sentence to be evaluated, provided as generated speech or real speech. When context is added, the scores get higher (the four blue bars on the left) except when the context presented is real speech, in which case the score drops (the rightmost blue bar). Our hypothesis is that this has to do with an anchoring effect—if the context is very good (real speech) the synthesized speech, in comparison, is perceived as less natural.

Predicting Paragraph Score
When an entire paragraph of synthesized speech is played (the yellow bar), this is perceived as even less natural than in the other settings. Our original hypothesis was a weakest-link argument—the rating is probably as bad as the worst sentence in the paragraph. If that were the case, it should be easy to predict the rating of a paragraph by considering the ratings of the individual sentences in it, perhaps simply taking the minimum value to get the paragraph rating. It turns out, however, that does not work.

The failure of the weakest-link hypothesis may be due to more subtle factors that are difficult to tease out with such a simple approach. To test this, we also trained a machine learning algorithm to predict the paragraph score from the individual sentences. However, this approach, too, was unable to successfully predict paragraph scores reliably.

Conclusion
Evaluating synthesized speech is not straightforward when multiple sentences are involved. The traditional paradigm of rating sentences in isolation does not give the full picture, and one should be aware of anchoring effects when context is provided. Rating full paragraphs might be the most conservative approach. We hope our findings help advance future work in speech synthesis where long-form content is concerned, such as audio book readers and conversational agents.

Acknowledgments
Many thanks to all authors of the paper: Rob Clark, Hanna Silen, Ralph Leith.

[P] Neural Network Model Builder & Visualiser | Netbrix.ml

[P] Neural Network Model Builder & Visualiser | Netbrix.ml

https://netbrix.ml/

Introduction

I recently discovered the Mithril.js JavaScript library and wanted a project to build up my skills with it! I ended up going with a simple web app for visualising and editing network models which I’ve named netbrix.ml. I’ve wanted to build something like this for a while since it seemed like a really good project to improve my web development skills and my understanding of the process of building deep learning models. After reading this post on /r/deeplearning where the writer gives insight into the modular nature of deep learning and gives the analogy of a deep learning ‘lego set’ it gave me the motivation to start work on this with that sort of vision in mind and I’ve now got a decent working web app!

I’m sure there are existing tools similar to this in existence, so I wanted to keep it as simple as possible and not try to over-engineer it. It’s meant to be easy and simple to use!

Importing Models

It has some cool features at the moment, like the ability to parse Sequential() model definitions in Keras e.g.

test_model = Sequential() test_model.add(Conv2D(64,(5,5), activation='relu')) test_model.add(MaxPooling2D((2,2))) test_model.add(Dropout(0.25)) 

(Alternatively you can paste in Keras JSON definitions from for example test_model.to_json())

Once that simple model definition is imported it will be parsed by the app to create the following neat visualisation:

https://i.redd.it/g60mc0qh2ll31.png

Editing Models

From here, you can make all the typical changes you would want to make to a model, including changing/adding layer parameters, adding new layers, changing the order of layers and removing layers, all without having to rely on Google to find the names of layers or their attributes. It can then be easily exported (or copied to the clipboard) with one click as either Python Keras code or a JSON spec which can be imported into Keras.

Building Models

It also has some nice features for building models from scratch, like the ability to add blocks of layers that come up in models frequently. Often it’s easy to forget the exact optimal order of layers for say a Convolutional block e.g. should pooling come before dropout or vice versa, and what about BatchNorm? Having preconfigured blocks of layers to choose from when building a model helps with this.

On top of that, just having an easily indexable list of layers is useful in itself.

Browsing Models

There is also a host of existing model architectures for a variety of machine learning tasks/datasets to explore and this is one of the most helpful features for me personally. Having an easy and centralised way to access a bunch of existing standard model architectures to take inspiration from is really useful! Rather than creating a model from scratch, you can find an existing model on the site, change say the input shape and a few of the hyperparameters and export it as working Keras code in just a few clicks. It’s also great for learning about the different architectures commonly used in building models.

Planned Features

  • Integration with TensorFlow.js to have some kind of in-browser training/prototyping
  • The ability to share your own model architectures
  • Support for variables. Currently you can specify meta tags for a model like ‘name’ and ‘source’. It would be cool to be able to specify variables in this same key/value fashion to allow even easier tuning.
  • Better mobile support. The site is relatively responsive at the moment, but not perfect. Hopefully you’re not building Keras models on your phone to begin with, but it will work if that’s something you wish to do 😁

So, I’d love to get some feedback on this project! Is it useful? Do you like the design? I’m open to criticism, this is primarily for me to learn 😁

Thanks for reading!

submitted by /u/DataSnaek
[link] [comments]

[D] Deep Learning rack server suggestions

Hello ML, my research lab just received a grant (25K €) for purchasing new hardware and we would like some recommendation so we can build the best deep learning rack. The team consists of 10-15 members. Our work involves different deep learning disciplines from images and videos to text or graphs.

We already have some servers with one or two GPU each shared between a few people and that worked out perfectly. Now we want to install a powerful rack with a setup similar to the DevBox spending all our budget. Our proposal, written one year ago:

Component Quantity
HPC 4u with space up to 8Gpus 1
Intel Skylake Xeon Gold Serie 6xxx o Power PC 9 2
32 Gb DDR4-2666 ECC REG 18
NVIDIA RTX 2080TI 4
Intel DC P4600 2TB NVMe PCIe 3.0 2
SATA3 2TB 6GB/s 7.200 RMP 1

Do you guys have any suggestion for a good/cost-efficient setup or some requirements that we must meet when configuring our rack server setup?

Thanks

submitted by /u/Gusinato95
[link] [comments]

[D] Forced alignment for low resource languages

So leave out Kaldi, Gentle, Montreal Fixed Alignment, etc. All those need detailed phonetic dictionaries etc which are only available for the most popular languages.

How does ML help if you can’t leverage ANY of the existing language toolsets? Does anyone have a good guess at possible ML solutions, can be semi supervised. Keep in mind when most of the work on forced alignment was done, it predates modern ML like deep neural networks etc.

Just want to hear your crazy ideas and hopefully I can take one and run with it. Am stumped right now with about 200 hours of unlabelled data with nothing to do with it except manual labelling which is soul crushing.

submitted by /u/sicp4lyfe
[link] [comments]

[D] 2nd Order Approximation in XGboost’s Objective Function

Hi all,

I have a quick question regarding XGboost’s objective function. I was reading the XGboost paper (https://arxiv.org/pdf/1603.02754.pdf). I see that authors approximated the original objective function using a 2nd order Taylor series (page 2, section 2.2). Is there a particular reason why it’s expanded to 2nd degree and not higher? I’m guessing that linear apprx. is not enough and higher orders require more computational power, but is there a mathematical background or is this a design choice?

submitted by /u/_kty
[link] [comments]