Author: torontoai

[D] – Finding research collaborators for medical AI projects outside of my university

As the title entails; how may I find researchers / engineers / students (MSc or PhD) to collaborate with on a project outside of the sphere of my own university? Is there any community where experts are able to network and form collaborations? I’m super eager to do some research on the side.

Already been offered to be involved with several projects at my university, but none of them interest me (mainly been related to either utilizing geometric DL for glaucoma or U-net for lung carcinoma) – and so I wish to look beyond this city.

More information, if relevant:

  • My domain expertise lies in healthcare/medicine, education, and gaming industry – but I am open to exploring other domains.
  • Since I know some people care; my university is ranked worldwide top <40 in engineering. My master’s degree is in machine learning, with a bachelor’s degree in medical engineering (I can verify this).
  • Due to affiliation with my university, I am eligible to publish to arXiv without an endorsement from a registered author.
  • I work as a machine learning consultant and data science instructor alongside my studies.
  • I’m very familiar with other technologies, such as Kafka and its ecosystem, Flutter, and more.

To clarify; I am looking for, in particular, people that already know a lot of machine learning who wish to collaborate on a research project – whether it’d be deep reinforcement learning, bayesian deep learning, or whatever else. Hence why I didn’t feel like this would be appropriate to post at /r/learnmachinelearning

submitted by /u/Naveos

Using Amazon Polly in Windows Applications

AWS offers a vast array of services that allow developers to build applications in the cloud. At the same time, Windows desktop applications can take advantage of these services as well. Today, we are releasing Amazon Polly for Windows, an open-source engine that allows users to take advantage of Amazon Polly voices in SAPI-compliant Windows applications.

What is SAPI? SAPI (Speech Application Programming Interface) is a Microsoft Windows API that allows desktop applications to implement speech synthesis. When an application supports SAPI, it can access any of the installed SAPI voices to generate speech.

Out of the box, Microsoft Windows provides one male and one female SAPI voice that can be used in any supported voice application. With Amazon Polly for Windows, users can install over 50 additional voices across over 25 languages, paying only for what they use. For more details, please visit the Amazon Polly documentation and check the full list of text-to-speech voices.

Step 1: Create an AWS account

If you don’t already have an AWS account, you can sign up here, which gives you 12 months in our free tier. During the first 12 months, Amazon Polly is free for the first 5 million characters/month. How many characters is that? As an example, “Ulysses” by James Joyce is 730 pages and contains approximately 1.5 million characters. So you could have Amazon Polly read the entire book three times and still have an additional 500,000 free characters for the remainder of the month.
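
The free-tier arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the two constants are the figures quoted in the paragraph, and the helper name `free_tier_reads` is made up for illustration.

```python
FREE_TIER_CHARS_PER_MONTH = 5_000_000  # free-tier allowance quoted above
ULYSSES_CHARS = 1_500_000              # approximate book length quoted above


def free_tier_reads(text_chars, allowance=FREE_TIER_CHARS_PER_MONTH):
    """Return (complete read-throughs, characters left over) for one month."""
    return divmod(allowance, text_chars)


reads, leftover = free_tier_reads(ULYSSES_CHARS)
print(f"{reads} full readings, {leftover:,} characters to spare")
```

Swapping in the character count of your own content gives a rough idea of how far the monthly allowance goes.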

Configure your account

  1. Log in to your AWS account.
  2. After you’ve logged in, click Services from the top menu bar, then type IAM in the search box. Click IAM when it pops up.
  3. On the left, click Users
  4. Click Add User
  5. Type in polly-windows-user (you can use any name)
  6. Click the Programmatic access check box and leave AWS Management Console access unchecked
  7. Click Next: Permissions
  8. Click Attach existing policies directly
  9. At the bottom of the page, in the search box next to Filter: Policy type, type polly
  10. Click the check box next to AmazonPollyReadOnlyAccess
  11. Click Next: Review
  12. Click Create user

IMPORTANT: Don’t close the webpage. You’ll need both the access key ID and the secret access key in Step 3.

Step 2: Install the AWS CLI for Windows

Click here to download the AWS CLI for Windows.

Step 3: Configure the AWS client

Amazon Polly for Windows requires an AWS profile called polly-windows. This ensures that the Amazon Polly engine is using the correct account.

  1. Open a Windows command prompt
  2. Type this command:
    aws configure --profile polly-windows 

  3. When prompted for the AWS Access Key ID and AWS Secret Access Key, use the values from the previous step.
  4. For Default Region, you can hit Enter for the default (us-east-1) or enter a different Region. Make sure to use all lower-case.
  5. For Default output format, just hit Enter
  6. Verify this worked by running the following command. You should see a list of voices:
    aws --profile polly-windows polly describe-voices 
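
If the describe-voices command fails, one thing worth checking is whether the profile was actually saved. `aws configure` stores the keys in an INI-style credentials file (`.aws\credentials` under your user profile folder on Windows) in a section named after the profile. The sketch below parses such a file with Python's standard library. The helper name `has_profile` and the sample key values are placeholders for illustration, not real credentials.

```python
import configparser


def has_profile(credentials_text, profile="polly-windows"):
    """Return True if the INI text defines both keys for the given profile."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    return (
        parser.has_section(profile)
        and parser.has_option(profile, "aws_access_key_id")
        and parser.has_option(profile, "aws_secret_access_key")
    )


# Placeholder contents in the same layout as a credentials file.
sample = """
[default]
aws_access_key_id = EXAMPLEDEFAULTKEYID
aws_secret_access_key = exampleDefaultSecret

[polly-windows]
aws_access_key_id = EXAMPLEPOLLYKEYID
aws_secret_access_key = examplePollySecret
"""

print(has_profile(sample))
```

To check a real file, read its text (for example with `pathlib.Path.home() / ".aws" / "credentials"`) and pass it to the same function.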

Step 4: Install Amazon Polly TTS Engine for Windows

Click here to download and run the installer. To verify that the installer worked properly, you can use PollyPlayer, an application that comes with Amazon Polly for Windows and allows you to experiment with the voices without additional software. Simply pick a voice, enter text, and then click Say It.

Using Amazon Polly Voices in Applications

The Amazon Polly voices are accessible in any Windows application that implements Windows SAPI. This means that after the Amazon Polly voices are installed, you simply need to select the Amazon Polly voice that you want to use from the list of voices in the application.

Amazon Polly supports SSML (Speech Synthesis Markup Language), which allows users to add tags to customize the speech generation. With Amazon Polly for Windows, users can either use plaintext or SSML tags when submitting requests. The standard Amazon Polly limits apply: a maximum of 3000 billed characters per request, or 6000 characters in total (SSML tags are not billed).
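
To get a feel for how the two limits interact, here is a rough sketch that estimates the billed and total character counts for a piece of SSML. It assumes that stripping anything between `<` and `>` with a regular expression is a fair approximation of "SSML tags are not billed". The service's own count may differ in edge cases, and the function name `check_request` is invented for this example.

```python
import re

MAX_BILLED_CHARS = 3000  # billed-character limit quoted above
MAX_TOTAL_CHARS = 6000   # total-character limit quoted above

# Approximation: treat everything between '<' and '>' as an SSML tag.
SSML_TAG = re.compile(r"<[^>]+>")


def check_request(text):
    """Return (billed_chars, total_chars, fits_within_limits) for one request."""
    billed = len(SSML_TAG.sub("", text))
    total = len(text)
    fits = billed <= MAX_BILLED_CHARS and total <= MAX_TOTAL_CHARS
    return billed, total, fits


ssml = '<speak>Hello <break time="1s"/> world</speak>'
billed, total, fits = check_request(ssml)
print(f"billed={billed} total={total} fits={fits}")
```

Text that exceeds either limit would need to be split into several requests, ideally at sentence boundaries so the speech still sounds natural.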

Example: Using Amazon Polly for Windows with Adobe Captivate

Building eLearning content is a great use case for generated speech. In the past, content managers would need to record voice content, and then re-record as content changes. Using an eLearning designer such as Adobe Captivate along with Amazon Polly voices allows you to easily create and dynamically update content whenever you need.

You can use any SAPI-enabled eLearning solution. In this demonstration, we walk through creating a simple slide with Captivate to show how quickly and easily you can add voice content. If you don’t already have Captivate, you can download a free trial here.

Step 1: Create a project

Start Captivate and click New Project / Blank Project to create a new project.

At this point, you have a new blank project with a single slide.

Step 2: Add speech content

From the Audio menu, click Speech Management.

This brings up a Speech Management modal window, where you can add speech content to the slide. Click on the Speech Agent drop-down and select Amazon Polly – US English – Salli (Neural). By default, all slides use this voice.

Click the + button to add content.

In the textbox, type My name is Salli. My speech is generated by Amazon Polly.

Now we must generate the audio. Behind the scenes, Captivate uses the Windows SAPI driver to call back to AWS to generate the speech. Click Save and Generate Audio.

After the speech is generated, you can preview the audio by clicking the Play button next to the Generate Audio button.

You hear Salli speaking the text. Click the Close button.

After closing the window, you can preview the entire project to hear the speech with the slide.

The wide selection of Amazon Polly voices allows a content manager to build and experiment with limitless combinations of speech. Because content and voice selections can be updated at any time, content managers can keep both the audio presentation and content fresh without ever having to go near a recording studio.

Now that you’ve installed Amazon Polly for Windows, you can have fun experimenting with different variations of speech using SSML tags, which are all fully supported in Windows. And because Amazon Polly for Windows is open-source, you can feel free to contribute features and submit feature requests. You can share feedback at the Amazon Polly forum. We’d love to hear how you’re using Amazon Polly for Windows!


About the Author

Troy Larson is a Senior DevOps Cloud Architect for AWS Professional Services.

[D] Why does backtranslation work?

I think I must be misunderstanding how backtranslation works, because I’m not seeing how this could help. I’ll describe my current understanding then I’ll ask my question.

The usual setup is that you have some small set B of parallel data between a source and target language. Your goal is to make a model that takes a sentence in the source language and produces the translated version in the target language.

In addition to the small dataset B, you also have some potentially very large corpus A of monolingual data in the target language. In order to leverage this data, you train a model in the reverse direction, i.e. target to source, by using B with the entries flipped. Then you use this model to make A’, which consists of the translations of entries in A by using the reverse model. Finally, you add A’ to B to get some final set C, on which you then train the source –> target model.
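
The procedure as described can be sketched in a few lines of Python. This is only a toy illustration of the data flow: the "model" here is a word-for-word lookup table (`train_word_model`, a stand-in invented for this example), where a real system would train a neural translation model at each step. The example sentences are made up.

```python
def train_word_model(pairs):
    """Toy 'model': a word-for-word lookup table learned from (input, output) pairs."""
    table = {}
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            table.setdefault(s, t)
    # Unknown words are passed through unchanged.
    return lambda sentence: " ".join(table.get(w, w) for w in sentence.split())


def back_translate(parallel_b, mono_target_a, train=train_word_model):
    """Return (forward model trained on C, the combined set C)."""
    # 1. Train the reverse (target -> source) model on B with the entries flipped.
    reverse = train([(tgt, src) for src, tgt in parallel_b])
    # 2. Build A': pair each real target sentence with its synthetic source.
    synthetic = [(reverse(tgt), tgt) for tgt in mono_target_a]
    # 3. C = B + A'.
    combined = list(parallel_b) + synthetic
    # 4. Train the forward (source -> target) model on C.
    return train(combined), combined


B = [("the cat", "le chat"), ("a dog", "un chien")]  # small parallel set
A = ["le chien", "un chat"]                          # monolingual target text
forward, C = back_translate(B, A)
print(C[2:])  # the synthetic pairs A'
```

Note that in every synthetic pair only the source side is machine-generated; the target side, which the forward model learns to produce, is still real text from A.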

In some sense, this should only help if your target –> source model is good. However, you trained this model only on B. This raises the following questions:

1) if you can build a good target –> source model from just B, why can’t you do the same with source –> target?

2) If you do get some improvements, why can’t you continue this process again? i.e. train the source –> target model using C, then grab some large monolingual corpus from the source language, backtranslate that to make some new set A”, then add A” to C and re-train the target –> source model, then make more source –> target examples by backtranslating with the new model? Rinse and repeat till you run out of compute.

Finally, is there a good reference for this kind of stuff? Most papers which use backtranslation are extremely vague about it.

submitted by /u/TheRedSphinx

[P] Curated Papers (early release)

Hi all,

I’m launching Curated Papers, for the first time in this subreddit!

It’s a website that lets you organize lists of academic papers, share curated lists and discover lists made by others.

Say you need to get into a new field of study… so instead of manually researching what papers to read, going over references, juggling papers back and forth (research that can take a long time), you could instead discover a curated list on the subject, made by a researcher coming from this field.

Of course, it will not entirely replace your need to search for papers, but at least you’ll start with a good basis moving forward.

It’s also a sort of social network built around curated lists and academic papers. You can, for example, like, discuss and follow curated lists, academic papers, or other users, and stay up to date with your interests.

I’ll be happy to get your feedback. Did you like it? Do you have any feature requests?

Thanks!

submitted by /u/getlasterror

[R] What do you think about the idea of creating a random forest using DL for tabular data?

There seems to be no reason this is not possible, and I think it could be good not only in concept but also in performance. The models that make up an RF are not required to perform well individually; instead, RF needs them to overfit as much as possible and to be as different from each other as possible. This seems achievable with DL on tabular data.
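
As a rough sketch of what such an ensemble could look like, the code below trains each member on a bootstrap sample of the rows and a random subset of the features, then combines predictions by majority vote. Two assumptions to flag: the feature subset is drawn once per model (the random subspace method) rather than per split as in a classic random forest, and the base learner `fit_1nn` is a deliberately overfitting nearest-neighbour placeholder standing in for where a small neural network's training routine would go. All names and the toy data are invented for illustration.

```python
import random
from collections import Counter


def fit_1nn(rows, labels):
    """Placeholder base learner that memorises (overfits) its training data."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(r, x)) for r in rows]
        return labels[dists.index(min(dists))]
    return predict


def fit_forest(X, y, fit_base=fit_1nn, n_models=25, n_features=2, seed=0):
    """Bagging plus random feature subsets around an arbitrary base learner."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_models):
        rows = [rng.randrange(len(X)) for _ in X]        # bootstrap sample of rows
        cols = rng.sample(range(len(X[0])), n_features)  # random feature subset
        X_boot = [[X[i][c] for c in cols] for i in rows]
        y_boot = [y[i] for i in rows]
        members.append((cols, fit_base(X_boot, y_boot)))

    def predict(x):
        votes = Counter(model([x[c] for c in cols]) for cols, model in members)
        return votes.most_common(1)[0][0]  # majority vote
    return predict


# Toy tabular data: features 0 and 2 carry the label, feature 1 is small noise.
X = [[0, 0.0, 0], [0, 0.1, 0], [0, 0.0, 0], [0, 0.1, 0],
     [1, 0.0, 1], [1, 0.1, 1], [1, 0.0, 1], [1, 0.1, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(forest([1, 0.1, 1]), forest([0, 0.0, 0]))
```

Replacing `fit_base` with a function that trains a small network and returns its predict function keeps the rest of the ensemble logic unchanged.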

The nice thing about this is that the model could deal with recursion on tabular data. DL is weak at processing tabular data and tree-based models cannot handle recursion. So it would be nice to be good at both.

I looked for a while but couldn’t find anything about this. I wonder if there’s anything I couldn’t find…

submitted by /u/SunghoYahng

[D] On pornographic, NSFW and non-consensual images in the ImageNet dataset. What’s the path forward?

Dear Reddit-ML community,

In the ImageNet dataset (classes 445 - n02892767 - [’bikini, two-piece’] and 459 - n02837789 - [’brassiere, bra, bandeau’]) there are many images that are verifiably pornographic (you can see the porn-star’s webpage in the pic!), shot in a non-consensual setting, voyeuristic, and that also entail underage nudity (see collage here). This has deep ramifications not just in the legal realm for downloading and storing these images, but also has a trickle-down effect with regard to the models trained on this dataset. For example, if you are an artist making/selling neural art, the unethical nature of the seed images could sully the sanctity of the art (see: https://openreview.net/forum?id=HJlrwcP9DB ).

The question now is: What’s the best path forward? Image deletion and replacement? Do chime in with your thoughts!

PS: I had written to the creators of the dataset (waay before the ImageNet Roulette thingy), but received no replies.

submitted by /u/VinayUPrabhu

Is there a way to train a scikit classifier to make one prediction per N samples? [Project]

So, I originally posted this on StackOverflow, but I was told that my question was “too broad” and my thread was closed.

I’m working on replicating the research done in this paper.

I have a pandas DF which looks like this:

    Date  In1  In2  In3  ...  Out
    Day1   -1    1   -1  ...   -1
    Day2    1   -1    1  ...    1
    Day3   -1    1   -1  ...   -1
    Day4   -1    1    1  ...    1
    Day5    1    1    1  ...    1
    ...

Now, I’ve already done what they did in the paper. Which is to say, I’ve trained multiple models in scikit to predict "Out" based on all the feature columns "In1", ..., "In10".

However, these are daily predictions and I wanna see what would happen if I make weekly predictions.

Essentially, I want to use df.loc[Day1:Day5, In1:In10] to predict df.loc[Day5, "Out"].

Of course, "Out" would be redefined as cumulative returns over the last 5 days, rather than what it currently is i.e. daily returns.

The problem is, I have absolutely no idea how to go about making a single prediction with N samples. (in this case 5)

My X_train/X_test are DataFrames with the "Out" column dropped & my y_train/y_test is a Series of the "Out" column. I prefer this because I’m not entirely comfortable with arrays.

Is there a way to make scikit use N samples for a single prediction?
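
One possible approach, sketched below, is to reshape the data rather than change the classifier: put the features of the last N days side by side in a single row, so that each row is one "sample" as far as scikit is concerned. The function name `widen`, the `_lagK` column suffixes, and the toy values are all made up for illustration, and `y_weekly` is assumed to already hold the redefined 5-day target for each day.

```python
import pandas as pd


def widen(X, window=5):
    """Place the features of the last `window` rows side by side in one row."""
    lagged = [X.shift(lag).add_suffix(f"_lag{lag}") for lag in range(window)]
    # The first window-1 rows lack a full history and are dropped.
    return pd.concat(lagged, axis=1).dropna()


days = [f"Day{i}" for i in range(1, 11)]
X = pd.DataFrame(
    {"In1": [-1, 1, -1, -1, 1, 1, -1, 1, 1, -1],
     "In2": [1, -1, 1, 1, 1, -1, -1, 1, -1, 1]},
    index=days,
)
# Made-up placeholder for the redefined 5-day target, indexed like X.
y_weekly = pd.Series([1, -1, 1, 1, -1, 1, -1, -1, 1, 1], index=days)

X_wide = widen(X, window=5)
y_wide = y_weekly.loc[X_wide.index]  # keep targets aligned with surviving rows
print(X_wide.shape, list(X_wide.index[:2]))
```

`X_wide` and `y_wide` can then be passed to any scikit estimator's `fit` as usual. The windows here overlap from one day to the next; taking `X_wide.iloc[::5]` (and the matching rows of `y_wide`) would give non-overlapping weeks instead.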

submitted by /u/JebusWasAnAlien