Author: torontoai

[R] Audio Conversion GAN: I wrote a Paper

A month ago I wrote a post in this subreddit (here) about a voice conversion and audio style transfer system for unpaired data that I had been working on. Many users were quite surprised by the results and recommended that I write a paper about it, despite my complete lack of knowledge of the academic world (which I was very afraid of).

Well, I wrote that paper: https://arxiv.org/abs/1910.03713 (Here is the old demo video: https://youtu.be/3BN577LK62Y )

I have no idea if the final product is up to standard or if there are some major holes or mistakes in it (quite likely, honestly): this is my first ever paper, and I asked several professors in related fields at my university for advice, but nobody offered to help.

I would love to know what I should do now: I know papers can be published in venues of various levels, but the whole process is obscure to me, and I am quite happy to just have it on arXiv.

Finally, I really need to thank all the people that told me that writing a paper was the right idea: I feel like I learned a lot in the process. So, thank you again!

submitted by /u/artika_labs

Verifying and adjusting your data labels to create higher quality training datasets with Amazon SageMaker Ground Truth

Building a highly accurate training dataset for your machine learning (ML) algorithm is an iterative process. It is common to review and continuously adjust your labels until you are satisfied that the labels accurately represent the ground truth, or what is directly observable in the real world. ML practitioners often build custom systems to review and update data labels because accurately labeled data is critical to ML model quality. If there are issues with the labels, the ML model can’t effectively learn the ground truth, which leads to inaccurate predictions.

One way that ML practitioners have improved the accuracy of their labeled data is through using audit workflows. Audit workflows enable a group of reviewers to verify the accuracy of labels (a process called label verification) or adjust them (a process called label adjustment) if needed.

Amazon SageMaker Ground Truth now features built-in workflows for label verification, and label adjustment for bounding boxes and semantic segmentation. With these new workflows, you can chain an existing Amazon SageMaker Ground Truth labeling job to a verification or adjustment job, or you can import your existing labels for a verification or adjustment job.

This post walks you through both options for bounding box labels. The walkthrough assumes that you are familiar with running a labeling job or have existing labels. For more information, see Amazon SageMaker Ground Truth – Build Highly Accurate Datasets and Reduce Labeling Costs by up to 70%.

Chaining a completed Amazon SageMaker Ground Truth labeling job

To chain a completed labeling job, complete the following steps.

  1. From the Amazon SageMaker Ground Truth console, choose Labeling jobs.
  2. Select your desired job.
  3. From the Actions drop-down menu, choose Chain.

The following screenshot shows the Labeling jobs page:

For more information, see Chaining labeling jobs.

The Job Overview page carries forward the configurations you used for your chained job. If there are no changes, you can move to the next section Task Type.

Configuring label verification

To use label verification, from Task type, choose Label verification.

See the following screenshot of the Task type page:

The Workers section is preconfigured to the selections you made for the chained labeling job. You can opt to choose a different workforce or stick with the same configurations for your label verification job. For more information, see Managing Your Workforce.

You can define your verification labels, for example, Label Correct, Label Incorrect – Object(s) Missed, and Label Incorrect – Box(es) Not Tightly Drawn.

You can also specify the instructions in the left-hand panel to guide reviewers on how to verify the labels.

See the following screenshot of the Label verification tool page:

Configuring label adjustment

To perform label adjustment, from the Task type section, choose Bounding box. See the following screenshot of the Task type page:

The following steps for configuring the Workers section and setting up the labeling tool are similar to creating a verification job. The one exception is that you must opt into displaying existing labels in the Existing-labels display options section. See the following screenshot:

Uploading your existing labels from outside Amazon SageMaker Ground Truth

If you labeled your data outside of Amazon SageMaker Ground Truth, you can still use the service to verify or adjust your labels. Import your existing labels by following these steps.

  1. Create an augmented manifest with both your data and existing labels. For example, in the following code, source-ref points to the images that were labeled, and the “bound-box” attribute is the label.
    {"source-ref": "<S3 location of image 1>", "bound-box": <bounding box label>}
    {"source-ref": "<S3 location of image 2>", "bound-box": <bounding box label>}

  2. Save your augmented manifest in Amazon S3. You should save the manifest in the same S3 bucket as your images. Also, remember the attribute name of your labels (in this post, bound-box) because you need to point to this when you set up your jobs. Additionally, make sure that the labels conform to the label format prescribed by Amazon SageMaker Ground Truth. For example, you can see the label format for bounding boxes in Bounding Box Job Output. You are now ready to create verification and adjustment jobs.
  3. From the Amazon SageMaker Ground Truth console, create a new labeling job.
  4. In Job overview, for Input dataset location, point to the S3 path of the augmented manifest that you created. See the following screenshot of the Job overview page:
  5. Follow the steps previously outlined to configure Task Type, Workers, and the labeling tool when setting up your verification or adjustment job.
  6. In Existing-labels display options, for Label attribute name, select the attribute name of the labels in your augmented manifest (in this post, bound-box) from the drop-down menu. See the following screenshot of Existing-labels display options:
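The manifest from step 1 is a JSON Lines file, so it can be generated and sanity-checked with a short script before uploading. The following is a minimal sketch, not an official tool: the helper names, the S3 URIs, and the box values are hypothetical, and the label fields (image_size, annotations, class_id, left, top, width, height) are an assumption based on the format described in Bounding Box Job Output, so confirm them against that page. A real job may also expect a companion metadata attribute, which this sketch omits.

```python
import json

# Attribute name used for the labels in this post.
LABEL_ATTRIBUTE = "bound-box"


def manifest_line(image_uri, image_size, boxes):
    """Build one augmented-manifest line for one image.

    image_size is (width, height, depth); boxes is a list of
    (class_id, left, top, width, height) tuples in pixels.
    The label layout is an assumption; check Bounding Box Job Output.
    """
    label = {
        "image_size": [
            {"width": image_size[0], "height": image_size[1], "depth": image_size[2]}
        ],
        "annotations": [
            {"class_id": c, "left": l, "top": t, "width": w, "height": h}
            for (c, l, t, w, h) in boxes
        ],
    }
    return json.dumps({"source-ref": image_uri, LABEL_ATTRIBUTE: label})


def build_manifest(records):
    """Join one JSON object per line, as the manifest format requires."""
    return "\n".join(manifest_line(*record) for record in records) + "\n"


def check_manifest(text, label_attribute=LABEL_ATTRIBUTE):
    """Return the 1-based numbers of lines missing the image or the label."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        entry = json.loads(line)
        if "source-ref" not in entry or label_attribute not in entry:
            bad.append(number)
    return bad


# Hypothetical example data; replace with your own images and boxes.
records = [
    ("s3://example-bucket/images/img1.jpg", (640, 480, 3), [(0, 10, 20, 100, 50)]),
    ("s3://example-bucket/images/img2.jpg", (640, 480, 3), []),
]
manifest = build_manifest(records)
```

Running check_manifest on the generated text before saving it to Amazon S3 catches lines that lack either the image reference or the label attribute you select in step 6.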

Conclusion

A highly accurate training dataset is critical for achieving your ML initiatives, and you now have built-in workflows to perform label verification and adjustment through Amazon SageMaker Ground Truth. This post walked you through how to use the new label verification and adjustment features. You can chain a completed labeling job, or you can upload labels. Visit the Amazon SageMaker Ground Truth console to get started.

As always, AWS welcomes feedback. Please submit any comments or questions.


About the Authors

Sifan Wang is a Software Development Engineer for AWS AI. His focus is on building scalable systems to process big data and intelligent systems to learn from the data. In his spare time, he enjoys traveling and hitting the gym.

Carter Williams is a Web Development Engineer on the Mechanical Turk Requester CX team with a focus on Computer Vision UIs. He strives to learn and develop new, intuitive ways to gather accurate annotation data using web technologies. In his free time, he enjoys paintball, hockey, and snowboarding.

Vikram Madan is the Product Manager for Amazon SageMaker Ground Truth. He focuses on delivering products that make it easier to build machine learning solutions. In his spare time, he enjoys running long distances and watching documentaries.

Amazon Textract is now HIPAA eligible

Today, Amazon Web Services (AWS) announced that Amazon Textract, a machine learning service that quickly and easily extracts text and data from scanned documents, is now eligible for healthcare workloads that require HIPAA compliance. This launch builds upon the existing portfolio of HIPAA-eligible AWS artificial intelligence services that help deliver better healthcare outcomes, including Amazon Translate, Amazon Comprehend, Amazon Transcribe, Amazon Polly, Amazon SageMaker, and Amazon Rekognition.

Healthcare providers routinely extract text and data from documents such as medical records and forms through manual data entry or simple optical character recognition (OCR) software. This is a time-consuming and often inaccurate process that produces outputs requiring extensive post-processing before they can be used by other applications. What organizations want instead is the ability to accurately identify and extract text and data from forms and tables in documents of any format and from a variety of file types and templates.

Amazon Textract analyzes virtually any type of document, automatically generating highly accurate text, form, and table data. Amazon Textract identifies text and data from tables and forms in documents – such as patient information from an insurance claim or values from a table in a scanned medical chart – and recognizes a range of document formats, including those specific to healthcare and insurance, without requiring any customization or human intervention. Amazon Textract makes it easy for customers to accurately process millions of document pages in a matter of hours, significantly lowering document processing costs, and allowing customers to focus on deriving business value from their text and data instead of wasting time and effort on post-processing. Results are delivered via an API that can be easily accessed and used without requiring any machine learning experience.

As of today, Amazon Textract is a HIPAA-eligible service, which means healthcare customers can take full advantage of it. Many healthcare customers, like Cerner, Fred Hutchinson Cancer Research Center, and The American Heart Association, are already exploring new ways to use the power of ML to automate their current workloads and transform how they provide care to patients, all while meeting the security and privacy requirements of HIPAA.

Change Healthcare is a leading independent healthcare technology company that provides data and analytics-driven solutions to improve clinical, financial, and patient engagement outcomes in the U.S. healthcare system. “At Change Healthcare, we believe that we can make healthcare affordable and accessible to all by improving the timeliness and quality of financial and administrative decisions.  This can be achieved by the power of machine learning technology to understand more from our data. But unlocking the potential of this information can often be difficult as it’s siloed in tables and forms that traditional optical character recognition hasn’t been able to analyze,” said Nick Giannasi, EVP and Chief AI Officer at Change Healthcare. “Amazon Textract further advances document understanding with the ability to retrieve structured data in addition to text, and now with the service becoming HIPAA eligible, we’ll be able to liberate the information from millions of documents and create even more value for patients, payers, and providers.”

Cambia Health Solutions is a total health solutions company and the parent company of six regional health plans, including Regence, an insurer serving 2.6 million members in Oregon, Idaho, Utah, and Washington. Cambia is transforming the health care system to be more economically sustainable and efficient for people and their families. “Over the past 100 years Cambia has been dedicated to improving health care for people and their families. To help us achieve that goal, we’re always evaluating new innovations and opportunities to optimize care coordination. One area of focus is streamlining administrative processes that are time and labor intensive. We’re excited to explore Amazon Textract to help us automate the process of extracting valuable data from paper forms accurately and efficiently. The powerful combination of data science, A.I., and a person-focused approach is key to our mission of transforming the health care system,” said Faraz Shafiq, Cambia Health Solutions Chief Artificial Intelligence Officer.

ClearDATA is a HITRUST certified AWS Managed Service Provider trusted by customers across the globe to safeguard their sensitive data and power their critical applications. Matt Ferrari, Chief Technology Officer at ClearDATA, says, “It’s exciting to see AWS add their optical character recognition service powered by machine learning, Amazon Textract, to their list of HIPAA eligible services. A lot of medical data that is shared among payers and providers is locked in image-based files like PDFs. Instead of manually processing that kind of data, healthcare organizations can now use Amazon Textract service to extract medical data from files that previously have been non-machine readable. This brings an opportunity to integrate this data with their electronic health records, or other cloud technologies like Amazon Comprehend Medical that can identify protected health information in the dataset. This is just another step forward in increasing the opportunity to use these emerging technologies to improve access to data, get better insights, lower costs, and improve patient and member experiences.” ClearDATA offers solutions and services that protect healthcare organizations from data privacy risks, improve their data management, and scale their healthcare IT infrastructure, along with one of the most comprehensive Business Associate Agreements in the healthcare industry.

For additional information on Amazon Machine Learning services and how healthcare and life sciences companies can run HIPAA-eligible workloads on AWS please reference the following materials:

To get started with Amazon Textract, you can click the “Get Started with Amazon Textract” button on the Amazon Textract page. You must have an Amazon Web Services account; if you do not already have one, you will be prompted to create one during the process. Once you are signed in to your AWS account, try out Amazon Textract with your own images or PDF documents using the Amazon Textract Management Console. You can also download the Amazon Textract SDKs to start creating your own applications. Please refer to our step-by-step Getting Started Guide for more information.
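As a rough illustration of what using the API through an SDK can look like, here is a minimal Python sketch built on boto3. It is a sketch under stated assumptions rather than a reference implementation: the helper names are hypothetical, the bucket and key are placeholders you would supply, and it assumes your AWS credentials and Region are already configured. The first helper calls the synchronous analyze_document operation on a document stored in Amazon S3; the second pulls the recognized lines of text out of the response.

```python
def analyze_s3_document(bucket, key):
    """Call Amazon Textract on one document stored in Amazon S3.

    boto3 is imported lazily so that the parsing helper below can be
    used and tested on a saved response without the SDK installed.
    """
    import boto3

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["TABLES", "FORMS"],
    )


def extract_lines(response, min_confidence=0.0):
    """Return the text of every LINE block at or above min_confidence."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


# A hand-written stand-in for a response, used to show the parsing step.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Patient name: Jane Doe", "Confidence": 99.2},
        {"BlockType": "LINE", "Text": "smudged text", "Confidence": 41.0},
        {"BlockType": "WORD", "Text": "Patient", "Confidence": 99.5},
    ]
}
confident_lines = extract_lines(sample_response, min_confidence=90.0)
```

Filtering on the per-block confidence score is one simple way to route low-confidence text to a human reviewer. Multi-page PDFs are usually handled with the asynchronous operations instead of the synchronous call shown here.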


About the author

Kriti Bharti is the Product Lead for Amazon Textract. Kriti has over 15 years’ experience in Product Management, Program Management, and Technology Management across multiple industries such as Healthcare, Banking and Finance, and Retail. In her time at AWS, she has helped launch a number of new services including AWS IoT Device Management and AWS IoT Device Defender. In her spare time, you can find Kriti having a pawsome time with Fifi and her cousins, reading, or learning different dance forms.

[P] higher. A PyTorch library to do gradient-based hyperparameter optimization and meta-learning without changing models/optimizers

I wanted to share with you this really new project from Facebook AI Research that I just stumbled upon.

Implementing gradient-based hyper-parameter optimization and meta-learning has always been hard because of non-differentiable optimizers and stateful, non-functional models. This library is supposed to make things easier by replacing existing stateful models with stateless ones automatically at run-time. It also implements differentiable versions of most of the torch.optim optimizers (although you can’t use third-party ones out of the box).

This means that we can finally differentiate through the usual training loop code with very few changes!
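To make "differentiating through the training loop" concrete, here is a tiny pure-Python sketch of the idea on a one-parameter problem. It does not use higher or PyTorch, and the function names are made up for illustration: it runs a few SGD steps on loss(w) = (w - target)**2 while tracking, by hand, how the weight depends on the learning rate, then returns the gradient of the final loss with respect to the learning rate. Libraries like higher aim to get the same kind of hypergradient automatically for real models and optimizers.

```python
def unrolled_sgd(w0, target, lr, steps):
    """Run SGD on loss(w) = (w - target)**2 and track d(w)/d(lr)."""
    w = w0
    dw_dlr = 0.0  # derivative of the current weight w.r.t. the learning rate
    for _ in range(steps):
        grad = 2.0 * (w - target)   # d(loss)/d(w) at the current weight
        dgrad_dlr = 2.0 * dw_dlr    # grad depends on lr through w
        # Update rule: w_next = w - lr * grad.
        # Differentiate both sides with respect to lr (product rule).
        dw_dlr = dw_dlr - (grad + lr * dgrad_dlr)
        w = w - lr * grad
    return w, dw_dlr


def hypergradient(w0, target, lr, steps):
    """Gradient of the final loss with respect to the learning rate."""
    w, dw_dlr = unrolled_sgd(w0, target, lr, steps)
    return 2.0 * (w - target) * dw_dlr


def final_loss(w0, target, lr, steps):
    """Final loss after training, used for a finite-difference check."""
    w, _ = unrolled_sgd(w0, target, lr, steps)
    return (w - target) ** 2


# Compare the hand-derived hypergradient with a numerical estimate.
eps = 1e-6
analytic = hypergradient(0.0, 1.0, 0.1, 3)
numeric = (final_loss(0.0, 1.0, 0.1 + eps, 3) - final_loss(0.0, 1.0, 0.1 - eps, 3)) / (2 * eps)
```

A negative hypergradient here means that a slightly larger learning rate would have lowered the final loss, which is exactly the signal gradient-based hyperparameter optimization uses.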

I haven’t tried the library myself, but it seems really easy to use and, judging by the stars, looks really promising. Let me know what you think.

repo: https://github.com/facebookresearch/higher

submitted by /u/rikkajounin

[R] Machine learning edge devices: benchmark report

We benchmarked five novel edge devices:

We used different frameworks and models, to see which combinations perform best. In particular, we focused on performance outcomes for machine learning on the edge.

You can see the results here:

Edit: Added links to the hardware in question.

submitted by /u/tryo_labs