Visualizing Effect of Deep Double Descent on Model “Lottery Ticket” Architecture? [D]

Has anyone done any work on visualizing how the internal “lottery ticket” structure of a neural network changes as it goes through deep double descent?

Background:
One popular theory for explaining Deep Double Descent is that double descent occurs as a model truly learns to generalize by finding the “Occam’s Razor” model — the idea that the simplest model that fits the data is the best model for generalizing a solution. This is closely associated with the Lottery Ticket Hypothesis and model compression, where you can cull a model’s under-used weights to arrive at a smaller model that provides almost identical accuracy. The Lottery Ticket Hypothesis says (roughly paraphrased) that there is a “model within the model” that is the most significant portion of a deep neural network, and once you find that “winning ticket”, the other nodes in the network aren’t that important.

What I’m wondering is — has there been any work done on visualizing the network architecture of the most significant weights as a model goes through the stages of Deep Double Descent — from first trough, to plateau, to second descent?

I’m curious to know how much the core “internal architecture” changes in each of those stages, and if we can actually visualize the architecture narrowing in on that “Occam’s Lottery Ticket”…?
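In case it helps as a starting point, below is a rough PyTorch sketch (my own illustration, not an established method) of one way to quantify this: compute a magnitude-based "winning ticket" mask from checkpoints saved at different double-descent stages and measure how much the surviving-weight structure overlaps between stages. The checkpoint file names and the 80% sparsity level are placeholders, and the sketch assumes each checkpoint file is a plain state_dict.

```python
import torch

SPARSITY = 0.8  # fraction of weights treated as pruned; arbitrary illustration value

def magnitude_mask(state_dict, sparsity=SPARSITY):
    """Return {param_name: bool mask} keeping the largest-magnitude weights per layer."""
    masks = {}
    for name, w in state_dict.items():
        if not torch.is_floating_point(w) or w.dim() < 2:   # skip biases, norms, counters
            continue
        keep = int(w.numel() * (1 - sparsity))               # number of weights to keep
        threshold = w.abs().flatten().kthvalue(w.numel() - keep).values
        masks[name] = w.abs() > threshold
    return masks

def mask_overlap(m1, m2):
    """Mean Jaccard overlap of surviving weights across layers."""
    scores = []
    for name in m1:
        inter = (m1[name] & m2[name]).sum().item()
        union = (m1[name] | m2[name]).sum().item()
        scores.append(inter / max(union, 1))
    return sum(scores) / len(scores)

# Hypothetical checkpoints saved at different double-descent stages.
stages = ["ckpt_first_trough.pt", "ckpt_plateau.pt", "ckpt_second_descent.pt"]
masks = [magnitude_mask(torch.load(path, map_location="cpu")) for path in stages]

for i in range(len(stages) - 1):
    print(f"{stages[i]} -> {stages[i + 1]}: overlap = {mask_overlap(masks[i], masks[i + 1]):.3f}")
```

High overlap between stages would suggest the "ticket" stabilizes early, while low overlap would suggest the effective subnetwork keeps reorganizing as the model moves through the double-descent phases.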

submitted by /u/CHerronAptera

[D] BERT Large Fine-tune Benchmarks with NVIDIA Quadro RTX 6000 & RTX 8000 GPUs

Hey ML community,

We recently ran a series of benchmark tests showing the capabilities of NVIDIA Quadro RTX 6000 and RTX 8000 GPUs on BERT Large with different batch sizes, sequence lengths, and FP32 and FP16 precision. These were run using the NVIDIA benchmark script found on their GitHub, and show 1-, 2-, and 4-GPU configurations in a workstation.

RTX 6000 https://blog.exxactcorp.com/nvidia-quadro-rtx-6000-bert-large-fine-tune-benchmarks-with-squad-dataset/

RTX 8000 https://blog.exxactcorp.com/nvidia-quadro-rtx-8000-bert-large-fine-tuning-benchmarks-in-tensorflow/

What types of tests/benchmarks would you like to see run on these GPUs? What are your thoughts?

Cheers,

JM

submitted by /u/exxact-jm

[P] Seeing music using deepsing: Creating machine-generated visual stories of songs

Can machines dream while listening to music? Is it possible to turn music into images in a meaningful way? deepsing was born to materialize our idea of translating audio into images, inspired by the Holophoner from Futurama. In this way, deepsing is able to autonomously generate visual stories which convey the emotions expressed in songs. The process of such music-to-image translation poses unique challenges, mainly due to the unstable mapping between the different modalities involved. To overcome these limitations, deepsing employs a trainable cross-modal translation method, leading to a deep learning method for generating sentiment-aware visual stories.

We have implemented a front-end to our method at https://www.deepsing.com. You can find an example of a purely machine-generated visual story using our method at https://deepsing.com/engine/9C0xGB73Uuc/5dfbcd1ec9e5f7311d8a9fcf. Note that the version available at https://www.deepsing.com currently lacks many essential features, but it demonstrates the basic concept of our idea! Also, note that song lyrics are NOT used in this process; the proposed method currently works based SOLELY on the sentiment induced by the audio!

Furthermore, you can find more information in our preprint at https://arxiv.org/abs/1912.05654, and we have also released the code of our method at https://github.com/deepsing-ai/deepsing. Feel free to hack with us and share your opinions!

submitted by /u/deepsing-ai

Cinnamon AI saves 70% on ML model training costs with Amazon SageMaker Managed Spot Training

Developers are constantly training and re-training machine learning (ML) models so they can continuously improve model predictions. Depending on the dataset size, model training jobs can take anywhere from a few minutes to multiple hours or days. ML development can be a complex, expensive, and iterative process, and because it is compute intensive, keeping compute costs low is vital and a key enabler to achieving scale.

Amazon SageMaker is a fully managed service to build, train, tune, and deploy ML models at scale. Amazon SageMaker Managed Spot Training enables you to save up to 90% in training costs by using Amazon EC2 Spot Instances for training.

EC2 Spot Instances are a great way to optimize compute costs for ML training workloads because they use spare Amazon EC2 capacity, which is available at up to a 90% discount over On-Demand Instances. When there is a spike in requests for a particular On-Demand instance type in a specific Availability Zone (AZ), AWS can reclaim the Spot Instances with a two-minute notification.

This post describes how Cinnamon AI reduced their ML training costs by 70% and increased the number of daily training jobs by 40% without increasing their budgets by using Amazon SageMaker Managed Spot Training.

Amazon SageMaker Managed Spot Training

Managed Spot Training uses EC2 Spot Instances to run training jobs instead of On-Demand Instances. With Managed Spot Training, Amazon SageMaker manages the Spot capacity and handles interruptions. In case of a Spot interruption, Managed Spot Training pauses the training job and reliably resumes it once Spot capacity becomes available. As a result, Managed Spot Training is best suited for model training jobs with flexible start times and run durations. You can configure your training job to use checkpoints. When enabled, Amazon SageMaker copies checkpoint data from a local path to Amazon S3 and resumes interrupted training jobs from the last checkpoint instead of restarting. Managed Spot Training eliminates the need for you to build additional tooling to poll for Spot capacity or manage interruptions. You can use Managed Spot Training with models built using popular ML frameworks, Amazon SageMaker built-in algorithms, and custom-built models.

To enable the feature, choose Enable managed spot training on the Amazon SageMaker console. See the following screenshot.

If you are using the Amazon SageMaker Python SDK, set train_use_spot_instances to True in the Estimator constructor. You can also specify a stopping condition that controls how long Amazon SageMaker waits for Spot Instances to become available.
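For illustration, here is a minimal sketch of how such a job might be configured with the SageMaker Python SDK (v1.x parameter names, matching train_use_spot_instances above). The role ARN, container image, bucket, and instance type are placeholders rather than values from this post.

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Placeholder identifiers -- substitute your own role, image, and bucket.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest"
bucket = "s3://my-example-bucket"

estimator = Estimator(
    image_name=image,                            # training container stored in Amazon ECR
    role=role,
    train_instance_count=1,
    train_instance_type="ml.p3.2xlarge",
    train_use_spot_instances=True,               # run the job on EC2 Spot capacity
    train_max_run=3600,                          # maximum training time, in seconds
    train_max_wait=7200,                         # stopping condition: total time to wait for Spot plus train
    checkpoint_s3_uri=bucket + "/checkpoints/",  # SageMaker syncs local checkpoints to S3
    output_path=bucket + "/output/",
    sagemaker_session=session,
)

estimator.fit({"training": bucket + "/train/"})
```

When the job completes, SageMaker reports both the training time and the billable time, and the difference between them reflects the Spot savings.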

Cinnamon AI saves 70% on model training costs

With the massive opportunity that cognitive and artificial intelligence (AI) systems present, many companies are developing AI-powered products to build intelligent services. One such innovator is Cinnamon AI, a Japan-based startup with a mission to “extend human potential by eliminating repetitive tasks” through their AI service offerings.

Cinnamon AI’s flagship product, Flax Scanner, is a document reader that uses natural language processing (NLP) algorithms to automate data extraction from unstructured business documents such as invoices, receipts, insurance claims, and financial statements. It converts these documents into database-ready files. The goal is to eliminate the need for humans to read such documents to extract the required data, thus saving businesses millions of hours in time and reducing their operational costs. This service also works on hand-written documents and with Japanese characters.

Cinnamon AI has also developed two other ML-powered services, Rossa Voice and Aurora. Rossa Voice is a high-precision, real-time voice recognition service with applications in voice fraud detection and in transcribing records at call centers. Aurora is a service that automatically extracts the necessary information from long sentences in documents; you can use it to find important information in specifications and contract documents.

Cinnamon AI had a goal to reduce their ML development costs, so they decided to consolidate their disparate development environments into a single platform and then continuously optimize on costs. They chose AWS to develop their ML services on because of AWS’s breadth of services, cost effective pricing options, granular security controls, and technical support. As a first step, Cinnamon AI migrated all their ML workloads from on-premises environments and other cloud providers onto AWS. Next, the team optimized their EC2 usage and started using Amazon SageMaker to train their ML models. More recently, they started using the Managed Spot Training feature to use Spot Instances for training, which helped them optimize their cost profile significantly.

“The Managed Spot Training feature of Amazon SageMaker has had a profound impact on our AWS cost savings. Our AWS EC2 costs reduced by up to 70% after using Managed Spot Training,” said Tetsuya Saito, General Manager of Infrastructure and Information Security Office at Cinnamon AI. “In addition, Managed Spot Training does not require complicated methods and can be used simply from the SageMaker SDK.”

The following graph shows Cinnamon AI's model training cost savings journey over six months. In June 2019, after moving their ML workloads onto AWS, they started using EC2 On-Demand Instances for model training; you can use this as a point of reference from a training cost perspective. Over the next few months, they optimized their EC2 On-Demand usage, mainly through instance right-sizing and using GPU instances (P2, P3) for large training jobs. They also adopted Amazon SageMaker for model training with On-Demand Instances and reduced their training costs by approximately 20%. Furthermore, in November 2019 they achieved substantial cost savings of 70% by moving model training onto Spot Instances with Managed Spot Training. Their cost optimization effort also enabled them to increase the number of daily model training jobs by 40% while maintaining a reduced cost profile.

Cinnamon AI’s Model Development Environment

As Cinnamon AI is developing multiple ML-powered products and services, their data types vary based on the application and include 2D images, audio, and text with dataset sizes ranging from 100 MB to 40 GB. They predominantly use custom deep learning models and their frameworks of choice are TensorFlow, PyTorch, and Keras. They use GPU instances for time-consuming neural network training jobs, with run times ranging from a few hours to days, and use CPU instances for smaller model training experiments.

The following architecture depicts Cinnamon AI’s ML environment and workflow at a high level.

The AI researchers develop code on their workstations and then synchronize it to a shared, always-on EC2 server (an On-Demand Instance). This instance is used to call Amazon SageMaker local mode to run, test, and debug their scripts and models on small datasets. After the code is tested, it is packaged into a Docker image and stored in Amazon ECR, which enables researchers to share their work across teams and to pull the required Docker image from ECR onto their respective Amazon SageMaker training instances. Also on the EC2 server, the researchers can use the Amazon SageMaker Python SDK to initialize an Amazon SageMaker Estimator and then launch training jobs in Amazon SageMaker.

Almost all of Cinnamon AI's training jobs run on Spot Instances in Amazon SageMaker via Managed Spot Training, with checkpointing enabled to save the state of the models. Amazon SageMaker saves the checkpoints to Amazon S3 while training is in progress, and they can be used to resume training in the event of Spot interruptions. In addition to S3, Cinnamon AI uses Amazon FSx for Lustre to feed data to Amazon SageMaker for training ML models. Using Amazon FSx for Lustre has reduced data loading time to the SageMaker training instance compared to loading the data directly from S3. Both the EC2 server and the SageMaker training instances can access the data in S3 and Amazon FSx for Lustre. Amazon SageMaker publishes training metrics to Amazon CloudWatch, which Cinnamon AI researchers use to monitor their training jobs.
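As a rough sketch of that data path (assuming SageMaker Python SDK v1.x and a hypothetical FSx for Lustre file system; the file system ID, mount path, and channel name below are placeholders), a training job can read its dataset from FSx instead of S3 by passing a FileSystemInput to fit():

```python
from sagemaker.inputs import FileSystemInput

# Hypothetical FSx for Lustre file system -- ID and directory are placeholders.
train_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",
    file_system_type="FSxLustre",
    directory_path="/fsx/datasets/train",
    file_system_access_mode="ro",   # read-only access is enough for training data
)

# Reuses the spot-enabled estimator from the earlier sketch; the training instances
# must be able to reach the file system (same VPC subnets and security groups).
estimator.fit({"training": train_input})
```

Because the file system is mounted directly on the training instances, the dataset does not have to be downloaded from S3 at the start of every job, which is where the reduction in data loading time comes from.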

Conclusion

Managed Spot Training is a great way to optimize model training costs for jobs with flexible start times and run durations. The Cinnamon AI team has successfully taken advantage of these cost-saving strategies with Amazon SageMaker, increasing the number of daily experiments while reducing training costs by 70%. If you are not using Spot Instances for model training, try out Managed Spot Training. For more information, see Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs. You can get started with Amazon SageMaker here.

 


About the Authors

Sundar Ranganathan is a Principal Business Development Manager on the EC2 team focusing on EC2 Spot for AI/ML in addition to Big Data, Containers, and DevOps workloads. His experience includes leadership roles in product management and product development at NetApp, Micron Technology, Qualcomm, and Mentor Graphics.

Yoshitaka Haribara, Ph.D., is a Startup Solutions Architect in Japan focusing on machine learning workloads. He helped Cinnamon AI migrate their workloads onto Amazon SageMaker.

Additional contributions by Shoko Utsunomiya, a Senior Solutions Architect at AWS.

[D] Any advice on publishing a research paper?

Hello everyone,

I’m new to Reddit, but it seems to be a very interesting place to ask questions.

Basically, I have been doing research during the first year of my PhD, and now I have some interesting results that I would like to submit to a conference. My research is on using a new reduced-precision numerical data type to train deep learning models (CNNs, RNNs) without accuracy penalties. The numerical format is not implemented in hardware right now, but some processors will probably implement it this year.

Any advice on which conference to send my paper to? The idea is to submit it to a conference with a deadline between January and March.

Thanks for your help.

submitted by /u/kala855

[D] Best way to represent cross reference matrix in a graph database?

Not sure if this is the best place to ask this, but let’s say I have a ginormous cross-reference matrix file, where the first row and first column represent existing nodes, so any cell they have in common represents a new relationship node. What’s the best way of incorporating these new nodes into a graph, assuming I don’t really want to create them as actual nodes?

The reason is that I’d like to keep the graph condensed and maintainable, and creating all of the relationship nodes would make the graph unruly and hard to look at. Is there some structure I can store the data in, like a lookup table of sorts, for property graphs/Neo4j?
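One hedged sketch of an alternative (not necessarily the best answer): rather than creating a node per matrix cell, store each non-zero cell as a relationship carrying its value as a property between the two existing nodes, so the cross references stay queryable without inflating the node count. The CSV layout, the Item label, the id property, and the connection details below are assumptions for illustration.

```python
import csv
from neo4j import GraphDatabase  # official Neo4j Python driver

# Placeholder connection details -- replace with your own instance and credentials.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MATCH (a:Item {id: $src}), (b:Item {id: $dst})
MERGE (a)-[r:CROSS_REF]->(b)
SET r.weight = $weight
"""

def load_matrix(path):
    """Stream a cross-reference matrix CSV (header row/column hold node ids) into relationships."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        columns = next(reader)[1:]                     # node ids from the header row
        with driver.session() as session:
            for row in reader:
                src, cells = row[0], row[1:]
                for dst, cell in zip(columns, cells):
                    if cell and float(cell) != 0.0:    # only keep non-zero cross references
                        session.run(CYPHER, src=src, dst=dst, weight=float(cell))

load_matrix("cross_reference_matrix.csv")
driver.close()
```

Keeping the cells as relationships rather than nodes matches the property-graph model and avoids filling the visual graph with artificial intermediate nodes.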

submitted by /u/slowestflashalive

[P] Basic Feed Forward Neural Network Library in C++

Well, it’s actually a really basic library which I just made for fun.

I tested it with MNIST_Digits (acc = 0.95), using a {784, 250, 10} architecture with sigmoid activations and a 0.3 learning rate.

I’ve attached a demo of XOR gate training.
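For anyone who wants to see the idea without pulling the C++ code, here is an equivalent minimal sketch in Python/NumPy (not this library's API) of training a small sigmoid feed-forward network on XOR with the same 0.3 learning rate; the 4-unit hidden layer and epoch count are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2-4-1 network with sigmoid activations
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros((1, 1))
lr = 0.3  # learning rate mentioned in the post

for _ in range(20000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backpropagation for mean squared error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # gradient descent updates
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 3))  # should approach [0, 1, 1, 0]
```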

Check out the library on GitHub.

Any feedback from you guys is highly appreciated!

P.S. I’m new at both GitHub and machine learning, so bear with me.

submitted by /u/deepraval2905

Best tool for large-scale image processing

In the early 2010s, I actively used Hadoop / Hive and HBase for large-scale data processing. Since then, I’ve been somewhat out of the loop, except for using Spark infrequently. I am now wondering what would be the best open source software for storing a very large image dataset (100s of terabytes if not multiple petabytes) on commodity hardware. The reason I post this here is that the objective will be to run ML algorithms over subsets of the images in this dataset. Thus, it would be desirable to execute ML code in situ, if possible. For my purposes, it’s also safe to assume that writes are fairly infrequent.

submitted by /u/bissias

[N] Henry AI Labs on YouTube

https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw/videos

This YouTube channel deserves more attention IMO. I don’t know who it is but it started pretty recently and features frequent updates with mid-level overviews of recent ML research papers. I like having it on in the evening as a healthier alternative to the random crap* I binge on YouTube when I get off work.

*as in: longplays of 90s video games and pensioners eating watermelons ¯\_(ツ)_/¯

submitted by /u/carlthome