Author: torontoai

[P] Nearing BERT’s accuracy on Sentiment Analysis with a model 56 times smaller by Knowledge Distillation

Hello everyone,

I recently trained a tiny bidirectional LSTM model to achieve high accuracy on Stanford’s SST-2 by using knowledge distillation and data augmentation. The accuracy is comparable to BERT after fine-tuning, but the model is small enough to run at hundreds of iterations per second on a laptop CPU core. I believe this approach could be very useful since most user-devices in the world are low-power.
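The post does not spell out the loss it used, so the following is only a minimal sketch of one common way to set up knowledge distillation: the small student is trained against both the true label and the teacher's temperature-softened output distribution. The function names, the `temperature` and `alpha` parameters, and their default values are illustrative assumptions, not taken from the linked article.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a hard-label loss and a soft-target loss.

    Assumed formulation (not confirmed by the post): cross-entropy against
    the true label plus cross-entropy against the teacher's softened
    distribution, with the soft term rescaled by temperature squared.
    """
    # Soft-target term: how far the student is from the teacher's distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student_soft))

    # Hard-label term: ordinary cross-entropy on the ground-truth class.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[hard_label])

    return alpha * hard + (1.0 - alpha) * (temperature ** 2) * soft


# Hypothetical two-class (negative/positive) logits for one sentence.
teacher = [2.0, -2.0]
loss_agree = distillation_loss([2.0, -2.0], teacher, hard_label=0)
loss_disagree = distillation_loss([-2.0, 2.0], teacher, hard_label=0)
```

A student that agrees with the teacher gets a lower loss than one that disagrees, which is the signal the small model learns from.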

I believe this can also give some insight into the success of huggingface’s DistilBERT, as it seems their success doesn’t stem solely from knowledge distillation but also from the Transformer’s unique architecture and the clever way they initialize its weights.

If you have any questions or insights, please share 🙂

For more details please take a look at the article:

https://blog.floydhub.com/knowledge-distillation/

submitted by /u/alexamadoriml

New Solutions for Quantum Gravity with TensorFlow

Recent strides in machine learning (ML) research have led to the development of tools useful for research problems well beyond the realm for which they were designed. The value of these tools when applied to topics ranging from teaching robots how to throw to predicting the olfactory properties of molecules is now beginning to be realized. Inspired by advances such as these, we undertook the challenge of applying TensorFlow, a computing platform normally used for ML, to advance the understanding of fundamental physics.

Perhaps the biggest open problem in fundamental theoretical physics is that our current understanding of quantum mechanics only includes three of the four fundamental forces: the electromagnetic, strong, and weak forces. There is currently no complete quantum theory that also includes the force of gravitation while still matching experimental observations, i.e., an accurate model of quantum gravity.

One promising approach to a unified model that includes quantum gravity, which has survived many mathematical consistency checks, is called M-Theory, or “The Theory formerly known as Strings,” introduced in 1995 by Edward Witten. In the everyday world, we all experience four dimensions: three spatial dimensions (x, y, and z), plus time (t). M-Theory predicts that, at very short lengths, the Universe is instead described by eleven dimensions. But, as one can imagine, establishing the connection between the four-dimensional world that we observe and the 11-dimensional world predicted by M-Theory is exceedingly difficult to do analytically. In fact, it might require analytic manipulation of equations having more terms than there are electrons in the Universe.

This summer, we published an article in the Journal of High Energy Physics where we introduced novel ways to address such problems through creative use of ML technology. Using simplifications enabled by TensorFlow, we managed to bring the total number of known (stable or unstable) equilibrium solutions for one particular type of M-Theory spacetime geometries to 194, including a new and tachyon-free four-dimensional model universe. The geometries that we studied are special in that they are still (barely) accessible with exact calculations that do not require neglecting potentially important terms. We have also released a short instructive Google colab as well as a more powerful Python library for use in related research.

Applying TensorFlow to M-Theory
This work is predicated on a key observation that a mixed numerical and analytic approach can be more powerful than a purely analytical method. Instead of attempting to find analytic solutions with brute force, we use a numerical approach that leverages TensorFlow for the initial search for solutions to the model. This then yields hypotheses on which specific combinations can be tested and analyzed with stringent mathematical methods, ultimately proving the actual existence of a conjectured solution. This represents a novel methodology for making further progress in theoretical physics.
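To make the two-stage idea concrete, here is a toy sketch in plain Python. It is not the method or code from the paper: the potential `V(x, y) = (x**2 - 1)**2 + y**2` is a made-up stand-in for the real M-Theory potential, and simple gradient descent with finite differences stands in for TensorFlow. The numerical stage searches for points where the gradient of the potential vanishes, and the exact stage then checks each rounded candidate against the analytic gradient.

```python
def potential_gradient(x, y):
    """Analytic gradient of the toy potential V(x, y) = (x**2 - 1)**2 + y**2."""
    return 4.0 * x * (x * x - 1.0), 2.0 * y


def stationarity(x, y):
    """Squared gradient norm of V; it is zero exactly at equilibria of V."""
    gx, gy = potential_gradient(x, y)
    return gx * gx + gy * gy


def find_critical_point(x, y, learning_rate=0.005, steps=3000, h=1e-5):
    """Numerical stage: gradient descent on the stationarity measure."""
    for _ in range(steps):
        dx = (stationarity(x + h, y) - stationarity(x - h, y)) / (2.0 * h)
        dy = (stationarity(x, y + h) - stationarity(x, y - h)) / (2.0 * h)
        x -= learning_rate * dx
        y -= learning_rate * dy
    return x, y


def is_stable(x, y):
    """The Hessian of V is diagonal (V_xx = 12x^2 - 4, V_yy = 2), so the
    equilibrium is stable exactly when V_xx is positive."""
    return 12.0 * x * x - 4.0 > 0.0


# Deterministic grid of starting points (an arbitrary choice for this toy).
starts = [(sx, sy) for sx in (-1.2, -0.4, 0.4, 1.2) for sy in (-0.5, 0.5)]

candidates = set()
for sx, sy in starts:
    x, y = find_critical_point(sx, sy)
    # Adding 0.0 turns a possible -0.0 into 0.0 so duplicates collapse.
    candidates.add((round(x, 3) + 0.0, round(y, 3) + 0.0))

# Exact stage: keep only candidates whose analytic gradient is exactly zero.
verified = sorted(c for c in candidates if potential_gradient(*c) == (0.0, 0.0))
stability = {c: is_stable(*c) for c in verified}
```

In this toy, the search recovers both stable and unstable equilibria, loosely mirroring the "stable or unstable" solutions mentioned above; the real problem has a far more complicated potential.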

Conclusion
We hope that these results will be an important step in interpreting M-theory, and demonstrate how the research community can use new ML tools, such as TensorFlow, to approach other similarly complex problems. We are already applying the newly discovered methods in further theoretical physics research.

Acknowledgements
This research was conducted by Iulia M. Comşa, Moritz Firsching, and Thomas Fischbacher. Additional thanks go to Jyrki Alakuijala, Rahul Sukthankar, and Jay Yagnik for encouragement and support.

At SC19, GPU Accelerators Power Supercomputers to AI and Exascale

The Mile High City plays host next week to SC19, where GPUs will be key ingredients for computational science in some of the world’s most powerful supercomputers.

The race to AI and to exascale performance will be much of the buzz at the annual supercomputing event this year. For both, experts are relying on GPU accelerators.

In a special address Monday at 3pm MT, NVIDIA founder and Chief Executive Officer Jensen Huang will help kick off the conference. (Watch a mobile-friendly livestream here.) He’ll provide an in-depth look at the latest innovations in GPUs and how they’re transforming computational science and AI.

Modeling Brains, Earthquakes and More

A handful of demos at NVIDIA’s booth will give attendees a closeup look at how GPUs are pushing the envelope in science. NVIDIA Quadro RTX GPUs will host a visualization of an earthquake, and NVIDIA V100 Tensor Core GPUs will show a simulation of a human brain at nanometer-level resolution.

Ten partners will demo offerings using NVIDIA GPUs — ASRock Rack, Bright Computing, Boston, BOXX, Colfax, KISTI, Microway, One Stop Systems, Penguin Computing and Silicon Mechanics.

Huang’s overview is one of the first of many sessions on how GPUs can supercharge high performance computing with deep learning.

SC19 is host to three technical tracks, two panels and three invited talks that touch on AI or GPUs. For example, in one invited talk, a director from the Pacific Northwest National Laboratory will describe six top research directions to increase the impact of machine learning on scientific problems.

In another invited talk, the assistant director for AI at the White House Office of Science and Technology Policy will share the administration’s priorities in AI and HPC. She’ll detail the American AI Initiative the U.S. President announced in February.

Deep Dives in Deep Learning

A group of experts will give a deep dive Monday morning on how to tool high-performance computers for deep learning. They include senior engineers, scientists and researchers from Fraunhofer Institute, NVIDIA and Oak Ridge National Lab.

“Today we see excitement with machine learning being applied to many areas in computational science,” said Jack Dongarra, a professor at the University of Tennessee and one of three experts who maintain the TOP500 list of the world’s largest supercomputers. “As we go forward, I expect artificial intelligence to play an ever more important role in science.”

Back at NVIDIA’s in-booth theater, Marc Hamilton, vice president of solutions architecture and engineering, will kick off a slate of more than a dozen speakers, including talks from Mellanox on fast networking.

Other speakers will give updates on NVIDIA’s partnership to accelerate Arm-based supercomputers and on OpenACC, a parallel-programming model used in more than 200 applications. In a separate session Tuesday afternoon, Duncan Poole, president of OpenACC and a strategic partnership manager for NVIDIA, will host a birds-of-a-feather session on OpenACC.

Tracking the Race to Exascale

Meanwhile, many eyes are fixed on the exascale finish line for supercomputers able to calculate more than a quintillion floating-point operations per second, or 10^18 FLOPS. Getting to exascale, like breaking the petascale barrier in 2008, is a milestone in supercomputing that has recently galvanized the industry.

Arguably, the exascale era has already begun. Today’s most powerful supercomputer, the Summit system at Oak Ridge National Laboratory, has racked up a handful of exascale milestones. The 27,648 NVIDIA V100 Tensor Core GPUs in Summit can drive 3.3 exaflops of mixed-precision horsepower on AI tasks.
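As a quick sanity check on those figures, the short sketch below divides the quoted mixed-precision throughput by the quoted GPU count. The variable names are illustrative, and the result is only an implied average per GPU derived from the two numbers above, not a measured figure.

```python
EXA = 10 ** 18   # one exaflop, in floating-point operations per second
TERA = 10 ** 12  # one teraflop

summit_gpus = 27_648
summit_mixed_precision_flops = 3.3 * EXA

# Implied average mixed-precision throughput per GPU, in teraflops.
per_gpu_teraflops = summit_mixed_precision_flops / summit_gpus / TERA
print(f"{per_gpu_teraflops:.1f} teraflops per GPU")
```

That works out to roughly 119 teraflops of mixed-precision throughput per GPU.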

Harnessing some of that oomph, government and academic researchers shared the 2018 Gordon Bell Prize for using AI to determine the genetic roots of susceptibility to opioid addiction and chronic pain. Their work on one of America’s most pressing epidemics pushed the GPUs on Summit to 2.36 exaflops.

NVIDIA GPUs are now used in 125 of the TOP500 systems worldwide. Beyond Summit, they include the world’s second, sixth, eighth and 10th most muscular systems. Over the last several years, designers have increasingly relied on GPU accelerators to propel these big-iron beasts to new performance heights.

For more on NVIDIA events at SC19, check out our event page.

The post At SC19, GPU Accelerators Power Supercomputers to AI and Exascale appeared first on The Official NVIDIA Blog.

[Discussion] Create network to predict features instead of classify?

I'm new to ML, with a background in signal processing and traditional CS. I have been working on classifying raw audio from my custom-created datasets, and I want to move into prediction and then generation (GANs).

Is there any difference between GANs and prediction? For example, if I wanted to predict the next audio sample, how would my current classification CNN change? My goal is to be able to predict the next sentence from a recorded sentence.

submitted by /u/copythatpasta

[P] Replicate Toronto BookCorpus

Hey all,

I created a small Python repository called Replicate TorontoBookCorpus that one can use to replicate the no-longer-available Toronto BookCorpus (TBC) dataset.

As I’m currently doing research on transformers for my thesis, but could not find/get a copy of the original TBC dataset by any means, my only alternative was to replicate it. I figured I am not the only one with this issue, and thus made and published this small project.

As with the original TBC dataset, it only contains English-language books with at least 20k words. Furthermore, the total number of words in the replica dataset is also slightly over 0.9B. All in all, if you follow the steps outlined in the repository, you end up with a 5 GB text file with one sentence per line (and three blank sentences between books).
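For anyone who wants to consume a file in that layout, here is a minimal sketch of a reader that groups sentences back into books. It assumes "three blank sentences" means three consecutive empty lines; the function name `iter_books` and the file name in the usage comment are hypothetical and not part of the repository.

```python
from typing import Iterable, Iterator, List


def iter_books(lines: Iterable[str],
               blank_lines_between_books: int = 3) -> Iterator[List[str]]:
    """Yield each book as a list of sentences.

    Assumes one sentence per line and a run of at least
    `blank_lines_between_books` empty lines between consecutive books.
    """
    book: List[str] = []
    blank_run = 0
    for raw in lines:
        sentence = raw.strip()
        if not sentence:
            blank_run += 1
            continue
        # A long enough run of blank lines closes the previous book.
        if blank_run >= blank_lines_between_books and book:
            yield book
            book = []
        blank_run = 0
        book.append(sentence)
    if book:
        yield book


# Small in-memory sample in the assumed layout.
sample = ["First book, line one.\n", "First book, line two.\n",
          "\n", "\n", "\n",
          "Second book, line one.\n"]
books = list(iter_books(sample))

# Usage with a real file (hypothetical file name), streaming line by line:
# with open("books_large.txt", encoding="utf-8") as handle:
#     for book in iter_books(handle):
#         ...
```

Because the function only iterates over lines, it never needs to hold the whole file in memory.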

PS. If you have a copy of the original TBC dataset, please get in touch with me (I am desperately looking for the original)!

submitted by /u/SynonymOfHeat

[D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber’s team, won 4 image recognition challenges prior to AlexNet

probably many do not know this, I learned it by studying the references in section 19 of Jurgen’s very dense inaugural tweet

I knew AlexNet, the CUDA CNN by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton which won ImageNet 2012. But prior to AlexNet, Jurgen’s team with his “outstanding Romanian postdoc Dan Ciresan … won 4 important computer vision competitions in a row between May 15, 2011, and September 10, 2012” with an earlier CUDA CNN; let me call this DanNet. The blog post on their miraculous year links to a summary of these contests.

I saw a news article claiming that AlexNet started a deep learning revolution in 2012, but actually the references show that DanNet was the first superhuman CNN in 2011 and also won a medical imaging contest on images way bigger than AlexNet’s

the most cited DanNet paper is CVPR July 2012, 5 months before AlexNet at NIPS 2012, but earlier descriptions of DanNet appeared at IJCAI 2011 and IJCNN 2011

in his blog, Jurgen also cites CNN pioneers since Fukushima 1979, and GPU implementations of neural networks since Jung and Oh 2004

to be fair, AlexNet cites DanNet and admits that it is similar, however, it does not mention that DanNet won all those earlier challenges

ResNet beat AlexNet on ImageNet in 2015, but ResNet is actually a special case of the earlier highway networks, also invented in Jurgen’s lab, the “First Working Feedforward Networks With Over 100 Layers,” section 4 of The Blog links to an overview, he credits his students Rupesh Kumar Srivastava and Klaus Greff

there was a big reddit thread on section 5 of his blog, Jurgen’s GAN of 1990, and everybody knows LSTM, which won contests already in 2009, section 4 of The Blog, but I think many don’t know yet that his team also was first in the CUDA CNN game

submitted by /u/siddarth2947

Eni Doubles Up on GPUs for 52 Petaflops Supercomputer

Italian energy company Eni is upgrading its supercomputer with another helping of NVIDIA GPUs aimed at making it the most powerful industrial system in the world.

The news comes a little more than two weeks before SC19, the annual supercomputing event in North America. Growing adoption of GPUs as accelerators for the world’s toughest high performance computing and AI jobs will be among the hot topics at the event.

The new Eni system, dubbed HPC5, will use 7,280 NVIDIA V100 GPUs capable of delivering 52 petaflops of peak double-precision floating point performance. That’s nearly triple the performance of its previous 18 petaflops system that used 3,200 NVIDIA P100 GPUs.

When HPC5 is deployed in early 2020, Eni will have at its disposal 70 petaflops including existing systems also installed in its Green Data Center in Ferrera Erbognone, outside of Milan. The figure would put it head and shoulders above any other industrial company on the current TOP500 list of the world’s most powerful computers.

The new system will consist of 1,820 Dell EMC PowerEdge C4140 servers, each with four NVIDIA V100 GPUs and two Intel CPUs. A Mellanox InfiniBand HDR network running at 200 Gb/s will link the servers.
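The figures quoted for HPC5 can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this post; the variable names are illustrative, and the per-GPU value is an implied average, not a measured or vendor-quoted figure.

```python
servers = 1_820
gpus_per_server = 4
total_gpus = servers * gpus_per_server  # should match the 7,280 GPUs quoted

peak_petaflops = 52      # HPC5 peak double-precision performance
previous_petaflops = 18  # the earlier P100-based system

# Implied average double-precision peak per GPU, in teraflops.
per_gpu_teraflops = peak_petaflops * 1000 / total_gpus

# Ratio behind the "nearly triple" claim.
speedup = peak_petaflops / previous_petaflops

print(total_gpus, round(per_gpu_teraflops, 2), round(speedup, 2))
```

The server count and GPU count agree, each GPU contributes a little over 7 teraflops of peak double-precision performance on average, and the new system is about 2.9 times the old one.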

Green Data Center Uses Solar Power

Eni will use its expanded computing muscle to gather and analyze data across its operations. It will enhance its monitoring of oil fields, subsurface imaging and reservoir simulation and accelerate R&D in non-fossil energy sources. The data center itself is designed to be energy efficient, powered in part by a nearby solar plant.

“Our investment to strengthen our supercomputer infrastructure and to develop proprietary technologies is a crucial part of the digital transformation of Eni,” said Chief Executive Officer Claudio Descalzi in a press statement. The new system’s advanced parallel architecture and hybrid programming model will allow Eni to process seismic imagery faster, using more sophisticated algorithms.

Eni was among the first movers to adopt GPUs as accelerators. NVIDIA GPUs are now used in 125 of the fastest systems worldwide, according to the latest TOP500 list. They include the world’s most powerful system, the Summit supercomputer, as well as four others in the top 10.

Over the last several years, designers have increasingly relied on NVIDIA GPU accelerators to propel these beasts to new performance heights.

The SC19 event will be host to three paper tracks, two panels and three invited talks that touch on AI or GPUs. In one invited talk, a director from the Pacific Northwest National Laboratory will describe six top research directions to increase the impact of machine learning on scientific problems.

In another, the assistant director for AI at the White House Office of Science and Technology Policy will share the administration’s priorities in AI and HPC. She’ll detail the American AI Initiative announced in February.

The post Eni Doubles Up on GPUs for 52 Petaflops Supercomputer appeared first on The Official NVIDIA Blog.