As I am surveying the work published on disentanglement at NEURIPS — 2019, I would like to share some of the cheat-sheets, video lecture playlist and a github repo I have curated in this regard with the ML community here. Hope you find this useful in some way!
I am attempting to implement my own DQN without looking at the source code. I am torn between two possible approaches for evaluating the loss.
The basic approach could be to take the gradient over the selected action and to set the gradient of all other action value outputs to 0. My worry is that changes to improve the Q-value of the selected action will incidentally change the outputs for Q-values of other actions, meaning our other estimates will become less accurate. One way to solve this would be to set the labels for the non-chosen actions to be the current output of the Q-network, so that the network is incentivized not to change the output values of these actions. However, I have not seen this approach very much on the forums I’ve checked so I’m assuming it is bad for some reason.
Can anyone shed some light on which approach is better to take?
BERT is at work in Europe, tackling natural-language processing jobs in multiple industries and languages with help from NVIDIA’s products and partners.
The AI model formally known as Bidirectional Encoder Representations from Transformers debuted just last year as a state-of-the-art approach to machine learning for text. Though new, BERT is already finding use in avionics, finance, semiconductor and telecom companies on the continent, said developers optimizing it for German and Swedish.
“There are so many use cases for BERT because text is one of the most common data types companies have,” said Anders Arpteg, head of research for Peltarion, a Stockholm-based developer that aims to make the latest AI techniques such as BERT inexpensive and easy for companies to adopt.
Natural-language processing will outpace today’s AI work in computer vision because “text has way more apps than images — we started our company on that hypothesis,” said Milos Rusic, chief executive of deepset in Berlin. He called BERT “a revolution, a milestone we bet on.”
Deepset is working with PricewaterhouseCoopers to create a system that uses BERT to help strategists at a chip maker query piles of annual reports and market data for key insights. In another project, a manufacturing company is using NLP to search technical documents to speed maintenance of their products and predict needed repairs.
Peltarion, a member of NVIDIA’s Inception program that nurtures startups with access to its technology and ecosystem, packed support for BERT into its tools in November. It is already using NLP to help a large telecom company automate parts of its process for responding to product and service requests. And it’s using the technology to let a large market research company more easily query its database of surveys.
Work in Localization
Peltarion is collaborating with three other organizations on a three-year, government-backed project to optimize BERT for Swedish. Interestingly, a new model from Facebook called XLM-R suggests training on multiple languages at once could be more effective than optimizing for just one.
“In our initial results, XLM-R, which Facebook trained on 100 languages at once, outperformed a vanilla version of BERT trained for Swedish by a significant amount,” said Arpteg, whose team is preparing a paper on their analysis.
Nevertheless, the group hopes to have before summer a first version of a Swedish BERT model that performs really well, said Arpteg, who headed up an AI research group at Spotify before joining Peltarion three years ago.
An analysis by deepset of its German version of BERT.
In June, deepset released as open source a version of BERT optimized for German. Although its performance is only a couple percentage points ahead of the original model, two winners in an annual NLP competition in Germany used the deepset model.
Right Tool for the Job
BERT also benefits from optimizations for specific tasks such as text classification, question answering and sentiment analysis, said Arpteg. Peltarion researchers plans to publish in 2020 results of an analysis of gains from tuning BERT for areas with their own vocabularies such as medicine and legal.
The question-answering task has become so strategic for deepset it created Haystack, a version of its FARM transfer-learning framework to handle the job.
In hardware, the latest NVIDIA GPUs are among the favorite tools both companies use to tame big NLP models. That’s not surprising given NVIDIA recently broke records lowering BERT training time.
“The vanilla BERT has 100 million parameters and XML-R has 270 million,” said Arpteg, whose team recently purchased systems using NVIDIA Quadro and TITAN GPUs with up to 48GB of memory. It also has access to NVIDIA DGX-1 servers because “for training language models from scratch, we need these super-fast systems,” he said.
More memory is better, said Rusic, whose German BERT models weigh in at 400MB. Deepset taps into NVIDIA V100 Tensor Core 100 GPUs on cloud services and uses another NVIDIA GPU locally.
Let’s say we have a simple “autoencoding transformer” architecture:
encoder
bottleneck (Z)
decoder
We can train the model either using:
the Masked Language Model objective, where we mask random inputs / replace them with a null token, and measure the loss on reconstruction of the masked inputs
or the Autoencoding objective, where we don’t mask anything, and measure the loss on reconstruction of all inputs
Now we ask about the properties of Z – the latent representation of the data, after the model is trained. Will Z differ between the two objectives? How will it differ? Will it capture different information? Which loss will preserve more information in Z?
Does this have an obvious interpretation? Any intuitions?
I needed to blur faces on the video dataset I collected previously for privacy issues.
Surprisingly, I couldn’t find related (free)services nor programs.
Firstly, I wrote simple code for automatic blurred face maker with a Haar detector using OpenCV. I didn’t like the result. Haar detector was fast but not accurate enough for me.
Then, I tried the DNN model which OpenCV already includes. The performance got better but because the input of DNN is 300×300 and my input video is 1080×1920, faces from long-distance were often ignored.
So, I split the input video 3 to 5 divisions horizontally and vertically and computed face heatmap using the confidence of each detection result. Final face detection was done by contour detection from OpenCV.
The result seems that it can detect small faces from relatively high-resolution videos.
Any suggestions or related papers? would be more than welcome!
2019 was the year deep-learning driven AI made the leap from the rarefied world of elite computer scientists and the world’s biggest web companies to the rest of us.
Everyone from startups to hobbyists to researchers are picking up powerful GPUs and putting this new kind of computing to work. And they’re doing amazing things.
And with NVIDIA’s AI Podcast, now in its third year, we’re bringing the people behind these wonders to more listeners than ever, with the podcast reaching more than 650,000 downloads in 2019.
Here are the episodes that were listener favorites in 2019.
A Man, a GAN, and a 1080 Ti: How Jason Antic Created ‘De-Oldify’
You don’t need to be an academic or to work for a big company to get into deep learning. You can just be a guy with a NVIDIA GeForce 1080 Ti and a generative adversarial network. Jason Antic, who describes himself as “a software guy,” began digging deep into GANs. Next thing you know, he’s created an increasingly popular tool that colors old black-and-white shots. Interested in digging into AI for yourself? Listen and get inspired.
Sort Circuit: How GPUs Helped One Man Conquer His Lego Pile
At some point in life, every man faces the same great challenge: sorting out his children’s Lego pile. Thanks to GPU-driven deep learning, Francisco “Paco” Garcia is one of the few men who can say they’ve conquered it. Here’s how.
UC Berkeley’s Pieter Abbeel on How Deep Learning Will Help Robots Learn
Robots can do amazing things. Compare even the most advanced robots to a three-year-old, however, and they can come up short. UC Berkeley Professor Pieter Abbeel has pioneered the idea that deep learning could be the key to bridging that gap: creating robots that can learn how to move through the world more fluidly and naturally. We caught up with Abbeel, who is director of the Berkeley Robot Learning Lab and cofounder of Covariant AI, a Bay Area company developing AI software that makes it easy to teach robots new and complex skills, at GTC 2019.
How the Breakthrough Listen Harnessed AI in the Search for Aliens
UC Berkeley’s Gerry Zhang talks about his work using deep learning to analyze signals from space for signs of intelligent extraterrestrial civilizations. And while we haven’t found aliens, yet, the doctoral student has already made some extraordinary discoveries.
How AI Helps GOAT Keep Sneakerheads a Step Ahead
GOAT Group helps sneaker enthusiasts get their hands on authentic Air Jordans, Yeezys and a variety of old-school kicks with the help of AI. Michael Hall, director of data at GOAT Group, explains how in a conversation with AI Podcast host and raging sneakerhead Noah Kravitz.
Doing high quality, original work is what ultimately pushes society forward and improves people’s lives. If people started thinking that if they put something high quality and original on Github, a random youtuber would come along and claim it as their own work, then people might just start putting fewer things on GitHub to begin with, and the world would be a worse place. That’s why its so critical to not plagiarize. The ideal world we want to live in is one where the people who actually do high quality & original work are the ones who get the credit. It took my reputation blowing up in my face for me to realize that, both in the Neural Qubit paper case & my content more generally. I hope my painful fall serves as a valuable lesson to everyone else. This is my apology video.
He put together a list of GitHub repos he took code from to produce his videos.
With or without the aid of deep/machine learning, could this shift in paradigm actually solve a game like chess? As in, determine if the game is won or lost for either army at the very start. If so, how so? What’s the principle behind how it could solve it? I mean, it’s fairly clear how present encryption would be compromised. Checkers was solved over a decade ago without the use of a quantum computer, by the way.