I want to use technologies like TensorRT Inference Server or TensorFlow Serving to build a microservice architecture for analyzing video content with deep learning models (CNNs).
I have doubts about these points:
- What is the best way to store video files?
- What is the best way to extract frames and pass them to TensorFlow Serving or TensorRT Inference Server?
- Do I need a message broker like Apache Kafka, or can I use HTTP or gRPC directly to pass extracted frames to each microservice?
I have searched a lot but never found a similar architecture or any guidelines for managing inference on video files. Any advice would be appreciated; thanks in advance.
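On the frame-passing question, TensorFlow Serving does expose a REST predict endpoint (by default on port 8501) that accepts JSON payloads of the form `{"instances": [...]}`. A minimal Python sketch of wrapping a frame in such a request; the model name `mymodel`, the URL, and the helper names are placeholder assumptions, not a definitive design:

```python
import json


def frame_to_predict_request(frame):
    """Wrap one decoded frame (nested lists, H x W x 3) in a
    TensorFlow Serving REST predict payload."""
    return json.dumps({"instances": [frame]})


def sample_indices(total_frames, every_n):
    """Indices of the frames to analyze: every n-th frame,
    so not every frame of the video has to go through inference."""
    return list(range(0, total_frames, every_n))


def send_frame(frame, url="http://localhost:8501/v1/models/mymodel:predict"):
    # `requests` imported lazily so the helpers above stay dependency-free.
    import requests
    resp = requests.post(url, data=frame_to_predict_request(frame))
    resp.raise_for_status()
    return resp.json()["predictions"]
```

gRPC would follow the same shape (a `PredictRequest` per frame) with lower serialization overhead; a broker like Kafka only becomes necessary when producers and consumers need to be decoupled or buffered.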
Spiking neural networks are not currently the focus of most machine learning researchers, but there are several reasons why they are of interest:
- Spike-based communication is believed to be the primary way in which biological neurons interact, so spiking neuron models are of interest to computational neuroscientists.
- Special-purpose hardware (often called brain-inspired or neuromorphic hardware) can potentially deliver better power/performance numbers than deep learning hardware accelerators.
That said, there are few modern, machine-learning-focused libraries available for exploring spiking neural networks. We are in the early stages of creating one based on PyTorch (https://github.com/norse/norse). What we have published so far is enough to explore supervised learning on small datasets like MNIST and CIFAR-10.
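For readers unfamiliar with spiking neurons, here is a toy leaky integrate-and-fire (LIF) simulation in plain Python. This illustrates the neuron model itself, not Norse's API, and the time constant and threshold are arbitrary choices:

```python
def lif_simulate(inputs, tau=10.0, threshold=1.0):
    """Simulate a single leaky integrate-and-fire neuron.

    inputs: input current per timestep.
    Returns the list of timesteps at which the neuron spiked.
    """
    v = 0.0  # membrane potential
    spikes = []
    for t, current in enumerate(inputs):
        # Leaky integration: the potential decays toward the input current.
        v += (current - v) / tau
        if v >= threshold:
            spikes.append(t)  # emit a spike...
            v = 0.0           # ...and reset the membrane potential
    return spikes
```

A constant input well above the threshold produces a regular spike train, while a weak input never crosses the threshold and produces no spikes at all; information is carried by spike timing rather than continuous activations.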
Any feedback or comments would be appreciated. I'm also happy to discuss related state-of-the-art research.
I’ve recently started looking into speech synthesis, and notice that most of the focus is on text-to-speech.
I haven’t had much luck finding anything on speech-to-speech – that is, changing the voice of an audio clip to that of another person (e.g. by passing a voice embedding as an input to the model). I'm not sure what the actual term for it is. Is there much happening in this space, and if so, any recommendations on where to start? While not broadly applicable, it seems (on the surface) like it’d be a lot easier than TTS.
[Research] UCL Professor & MIT/Princeton ML Researchers Create YouTube Series on ML/RL — Bringing You Up to Speed With SOTA.
We started a new YouTube channel dedicated to machine learning. For now, we have four videos introducing machine learning, some maths, and deep RL. We are planning to grow this with various interesting topics, including optimisation, deep RL, probabilistic modelling, normalising flows, deep learning, and many others. We would also appreciate feedback on topics you would like to hear about, so we can make videos dedicated to them. Check it out here: https://www.youtube.com/channel/UC4lM4hz_v5ixNjK54UwPEVw/
and tell us what you want to hear about 😀 Thanks!!
Now, who are we? I am an honorary lecturer at UCL with 12 years of expertise in machine learning, and my colleagues include MIT, Penn, and UCL graduates.
National Pothole Day is Jan. 15. Its timing is no accident.
All over the Northern hemisphere, potholes are at their suspension-wrecking, spine-shaking worst this month.
Thanks to AI, one startup is working all year long to alleviate this menace. Benjamin Schmidt, president and co-founder of RoadBotics, is using the tech to pave the way to better roads.
His startup is identifying areas at risk of potholes, so city governments can improve roads before damage worsens.
Schmidt spoke with AI Podcast host Noah Kravitz about how RoadBotics is working with over 160 governments across the world to collect and analyze video data to improve preventative maintenance.
Key Points From This Episode:
- Using smartphones placed against car windshields, RoadBotics collects and analyzes video data to assign each road a score, which local governments can use to inform infrastructure decisions.
- RoadBotics protects privacy by blurring people, cars and other sensitive data so only roads are analyzed.
- Early this year, RoadBotics will release an app so anyone can use a smartphone to collect data and submit it to the company's neural network, helping improve the analysis.
“The sooner you can detect [surface distresses], the sooner you can put a cheaper intervention in now that really just saves the life of the road.” — Benjamin Schmidt [5:00]
“RoadBotics was founded at exactly the right moment with the right tech, the right hardware. So we’re now in this sweet spot where we can actually deploy a solution” — Benjamin Schmidt [6:46]
You Might Also Like
How Deep Learning Will Reshape Our Cities
Lynn Richards, president and CEO of the Congress for New Urbanism, and Charles Marohn, president and co-founder of Strong Towns, weigh in on the benefits of using AI to design cities, and simulating designs in VR prior to construction.
How AI Will Revolutionize Driving
Danny Shapiro, senior director of automotive at NVIDIA, explains the capabilities necessary for autonomous driving, from object detection to AI to high performance computing.
Where Is Deep Learning Going Next?
Bryan Catanzaro, head of applied deep learning research at NVIDIA, explains his journey in AI from UC Berkeley, to Baidu, to NVIDIA. He’s striving for AI that works so seamlessly that users don’t even notice it, and he explains how GPUs are helping to make that happen.
The post AI’s Mild Ride: RoadBotics Puts AI on Pothole Patrol appeared first on The Official NVIDIA Blog.
I’m a student currently trying to create a DL model that can identify a fox in an image. The initial plan was to have a robot detect a fox and chase it. As a starting point, I’d want a DL model that is actually able to identify a fox!
The biggest problem at the moment is gathering training data. Does anyone have any advice on where or how I can get many pictures of foxes to use for training? Would using videos and splitting them into frames work? I initially wanted to use night-vision footage, as the device would work at night; is there any way to convert normal pictures to night vision?
Any advice would be appreciated, thanks 🙂
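Splitting videos into frames is a common way to build such a dataset, and it can be sketched with OpenCV. This is a sketch only: it assumes `cv2` is installed, and the path, output directory, and helper names are placeholders:

```python
def frame_stride(fps, frames_per_second_wanted):
    """How many source frames to skip so we keep roughly N frames per second."""
    return max(1, int(round(fps / frames_per_second_wanted)))


def extract_frames(video_path, out_dir, frames_per_second_wanted=1):
    """Save every n-th frame of a video as a JPEG training image.

    Returns the number of frames written."""
    # cv2 imported lazily so frame_stride stays dependency-free.
    import os
    import cv2
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    stride = frame_stride(cap.get(cv2.CAP_PROP_FPS) or 30.0,
                          frames_per_second_wanted)
    saved = index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video (or read error)
            break
        if index % stride == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

Keeping only one or a few frames per second avoids flooding the dataset with near-duplicate images, which would otherwise inflate the training set without adding variety.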
Let’s say we want to translate between two sequences that share the same vocabulary.
We assume that the vocabulary is: V = [A,B,C,D,E,F,G]
We have this parallel data:
Source: [A B C C, A F G]
Target: [E B C C, E F G]
This was just an example.
If we want to represent any sequence, we can use a vector that contains the counts of each element of the vocabulary.
So A B C C = [1,1,2,0,0,0,0]
A F G = [1,0,0,0,0,1,1]
E B C C = [0,1,2,0,1,0,0]
E F G = [0,0,0,0,1,1,1]
Since we said that A B C C = E B C C and A F G = E F G, their vectors must match to some extent. For example, we could have something like this:
A B C C = E B C C = [1,1,2,0,1,0,0]
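For concreteness, the count vectors above can be computed with a tiny helper (a sketch; `count_vector` is a hypothetical name):

```python
VOCAB = ["A", "B", "C", "D", "E", "F", "G"]


def count_vector(sequence, vocab=VOCAB):
    """Represent a sequence by the count of each vocabulary symbol,
    in vocabulary order."""
    return [sequence.count(symbol) for symbol in vocab]
```

So `count_vector(["A", "B", "C", "C"])` gives `[1, 1, 2, 0, 0, 0, 0]`, matching the vectors written out above; note this bag-of-symbols representation discards the order of the sequence.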
My first idea was to train a seq2seq model and extract the encoder's representation of the sequence. But it seems that the encoder encodes only the source sequence's representation, not the mapping.
Is there any algorithm that can perform this task?
I am developing an image-to-text conversion system using the EAST text detector and pytesseract.
For tilted text portions in the image, I have found the tilt angle and the start point of the text boundary, and used the following code to rotate the image through the given angle about a given pivot point.
    def rotateImage(img, angle, pivot):
        # Pad so that the pivot point ends up at the image center,
        # then rotate about the center and crop the padding back off.
        padX = [img.shape[1] - pivot[0], pivot[0]]
        padY = [img.shape[0] - pivot[1], pivot[1]]
        imgP = np.pad(img, [padY, padX], 'constant')
        imgR = scipy.ndimage.rotate(imgP, angle, reshape=False)
        return imgR[padY[0]:-padY[1], padX[0]:-padX[1]]
The angle and pivot are given as int values, but even then I am sometimes getting a TypeError saying "pad_width must be of integral type".
What is the problem here, and how do I solve it?
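np.pad raises exactly this TypeError whenever any pad width is a non-integer, which easily happens when the pivot comes out of a float computation (for example, dividing coordinates with `/`, which yields floats even for int operands). A minimal sketch of the failure and one possible fix, assuming NumPy is installed; the variable names are illustrative:

```python
import numpy as np

img = np.zeros((10, 10))
pivot = (10 / 2, 10 / 2)  # `/` produces 5.0, not 5

# These pad widths are floats, so passing them straight to np.pad raises
# "TypeError: `pad_width` must be of integral type."
pad_y = [img.shape[0] - pivot[1], pivot[1]]

# Fix: cast each pad width to int before calling np.pad.
pad_y_fixed = [int(round(w)) for w in pad_y]
padded = np.pad(img, [pad_y_fixed, pad_y_fixed], 'constant')
```

So even if the values you pass into the function look like whole numbers, check whether they are Python floats (or NumPy float scalars) and cast them before they reach np.pad.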