[P][R] ML in the embedded world
Over the last couple of weeks I’ve started experimenting with ML. As an electronics engineer I focus on the embedded domain and, in recent years, on embedded Linux. In the last few months the semiconductor industry has turned to ML (they like to call it AI), and almost all new CPUs and MCUs now come out with some kind of AI accelerator. The software support for that HW is still quite bad though, so there is plenty of HW and no SW, but I guess it will get better in the future.
That said, I thought it was the right time to get involved, and I wanted to experiment with ML in both the low-end embedded and the embedded Linux domains, providing real working examples and source code for everything. The result was a series of 5 blog posts, which I’ll list here with a brief description of each one.
- [ML on embedded part 1]: In this post there’s a naive implementation of a 3-input, 1-output neuron that is benchmarked on various MCUs (stm32f103, stm32f746, arduino nano, arduino leonardo, arduino due, teensy 3.2, teensy 3.5 and the esp8266).
- [ML on embedded part 2]: In this post I’ve implemented another naive NN with 3 inputs, 32 hidden nodes and 1 output. Again, the same MCUs were tested.
- [ML on embedded part 3]: Here I’ve ported tensorflow lite for microcontrollers to build with cmake for the stm32f746, and I’ve also converted an MNIST keras model that I found in a book to tflite. I’ve also created a jupyter notebook in which you can hand-draw a digit and then run the inference on the stm32 from within the notebook.
- [ML on embedded part 4]: After the results I got from part 3, I thought it would be interesting to benchmark ST’s x-cube-ai framework to do 1-to-1 comparisons with tflite-micro on the same model and MCU.
- [ML on embedded part 5]: As all the previous posts were about edge ML, here I’ve implemented a cloud acceleration server using a Jetson nano. I developed a simple TCP server in python that runs inferences on the same tflite model I used in parts 3 & 4. Then I wrote a simple firmware for the ESP8266 that sends random input arrays, serialized with flatbuffers, to the “AI cloud server” via TCP and gets the result back. I’ve run some benchmarks and compared the results with the edge implementation.
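
For anyone who wants a feel for what parts 1 and 2 actually benchmark, here is a minimal Python sketch of a 3-input, 1-output neuron and a 3-input, 32-hidden, 1-output forward pass. This is only an illustration and not the code from the posts, which run on the MCUs themselves. The sigmoid activation, the missing bias term, the function names and all the weight values below are my assumptions and are made up for the example.

```python
import math


def sigmoid(x):
    """Logistic activation (assumed here; the posts may use something else)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights):
    """Naive neuron: weighted sum of the inputs, then the activation.

    No bias term is used (an assumption made to keep the sketch small).
    """
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)))


def forward_3_32_1(inputs, w_hidden, w_out):
    """Forward pass of a 3-input, 32-hidden, 1-output network.

    w_hidden holds 32 rows of 3 weights, w_out holds 32 weights.
    """
    hidden = [neuron(inputs, row) for row in w_hidden]
    return neuron(hidden, w_out)


# Made-up, deterministic weights, for illustration only (not trained values).
W_NEURON = [0.5, -0.25, 0.75]
W_HIDDEN = [[((r * 3 + c) % 7 - 3) / 10.0 for c in range(3)] for r in range(32)]
W_OUT = [((k % 5) - 2) / 10.0 for k in range(32)]

if __name__ == "__main__":
    sample = [1.0, 0.0, 1.0]
    print("single neuron:", neuron(sample, W_NEURON))
    print("3-32-1 network:", forward_3_32_1(sample, W_HIDDEN, W_OUT))
```

On an MCU the same arithmetic would be written in C/C++ and timed per inference, which is what makes it a handy micro-benchmark: the 3-32-1 case is just 32 of the small weighted sums plus one larger one.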
Although these posts are more interesting for embedded engineers, I think they also fit in here.