
As a student in a Master's in Computer Vision, I decided to leverage my background in electronics and look for a research group working on real-time image processing. The professor of an HLS methods for FPGAs course I took in Hungary suggested that I work on real-time inference of deep neural networks for edge devices.

Intelligent IoT requires power-efficient DNN models. -- Credit: Pixabay/CC0 Public Domain

Edge devices are usually constrained in computing power and energy availability, so it is often impractical to port the full floating-point DNN models that run on power-hungry GPUs to these platforms. One common option is to reduce the numerical precision of the operations carried out during inference by quantizing weights and activations, making them cheaper to compute.
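To give a rough idea of what quantization means here, below is a minimal sketch of symmetric 8-bit quantization of a weight tensor. The function names are illustrative only; real toolchains add calibration data, per-channel scales, and quantization-aware training.

```python
import numpy as np

# Minimal sketch of symmetric uniform quantization to int8 (illustrative only).

def quantize_int8(weights):
    """Map float weights to int8 values plus a scale factor, so that w ~ scale * q."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```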

An extreme case of quantization is the deployment of Binary Neural Networks (BNNs), in which the arithmetic operations of a neural network are simplified to the point where weights and activations are binary. A method for training Binary Neural Networks was first proposed by Bengio et al. (2016). Under this paradigm, Umuroglu et al. (2016) showed that matrix-vector multiplications, the backbone operation of deep learning inference, can be replaced with logical XORs and popcount operations. These are especially cheap and highly parallelizable on an FPGA, which makes this kind of neural network very promising for such devices, achieving high throughput and energy efficiency.
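To make the arithmetic concrete, here is a minimal sketch (the names `pack_bits` and `binary_dot` are made up for illustration) of how a dot product over {-1, +1} values reduces to an XOR followed by a popcount once the values are packed as bits:

```python
# Sketch: a {-1, +1} dot product computed with XOR + popcount.

def pack_bits(values):
    """Encode a list of {-1, +1} values as an integer bit vector (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(a_bits, w_bits, n):
    """Dot product of two {-1, +1} vectors given their bit-packed encodings.

    XOR marks the positions where the operands differ (elementwise product -1);
    the popcount of the XOR counts them, so dot = n - 2 * mismatches.
    """
    mismatches = bin(a_bits ^ w_bits).count("1")  # popcount of the XOR
    return n - 2 * mismatches

# Quick check against the ordinary dot product.
a = [1, -1, -1, 1, 1, -1]
w = [1, 1, -1, -1, 1, 1]
assert binary_dot(pack_bits(a), pack_bits(w), len(a)) == sum(x * y for x, y in zip(a, w))
```

On an FPGA, the XOR and popcount over a whole packed word happen in a handful of LUTs per bit and can be replicated in parallel, which is what makes this mapping so attractive.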

One of the challenges we faced was that, during the second semester in Spain, we wouldn't have access to any suitable FPGA platform, so I decided to buy my own. It turned out to be the smallest FPGA System-on-Chip on the market; the lack of programmable logic real estate would prove to be challenging... but I learned so much while trying to circumvent my board's limitations!

You can learn more about my work on BNNs in this article I published on Hackster.io.