Current vision systems pair sensors, such as digital cameras, with computing devices. The problem with such systems is that they perceive the environment only after the visual information has been recorded and transmitted. The transmitted data also tends to include a fair amount that is irrelevant to the user, whether machine or human; an autonomous vehicle, for example, may capture details about leaves on a tree at the side of the road. Processing such details consumes power and time, reducing the efficiency of the system.
A Convolutional Neural Network (CNN) on the SCAMP-5D vision system classifying hand gestures at 8,200 frames per second. Courtesy of University of Bristol.
“We can borrow inspiration from the way natural systems process the visual world — we do not perceive everything — our eyes and our brains work together to make sense of the world, and in some cases, the eyes themselves do processing to help the brain reduce what is not relevant,” said Walterio Mayol-Cuevas, professor in robotics, computer vision, and mobile systems at the University of Bristol and principal investigator on the project.
The collaboration yielded two papers on the subject, one led by Laurie Bose and the other by Yanan Liu, and two refinements toward the goal of more efficient intelligent cameras. The researchers implemented convolutional neural networks (CNNs) directly on the image plane. The CNNs developed by the team classified frames at thousands of frames per second, without ever needing to record them or send them down the processing pipeline. The researchers demonstrated the technology by classifying handwritten numbers, hand gestures, and even plankton.
The work was made possible by the SCAMP architecture developed by Piotr Dudek, professor of circuits and systems at the University of Manchester, and his team. SCAMP is a camera-processor chip that the team describes as a pixel processor array (PPA). A PPA embeds a processor in each pixel, allowing the pixels to communicate and process data in truly parallel fashion, which is ideal for CNNs and other vision algorithms.
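Because every pixel in a PPA has its own processor and can exchange data with its neighbours, image-wide operations such as the filtering at the heart of a CNN layer reduce to a handful of parallel shift-and-accumulate steps. The following is a minimal NumPy sketch of that idea only; it is not the SCAMP-5 API, and the `shift` and `filter3x3` names are hypothetical.

```python
import numpy as np

def shift(img, dy, dx):
    """Shift the whole register array by (dy, dx), zero-padding the edges.
    On a PPA, every pixel processor receives its neighbour's value in a
    single parallel step; here the array shift stands in for that."""
    h, w = img.shape
    out = np.zeros_like(img)
    src_y = slice(max(dy, 0), h + min(dy, 0))
    src_x = slice(max(dx, 0), w + min(dx, 0))
    dst_y = slice(max(-dy, 0), h + min(-dy, 0))
    dst_x = slice(max(-dx, 0), w + min(-dx, 0))
    out[dst_y, dst_x] = img[src_y, src_x]  # out[y, x] = img[y+dy, x+dx]
    return out

def filter3x3(img, kernel):
    """3x3 filtering (cross-correlation) as nine shift-and-accumulate
    passes over the whole array -- the style of data movement a pixel
    processor array performs in parallel across all pixels at once."""
    acc = np.zeros_like(img, dtype=float)
    for ky in range(3):
        for kx in range(3):
            acc += kernel[ky, kx] * shift(img, ky - 1, kx - 1)
    return acc
```

For example, filtering with a kernel that is 1 at the centre and 0 elsewhere returns the image unchanged, while a kernel of all ones sums each pixel's 3x3 neighbourhood; the key point is that the cost is nine array-wide steps regardless of image size.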
SCAMP-5D vision system. Courtesy of The University of Manchester.
“Integration of sensing, processing, and memory at the pixel level is not only enabling high-performance, low-latency systems, but also promises low-power, highly efficient hardware,” Dudek said. “SCAMP devices can be implemented with footprints similar to current camera sensors, but with the ability to have a general purpose massively parallel processor right at the point of image capture.”
“What is so exciting about these cameras is not only the newly emerging machine learning capability, but the speed at which they run and the lightweight configuration,” said Tom Richardson, senior lecturer in flight mechanics at the University of Bristol. Richardson has been working to integrate the technology into lightweight drones.
The research could lead to dedicated intelligent AI cameras: visual systems that send high-level information, such as the type of object or event taking place in front of the camera, to the rest of the system. This approach could make systems more efficient and secure, since no images would need to be recorded.
The research papers were presented at the European Conference on Computer Vision 2020.