Dangerous flaws in computer vision

And how to fix them

Computer vision in the context of Artificial Intelligence consists of several common tasks, such as image classification, object detection, semantic segmentation, and instance segmentation. Although these tasks differ, they all rely on the same underlying technology: deep neural networks.

Easily Fooled

Deep neural networks are very powerful, but they can be easily fooled and confused by adversarial attacks. So much so that researchers at Northeastern, IBM, and the Massachusetts Institute of Technology created a t-shirt that makes its wearer invisible to a large number of computer vision systems (source).


T-shirt that makes the wearer invisible to some computer vision algorithms. source
This could make it possible to bypass security simply by wearing a special t-shirt. It may sound like James Bond material, but in reality it has wider implications: it makes it entirely possible to sabotage AI models and cause dangerous situations. A few years ago, a group of researchers showed how to fool a self-driving car by putting stickers on a stop sign, causing the car to run the stop without slowing down (source).

Both of these cases were explicitly designed to beat the AI system, but that does not exclude the possibility of, say, a child wearing a colorful t-shirt wandering into the street and a self-driving car failing to detect them.

Modifications to a stop sign that caused the AI model to interpret it as a 45 mph speed limit sign, source

Why is this the case?

Computer vision models today are trained using a method called maximum likelihood estimation (MLE). This has the consequence that the model becomes overconfident based on the presence of just a few key features in the image, rather than the full collection of features.
To see an explanation of this, check out our blog post on the need for uncertainty in medical imaging. It shows how a simple model trained to classify handwritten digits mistakes the letter X for an 8, simply because the two crossing lines in the middle of an X match the middle of an 8.
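The overconfidence problem can be illustrated with the softmax output layer used by most MLE-trained classifiers. In this sketch the logits are made up for illustration: one class scores higher only because a couple of key features matched, yet the reported probability is close to certainty.

```python
import numpy as np

def softmax(z):
    """Standard softmax over a vector of logits."""
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits from an MLE-trained classifier: class 2 scores
# higher only because a few key features (e.g. two crossing lines)
# happened to match -- the rest of the image is ignored.
logits = np.array([1.0, 1.0, 6.0, 1.0])
probs = softmax(logits)
print(probs[2])  # ~0.98: near-certain, despite flimsy evidence
```

Because softmax only compares logits relative to each other, it cannot express "none of the classes fit"; the winning class is always reported with high confidence.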

Today's common workaround is to train the models on images that are distorted in various ways (data augmentation), hoping that this exposure will make the model more robust. This works to some extent, but it relies on guessing which distortions the model will actually encounter; it does not solve the fundamental problem.
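A minimal sketch of this kind of augmentation, using plain numpy rather than any particular training framework. The specific distortions (flip, brightness shift, pixel noise) and their magnitudes are illustrative choices, not a prescribed recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Apply a few illustrative distortions to one training image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                             # random horizontal flip
    out = out + rng.uniform(-0.1, 0.1)                 # random brightness shift
    out = out + rng.normal(0.0, 0.05, size=out.shape)  # additive pixel noise
    return np.clip(out, 0.0, 1.0)                      # keep valid pixel range

image = rng.random((32, 32))   # stand-in for a real training image
batch = [augment(image, rng) for _ in range(8)]
```

The catch is visible in the code itself: every distortion must be chosen in advance, so an adversarial pattern outside this list still slips through.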

How to solve this problem

The goal should not simply be the highest possible accuracy, but a robust model that can say what it is sure about and when it is unsure. With this knowledge of uncertainty, one gains a whole new dimension of information to use in decision-making. The model may still fail to detect the person wearing the t-shirt mentioned above, but it would show a large spike in uncertainty that can be acted upon.

The way to introduce this uncertainty estimation is to move beyond the shortcut of maximum likelihood estimation and use a field of statistics called Bayesian inference. In fact, maximum likelihood estimation can be derived from Bayesian inference by making a series of simplifying assumptions. Using Bayesian inference yields a robust model and enables proper uncertainty estimation directly. This means that if the model sees data very different from what it was trained on, it becomes more uncertain. Likewise, if it sees something with multiple plausible interpretations, it becomes more uncertain to signify this.
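One practical approximation of Bayesian inference in deep networks is Monte Carlo dropout: keep dropout active at test time, run several stochastic forward passes, and read the spread of the predictions as uncertainty. The toy network and weights below are hypothetical, purely to show the mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy two-layer network with arbitrary fixed weights (illustration only).
W1 = rng.normal(size=(10, 32))
W2 = rng.normal(size=(32, 3))

def stochastic_forward(x, rng, p_drop=0.5):
    """One forward pass with a random dropout mask kept at test time."""
    h = np.maximum(x @ W1, 0.0)           # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop   # Monte Carlo dropout mask
    h = h * mask / (1.0 - p_drop)         # rescale surviving units
    return softmax(h @ W2)

x = rng.normal(size=(10,))                # stand-in for an input image
samples = np.stack([stochastic_forward(x, rng) for _ in range(100)])
mean_probs = samples.mean(axis=0)         # predictive mean over passes
# Predictive entropy: high when the stochastic passes disagree.
entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
```

Instead of a single overconfident probability, the model now reports a distribution over predictions, and the entropy of the averaged output serves as the uncertainty signal described above.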


Here is a video we made showing how uncertainty adds a new dimension of information for an autonomous vehicle. On the left is the original video; in the middle is the standard semantic segmentation, where the color overlays indicate the predicted class. On the right is the uncertainty estimation: darker areas indicate where the model is certain, and lighter areas indicate where it is uncertain.

Read More