Image annotation companies are constantly investing in new ways to reduce the time needed to annotate a database full of images. One of the most powerful techniques is automatic image annotation, which can drastically speed up the annotation process. However, automatic image annotation has drawbacks, and human annotators still have a role to play.
What are the pros and cons of using automatic image annotation? What are the pros and cons of using human annotators?
Let’s take a closer look at the image annotation process and automatic image annotation in order to understand the answers to these questions.
Good image annotations need to be accurate in two senses: objects must be labeled with the correct class, and the pixels that contain each object must be selected precisely.
It’s extremely important that when objects are annotated the correct label is applied to the object. If an incorrect class is applied to the object, the image classifier will take these incorrect features into account when recognizing objects in the future and the classifier’s accuracy will be degraded.
An object may receive an incorrect annotation label when the dataset contains similar-looking objects that belong to different classes. As an example, many cars look very similar, especially those of the same make and model that differ only by year. An annotator may mistakenly give one car the label that belongs to another.
Good annotations must also be pixel accurate. The pixels that are assigned to an object should genuinely be part of the object, or else the accuracy of the image classifier will be degraded just as in the case above. If pixels which aren’t part of an object are tagged as part of the object, the classifier will take the wrong features into account when creating a representation of the object.
Inaccurate pixel labels can occur when bounding boxes include too much of an image's background inside their boundaries. They can also occur during semantic segmentation, when the threshold for assigning a pixel to one object or another is set improperly and the resulting semantic regions contain portions of other objects.
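Both failure modes can be measured on a toy example. The sketch below is only illustrative: the function names, the tiny hand-made mask, and the per-pixel scores are all assumptions, not part of any particular annotation tool. It computes how much of a bounding box is background, and how many pixels end up mislabeled when a segmentation threshold is set too low.

```python
def box_background_fraction(mask, box):
    """Fraction of pixels inside `box` that do NOT belong to the object.

    mask: 2D list of 0/1 values (1 = object pixel), a hypothetical ground truth.
    box:  (top, left, bottom, right); bottom and right are exclusive.
    """
    top, left, bottom, right = box
    total = (bottom - top) * (right - left)
    if total == 0:
        raise ValueError("box has zero area")
    object_pixels = sum(
        mask[r][c] for r in range(top, bottom) for c in range(left, right)
    )
    return 1 - object_pixels / total


def threshold_mask(scores, threshold):
    """Mark a pixel as object (1) when its score is at least `threshold`."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def wrong_pixels(predicted, truth):
    """Count pixels where the predicted mask disagrees with the true mask."""
    return sum(
        p != t
        for p_row, t_row in zip(predicted, truth)
        for p, t in zip(p_row, t_row)
    )


# Toy 6x6 image containing a 2x3 object (made-up data).
mask = [[0] * 6 for _ in range(6)]
for r in range(2, 4):
    for c in range(1, 4):
        mask[r][c] = 1

tight_fraction = box_background_fraction(mask, (2, 1, 4, 4))  # hugs the object
loose_fraction = box_background_fraction(mask, (0, 0, 6, 6))  # whole image

# Hypothetical per-pixel object scores and the matching true mask.
scores = [[0.9, 0.8, 0.4], [0.7, 0.6, 0.3]]
truth = [[1, 1, 0], [1, 1, 0]]
errors_good = wrong_pixels(threshold_mask(scores, 0.5), truth)
errors_low = wrong_pixels(threshold_mask(scores, 0.2), truth)

print(tight_fraction, round(loose_fraction, 3), errors_good, errors_low)
```

The tight box contains no background, while the loose one is mostly background. Likewise, lowering the threshold pulls background pixels into the object region.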
Automatic image annotation is carried out with the assistance of deep neural networks trained to classify images, much like those that will eventually be trained on the annotated images. The type of network typically used in automatic image annotation is a Convolutional Neural Network, or CNN. CNNs specialize in interpreting image data by forming representations of the pixel values within the image. These representations are created by the network's convolutional layers, which slide small learned filters across the image and produce feature maps that respond to patterns such as edges, colors, and changes in brightness. The feature maps are then passed to the densely connected layers of the network, which learn which combinations of features represent different objects.
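To make the idea of a convolutional filter concrete, here is a minimal sketch in plain Python. The function name, the toy image, and the hand-written edge filter are all assumptions. In a real CNN the filter values are learned during training, and a library would do this work. Like most deep learning libraries, the sketch applies the filter without flipping it, which is technically cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written filter that responds where brightness increases left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 18, 0], [0, 18, 0]]
```

The feature map is zero over the flat regions and large only at the boundary between dark and bright pixels, which is the sense in which a filter "detects" a feature.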
CNNs use an operation called max pooling to make representations of images smaller and simpler. A max pooling layer slides a small window across its input and keeps only the largest value in each window, discarding the smaller values. Applying this to every region of the input produces a reduced version that is easier for the neural network to analyze but still contains the most relevant features. Max pooling layers play an important role in automatic image annotation because they enable CNNs to process images more quickly. This becomes even more important as the database grows, since total processing time scales with the number of images.
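Max pooling is simple enough to write out by hand. The following is a minimal sketch, assuming non-overlapping 2x2 windows, which is a common choice but not the only one. The function name and the input values are made up for illustration.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling.

    Rows or columns that do not fill a complete window are dropped.
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 input shrinks to 2x2, so later layers have a quarter as many values to process, while the strongest response in each region survives.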
Automatic image annotation systems work by translating image data into semantic data. A CNN-based image classifier is first trained to recognize the objects in a target dataset, and then each image that needs to be annotated is fed into the classifier. The classifier extracts the visual content from the image and distinguishes the relevant objects. The objects it recognizes are then checked against a database containing the objects the user wants annotated, and the matching labels from the database are attached to the image.
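The final matching step can be sketched as a simple filter. This is a hedged illustration, not a description of any specific product: the `annotate` function, the confidence cutoff, and the hard-coded predictions standing in for real classifier output are all assumptions.

```python
def annotate(predictions, wanted_labels, min_confidence=0.5):
    """Keep predictions the user asked for that the classifier is confident in.

    predictions:   list of (label, confidence) pairs, standing in for
                   the output of a trained classifier.
    wanted_labels: set of labels the user wants annotated.
    """
    return [
        {"label": label, "confidence": confidence}
        for label, confidence in predictions
        if label in wanted_labels and confidence >= min_confidence
    ]


# Made-up classifier output for a single image.
predictions = [
    ("car", 0.92),
    ("tree", 0.88),
    ("pedestrian", 0.41),
    ("bicycle", 0.67),
]
wanted_labels = {"car", "pedestrian", "bicycle"}

annotations = annotate(predictions, wanted_labels)
for annotation in annotations:
    print(annotation)
```

Here "tree" is dropped because the user did not ask for it, and "pedestrian" is dropped because the classifier's confidence falls below the cutoff. In practice, that cutoff is one of the settings that trades annotation coverage against accuracy.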
While automatic image annotation techniques can produce annotations through the method described above, the annotations produced by the automatic system are often less accurate than the annotations made by human annotators. Given that this is true, why do companies use automatic image annotation techniques?
The benefit of automatic image annotation is that autonomous annotation systems can create annotations much more quickly than human annotators can.
Human annotators must painstakingly draw bounding boxes around objects, draw lines across images, and define semantic regions. Creating an individual annotation can be time-consuming in itself and the time invested only grows the more images there are in the target dataset. As computers don’t need to interpret an image on a screen with eyes and then move an arm to create an annotation, they can produce annotations much faster than even the best human annotators.
While computers can create annotations faster than human annotators can, human annotators retain several advantages over autonomous annotation systems. Annotations produced by autonomous annotation algorithms are inaccurate much more often than human-created annotations. Human annotators have an advantage when annotating similar objects that might confuse an autonomous system. As such, while human annotators are slower, the annotations they create are more likely to be high quality and properly labeled.
Human annotators have another advantage over automatic annotation: they can quickly adapt to new annotation requirements. For example, if new classes of objects have to be annotated and added to a database, it is much easier to instruct human annotators to take this into account. In contrast, an automatic annotation platform needs to be retrained to account for the new image classes, which can take quite a bit of time.
Because both human annotators and automatic image annotation systems have their pros and cons, combining the two approaches can offer the best of both worlds. Semi-automatic image annotation uses computer algorithms to help human annotators create annotations more quickly by providing suggested tags. These systems also let users give feedback on the suggested tags, and that feedback can then be used to make future suggestions more accurate.
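The suggest-and-feedback loop can be sketched in a few lines. This is a toy model under stated assumptions: the `TagSuggester` class, the fixed score adjustment per piece of feedback, and the example scores are all hypothetical, and a production system would more likely retrain or fine-tune its model on the feedback.

```python
from collections import defaultdict


class TagSuggester:
    """Toy suggester that reorders model tags using accept/reject feedback."""

    def __init__(self):
        # Running per-tag adjustment learned from annotator feedback.
        self.adjustment = defaultdict(float)

    def suggest(self, model_scores, top_k=2):
        """Return the top_k tags ranked by model score plus adjustment."""
        ranked = sorted(
            model_scores,
            key=lambda tag: model_scores[tag] + self.adjustment[tag],
            reverse=True,
        )
        return ranked[:top_k]

    def feedback(self, tag, accepted, step=0.2):
        """Nudge a tag up when the annotator accepts it, down when rejected."""
        self.adjustment[tag] += step if accepted else -step


suggester = TagSuggester()
model_scores = {"sedan": 0.6, "coupe": 0.55, "truck": 0.5}  # made-up scores

before = suggester.suggest(model_scores)
suggester.feedback("sedan", accepted=False)  # annotator rejects "sedan"
suggester.feedback("truck", accepted=True)   # annotator confirms "truck"
after = suggester.suggest(model_scores)

print(before)  # ['sedan', 'coupe']
print(after)   # ['truck', 'coupe']
```

The human still makes the final call on every tag, but each correction changes what the system offers next, so the suggestions should need fewer corrections over time.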