Image annotation enables computers to better interpret images and video, and the technique can be applied to many different tasks and disciplines.
Let’s take a closer look at image annotation to better understand its main applications, as well as the various annotation techniques that support them.
Before we dive into examining applications of image annotation, it is important to take a second and make sure that we understand what image annotation is.
Image annotation is the process of adding new metadata to existing data. In the case of computer vision this typically means surrounding important objects with structures like bounding boxes, which helps the computer recognize similar objects in the future. Image annotation speeds up the process of pattern recognition when a computer vision system is presented with new data, and this useful technique has many different applications.
Image annotation is typically done by humans, and it can be very time intensive, although crowdsourcing the process can help speed it up. There are also some semi-autonomous systems that can automatically label aspects of images.
Different techniques exist to annotate images, each with its own specific use cases. Image annotation techniques include bounding boxes, polygonal bounding, keypoint creation, semantic segmentation, instance segmentation, and lines/splines.
Before covering the main applications of image annotation, let’s quickly examine some of the different techniques used in image annotation.
Bounding boxes are one of the primary image annotation methods. A bounding box is drawn around an object at a certain location in a given image or video frame, and it assists in the general recognition of objects. Bounding boxes show the network where to look to discover an object, assisting the network in finding the relevant patterns needed to recognize it.
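To make the idea concrete, here is a minimal Python sketch of how a bounding-box annotation might be represented and compared. The `BoundingBox` fields and the intersection-over-union helper are illustrative assumptions, not tied to any particular annotation tool; IoU is a standard way to measure how well two boxes (say, an annotator's and a model's) agree.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Axis-aligned box in pixel coordinates: top-left corner plus size.
    (Illustrative schema; real annotation formats vary.)"""
    x: float
    y: float
    width: float
    height: float
    label: str

def iou(a: BoundingBox, b: BoundingBox) -> float:
    """Intersection-over-union: 1.0 for identical boxes, 0.0 for disjoint ones.
    Often used to compare an annotated box against a predicted one."""
    ix = max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    inter = ix * iy
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union else 0.0
```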
Polygonal annotation is similar in concept to bounding boxes, but it uses many polygons and provides tighter, more accurate information about the size and shape of the object in question. Cuboidal tracking takes the concept of bounding boxes and applies it to a 3D space instead of a 2D plane. This enables the network to discern an object’s general volume and its position within space.
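A quick way to see why polygonal annotation is "tighter" than a bounding box is to compare the area a polygon encloses with the area of the smallest axis-aligned box around the same points. The sketch below uses the shoelace formula and is purely illustrative:

```python
def polygon_area(points):
    """Area enclosed by a polygonal annotation (shoelace formula).
    `points` is a list of (x, y) vertices in order."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def enclosing_bbox_area(points):
    """Area of the tightest axis-aligned bounding box around the same
    vertices - always at least as large as the polygon itself."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))
```

For a triangular object, for instance, the polygon covers half the area of its bounding box, so the polygonal label excludes background pixels the box would have swept in.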
Keypoint tracking is mainly concerned with the outermost portions of an object, as it is intended to help the network determine the size and positioning of the object. The most important parts of the object are keyed in the image. If one were tracking a car, the key points would be things like the wheels, side mirrors, and headlights.
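Keypoint annotations are often stored as a fixed, ordered list of named points, each with a visibility flag so annotators can mark occluded parts. The schema below (names, order, and tuple layout) is a hypothetical sketch loosely modeled on common keypoint formats:

```python
# Hypothetical keypoint schema for a car; the names and ordering are
# illustrative, not from any real dataset.
CAR_KEYPOINTS = [
    "front_left_wheel", "front_right_wheel",
    "rear_left_wheel", "rear_right_wheel",
    "left_mirror", "right_mirror",
    "left_headlight", "right_headlight",
]

def visible_keypoints(annotation):
    """`annotation` maps keypoint name -> (x, y, visible). Returns the
    names the annotator actually marked visible, in schema order -
    useful when part of the object is occluded."""
    return [name for name in CAR_KEYPOINTS
            if name in annotation and annotation[name][2]]
```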
Semantic segmentation refers to the process of dividing an image up into many segments/regions based on an object or region’s semantic meaning. For instance, an image of a park would have “trees”, “sidewalk”, and “grass” separated into different regions, as these things have different semantic definitions.
Adjusting the threshold of a semantic segmentation algorithm determines how picky the algorithm is in separating regions from one another. Instance segmentation takes the concept of semantic segmentation a step further. Whereas semantic segmentation divides an image into large regions, instance segmentation divides images up into different objects, recognizing individual objects in an image (out of the set of object classes the network has been told to look for).
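One way to see the relationship between the two: starting from a semantic mask (one class id per pixel), instance segmentation additionally separates disconnected groups of same-class pixels into distinct objects. The flood-fill below is a toy sketch of that idea on a nested-list mask, not a production algorithm:

```python
def instance_labels(mask, target):
    """Split the pixels of one semantic class into connected components.
    `mask` is a 2-D list of class ids; returns (label grid, instance count),
    where each connected blob of `target` pixels gets its own id."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == target and labels[sy][sx] == 0:
                count += 1                     # found a new instance
                stack = [(sy, sx)]
                while stack:                   # 4-connected flood fill
                    y, x = stack.pop()
                    if (0 <= y < h and 0 <= x < w
                            and mask[y][x] == target and labels[y][x] == 0):
                        labels[y][x] = count
                        stack += [(y + 1, x), (y - 1, x),
                                  (y, x + 1), (y, x - 1)]
    return labels, count
```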
Finally, annotating particular lines or splines within an image allows computer vision systems to pay attention to divisions between important regions of an image. If the goal is to get a computer vision system to be aware of boundaries, annotating lines or splines is useful. Typically just the pixels that divide one region from another region are annotated.
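As a toy illustration of why boundary annotations are useful downstream, the sketch below uses a signed cross product to decide which side of an annotated boundary segment a point falls on, and whether two points share a side (for example, two objects in the same lane). Names and coordinates are illustrative:

```python
def cross(a, b, p):
    """Signed cross product of segment a->b with point p: positive on one
    side of the boundary, negative on the other, zero on the line itself."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def same_side(a, b, p, q):
    """True if p and q lie strictly on the same side of the annotated
    boundary line through a and b."""
    return cross(a, b, p) * cross(a, b, q) > 0
```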
Now that we’ve taken a look at the different techniques used to create image annotations, we can take a closer look at the different applications of image annotation.
Face recognition is a common computer vision task and therefore a common application of image annotation. Face recognition involves getting a computer vision system to extract the relevant features from an image of a human face and discriminate between human faces and other objects. Face recognition is also used to distinguish images of one person from images of another person.
Face recognition algorithms can be enhanced by image annotation techniques which track points on the human face, frequently tracking dozens of different points in different parts of the face like the chin, ears, eyes, and mouth. These facial landmarks are annotated and provided to the image classification system. Good face recognition algorithms backed by well annotated data can deal with problems like faces occluded by objects, poor lighting, and partial images of faces.
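A common preprocessing step with annotated facial landmarks is to normalize the points by the inter-ocular distance, so that faces of different sizes and positions become directly comparable. The sketch below assumes a simple name → (x, y) landmark mapping; the landmark names themselves are whatever the annotation schema defines:

```python
import math

def normalize_landmarks(landmarks, left_eye, right_eye):
    """Center annotated landmarks on the midpoint between the eyes and
    scale by the inter-ocular distance. `landmarks` maps name -> (x, y);
    `left_eye`/`right_eye` are the schema's names for the eye points."""
    ex, ey = landmarks[left_eye]
    fx, fy = landmarks[right_eye]
    d = math.hypot(fx - ex, fy - ey)       # inter-ocular distance
    cx, cy = (ex + fx) / 2, (ey + fy) / 2  # midpoint between the eyes
    return {name: ((x - cx) / d, (y - cy) / d)
            for name, (x, y) in landmarks.items()}
```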
Robotics is another main application of image annotation. Industrial robots often need to carry out tasks like transporting items from one place to another and navigating areas that can be full of people and objects. Image annotation helps a robot equipped with a camera distinguish different types of objects, so it knows which items it needs to pick up. Line annotation can also be used to help robots stay within certain areas or distinguish between different parts of a production line.
Image annotation can also be used to facilitate the detection and extraction of text from images. Bounding boxes are used to specify where in an image text is located, and then a text detection algorithm will make representations of both the entire string and individual words from the string.
Bounding boxes can be used this way to extract text from images of different locations, like pulling text from street signs. Bounding boxes can also be employed to handle dense text situations, like the extraction of words from a text document where each box represents a different paragraph. Semantic segmentation can also be used to distinguish regions of text from non-text areas in images.
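The handoff from annotation to text recognition can be sketched as cropping the pixels inside each annotated box before passing the patches to an OCR model. The helper below is illustrative and treats an image simply as a list of pixel rows, with boxes given as (x, y, width, height):

```python
def crop_text_regions(image, boxes):
    """Cut out the pixels inside each annotated text bounding box.
    `image` is a list of rows of pixel values; each box is
    (x, y, width, height) in pixel coordinates. A downstream OCR
    model would then read each cropped patch."""
    crops = []
    for x, y, w, h in boxes:
        crops.append([row[x:x + w] for row in image[y:y + h]])
    return crops
```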
Autonomous and semi-autonomous vehicles make use of multiple types of image annotation. The creation of bounding boxes is used to train the car’s AI to recognize many different types of objects, like animals, trees, street signs, and of course other cars. Semantic segmentation is also used for recognizing discrete regions of the environment, like determining what part of the surrounding area is the sidewalk. Semantic segmentation helps the network recognize these regions with pixel-level accuracy. Line annotation is also used to help the vehicle distinguish between lanes on the road.
Security systems that make use of surveillance cameras can use image annotation to help automatically detect and flag suspicious activity or insert useful information like number of people in an area into a database along with the footage. Bounding boxes can be drawn to distinguish people from other objects in the vicinity, and if people are detected where they aren’t supposed to be or in an area for a suspiciously long time, the footage can be flagged.
Objects can also be tracked by security systems benefiting from image annotation, flagging items like suspicious bags that have been left in an area. Semantic segmentation can be used to divide areas in video into different sectors, where one area may be restricted and another isn’t.
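The "suspiciously long time" rule described above can be sketched as a dwell-time check over a tracked person's per-frame positions. The zone rectangle, track format, and frame threshold below are all illustrative assumptions:

```python
def flag_loitering(track, zone, max_frames):
    """Flag a track that stays inside a restricted zone for more than
    `max_frames` consecutive frames. `track` is a list of per-frame
    (x, y) centres for one tracked person; `zone` is a rectangle
    (x0, y0, x1, y1). A toy version of a dwell-time rule."""
    x0, y0, x1, y1 = zone
    run = 0
    for x, y in track:
        if x0 <= x <= x1 and y0 <= y <= y1:
            run += 1
            if run > max_frames:
                return True
        else:
            run = 0  # the person left the zone; reset the counter
    return False
```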
While the use cases listed above are some of the main applications of image annotation, other applications exist as well, and they are seeing rapid growth and increased adoption.
Agriculture technology, or AgTech, is a rapidly growing industry, and computer vision systems making use of image annotation have been adopted for various agricultural tasks. One of the primary ways that image annotation is used in the AgTech industry is in detecting plant diseases. Images of fields and individual crops can be given to an AI classifier trained to recognize both healthy and diseased crops and fields through the use of bounding boxes and semantic segmentation.
As the number of images in digital datasets and databases grows exponentially, automated image indexing and image retrieval systems are becoming more useful. Image retrieval solutions typically use one of two schemes: annotation-based image retrieval (ABIR) or content-based image retrieval (CBIR).
While CBIR systems use visual features like the color and location of objects to retrieve images, ABIR relies on annotations. Automatic image annotation methods that utilize bounding boxes, semantic segmentation, and instance segmentation are used to train image recognition systems, supplementing the recognition of low-level image features with textual annotations.
While the applications listed above are some of the main applications of image annotation, there are many more not covered here. Regardless of how you intend to use image annotation in your computer vision system, it’s important to pick the right kind of annotation and to properly annotate your data. If you’re having difficulty with image annotation, consider outsourcing your annotation to trained professionals who can guarantee your images are annotated properly, optimizing the performance of your image classifier.