Semantic segmentation is a sophisticated computer vision technique that enables computers to discern one region in an image from another, based on the semantic context of that region. Semantic segmentation has many applications in diverse fields like medicine, fashion, and agriculture. Let’s examine some of the most prominent applications of semantic segmentation.
Before analyzing the different applications for semantic segmentation, let's be sure that we have a good intuition for what semantic segmentation is.
Semantic segmentation is one level of analysis for computer vision systems. Computer vision applications can process an image at different levels of analysis and inference. The lowest level of analysis is classification, where the computer vision system is simply expected to give an image a discrete label describing the main object in the image. Classification assumes that there is only one object in the image; if there are multiple objects, one is taken to be the main object or subject of the image.
The second level of analysis is classification with localization. At this level of analysis not only is the subject of the image classified, but the computer vision system is expected to describe where in the image the object is, outputting a bounding box that identifies the location of the object. It’s assumed that there is one primary object in the image.
Object detection is the third level of analysis, where the computer vision system extends the tasks of classification and localization to multiple objects within the image, attempting to localize and classify all objects of interest. Bounding boxes are again assigned to the objects in the image.
Semantic segmentation can be conceived of as an extension of object detection and localization but applied to every pixel of an image. Every pixel receives a label assigning it to a class. Another term for this process is dense prediction, referring to the fact that every pixel in the image is given a prediction.
Semantic segmentation doesn't just put bounding boxes and labels around more objects, it actually classifies every single pixel within an image. The output is typically a segmentation map with the same resolution as the original image. The "semantic" part of the term semantic segmentation describes how regions of the image are classified based on their semantic meaning. So, for instance, portions of an image could be labeled as grass, trees, or people.
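A tiny sketch can make the idea of dense prediction concrete: the output is just a 2D array of class IDs with the same height and width as the input image. The class names here (grass, tree, person) and the array values are illustrative, not drawn from any real model.

```python
import numpy as np

# Hypothetical class IDs for a tiny 4x6 "image": 0 = grass, 1 = tree, 2 = person.
# A semantic segmentation output (dense prediction) is simply a label map
# with one class label per pixel, matching the input's height and width.
label_map = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
])

image_shape = (4, 6, 3)                    # H x W x RGB channels
assert label_map.shape == image_shape[:2]  # one label per pixel

# Pixel area covered by each class:
classes, counts = np.unique(label_map, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 16, 1: 4, 2: 4}
```

Because every pixel carries a label, questions like "how much of this image is grass?" reduce to simple counting, which is part of what makes dense prediction so useful downstream.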
Another type of image annotation is instance segmentation, which takes the concept of semantic segmentation and applies it to every individual object of interest within an image. This means that instead of labeling all people as one region belonging to a "person" class, every individual in the image is assigned a separate label, distinguishing each instance of that object.
In terms of designing methods of carrying out semantic segmentation, there are a few different ways the problem can be handled.
Semantic segmentation can be done with both deep learning and non-deep learning techniques. The non-deep learning methods of carrying out semantic segmentation include thresholding, K-means clustering, and edge detection.
Thresholding is the process of splitting an image into background and foreground regions. A threshold value is selected, and each pixel is assigned to one side of that value, isolating the objects of interest. Thresholding is commonly used to produce binary images from grayscale images, as it separates the darker pixels from the lighter ones.
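A minimal thresholding sketch, using a small synthetic grayscale array rather than a real image: pixels brighter than the threshold become foreground, the rest background. The threshold value of 128 is an arbitrary illustrative choice.

```python
import numpy as np

# Synthetic grayscale "image" with intensity values in 0-255.
gray = np.array([
    [ 10,  20, 200, 210],
    [ 15, 180, 220,  30],
    [ 25, 190,  40,  35],
], dtype=np.uint8)

# Pixels above the threshold become foreground (1), the rest background (0),
# turning the grayscale image into a binary mask.
threshold = 128
binary = (gray > threshold).astype(np.uint8)

print(binary)
```

In practice the threshold is often chosen automatically (e.g. from the image's intensity histogram) rather than hard-coded.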
K-means clustering groups pixels based on the similarity of the features those pixels possess, with "K" representing the number of groups or classes chosen. Each cluster's central point is repeatedly moved and the distances to the surrounding pixels recalculated until the placement stabilizes, organically dividing the image up into groups as it goes.
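The loop described above can be sketched from scratch in a few lines. This toy version clusters pixels by intensity alone with K = 2 and a deterministic initialization; real pipelines would cluster richer features (color, texture, position) and use a library implementation.

```python
import numpy as np

# Toy pixel intensities: a dark group and a bright group.
pixels = np.array([12.0, 15.0, 14.0, 200.0, 210.0, 198.0, 13.0, 205.0])

K = 2
centroids = np.array([pixels.min(), pixels.max()])  # deterministic init

for _ in range(10):
    # Assignment step: each pixel joins its nearest cluster center.
    labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
    # Update step: each center moves to the mean of its assigned pixels.
    new_centroids = np.array([pixels[labels == k].mean() for k in range(K)])
    if np.allclose(new_centroids, centroids):
        break  # placement has stabilized
    centroids = new_centroids

print(labels)     # cluster id per pixel
print(centroids)  # one dark and one bright cluster center
```

The alternation between assigning pixels and moving centers is exactly the "repeatedly moved and recalculated" process the paragraph describes.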
Edge detection operates by discerning drastic changes in brightness. These discontinuities are then grouped into larger features, like curved boundaries or sharp edges. For instance, the boundary between a blue region and a red region can be discerned with edge detection.
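A minimal edge-detection sketch: edges appear where brightness changes sharply, so looking at the magnitude of the intensity gradient is enough to expose them. The threshold of 50 below is an arbitrary illustrative choice; practical detectors like Sobel or Canny refine this basic idea.

```python
import numpy as np

# Synthetic grayscale image: a dark region next to a bright region.
gray = np.array([
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
], dtype=float)

gy, gx = np.gradient(gray)    # vertical and horizontal intensity gradients
magnitude = np.hypot(gx, gy)  # gradient magnitude per pixel
edges = magnitude > 50        # strong gradients mark the edge pixels

print(edges.astype(int))      # 1s trace the dark/bright boundary
```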
Most modern segmentation methods use deep learning techniques, though some of them still use aspects of the classical segmentation methods. Convolutional Neural Networks (CNNs) are the primary architecture used to do semantic or instance segmentation.
The convolutional layers in a CNN slide small filters across windows of the image until the entire image has been analyzed and a complete representation of it has been created. This representation of the image is then passed into the fully connected portion of the network, which makes inferences about the regions of the image.
Fully Convolutional Networks (FCNs) improve on regular CNNs by replacing the fully connected layers with convolutional layers, so the final output is a map whose width and height match the original image, enabling every pixel to be classified.
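One way to see why the FCN output can match the input size is the final upsampling step: the network's coarse per-class score map is expanded back to the input resolution so every pixel gets a prediction. Real FCNs learn this upsampling (e.g. with transposed convolutions); the nearest-neighbor repetition below is only a sketch of the idea, and all values are made up.

```python
import numpy as np

# A coarse 2x2 score map for one class, as a deep layer might produce
# after repeated pooling has shrunk the spatial resolution.
coarse_scores = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
])

# Upsample by 4x in each dimension to recover an 8x8 "input-sized" map,
# so that every input pixel receives a score.
factor = 4
full_res = np.repeat(np.repeat(coarse_scores, factor, axis=0), factor, axis=1)

print(full_res.shape)  # (8, 8): one score per input pixel
```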
Because semantic segmentation assigns a label to every pixel within an image, it is an extremely powerful image annotation technique that gives computer vision systems the ability to interpret environments comprised of many different object classes. Due to this fact, semantic segmentation is employed whenever an AI system must have a complete, fine-grained understanding of the regions and images it is interacting with.
Common use cases for semantic segmentation include: robotics, autonomous driving, medical image diagnostics, and satellite image processing.
Semantic segmentation can help computer vision systems achieve tasks like the recognition of expressions, the estimation of age, and the prediction of the gender or ethnicity of individuals. Semantic segmentation enables these tasks by separating regions of the face into important attributes like the mouth, chin, nose, eyes, and hair. Effective face segmentation means controlling for factors like image resolution, lighting conditions, feature occlusion, and orientation.
The classification and recognition of clothes for fashion can be extremely difficult due to the large number of potential classes. In most situations, simply labeling a rack of clothes "clothes" is insufficient. General object recognition isn't enough for e-commerce applications; a higher level of judgment is required that enables a computer vision app to distinguish between different types of clothes. In order for this to happen, semantic segmentation algorithms must be able to recognize many different classes and take variables like human poses, lighting, and occlusion into account. Good clothing semantic segmentation algorithms should also be able to handle small objects like hats, scarves, and socks.
Radiologists must analyze a number of charts and images in order to make a medical diagnosis, but the complexity of medical images, with many overlapping body structures, can make diagnosis difficult for even trained specialists. Systems making use of semantic segmentation can help classify relevant regions of an image, making diagnostic tests easier and simpler.
Autonomous driving is an extremely complex task that requires real-time perception, analysis, and response. Semantic segmentation is used to identify objects like other cars and traffic signs and regions like road lanes and sidewalks. Instance segmentation is also used in autonomous driving, as individual cars, pedestrians, signs, etc., must be tracked.
Aerial or satellite images cover a large span of land area and contain many objects. To do meaningful analysis on aerial or satellite images, sophisticated image annotation is necessary. Semantic segmentation has uses in the fields of precision agriculture and geo sensing.
Semantic segmentation is often used to analyze images of agricultural fields and determine areas where blight or parasites are harming those fields. By recognizing areas of fields where plants are being harmed by parasites or disease, interventions can be targeted where they are needed. Another use of semantic segmentation in agriculture is the detection of weeds in fields; however, this is done with images taken at much closer range.
Semantic segmentation is used alongside satellite imagery to analyze land usage. Land cover information is used to do things like track urbanization rates and identify areas suffering from deforestation. Every pixel is given a label, like water, urban, forest, or agriculture. The detection of buildings, roads, and other urban features is particularly important for the research and analysis of city planning, road monitoring, and traffic management problems.
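Once every pixel of a satellite tile carries a land-cover label, land-use statistics like urban coverage reduce to simple counting. The class IDs and label map below are illustrative, not taken from any real dataset.

```python
import numpy as np

# Hypothetical class IDs: 0 = water, 1 = urban, 2 = forest, 3 = agriculture.
land_cover = np.array([
    [2, 2, 1, 1],
    [2, 1, 1, 3],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
])

# Fraction of the tile covered by urban pixels.
total = land_cover.size
urban_fraction = (land_cover == 1).sum() / total

print(f"urban coverage: {urban_fraction:.0%}")  # prints "urban coverage: 25%"
```

Comparing this fraction across the same tile at different dates is one simple way to track urbanization rates, as the paragraph above describes.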