AI In Fashion: Image Annotation and Tagging in the Fashion Industry


One of the most common uses of artificial intelligence in the fashion industry is image classification and recognition: AI systems can be trained to recognize specific items of clothing. Before a classifier can learn to do this, the images of the clothes must first be annotated and tagged. Image annotation is the process of adding metadata to images so that a classifier can learn to recognize features of the clothes such as necklines, hems, sleeve length, and lapel type.

What Is Image Annotation?

As previously mentioned, image annotation is the process of adding metadata to images. This metadata helps an AI classifier distinguish one item of clothing from another by letting it analyze many different features of the garment and classify it based on those features.

When images are fed into an AI classification system, they are tagged or labeled with extra information about the type of object in question. This tagging and annotation gives the classifier information it could not get from the raw image alone, and that extra information improves the classification accuracy of the machine learning model. The features that are tagged or annotated depend on the type of object in question. If the object were a car, the features could be things like the number of doors, the location and type of headlights, the height, and the color.

For clothing, a feature can be just about any aspect of a garment that distinguishes it from another garment. The type of fabric, the color, the length of the hem, the number of stitches, the presence of a zipper, the neckline, the sleeve length, and many more attributes can be selected as features and tagged to increase the accuracy of an image classifier.

How Does Image Annotation Work In The Fashion Industry?

When carrying out image annotation for the fashion industry, each fashion product in an image has to be given a bounding box. Each bounding box carries not only the class label for the clothing item but also a list of related attributes such as material, style, and pattern. The more accurate the bounding box and ground truth label, the better the classifier will perform.
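To make the idea of a bounding box that carries a label and attributes concrete, here is a minimal sketch of what one annotation record might look like. The class names, field names, and attribute values are hypothetical and chosen only for illustration; real annotation tools each define their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class BoundingBox:
    # Pixel coordinates of the top-left corner, plus width and height.
    x: int
    y: int
    width: int
    height: int

    def area(self) -> int:
        """Number of pixels covered by the box."""
        return self.width * self.height


@dataclass
class GarmentAnnotation:
    label: str            # class label for the clothing item
    box: BoundingBox      # where the item sits in the image
    attributes: dict = field(default_factory=dict)  # related attributes


# Hypothetical example: one dress annotated in a product photo.
annotation = GarmentAnnotation(
    label="dress",
    box=BoundingBox(x=40, y=25, width=120, height=300),
    attributes={"material": "cotton", "style": "casual", "pattern": "floral"},
)
```

Keeping the class label, the box, and the attributes in a single record is one simple way to make sure they always travel together through a training pipeline.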

There are various other image recognition techniques that can assist in the recognition of clothing items. Two effective ones are image masking and image segmentation.

Image Masks

An image mask is a transformation applied to an image that isolates the objects you are interested in by setting the values of the surrounding pixels to zero. Once the object of interest is isolated, further transformations can be applied to it alone. For instance, the object can be enhanced or blurred if needed.
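The masking step can be sketched in a few lines. This is a toy illustration using a small grayscale image stored as nested lists; the function name `apply_mask` and the pixel values are made up for the example, and production code would normally use an image library instead.

```python
def apply_mask(image, mask):
    """Keep pixels where the mask is 1 and set every other pixel to zero."""
    return [
        [pixel if keep else 0 for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# A 3x4 grayscale image: the bright middle columns are the object of interest.
image = [
    [12, 200, 210, 15],
    [10, 220, 230, 11],
    [14, 205, 215, 13],
]

# The mask marks the object's pixels with 1 and the surroundings with 0.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

masked = apply_mask(image, mask)
# masked == [[0, 200, 210, 0], [0, 220, 230, 0], [0, 205, 215, 0]]
```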

Image Segmentation

Image segmentation refers to the task of dividing an image into multiple discrete regions of pixels. The logic behind dividing an image into different parts is that it makes the image easier to analyze and interpret. When the segmented regions that belong to an object are joined together, their combined outline forms the contours of that object.

There are different types of segmentation, including semantic segmentation and instance segmentation. Semantic segmentation separates parts of the image by their semantic definitions. If you were segmenting an image of a city street, the image would be divided into groups such as signs, pedestrians, roads, and cars, with every region of the same semantic class grouped together. Different criteria can be used to separate objects into semantic groups, such as texture, color, and object orientation. Adjusting the threshold of a segmentation algorithm influences how sensitive the algorithm is when recognizing objects, and it shifts how different areas are labeled and classified.
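The effect of the threshold can be illustrated with the simplest possible segmentation rule, intensity thresholding, where a pixel counts as foreground if its value is at or above a cutoff. This is only a sketch under that assumption; the function names and pixel values are hypothetical, and real segmentation algorithms use far richer criteria.

```python
def threshold_segment(image, threshold):
    """Label a pixel 1 (foreground) if it is at or above the threshold, else 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def foreground_pixels(segmentation):
    """Count how many pixels were labeled as foreground."""
    return sum(sum(row) for row in segmentation)


image = [
    [20, 90, 200],
    [30, 140, 220],
    [25, 95, 210],
]

low = threshold_segment(image, 80)    # permissive: mid-gray pixels count too
high = threshold_segment(image, 150)  # strict: only the brightest column counts

print(foreground_pixels(low), foreground_pixels(high))  # prints "6 3"
```

Lowering the threshold makes the rule more sensitive, so more of the image is labeled as part of the object; raising it has the opposite effect.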

The process of instance segmentation is very similar to semantic segmentation, but while semantic segmentation groups regions of the image together based on their semantic interpretation, instance segmentation highlights every specific occurrence, or instance, of an object. So while semantic segmentation may tag cars as one group of items, instance segmentation will tag every car individually, making sure each instance has its own value. Similarly, in the fashion industry, instance segmentation techniques can help a classifier identify individual shirts or dresses on a rack, rather than just a region containing clothes.
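The difference between the two kinds of output can be shown with a toy example. Below, a semantic map labels every pixel with a class name, and a simple flood fill then assigns a separate id to each connected region of one class. This is not how instance segmentation models work in practice (they are learned, and touching garments would merge under this rule); the function name `label_instances` and the tiny grid are assumptions made purely to illustrate the two label formats.

```python
from collections import deque


def label_instances(semantic_map, target_class):
    """Give each 4-connected region of target_class its own id (1, 2, ...)."""
    rows, cols = len(semantic_map), len(semantic_map[0])
    instances = [[0] * cols for _ in range(rows)]  # 0 means "not this class"
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if semantic_map[r][c] != target_class or instances[r][c]:
                continue
            # Found an unlabeled pixel of the class: start a new instance.
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and semantic_map[nr][nc] == target_class
                            and not instances[nr][nc]):
                        instances[nr][nc] = next_id
                        queue.append((nr, nc))
    return instances, next_id


# Semantic output: both shirts share the single label "shirt".
semantic_map = [
    ["shirt", "shirt", "bg", "shirt"],
    ["shirt", "shirt", "bg", "shirt"],
    ["bg",    "bg",    "bg", "bg"],
]

# Instance output: each shirt gets its own id.
instances, count = label_instances(semantic_map, "shirt")
# instances == [[1, 1, 0, 2], [1, 1, 0, 2], [0, 0, 0, 0]] and count == 2
```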

Techniques like instance segmentation and image masking can be used to provide an image classifier with extra information about the features/attributes of the objects in question, enhancing the performance of the classifier. Image segmentation techniques can be especially useful when applied to small items like scarves or hats, as these items may only take up small portions of the bounding box.

Why Are Trained Taggers Needed For Image Annotation In The Fashion Industry?

When image annotation and tagging are done for fashion applications, the tagging should be carried out by trained professional taggers. The process of tagging images and inserting annotations is much more complex than it may appear at first. The features that will be passed into a neural network for classification can be made up of many subclasses of an item, with attributes nested within attributes. For instance, consider how many attributes could apply to a shoe. The shoe can have many different types of laces, heels, toes, arches, materials, and patterns.
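The nesting of attributes within attributes can be sketched as a nested dictionary. The shoe attributes and values below are hypothetical, and the helper `flatten_attributes` is just one possible way to turn such a hierarchy into flat tag names that a training pipeline could consume.

```python
def flatten_attributes(attributes, prefix=""):
    """Turn nested attributes into flat 'parent.child' tag names."""
    flat = {}
    for name, value in attributes.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into the nested group of attributes.
            flat.update(flatten_attributes(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical annotation for a single shoe.
shoe = {
    "laces": {"type": "flat", "material": "cotton"},
    "heel": {"type": "block", "height": "low"},
    "toe": "round",
    "upper": {"material": "leather", "pattern": "plain"},
}

flat = flatten_attributes(shoe)
# flat["heel.type"] == "block", and the shoe yields 7 tags in total
```

Even this small example produces seven separate tags for one item, which hints at how quickly the tagging workload grows across a full catalog.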

Beyond the attributes that apply to the garments themselves, there are properties of the entire image that must be considered. As an example, the lighting conditions in which the image was taken have to be considered, as these can often alter the apparent color of an object. Outdoor lighting differs from studio lighting, so this must be taken into account. Another consideration is whether the clothing item is displayed on a mannequin or worn by a model, as the clothes may hang differently.

When doing image annotation, edge cases also have to be considered. Edge cases are instances that resemble the ones you want to classify but differ from them in nontrivial ways. For instance, if an item of clothing is partially obscured, annotators need a consistent way of handling it.

Making The Decision To Invest In Professional Annotation

Because image annotation for the fashion industry can be extremely complex, companies that need it done often choose to outsource the work to specialty companies rather than do it themselves. This is a sensible choice: even with tools intended to make annotation and tagging easier, annotating a dataset can still take an inordinately long time, potentially tens of thousands of hours.

For this reason, investing in qualified professionals who are trained in the intricacies of image annotation can save companies a lot of time and stress, and helps ensure that their data is properly annotated and that their classifier performs well.