For an image classifier to work, the image dataset fed to it must be annotated. Many companies consider annotating their own images, under the impression that doing so will save them time and resources. However, while image annotation seems simple in concept, in reality it is quite difficult.
Image annotation involves many different, complicated steps, and unless these steps are carried out correctly the resulting image classifier will perform suboptimally. Those who want optimal performance from their image classifier should consider outsourcing image annotation. Not only does an image dataset require detailed and attentive tagging, but the images must also undergo quality assurance to ensure they have been properly annotated and won't throw off the classifier. To understand why properly annotated images are so important, let's go over the specifics of the image annotation process.
For a neural-network-based image classifier to work, the images fed to the network must be properly labeled so that the network can discern patterns between them. This means that after collecting images for the training dataset, those images must be annotated. Image annotation, or labeling, is the process of adding metadata to images. This metadata provides extra, relevant information to the network, letting it distinguish more patterns and learn more about objects than if you were simply assigning each image to one of many classes.
Consider an image of a busy street. The image is likely to be filled with cars, pedestrians, traffic lights, street signs, and many other objects. If you were interested in classifying cars, the cars in the image would need to be highlighted and labeled for the network. Image annotation tools use bounding boxes to achieve this. Each car would be enclosed in a colored bounding box, and by comparing the differences and similarities between the cars and other parts of the image, the network learns to recognize them and can discern a car when presented with future examples.
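To make this concrete, here is a minimal sketch of what bounding-box metadata for such a street scene might look like. The structure loosely follows the COCO convention of [x, y, width, height] boxes; the filename, labels, and coordinates are all invented for illustration.

```python
# Hypothetical bounding-box metadata for one annotated street image.
# Boxes use the [x, y, width, height] convention; all values are made up.

annotation = {
    "image": "street_scene_001.jpg",
    "width": 1920,
    "height": 1080,
    "objects": [
        {"label": "car",        "bbox": [412, 560, 310, 180]},
        {"label": "car",        "bbox": [900, 540, 280, 165]},
        {"label": "pedestrian", "bbox": [150, 480, 60, 200]},
    ],
}

# A training pipeline would crop or highlight each boxed region:
for obj in annotation["objects"]:
    x, y, w, h = obj["bbox"]
    print(f'{obj["label"]}: top-left=({x}, {y}), size={w}x{h}')
```

This is the kind of record an annotation tool exports for each image: the network never sees "a car" in the abstract, only these labeled regions.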
There is a plethora of image annotation tools out there, each created to expedite the annotation process and make applying metadata tags simpler. Some examples of image annotation software include CVAT, LabelBox, Sloth, and Microsoft's Visual Object Tagging Tool (VoTT). Each of these tools specializes in different things. For instance, CVAT is an open-source product that comes with many shortcuts, and its UI and UX were designed specifically for the task of image annotation. LabelBox is built with the expectation that companies will use it to train machine learning applications, so it can be used on either hosted or on-site data. Meanwhile, Sloth is a framework intended to give users a solid foundation and a set of tools they can customize and adapt to their own needs.
Beyond the bounding boxes applied to objects, things get even more complicated. There are other steps to image annotation and recognition, such as semantic segmentation and masking, although we won't delve deeply into those here. After the bounding boxes are drawn, the image regions within them must be tagged for the network through the assignment of classes and attributes.
The actual tagging process is often much more complex than most people imagine. These tags will be the features provided to the neural network, but the features aren't just simple categories like “shirt” or “shoes”. They are made up of many subclasses and attributes within attributes. Consider all the different labels that could potentially apply to just a shoe. What sort of laces does the shoe have? What kind of heel? The toe of the shoe and the type of platform must also be taken into account.
There are even attributes and considerations that exist outside the garments themselves, properties found within the image as a whole. For instance, it is important to consider things like the lighting of the image and where the image was taken. If the image was taken in a studio vs. outside in a natural environment, that difference must be taken into account. If clothes are modeled on a mannequin and not a model, this must also be considered.
There are also edge cases to consider. Edge cases are situations that are similar in nature to the desired classification task, yet differ in important ways. Consider the task of identifying people within an image. Say you wanted to identify pedestrians in an image of a street. For this task, an edge case would be someone partially obscured by an object. Another edge case might be two people standing very close together, with their body parts overlapping in the image. It's important that these borderline cases be handled, either excluded or included depending on how they will affect the image classifier.
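One common way to flag the overlapping-people case automatically is to compute the intersection-over-union (IoU) of each pair of bounding boxes and route high-overlap pairs to manual review. The boxes and the 0.3 threshold below are illustrative assumptions, not values from any particular pipeline.

```python
# Flag overlapping annotations for review using intersection-over-union.
# Boxes are (x, y, width, height); coordinates and threshold are made up.

def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (zero area if the boxes don't overlap)
    ix = max(ax, bx)
    iy = max(ay, by)
    ix2 = min(ax + aw, bx + bw)
    iy2 = min(ay + ah, by + bh)
    inter = max(0, ix2 - ix) * max(0, iy2 - iy)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

person_a = (100, 100, 80, 200)
person_b = (140, 100, 80, 200)   # standing close, partially overlapping

if iou(person_a, person_b) > 0.3:
    print("flag for manual review: overlapping annotations")
```

A reviewer then decides whether to keep both boxes, merge them, or exclude the image, depending on how the overlap would affect the classifier.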
The labels/classes themselves must be chosen with careful deliberation, as classes that are too similar to one another can adversely impact the classifier. Objects can have many different attributes, and there may be many similar labels/tags for each of these attributes. Because there are so many variables to consider, variables that often aren't apparent at first glance, a company almost always finds that its needs differ from its initial assumptions and that in-house annotation is substantially more difficult than expected.
In fact, customers of image annotation companies typically change their requests after they have received feedback from the company. The difference between changing your mind through your own experience and changing your mind through professional feedback is that the latter will probably save you a lot of time and headache. Because of this, most companies who realize the complexity of image annotating choose to outsource.
While it is possible to annotate images in-house, outsourcing the work to trained professionals is almost guaranteed to save resources.
Image annotation is a long and complex process, often the most difficult part of training an image recognition AI. Even with labeling tools intended to expedite tagging, annotating a single image can take minutes, and tagging a whole dataset, like the famous COCO (Common Objects in Context) dataset, can take tens of thousands of hours. Given how long a single image may take, it is important to maintain a balance of both speed and accuracy.
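A back-of-the-envelope calculation shows where "tens of thousands of hours" comes from. The per-image time and image count below are assumed, illustrative figures, not measured values.

```python
# Rough estimate of total annotation effort for a large dataset.
# Both numbers are illustrative assumptions, not measured values.

num_images = 330_000      # roughly the scale of the COCO dataset
minutes_per_image = 5     # assumed average for multi-object labeling

total_hours = num_images * minutes_per_image / 60
print(f"{total_hours:,.0f} hours")  # 27,500 hours
```

At five assumed minutes per image, the effort lands in the tens of thousands of hours before quality assurance is even counted.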
Microwork has various procedures and strategies to ensure that both precision and speed are preserved. Taggers monitor the length of time it takes them to complete a task, so they can get an idea of their average speed. In terms of ensuring accuracy, taggers are able to have any questions they have about a tag answered by experts available in different chat rooms. The taggers also receive constant feedback from the quality assurance team, which makes them aware of any issues they should pay attention to.
The more experience a tagger has, the quicker the tagging process will go. Certain skills can only be gained through experience, and Microwork taggers have experience annotating many different images and objects, including fashion sets, logos, traffic, plants, furniture, food and a variety of household objects.
Quality Assurance is the final phase of the image annotation process. During Quality Assurance, the tasks completed by the annotators are reviewed by professionals who check for completeness and accuracy. The QA professional has to check all the various annotation components, double-checking the bounding boxes, classes, and attributes associated with the garments and the image. The QA team member corrects any mistakes, deleting improper tags and adding new tags wherever needed.
Another benefit of having well-trained QA agents is that they can take part in a dialogue between the QA end of the process and the image annotators. When a QA team member is correcting mistakes in an image or editing bounding boxes, they can open a chat with a tagger and provide quick feedback. On the other end, a tagger can direct any questions they have to members of the QA team. This two-way feedback process helps ensure that images are properly annotated as they move through the pipeline and that any changes the customer requests are quickly implemented.
Microwork trains workers for both speed and precision, helping them maintain both. Many design and development factors must be taken into account during the image annotation process. The training process for Microwork taggers enables them to account for these design considerations, and the QA team ensures that all images are annotated to the customer's satisfaction. A communication pathway runs between the QA team and the client on one end and the tagging team on the other, enabling prompt feedback and letting the team quickly adapt to any changes in the customer's needs.
Microwork aims to treat its clients and workers fairly, helping resolve any issues that either group may be having. Workers who have problems with a task are always able to reach out for help, and they are given intensive training and continued support to help them achieve peak performance. Microwork is also constantly updating its tools and improving the annotation pipeline, making sure the process is always utilizing the best resources available. Outsourcing your image annotation to Microwork ensures your data is annotated with consistent speed, accuracy, and careful consideration of details based on your feedback.