Choosing To Invest In
Image Annotation Services

The Complexity Of Image Annotation

We offer many different image annotation services, including the creation of bounding boxes, semantic segmentation, instance segmentation, and image masking. These processes can be quite involved and require the attention of skilled workers, whom we carefully choose and train.

Image annotation is a complex task, with many variables that must be taken into account. Creating bounding boxes requires considering edge cases, the context in which the image was taken, which features apply to different objects, and more. Because of this complexity, many companies choose to outsource the task.

We’ve previously discussed why you would want to outsource image annotation, but we only discussed the complexity of creating bounding boxes. Beyond bounding boxes there are other components to image annotation, such as semantic/instance segmentation and image masking, and we provide services for these tasks as well.

Image Annotation Services We Provide

Before we take a deep dive into the different services we provide, let’s take a quick moment to define some terms related to image annotation.

Object Recognition - Object recognition is the general task of locating all relevant objects in an image.

Boundary Box - A boundary box/bounding box is a box that defines the extent of the object within the image. The box highlights the width/height of the object, telling the network where to look to detect the object.

Segmentation - The term segmentation, in general, refers to partitioning an image into different, discrete parts or regions. However, with regular segmentation there is no attempt made to give these regions any kind of meaning/label. While segmentation just refers to the process of creating discrete regions, “semantic segmentation” refers to the process of partitioning images and also trying to fit the images into a pre-defined, semantic class. Semantic segmentation can be done through classifying entire chunks of the image, or by classifying individual pixels.
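To make the bounding box definition concrete, a box is often stored as four pixel coordinates; the sketch below uses the common `(x_min, y_min, x_max, y_max)` convention, which is one of several layouts annotation tools use, not a universal standard:

```python
# A bounding box stored as (x_min, y_min, x_max, y_max) in pixel coordinates.
box = (40, 25, 140, 85)

def box_size(box):
    """Return the (width, height) extent that the box highlights for the network."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min, y_max - y_min)

print(box_size(box))  # (100, 60)
```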

Creating Bounding Boxes

Image Annotation - Bounding Boxes
Photo: MTheiler, own work, CC BY-SA 4.0.

The creation of bounding boxes is one of the primary image annotation services we offer. Bounding boxes are used to identify objects: colored boxes are drawn around the objects in the image, specifying where the network should look and describing each object’s target location. The process of assigning bounding boxes can be complicated, as one must take into account things like overlapping objects. A hand-drawn bounding box can also differ from the region in which the network itself will actually localize the object. Knowing how to properly assign bounding boxes is important to creating a dataset on which a classifier will perform optimally.
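When boxes overlap, the amount of overlap is commonly quantified with intersection-over-union (IoU), a standard metric for comparing two boxes; a minimal sketch, again assuming the `(x_min, y_min, x_max, y_max)` layout:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Coordinates of the overlapping region (if any).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...: the boxes share half their width
```

An IoU near 1 means two annotators (or an annotator and a model) drew nearly the same box, which makes it a useful quality-assurance signal.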

Various tools can be used to create bounding boxes, such as Sloth, Visual Object Tagging, and LabelBox. Each of these tools has its own strengths and weaknesses, but whichever tool is used, the tagger has to be familiar with it and trained on it to carry out fast yet accurate tagging.

Semantic Segmentation

Another image annotation service is image segmentation: the process of dividing an image into multiple parts, i.e., sets of pixels. Segmentation is meant to divide an image into parts that are easier to analyze; after segmentation, all the regions joined together form the contours of the image. Semantic segmentation then assigns labels to semantically similar regions.

Image Annotation - Segmentation
Segmentation of parts of a femur. Photo: Newe A, Ganslandt T (2013) Simplified Generation of Biomedical 3D Surface Model Data for Embedding into 3D Portable Document Format (PDF) Files for Publication and Education. PLoS ONE 8(11): e79004. doi:10.1371/journal.pone.0079004, CC BY 3.0.

To make this more concrete, let’s take a real world example. Consider an image of a street, with a road, cars, pedestrians and signs. Semantic segmentation would group semantically similar objects, objects with similar definitions, together. Cars would be grouped together, and so would pedestrians. Buildings would also be grouped together. Regions of the image can be joined together by similar colors, textures, and orientations. Adjusting the threshold of a segmentation system will impact how different areas are classified/labeled.
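The thresholding idea can be sketched with a toy grayscale image in plain Python (the 0–255 intensity values and the threshold of 128 are illustrative assumptions, not values from a real pipeline):

```python
# Toy 3x4 grayscale "image"; each value is a pixel intensity from 0-255.
image = [
    [ 10,  20, 200, 210],
    [ 15,  25, 205, 220],
    [ 12,  18, 198, 230],
]

def threshold_segment(img, threshold):
    """Partition pixels into two discrete regions: 0 = dark, 1 = bright."""
    return [[1 if px >= threshold else 0 for px in row] for row in img]

labels = threshold_segment(image, 128)
print(labels)  # the bright right half of the image forms its own region
```

Raising or lowering the threshold moves pixels between the two regions, which is exactly how threshold adjustment changes what gets classified together.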

Image Annotation - Semantic Segmentation
Semantic segmentation of a glass, a teddy bear, and a rose. Photo: MartinThoma via Wikimedia Commons, CC 1.0.

Instance segmentation is a subset of segmentation, similar to semantic segmentation, but instead of grouping general regions of an image together based on semantic meaning, individual objects are highlighted. In other words, every instance of an object is tagged, so if you have an image with five cars in it each car, each instance, will get its own value.
When doing semantic segmentation it is important to scrutinize how portions of the image are labeled. Should paint on a road be labeled as a sign or as part of the road? Are driveways considered part of the road or the sidewalk? These questions must be properly considered, as the answers will impact the network’s performance.
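The difference between semantic and instance labels can be sketched with a tiny connected-components pass over a toy binary "car mask" (a stand-in example; real pipelines use trained models or annotation tools, not this flood fill):

```python
def label_instances(mask):
    """Give each 4-connected blob of 1s its own instance id (1, 2, ...)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_id += 1
                stack = [(y, x)]
                while stack:  # flood fill this blob with its id
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and mask[cy][cx] and not labels[cy][cx]:
                        labels[cy][cx] = next_id
                        stack += [(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)]
    return labels

# Semantically, every 1 is just "car"; instance labels tell the two cars apart.
mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(label_instances(mask))  # [[1, 1, 0, 0], [0, 0, 0, 2], [0, 0, 2, 2]]
```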

Image Masking

A third image annotation service we provide is image masking, which refers to taking a small portion of an image and applying different transformations to that portion. An image mask is a matrix, typically binary, that represents just one part of the image; the full image itself is represented as a matrix of pixel values that the neural network can interpret. Masks are created for the objects you are interested in, and those objects are isolated by turning the values of the pixels around them down to zero.
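Zeroing out the pixels around an object can be sketched with plain Python lists (a toy example; production masking operates on real image arrays):

```python
# Toy 3x3 grayscale image as a matrix of pixel intensities.
image = [
    [90, 80, 70],
    [60, 50, 40],
    [30, 20, 10],
]
# Binary mask: 1 marks pixels belonging to the object of interest.
mask = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]

def apply_mask(img, mask):
    """Keep object pixels, turn everything around them down to zero."""
    return [[px * m for px, m in zip(img_row, m_row)]
            for img_row, m_row in zip(img, mask)]

print(apply_mask(image, mask))  # only the masked column survives
```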

Image Annotation - Image Masking
Image masking can be used to isolate an object from the rest of an image and apply transformations to it. Photo: Clippingpathandroid via Wikimedia Commons, CC 4.0,

Once a mask of the image has been created, and the part of the image you are interested in isolated, different transformations can be applied to the image. Convolution kernels, small matrices slid across the image, can be applied to assist in tasks like edge detection, sharpening, or blurring. If an image contains multiple subjects and you want one subject to stand out from the others, this can be done by taking a mask of that subject.
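Applying a kernel can be sketched as a "valid" 2D convolution (the 3x3 Laplacian-style kernel shown is one common edge-detection choice, used here as an illustrative assumption):

```python
# A simple edge-detection kernel: responds strongly where intensity changes.
KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]

def convolve(img, kernel):
    """'Valid' 2D convolution: slide the kernel over every position it fully covers."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(img) - kh + 1):
        row = []
        for x in range(len(img[0]) - kw + 1):
            row.append(sum(kernel[i][j] * img[y + i][x + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

flat = [[5] * 3 for _ in range(3)]
print(convolve(flat, KERNEL))  # a flat region produces no edge response: [[0]]
```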

Image masks, segmentations, and bounding boxes are all methods of drawing important features out of an image before it is passed into a Convolutional Neural Network (CNN), a special type of neural network used for computer vision.

The Need For Quality Assurance

A major part of image annotation, beyond the actual tagging and segmentation tasks themselves, is doing quality assurance on the tagged images. A good Quality Assurance (QA) team will review the work completed by the image annotators, ensuring that the annotation is done to the customer’s satisfaction: improper tags are removed, and any missing tags are created.

In addition to correcting mistakes, a QA team is important because it acts as the go-between for the customer and the image annotation team, enabling customer feedback to reach the annotators. Many companies find that their needs differ from their initial assumptions; the QA team can take the new requirements into account and deliver them to the image annotators.

Composing A Dataset

Even after the various image annotation services have been completed, all the images must be compiled into a dataset. When you create a dataset, you should consider how representative and comprehensive it is. A dataset needs to represent all the classes you are interested in, with a roughly even number of images for each class; if one class is overrepresented, the classifier will learn that bias.
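A first-pass balance check can be as simple as counting labels per class (the label strings below are illustrative, not from a real dataset):

```python
from collections import Counter

# Hypothetical per-image labels gathered from an annotated dataset.
labels = ["car", "car", "car", "pedestrian", "sign", "car", "car", "car"]

counts = Counter(labels)
total = sum(counts.values())
for cls, n in counts.most_common():
    print(f"{cls}: {n} images ({n / total:.0%} of dataset)")
# "car" dominates here, so a classifier trained on this set would learn that bias.
```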

For this reason, the dataset should be analyzed for things like consistency and redundancy. Any duplicate images should be removed: redundant images teach the network very little while extending the length of training. This is a form of data cleaning. Other forms of dataset preparation include imputing missing values, downsampling the dataset to make it easier to train on, and formatting the data for consistency. Properly annotated or segmented images make composing the final dataset much simpler; if data has been improperly annotated, creating a dataset becomes more difficult, since it is harder to check for things like consistency or redundancy.

We provide image annotation services like the creation of boundary boxes, semantic segmentation, and image masking, and these services can save you a lot of time and trouble. Our trained image annotators will ensure that your data is annotated properly, and they are able to respond to any changes in requirements thanks to the assistance of the QA team. No matter the annotation service you need, we will work alongside you to ensure a high quality dataset.