1. What is Data labeling?

One of the most important parts of the data science process is data pre-processing, on which data scientists spend a lot of time and effort. In machine learning in particular, pre-processing includes a labeling step: each data sample is examined and assigned to a class so that the labeled data can be used in later stages of the machine learning process. The two typical kinds of labeling in ML projects are text annotation and image annotation. The former is a crucial step in NLP projects such as sentiment analysis, and the latter is required in computer vision projects. Labeling is normally done by hand, but tools can assist and reduce the working time. In this article, I will introduce LabelImg, an image annotation tool that is simple and easy to use for beginners or non-technical people who want to learn how to do computer vision projects.

2. What is LabelImg?

LabelImg is a graphical image annotation tool for drawing labeled bounding boxes on images. It is written in Python and uses Qt (one of the most common GUI toolkits for Python) for its graphical interface. The labeled output is saved as XML files in PASCAL VOC format, the format used by ImageNet. The YOLO format is also supported. Below is the LabelImg user interface when it opens:

3. Why do we use LabelImg?
However, this tool still has some limitations:
3. Installation.

3.1. Prerequisites.

The installation below is for Windows with Anaconda, so we first need to install Anaconda to get an Anaconda terminal. Follow the link below to install Anaconda on Windows: https://docs.anaconda.com/anaconda/install/windows/

3.2. Installing LabelImg in Windows with Anaconda.
We should create a new environment instead of using the base environment in Anaconda, because the current LabelImg package requires an older version of Python (< 3.8.x). A separate environment avoids version conflicts between packages and between two versions of Python in the same environment. Run the following commands:

conda create -n env_name
conda activate env_name

E.g.:

conda create -n labelimg
conda activate labelimg

3.3. Install python/tensorflow.

Please note that we only install a Python version older than 3.8.x because, at the time of writing, the latest TensorFlow version is not compatible with Python 3.8 or newer. Run the following commands:
# this command will install the latest version of tensorflow

3.4. Install pyqt and lxml.

These are prerequisite packages that labelImg imports, so they must be installed before running it.
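Once the packages are installed, a quick check from Python can confirm that the active environment matches the requirements described above. This is only a minimal sketch: the helper names `missing_modules` and `python_is_supported` are hypothetical, and the import name `PyQt5` is an assumption based on the `pyrcc5` tool used later in this article.

```python
import importlib.util
import sys

# Assumption: the pyqt package is importable as "PyQt5" (suggested by pyrcc5).
REQUIRED_MODULES = ("PyQt5", "lxml")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, upper=(3, 8)):
    """Return True if the (major, minor) version is older than `upper`."""
    return tuple(version_info[:2]) < upper


if __name__ == "__main__":
    if not python_is_supported():
        print("This Python is 3.8 or newer; the article recommends an older one.")
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pyqt and lxml were found.")
```

Run it inside the activated environment (for example, after conda activate labelimg) so that it inspects the right interpreter.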
3.5. Install labelImg package.
e.g: cd C:\Users\sang.huynh\Documents\SANG\labelImg
Also, to avoid manually changing the location of the resources.py file, we can use the command below instead (as suggested by Jaggu):

pyrcc5 -o libs/resources.py resources.qrc

This command will create a resources.py file in the labelImg folder. This file will be moved to the libs folder in the next step.
A labelImg window then opens. In the View menu, tick Auto Save Mode so that the output files are saved automatically in the Dir Folder. Output files are saved in XML format by default.

4. Open labelImg after the first installation.
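Before reopening the tool, it can help to confirm what the previous session saved. The following is a minimal sketch, not part of LabelImg itself, that reads the bounding boxes out of a PASCAL VOC XML annotation using only Python's standard library. The function name `read_voc_boxes` and the sample annotation are hypothetical; the element names (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`) follow the PASCAL VOC layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the PASCAL VOC layout, for illustration only.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox>
      <xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax>
    </bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox>
      <xmin>300</xmin><ymin>100</ymin><xmax>500</xmax><ymax>400</ymax>
    </bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects that have no bounding box
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), *coords))
    return boxes


if __name__ == "__main__":
    for box in read_voc_boxes(SAMPLE_XML):
        print(box)
```

To inspect a real file, read its contents first (for example with `open(path, encoding="utf-8").read()`) and pass the text to `read_voc_boxes`.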
THANKS FOR READING!