A Guide to Build your Custom Object Detection Model using YOLOv3

October 6, 2022


1. Introduction
2. About YOLO
3. Step-by-Step Guide on Custom Object Detection Model
4. Conclusion
5. About CloudThat
6. FAQs



Object Detection is a Computer Vision technique that localizes objects in an image and classifies them. Just as humans can see both what an object is and where it is, we want computers to understand what is in an image and where it is located.


Fig 1: Source Google

As you can see in Fig 1 above, Object Detection comprises Classification and Localization.

Computer Vision, a field of artificial intelligence that uses machine learning and deep learning, makes it possible for computers to observe, recognize, and analyze objects in images and videos in a similar fashion to how people do. Its use for automated AI visual inspection, remote monitoring, and automation is quickly gaining prominence.

In this blog, you will learn how to train a custom object detection model using You Only Look Once v3 (YOLOv3) and run detections with it. By the end, I am sure you will be able to implement your own custom object detector. I have used Google Colab for training. For the demo, I have used Face Mask Detection, as it is a binary-class problem (With Mask or Without Mask). I have also mentioned the requirements to get started.

About YOLO

YOLO (You Only Look Once) was written by Joseph Redmon using a framework called Darknet. YOLO is an open-source, state-of-the-art algorithm for real-time object detection. There are multiple versions of YOLO (v2, v3, v4, and v5). We will be using YOLOv3 for easy training.

The initial version presented the overall architecture; the second iteration improved the design and used pre-defined anchor boxes to improve bounding box proposals; and the third iteration further refined the model architecture and training procedure.

Step-by-Step Guide on Custom Object Detection Model

Here we will be creating Face Mask Detection using YOLO v3

Step 0: Custom Dataset Creation and Labelling

You have to collect the data for custom training. After preparing the dataset, it is recommended that you use the LabelImg tool, which can be used to create bounding boxes and class labels for the images.

Easy installation (with pip):

pip3 install labelImg

For more reference: https://github.com/tzutalin/labelImg

After installing,

  • Create a new folder “Train” and create a “class.txt” file
  • In the class.txt file, add the class labels, one per line; the line order defines the class index

Example:

0 = Mask Not Detected
1 = Mask Detected

Create an obj.data file with the content below (classes = 2 here: person with mask and person without mask):
classes = 2
train = data/train.txt
valid = data/train.txt
names = data/obj.names
backup = backup/

  • classes: the number of classes
  • train: path to the training image list
  • valid: path to the validation image list (here the training list is reused)
  • names: path to the file containing the class names
  • backup: folder where trained weights will be saved
  • Create another folder “Images” under the “Train” folder
  • Move all images (of all classes) to the “Images” folder
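The file and folder setup above can be sketched as a single shell cell. This is a sketch; the two labels and the obj.data contents are the ones used in this demo, so adjust them to your dataset:

```shell
# create the folder layout used in this guide
mkdir -p Train/Images backup
# class labels, one per line; the line order defines the class index (0, 1, ...)
printf 'Mask Not Detected\nMask Detected\n' > Train/class.txt
# Darknet obj.data file pointing at the image lists and the names file
cat > obj.data <<'EOF'
classes = 2
train = data/train.txt
valid = data/train.txt
names = data/obj.names
backup = backup/
EOF
```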

Now, using the LabelImg tool, create bounding boxes for the dataset.

Make sure you save each image’s annotation in the same folder “/train/images” and save it in YOLO format.
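For reference, YOLO format stores one .txt file per image, with one line per object: the class index followed by the box center and size, all normalized to the range 0 to 1 by the image width and height. A hypothetical annotation for an image containing one masked face might look like:

```
1 0.48 0.52 0.31 0.40
```

Here 1 is the class index from class.txt, and the four numbers are x_center, y_center, width, and height. LabelImg writes these files for you when YOLO format is selected.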

Upload the dataset to Google Drive or a GitHub account as a zip file.

Congrats, one big step has been completed.

Step 1: Cloning the Darknet repository for YOLO architecture

Here, we clone the Darknet repository, which contains the YOLOv3 architecture used for detection.

!git clone https://github.com/AlexeyAB/darknet.git

Step 2: Configuring the Makefile

Here, we make some changes to the Makefile to enable GPU computation.

2.1 Change the directory to the darknet folder

2.2 Make sure your runtime has a GPU available

2.3 Change GPU and OPENCV from 0 to 1

  • ‘1’ means the option is enabled
  • !sed – a stream editor, used to edit the Makefile in place
  • !cat Makefile – cat (concatenate) reads the file back so you can verify the changes
  • !make – compiles Darknet
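The steps above can be sketched as a short cell. This is a sketch; the flag names GPU= and OPENCV= are the ones that appear in the darknet Makefile:

```shell
cd darknet
# flip the GPU and OPENCV flags from 0 (disabled) to 1 (enabled)
sed -i 's/GPU=0/GPU=1/' Makefile
sed -i 's/OPENCV=0/OPENCV=1/' Makefile
# read the flags back to verify, then compile
grep -E '^(GPU|OPENCV)=' Makefile
make
```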

Step 3: Download the pre-trained weights

For YOLOv3, training starts from the pre-trained Darknet-53 backbone weights:

!wget https://pjreddie.com/media/files/darknet53.conv.74

Step 4: Get the dataset you stored on GitHub or Google Drive into Colab

Make sure you have the below-defined files,

  1. obj.data
  2. obj.names
  3. dataset ( images )

Unzip the data files we zipped before

!unzip data/custom.zip -d data/ # adjust the path





Along with the above files, you also need train.txt, which lists the path of every image used for training; a separate validation list is optional.
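train.txt can be generated with a one-line listing. This is a sketch, assuming the images were unzipped to data/images/ as .jpg files; adjust the path and extension to match your dataset:

```shell
# write the path of every training image, one per line
ls data/images/*.jpg > data/train.txt
# sanity check: the number of listed images
wc -l < data/train.txt
```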

Step 5: Configuring the YOLO cfg file

We are now going to make some changes to the yolov3.cfg file available in the darknet/cfg folder:

  • random: 0 or 1 (1 enables random input resizing during training)
  • max_batches = number_of_classes * 2000 (e.g., 4000 for 2 classes)
  • filters = (classes + 5) * 3 in the [convolutional] layer just before each [yolo] layer (e.g., 21 for 2 classes)
  • subdivisions: between 8 and 32 batches, depending on GPU memory
  • Set the network size: width=416, height=416, or any value that is a multiple of 32
  • Change the line classes=80 to your number of classes in each [yolo] layer (e.g., 2)
  • The cfg file can be configured by editing it directly or with sed
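These edits can be scripted with sed. This is a sketch assuming 2 classes, so max_batches = 2 * 2000 = 4000 and filters = (2 + 5) * 3 = 21; it relies on the fact that in the stock yolov3.cfg the value filters=255 appears only in the convolutional layers feeding the three [yolo] layers, which makes the blanket replace safe:

```shell
cd darknet
sed -i 's/classes=80/classes=2/g' cfg/yolov3.cfg             # all three [yolo] layers
sed -i 's/filters=255/filters=21/g' cfg/yolov3.cfg           # (classes + 5) * 3
sed -i 's/max_batches = 500200/max_batches = 4000/' cfg/yolov3.cfg
grep -nE 'classes=|filters=21|max_batches' cfg/yolov3.cfg    # verify the edits
```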

Step 6: Train and test the model

For Linux, use the command below (darknet53.conv.74 is the pre-trained weights file from Step 3):

!./darknet detector train data/obj.data cfg/yolov3.cfg darknet53.conv.74 -dont_show


Step 7: When should I stop the training?

  • During training, you will see the average loss, IoU, and the iteration number as output
  • Keep an eye on the average loss: as long as it keeps decreasing, continue training; once it starts increasing instead of decreasing, you should stop the training
  • After every 100 iterations the latest weights are saved to the darknet/backup folder, and after every 1000 iterations a numbered weights file is also stored there
  • Now let’s check the accuracy of our weights using the mAP indicator

For example, if you have 3 different weights files (from the 7000th, 8000th, and 9000th iterations), evaluate each one:

darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights

Repeat with 8000 and 9000 in place of 7000, and choose the weights file with the highest mAP (mean average precision) or IoU (intersection over union). You can also pass the -map flag during training to track mAP as the model trains:

darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map

For Windows, use darknet.exe instead of !./darknet.


Step 8: Testing with input Images / Videos

Image Detection:

For Linux,

!./darknet detector test data/obj.data cfg/yolov3.cfg /content/weights/yolov3_1300.weights /content/darknet/data/image_test01.jpg -dont_show

Video Detection:

!./darknet detector demo data/obj.data cfg/yolov3.cfg /content/weights/yolov3_1300.weights -dont_show videoname -i 0 -out_filename me_06.avi -thresh 0.7

That’s it. Congratulations, you made it!

Sample Output 


Video Source Detection:



To recap the custom object detection workflow with YOLO: we created the Person With Mask and Without Mask dataset and labeled it carefully using the LabelImg tool. We chose YOLOv3 as the architecture for fast detection, and finally trained and tested the model successfully in Google Colab.

GitHub Reference: https://github.com/Ganesh9100/Mask-Detection-YOLO_V3-

With more training data and additional classes, the model can be used for many real-time applications.

About CloudThat

CloudThat is also an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

Drop a query if you have any questions regarding YOLO or Object Detection, and I will get back to you quickly.

To get started, go through our Consultancy page and Managed Services Package, which are CloudThat’s offerings.


Q1. What is LabelImg?
A. It is a graphical image annotation tool written in Python.
Installation: pip3 install labelImg

Q2. What is the YOLO cfg file?
A. It is a configuration file that holds the network and training parameters, such as batch, subdivisions, decay, etc.

Q3. What is Darknet?
A. It is an open-source neural network framework written in C and CUDA, and it supports both CPU and GPU computation.
