YOLO: You Only Look Once
Detection as Regression
The system divides the input image into an S × S grid.
If the center of an object falls into a grid cell, that grid cell is responsible for detecting the object. Each grid cell predicts B bounding boxes and a confidence score for each of those boxes. The confidence score reflects how confident the model is that the box contains an object and how accurate it thinks the predicted box is. We define confidence as $\Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}$. If there is no object in the cell, the confidence score should be zero. Otherwise, the confidence score should equal the intersection over union (IOU) between the predicted box and the ground truth.
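The IOU term in this definition can be made concrete with a short sketch. It assumes boxes are given as corner coordinates (x_min, y_min, x_max, y_max); that representation and the function names `iou` and `confidence_target` are chosen for illustration only and are not taken from any YOLO code base.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (it may be empty)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes give an intersection of 0
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def confidence_target(object_in_cell, predicted_box, truth_box):
    """Confidence target: 0 when the cell holds no object, otherwise the IOU."""
    return iou(predicted_box, truth_box) if object_in_cell else 0.0
```

Two 2 × 2 boxes that overlap in a 1 × 1 square share an area of 1 out of a combined area of 7, so their IOU is 1/7.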
Each bounding box consists of five predictions: x, y, w, h and confidence. The (x, y) coordinates represent the center of the box relative to the bounds of the grid cell. The width and height are predicted relative to the whole image. Finally, the confidence prediction represents the IOU between the predicted box and any ground truth box.
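To illustrate this parameterisation, here is a minimal sketch that converts a box given in pixels into cell-relative targets and back. The helper names `encode_box` and `decode_box`, the pixel-based input format, and the 448 × 448 image in the usage note are assumptions made for the example.

```python
def encode_box(cx, cy, w, h, img_w, img_h, S=7):
    """Map a box (centre and size in pixels) to grid-relative values.

    Returns (row, col, x, y, w_rel, h_rel): the responsible cell, the centre
    offset inside that cell, and the size relative to the whole image.
    """
    cell_w = img_w / S
    cell_h = img_h / S
    # The cell that contains the centre is responsible for the object
    col = min(int(cx / cell_w), S - 1)
    row = min(int(cy / cell_h), S - 1)
    # Centre offset measured in units of one cell
    x = cx / cell_w - col
    y = cy / cell_h - row
    return row, col, x, y, w / img_w, h / img_h


def decode_box(row, col, x, y, w_rel, h_rel, img_w, img_h, S=7):
    """Inverse of encode_box: recover centre and size in pixels."""
    cell_w = img_w / S
    cell_h = img_h / S
    cx = (col + x) * cell_w
    cy = (row + y) * cell_h
    return cx, cy, w_rel * img_w, h_rel * img_h
```

For a 448 × 448 image and S = 7, each cell is 64 pixels wide, so a box centred at (224, 100) belongs to row 1, column 3 with offsets x = 0.5 and y = 0.5625.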
Each grid cell also predicts C conditional class probabilities, $\Pr(\text{Class}_i \mid \text{Object})$. These probabilities are conditioned on the grid cell containing an object. Only one set of class probabilities is predicted per grid cell, regardless of the number of boxes B. At test time, the conditional class probabilities are multiplied by the confidence prediction of each individual box, which gives a class-specific confidence score for each box. These scores encode both the probability of the class appearing in the box and how well the predicted box fits the object.
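Written out, the test-time product is $\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}$. The sketch below applies it to a single grid cell; the numbers (3 classes, 2 boxes) are made up for illustration and `class_scores` is a hypothetical helper name.

```python
def class_scores(class_probs, box_confidences):
    """Return scores[b][i] = box_confidences[b] * class_probs[i]."""
    return [[conf * p for p in class_probs] for conf in box_confidences]


# Made-up predictions for one grid cell with C = 3 classes and B = 2 boxes
class_probs = [0.7, 0.2, 0.1]   # Pr(Class_i | Object), shared by both boxes
box_confidences = [0.9, 0.3]    # one confidence value per box

scores = class_scores(class_probs, box_confidences)

# Pick the (box, class) pair with the highest class-specific score
best_box, best_class = max(
    ((b, i) for b in range(len(scores)) for i in range(len(scores[b]))),
    key=lambda pair: scores[pair[0]][pair[1]],
)
```

Because the class probabilities are shared within a cell, the two boxes rank the classes identically; only the box confidence changes the magnitude of the scores.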
For YOLO on PASCAL VOC, we use S = 7 and B = 2. PASCAL VOC has 20 labelled classes, so C = 20.
The predictions are encoded as an S × S × (B · 5 + C) tensor, so the final prediction is a 7 × 7 × 30 tensor.
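The depth of 30 follows from the counts given earlier: every box contributes five values and each cell adds one shared set of class probabilities. A small sketch of that arithmetic, where `output_shape` is a hypothetical helper name:

```python
def output_shape(S, B, C):
    """Shape of the prediction tensor for an S x S grid."""
    # Each of the B boxes contributes x, y, w, h and confidence (5 values);
    # the C class probabilities are shared by all boxes in the cell.
    return (S, S, B * 5 + C)


shape = output_shape(S=7, B=2, C=20)
total_boxes = shape[0] * shape[1] * 2  # S * S * B boxes per image

print(shape)        # (7, 7, 30)
print(total_boxes)  # 98
```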
Ways to call YOLO
1 Using PaddleDetection on AI Studio, with a GPU for training and prediction; see another blog post
2 Using Darknet
Load a model and weights pre-trained on the COCO (Common Objects in Context) dataset and call them directly.
Official Darknet website: https://pjreddie.com/darknet/
import cv2
import matplotlib.pyplot as plt

from utils import *
from darknet import Darknet

# Specify the location of the cfg file that contains the structure of the model
cfg_file = './cfg/yolov3.cfg'

# Specify the location of the weights file, which contains the weights of the model
weight_file = './weights/yolov3.weights'

# Specify the location of the COCO dataset class labels
namesfile = 'data/coco.names'

# Load the model structure
m = Darknet(cfg_file)

# Load the model weights
m.load_weights(weight_file)

# Load the COCO class labels
class_names = load_class_names(namesfile)
# Specify the figure size
plt.rcParams['figure.figsize'] = [24.0, 14.0]

# Load the image
img = cv2.imread('./images/dog.jpg')

# Convert from OpenCV's BGR channel order to RGB
original_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Scale the image to the input size required by the model
resized_image = cv2.resize(original_image, (m.width, m.height))

# Draw the original and resized images side by side
plt.subplot(121)
plt.title('Original Image')
plt.imshow(original_image)
plt.subplot(122)
plt.title('Resized Image')
plt.imshow(resized_image)
plt.show()
# Set the threshold for non-maximum suppression (NMS)
nms_thresh = 0.6

# Set the threshold for intersection over union (IOU)
iou_thresh = 0.4
# Set the figure size
plt.rcParams['figure.figsize'] = [24.0, 14.0]

# Load the image
img = cv2.imread('./images/dog.jpg')

# Convert from OpenCV's BGR channel order to RGB
original_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Scale the image to the input size required by the model
resized_image = cv2.resize(original_image, (m.width, m.height))

# Set the IOU threshold
iou_thresh = 0.4

# Set the NMS threshold
nms_thresh = 0.6

# Detect objects in the image
boxes = detect_objects(m, resized_image, iou_thresh, nms_thresh)

# Print the detected objects and their confidence
print_objects(boxes, class_names)

# Visualize the image, the boxes and the class labels
plot_boxes(original_image, boxes, class_names, plot_labels = True)
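The internals of `detect_objects` are not shown here, so the following is only a generic sketch of what confidence filtering followed by greedy non-maximum suppression typically looks like. The function names, the (x_min, y_min, x_max, y_max) box format, and the parameter names `conf_thresh` and `iou_thresh` are assumptions for this example and need not map one-to-one onto the arguments of `detect_objects`. For brevity the sketch ignores class labels, whereas NMS is usually applied per class.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1])
             - intersection)
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, conf_thresh, iou_thresh):
    """Greedy NMS over a list of (box, score) pairs."""
    # 1. Drop detections whose score is below the confidence threshold
    candidates = [d for d in detections if d[1] >= conf_thresh]
    # 2. Visit the remaining detections from highest to lowest score
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # 3. Keep a box only if it does not overlap a kept box too strongly
        if all(iou(box, kept_box) <= iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Made-up detections: two near-duplicates, one separate box, one weak box
detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),
    ((20, 20, 30, 30), 0.7),
    ((0, 0, 5, 5), 0.1),
]
result = non_max_suppression(detections, conf_thresh=0.6, iou_thresh=0.4)
```

With these values the second box is suppressed because its IOU with the first is 81/119, roughly 0.68, and the fourth is dropped for low confidence, which leaves the first and third boxes.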