VOC2012 Dataset Details

https://blog.csdn.net/wenxueliu/article/details/80327316

In object detection, if you don't understand the data, the dataset-processing code can leave you in a fog. For example:

  1. What exactly does trainval mean?
  2. What is the range of the ymax, xmax, ymin, xmin values in a bbox label, and how are they calculated?
  3. How do images relate to labels?

The purpose of this article is to understand:

  1. How the dataset is organized
  2. How TensorFlow handles the data

Dataset Details

Object detection mainly uses three folders: Annotations, ImageSets, and JPEGImages.

ImageSets/Main/ stores the index of each dataset split, Annotations stores the label data, and JPEGImages stores the image content.

ImageSets

ImageSets contains four folders: Action, Layout, Main, and Segmentation.

  1. Action: stores data on human actions (running, jumping, etc.), which is part of the VOC challenge
  2. Layout: stores data on human body parts (head, hands, feet, etc.), which is also part of the VOC challenge
  3. Main: stores the data for image object recognition, divided into 20 classes
  4. Segmentation: stores the data usable for segmentation

Files in ImageSets/Main/ are named {class}_train.txt, {class}_trainval.txt, and {class}_val.txt. The exceptions are train.txt, val.txt, and trainval.txt, which are not tied to a single class:

aeroplane_train.txt
aeroplane_trainval.txt
aeroplane_val.txt
bicycle_train.txt
bicycle_trainval.txt
bicycle_val.txt
bird_train.txt
bird_trainval.txt
bird_val.txt
boat_train.txt
boat_trainval.txt
boat_val.txt
bottle_train.txt
bottle_trainval.txt
bottle_val.txt
bus_train.txt
bus_trainval.txt
bus_val.txt
car_train.txt
car_trainval.txt
car_val.txt
cat_train.txt
cat_trainval.txt
cat_val.txt
chair_train.txt
chair_trainval.txt
chair_val.txt
cow_train.txt
cow_trainval.txt
cow_val.txt
diningtable_train.txt
diningtable_trainval.txt
diningtable_val.txt
dog_train.txt
dog_trainval.txt
dog_val.txt
horse_train.txt
horse_trainval.txt
horse_val.txt
motorbike_train.txt
motorbike_trainval.txt
motorbike_val.txt
person_train.txt
person_trainval.txt
person_val.txt
pottedplant_train.txt
pottedplant_trainval.txt
pottedplant_val.txt
sheep_train.txt
sheep_trainval.txt
sheep_val.txt
sofa_train.txt
sofa_trainval.txt
sofa_val.txt
train.txt
train_train.txt
train_trainval.txt
train_val.txt
trainval.txt
tvmonitor_train.txt
tvmonitor_trainval.txt
tvmonitor_val.txt
val.txt
  1. {class}_train.txt holds the index of the training set for that class; each class file has 5717 training entries.
  2. {class}_val.txt holds the index of the validation set for that class; each class file has 5823 validation entries.
  3. {class}_trainval.txt holds the index of the combined training and validation set for that class; each class file has 11540 entries (5717 + 5823).

Each file contains lines like the following:

2011_003194 -1
2011_003216 -1
2011_003223 -1
2011_003230  1
2011_003236  1
2011_003238  1
2011_003246  1
2011_003247  0
2011_003253 -1
2011_003255  1
2011_003259  1
2011_003274 -1
2011_003276 -1

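Note: 1 marks a positive sample (the image contains the class) and -1 marks a negative sample. A 0 marks an image in which the class appears only as objects flagged difficult.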
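
The format above can be read with a few lines of Python. This is only a minimal sketch: the function name parse_class_imageset and the inline sample text are assumptions made for illustration, and a real script would read the file from ImageSets/Main/ instead.

```python
def parse_class_imageset(text):
    """Parse lines of the form '<image_id> <label>' into {image_id: label}."""
    labels = {}
    for line in text.splitlines():
        parts = line.split()  # split() tolerates one or more spaces
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        image_id, label = parts
        labels[image_id] = int(label)
    return labels


# A few lines copied from the listing above.
sample = """\
2011_003194 -1
2011_003230  1
2011_003247  0
"""

labels = parse_class_imageset(sample)
positives = [image_id for image_id, label in labels.items() if label == 1]
```

Splitting on whitespace rather than on a single space matters here, because positive labels are padded with an extra space to line up with the -1 entries.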

VOC2012/ImageSets/Main/train.txt lists the file names (IDs) of all training images. For each ID, the image file is found in VOC2012/JPEGImages/ and the label file in VOC2012/Annotations/.

VOC2012/ImageSets/Main/val.txt lists the IDs of all validation images, which map to image and label files in the same way.
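
As a rough illustration of that lookup, the sketch below maps one ID to its two files. The helper names read_ids and paths_for_id are hypothetical, and the root folder "VOC2012" is assumed to be the extracted dataset directory.

```python
from pathlib import Path


def read_ids(split_text):
    """Return the non-empty lines of a split file such as train.txt."""
    return [line.strip() for line in split_text.splitlines() if line.strip()]


def paths_for_id(voc_root, image_id):
    """Map an image ID to its (image, annotation) file paths."""
    root = Path(voc_root)
    image_path = root / "JPEGImages" / f"{image_id}.jpg"
    annotation_path = root / "Annotations" / f"{image_id}.xml"
    return image_path, annotation_path


ids = read_ids("2009_001137\n")
image_path, annotation_path = paths_for_id("VOC2012", ids[0])
```

The image and its label file share the same ID and differ only in folder and extension, which is what links pictures to labels.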

Each image in JPEGImages is read together with its Annotations file, converted to a tf Example, and written to a {train|test}{index}_of{num_shard} file. Each file holds about total_size/num_shard Examples. (num_shard can be adjusted per dataset to control the size of each output file.)
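
The split into shards might look like the sketch below. It is an assumption about one reasonable way to divide the examples, not the actual conversion script: shard_slices and shard_name are hypothetical helpers, and the file name simply follows the pattern quoted above literally.

```python
def shard_slices(total_size, num_shard):
    """Return one (start, end) index pair per shard, sizes differing by at most 1."""
    base, extra = divmod(total_size, num_shard)
    slices, start = [], 0
    for index in range(num_shard):
        size = base + (1 if index < extra else 0)  # spread the remainder
        slices.append((start, start + size))
        start += size
    return slices


def shard_name(split, index, num_shard):
    """Literal reading of the {train|test}{index}_of{num_shard} pattern."""
    return f"{split}{index}_of{num_shard}"


# 5717 training images split over 4 output files.
slices = shard_slices(5717, 4)
names = [shard_name("train", i, 4) for i in range(4)]
```

Spreading the remainder over the first shards keeps every file close to total_size/num_shard examples, so no shard ends up much larger than the rest.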

Annotations

The files in this folder are XML files named {id}.xml (the ids are listed in the files under VOC2012/ImageSets/Main/). The key information to keep is:

  1. Object label: name, for example person
  2. Image size: depth, height, width
  3. Object bbox: xmax, xmin, ymax, ymin under bndbox
    <annotation>
        <filename>2009_001137.jpg</filename>
        <folder>VOC2012</folder>
        <object>
            <name>person</name>
            <bndbox>
                <xmax>355</xmax>
                <xmin>187</xmin>
                <ymax>334</ymax>
                <ymin>121</ymin>
            </bndbox>
            <difficult>0</difficult>                <!-- Is the object hard to recognize? (0 means easy to recognize) -->
            <occluded>0</occluded>
            <pose>Unspecified</pose>                <!-- Viewpoint of the object -->
            <truncated>0</truncated>                <!-- Is the object truncated? (0 means complete) -->
        </object>
        <object>
            <name>pottedplant</name>
            <bndbox>
                <xmax>500</xmax>
                <xmin>376</xmin>
                <ymax>261</ymax>
                <ymin>1</ymin>
            </bndbox>
            <difficult>0</difficult>
            <occluded>1</occluded>
            <pose>Unspecified</pose>
            <truncated>1</truncated>
        </object>
        <segmented>1</segmented>
        <size>
            <depth>3</depth>
            <height>334</height>
            <width>500</width>
        </size>
        <source>
            <annotation>PASCAL VOC2009</annotation>
            <database>The VOC2009 Database</database>
            <image>flickr</image>
        </source>
    </annotation>
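
One way to pull the key fields out of such a file is Python's standard xml.etree.ElementTree module. The sketch below is illustrative only: parse_annotation is a hypothetical helper, and the XML string is a trimmed copy of the example above, keeping just the fields that are read.

```python
import xml.etree.ElementTree as ET

ANNOTATION_XML = """
<annotation>
    <filename>2009_001137.jpg</filename>
    <object>
        <name>person</name>
        <bndbox><xmax>355</xmax><xmin>187</xmin><ymax>334</ymax><ymin>121</ymin></bndbox>
        <difficult>0</difficult>
    </object>
    <object>
        <name>pottedplant</name>
        <bndbox><xmax>500</xmax><xmin>376</xmin><ymax>261</ymax><ymin>1</ymin></bndbox>
        <difficult>0</difficult>
    </object>
    <size><depth>3</depth><height>334</height><width>500</width></size>
</annotation>
"""


def parse_annotation(xml_text):
    """Extract the file name, image size, and per-object label and bbox."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    result = {
        "filename": root.findtext("filename"),
        "height": int(size.findtext("height")),
        "width": int(size.findtext("width")),
        "objects": [],
    }
    for obj in root.findall("object"):  # direct children only
        box = obj.find("bndbox")
        entry = {"name": obj.findtext("name"),
                 "difficult": int(obj.findtext("difficult"))}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            entry[key] = int(box.findtext(key))  # pixel coordinates
        result["objects"].append(entry)
    return result


annotation = parse_annotation(ANNOTATION_XML)
```

Fields are looked up by tag name, so the order in which xmax, xmin, ymax, and ymin appear in the file does not matter.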

JPEGImages

Taking 2009_001137.jpg as an example, decoding the file yields image_data (the binary image content), height, and width.

Dataset Conversion to TFRecord

For example, the final TF Example for the annotation above is:

TF Example
{
'filename' : 'VOC{year}/JPEGImages/{id}.jpg',
'height' :  height,
'width' :   width,
'classes' : [classes.index("person"), classes.index("pottedplant")],
'y_mins' : [float(121)/334, float(1)/334],  # ymin of each object, divided by the image height
'x_mins' : [float(187)/500, float(376)/500],
'y_maxes' : [float(334)/334, float(261)/334],
'x_maxes' : [float(355)/500, float(500)/500],
'encoded' : 'Picture Content Binary'
}

where

classes = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor"
]

Note: objects with difficult = 1 are skipped and not processed.

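This turns one image and its labels into a TF Example. The pixel coordinates from the XML are divided by the image height (for y) or width (for x), so every bbox value in the Example lies in the range [0, 1]. For the entire dataset, iterate over every image in turn.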
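
The normalization can be sketched in plain Python without TensorFlow. The code below only builds the dictionary of values; wrapping them in tf.train.Example features is left out. The function to_example_dict, its parameters, and the hand-written annotation dict are assumptions made for illustration, with numbers taken from the 2009_001137 example.

```python
CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def to_example_dict(annotation, image_id, year="2012", encoded=b""):
    """Build the per-image fields, skipping objects with difficult = 1."""
    height, width = annotation["height"], annotation["width"]
    kept = [o for o in annotation["objects"] if o["difficult"] != 1]
    return {
        "filename": f"VOC{year}/JPEGImages/{image_id}.jpg",
        "height": height,
        "width": width,
        "classes": [CLASSES.index(o["name"]) for o in kept],
        # y values are divided by the height, x values by the width
        "y_mins": [o["ymin"] / height for o in kept],
        "x_mins": [o["xmin"] / width for o in kept],
        "y_maxes": [o["ymax"] / height for o in kept],
        "x_maxes": [o["xmax"] / width for o in kept],
        "encoded": encoded,  # raw JPEG bytes; left empty in this sketch
    }


annotation = {
    "height": 334,
    "width": 500,
    "objects": [
        {"name": "person", "difficult": 0,
         "xmin": 187, "ymin": 121, "xmax": 355, "ymax": 334},
        {"name": "pottedplant", "difficult": 0,
         "xmin": 376, "ymin": 1, "xmax": 500, "ymax": 261},
    ],
}

example = to_example_dict(annotation, "2009_001137")
```

Filtering the difficult objects before building any list keeps the class and bbox lists aligned, so index i always refers to the same object in every field.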

    Posted on Tue, 03 Mar 2020 18:35:52 -0800 by objNoob