
EfficientNet

For the compound scaling coefficient calculation:

# EfficientNetParameterCalculator.py
a = 1.2   # alpha: depth coefficient
b = 1.1   # beta: width coefficient
r = 1.15  # gamma: resolution coefficient

a_list = []
b_list = []
r_list = []

phi_list = [0, 0.5, 1, 2, 3.5, 5, 6, 7]  # effective phi for EfficientNet-B0 ... B7

for phi in phi_list:
    a_list.append(pow(a, phi))  # depth multiplier alpha^phi
    b_list.append(pow(b, phi))  # width multiplier beta^phi
    r_list.append(pow(r, phi))  # resolution multiplier gamma^phi
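
As a sanity check, applying the resolution multipliers to the 224×224 base input of B0 approximately recovers the official input resolutions (224, 240, 260, 300, 380, 456, 528, 600); the official values are rounded further. A minimal sketch:

# base input resolution of EfficientNet-B0 is 224x224
for phi, r_mult in zip(phi_list, r_list):
    print(f"phi={phi}: input resolution ~ {round(224 * r_mult)}")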

For the channels of each stage i:

channels = [32, 16, 24, 40, 80, 112, 192, 320, 1280]  # B0 channels per stage
new_channels = []

# width scaling: scale every stage's channels by beta^phi, one row per model
for b_mult in b_list:
    new_channels.append([b_mult * c for c in channels])
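
These raw products are not used as-is; the official TensorFlow implementation rounds each width to a multiple of 8 via round_filters. A sketch along the lines of that implementation:

def round_filters(filters, width_mult, divisor=8):
    # scale, then round to the nearest multiple of `divisor`
    filters *= width_mult
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:  # never round down by more than 10%
        new_filters += divisor
    return int(new_filters)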

For the depth (number of layers) of each stage i:

layers = [1, 1, 2, 2, 3, 3, 4, 1, 1]  # B0 layer repeats per stage
new_layers = []

# depth scaling: scale every stage's repeat count by alpha^phi, one row per model
for a_mult in a_list:
    new_layers.append([a_mult * n for n in layers])
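
The repeat counts must be integers; the official implementation rounds them up via round_repeats. A minimal sketch:

import math

def round_repeats(repeats, depth_mult):
    # round up so that no stage loses all of its layers
    return int(math.ceil(depth_mult * repeats))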

Modify the parameter values for proper iteration:

(1) batch size

...
class General:
    log_frequency = 10
    name = __name__.rsplit("/")[-1].rsplit(".")[-1]
    #batch_image = 8 if is_train else 1
    batch_image = 2 if is_train else 1
    fp16 = True
    loader_worker = 8

(2) epochs

...
class schedule:
    #mult = 6
    mult = 1
    begin_epoch = 0
    end_epoch = 6 * mult

Table for training from scratch:

  • batch size: 2, 4, 6, or 8
  • optimizer: lr= , increase rate: lr / iteration
  • epochs:
  • top_N: 1000, 2000, …
  • NMS threshold: 0.5 or 0.6

EfficientDet

For object detection:
Backbone: EfficientNet-BX (shareable parameters)
Neck: BiFPN
Head: N-class classification subnet + bounding-box regression subnet

BiFPN: to give each input feature a different weight, fast normalized fusion can be used.
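
A minimal PyTorch sketch of fast normalized fusion, following the formula O = Σ wᵢ·Iᵢ / (ε + Σⱼ wⱼ) with wᵢ kept non-negative by a ReLU (the module name and eps value are illustrative):

import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Fuses same-shape feature maps with learned non-negative weights."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)        # keep fusion weights non-negative
        w = w / (w.sum() + self.eps)  # fast normalization instead of softmax
        return sum(wi * x for wi, x in zip(w, inputs))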

First, measure results with NAS-FPN, then modify BiFPN alongside it.

How EfficientNet connects to the FPN in EfficientDet is tied to EfficientNet's stages: the input and output resolutions change at certain stages, so taking feature maps at several resolutions yields the pyramid-shaped Feature Pyramid Network. With a 224×224 input:

Stage 1: 224×224 -> 112×112, P1 feature maps (output is 1/2 of input resolution)
Stage 2: 112×112 -> 112×112
Stage 3: 112×112 -> 56×56, P2 feature maps (output is 1/4 of input resolution)
Stage 4: 56×56 -> 28×28
Stage 5: 28×28 -> 28×28, P3 feature maps (output is 1/8 of input resolution)
Stage 6: 28×28 -> 14×14, P4 feature maps (output is 1/16 of input resolution)
Stage 7: 14×14 -> 7×7, P5 feature maps (output is 1/32 of input resolution)
Stage 8: 7×7 -> 7×7
Stage 9: 7×7 -> 3×3

P6 feature maps, output size is 1/64 of input resolution
P7 feature maps, output size is 1/128 of input resolution

P6 and P7 are obtained from P5.

P6 feature maps, output size is 1/2 of P5 resolution (2×2 kernel, stride 2)
P7 feature maps, output size is 1/4 of P5 resolution (4×4 kernel, stride 4)
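
A minimal sketch of that downsampling, assuming simple max pooling (the actual implementation may use strided convolutions instead):

import torch
import torch.nn as nn

# hypothetical P5 tensor: (batch, channels, H, W)
p5 = torch.randn(1, 64, 7, 7)
p6 = nn.MaxPool2d(kernel_size=2, stride=2)(p5)  # 1/2 of P5 resolution
p7 = nn.MaxPool2d(kernel_size=4, stride=4)(p5)  # 1/4 of P5 resolution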

Due to the GPU memory limit (12 GB), only up to EfficientNet-B4 can be trained.

To-do list

  • ImageNet-1K for pre-trained weights
  • COCO 2014, 30 classes, for pre-trained weights
  • Dropout
  • Fast normalized fusion in BiFPN
  • Compare the results of D0 through D7
  • Transfer learning or fine-tuning from the COCO dataset to the WRS dataset

Fine-tune

bbox_conv1234_weight, bbox_conv1234_bias, bbox_pred_weight, bbox_pred_bias
cls_conv1234_weight, cls_conv1234_bias, cls_pred_weight, cls_pred_bias
In Python, import MXNet, load the pre-trained params, delete the classifier and regressor parts, save the rest with nd.save, then move the file into pretrain_model and run detection_train.py.
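
A minimal sketch of that parameter surgery (the file paths are hypothetical; the key filter matches the parameter names listed above):

from mxnet import nd

params = nd.load("pretrain_model/pretrained-0000.params")  # hypothetical path

# drop the classification/regression head so it is re-initialized for the new classes
head_keywords = ("cls_conv", "cls_pred", "bbox_conv", "bbox_pred")
backbone_only = {k: v for k, v in params.items()
                 if not any(kw in k for kw in head_keywords)}

nd.save("pretrain_model/backbone-only-0000.params", backbone_only)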

The anchor sizes need to be adjusted, and how much should be frozen? When only Stage 1 was taken, the result was 0.857.

  • K-Fold, then ensemble (PyTorch, multiple GPUs, WBF)
  • Data augmentation: CutMix, MixUp, insect augmentation
  • Use a learning-rate scheduler
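
For WBF, a minimal sketch assuming the ensemble-boxes package (pip install ensemble-boxes); box coordinates must be normalized to [0, 1]:

from ensemble_boxes import weighted_boxes_fusion

# hypothetical predictions from two models, boxes as [x1, y1, x2, y2] in [0, 1]
boxes_list = [[[0.10, 0.10, 0.50, 0.50]], [[0.12, 0.11, 0.51, 0.49]]]
scores_list = [[0.90], [0.85]]
labels_list = [[1], [1]]

boxes, scores, labels = weighted_boxes_fusion(
    boxes_list, scores_list, labels_list, iou_thr=0.55, skip_box_thr=0.0)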

https://github.com/rwightman/efficientdet-pytorch

TensorFlow style

https://github.com/google/automl
Install it, then train.

strategy:
batch_size: 8, 4, 2
num_epochs: 500, 1000
num_examples_per_epoch: (important) 40, 80, 100, 200
anchor_scale: (important) TBD
aspect_ratio: (important) TBD
label_smoothing: TBD
Augmentation: TBD

We need to think about the object size for small-object detection.
All the EfficientDet models have different input sizes.

How to train the model

Convert your dataset to COCO format first.

PYTHONPATH=".:$PYTHONPATH" python dataset/create_coco_tfrecord.py --image_dir=wrs/train2017/ --object_annotations_file=wrs/annotations/instances_train2017.json --output_file_prefix=wrs/train

PYTHONPATH=".:$PYTHONPATH" python dataset/create_coco_tfrecord.py --image_dir=wrs/val2017/ --object_annotations_file=wrs/annotations/instances_val2017.json --output_file_prefix=wrs/valid

Then move your TFRecord files into the tfrecords directory of your dataset:

(effdet) dongjun:~/djplace/automl/efficientdet/wrs$ tree -d
.
├── annotations
├── test
├── tfrecords
├── train2017
└── val2017

vi wrs.yaml

use_keras_model: False
num_classes: 13
label_map: {1: obj1, 2: obj2, 3: obj3, 4: obj4, 5: obj5, 6: obj6, 7: obj7, 8: obj8, 9: obj9, 10: obj10, 11: obj11, 12: obj12}
moving_average_decay: 0
var_freeze_expr: '(efficientnet|fpn_cells)'
anchor_scale: 2.0
mixed_precision: True

Then start training:

python main.py --mode=train_and_eval --training_file_pattern=wrs/tfrecords/train*.tfrecord --validation_file_pattern=wrs/tfrecords/valid*.tfrecord --model_name=efficientdet-d3 --model_dir=wrs_efficientdet-d3-finetune --ckpt=efficientdet-d3 --train_batch_size=4 --num_examples_per_epoch=200 --num_epochs=1000 --hparams=wrs.yaml

How to do inference

First of all, we need two steps to run inference on the test images:
(1) Export the trained model as a .pb file ('your_trained_model.pb').
(2) Run inference on the images using the file from (1).

#(1)
python model_inspect.py --runmode=saved_model --model_name=efficientdet-d3 --saved_model_dir=tmp/saved_model2/ --ckpt_path=efficientdet-d3-finetune/archive/ --hparams=wrs.yaml --batch_size=1

#(2)
python model_inspect.py --runmode=saved_model_infer --model_name=efficientdet-d3 --ckpt_path=wrs_efficientdet/efficientdet-d3-finetune/ --saved_model_dir=tmp/saved_model/efficientdet-d3_frozen.pb --input_image=./wrs/test/*.png --output_image_dir=tmp/img_output --hparams=wrs.yaml