J. Dai and Y. Li, R-FCN : Object detection via regionbased fully convolutional networks, Journal NIPS, 2016.

J. Redmon and S. Divvala, You only look once : Unified, real-time object detection, IEEE CVPR, 2016.

W. Liu and D. Anguelov, SSD : Single shot multibox detector, ECCV, 2016.

W. Han and P. Khorrami, Seq-NMS for video object detection, Journal CoRR, 2016.

K. Kang and H. Li, T-CNN : Tubelets with convolutional neural networks for object detection from videos, IEEE Trans. On TCSVT, 2018.

X. Zhu and Y. Xiong, Deep feature flow for video recognition, 2017.

A. Dosovitskiy and P. Fischer, FlowNet : Learning optical flow with convolutional networks, IEEE ICCV, 2015.

K. He and X. Zhang, Deep residual learning for image recognition, IEEE CVPR, 2016.

M. Liu and M. Zhu, Mobile video object detection with temporally-aware feature maps, IEEE CVPR, 2018.

K. Simonyan and A. Zisserman, Two-stream convolutional networks for action recognition in videos, Journal NIPS, 2014.

M. Zolfaghari and G. L. Oliveira, Chained multistream networks exploiting pose, motion, and appearance for action classification and detection, 2017.

Y. Wang and J. Song, Two-stream SR-CNNs for action recognition in videos, 2016.

K. Soomro, A. R. Zamir, and M. Shah, UCF101 : A dataset of 101 human actions classes from videos in the wild, Journal CoRR, 2012.