Rear-end vision-based collision detection system for motorcyclists

Abstract. In many countries, the motorcyclist fatality rate is much higher than that of other vehicle drivers. Among many other factors, motorcycle rear-end collisions contribute to these fatalities. To increase the safety of motorcyclists and reduce their road fatalities, this paper introduces a vision-based rear-end collision detection system. The binary road detection scheme contributes significantly to reducing false detections and helps to achieve reliable results even when shadows and different lane markers are present on the road. The methodology is based on Harris corner detection and the Hough transform. To validate this methodology, two types of dataset are used: (1) self-recorded datasets (obtained by placing a camera at the rear end of a motorcycle) and (2) online datasets (recorded by placing a camera at the front of a car). This method achieved 95.1% accuracy for the self-recorded dataset and gives reliable results for rear-end vehicle detection under different road scenarios. The technique also performs well on the online car datasets. The proposed technique's high detection accuracy using a monocular vision camera, coupled with its low computational complexity, makes it a suitable candidate for a motorbike rear-end collision detection system.


Introduction
According to the motorcycle industry, there are 313 million motorcycles in the world, of which 77% are in Asia, 5% are in Latin America, and 2% are in North America. 1 The average number of motorcycles per thousand people in most Asian cities is ∼196, 2 whereas Europe and North America contain only 16% of the world motorcycle fleet. 1 Motorcyclists face a higher risk of fatality than any other type of vehicle driver. 3 It is estimated that more than 180,000 motorcyclists die worldwide annually as a result of road crashes. 1 Motorcyclist safety is a major issue in most Asian countries. In Malaysia, motorcyclists account for more than 50% of road fatalities. 4 Similarly, motorcyclists account for 80% of road fatalities in Vietnam, 70% in Thailand, 61% in Indonesia, and 58% in Cambodia. 4 In other countries, the situation is not as alarming, but it still represents a major concern: motorcycle-related fatalities are 13% in New Zealand 5 and 14% in the former EU15 (i.e., the 15 member nations of the EU prior to 2004). 6 Frontal and rear-end collisions, which contribute heavily to motorcyclist fatalities, are mostly recorded on motorways and primary roads. Most of these fatalities are reported on straight road sections. 4 Many factors are involved in motorcycle accidents, among which are excessive speed, driving under the influence of alcohol, ignorance of the route, and loss of control. 7,8 To increase the safety of motorcyclists and reduce their fatalities, different techniques have been proposed.
They can be divided into two major categories: (1) passive safety and (2) active safety. Passive safety techniques aim to reduce injuries, whereas active techniques help to prevent accidents from occurring in the first place. Passive safety covers the use of helmets, 9,10 special clothing, 11,12 air bags, 13 etc. Active safety covers electronic stability control, 14,15 antilock braking systems, 16 advance collision warning (ACW) systems for motorcycles, [17][18][19] etc. These ACW systems give early warnings to motorcyclists about potential dangers. 20,21 As far as active safety systems for cars are concerned, many techniques have been developed for collision detection. [22][23][24][25][26][27][28] These techniques rely on the global positioning system (GPS), 22 radar, 23 laser scanners, 24 intervehicle communications, 25 and ultrasonic- 26 and camera-based approaches. [27][28] However, there are very few documented papers or systems on collision detection or collision avoidance for motorcyclists.
For example, in Ref. 17, different techniques aiming at warning motorcyclists about potential collisions are discussed; nevertheless, the development and accuracy of these systems are not fully established. In Ref. 18, information about the "dangerousness of curves ahead" is presented to the motorcyclist to avoid possible accidents. The given technique relies on the GPS to estimate the motorcycle position and the approaching curve; the technique is not sufficiently proven, as it has only been tested in a laboratory environment by means of a simulator. Next, the authors of Ref. 19 proposed a vision-based collision warning system for motorcycles; a mobile phone camera, mounted on the front of a motorcycle, was used to detect the frontal vehicles, and the GPS was used to estimate the distance.
Images and videos provide rich data sources from which additional information and context can be surmised. Cameras provide a wide field of view, allowing for the detection and tracking of (moving) objects across multiple lanes. In general, vehicle detections using cameras can be classified as stereo and monocular. 28 Stereo-based methods require two images, leading to the increase in system complexities and costs. In contrast, monocular vision-based vehicle detections have mirrored advances in computer vision, machine learning, and pattern recognition.
For a monocular-based system, the determination of vehicle locations is performed by analyzing the vehicle's motion or appearance. 28 In a motion-based technique, an optical flow method is used to detect the vehicles. 29,30 Motion-based methods are effective for detecting moving objects; however, they are computationally intensive and require analysis of several frames before an object can be detected. They are also sensitive to camera movement and may fail to detect objects with slow relative motion. 31 As such, motion-based techniques are less commonly used for vehicle detections.
On the other hand, appearance-based detection techniques detect vehicles based on the shadow underneath the vehicles, 32 color, 33 symmetry, 27,31 texture, 34 lights, 35 and edges. 36 In this paper, a rear-end vision-based collision detection system is proposed for motorcyclists. This system detects vehicles approaching from the rear using a single camera; it has been tested on different road scenarios and on available online datasets to evaluate its performance.

Related Work
Recent related works on vehicle detection are given in Table 1. This table covers the detection type and the dataset properties; it also highlights some results through metrics such as accuracy and true positive rate (TPR), and it further lists the limitations of these techniques. In Table 1, the "detection type" column indicates whether the technique is applied to front or rear-end vehicle detection. The "dataset" column provides information related to the videos, the number of images, or the type of vehicles involved in the experiments. In Ref. 32, a single camera was used to detect front vehicles. The technique uses a combination of two features, namely the shadow underneath the vehicle and horizontal edges, for vehicle detection; the analysis of consecutive frames is used to calculate the relative speed of the detected car. Unavoidably, to detect the front vehicle license plate, both vehicles should be very close to each other, imposing a constraint that could be dangerous under a high-speed scenario. As reported, this technique is quite slow, as it processes only four frames per second.
The authors of Ref. 37 presented a monocular vision-based rear vehicle detection and tracking system for car drivers. The camera was positioned looking backward out of the rear windshield. The application was the detection of the front parts of approaching vehicles to assist the driver in lane changing. Symmetry and edge operators were used to generate the region of interest (ROI). Subsequently, vehicles were detected using Haar wavelet features that were later fed to a support vector machine (SVM) classifier.
In Ref. 38, a rear-end vehicle detection technique for low light conditions has been proposed. The technique identifies the vehicle headlamp pairs using a region growing threshold and a cross-correlation bilateral symmetry analysis method. This technique performs a perspective transformation to correct the distortion and ensure consistent detection performance throughout all road maneuvers. Finally, a Kalman filter is used for tracking purposes. Unfortunately, this technique is only effective at night or when the light level is very low.
The authors of Ref. 39 detected vehicles in adjacent lanes by placing a camera at the left side rear-view mirror. The camera captures the images in the adjacent lane to detect vehicles. This technique uses a neuro-fuzzy network to detect the vehicles. The training of the neuro-fuzzy network plays an important role in the detection process.
In Ref. 40, a monocular vision-based technique has been proposed to detect front vehicles. Histograms of oriented gradients (HOG) have been used to extract the features, and an SVM has been used for classification. Shadows underneath the vehicles have been used as a feature for vehicle detection. However, if the system fails to select the shadow area, the vehicle detection is then performed using HOG features, which results in a larger amount of calculation and a slower processing speed.
In Ref. 41, license plate detection has been used to identify front vehicles. This technique first detects the license plate of the front vehicle and verifies it by using the [...] AdaBoost was used to train a shadow detector offline, and an SVM has been used for the classification.

Data Set
In this research, a Sony action cam was installed at the rear end of a motorcycle (Yamaha, 115cc) to acquire the dataset, as shown in Fig. 1. The camera was mounted in a shockproof casing to minimize any vibration effects. The video datasets were recorded in ".avi" format at 30 frames per second and at a resolution of 600 × 800 pixels. A total of 5000 frames containing 120 vehicles were used for testing. The video datasets were recorded in different road scenarios with different light conditions. During the video recordings, the driving speed of the motorcycle varied from 40 to 80 km/h. The datasets were recorded along the Ipoh-Lumut highway, Perak, Malaysia, and inside Universiti Teknologi PETRONAS (UTP), Malaysia. Because few motorcycle datasets are available online, the proposed technique has also been validated using online car datasets. [43][44][45] The "source-2" dataset is available at Ref. 46. The details of the datasets are given in Table 2.

Vehicle Detection Technique
The steps leading to the algorithm development are shown in Fig. 2.
First, the video frame is converted to grayscale to reduce the computational cost. Afterward, the ROI is computed. To get the ROI, a fixed area at the top of every grayscale frame is excluded to remove the sky and other unwanted regions. The left and right boundaries of the ROI image are kept the same as those of the input frame, as shown in Fig. 2. After that, binary road region segmentation and vehicle detection are performed; both steps are explained in the next sections.
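The preprocessing step can be sketched in a few lines. This is a minimal illustration in plain Python and not the authors' C++/OpenCV implementation: the function names, the luma weights used for the grayscale conversion, and the `top_rows_excluded` parameter are assumptions, since the text only states that a fixed band at the top of the frame is removed while the full width is kept.

```python
def to_grayscale(frame_rgb):
    """Convert a frame given as rows of (r, g, b) tuples to 8-bit grayscale.

    The ITU-R BT.601 luma weights are an assumption; the paper does not
    say which conversion was used.
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in frame_rgb
    ]


def crop_roi(gray, top_rows_excluded):
    """Drop a fixed band of rows from the top of the frame.

    The left and right boundaries are untouched, so the ROI keeps the
    full width of the input frame.
    """
    if not 0 <= top_rows_excluded < len(gray):
        raise ValueError("top_rows_excluded must leave at least one row")
    return [row[:] for row in gray[top_rows_excluded:]]
```

Because each dataset has its own frame size and camera placement, `top_rows_excluded` would be chosen once per dataset and then held fixed for every frame of that dataset.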

Binary Road Region Detection Technique
The binary road region detection technique is illustrated in Fig. 3.
To find the binary road region, six square patches are first extracted from the ROI image. All the selected patches have the same size, and their locations are fixed for each ROI image. In our case, these patches are selected near the bottom of the ROI image, which is adjacent to the motorcycle. In that area, there is a low probability of vehicle presence and a high probability that it contains only the road region. The locations of these user-defined patches are shown in Fig. 4.
For the first ROI image (frame), the average grayscale value over all six patches is calculated. This average grayscale value, called the mean value (MV), is also saved as the previous mean value (PMV). Next, this value is used as a threshold to convert the ROI image into a binary road image. The usage of the PMV is explained below.
Let MV and PMV denote the mean value and the previous mean value, and let $M(i)$ denote the average grayscale value of the individual patch $i$, where $i = 1, 2, 3, \ldots, 6$; the MV and PMV for the first image can be calculated as
$$\mathrm{MV} = \mathrm{PMV} = \frac{1}{6}\sum_{i=1}^{6} M(i). \tag{1}$$
Most roads have white lines or some lane marks. These white lines or lane marks are normally used to guide motorcyclists and other vehicle drivers in the appropriate direction, as shown in Fig. 5.
It should be noted that it may be possible for some of these six patches to contain a white line or a lane marker region affecting (i.e., increasing) the MV of these patches. Therefore, only patches having a mean gray level lower than an empirical threshold are kept, and the affected one(s) is/are discarded for the remaining frames.
The threshold value for the patch selection is given as
$$M(i) \leq 100. \tag{2}$$
Then, as presented in Fig. 3, two possible scenarios arise: (1) one or more patches contain a road region and (2) no patch contains a road region.

One or more patches containing road region
If one or more patches contain the road regions, the average grayscale values of these patches are used to calculate the MV. This MV is also stored as a PMV and is used to convert that ROI image into a binary image.
For example, if $M(1)$, $M(2)$, and $M(6)$ are the average grayscale values of the patches that contain the road region [i.e., $M(1)$, $M(2)$, and $M(6) \leq 100$ while $M(3)$, $M(4)$, and $M(5) > 100$], then MV and PMV are computed as
$$\mathrm{MV} = \mathrm{PMV} = \frac{M(1) + M(2) + M(6)}{3}. \tag{3}$$

No patch containing road region
If no patch contains a road region [i.e., $M(1), M(2), \ldots, M(6) > 100$], then the PMV from the previous frame is used as the MV to convert the current ROI image into a binary image. From our observation, the binary road detection technique gives reliable results, even when lane markers are present on the road. Figure 6 shows the results of the binary road region detection. From Fig. 6, one can see that the road boundary is clearly visible, which helps to differentiate between the road region and the unwanted area.
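The threshold update described in this section can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the authors' code: the helper names `patch_mean` and `update_threshold` are hypothetical, and passing `previous_mv=None` is the convention chosen here to mark the first frame.

```python
ROAD_PATCH_LIMIT = 100  # empirical patch threshold stated in the text


def patch_mean(roi, top, left, size):
    """Average grayscale value of a square patch of the ROI image."""
    total = sum(sum(row[left:left + size]) for row in roi[top:top + size])
    return total / (size * size)


def update_threshold(patch_means, previous_mv=None):
    """Return the MV for the current frame (it is also stored as the PMV).

    First frame (previous_mv is None): average over all patches.
    Later frames: average over the patches whose mean is <= 100; if no
    patch qualifies, the PMV of the previous frame is reused.
    """
    if previous_mv is None:
        return sum(patch_means) / len(patch_means)
    road_patches = [m for m in patch_means if m <= ROAD_PATCH_LIMIT]
    if road_patches:
        return sum(road_patches) / len(road_patches)
    return previous_mv
```

For illustration, with hypothetical patch means `[80, 90, 150, 160, 170, 100]` and a previous value of 70, only the patches with means 80, 90, and 100 qualify, so the new MV is 90. If all six means exceeded 100, the function would return 70 unchanged.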

Vehicle Detection
To find the vehicle pattern, Sobel edge detection 48 with an Otsu threshold 49 is first applied to the ROI image. The resultant image consists of the lane markers and vehicle footprints, as shown in Fig. 7. In this figure, the road boundaries, vehicle edges, and other object edges are clearly visible. The edges of the lane markers can also be seen in the figure. However, differentiating between vehicle footprints and other objects is still very difficult at this stage.
To minimize false vehicle detections, it is important to remove the lane markers from the Sobel edge detected image. This is achieved by subtracting the binary image from the Sobel edge detected image. By applying this technique, lane markers are effectively removed from the resultant image. The resulting image is referred to as the lane marker free (LMF) image, as shown in Fig. 8.
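The subtraction step can be illustrated with a small sketch. It assumes that both images are binary (0 or 1), that lane-marker pixels are white (1) in the binary road image, and that the subtraction saturates at zero; none of these details is stated explicitly in the text, and the function name is hypothetical.

```python
def lane_marker_free(edges, binary_road):
    """Per-pixel saturating subtraction: edges minus binary road image.

    An edge pixel survives only where the binary road image is 0, which
    removes edges that coincide with bright lane markers (assumed to be
    1 in the binary road image).
    """
    if len(edges) != len(binary_road) or any(
        len(e_row) != len(b_row) for e_row, b_row in zip(edges, binary_road)
    ):
        raise ValueError("both images must have the same shape")
    return [
        [max(e - b, 0) for e, b in zip(e_row, b_row)]
        for e_row, b_row in zip(edges, binary_road)
    ]
```

Clamping at zero keeps the result a valid binary image; a signed subtraction would otherwise produce negative values wherever the binary road image is white and no edge is present.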
In Fig. 8, vehicle edges and road boundaries are clearly visible. The lane markers or any other unwanted noise at the road region have been removed. Therefore, a vehicle footprint detection is much easier and simpler at this stage. To acquire the vehicle footprints, Harris corner detection 50 and Hough transform have been applied to the LMF image. The results of Harris corner detection are shown in Fig. 9.
The Hough transform is used to detect the angular lines, which further helps in detecting the vehicle footprints and removing the shadow regions. Assume that $h$, $m_h$, and $a_h$ are the length, slope, and angle of the angular line, respectively, obtained from the Hough transform. If $(x_1, y_1)$ and $(x_2, y_2)$ are, respectively, the start and end points of the angular line, then the length $h$, slope $m_h$, and angle $a_h$ are calculated as
$$h = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}, \tag{4}$$
$$m_h = \frac{y_2 - y_1}{x_2 - x_1}, \tag{5}$$
$$a_h = \tan^{-1}(m_h). \tag{6}$$
In our technique, the angular range selected for these lines is given as
$$85\ \mathrm{deg} \leq a_h \leq 92\ \mathrm{deg}. \tag{7}$$
The selected lines may still contain shadow edges. As the shadow edges are longer than the vehicle edges and the camera is fixed at the rear end of the motorcycle, a threshold condition is applied to these lines to get the vehicle footprints. The threshold condition for the length $h$ of these lines is given as
$$10\ \mathrm{pixels} \leq h \leq 50\ \mathrm{pixels}. \tag{8}$$
The selected lines are shown in Fig. 10.
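The line selection can be sketched as a simple filter over the segments returned by the Hough transform, using the angle range (85 to 92 deg) and length range (10 to 50 pixels) stated above. This is an illustrative plain-Python sketch: the `Segment` type and the function names are hypothetical, and the angle is taken as reported by whatever Hough implementation is used, because the text does not specify the angle convention.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A line segment with the angle reported by the Hough transform."""
    x1: float
    y1: float
    x2: float
    y2: float
    angle_deg: float


def segment_length(seg):
    """Euclidean length of the segment in pixels."""
    return math.hypot(seg.x2 - seg.x1, seg.y2 - seg.y1)


def select_lines(segments, angle_range=(85.0, 92.0), length_range=(10.0, 50.0)):
    """Keep segments whose angle and length both fall inside the ranges."""
    a_min, a_max = angle_range
    h_min, h_max = length_range
    return [
        seg for seg in segments
        if a_min <= seg.angle_deg <= a_max
        and h_min <= segment_length(seg) <= h_max
    ]
```

The upper length bound is what rejects long shadow edges, while the lower bound discards very short fragments; both bounds are inclusive, matching the inequalities in the text.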
For the vehicle footprint validation, the selected lines are compared with the Harris corner detections. For this comparison, the image is first divided into small patches. In each patch, the slopes of the selected lines are computed. For example, in any patch, if $(x_p, y_p)$ and $(x_q, y_q)$ are the start and end points of the selected line, respectively, then the slope $m_h$ of this line is calculated as
$$m_h = \frac{y_q - y_p}{x_q - x_p}. \tag{9}$$
If $(x_r, y_r)$ is the position of a corner detected by the Harris corner detection technique, then its slope $m$ and distance $d$ with respect to the initial point of the selected line are given as
$$m = \frac{y_r - y_p}{x_r - x_p}, \tag{10}$$
$$d = \sqrt{(x_r - x_p)^2 + (y_r - y_p)^2}. \tag{11}$$
These three parameters, $m_h$, $m$, and $d$, allow us to assess whether the selected line passes through the Harris corner or lies very close to it (up to four pixels); in that case, the line is selected as a vehicle footprint. If the selected line intersects the Harris corner, then the values of both slopes (i.e., $m_h$ and $m$) will be equal. Similarly, the value of $d$ indicates how far the Harris corner is from the initial point of the selected line.
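One way to implement the "passes through or lies within four pixels" test is sketched below. Note that this sketch swaps the slope comparison described in the text for a point-to-segment distance, which expresses the same proximity condition while avoiding a division by zero for vertical lines; it is an illustrative substitute, not the authors' procedure, and the function names are hypothetical.

```python
import math


def point_to_segment_distance(p, q, r):
    """Shortest distance from point r to the segment with end points p, q."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    dx, dy = qx - px, qy - py
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: fall back to the point-to-point distance.
        return math.hypot(rx - px, ry - py)
    # Projection parameter of r onto the line, clamped to the segment.
    t = ((rx - px) * dx + (ry - py) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(rx - (px + t * dx), ry - (py + t * dy))


def is_vehicle_footprint(p, q, corners, max_gap=4.0):
    """True if any Harris corner lies on or within max_gap pixels of the line."""
    return any(point_to_segment_distance(p, q, c) <= max_gap for c in corners)
```

For a segment from (0, 0) to (10, 0), a corner at (5, 3) is 3 pixels away and is accepted, whereas a corner at (5, 5) is 5 pixels away and is rejected under the four-pixel tolerance.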
For the proposed technique, the size of six selected patches for ROI, their locations, and the values of other parameters such as MðiÞ, a h , and h are kept the same for all types of road scenarios and light conditions. The same values are used for the self-recorded and online datasets. Each dataset has different frame sizes and a different placement of the camera; therefore, the fixed area that has been excluded from the top of every grayscale frame varies for each dataset.
Last, the binary road image is used to decide whether the selected lines represent a vehicle. Indeed, the boundaries of the road are clearly black, which enables us to mark the road edges and to discard the lines appearing outside these marked edges. The remaining lines contain the vehicle footprints.

Experimental Results
The given technique was implemented on an Intel® Core™ i7-4770 CPU (3.4 GHz dual-core processor, 16 GB installed memory). Our technique was implemented in C++ using OpenCV. Our method was evaluated on the self-recorded dataset as well as on the online datasets [43][44][45] for comparison purposes. The results show that our technique works well for the motorcycle (using the self-recorded dataset) as well as for the online car datasets. It can detect both incoming and outgoing vehicles from the rear end on single and multiple lanes. The technique is also capable of detecting parked vehicles on the roadside. Some of the results from the self-recorded dataset are shown in Fig. 11.
In Fig. 11, the detection of rear-end vehicles is shown for the self-recorded dataset. In this figure, one can see the detection of parked cars along the roadside as well as the correct detection of a truck, even when lane markers are present on the road.
Next, Fig. 12 shows the detection of vehicles from the online dataset. [43][44][45] The proposed technique can detect the vehicles even in the presence of shadow regions. As expected, it detects the vehicles up to a certain distance (determined by our various thresholds) as shown in Fig. 12(d).
Sometimes, the method fails to detect some vehicles and generates false alarms, as shown in Fig. 13. This may occur due to the road structure, as shown in Figs. 13(a), 13(b), 13(e), and 13(f). In Fig. 13(a), the road is in a tilted position; therefore, the horizontal lines obtained from the Hough transform did not fit the selected line criteria, and the proposed technique missed that vehicle. In Fig. 13(b), the area below the road barrier is detected as a road region, which leads to its false detection as a vehicle. Similarly, in Fig. 13(d), the footpath on the right side of the road is detected as a road region; therefore, the object on it is detected as a vehicle. From Fig. 13(f), we can see that the road is damaged and contains many horizontal cracks, which are sometimes detected as vehicles, leading to false detections. In Fig. 13(c), the car in the middle lane is occluded by the shadow created by other cars and could not be detected. On the whole, this technique still produced very few false detections. A comparison of the proposed technique with the existing state-of-the-art rear-view-based vehicle detection methods is given in Table 3. For the given technique, accuracy, TPR, and false detection rate (FDR) are given as
$$\mathrm{Accuracy}(\%) = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \times 100, \tag{12}$$
$$\mathrm{TPR}(\%) = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \times 100, \tag{13}$$
$$\mathrm{FDR}(\%) = \frac{\mathrm{FP}}{\mathrm{TP} + \mathrm{FP}} \times 100, \tag{14}$$
where TP, TN, FP, and FN refer to true positives, true negatives, false positives, and false negatives, respectively. Also, for each technique, the average frame rate has been computed as the average number of frames processed in 1 s.
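The three evaluation metrics can be computed from the four counts as sketched below. The sketch assumes the standard definitions of accuracy, TPR, and FDR, and the counts in the usage note are hypothetical values chosen for illustration; they are not results from the paper.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (accuracy, TPR, FDR) as percentages.

    Assumed standard definitions:
      accuracy = (TP + TN) / (TP + TN + FP + FN)
      TPR      = TP / (TP + FN)
      FDR      = FP / (TP + FP)
    """
    total = tp + tn + fp + fn
    if total == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("counts must give non-zero denominators")
    accuracy = 100.0 * (tp + tn) / total
    tpr = 100.0 * tp / (tp + fn)
    fdr = 100.0 * fp / (tp + fp)
    return accuracy, tpr, fdr
```

With hypothetical counts of 90 true positives, 5 true negatives, 3 false positives, and 2 false negatives, the accuracy is 95%, the TPR is 90/92 (about 97.8%), and the FDR is 3/93 (about 3.2%).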
From Table 3, one can see that for the self-recorded dataset, the accuracy is 95.1%. Compared with the existing state-of-the-art works, our technique shows improved performance on all the quantitative criteria. The main reason for the better FDR is the patch selection method, which yields an improved binary road region. The second main reason is the length constraint on the selected Hough lines, which helps to remove the shadow regions.
Our technique achieves a higher accuracy for the source-2 dataset than the work of Choi. 45 For the LISA-dense and LISA-urban datasets, our method achieves a higher TPR than the recent research work of Sivaraman 43 and Satzoda and Trivedi. 51 For the LISA-sunny dataset, our technique provides a TPR that is slightly lower than those of the techniques of Sivaraman 43 and Satzoda and Trivedi. 51 However, the FDR of our method on the LISA-sunny dataset is lower than that of the existing work. For the iROADS-daylight dataset, our technique's TPR and FDR are lower than those of the work of Satzoda and Trivedi, 51 but a higher frame rate is achieved. Satzoda's 51 technique performs better under sunlight conditions, whereas our technique performs well for all light conditions and traffic scenarios. The proposed work achieves a higher frame rate for the self-recorded dataset and for the online datasets than all the existing methods. 43,45,51 Rapid vehicle detection gives the motorcyclist more time to make a correct decision, which makes the technique more suitable for motorcycle applications.
Due to the use of low-level features (such as Harris corner detection and the lines computed from the Hough transform for vehicle footprint detection), higher computing performance has been obtained. The proposed method achieves a higher accuracy in less time, which makes it efficient for motorcycle applications.

Conclusion
From this work, it can be concluded that the proposed technique is effective for rear-end vehicle detection in motorcycle applications. The method presented in this paper achieves higher accuracy and better results in different road scenarios than other recently published methods. The patch selection method for the binary road detection contributes substantially to reducing false detections and producing reliable results, even when shadows or different lane markers are present on the road. The length selection of the lines computed from the Hough transform also helps to avoid the shadow regions and improve the accuracy. The method performs very well for the motorcycle using the rear-end dataset as well as on the online frontal vehicle datasets. The proposed method provides reliable accuracy, making it more trustworthy for vehicle applications. It achieved higher accuracy, TPR, and frame rate for many road scenarios than the existing state-of-the-art methods. This technique also provides a sound vision-based rear-end collision detection system for motorcyclists. It is believed that this method could help to reduce the motorcyclist fatality rate when integrated with vibrational or auditory warning-based alert systems. These aspects are currently being investigated.