Knowledge Discovery in Road Accidents Database - Integration of Visual and Automatic Data Mining Methods

Imad Abugessaisa


Road accident statistics are collected and used by a large number of users, which can result in a huge volume of data that must be explored to uncover the knowledge hidden in it. Potential knowledge may remain hidden because of the sheer accumulation of data, which limits the exploration task for the road safety expert and, hence, reduces the utilization of the database. To help solve these problems, this paper explores automatic and Visual Data Mining (VDM) methods. The main purpose is to study VDM methods and their applicability to knowledge discovery in road accident databases. The basic feature of VDM is that it involves the user in the exploration process: VDM uses direct interactive methods that allow the user to gain insight into the dataset and recognize different patterns in it. In this paper, I apply a range of methods and techniques, including a paradigm for VDM, exploratory data analysis, clustering methods such as the K-means algorithm, hierarchical agglomerative clustering (HAC) and self-organizing maps (SOM), and classification trees. These methods help integrate VDM with automatic data mining algorithms. Open-source VDM tools offering visualization techniques were used. The first contribution of this paper lies in discovering clusters and different relationships in the road safety database (such as the relationships between socioeconomic indicators and fatalities, between traffic risk and population, and between personal risk and cars per capita). The methods used proved very valuable for detecting clusters of countries that share similar traffic situations. The second contribution is the exploratory data analysis, in which the user can explore the contents and structure of the dataset at an early stage of the analysis. This is supported by the filtering components of VDM.
This helps expert users with a strong background in traffic safety analysis to formulate assumptions and hypotheses concerning future situations. The third contribution involves interactive exploration based on brushing and linking methods; this novel approach helps both experienced and inexperienced users detect and recognize interesting patterns in the available database. The results obtained show that this approach offers a better understanding of the contents of road safety databases than the statistical techniques and approaches currently used for analyzing road safety situations.
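To make the idea of "clusters of countries that share similar traffic situations" concrete, the following is a minimal sketch of K-means applied to per-country road safety indicators. It is not the paper's procedure or toolset. The indicator definitions (personal risk as fatalities per 100,000 inhabitants, traffic risk as fatalities per 10,000 registered vehicles, motorization as vehicles per 1,000 inhabitants), the Z-score standardization, the deterministic farthest-first initialization and the six countries with their figures are all assumptions invented for illustration only.

```python
import math

# Hypothetical indicator table: (country, fatalities, population, registered vehicles).
# All values are invented for illustration; they are NOT data from the paper.
COUNTRIES = [
    ("A", 300, 10_000_000, 5_000_000),
    ("B", 400, 12_000_000, 6_500_000),
    ("C", 250, 8_000_000, 4_200_000),
    ("D", 2_000, 10_000_000, 800_000),
    ("E", 2_500, 11_000_000, 1_000_000),
    ("F", 1_800, 9_000_000, 700_000),
]


def personal_risk(fatalities, population):
    """Fatalities per 100,000 inhabitants (assumed definition)."""
    return fatalities * 100_000 / population


def traffic_risk(fatalities, vehicles):
    """Fatalities per 10,000 registered vehicles (assumed definition)."""
    return fatalities * 10_000 / vehicles


def vehicles_per_1000(vehicles, population):
    """Motorization level: vehicles per 1,000 inhabitants."""
    return vehicles * 1_000 / population


def standardize(rows):
    """Z-score each column so no single indicator dominates Euclidean distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


features = [
    [personal_risk(f, pop), traffic_risk(f, veh), vehicles_per_1000(veh, pop)]
    for _, f, pop, veh in COUNTRIES
]
labels, centroids = kmeans(standardize(features), k=2)
for (name, *_rest), label in zip(COUNTRIES, labels):
    print(f"country {name}: cluster {label}")
```

With these made-up figures, the three highly motorized, low-risk countries fall into one cluster and the three low-motorization, high-risk countries into the other. Standardizing first matters because vehicles per 1,000 inhabitants is numerically much larger than the two risk rates and would otherwise dominate the distance.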
