International Conference on Advanced Technologies, Computer Engineering and Science

Feature Selection for Gender Classification in TUIK Life Satisfaction Survey

Adil Çoban, Ilhan Tarimer

Abstract

Attribute selection is a preprocessing step applied before classification in data mining. In this study, a new dataset was created from the attributes that express overall satisfaction in the Turkish Statistical Institute (TSI) Life Satisfaction Survey dataset. The attributes were ranked with a ranking search method, using the attribute selection algorithms available in a data mining application. The selected attributes were then tested in a classification task with two machine learning algorithms, Naive Bayes and Random Forest. The feature selection algorithms were compared by the number of attributes they selected and by the classification accuracy achievable with those attributes. The study aimed to reduce the volume of the dataset; the best classification result was obtained with 3 attributes selected by the Chi2 algorithm. The best classification rate was 73%, achieved with the Random Forest classification algorithm.
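The workflow summarized in the abstract (rank attributes with Chi2, keep the top few, then classify with Naive Bayes and Random Forest) can be sketched roughly as follows. This is an illustrative sketch only: it uses scikit-learn rather than the Orange program named in the keywords, GaussianNB stands in for the Naive Bayes learner, and the data are synthetic 1-to-5 satisfaction scores generated by a hypothetical helper, `make_synthetic_survey`. It does not use the TSI survey data and is not expected to reproduce the reported 73% accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline


def make_synthetic_survey(n_rows=400, n_attrs=8, seed=0):
    """Hypothetical stand-in data: Likert-style scores (1..5) and a binary label."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_rows)              # 0/1 class label
    X = rng.integers(1, 6, size=(n_rows, n_attrs))   # uninformative attributes
    # Make attribute 0 informative: high scores for class 1, low for class 0.
    high = rng.integers(3, 6, size=n_rows)
    low = rng.integers(1, 4, size=n_rows)
    X[:, 0] = np.where(y == 1, high, low)
    return X, y


def rank_attributes(X, y):
    """Return attribute indices ordered from highest to lowest Chi2 score."""
    scores, _ = chi2(X, y)  # chi2 requires non-negative feature values
    return np.argsort(scores)[::-1]


def evaluate_top_k(X, y, k, model):
    """Mean 5-fold accuracy using only the k best attributes by Chi2.

    Selection sits inside the pipeline, so it is refit on each training fold.
    """
    pipeline = make_pipeline(SelectKBest(chi2, k=k), model)
    return cross_val_score(pipeline, X, y, cv=5, scoring="accuracy").mean()


if __name__ == "__main__":
    X, y = make_synthetic_survey()
    print("Chi2 ranking of attribute indices:", rank_attributes(X, y))
    models = {
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    }
    for name, model in models.items():
        acc = evaluate_top_k(X, y, k=3, model=model)
        print(f"{name}: mean accuracy with 3 attributes = {acc:.3f}")
```

Placing the selector inside the pipeline keeps the held-out fold from influencing which attributes are chosen; whether the study evaluated its selections this way is not stated in the abstract.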



Conference
International Conference on Advanced Technologies, Computer Engineering and Science
Keywords
Data algorithms, attribute selection, data mining, Orange program, machine learning

Language
English

Subject
Computer Science
