Outlier detection approach based on local outlier factor for datasets with mixed attributes

Taegu Kim, Nam Wook Cho

Research output: Contribution to journal › Article › peer-review

Abstract

Although outlier detection has received significant attention from practitioners and researchers alike, its application to datasets that contain both categorical and numerical attributes remains a challenge. In this paper, a novel approach based on the local outlier factor (LOF) and a similarity measure is proposed to address this challenge. Occurrence frequency similarity is adopted to measure the closeness of categorical data and to derive a continuous distance from it. The two distances, one from the categorical attributes and one from the numerical attributes, are merged and passed to the LOF calculation to identify outliers. Test results on various datasets confirm that the proposed approach outperforms the simple numerical approach in all cases. This consistent superiority over the benchmark validates that the similarity measure successfully captures the characteristics of categorical data.
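The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so several choices here are assumptions rather than the authors' method: the occurrence frequency (OF) similarity is taken in its commonly cited form (1 for a match, 1 / (1 + log(N/f(a)) * log(N/f(b))) for a mismatch), the categorical distance is 1 minus the mean similarity, the numerical distance is plain Euclidean on pre-scaled values, and the two distances are merged by simple addition. The function names `of_distance`, `euclidean_distance`, and `lof_scores` are hypothetical, and the LOF here ignores ties at the k-distance and duplicate points.

```python
import math
from collections import Counter


def of_distance(cat_rows):
    """Pairwise categorical distance derived from OF similarity (assumed form)."""
    n, n_attrs = len(cat_rows), len(cat_rows[0])
    # Frequency of each value, counted separately per attribute.
    freqs = [Counter(row[k] for row in cat_rows) for k in range(n_attrs)]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(n_attrs):
                a, b = cat_rows[i][k], cat_rows[j][k]
                if a == b:
                    sim = 1.0
                else:
                    # Mismatches between rare values are treated as less similar.
                    sim = 1.0 / (1.0 + math.log(n / freqs[k][a]) * math.log(n / freqs[k][b]))
                total += 1.0 - sim
            dist[i][j] = dist[j][i] = total / n_attrs
    return dist


def euclidean_distance(num_rows):
    """Pairwise Euclidean distance; values are assumed to be pre-scaled."""
    n = len(num_rows)
    return [[math.dist(num_rows[i], num_rows[j]) for j in range(n)] for i in range(n)]


def lof_scores(dist, k):
    """Simplified LOF on a precomputed distance matrix (no tie handling)."""
    n = len(dist)
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neighbors[i])
        lrd.append(k / reach)
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Toy data (made up for illustration): five clustered rows and one distant row.
numeric = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [1.0, 1.0]]
categorical = [["red"], ["red"], ["red"], ["blue"], ["red"], ["green"]]

d_num = euclidean_distance(numeric)
d_cat = of_distance(categorical)
# Merging by addition is an assumption; the abstract only says the distances are merged.
merged = [[d_num[i][j] + d_cat[i][j] for j in range(len(numeric))] for i in range(len(numeric))]
scores = lof_scores(merged, k=2)
```

In this toy data the last row is far away numerically and carries a rare category, so it should receive the largest LOF score. The fourth row is numerically ordinary but has a rare category, so the categorical distance alone raises its score above that of its neighbours, which is the effect the abstract attributes to the similarity measure.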

Original language: English
Pages (from-to): 2155-2160
Number of pages: 6
Journal: ICIC Express Letters, Part B: Applications
Volume: 7
Issue number: 10
State: Published - 1 Oct 2016

Keywords

  • Categorical data
  • Local outlier factor
  • Mixed type data
  • Outlier detection
  • Similarity
