The aim of this paper is to present an extension of the k-means algorithm based on the idea of recursive partitioning that can be used as a classification algorithm in the case of supervised classification. Some of the most robust techniques for supervised classification are those based on classification trees that make no assumptions on the parametric distribution of the data and are based on recursively partitioning the feature space into homogeneous subsets of units according to the class the entities belong to. One of the shortcomings of these approaches is that the recursive partitioning of the data, i.e. the growing of the tree, is achieved considering only one variable at a time and, although it makes the tree pretty simple in terms of determining rules to obtain a classification, it also makes them hard to interpret. Building on these ideas we carry the integration of parametric model into trees one step further and propose a supervised classification algorithm based on the k-means routine that sequentially splits the data according to the whole feature vector. Results from applications to simulated data are shown to address the potentiality of the proposed method in different conditions.

Supervised Nested Algorithm for Classification Based on K-Means

Nieddu Luciano;
2020-01-01

Abstract

The aim of this paper is to present an extension of the k-means algorithm based on the idea of recursive partitioning that can be used as a classification algorithm in the case of supervised classification. Some of the most robust techniques for supervised classification are those based on classification trees that make no assumptions on the parametric distribution of the data and are based on recursively partitioning the feature space into homogeneous subsets of units according to the class the entities belong to. One of the shortcomings of these approaches is that the recursive partitioning of the data, i.e. the growing of the tree, is achieved considering only one variable at a time and, although it makes the tree pretty simple in terms of determining rules to obtain a classification, it also makes them hard to interpret. Building on these ideas we carry the integration of parametric model into trees one step further and propose a supervised classification algorithm based on the k-means routine that sequentially splits the data according to the whole feature vector. Results from applications to simulated data are shown to address the potentiality of the proposed method in different conditions.
2020
978-981-15-3311-2
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14090/2788
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
social impact