High-dimensional datasets are increasingly common in learning problems across many domains, such as text categorization, genomics, econometrics, and computer vision. The large number of features makes these datasets expensive to represent, store, and process, which calls for adequate methods of feature representation, reduction, and selection that both improve classification accuracy and reduce memory requirements.
On high-dimensional datasets, filter approaches are often the only applicable option, since wrappers and embedded methods can be too expensive; even some filter approaches become computationally prohibitive at this scale. This talk addresses efficient (supervised and unsupervised) techniques for feature discretization and feature selection suited to high-dimensional datasets. These techniques attain competitive results and can also act as pre-processors for more sophisticated methods (e.g., wrappers). Experimental results on microarray and computer vision datasets are discussed.
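As a minimal sketch of the filter paradigm in question (illustrative only, not the talk's specific methods), the following Python example combines an unsupervised step, equal-frequency discretization of each feature, with a supervised filter that scores each discretized feature by mutual information with the class label and keeps the top k. The relevance criterion, the number of bins, and k are assumptions chosen for illustration.

```python
# Illustrative filter pipeline (not the talk's specific methods):
# (1) unsupervised equal-frequency discretization of each feature,
# (2) supervised filter selection by mutual information with the label.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def equal_frequency_discretize(X, n_bins=8):
    """Unsupervised discretization: map each feature to bin indices."""
    # Interior quantiles of each column serve as that column's bin edges.
    edges = np.quantile(X, np.linspace(0, 1, n_bins + 1)[1:-1], axis=0)
    return np.stack(
        [np.digitize(X[:, j], edges[:, j]) for j in range(X.shape[1])],
        axis=1,
    )

def filter_select(Xd, y, k=100):
    """Supervised filter: indices of the k features most relevant to y."""
    scores = mutual_info_classif(Xd, y, discrete_features=True)
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2000))   # microarray-like: few samples, many features
y = rng.integers(0, 2, size=50)
Xd = equal_frequency_discretize(X)
selected = filter_select(Xd, y)
X_reduced = X[:, selected]        # hand off to a costlier wrapper/embedded method
```

Because each feature is scored independently, the cost grows only linearly with the number of features, which is what makes this family of methods usable as a cheap pre-processor before wrappers or embedded methods.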