A strategy is formulated to identify and select the most relevant variables for studying problems in the exact sciences when large databases have to be explored. It consists of a first exploratory stage, performed mainly with the classification and regression tree method, to determine the list of the most relevant signals to be used in the analysis of the phenomenon of interest. A linear correlation technique (principal component analysis) and then a nonlinear correlation technique (autoassociative neural networks (NNs)) are applied to reduce the number of signals to those containing nonredundant information. The potential of the approach is illustrated by an application to the problem of identifying the confinement regime in the Joint European Torus. The minimum set of signals has been used to train an NN, and its performance is compared with that of various theoretical models. The success rate of the NN is very high, and it generally outperforms the available theoretical models. © 1973-2012 IEEE.
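The first two stages described above can be illustrated with a minimal sketch on synthetic data. It is not the authors' implementation: the signals `a`, `b`, `c`, `d`, the helper names, and the 95% variance threshold are all assumptions made for illustration. The relevance ranking uses a single Gini split per signal (the one-level special case of a classification tree, not a full classification and regression tree), and the redundancy check uses principal component analysis computed from a singular value decomposition. The autoassociative NN stage is not covered.

```python
import numpy as np


def best_split_gini_gain(x, y):
    """Largest Gini impurity decrease obtainable from one threshold on x.

    y holds binary labels (0.0 or 1.0). This is a one-level tree (a stump),
    used here only as a simple stand-in for a full tree-based ranking.
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)

    def gini(labels):
        p = labels.mean()
        return 2.0 * p * (1.0 - p)

    parent = gini(y_sorted)
    best = 0.0
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate equal values
        left, right = y_sorted[:i], y_sorted[i:]
        child = (i * gini(left) + (n - i) * gini(right)) / n
        best = max(best, parent - child)
    return best


def n_components_for_variance(X, threshold=0.95):
    """Number of principal components needed to reach the variance threshold."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each signal
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    explained = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(explained, threshold) + 1)


# Synthetic signals (hypothetical): a and b drive the label,
# c is an almost exact linear copy of a, d is pure noise.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
c = 2.0 * a + 0.01 * rng.normal(size=n)
d = rng.normal(size=n)
X = np.column_stack([a, b, c, d])
y = (a + b > 0).astype(float)  # stand-in for a two-regime label

# Stage 1: rank signals by how well a single split separates the two classes.
gains = np.array([best_split_gini_gain(X[:, j], y) for j in range(X.shape[1])])
ranking = np.argsort(gains)[::-1]  # most relevant first

# Stage 2: among the three top-ranked signals, count independent linear directions.
selected = X[:, ranking[:3]]
n_kept = n_components_for_variance(selected)

print("Gini gains per signal:", gains)
print("Signals ranked by relevance:", ranking)
print("Principal components needed:", n_kept)
```

In this toy setting the noise signal ranks last, and the principal component count is lower than the number of selected signals because `c` carries the same information as `a`. A nonlinear dependence between signals would not be detected by this linear step, which is the role the abstract assigns to the autoassociative NN.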
All Science Journal Classification (ASJC) codes
- Nuclear and High Energy Physics
- Condensed Matter Physics
Murari, A., Mazon, D., Martin, N., Vagliasindi, G., & Gelfusa, M. (2012). Exploratory data analysis techniques to determine the dimensionality of complex nonlinear phenomena: The L-to-H transition at JET as a case study. IEEE Transactions on Plasma Science, 40(5 PART 2), 1386-1394. https://doi.org/10.1109/TPS.2012.2187682