Biblio. IMN

Single-reference view

Avval, T. G., Moeini, B., Carver, V., Fairley, N., Smith, E. F., Baltrusaitis, J., Fernandez, V., Tyler, B. J., Gallagher, N. & Linford, M. R. (2021) The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE). J. Chem. Inf. Model. 61, 4173–4189.
Added by: Richard Baschera (2021-10-22 10:40:59)   Last edited by: Richard Baschera (2021-10-22 10:43:28)
Reference type: Article
DOI: 10.1021/acs.jcim.1c00244
Identification number (ISBN etc.): 1549-9596
BibTeX key: Avval2021
Categories: IMN, INTERNATIONAL
Creators: Avval, Baltrusaitis, Carver, Fairley, Fernandez, Gallagher, Linford, Moeini, Smith, Tyler
Collection: J. Chem. Inf. Model.
URLs     https://doi.org/10.1021/acs.jcim.1c00244
Abstract
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data—they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
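
The abstract names the summary statistics but this record gives no formulas, so the following Python sketch is illustrative only. It assumes PRE is the Shannon entropy (in bits) of a spectrum normalized to unit sum, and that DS-PRE applies PRE separately to contiguous segments of the spectrum; both are assumptions, not definitions taken from the record. X4 is omitted because it is not defined here. The function names `pre`, `summary_statistics`, and `ds_pre` are hypothetical.

```python
import numpy as np


def pre(spectrum):
    """Shannon entropy in bits of a spectrum normalized to unit sum.

    Assumed definition of pattern recognition entropy (PRE).
    """
    x = np.abs(np.asarray(spectrum, dtype=float))
    total = x.sum()
    if total == 0:
        return 0.0  # convention chosen here for an all-zero spectrum
    p = x / total
    p = p[p > 0]  # terms with p = 0 contribute nothing to the entropy
    return float(np.sum(p * np.log2(1.0 / p)))


def summary_statistics(spectrum):
    """Return several single-number summaries of one spectrum."""
    x = np.asarray(spectrum, dtype=float)
    return {
        "mean": float(x.mean()),
        "std": float(x.std(ddof=1)),       # sample standard deviation
        "norm1": float(np.abs(x).sum()),   # 1-norm
        "range": float(x.max() - x.min()),
        "ssq": float(np.sum(x ** 2)),      # sum of squares
        "pre": pre(x),
    }


def ds_pre(spectrum, n_segments):
    """PRE of each contiguous segment (assumed form of DS-PRE)."""
    x = np.asarray(spectrum, dtype=float)
    return [pre(segment) for segment in np.array_split(x, n_segments)]


if __name__ == "__main__":
    flat = np.ones(8)                       # intensity spread evenly
    peak = np.array([0, 0, 0, 1, 0, 0, 0, 0])  # all intensity in one channel
    print(summary_statistics(flat))
    print(summary_statistics(peak))
    print(ds_pre(flat, 2), ds_pre(peak, 2))
```

Under these assumptions a flat spectrum gives the largest entropy for its length and a single sharp peak gives zero, so one number per spectrum already separates the two shapes. The per-segment values returned by `ds_pre` form a short feature vector of the kind that could be passed to a kNN classifier, as the abstract suggests.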
  
Notes     
Publisher: American Chemical Society
  