[83.05] A Multivariate Statistical Analysis to Guide Classification of 2MASS/DPOSS Galaxies Using Data Mining Techniques

J. Mazzarella, T. Jarrett, S. Odewahn, R. Cutri, T. Chester, M. Schmitz, S. Monkewitz, B. Madore (Caltech)

The Spring 1999 Incremental Release of the Two Micron All-Sky Survey (2MASS) Extended Source Catalog (XSC) contains new near-infrared measurements for about eighty thousand extended objects, most of which are previously uncatalogued galaxies. Likewise, the Second Generation Digital Palomar Observatory Sky Survey (DPOSS) provides a rich archive of new visual measurements over the same regions of the sky. We use concise graphical and statistical summaries to systematically quantify the source densities in various slices of the 2MASS+DPOSS parameter space, including BRIJHK color space, concentration indices, central and average surface brightnesses, and isophotal parameters. We also present results of a global principal components analysis of the merged 2MASS+DPOSS dataset for the Spring 1999 XSC sample. The primary goal is to identify the most important linear combinations of variables to feed into a decision-tree algorithm, which will be applied in a follow-up study to attempt supervised classification of previously uncatalogued galaxies. An initial cross-comparison with the current NASA/IPAC Extragalactic Database (NED) shows that approximately 10% of the objects in the Spring 1999 XSC sample are previously catalogued. Distributions of 2MASS/DPOSS sources with published morphological types and nuclear activity levels (starburst, LINER, Seyfert) available in NED are summarized in the context of forming a training set for a machine learning classifier.
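
As an illustration of the principal components step described above, the following is a minimal sketch and not the authors' pipeline. It standardizes a small synthetic table, forms the correlation matrix, and extracts the leading principal component by power iteration. The three columns (a J-K color, a B-R color, and a concentration index), their values, and all function names are assumptions made up for this example; no real 2MASS or DPOSS measurements are used.

```python
import math
import random


def standardize(rows):
    """Scale every column to zero mean and unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_component(c, iters=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix.

    Uses plain power iteration from a uniform start vector, which is
    adequate here but would fail if that vector were orthogonal to the
    leading eigenvector.
    """
    p = len(c)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(
        v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return v, eigval


# Synthetic stand-in for a merged catalogue (hypothetical values only):
# two colors driven by one latent variable plus an unrelated index.
random.seed(0)
table = []
for _ in range(500):
    latent = random.gauss(0.0, 1.0)
    j_minus_k = 1.0 + 0.2 * latent + random.gauss(0.0, 0.05)
    b_minus_r = 1.5 + 0.4 * latent + random.gauss(0.0, 0.1)
    concentration = 3.0 + random.gauss(0.0, 0.5)
    table.append([j_minus_k, b_minus_r, concentration])

z = standardize(table)
corr = correlation_matrix(z)
loadings, eigval = leading_component(corr)

# The trace of a correlation matrix equals the number of variables,
# so the explained fraction is the eigenvalue divided by that count.
explained = eigval / len(loadings)

# Projection of each object onto the first component; a value like
# this could serve as one input feature to a decision-tree classifier.
scores = [sum(a * b for a, b in zip(row, loadings)) for row in z]

print("loadings:", [round(x, 3) for x in loadings])
print("fraction of variance explained:", round(explained, 3))
```

In this toy table the two colors are strongly correlated by construction, so the first component weights them about equally and largely ignores the concentration index. For a real catalogue with many parameters, a full eigendecomposition from a numerical library would be the more usual choice than power iteration.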

