



Journal Article

Classifying galaxies according to their H i content


Rafieferantsoa, Mika
Computational Structure Formation, MPI for Astrophysics, Max Planck Society


Andrianomena, S., Rafieferantsoa, M., & Davé, R. (2020). Classifying galaxies according to their H i content. Monthly Notices of the Royal Astronomical Society, 492(4), 5743-5753. doi:10.1093/mnras/staa234.

Cite as: https://hdl.handle.net/21.11116/0000-0006-71E4-9
We use machine learning to classify galaxies according to their H i content, based on both their optical photometry and environmental properties. The data used for our analyses are the outputs in the range z = 0–1 from the mufasa cosmological hydrodynamic simulation. In our previous paper, where we predicted the galaxy H i content using the same input features, only H i-rich galaxies were selected for the training. To make the predictions on real observational data more accurate, the classifiers built in this study first establish whether a galaxy is H i rich (log(M_HI/M_*) > −2) before its neutral hydrogen content is estimated using the regressors developed in the first paper. We employ various machine-learning algorithms and assess their performance with metrics such as accuracy, f1, AUC PR, precision, specificity, and log loss. The performance of the classifiers, as opposed to that of the regressors in the previous paper, improves with increasing redshift, peaks around z = 1, and then starts to decline at even higher z. The random forest method, the most robust among the classifiers when only the mock data are used for both training and testing, reaches an accuracy above 98.6 per cent at z = 0 and above 99.0 per cent at z = 1, which translates to an AUC PR above 99.93 per cent at low redshift and above 99.98 per cent at the higher one. We test our algorithms, trained with simulation data, on the classification of galaxies in the RESOLVE, ALFALFA, and GASS surveys. Interestingly, the SVM algorithm, the best classifier for these tests, achieves a precision (the relevant metric for the tests) above 87.60 per cent and a specificity above 71.4 per cent in all the tests, indicating that the classifier is capable of learning from the simulated data to classify H i-rich/H i-poor galaxies in real observational data.
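As an illustration of the threshold-based metrics named in the abstract, the following minimal sketch computes accuracy, precision, specificity, f1, and log loss for a binary H i-rich (1) versus H i-poor (0) labelling. It is not the authors' code: the function names and the toy labels are assumptions made for this example, and AUC PR is omitted because it requires ranking predicted scores over many thresholds.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; label 1 = H i rich, 0 = H i poor."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, specificity and f1 as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def log_loss(y_true, p_rich, eps=1e-15):
    """Mean binary cross-entropy; p_rich is the predicted probability of label 1."""
    total = 0.0
    for t, p in zip(y_true, p_rich):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


# Toy labels, invented for illustration only (not survey data).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Precision measures how many galaxies flagged as H i rich truly are, while specificity measures how many H i-poor galaxies are correctly rejected, which is why the two are reported together for the survey tests.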
With the advent of large H i 21 cm surveys such as the SKA, this set of classifiers, together with the regressors developed in the first paper, will form part of a pipeline that serves as a useful tool for predicting the H i content of galaxies.
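The classify-then-regress pipeline described in the abstract can be sketched as follows. This is a hedged illustration rather than the published implementation: the threshold of −2 comes from the abstract, but the base-10 logarithm, the use of stellar mass in the ratio, the `g_minus_r` feature, and both toy models are assumptions introduced here as stand-ins for the trained classifiers and regressors.

```python
import math

# Threshold quoted in the abstract; base-10 log and the H i-to-stellar mass
# ratio are assumptions of this sketch.
LOG_RICHNESS_THRESHOLD = -2.0


def is_hi_rich(m_hi, m_star, threshold=LOG_RICHNESS_THRESHOLD):
    """Label used to train a classifier: True if log10(M_HI / M_*) > threshold."""
    return math.log10(m_hi / m_star) > threshold


def predict_hi_content(galaxies, classifier, regressor):
    """Two-stage pipeline: regress only galaxies the classifier flags as H i rich.

    `classifier` maps a feature dict to True/False and `regressor` maps it to a
    predicted log richness. Galaxies classified as H i poor get None.
    """
    predictions = []
    for features in galaxies:
        if classifier(features):
            predictions.append(regressor(features))
        else:
            predictions.append(None)
    return predictions


# Hypothetical stand-in models acting on an assumed colour feature.
def toy_classifier(features):
    return features["g_minus_r"] < 0.6


def toy_regressor(features):
    return -0.5 - features["g_minus_r"]


galaxies = [{"g_minus_r": 0.3}, {"g_minus_r": 0.8}]
predictions = predict_hi_content(galaxies, toy_classifier, toy_regressor)
```

Keeping the classifier and regressor as separate callables mirrors the abstract's design, in which regressors trained only on H i-rich galaxies are never applied to objects outside that regime.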