Asteroid material classification based on multi-parameter constraints using artificial intelligence

Jiayi Ge; Xiaoming Zhang; Juan Li; Mingtao Li; Yijun Tang; Yunxiao Jiang; Chengzhi Liu; Zhe Kang; Xianqun Zeng; Xiaojun Jiang

doi:10.1051/0004-6361/202451971

Volume 692 (December 2024)

A&A, 692 (2024) A100

Open Access

Issue: A&A Volume 692, December 2024
Article Number: A100
Number of page(s): 14
Section: Planets, planetary systems, and small bodies
DOI: https://doi.org/10.1051/0004-6361/202451971
Published online: 04 December 2024

Jiayi Ge¹,²,★, Xiaoming Zhang¹,²,★, Juan Li¹,², Mingtao Li²,³, Yijun Tang⁴, Yunxiao Jiang⁴, Chengzhi Liu²,⁵, Zhe Kang²,⁵, Xianqun Zeng¹ and Xiaojun Jiang¹,²,★

¹ CAS Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
² University of Chinese Academy of Sciences, Beijing 100049, China
³ National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
⁴ School of Physics, Zhejiang University of Technology, Hangzhou 310014, China
⁵ Changchun Observatory, National Astronomical Observatories, Chinese Academy of Sciences, Changchun 130117, China

★ Corresponding authors

Received: 23 August 2024
Accepted: 11 October 2024

Abstract

Context. Material types of asteroids provide key clues to their evolutionary history and contained resources. The Gaia mission has released extensive low-resolution spectral observation data of small Solar System bodies. However, methods for classifying asteroids based on low-resolution space-based spectra are still inadequate, and do not fully leverage the complementary features of spectra and multiple intrinsic attributes of asteroids to achieve precise material classification.

Aims. Our goal is to propose a method with a higher generalization accuracy for asteroid material classification by integrating multi-source information, identifying optimal feature combinations for model inputs, and deepening the understanding of relationships among asteroid parameters.

Methods. The effective asteroid photometric, physical, and orbital parameters were screened using the information gain ratio and Spearman’s rank correlation coefficient. Then, artificial intelligence techniques were employed to combine asteroid spectra with the selected various parameters for six-class material classification. By comparing five machine learning models, we identified network structures with higher validation accuracy and stable generalization performance. Meanwhile, feature ablation experiments were conducted to determine the input parameter combinations suitable for different scenarios. Finally, based on the statistical results and model outputs, the constraint relationships among asteroid parameters were visualized and analyzed.

Results. The proposed AsterRF model achieved a validation accuracy of 92.2%, an improvement of approximately 7.8 percentage points compared to existing methods that use only spectra. V-type asteroids exhibited the highest classification accuracy, followed by A-type and D-type. X-type asteroids had the lowest precision and recall, and were easily confused with C-type. The model generally showed higher classification confidence for S-type asteroids. The top five attributes that the model focused on are the phase slope parameter (G), orbital type, albedo, H magnitude, and effective diameter. Additionally, the correlations between asteroid materials and other parameters were generally below 0.4.

Conclusions. Incorporating optimal asteroid parameter combinations can significantly enhance classification accuracy based on spectra. A dual-channel network that processes spectra and parameter inputs separately, and employs a self-attention mechanism for feature fusion is effective in combining multi-source asteroid information. Both the statistical correlations and model performance-based importance rankings of parameters contribute to understanding the constraint relationships among asteroid attributes.

Key words: methods: data analysis / techniques: miscellaneous / surveys / minor planets, asteroids: general

© The Authors 2024

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

Asteroids hold significant value for the study of the formation and evolution of the Solar System and planets, as well as the origin of life on Earth (DeMeo & Carry 2014). Asteroids of different materials are believed to have formed in various regions of the early Solar System, and may have experienced collisions and orbital migrations throughout their long evolutionary history (Bottke et al. 2005; Sahijpal 2023; Takir et al. 2023). Additionally, asteroids with different surface compositions contain varying types and relative abundances of minerals and other resources (DeMeo et al. 2009; Binzel et al. 2019). Therefore, classifying asteroid material types is not only fundamental to studying their physical properties, but is also crucial for supporting close-up exploration (Wei et al. 2024), space resource development (Lin et al. 2024; Wang & Yao 2024), and planetary defense (Li et al. 2020; Zou et al. 2024). Currently, accurate or even detailed surface composition and distribution information has been obtained for only about 30 asteroids through in situ and flyby missions (Wei et al. 2024). Observing the spectra or multicolor information of asteroids with telescopes therefore remains the primary method for determining their material types.

Depending on the data used, asteroid material classification studies have relied on multi-color photometry (Zellner et al. 1985; Popescu et al. 2018), polarimetric observations (Belskaya et al. 2017), albedo (Mahlke et al. 2022), or spectroscopy (Binzel et al. 2019; Luo et al. 2023); other work classifies asteroid orbital types using orbital elements (Hossain & Zabed 2023). Building on previous research (Tholen 1984; Bus 1999; Bus & Binzel 2002a,b), DeMeo et al. (2009) expanded the spectral range for asteroid classification from visible light to visible–near-infrared (VisNIR) wavelengths. The resulting Bus-DeMeo taxonomy classifies the 0.45–2.45 μm VisNIR spectrum into 24 categories and has become a widely used asteroid taxonomic scheme (Villanueva et al. 2018; Penttilä et al. 2021). Generally, the wider the spectral range, the better the differentiation between asteroid types. Relying solely on the visible or near-infrared bands makes it difficult to clearly distinguish many types, such as the C and X complexes (DeMeo et al. 2009). The wavelength range, data quality, and resolution of the spectra, along with the performance of the classification methods, all impact the reliability of the classification results.

In 2022, the Gaia space telescope (Gaia Collaboration 2016) released the Gaia Data Release 3 (DR3) dataset, which includes the spectra of more than 50000 asteroids (Gaia Collaboration 2023a,b). Prior to this, asteroid spectra were primarily obtained through ground-based telescope observations. The Gaia DR3 spectra include quality indicators for each wavelength, which allow us to identify high-quality data. However, they cover only visible wavelengths and have a much lower spectral resolution of 44 nm (Gaia Collaboration 2023a) compared to most ground-based telescope observations (Binzel et al. 2019). This poses a challenge for accurately determining the types of a large number of asteroids. Thus, developing reliable classification methods for low-resolution spectra is urgently needed.

From a methodological perspective, manually designed asteroid spectral classification methods typically utilize principal component transformation, combining spectral slope and absorption features to design a detailed classification process (DeMeo et al. 2009). Whenever a new asteroid spectral dataset becomes available, this traditional approach requires the creation of a new set of classification rules (Penttilä et al. 2021), and the same applies to Gaia data. In recent years, machine learning-based methods such as Naive Bayes (Klimczak et al. 2021), support vector machines (Klimczak et al. 2021), random forests (Chao et al. 2017), and artificial neural networks (ANNs; Penttilä et al. 2021; Klimczak et al. 2021; Luo et al. 2023; Korda et al. 2023; Muinonen et al. 2023) have been employed to achieve automatic classification of asteroids. Machine learning techniques enable models to autonomously learn features from data with class labels as supervision, which eliminates the need to manually design features for each step of the classification process (Li et al. 2022). Among these methods, ANNs demonstrate strong nonlinear fitting capabilities, data mining abilities, and rapid inference efficiency, and show good performance in several related studies.

Penttilä et al. (2021) proposed a method using a single-hidden-layer feedforward neural network to classify ground-based VisNIR spectra of asteroids and simulated Gaia spectra, achieving classification accuracies of 90.6% and 86.5%, respectively, for 11 classes. Luo et al. (2023) proposed a similar neural network aimed at classifying asteroid spectra observed by the future Chinese Space Survey Telescope (CSST). The authors conducted experiments on ground-based data with a spectral range similar to that of the CSST and achieved 87% accuracy for a ten-class classification on a test set. Muinonen et al. (2023) utilized an ANN to analyze asteroid spectra from the Gaia DR3 dataset, combined with photometric slopes, achieving approximately 82% accuracy in a four-class material classification. This represents one of the few studies to date that classifies asteroids using real spectral data observed by a space telescope.

However, existing studies primarily rely on a single type of data, either spectra or multi-color photometric data. The material type of asteroids may correlate with their physical properties, such as size and albedo, as well as with their orbital parameters and photometric information. For example, Erasmus et al. (2018) found statistically significant correlations between the probabilities of asteroids being of C-type or S-type and their semimajor axis in a study of 1000 main-belt asteroids. Therefore, it is crucial to leverage the data mining advantages of artificial intelligence (AI) to extract deep features from multi-dimensional input data that constrain asteroid types, with the aim of achieving more reliable classification results. Furthermore, the statistical studies on the correlation of various parameters with asteroid types are not yet comprehensive, particularly in identifying the effective parameter combinations for AI models. Additionally, current deep learning methods primarily use fully connected networks, with insufficient comparison to other algorithms. As a result, there is an opportunity to optimize model architectures and enhance their generalization abilities.

In this paper, we propose a new framework for asteroid material classification that effectively integrates asteroid spectral data, physical parameters, orbital parameters, and photometric information using AI technology, thereby achieving reliable determination of asteroid types. The key objectives in this process include uncovering the potential constraint relationships between different parameters and material types, together with their order of importance, as well as presenting classification models with improved accuracy for handling short-wavelength and ultra-low-dispersion spectra with multi-source information.

The remainder of this paper is organized as follows. Section 2 introduces the data used and the dataset production process. Section 3 describes the specific methods. The experimental results and a discussion are provided in Sect. 4. We draw our conclusions in Sect. 5.

2 Data

Section 2.1 introduces the sources and relevant information for the asteroid spectra, parameters, and type labels used in this study. The data preprocessing procedures, including data cleaning and category merging, are described in Sect. 2.2. Finally, Sect. 2.3 explains the data augmentation process and the dataset division for training, validation, and testing of the proposed method.

2.1 Data sources

2.1.1 Spectra and parameters

The spectral data are sourced from Gaia DR3 (Gaia Collaboration 2023a,b), which includes the average reflectance spectra of 60518 Solar System objects (SSOs) observed between August 5, 2014, and May 28, 2017. We specifically extracted asteroid spectra from this dataset. These spectra were measured using red and blue photometers; each spectrum was composed of 16 discrete wavelength bands spanning from 0.374 to 1.034 μm (Gaia Collaboration 2023a). Reflectance was normalized to 1 at 0.55 μm.

In addition to the spectra, various asteroid parameters are also employed for classification. Specifically, the photometric parameters used include the absolute magnitude, the phase slope parameter, and the maximum light curve amplitude. The physical parameters include the rotation period, effective diameter, and albedo. The orbital parameters consist of a total of 13 elements. The photometric and physical parameters are sourced from the Asteroid Lightcurve Database (LCDB; Warner et al. 2021) within the Planetary Data System (PDS), which is a long-term archive of digital data products returned from NASA’s planetary missions. We note that the parameters in this database are incomplete and have missing values. The orbital parameters are sourced from the Minor Planet Center (MPC). The detailed parameter information is shown in Table 1.

The majority of G values in the LCDB are derived using the H-G system by fitting photometric observation data. Some surveys that have produced a large number of rotation periods also use G1 values under the H-G1,G2 system (Warner et al. 2021). The majority of albedo values are assumed (based on orbital group) or derived from H magnitude and diameter, while some are estimated from infrared observations or taxonomic type. The effective diameters are primarily calculated using H magnitude and albedo; some are estimated from infrared observations. Additionally, the semilatus rectum describes the shape and size of an elliptical orbit, particularly reflecting the relationship between the orbital focus and the ellipse, and is determined by the semimajor axis and eccentricity. The synodic period refers to the time interval in which an asteroid and Earth align again in the same direction, which can be used to predict their relative positions.
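For reference, the two derived orbital quantities mentioned above follow standard two-body relations. The notation here is an assumption of this sketch and does not come from the paper: a is the semimajor axis, e the eccentricity, P the asteroid's orbital period, and P_⊕ Earth's orbital period.

```latex
\[
p = a\,(1 - e^{2}), \qquad
\frac{1}{P_{\mathrm{syn}}} = \left|\frac{1}{P_{\oplus}} - \frac{1}{P}\right| .
\]
```

As an illustrative case, an orbit with P = 4 yr and P_⊕ = 1 yr gives 1/P_syn = 0.75 yr⁻¹, that is, P_syn ≈ 1.33 yr.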

Table 1

Spectra and parameter information.

2.1.2 Material type labels

We integrated asteroid type classification results from ten studies or publicly available datasets to generate the labels for our dataset. Tholen (1984) proposed an asteroid taxonomy based on the eight-color asteroid survey data (Tedesco et al. 1982; Zellner et al. 1985) in his doctoral thesis. Bus & Binzel (2002a) provided spectral data for 1447 asteroids and defined three major categories and 26 detailed categories based on the Tholen taxonomy. Lazzaro et al. (2004) published visible light spectra and classification results for 820 asteroids observed from 1996 to 2001. DeMeo et al. (2009) expanded the spectral range to the near-infrared using 371 asteroids. Carvano et al. (2010) used the extensive SDSS MOC4 database to establish a classification system compatible with existing research and provided their classification results. The Asteroid Taxonomy V6.0 (AT-V6) dataset is a collection of various asteroid type labels gathered from the literature. Polishook et al. (2014) studied the “fresh” and weathered surfaces of asteroids and published the spectral type for 31 asteroid pairs. Binzel et al. (2019) reported spectral properties and classification results for over 1000 near-Earth objects. DeMeo et al. (2019) confirmed the overall abundance and distribution of 21 olivine-dominated A-type main-belt asteroids. Finally, the LCDB database (Warner et al. 2021) includes asteroid types from various sources.

These asteroid type results originate from multi-source data, with varying spectral ranges and data qualities affecting their reliability. Therefore, we integrated these results according to a fixed priority. Specifically, labels generated using VisNIR spectra were given priority over those using visible or near-infrared spectra alone, which in turn were prioritized over labels based on multi-color photometry. Additionally, more recent publications were prioritized over older ones. The sources of the asteroid type labels used in this study are summarized in Table 2. It is worth noting that the resolution of Gaia DR3 spectra is only 44 nm, which is much lower than the resolution of the spectra mentioned in Table 2.
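The priority rule described above can be sketched as a sort key. This is only a minimal illustration: the rank values, the field names, and the example entries are hypothetical, and treating publication year as a tie-breaker within the same data type is an assumption about how the two criteria combine.

```python
# Hypothetical priority ranks (lower wins); only the ordering is taken from the text.
DATA_PRIORITY = {"visnir": 0, "vis": 1, "nir": 1, "photometry": 2}


def pick_label(candidates):
    """Pick one label from a list of dicts with 'label', 'data', and 'year' keys.

    Data type is compared first; among equal data types the newer
    publication wins (assumed tie-breaking rule).
    """
    best = min(candidates, key=lambda c: (DATA_PRIORITY[c["data"]], -c["year"]))
    return best["label"]


# Made-up entries for a single asteroid, for illustration only.
example = [
    {"label": "S", "data": "photometry", "year": 2010},
    {"label": "Sq", "data": "vis", "year": 2002},
    {"label": "Q", "data": "visnir", "year": 2009},
]
chosen = pick_label(example)
```

Here `chosen` is the VisNIR-based label, because that data type outranks the other two regardless of publication year.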

Table 2

Material type label information.

2.2 Data preprocessing

2.2.1 Data cleaning

A quality flag of 2 for the average reflectance spectra of SSOs in Gaia DR3 indicates that the wavelength band is compromised and not recommended for use. Statistical results show that low-quality spectra are primarily concentrated in the first and last bands, as illustrated in Fig. 1a. To ensure the quality of the spectra, we removed the first and last channels, as well as the channel at 0.55 μm since this value is always normalized to 1. The remaining 13 channels were retained. Next, for each SSO, if any channel still had a flag of 2, the corresponding object was removed to ensure that there were no poor-quality spectra in our dataset. After completing this filtering step, 58 166 SSOs remained.
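The channel selection and flag filtering can be sketched as follows. The evenly spaced 44 nm grid is an assumption that is consistent with the 16 bands, the 0.374 to 1.034 μm range, and the 0.55 μm normalization point quoted in the text; the function and variable names are hypothetical.

```python
N_BANDS = 16
# Assumed grid: 374, 418, ..., 1034 nm in 44 nm steps.
WAVELENGTHS_NM = [374 + 44 * i for i in range(N_BANDS)]
NORM_INDEX = WAVELENGTHS_NM.index(550)  # band normalized to 1
# Drop the first band, the last band, and the normalization band.
KEEP = [i for i in range(N_BANDS) if i not in (0, N_BANDS - 1, NORM_INDEX)]
BAD_FLAG = 2


def clean_spectrum(reflectance, flags):
    """Return the 13 retained reflectances, or None if a retained band is flagged 2."""
    if any(flags[i] == BAD_FLAG for i in KEEP):
        return None
    return [reflectance[i] for i in KEEP]
```

A spectrum whose only flagged bands are the first or last one survives the filter, because those bands are discarded before the flags are checked.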

The filtered SSOs were then matched with the LCDB and MPC orbital databases. The orbital parameters in the MPC were complete; the counts of valid values for the photometric and physical parameters we used from the LCDB are shown in Fig. 1b. We removed objects based on the quality flags or descriptions of the parameters, eliminating those with missing values, types assumed based on orbital group, types incompatible with the Bus-DeMeo taxonomy, binary or multiple asteroids, and those with unreliable rotation period results due to tumbling or other reasons. The minimum light curve amplitude parameter was not used because there were too many missing values. After completing all the filtering steps, a total of 4961 asteroids remained.

Fig. 1

Distribution of low-quality wavelength bands in Gaia DR3 SSOs spectra (a), and valid counts of photometric and physical parameters in LCDB before cleaning (b).

2.2.2 Category merging

In the Bus-DeMeo taxonomy, asteroid spectral types are divided into 24 detailed categories (DeMeo et al. 2009). Existing research often merges these categories based on different objectives, for example by consolidating subcategories into main equivalents, forming 11 categories (Penttilä et al. 2021), or by conducting statistical studies with four major classes (C, D, S, and X; Erasmus et al. 2018). For low-resolution spectra in optical wavelengths, some finer types are difficult to distinguish, so detailed classification is not necessary. To ensure type separability while emphasizing the mineralogical significance of asteroids, we have merged asteroid material types into six major categories based on the Bus-DeMeo classification, as detailed in Table 3. First, subclasses are merged into their equivalent major categories; for example, Xc, Xe, and Xk are merged into the X class. Second, some categories, including O, R, and K, have few samples; this can easily lead to overfitting in machine learning models. Therefore, they are merged into the S class, which is spectrally similar. Additionally, we referred to the merging results and mineralogical interpretations from existing studies (Gaffey et al. 1993; DeMeo & Carry 2013; Chao et al. 2017).

The C and B classes are combined to represent carbon-rich asteroids. The merged D class exhibits a red slope in spectra, indicating possibly organic-rich primitive materials, primarily distributed in the outer regions of the Solar System (DeMeo & Carry 2013). The merged S complex category represents a group of silicate-dominated asteroids. Metal-rich asteroids are included in the X class. The A class stands alone, characterized by distinct olivine features in its spectrum (DeMeo et al. 2019). The V class, with strong 1 μm and 2 μm absorption features, is primarily associated with asteroid 4 Vesta (Tholen 1984; DeMeo et al. 2009). Figure 2 shows the Gaia spectra of different classes of asteroids in our dataset.

2.3 Dataset generation

2.3.1 Data augmentation

We expanded the dataset using real samples to increase data diversity, which also balances the sample sizes across different categories. For spectral data, we calculated the mean and standard deviation for each wavelength by category, then generated simulated spectra using Gaussian distributions centered around these means and scaled by the standard deviations. The LCDB database provides error values for the parameters of each asteroid. Figure 3 shows the error distributions of some parameters. For parameters such as H magnitude and G, we performed augmentation by adding random Gaussian noise to the original values, with the median of the errors as the maximum perturbation bound. Some G parameters exhibit large errors, possibly due to sparse observational phase angles, which increase the uncertainty in the fitting process. For rotation period and effective diameter, the errors tend to increase with the estimated values. Therefore, the maximum boundary for Gaussian noise is set as the median percentage of the errors relative to the estimated values. The orbital parameters of asteroids are generally reliable, so a small noise boundary of 0.001 was set. Orbital types are discrete and are not augmented. The perturbation ranges for each parameter are shown in Table 4.
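A minimal sketch of the two augmentation steps described above is given below. It assumes that the "maximum perturbation bound" means the Gaussian noise is clipped at the bound, and it picks a noise standard deviation of one third of the bound; neither choice is stated in the text, and the function names are hypothetical.

```python
import random
import statistics


def simulate_spectra(spectra, n_new, rng):
    """Draw n_new synthetic spectra for one class from per-band Gaussians."""
    bands = list(zip(*spectra))  # one tuple of reflectances per wavelength band
    means = [statistics.fmean(b) for b in bands]
    stds = [statistics.stdev(b) for b in bands]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n_new)]


def perturb(value, bound, rng, relative=False):
    """Add Gaussian noise clipped at +/- bound.

    With relative=True the bound is a fraction of the value, as described
    for rotation period and effective diameter.
    """
    limit = abs(value) * bound if relative else bound
    noise = rng.gauss(0.0, limit / 3)  # sigma = bound / 3 is an assumption
    noise = max(-limit, min(limit, noise))
    return value + noise
```

For example, `perturb(15.0, 0.2, random.Random(0))` returns a value within 0.2 of 15.0, and `perturb(10.0, 0.1, rng, relative=True)` stays within 10% of 10.0.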

After augmentation, each type of asteroid consists of 2500 real and simulated samples, resulting in a total of 15 000 samples. Finally, Z-score normalization is applied to each parameter, which helps the model converge faster and prevents certain large or small values from adversely affecting the training process. The formula is as follows:

\[Z=\frac{X-\mu}{\sigma}.\] (1)

Here X is the input parameter, and μ and σ are the mean and standard deviation of the parameter, respectively.
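Equation (1) applied to one parameter column can be sketched as below. Whether the population or the sample standard deviation was used is not stated; the population form is assumed here, and the function name is hypothetical.

```python
import statistics


def zscore(values):
    """Standardize one parameter column to zero mean and unit variance."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("constant column cannot be standardized")
    return [(x - mu) / sigma for x in values]


z = zscore([1.0, 2.0, 3.0])
```

The standardized column `z` has mean 0 and standard deviation 1 up to floating-point error.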

Table 3

Asteroid material types and sample numbers used in this study.

Fig. 2

Gaia spectra of asteroids (colored lines) in our dataset and average spectrum (black line). These spectra are fitted with fourth-degree polynomials.

Table 4

Perturbation ranges of parameters for data augmentation.

2.3.2 Dataset partitioning

There are eight SSOs in our dataset that have been closely observed in history: 1 Ceres, 4 Vesta, 21 Lutetia, 243 Ida, 433 Eros, 951 Gaspra, 2867 Steins, and 162173 Ryugu. These SSOs have reliable material type information, so we used them as an independent test set. We primarily adopted ten-fold cross-validation (Rodriguez et al. 2009) to evaluate the effectiveness of our method. Specifically, the augmented dataset of 15 000 samples was randomly divided into ten equal parts. The experiment was repeated ten times; each time nine of the parts were used as the training set and the remaining part was used as the validation set. The overall model performance was assessed by averaging the ten validation results to mitigate random effects.
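The ten-fold split can be sketched with the standard library alone. The shuffling seed and the function name are hypothetical; only the ten equal parts and the nine-to-one train/validation rotation come from the text.

```python
import random


def ten_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal parts
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


splits = list(ten_fold_indices(15000))
```

With 15 000 samples each split holds 13 500 training and 1500 validation indices, and the reported score is the mean of the ten validation accuracies.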

3 Methodology

This section introduces the proposed methods for developing an AI model that utilizes low-dispersion spectra along with multiple parameters to achieve more accurate classification, and explores the constraints of asteroid properties on their types. Section 3.1 introduces the parameter selection methods to reduce data dimensionality and information redundancy. In Sect. 3.2, the proposed classification model, AsterRF, along with other comparative models, is presented. Section 3.3 explains how to perform a screening of numerous parameter combinations to ensure the feasibility of the experiments. Finally, Sect. 3.4 details the evaluation metrics and the software and hardware configuration.

3.1 Feature selection

Our dataset includes 19 characteristic parameters of asteroids, but not all of them contribute significantly to determining asteroid types. Therefore, we used two statistical methods to filter out parameters with limited usefulness: the information gain ratio (IGR) and Spearman’s rank correlation coefficient.

The IGR (Iwata et al. 2004; Yao et al. 2022) measures the contribution of individual features to reducing the uncertainty of the target variable. It can rank the importance of input features, helping to identify the effectiveness of each feature in classification. The equations are expressed as follows:

\[H(C)=-\sum_{i=1}^{n} \frac{\left|C_{i}\right|}{|C|} \log _{2}\left(\frac{\left|C_{i}\right|}{|C|}\right)\] (2)

\[H(C \mid P)=-\sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\left|C_{i, j}\right|}{|C|} \log _{2}\left(\frac{\left|C_{i, j}\right|}{\left|P_{j}\right|}\right)\] (3)

\[IG(C, P)=H(C)-H(C \mid P)\] (4)

\[IGR(C, P)=\frac{IG(C, P)}{H(P)}.\] (5)

Here H(C) is the information entropy of the target variable C (i.e., asteroid classes), |C_i| represents the number of samples of class i, |C| represents the total number of samples, and n is the number of classes. H(C|P) is the conditional entropy of the target C given the parameter P, |P_j| represents the number of samples for which P takes the value j, m is the number of distinct values of P, and |C_i,j| denotes the number of samples that belong to class i under the condition P = j. IG(C, P) represents the information gain, and H(P) measures the intrinsic information of P. The resulting ratio is the IGR(C, P).
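Equations (2) to (5) can be sketched for a discrete parameter as follows. Continuous parameters such as albedo would first need to be binned; how that was done is not stated, so the sketch takes already-discrete values. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a list of discrete values (Eq. 2)."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def info_gain_ratio(classes, param):
    """IGR(C, P) for class labels and one discrete parameter (Eqs. 3-5)."""
    n = len(classes)
    h_c_given_p = 0.0
    for p_value, p_count in Counter(param).items():
        subset = [c for c, p in zip(classes, param) if p == p_value]
        # (|P_j| / |C|) * H(C | P = j), which expands to the double sum in Eq. (3)
        h_c_given_p += p_count / n * entropy(subset)
    gain = entropy(classes) - h_c_given_p
    h_p = entropy(param)
    return gain / h_p if h_p > 0 else 0.0


# Toy labels: the first parameter determines the class, the second is uninformative.
labels = ["S", "S", "C", "C"]
informative = info_gain_ratio(labels, ["inner", "inner", "outer", "outer"])
uninformative = info_gain_ratio(labels, ["a", "b", "a", "b"])
```

In this toy case `informative` evaluates to 1 and `uninformative` to 0, the two extremes of the ratio.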

Spearman’s rank correlation coefficient (Hauke & Kossowski 2011) measures the strength of the monotonic relationship between two variables. The coefficient r ranges from −1 to 1. A value of r = 0 indicates no monotonic association between the two variables, and the greater the absolute value of r, the stronger the correlation. In our study, Spearman’s correlation coefficient is used to identify asteroid parameters that are statistically highly correlated, providing a basis for parameter reduction. The formula is as follows:

\[r=\frac{\sum_{i=1}^{n}\left(R\left(X_{i}\right)-\overline{R(X)}\right)\left(R\left(Y_{i}\right)-\overline{R(Y)}\right)}{\sqrt{\sum_{i=1}^{n}\left(R\left(X_{i}\right)-\overline{R(X)}\right)^{2} \sum_{i=1}^{n}\left(R\left(Y_{i}\right)-\overline{R(Y)}\right)^{2}}}.\] (6)

Here R(X_i) and R(Y_i) represent the ranks of the i-th sample for variables X and Y, respectively; $\overline{R(X)}$ and $\overline{R(Y)}$ represent the mean ranks of variables X and Y; and n is the total number of samples.
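Equation (6) can be sketched directly from its definition. Assigning average ranks to tied values is the usual convention and is assumed here; the function names and the period values are illustrative only.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks (Eq. 6)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Illustrative orbital periods (yr) and the mean motion 360/P derived from them.
periods = [3.6, 4.0, 5.2, 11.9]
mean_motion = [360.0 / p for p in periods]
r_period_motion = spearman(periods, mean_motion)
```

Because mean motion decreases strictly as the period increases, the ranks are exactly reversed and `r_period_motion` equals −1, whatever the actual period values are.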

Figures 4 and 5 illustrate the IGR rankings of various parameters with respect to asteroid types and the correlation matrix among all parameters, respectively. It is evident that the parameters Peri, Node, and mean anomaly contribute minimally, with IGR values below 0.01. Additionally, mean motion and orbital period are completely negatively correlated; the semilatus rectum is highly correlated with the semimajor axis because it is derived from both the semimajor axis and eccentricity; and the correlation between synodic period and orbital period exceeds 0.99. Consequently, the six orbital parameters that contribute minimally to classification or are highly correlated with other parameters, namely Peri, Node, mean anomaly, mean motion, semilatus rectum, and synodic period, have been excluded. The remaining 13 parameters are retained as potential inputs for AI models.

Fig. 3

Example of error statistics for asteroid parameters as reported in LCDB database.

3.2 Model structure design

For the task of asteroid classification by combining spectra and parameters, we designed or applied five machine learning models: multilayer perceptron (MLP), A2-Net, AsterNet, random forest (RF), and AsterRF. Among them, AsterRF is our proposed best model, which is an integration of AsterNet and random forest. Experiments were conducted on these models to compare the classification effectiveness with different parameter combinations.

Fig. 4

Parameter importance for material types of asteroids.

3.2.1 Multilayer perceptron

At present, studies using ANNs for asteroid classification typically employ multilayer perceptrons (Penttilä et al. 2021; Luo et al. 2023), a type of feedforward neural network. MLPs consist of an input layer, one or more hidden layers, and an output layer, with fully connected nodes between each layer.

The network structure of the designed MLP is shown in Fig. 6. The model’s input is a (13+n)-dimensional vector concatenated from a 13-dimensional asteroid spectrum and an n-dimensional parameter combination. There are two hidden layers, each containing 30 neurons, which use a rectified linear unit (ReLU; Sandfeld 2023) as the activation function. The output layer uses a softmax function to select the category with the highest probability as the classification result. Additionally, during the training phase, the model uses a cross-entropy loss function (Mao et al. 2023) with Adam (Kingma & Ba 2014) as the optimizer. The training lasts for 1000 epochs, with a learning rate of 1×10⁻³ and a batch size of 128. We applied dropout regularization (Srivastava et al. 2014), randomly inactivating 20% of the neurons to prevent overfitting. These hyperparameters were determined through multiple experiments.
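The forward pass of an MLP with this layout can be sketched in plain Python. This is not the authors' implementation: the weights are random, training (cross-entropy loss, Adam, dropout) is omitted, and the value of n and the initialization scheme are assumptions chosen for illustration.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)  # He-style initialization (assumption)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(x, layers):
    """ReLU on the hidden layers, softmax on the output layer."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))


rng = random.Random(0)
n_params = 5  # hypothetical n
sizes = [13 + n_params, 30, 30, 6]  # input, two hidden layers, six classes
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probs = mlp_forward([rng.gauss(0.0, 1.0) for _ in range(sizes[0])], layers)
predicted_class = max(range(6), key=probs.__getitem__)
```

The six entries of `probs` sum to 1, and `predicted_class` is the index of the largest one.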

3.2.2 A2-Net

The asteroid spectral type classification network (A-Net) is a deep learning model we previously proposed for classifying spectra observed by ground-based telescopes. We modified it to adapt to the data and tasks of this study, resulting in A2-Net. The network structure is shown in Fig. 7.

In A2-Net, the convolutional layers initially extract local features from the concatenated asteroid spectrum and parameters, while the fully connected layers and self-attention mechanism (Vaswani et al. 2017) further process and focus on important features. The multi-head self-attention, a key component of the Transformer model, has shown excellent performance in various deep learning tasks (Han et al. 2022). Additionally, before feeding the data into the model, we performed feature engineering to enhance spectral features by subtracting the reflectance value at each wavelength from the value at the previous wavelength. The remaining dimensions were filled with ones.

The activation function used in the feature extraction layers is LeakyReLU (Xu et al. 2020), which has a small slope in the negative region to mitigate the vanishing gradient problem. The training lasts for 80 epochs, and the model with the highest validation accuracy was selected as the final model. The learning rate is set to 1×10⁻⁴. Other hyperparameters, including the loss function, optimizer, batch size, and dropout, are identical to those in the MLP model.

3.2.3 AsterNet

We designed a multi-parameter constrained asteroid material classification network (AsterNet), which employs a dual-branch structure to separately process the asteroid spectrum and the various parameters, as shown in Fig. 8. One branch utilizes one-dimensional convolution to extract local features from the spectrum and includes bottleneck layers with identity mappings (He et al. 2016) to facilitate feature extraction and mitigate the vanishing gradient problem. Adaptive average pooling is employed to reduce feature dimensions. Meanwhile, the other branch processes asteroid parameters organized in a specific sequence through fully connected layers. Subsequently, the features from the two branches are fused using a multi-head self-attention module, enhancing the ability to focus on important features and relationships between the spectrum and parameters. Finally, the fused features are processed by subsequent layers and activated for classification. The hyperparameters for this model are consistent with those of A2-Net, except for the dropout rate, which has been adjusted to 0.3.
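The fusion step can be illustrated with a single-head scaled dot-product self-attention over two tokens, one per branch. This is a simplified stand-in rather than the AsterNet module: the paper uses multi-head attention, and the token layout, the feature width, and the random projection matrices below are all assumptions.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    q = [matvec(w_q, t) for t in tokens]
    k = [matvec(w_k, t) for t in tokens]
    v = [matvec(w_v, t) for t in tokens]
    d = len(q[0])
    fused = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)  # how much this token attends to each token
        fused.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return fused


rng = random.Random(0)
d_model = 8  # hypothetical feature width shared by both branches


def rand_matrix():
    return [[rng.gauss(0.0, d_model ** -0.5) for _ in range(d_model)] for _ in range(d_model)]


spectrum_feat = [rng.gauss(0.0, 1.0) for _ in range(d_model)]  # stand-in for the convolutional branch
param_feat = [rng.gauss(0.0, 1.0) for _ in range(d_model)]     # stand-in for the fully connected branch
fused = self_attention([spectrum_feat, param_feat], rand_matrix(), rand_matrix(), rand_matrix())
fused_vector = fused[0] + fused[1]  # concatenated input for the later layers
```

Each output token is a weighted mix of both branches, which is the mechanism that lets the spectrum and parameter features inform one another before classification.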

3.2.4 Random forest

The random forest (RF; Breiman 2001) algorithm consists of multiple decision trees, each trained independently on randomly sampled data and considering only a random subset of features when splitting nodes. The final prediction is made by aggregating the trees’ results through voting. RFs are known for their robustness against overfitting and their ability to assess feature importance. They are well suited to structured data, including our dataset, and are effective for both classification and regression tasks. Our RF model comprises 100 trees (Fig. 9). The model’s input is a one-dimensional vector formed by concatenating the spectrum with the parameters, so that their combined information can enhance the accuracy of asteroid classification.

3.2.5 AsterRF

AsterRF is an ensemble model combining the above AsterNet and RF. Typically, ensemble methods can effectively reduce the bias that may occur in a single model by combining the strengths of multiple models, thereby improving the overall generalization performance and reliability of the prediction results (Dietterich 2000). Specifically, after AsterNet and RF are trained separately, their predicted probabilities are weighted and averaged during the inference stage, and the final prediction result is obtained through softmax activation,

$$\mathrm{Output} = \mathrm{Softmax}\left(\frac{OA_{AN} \times P_{AN} + OA_{RF} \times P_{RF}}{OA_{AN} + OA_{RF}}\right),$$ (7)

where OA_AN and OA_RF are the average ten-fold cross-validation accuracies of AsterNet and RF, respectively, and P_AN and P_RF are the output probabilities for the target by the two models. By effectively integrating the strengths of AsterNet and RF for asteroid feature extraction, the classification accuracy is expected to be improved.
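A minimal sketch of the weighted fusion in Eq. (7) is given below, assuming that each model returns one probability per class. The function names and the numerical values in the example are hypothetical and are not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def asterrf_output(p_an, p_rf, oa_an, oa_rf):
    """Accuracy-weighted average of two probability vectors, followed by softmax (Eq. 7)."""
    weighted = [
        (oa_an * a + oa_rf * r) / (oa_an + oa_rf)
        for a, r in zip(p_an, p_rf)
    ]
    return softmax(weighted)


# Hypothetical six-class probabilities and cross-validation accuracies.
p_an = [0.05, 0.70, 0.10, 0.05, 0.05, 0.05]
p_rf = [0.10, 0.60, 0.10, 0.10, 0.05, 0.05]
output = asterrf_output(p_an, p_rf, oa_an=0.90, oa_rf=0.91)
predicted_class = output.index(max(output))
```

Because softmax is monotonic, the predicted class is the one with the largest weighted average probability; the softmax step only rescales the scores so that they sum to one.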

Fig. 5

Spearman correlation matrix of asteroid parameters.

Fig. 6

Network structure of an MLP.

3.3 Initial selection of parameter combinations

The selected 13 parameters of asteroids provide a multi-faceted description of their photometric, physical, and orbital properties. To determine which combinations of these parameters best complement the spectral information to achieve optimal classification results, it would be ideal to experiment with all possible combinations and evaluate their generalization performance. However, the 13 parameters admit 2¹³ − 1 = 8191 possible non-empty combinations. Conducting feature ablation experiments on all these combinations for our five models would be time-consuming and unnecessary. Therefore, we used the fast-training random forest model to experiment with all parameter combinations, and identified the top 100 combinations with the highest validation accuracy, based on the average results of three separate experiments. The remaining four models then performed comparative experiments with these combinations to ultimately determine the most effective parameter combination strategies.
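The size of the search space and the top-k selection can be illustrated with a short sketch. The parameter names `p0` to `p12` are placeholders, the helper names are hypothetical, and the accuracies in the example are toy numbers rather than results from this work.

```python
from itertools import combinations


def all_parameter_combinations(parameters):
    """Yield every non-empty subset of the parameters."""
    for size in range(1, len(parameters) + 1):
        yield from combinations(parameters, size)


def top_k(mean_accuracy, k):
    """Return the k combinations with the highest mean validation accuracy."""
    return sorted(mean_accuracy, key=mean_accuracy.get, reverse=True)[:k]


parameters = [f"p{i}" for i in range(13)]  # placeholder names
combos = list(all_parameter_combinations(parameters))
n_combos = len(combos)  # 2**13 - 1

# Toy example: average three runs per combination, then keep the best two.
runs = {
    ("p0",): [0.85, 0.86, 0.84],
    ("p0", "p1"): [0.90, 0.91, 0.92],
    ("p1", "p2"): [0.88, 0.89, 0.87],
}
mean_accuracy = {combo: sum(acc) / len(acc) for combo, acc in runs.items()}
best_two = top_k(mean_accuracy, k=2)
```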

Fig. 7

Network structure of A2-Net.

Fig. 8

Network structure of AsterNet.

Fig. 9

Structure of random forest.

3.4 Metrics and environment configuration

Overall accuracy (OA) is employed to evaluate the performance of the models. Additionally, precision and recall are employed as reference metrics of each class:

$$OA = \frac{1}{10} \sum_{i=1}^{10} \frac{N_{i}}{1500},$$ (8)

$$\mathrm{Precision} = \frac{1}{10} \sum_{i=1}^{10} \frac{\mathrm{TP}_{i}}{\mathrm{TP}_{i} + \mathrm{FP}_{i}},$$ (9)

$$\mathrm{Recall} = \frac{1}{10} \sum_{i=1}^{10} \frac{\mathrm{TP}_{i}}{\mathrm{TP}_{i} + \mathrm{FN}_{i}}.$$ (10)

Here i represents the i-th fold in the cross-validation process, and each validation set contains 1500 samples; N_i is the number of correctly classified samples in fold i; and TP_i, FP_i, and FN_i are the numbers of true positive, false positive, and false negative samples in that fold, respectively.
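Equations (8)–(10) amount to per-fold ratios averaged over the folds. The sketch below assumes the per-fold counts are already available as lists; the function names and the counts in the example are illustrative only.

```python
def overall_accuracy(correct_per_fold, fold_size=1500):
    """Eq. (8): mean over folds of correctly classified / fold size."""
    return sum(n / fold_size for n in correct_per_fold) / len(correct_per_fold)


def mean_precision(tp_per_fold, fp_per_fold):
    """Eq. (9): mean over folds of TP / (TP + FP) for one class."""
    ratios = [tp / (tp + fp) for tp, fp in zip(tp_per_fold, fp_per_fold)]
    return sum(ratios) / len(ratios)


def mean_recall(tp_per_fold, fn_per_fold):
    """Eq. (10): mean over folds of TP / (TP + FN) for one class."""
    ratios = [tp / (tp + fn) for tp, fn in zip(tp_per_fold, fn_per_fold)]
    return sum(ratios) / len(ratios)


# Toy counts for ten folds (not taken from the paper).
oa = overall_accuracy([1380] * 10)
precision = mean_precision([240] * 10, [10] * 10)
recall = mean_recall([240] * 10, [60] * 10)
```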

We utilized Python 3.10.14 for data processing and analysis, with deep learning models implemented using the PyTorch 2.3.0 framework. The hardware configuration on a Linux server mainly includes an NVIDIA GeForce RTX 3080 Ti GPU (12 GB VRAM) and an Intel(R) Xeon(R) W-2235 CPU @ 3.80 GHz processor.

4 Results and discussion

Section 4.1 provides a detailed comparison of the validation accuracies of different methods and presents the classification confusion matrix for the best-performing AsterRF model. Subsequently, Sect. 4.2 discusses the classification results of the AsterRF on the tested asteroids. In Sects. 4.3 and 4.4, the importance of various parameters is analyzed, and the recommendations of parameter combinations suitable for different scenarios are provided, respectively. Finally, Sect. 4.5 illustrates the constraint relationships among asteroid parameters, including their classes.

4.1 Performance comparison of different methods

Table 5 presents the overall accuracies of the five compared models under different input conditions. The second column shows the accuracy when only the spectrum is used as input. The third column displays the accuracy when the Gaia spectrum and all 13 asteroid parameters are combined as inputs. The final column reflects the highest accuracies achieved when the models are fed the spectrum and the selected effective parameters. Specifically, each model was trained and validated against the 100 parameter combinations obtained in Sect. 3.3, and the highest accuracy achieved by each corresponds to the selected parameters for that model.

The horizontal comparison in the table shows that adding selected asteroid parameters can improve the generalization performance of each model compared to inputting only spectrum. The increase in accuracy varies among models, ranging from 1.40 percentage points (A2-Net) to 4.65 percentage points (MLP). Although feeding all parameters along with spectrum into the AsterNet, RF, and AsterRF also achieves higher accuracy than using only spectrum, this is not the case with the MLP and A2-Net. In reality, if the scale of the training data and model parameters were sufficient, the learning rate appropriate, and the training epochs unlimited, then inputting all parameters would improve the classification accuracy. However, these factors are naturally constrained. Therefore, it is essential to select and utilize the most effective parameters to provide additional information, while also avoiding the introduction of excessive noise.

The vertical comparisons demonstrate that the proposed AsterRF model achieves the highest accuracy across various input configurations. In current related research, the primary method involves using MLP for spectral type classification (Penttilä et al. 2021; Luo et al. 2023), achieving an accuracy of 0.8445 on our dataset. With the same inputs, the AsterNet model improves the accuracy by 5.17 percentage points. This demonstrates the advantage of a dual-branch network architecture that employs convolutional layers for processing spectral data, dense layers for handling parameter data, and integrates their features using self-attention mechanisms. When the AsterRF model incorporates spectrum and a selected parameter combination, it demonstrates outstanding performance with an OA of 0.9222, the highest among all the scenarios. AsterRF effectively combines the confidence from both the deep learning model AsterNet and the rule-based machine learning model random forest during classification, resulting in further improved accuracy.

In addition, there is a trend where models with stronger generalization ability show smaller accuracy differences between using all parameters and using selected parameters. For instance, the accuracy difference in the MLP model is 5.37 percentage points, compared to just 0.25 percentage points in the AsterRF model. This can be attributed to the fact that models with higher validation accuracy tend to exhibit greater robustness and resistance to redundant information. Consequently, the impact on accuracy from indiscriminately feeding all parameters is relatively minor.

Overall, the framework that integrates low-resolution spectrum with multiple parameter information for classifying asteroid materials has proven to be effective. Within this framework, the AsterRF model achieves a six-class accuracy of 0.9222, which represents a 7.77 percentage points improvement over the MLP model that relies solely on spectrum. We advise selecting and combining photometric, physical, and orbital characteristics to provide the model with more streamlined constraint information. Additionally, processing asteroid spectra and parameters separately, followed by feature-level fusion, and integrating models with superior generalization abilities during inference are effective strategies for achieving more reliable classification results.

The standard deviation of a model’s accuracies across tenfold validation sets indicates the stability of its generalization performance. Figure 10 presents a boxplot summarizing the standard deviations of validation accuracies for each model across 100 parameter combinations. It reveals that the MLP model has the highest median standard deviation and has several outliers. The median standard deviation for A2-Net is lower, while its interquartile range is relatively large. The medians for AsterNet and RF are significantly reduced, indicating that these models maintain stable validation accuracy when there are variations in the training data. Finally, the ensemble AsterRF model exhibits the optimal generalization stability, with its standard deviation below 0.5%.
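The stability measure summarized in Fig. 10 can be sketched as follows. The text does not state whether the population or the sample standard deviation was used; the population form is assumed here, and the function names and accuracies are illustrative.

```python
import statistics


def fold_std(fold_accuracies):
    """Population standard deviation of the accuracies across folds (assumed form)."""
    return statistics.pstdev(fold_accuracies)


def median_std(accuracies_by_combination):
    """Median of the per-combination standard deviations, as a boxplot would show."""
    return statistics.median(fold_std(acc) for acc in accuracies_by_combination)


# Toy fold accuracies for two parameter combinations (three folds each for brevity).
toy = [[0.90, 0.92, 0.91], [0.91, 0.91, 0.91]]
spread_first = fold_std(toy[0])
spread_median = median_std(toy)
```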

The AsterRF model reached its peak validation accuracy (0.9222) using the parameter combination of H magnitude, G, albedo, orbital type, and aphelion distance, indicating that integrating these parameters with spectrum provides optimal features for asteroid classification. This result was obtained by averaging the outcomes from three trials under different model initializations. Figure 11 presents the confusion matrix for the AsterRF model’s classification results under this parameter combination. It shows that the model achieves the highest accuracy for V-type asteroids, with a precision and recall of 0.97 and 0.98, respectively. A-type and D-type asteroids follow, with misclassified objects most frequently being S-type. The misclassified S-type objects are relatively evenly distributed across other categories. This can be partially attributed to the diverse features of S-type asteroids, which are composites of several subcategories. Additionally, X-type and C-type asteroids are often misclassified as each other, which is the primary reason for their lower accuracy.

Table 5

Ten-fold cross-validation accuracy of different methods.

Fig. 10

Standard deviations of ten-fold validation accuracies of different models.

Table 6

Mission information and material type predictions.

Fig. 11

Confusion matrix of AsterRF classification results on the validation set.

4.2 AsterRF test results

An additional test set includes eight asteroids that have been closely observed, and whose data were not previously learned by the models. We used the AsterRF model to predict the material types of these asteroids, with results detailed in Table 6. All asteroids except for 2867 Steins were correctly classified. Asteroid 2867, an X-type (E-type subclass), was misclassified by the model as an S-type. To investigate this error, we plotted the Gaia spectrum of asteroid 2867, as shown in Fig. 12a. We found that the spectral shape of this asteroid indeed resembles the average spectrum of S-types in our dataset. Extending the spectrum into near-infrared wavelengths might reveal characteristics more typical of X-types. Thus, the model’s misclassification of asteroid 2867 is understandable. Figure 12b shows an example of a correctly classified asteroid (4 Vesta) compared with the average spectrum of V-type asteroids.

Additionally, the model’s confidence scores for correctly classified S-type asteroids are generally above 0.9. For other types, although correct predictions are also achieved, the confidence is lower. This may be due to the widespread presence of S-type asteroids in the Solar System and the fact that this category has more real samples in our dataset. The higher proportion of simulated data in other categories leads to insufficient diversity in sample features. Therefore, increasing the number of real samples for rare asteroid categories is crucial for further enhancing the model’s generalization ability.

Fig. 12

Gaia spectrum of asteroid 2867 Steins and 4 Vesta compared with average spectrum of related classes. For clarity, the spectra have been vertically shifted.

Fig. 13

Accuracy ranking changes of parameter combinations across different initializations.

4.3 Feature importance analysis

Better parameter combinations contribute to higher classification accuracy. We tested the 100 best-performing combinations. However, the accuracy differences between these combinations are sometimes minimal. Therefore, the impact of random model initialization should be considered as it influences our determination of the optimal input features. Figure 13 illustrates the rank changes in validation accuracy for the 100 parameter combinations of the AsterRF model across three different initializations. The values were calculated by averaging the rank differences across the three trials. This figure shows that the variation in rankings ranges from 0 to 50, with a median around 18. This implies that if a parameter combination is ranked 10th in one training session, it might shift to 28th in another session with a different initialization.
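One way to compute such rank changes is sketched below. The exact definition used for Fig. 13 is not given in the text; this sketch assumes the mean absolute rank difference over all pairs of trials, and the function names and accuracies are hypothetical.

```python
from itertools import combinations


def ranks_from_accuracies(accuracies):
    """Rank 1 is the highest accuracy; ties keep their original order."""
    order = sorted(range(len(accuracies)), key=lambda i: accuracies[i], reverse=True)
    ranks = [0] * len(accuracies)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def mean_rank_change(trial_accuracies):
    """Mean absolute rank difference of each combination over all pairs of trials."""
    trial_ranks = [ranks_from_accuracies(acc) for acc in trial_accuracies]
    pairs = list(combinations(range(len(trial_ranks)), 2))
    n_items = len(trial_ranks[0])
    return [
        sum(abs(trial_ranks[a][i] - trial_ranks[b][i]) for a, b in pairs) / len(pairs)
        for i in range(n_items)
    ]


# Toy accuracies of three combinations under three initializations.
trials = [
    [0.92, 0.91, 0.90],
    [0.91, 0.92, 0.90],
    [0.92, 0.90, 0.91],
]
changes = mean_rank_change(trials)
```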

To clarify the importance of various parameters in the task of classifying asteroids using artificial intelligence, statistical results can provide more stable judgments. Figure 14 shows the frequency of each parameter appearing in the top 50 combinations with the highest accuracy. It can be seen that the most frequently used parameters are G, orbital type, albedo, H magnitude, and effective diameter. In contrast, the least frequently used parameters include perihelion distance, semimajor axis, eccentricity, and orbital period. Orbital type and G appear in all the top parameter combinations, indicating that the model heavily relies on the integrated features of these two parameters with spectrum to classify asteroids.
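The frequency statistic behind Fig. 14 is a simple count of how often each parameter occurs in the best-ranked combinations. A minimal sketch follows, assuming the combinations are already sorted by accuracy; the function name and the three toy combinations are illustrative.

```python
from collections import Counter


def parameter_frequencies(ranked_combinations, top_n=50):
    """Count in how many of the top_n combinations each parameter appears."""
    counts = Counter()
    for combo in ranked_combinations[:top_n]:
        counts.update(set(combo))  # each parameter counted once per combination
    return counts


# Toy ranking of three combinations (best first).
ranked = [
    ("G", "orbital type", "albedo"),
    ("G", "orbital type", "H magnitude"),
    ("G", "orbital type", "effective diameter"),
]
frequencies = parameter_frequencies(ranked)
most_common = frequencies.most_common(2)
```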

It is reasonable for albedo and orbital type to appear among the important parameters. For example, D-type asteroids are commonly found in the orbital type of Trojan and typically have low albedo, with an average value of about 0.06 (DeMeo & Carry 2013). The size (i.e., effective diameter) of asteroids is also very important. This does not imply that it alone strongly constrains the asteroid type, but rather that it can provide additional complementary information when combined with other parameters. The feature importance analysis based on data statistics, as shown in Figs. 4 and 5, measures the correlation of a single variable with material type. In contrast, feature importance rankings based on model performance can capture the interactions between features. There are also notable differences in the results of feature importance visualization between the two approaches. For instance, the semimajor axis exhibits a strong correlation with asteroid type under the statistical methods of IGR and Spearman’s rank correlation coefficient, but it is not emphasized in the model. This suggests that it may be effectively replaced by other parameter combinations. Overall, methods based on statistical correlation and those based on artificial intelligence models are suitable for different research purposes. The advantage of artificial intelligence lies in its ability to integrate information from multiple sources, thus suggesting what the best solution might be.

Fig. 14

Frequency of parameter’s occurrence in top 50 combinations.

4.4 Recommended parameter combinations

The parameters required by the best model are not always readily available. For example, obtaining the G parameter requires accumulating photometric data across multiple phase angles. To adapt the proposed method to scenarios with different available parameters, we provide several recommended parameter combinations, as shown in Table 7.

Specifically, for cases where only spectrum is available, such as newly discovered asteroids in survey data, the classifications can be conducted solely based on the spectroscopy. For objects with determined orbits, incorporating the orbital type can enhance the reliability of classification results. Combining the H magnitude with other orbital elements further clarifies the asteroid’s type, and measuring the rotation period may also aid in classification. In contrast, albedo and the G parameter are more difficult to obtain. The albedo should ideally be derived from infrared observations. However, assumed albedo values based on orbital group are also acceptable. When the G parameter is available, the classification accuracy reaches 91.92%, with input parameters including H magnitude, G, orbital type, inclination, and aphelion distance. In practical applications, case 3 is likely to be commonly used because it relies solely on easily accessible orbital information and H magnitude. This also makes it more feasible to build a larger dataset for model training.

4.5 Relationships between asteroid properties

To clearly illustrate the levels of correlation among various asteroid parameters, we visualized their Spearman correlation and feature importance for material classification, as shown in Fig. 15. It reveals that the material type of asteroids has a statistical correlation of less than 0.2 with most parameters; shows low correlations (0.2–0.4) with parameters like perihelion distance, aphelion distance, albedo, and synodic period; and exhibits no strong correlations (0.7–1.0) with any parameter. The effective diameter shows moderate correlations (0.4–0.7) with albedo, the G parameter, and several orbital parameters. Many parameters exhibit weak correlations (less than 0.2), while other parameters mainly show low to moderate correlations, and strong correlations are relatively rare. The strong correlations observed between albedo and several orbital parameters are related to the fact that some albedo values in our dataset are inferred using orbital groups. The red lines indicate the five most useful parameters for material classification according to the AsterRF model.
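For reference, Spearman's rank correlation and the correlation levels used above can be sketched in plain Python. The level boundaries follow the ranges quoted in the text; the handling of values that fall exactly on a boundary and all function names are assumptions.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_level(rho):
    """Map an absolute correlation to the levels used in the text."""
    r = abs(rho)
    if r < 0.2:
        return "weak"
    if r < 0.4:
        return "low"
    if r < 0.7:
        return "moderate"
    return "strong"


rho_monotonic = spearman([1, 2, 3, 4], [1, 4, 9, 16])
rho_reversed = spearman([1, 2, 3], [3, 2, 1])
```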

Overall, there are complex and varying dependences among the physical characteristics, photometric parameters, and orbital elements of asteroids. Understanding these constraint relationships is important for studying the evolution history of asteroids and for the inversion of specific parameters.

Table 7

Parameter combinations for different scenarios.

Fig. 15

Relationships among 20 asteroid properties. The connections between orbital parameters with direct conversion relationships, such as mean motion and orbital period, or perihelion distance and semimajor axis and eccentricity, are not included. Correlations are represented as absolute values.

5 Conclusion

The main objective of this work was to enhance the accuracy of asteroid material classification by integrating multi-source information using artificial intelligence, and to explore the intrinsic relationships among asteroid parameters as well as the optimal input features for models. We proposed a new classification method that combines short-wavelength range and low-resolution spectroscopy with various photometric, physical, and orbital parameters to categorize asteroids into six major classes with mineralogical interpretations. The developed classification model exhibits superior generalization ability. Furthermore, the relative importance and interrelationships of these parameters are visualized and analyzed. Our main conclusions are as follows:

(1) The combination of photometric, physical, and orbital parameters of asteroids can effectively enhance material classification accuracy over spectral-only approaches, with improvements ranging from 1.4 percentage points to 4.7 percentage points across five different models. The proposed AsterRF ensemble model achieved an optimal validation accuracy of 92.2%, which is approximately 7.8 percentage points higher than existing spectrum classification methods using fully connected networks. This model ensures superior performance through a dual-channel architecture that independently processes spectra and multi-parameter inputs and integrates their features using a self-attention mechanism;

(2) Experimental results show that V-type asteroids have the highest classification accuracy, with precision and recall reaching 0.97 and 0.98, respectively. This is followed by A-type and D-type asteroids. X-type asteroids have the lowest precision and recall, at only 0.85 and 0.86, respectively, and are easily confused with C-type asteroids. Tests on closely observed asteroids indicate that the model generally exhibits higher classification confidence for the S-type asteroids compared to other categories;

(3) The Spearman’s rank correlation between asteroid types and the other 19 parameters is generally below 0.4. The top five parameters that the AsterRF model focuses on for classification based on spectrum are G, orbital type, albedo, H magnitude, and effective diameter. We recommend several parameter combinations suitable for different application scenarios, one of which includes H magnitude, G, albedo, orbital type, and aphelion distance.

Future studies could consider integrating additional asteroid observation data sources, such as planetary radar, exploring deeper connections between more parameters, and refining classification categories. To further enhance model performance, it is important to expand high-quality datasets, especially for asteroid types with few real samples. We also plan to apply the proposed method to classify and analyze data from the Gaia mission and validate the results using ground-based spectroscopic measurements. Meanwhile, the AI-based multi-source information fusion framework has the potential to be extended for estimating other asteroid parameters in related research.

Acknowledgements

This work is funded by the National Science and Technology Major Project (2022ZD0117401). This work has made use of data and/or services provided by the International Astronomical Union’s Minor Planet Center.

References

Belskaya, I., Fornasier, S., Tozzi, G., et al. 2017, Icarus, 284, 30
Binzel, R., DeMeo, F., Turtelboom, E., et al. 2019, Icarus, 324, 41
Bottke, W. F., Durda, D. D., Nesvorný, D., et al. 2005, Icarus, 179, 63
Breiman, L. 2001, Mach. Learn., 45, 5
Bus, S. J. 1999, Ph.D. thesis, Massachusetts Institute of Technology, USA
Bus, S. J., & Binzel, R. P. 2002a, Icarus, 158, 146
Bus, S. J., & Binzel, R. P. 2002b, Icarus, 158, 106
Carvano, J., Hasselmann, P., Lazzaro, D., & Mothé-Diniz, T. 2010, A&A, 510, A43
Chao, H., Yue-hua, M., Hai-bin, Z., & Xiao-ping, L. 2017, Chinese Astron. Astrophys., 41, 549
DeMeo, F., & Carry, B. 2013, Icarus, 226, 723
DeMeo, F. E., & Carry, B. 2014, Nature, 505, 629
DeMeo, F. E., Binzel, R. P., Slivan, S. M., & Bus, S. J. 2009, Icarus, 202, 160
DeMeo, F. E., Polishook, D., Carry, B., et al. 2019, Icarus, 322, 13
Dietterich, T. G. 2000, in International Workshop on Multiple Classifier Systems (Springer), 1
Erasmus, N., McNeill, A., Mommert, M., et al. 2018, ApJS, 237, 19
Gaffey, M. J., Bell, J. F., Brown, R. H., et al. 1993, Icarus, 106, 573
Gaia Collaboration 2016, A&A, 595, A1
Gaia Collaboration 2023a, A&A, 674, A35
Gaia Collaboration 2023b, A&A, 674, A33
Han, K., Wang, Y., Chen, H., et al. 2022, IEEE Trans. Pattern Anal. Mach. Intell., 45, 87
Hauke, J., & Kossowski, T. 2011, Quaest. Geograph., 30, 87
He, K., Zhang, X., Ren, S., & Sun, J. 2016, in Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14 (Springer), 630
Hossain, M. S., & Zabed, M. A. 2023, in Proceedings of International Conference on Information and Communication Technology for Development: ICICTD 2022 (Springer), 43
Iwata, K., Ikeda, K., & Sakai, H. 2004, IEEE Trans. Neural Netw., 15, 792
Kingma, D. P., & Ba, J. 2014, arXiv e-prints [arXiv:1412.6980]
Klimczak, H., Kotłowski, W., Oszkiewicz, D., et al. 2021, Front. Astron. Space Sci., 8, 767885
Korda, D., Kohout, T., Flanderová, K., Vincent, J.-B., & Penttilä, A. 2023, A&A, 675, A50
Lazzaro, D., Angeli, C., Carvano, J., et al. 2004, Icarus, 172, 179
Li, M., Wang, Y., Wang, Y., Zhou, B., & Zheng, W. 2020, Sci. Rep., 10, 1
Li, J., Tu, L., Gao, X., et al. 2022, MNRAS, 517, 808
Lin, Q., Wang, C., & Yao, W. 2024, Chinese Space Sci. Technol., 44, 89
Luo, N., Wang, X., Gu, S., et al. 2023, AJ, 167, 13
Mahlke, M., Carry, B., & Mattei, P.-A. 2022, A&A, 665, A26
Mao, A., Mohri, M., & Zhong, Y. 2023, in International Conference on Machine Learning, PMLR, 23803
Muinonen, K., MacLennan, E., Uvarova, E., et al. 2023, Bull. AAS, 55, 503
Penttilä, A., Hietala, H., & Muinonen, K. 2021, A&A, 649, A46
Polishook, D., Moskovitz, N., Binzel, R. P., et al. 2014, Icarus, 233, 9
Popescu, M., Licandro, J., Carvano, J., et al. 2018, A&A, 617, A12
Rodriguez, J. D., Perez, A., & Lozano, J. A. 2009, IEEE Trans. Pattern Anal. Mach. Intell., 32, 569
Sahijpal, S. 2023, J. Astrophys. Astron., 44, 91
Sandfeld, S. 2023, in Materials Data Science: Introduction to Data Mining, Machine Learning, and Data-Driven Predictions for Materials Science and Engineering (Springer), 497
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. 2014, J. Mach. Learn. Res., 15, 1929
Takir, D., Neumann, W., Raymond, S. N., Emery, J. P., & Trieloff, M. 2023, Nat. Astron., 7, 524
Tedesco, E. F., Tholen, D. J., & Zellner, B. 1982, AJ, 87, 1585
Tholen, D. J. 1984, Asteroid Taxonomy from Cluster Analysis of Photometry (The University of Arizona)
Vaswani, A., Shazeer, N., Parmar, N., et al. 2017, Adv. Neural Inform. Process. Syst., 30
Villanueva, G. L., Smith, M. D., Protopapa, S., Faggi, S., & Mandell, A. M. 2018, JQSRT, 217, 86
Wang, W., & Yao, W. 2024, J. Space Sci. Exp., 1, 13
Warner, B., Harris, A., & Pravec, P. 2021, Asteroid Lightcurve Database (LCDB) Bundle V4.0, NASA Planetary Data System
Wei, S., He, Y., Liu, T., Yang, W., & Lin, Y. 2024, Chinese J. Space Sci., 44, 19
Xu, J., Li, Z., Du, B., Zhang, M., & Liu, J. 2020, in 2020 IEEE Symposium on Computers and Communications (ISCC), IEEE, 1
Yao, J., Qin, S., Qiao, S., et al. 2022, Bull. Eng. Geol. Environ., 81, 148
Zellner, B., Tholen, D. J., & Tedesco, E. 1985, Icarus, 61, 355
Zou, Y., Xue, C., Jia, Y., et al. 2024, J. Deep Space Explor., 11, 169




[1] Belskaya, I., Fornasier, S., Tozzi, G., et al. 2017, Icarus, 284, 30

[2] Binzel, R., DeMeo, F., Turtelboom, E., et al. 2019, Icarus, 324, 41

[3] Bottke, W. F., Durda, D. D., Nesvorný, D., et al. 2005, Icarus, 179, 63

[4] Breiman, L. 2001, Mach. Learn., 45, 5

[5] Bus, S. J. 1999, Ph.D. thesis, Massachusetts Institute of Technology, USA

[6] Bus, S. J., & Binzel, R. P. 2002a, Icarus, 158, 146

[7] Bus, S. J., & Binzel, R. P. 2002b, Icarus, 158, 106

[8] Carvano, J., Hasselmann, P., Lazzaro, D., & Mothé-Diniz, T. 2010, A&A, 510, A43

[9] Chao, H., Yue-hua, M., Hai-bin, Z., & Xiao-ping, L. 2017, Chinese Astron. Astrophys., 41, 549

[10] DeMeo, F., & Carry, B. 2013, Icarus, 226, 723

[11] DeMeo, F. E., & Carry, B. 2014, Nature, 505, 629

[12] DeMeo, F. E., Binzel, R. P., Slivan, S. M., & Bus, S. J. 2009, Icarus, 202, 160

[13] DeMeo, F. E., Polishook, D., Carry, B., et al. 2019, Icarus, 322, 13

[14] Dietterich, T. G. 2000, in International Workshop on Multiple Classifier Systems (Springer), 1

[15] Erasmus, N., McNeill, A., Mommert, M., et al. 2018, ApJS, 237, 19

[16] Gaffey, M. J., Bell, J. F., Brown, R. H., et al. 1993, Icarus, 106, 573

[17] Gaia Collaboration 2016, A&A, 595, A1

[18] Gaia Collaboration 2023a, A&A, 674, A35

[19] Gaia Collaboration 2023b, A&A, 674, A33

[20] Han, K., Wang, Y., Chen, H., et al. 2022, IEEE Trans. Pattern Anal. Mach. Intell., 45, 87

[21] Hauke, J., & Kossowski, T. 2011, Quaest. Geograph., 30, 87

[22] He, K., Zhang, X., Ren, S., & Sun, J. 2016, in Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14 (Springer), 630

[23] Hossain, M. S., & Zabed, M. A. 2023, in Proceedings of International Conference on Information and Communication Technology for Development: ICICTD 2022 (Springer), 43

[24] Iwata, K., Ikeda, K., & Sakai, H. 2004, IEEE Trans. Neural Netw., 15, 792

[25] Kingma, D. P., & Ba, J. 2014, arXiv e-prints [arXiv:1412.6980]

[26] Klimczak, H., Kotłowski, W., Oszkiewicz, D., et al. 2021, Front. Astron. Space Sci., 8, 767885

[27] Korda, D., Kohout, T., Flanderová, K., Vincent, J.-B., & Penttilä, A. 2023, A&A, 675, A50

[28] Lazzaro, D., Angeli, C., Carvano, J., et al. 2004, Icarus, 172, 179

[29] Li, M., Wang, Y., Wang, Y., Zhou, B., & Zheng, W. 2020, Sci. Rep., 10, 1

[30] Li, J., Tu, L., Gao, X., et al. 2022, MNRAS, 517, 808

[31] Lin, Q., Wang, C., & Yao, W. 2024, Chinese Space Sci. Technol., 44, 89

[32] Luo, N., Wang, X., Gu, S., et al. 2023, AJ, 167, 13

[33] Mahlke, M., Carry, B., & Mattei, P.-A. 2022, A&A, 665, A26

[34] Mao, A., Mohri, M., & Zhong, Y. 2023, in International Conference on Machine Learning, PMLR, 23803

[35] Muinonen, K., MacLennan, E., Uvarova, E., et al. 2023, Bull. AAS, 55, 503

[36] Penttilä, A., Hietala, H., & Muinonen, K. 2021, A&A, 649, A46

[37] Polishook, D., Moskovitz, N., Binzel, R. P., et al. 2014, Icarus, 233, 9

[38] Popescu, M., Licandro, J., Carvano, J., et al. 2018, A&A, 617, A12

[39] Rodriguez, J. D., Perez, A., & Lozano, J. A. 2009, IEEE Trans. Pattern Anal. Mach. Intell., 32, 569

[40] Sahijpal, S. 2023, J. Astrophys. Astron., 44, 91

[41] Sandfeld, S. 2023, in Materials Data Science: Introduction to Data Mining, Machine Learning, and Data-Driven Predictions for Materials Science and Engineering (Springer), 497

[42] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. 2014, J. Mach. Learn. Res., 15, 1929

[43] Takir, D., Neumann, W., Raymond, S. N., Emery, J. P., & Trieloff, M. 2023, Nat. Astron., 7, 524

[44] Tedesco, E. F., Tholen, D. J., & Zellner, B. 1982, AJ, 87, 1585

[45] Tholen, D. J. 1984, Asteroid Taxonomy from Cluster Analysis of Photometry (The University of Arizona)

[46] Vaswani, A., Shazeer, N., Parmar, N., et al. 2017, Adv. Neural Inform. Process. Syst., 30

[47] Villanueva, G. L., Smith, M. D., Protopapa, S., Faggi, S., & Mandell, A. M. 2018, JQSRT, 217, 86

[48] Wang, W., & Yao, W. 2024, J. Space Sci. Exp., 1, 13

[49] Warner, B., Harris, A., & Pravec, P. 2021, Asteroid Lightcurve Database (LCDB) Bundle V4.0, NASA Planetary Data System

[50] Wei, S., He, Y., Liu, T., Yang, W., & Lin, Y. 2024, Chinese J. Space Sci., 44, 19

[51] Xu, J., Li, Z., Du, B., Zhang, M., & Liu, J. 2020, in 2020 IEEE Symposium on Computers and Communications (ISCC), IEEE, 1

[52] Yao, J., Qin, S., Qiao, S., et al. 2022, Bull. Eng. Geol. Environ., 81, 148

[53] Zellner, B., Tholen, D. J., & Tedesco, E. 1985, Icarus, 61, 355

[54] Zou, Y., Xue, C., Jia, Y., et al. 2024, J. Deep Space Explor., 11, 169