Traditional machine learning algorithms are susceptible to the curse of dimensionality [1]: their computational complexity grows with the feature dimension. Redundant features contribute little to discriminating classes or reducing regression error and should be removed; as their number grows, they may even hurt performance, so their detrimental impact should be minimized or controlled. To address these problems, feature selection is commonly applied as a data pre-processing step or as part of the data analysis to reduce model complexity. It aims to identify a subspace of discriminant features from the input that describes the input data efficiently, suppresses the effects of noisy or irrelevant features, and yields good prediction results.
Inspired by information theory and decision trees, a novel supervised feature selection methodology was recently proposed in MCL, yielding the discriminant feature test (DFT) for classification tasks and the relevant feature test (RFT) for regression tasks [2]. Both are filter methods of feature selection: they assign a score to each feature dimension and select features by ranking. DFT scores a dimension by its weighted entropy, which reflects its discriminant power for the classification target; RFT scores a dimension by its weighted MSE, which reflects its relevance to the regression target. Experimental results show that DFT and RFT select a distinctly lower-dimensional feature subspace robustly while maintaining high decision performance.
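To make the scoring concrete, the following Python sketch computes both scores for a single feature dimension. It is a minimal illustration, assuming one binary split searched over uniformly spaced candidate thresholds; the exact partitioning and search strategy of [2] may differ, and the function names (`dft_score`, `rft_score`) are ours. A lower score indicates a more discriminant (DFT) or more relevant (RFT) feature.

```python
import numpy as np

def dft_score(x, y, n_candidates=16):
    """DFT score sketch for one feature dimension x with class labels y.

    Splits the feature range at uniformly spaced candidate thresholds and
    returns the minimum weighted class-label entropy over the two resulting
    intervals (assumption: binary split; the search in [2] may differ).
    """
    thresholds = np.linspace(x.min(), x.max(), n_candidates + 1)[1:-1]
    classes = np.unique(y)
    best = np.inf
    for t in thresholds:
        score = 0.0
        for part in (y[x <= t], y[x > t]):
            if part.size == 0:
                continue
            p = np.array([(part == c).mean() for c in classes])
            p = p[p > 0]
            # weight each interval's entropy by its sample occupancy
            score += part.size / y.size * -(p * np.log2(p)).sum()
        best = min(best, score)
    return best

def rft_score(x, y, n_candidates=16):
    """RFT score sketch: same binary split, but each interval's cost is the
    MSE of the regression target around its interval mean."""
    thresholds = np.linspace(x.min(), x.max(), n_candidates + 1)[1:-1]
    best = np.inf
    for t in thresholds:
        score = sum(part.size / y.size * part.var()
                    for part in (y[x <= t], y[x > t]) if part.size > 0)
        best = min(best, score)
    return best

# Rank all D dimensions of an (N, D) feature matrix X and keep the k
# lowest-scoring ones:
#   scores = np.array([dft_score(X[:, d], y) for d in range(X.shape[1])])
#   selected = np.argsort(scores)[:k]
```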
The proposed methods also work well in the semi-supervised scenario: the feature set selected from a limited number of labeled samples has a high intersection over union (IoU) with the one selected from the full labeled training set. Examples are presented in Fig. 2, where IoU scores between the feature sets selected by DFT under different supervision levels [3] are computed for object classification on MNIST and Fashion-MNIST. The results demonstrate the robustness of the proposed semi-supervised feature learning method.
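The IoU metric here is simply |A ∩ B| / |A ∪ B| for two selected feature index sets A and B. A tiny sketch (the index sets below are hypothetical, for illustration only):

```python
def feature_iou(set_a, set_b):
    """IoU between two selected feature index sets."""
    a, b = set(set_a), set(set_b)
    return len(a & b) / len(a | b)

# e.g., indices selected with few labels vs. with all labels
print(feature_iou([3, 7, 12, 40], [3, 7, 12, 55]))  # -> 0.6
```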

References:
[1] R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, 1961.
[2] Y. Yang, W. Wang, H. Fu, and C.-C. J. Kuo, "On supervised feature selection from high dimensional feature spaces," arXiv preprint arXiv:2203.11924, 2022.
[3] Y. Yang, H. Fu, and C.-C. J. Kuo, "Design of supervision-scalable learning systems: Methodology and performance benchmarking," arXiv preprint arXiv:2206.09061, 2022.

— Yijing Yang