MCL Research on LQBoost Regressor
LQBoost uses successive linear regressions to approximate the target regression. Each least-squares regression models the current target space: it maps the feature space into that space and approximates the samples within it. The residual of the current target space then becomes the regression target for the next iteration. At each iteration, samples whose residuals are close to zero are pruned by thresholding, so the remaining samples give a clearer and more accurate approximation of the target space. Beyond this thresholding, the feature set can be enhanced with LNT: after pruning, LNT performs a local feature transform that enriches the feature set and better captures complex structures in the data. This iterative regression narrows the gap between the cumulative approximation and the target, yielding increasingly accurate approximations.
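The residual-fitting loop with thresholding can be illustrated with a minimal sketch. This is not the LQBoost implementation: the function name `fit_residual_least_squares`, the parameters `n_iters` and `threshold`, and the use of NumPy are assumptions made for illustration, and the LNT feature transform is left out. The sketch only tracks training-side predictions.

```python
import numpy as np

def fit_residual_least_squares(X, y, n_iters=5, threshold=1e-3):
    """Hypothetical sketch: successive least-squares fits on residuals,
    pruning samples whose residual magnitude is at or below `threshold`."""
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])   # append a bias column
    pred = np.zeros(n)                     # cumulative approximation
    active = np.ones(n, dtype=bool)        # samples still being fitted
    weights, mse_history = [], []
    for _ in range(n_iters):
        residual = y - pred
        # Prune samples that are already approximated closely enough.
        active &= np.abs(residual) > threshold
        if not active.any():
            break
        # Least-squares regression onto the residual of the active samples.
        w, *_ = np.linalg.lstsq(Xb[active], residual[active], rcond=None)
        pred[active] += Xb[active] @ w
        weights.append(w)
        mse_history.append(float(np.mean((y - pred) ** 2)))
    return weights, pred, mse_history
```

Note that refitting the same features on the same samples would give a zero update, because a least-squares residual is orthogonal to the features. In this sketch, later iterations change the fit only because pruning changes the active sample set; a feature transform such as LNT would be a further source of change.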
Before the regression stage, we preprocess the data by clustering the dataset into several clusters and using the purity of each cluster as the initial value for regression. If a cluster has high purity, the samples within it take the cluster's majority label as their predicted value. For the remaining clusters, we use the purity as the initial predicted value, and the residual left by these clusters serves as the target space for the first layer of least-squares regression in LQBoost.
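A minimal sketch of this initialization follows, under several assumptions that the text does not state: labels are binary (0 or 1), cluster assignments are already available as `cluster_ids` (the clustering method is not specified here), and the initial value for a low-purity cluster is read as the fraction of class-1 samples in that cluster. The function name `purity_initialization` and the parameter `purity_threshold` are hypothetical.

```python
import numpy as np

def purity_initialization(labels, cluster_ids, purity_threshold=0.9):
    """Hypothetical sketch: per-cluster initial predictions from purity.
    Assumes binary labels in {0, 1} and precomputed cluster assignments."""
    labels = np.asarray(labels, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    init_pred = np.empty_like(labels)
    resolved = np.zeros(labels.shape[0], dtype=bool)
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        pos_frac = labels[members].mean()        # fraction of class-1 samples
        purity = max(pos_frac, 1.0 - pos_frac)   # share of the majority class
        if purity >= purity_threshold:
            # High-purity cluster: predict the majority label directly.
            init_pred[members] = 1.0 if pos_frac >= 0.5 else 0.0
            resolved[members] = True
        else:
            # Low-purity cluster: start from the class-1 fraction (assumed reading).
            init_pred[members] = pos_frac
    residual = labels - init_pred   # target for the first regression layer
    return init_pred, residual, resolved
```

In this reading, only the samples with `resolved == False` would pass their residuals to the first least-squares layer.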
This preprocessing step is essential for initializing the regression process effectively. By clustering the dataset and utilizing cluster purity, we can assign more accurate initial predictions for each cluster. High-purity clusters, where most samples belong to a single class, provide a straightforward prediction based on the majority label. On [...]