LQBoost uses successive linear regressions to approximate the target regression. Each least-squares regression models the current target space: it maps the feature space into that space and approximates the samples within it. The residue of the current target space then becomes the regression target for the next iteration. At each iteration, samples whose residuals are close to zero are pruned by thresholding, which gives a clearer and more accurate approximation of the remaining target space. Beyond thresholding, the feature set can be enriched using LNT: once the near-zero-residual samples have been removed, LNT performs a local feature transform to better capture the complex structures within the data. Each iteration thus reduces the gap between the cumulative approximation and the target, yielding increasingly accurate approximations.
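The residual-fitting loop with threshold pruning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fit_residual_layers`, the parameters `n_layers` and `threshold`, and the appended bias column are all hypothetical choices, and the LNT feature transform is deliberately left out.

```python
import numpy as np

def fit_residual_layers(X, y, n_layers=5, threshold=1e-3):
    """Fit successive least-squares layers on residuals (illustrative sketch).

    Assumptions not taken from the source: a bias column is appended,
    pruning uses a fixed absolute threshold, and no LNT step is applied.
    """
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # features plus bias column
    residual = np.asarray(y, dtype=float).copy()   # current regression target
    active = np.ones(len(residual), dtype=bool)    # samples not yet pruned
    weights = []
    for _ in range(n_layers):
        if not active.any():
            break  # every sample is already approximated within the threshold
        # Least-squares fit of the current residue on the active samples only.
        w, *_ = np.linalg.lstsq(Xb[active], residual[active], rcond=None)
        weights.append(w)
        # The new residue becomes the target of the next layer.
        residual[active] -= Xb[active] @ w
        # Prune samples whose residual is now close to zero.
        active &= np.abs(residual) > threshold
    return weights, residual, active

# Usage on synthetic data (values chosen only for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
layers, final_residual, still_active = fit_residual_layers(X_demo, y_demo)
```

Note that with unchanged features and no pruning, a second least-squares layer would have nothing left to fit, because least-squares residuals are orthogonal to the features. In this sketch, later layers are non-trivial only because pruning changes the set of samples being fitted; in the method described here, the LNT transform would additionally change the features.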
Before the first regression, we preprocess the dataset by clustering it into several clusters and using the purity of each cluster as the initial value for regression. In clusters with high purity, the samples take the majority label of the cluster as their predicted value. In the remaining clusters, the purity serves as the initial predicted value, and the residue produced by these clusters forms the target space for the first layer of least-squares regression in LQBoost.
This preprocessing step is essential for initializing the regression process effectively. By clustering the dataset and utilizing cluster purity, we can assign more accurate initial predictions for each cluster. High-purity clusters, where most samples belong to a single class, provide a straightforward prediction based on the majority label. On the other hand, clusters with lower purity require a more nuanced approach, where the initial prediction is based on the cluster’s overall purity, and LQBoost further refines this prediction through iterative regression.
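The purity-based initialization can be sketched as below. Several details are assumptions made only for illustration: labels are taken to be binary in {0, 1}, cluster assignments are supplied by any clustering method (the text does not name one), the cutoff `purity_cutoff` is a hypothetical parameter, and the initial value for a lower-purity cluster is taken to be its fraction of positive samples, which coincides with the purity when the majority class is positive.

```python
import numpy as np

def purity_initialization(y, cluster_ids, purity_cutoff=0.95):
    """Purity-based initial predictions per cluster (illustrative sketch).

    Assumes binary labels in {0, 1}; `purity_cutoff` is a hypothetical
    parameter separating high-purity clusters from the rest.
    """
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    init = np.empty_like(y)
    settled = np.zeros(len(y), dtype=bool)  # True where the majority label is used
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        pos_frac = y[members].mean()             # share of positive samples
        purity = max(pos_frac, 1.0 - pos_frac)   # share of the majority class
        if purity >= purity_cutoff:
            # High purity: predict the majority label directly.
            init[members] = float(pos_frac >= 0.5)
            settled[members] = True
        else:
            # Lower purity: start from the cluster-level fraction.
            init[members] = pos_frac
    # Residue passed on as the target of the first regression layer.
    residual = y - init
    return init, residual, settled

# Usage with hand-made labels and cluster assignments.
y_demo = [1, 1, 1, 1, 1, 0, 1, 0]
clusters_demo = [0, 0, 0, 0, 1, 1, 1, 1]
init, residual, settled = purity_initialization(y_demo, clusters_demo, purity_cutoff=0.9)
```

Only the samples where `settled` is `False` would then carry a non-trivial residue into the first least-squares layer.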
This simple approach yields a hierarchical ensemble of regression models. Each additional regressor is fitted to the residues of the preceding model, so the approximation of the regression target space is refined progressively. By learning from the residuals layer by layer, LQBoost steadily improves its accuracy, culminating in a robust and accurate predictive framework.
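The layered refinement can be summarized in symbols. The notation below does not appear in the original text and is introduced here only as an assumption for illustration: $p_0(x)$ denotes the purity-based initial value, $f_m$ the $m$-th least-squares regressor, and $r_m$ the residue after layer $m$.

```latex
\begin{aligned}
r_0 &= y - p_0(x), \\
r_m &= r_{m-1} - f_m(x), \qquad m = 1, \dots, M, \\
F_M(x) &= p_0(x) + \sum_{m=1}^{M} f_m(x), \qquad y - F_M(x) = r_M .
\end{aligned}
```

Under these assumptions, the last identity holds for a sample that stays active through all $M$ layers; a sample pruned at layer $m$ stops accumulating corrections from that point on, and its residue remains below the pruning threshold.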