Set-Input Trees: Discovering New Effective Tree-Based Multiple Instance Learning Algorithms
Abstract
We propose gradient-based Set-Input Trees, a novel tree-based architecture for Multiple Instance Learning (MIL) that addresses both classification and regression. Unlike conventional methods that rely on fixed aggregation (e.g., min or max pooling), the proposed architecture integrates gradient-boosted trees with an attention mechanism: instances are processed independently, and their leaf embeddings are pooled via learned attention weights. This preserves interpretability while capturing bag-level structure. For regression, we introduce feature-to-bag conversion, a synthetic MIL formulation that enables evaluation on continuous targets. Experiments on standard MIL benchmarks show that the proposed algorithms outperform existing tree-based models. The tree-based design keeps the model scalable and transparent, bridging instance-level decisions with set-valued predictions. Code implementing the proposed algorithms is publicly available.
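The forward pass described above (independent per-instance tree evaluation followed by attention pooling of leaf embeddings) can be illustrated with a minimal sketch. This is not the authors' released code. Several choices are our own assumptions: depth-1 trees (stumps), summing the leaf embeddings across trees, a tanh-based attention score, and a linear output head. All names (`Stump`, `instance_embedding`, `bag_predict`) are hypothetical. Training is omitted: the gradient boosting of the trees and the learning of the attention and output weights are not shown, and the weights are set by hand.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stump:
    """Hypothetical depth-1 tree that routes an instance to leaf 0 or leaf 1."""
    feature: int
    threshold: float
    leaf_embeddings: List[List[float]]  # one embedding vector per leaf

    def embed(self, x: Sequence[float]) -> List[float]:
        leaf = 0 if x[self.feature] <= self.threshold else 1
        return self.leaf_embeddings[leaf]


def instance_embedding(trees: List[Stump], x: Sequence[float]) -> List[float]:
    # Each instance is processed independently; summing the leaf embeddings
    # over all trees is an assumption made for this sketch.
    dim = len(trees[0].leaf_embeddings[0])
    h = [0.0] * dim
    for tree in trees:
        for k, v in enumerate(tree.embed(x)):
            h[k] += v
    return h


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_predict(trees: List[Stump], bag: List[Sequence[float]],
                attn_w: List[float], out_w: List[float],
                out_b: float = 0.0) -> Tuple[float, List[float]]:
    H = [instance_embedding(trees, x) for x in bag]
    # Assumed attention score per instance: dot(attn_w, tanh(h)).
    scores = [sum(w * math.tanh(hk) for w, hk in zip(attn_w, h)) for h in H]
    a = softmax(scores)
    # Bag embedding: attention-weighted average of instance embeddings.
    z = [sum(a[i] * H[i][k] for i in range(len(H))) for k in range(len(H[0]))]
    y = sum(w * zk for w, zk in zip(out_w, z)) + out_b
    return y, a


# Usage with hand-set (untrained) parameters.
trees = [
    Stump(feature=0, threshold=0.5, leaf_embeddings=[[0.0, 1.0], [1.0, 0.0]]),
    Stump(feature=1, threshold=0.0, leaf_embeddings=[[0.2, 0.2], [0.5, -0.5]]),
]
bag = [[0.1, -1.0], [0.9, 2.0], [0.4, 0.3]]
y, weights = bag_predict(trees, bag, attn_w=[1.0, -1.0], out_w=[2.0, 1.0])
print(y, weights)
```

Because the bag embedding is a softmax-weighted average, the prediction does not depend on the order of instances in the bag, and the attention weights indicate which instances contributed most to the bag-level output.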