Summarizing ‘KANITE: Kolmogorov-Arnold Networks for ITE Estimation’

Abhinav Thorat, Ravi Kolla, Niranjan Pedanekar

30th September 2024

Figure 1: Overview of the proposed KANITE architecture

Ravi Kolla summarizes the paper titled “KANITE: Kolmogorov-Arnold Networks for ITE Estimation”, co-authored by Abhinav Thorat, Ravi Kolla and Niranjan Pedanekar, and accepted at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2025).

Introduction

In causal inference, the estimation of Individual Treatment Effects (ITEs) is a foundational problem, as it is crucial for understanding the impact of a treatment on an individual user and personalizing treatments. ITE estimation has applications across a wide range of domains including healthcare, education, e-commerce, entertainment, and the social sciences. Despite its wide range of applications, ITE estimation remains challenging due to the absence of ground truth data, confounding factors, and treatment assignment bias. To overcome these issues, we propose new ITE estimation algorithms built on the recently popularized Kolmogorov–Arnold Networks.
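
For readers new to the setup, the quantity being estimated is commonly written in potential-outcomes notation as follows. The symbols here (X for the covariates, Y(t) for the potential outcome under treatment t, K for the number of treatments) are assumed for illustration and may differ from the notation used in the paper.

```latex
\tau_{a,b}(x) \;=\; \mathbb{E}\big[\, Y(a) - Y(b) \;\big|\; X = x \,\big],
\qquad a, b \in \{0, 1, \dots, K-1\}
```

For any individual, only the outcome under the treatment actually received is observed, so the remaining potential outcomes are never available as ground truth.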

Below, we summarize the key contributions of this work:

KANs Meet Causal Inference for the First Time
While Kolmogorov–Arnold Networks (KANs) have recently gained attention for their expressive power, no prior work has explored their application in causal inference—specifically in individual treatment effect (ITE) estimation. Our work is the first to integrate KANs into this space.
Introducing KANITE: A Modular Framework for ITE Estimation
We propose KANITE, a novel framework that combines KANs with shared representation learning for estimating ITEs. To ensure the learned representations are balanced across treatment groups, we introduce a representation loss based on either Integral Probability Metrics (IPMs) or Entropy Balancing (EB). KANITE includes three algorithmic variants: KANITE-Wass, KANITE-MMD, and KANITE-EB.
Making Entropy Balancing Work Beyond Binary Treatments
Entropy balancing is a popular technique for achieving covariate balance in binary treatment settings in causal effect estimation. We extend this idea to multiple treatment setups using Lagrangian duality theory, allowing for more robust estimation in complex real-world settings. This results in a novel algorithm that tightly integrates KANs with a multi-treatment version of the entropy balancing loss.
Outperforming Baselines Consistently
We rigorously benchmark KANITE on widely used datasets such as IHDP, NEWS (with 2 to 16 treatments), ACIC-16, and Twins, covering both binary and multiple-treatment settings. KANITE consistently outperforms all baselines, demonstrating its practical effectiveness.
Analyzing KAN Hyperparameters
Beyond performance, we dive into how KAN-specific parameters—like grid resolution and the degree of spline activations—influence ITE estimation. These insights not only help interpret model behavior but also guide practitioners on how to fine-tune KANs for causal inference problems.
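
To make the two hyperparameters mentioned above concrete, here is a minimal sketch of a single spline activation of the kind a KAN learns on each edge. It is not the KANITE implementation: the function names (`make_knots`, `bspline_basis`, `edge_function`), the uniform grid on [-1, 1], and the coefficient values are all assumptions chosen for illustration.

```python
def make_knots(grid_size, degree, lo=-1.0, hi=1.0):
    """Uniform knot vector: `grid_size` intervals on [lo, hi],
    extended by `degree` extra knots on each side."""
    step = (hi - lo) / grid_size
    return [lo + i * step for i in range(-degree, grid_size + degree + 1)]


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of the given degree at x
    using the Cox-de Boor recursion."""
    # Degree 0: indicator of each half-open knot interval.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - d - 1):
            left = (x - knots[i]) / (knots[i + d] - knots[i]) * basis[i]
            right = ((knots[i + d + 1] - x)
                     / (knots[i + d + 1] - knots[i + 1]) * basis[i + 1])
            raised.append(left + right)
        basis = raised
    return basis


def edge_function(x, coeffs, knots, degree):
    """A learnable univariate function: weighted sum of basis functions.
    In a trained network `coeffs` would be learned; here they are inputs."""
    return sum(c * b for c, b in zip(coeffs, bspline_basis(x, knots, degree)))


if __name__ == "__main__":
    for grid_size, degree in [(5, 3), (10, 3), (5, 1)]:
        knots = make_knots(grid_size, degree)
        n_coeffs = len(bspline_basis(0.3, knots, degree))
        print(f"grid={grid_size} degree={degree} -> {n_coeffs} coefficients per edge")
```

Under this parameterization each spline activation carries `grid_size + degree` coefficients, so increasing either hyperparameter adds flexibility and parameters at the same time.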

Proposed Model: KANITE

Our proposed KANITE framework addresses the task of ITE estimation for multiple treatments by utilizing KANs as the backbone of its architecture. It consists of three main building blocks, briefly explained below.

A. Balanced Representation of Covariates: First, KANITE replaces the conventional MLPs with KANs in the Representation Network (shown in Figure 1), so that the model learns latent representations of the covariates that are balanced across all treatment groups.
B. Treatment Head Networks: KANITE then uses dedicated treatment head networks, where each treatment is modeled through its own separate representation built with KANs, allowing greater flexibility to capture the underlying distribution of treatment outcomes.
C. Representation Loss: KANITE considers three different representation losses. The first two, Maximum Mean Discrepancy (MMD) and Wasserstein, are based on the Integral Probability Metric (IPM); the third uses the Entropy Balancing (EB) method [33] to learn weights that asymptotically minimize the Jensen-Shannon divergence between all pairs of treatment groups. These three losses result in three different ITE estimation algorithms: KANITE-MMD, KANITE-Wass, and KANITE-EB.
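
As an illustration of what an IPM-based representation loss measures, the sketch below computes a squared MMD with an RBF kernel between the learned representations of each pair of treatment groups and sums the results. This is a generic textbook estimator, not the loss from the paper: the function names, the kernel choice, the `bandwidth` parameter, and the decision to sum over all pairs are assumptions made for this example.

```python
import math
from itertools import combinations


def rbf_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two representation vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.
    It is zero when both samples are identical and grows as they separate."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def pairwise_mmd_penalty(groups, bandwidth=1.0):
    """Sum the squared MMD over every pair of treatment groups.
    `groups` maps a treatment id to the list of representation vectors
    of the units that received that treatment (hypothetical layout)."""
    return sum(
        mmd_squared(groups[a], groups[b], bandwidth)
        for a, b in combinations(sorted(groups), 2)
    )


if __name__ == "__main__":
    # Toy representations for three treatment groups (made-up numbers).
    reps = {
        0: [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]],
        1: [[0.1, 0.0], [0.9, 0.6], [0.4, 1.1]],
        2: [[3.0, 3.0], [4.0, 3.5], [3.5, 4.0]],
    }
    print("balance penalty:", pairwise_mmd_penalty(reps))
```

In a training loop, a penalty of this kind would typically be added to the outcome prediction loss with a weighting coefficient, so that the representation network is pushed toward treatment-invariant features.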

Below, we provide the training details of the KANITE algorithm.

A Few Key Results

Conclusion

KANITE is a state-of-the-art framework for ITE estimation that leverages shared representation learning using either IPM or Entropy Balancing. Unlike traditional MLP-based architectures, KANITE employs KANs as its backbone, enabling it to learn more accurate causal effect estimates. The framework introduces three algorithms (KANITE-MMD, KANITE-Wass, and KANITE-EB), each using a different IPM-based or Entropy Balancing-based representation loss to ensure balanced covariate representations across treatment groups. Experimental results demonstrate that KANITE effectively handles multiple-treatment scenarios, outperforming all considered baselines. Furthermore, KANITE achieves superior parameter efficiency and faster convergence while maintaining strong counterfactual prediction capabilities.

To learn more about Sony Research India’s research publications, visit the ‘Publications’ section on our ‘Open Innovation’ page: Open Innovation with Sony R&D – Sony Research India
