MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series

Motivation: Multimodal time-series models often assume complete modalities, but real-world sensing frequently suffers from arbitrary missingness. MAESTRO is designed to stay accurate under missing modalities.

Real-world multimodal time series (wearables, mobile sensing, clinical monitoring) rarely come with all modalities present. Sensors can be missing, corrupted, or intermittently unavailable, leading to arbitrary missingness patterns that break many multimodal models. MAESTRO addresses this by combining missingness-aware symbolic modeling with adaptive sparse attention and sparse MoE routing, achieving strong accuracy while keeping computation practical.

MAESTRO overview: missingness-aware symbolic tokenization (with an explicit missing token), adaptive sparse intra-modal modeling, sparse cross-modal attention, and sparse MoE routing for specialization under different missingness patterns.

Why MAESTRO works well in practice

Core Idea: Missingness-Aware Symbolic Modeling

MAESTRO uses symbolic tokenization (SAX-style) to convert each modality’s time-series segments into discrete tokens. Crucially, it reserves an explicit missing token to represent absent segments, allowing the model to learn the meaning of missingness instead of relying on heuristic masking or zero-filling.
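To make this concrete, here is a minimal Python sketch of SAX-style tokenization with a reserved missing token. The global z-normalization, per-segment PAA mean, Gaussian breakpoints, the reserved id 0, and names such as sax_tokenize, seg_len, and alphabet_size are illustrative choices, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import norm

MISSING_TOKEN = 0  # reserved symbol id for absent segments (assumed convention)

def sax_tokenize(series, seg_len=32, alphabet_size=8):
    """SAX-style tokenization of one modality's 1-D series (illustrative sketch).

    The series is z-normalized globally, split into fixed-length segments,
    each segment is reduced to its mean (PAA), and the mean is binned with
    Gaussian breakpoints. Segments that are entirely NaN (sensor absent)
    map to the reserved MISSING_TOKEN instead of being zero-filled.
    """
    series = np.asarray(series, dtype=float)
    observed = series[~np.isnan(series)]
    z = (series - observed.mean()) / (observed.std() + 1e-8)

    # breakpoints that split a standard normal into equiprobable bins
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])

    n_segments = len(z) // seg_len
    tokens = np.empty(n_segments, dtype=np.int64)
    for i in range(n_segments):
        seg = z[i * seg_len:(i + 1) * seg_len]
        if np.all(np.isnan(seg)):            # whole window missing
            tokens[i] = MISSING_TOKEN
        else:
            paa = np.nanmean(seg)            # PAA: one mean value per segment
            # +1 keeps symbol ids in 1..alphabet_size, reserving 0 for missing
            tokens[i] = int(np.searchsorted(breakpoints, paa)) + 1
    return tokens
```

Because the missing token lives in the same vocabulary as the ordinary symbols, its embedding is learned jointly with them, rather than being a hand-crafted mask value.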

Symbolic tokenization illustration: time-series segments are discretized into tokens, with a reserved missing token representing absent windows. This makes missingness explicit and learnable.

Architecture

MAESTRO has three key components; minimal sketches of them follow the list:

1) Adaptive sparse intra-modal modeling
Each modality is encoded with sparse attention, with a learned budget controlling how much attention capacity each modality receives (based on modality availability and utility).

2) Sparse cross-modal attention
Encoded modality streams are fused using sparse cross-modal attention to capture inter-modal dependencies efficiently.

3) Sparse MoE routing (specialization)
A sparse MoE module routes tokens to experts (top-k), enabling specialization for different modality-availability patterns.
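For components (1) and (2), here is a minimal PyTorch sketch of budgeted top-k sparse attention per modality followed by sparse cross-modal attention over the concatenated streams. The names topk_sparse_attention, SparseFusionSketch, budget_gate, availability, and max_budget are illustrative assumptions, not the paper's API, and the budget mechanism shown (a sigmoid gate over modality availability) is one plausible instantiation of the learned budget.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep):
    """Scaled dot-product attention that keeps only the top-`keep` keys per query."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5     # (B, Tq, Tk)
    keep = min(keep, scores.shape[-1])
    thresh = scores.topk(keep, dim=-1).values[..., -1:]       # k-th largest score per query
    scores = scores.masked_fill(scores < thresh, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

class SparseFusionSketch(nn.Module):
    """Budgeted intra-modal sparse attention, then sparse cross-modal attention."""
    def __init__(self, dim, n_modalities, max_budget=32):
        super().__init__()
        self.qkv = nn.ModuleList([nn.Linear(dim, 3 * dim) for _ in range(n_modalities)])
        self.budget_gate = nn.Linear(n_modalities, n_modalities)  # availability -> budget
        self.cross_qkv = nn.Linear(dim, 3 * dim)
        self.max_budget = max_budget

    def forward(self, streams, availability):
        # streams: list of (B, T, D) token embeddings, one per modality
        # availability: (B, n_modalities) fraction of observed (non-missing) segments
        budgets = torch.sigmoid(self.budget_gate(availability)).mean(dim=0)  # (M,)
        encoded = []
        for m, x in enumerate(streams):
            q, k, v = self.qkv[m](x).chunk(3, dim=-1)
            keep = max(1, int(budgets[m].item() * self.max_budget))  # per-modality budget
            encoded.append(topk_sparse_attention(q, k, v, keep))
        fused = torch.cat(encoded, dim=1)                      # (B, M*T, D)
        q, k, v = self.cross_qkv(fused).chunk(3, dim=-1)
        return topk_sparse_attention(q, k, v, self.max_budget)  # sparse cross-modal fusion
```

The threshold-based top-k mask keeps each query's attention support proportional to the modality's budget, so a mostly-missing modality consumes little attention capacity.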
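For component (3), a corresponding sketch of a top-k sparse MoE layer: a router scores experts per token and only the k highest-scoring experts contribute, which lets different experts specialize to different missingness patterns. TopKMoE and its hyperparameters are hypothetical; the loop below evaluates each expert densely for readability, whereas a real implementation would dispatch only the routed tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sparse mixture-of-experts: each token is weighted by its top-k experts only."""
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)]
        )
        self.router = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x):                                # x: (B, T, D)
        gates = F.softmax(self.router(x), dim=-1)        # (B, T, E) routing scores
        weight, index = gates.topk(self.k, dim=-1)       # keep the k best experts per token
        weight = weight / weight.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # combine this expert's output for every token routed to it
            routed = (index == e).float() * weight       # (B, T, k), zero if not routed
            out = out + routed.sum(dim=-1, keepdim=True) * expert(x)
        return out
```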

Sparse MoE routing behavior: experts specialize as missingness patterns change, improving robustness without exploding compute.

Key Results

Macro-F1 vs. missingness (%): MAESTRO remains strong as missingness increases across multiple datasets.

Missing-modality robustness (paper-reported highlights)

Full-modality performance snapshot (Acc / Macro-F1)

With symbolic tokenization enabled (Table 2), MAESTRO achieves:

Efficiency snapshot (WESAD setting)

On WESAD (Table 3), MAESTRO delivers strong accuracy with practical compute:

Code and Media

Publications