Academic Validation · External Reference

TimeBlind diagnosed it.
We've been building the fix.

In May 2026, researchers from Carnegie Mellon, UNC Chapel Hill, and Pittsburgh published TimeBlind, a diagnostic benchmark showing that the frontier video-language models it tested, including GPT-5 and Gemini 3 Pro, are functionally blind to time. The Burton Temporal Envelope patent family, deployed today as the Living Laboratories bot fleet, is the operational counterpart: a deterministic, patent-protected system that does not ask AI to see time. Instead, it lets the system act on time, against measurable pressure curves, with auditable provenance.

48.2% vs. 98.2%

Best frontier MLLM vs. human performance on TimeBlind's compositional temporal-reasoning tasks: a 50-percentage-point gap that does not close even with 4× the frames, 11× the parameters, or maximum test-time reasoning.

Li, Zhao, Zhang, Mitra, Nyandwi, Bertasius — "TimeBlind: A Spatio-Temporal Compositionality Benchmark for Video LLMs," arXiv:2602.00288 (2026)

What TimeBlind found

The TimeBlind team built a minimal-pairs benchmark: 600 instances, each a pair of videos with identical static visual content, differing only in temporal structure. Example: in one video a person shakes a coffee cup; in the paired video they hold the same cup still. Same room, same hand, same cup. Only the motion across time differs. The question: which video shows shaking?

Frontier models couldn't tell.
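To make the minimal-pair idea concrete, here is a small sketch of strict pair-level scoring, where an instance counts only if the model is right on both videos of the pair. The record layout and function names are hypothetical illustrations, and the paper's own metrics may be defined differently; this is not the TimeBlind evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """Model outcome on one minimal pair (hypothetical record layout)."""
    instance_id: str
    correct_on_video_a: bool  # e.g. the clip where the cup is shaken
    correct_on_video_b: bool  # e.g. the clip where the cup is held still


def per_video_accuracy(results: list[PairResult]) -> float:
    """Fraction of individual videos answered correctly."""
    if not results:
        raise ValueError("no results to score")
    hits = sum(r.correct_on_video_a + r.correct_on_video_b for r in results)
    return hits / (2 * len(results))


def strict_pair_accuracy(results: list[PairResult]) -> float:
    """Fraction of pairs where BOTH videos were answered correctly."""
    if not results:
        raise ValueError("no results to score")
    solved = sum(r.correct_on_video_a and r.correct_on_video_b for r in results)
    return solved / len(results)


# A model that ignores motion and always gives the same answer is right
# on one video of every pair and wrong on the other.
shortcut_model = [PairResult(f"pair-{i}", True, False) for i in range(4)]
print(per_video_accuracy(shortcut_model))    # half of the videos
print(strict_pair_accuracy(shortcut_model))  # none of the pairs
```

The design point is that a model relying only on static content can look acceptable on per-video accuracy while solving no pairs at all, which is why paired scoring exposes temporal blindness.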

"Their performance degrades sharply on event attributes — such as Speed (slowly vs. rapidly), and Force (forcefully vs. gently) — dropping to 36.7% and 32.3% I-Acc." TimeBlind, p. 2

"These results expose a systematic deficiency in current models' understanding of low-level, physics-related temporal dynamics." TimeBlind, p. 7

Even more striking, more compute does not fix the failure. The paper documents that:

Increasing video frames from 8 to 32 yields under 5% improvement.
An 11× parameter increase (7B → 78B) yields under 10% improvement.
Maximum GPT-5 reasoning effort peaks at 49.6% — still 49 points below human performance.
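The headline figures quoted on this page can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable names are ours and carry no meaning beyond this page.

```python
# Accuracy figures quoted on this page, in percent.
HUMAN_ACC = 98.2
BEST_MODEL_ACC = 48.2
GPT5_MAX_REASONING_ACC = 49.6

# Gaps in percentage points, rounded to one decimal to avoid float noise.
gap_best = round(HUMAN_ACC - BEST_MODEL_ACC, 1)
gap_gpt5 = round(HUMAN_ACC - GPT5_MAX_REASONING_ACC, 1)

# Scaling factors behind the "more frames" and "more parameters" bullets.
frame_scale = 32 / 8
param_scale = round(78 / 7, 1)

print(f"human vs. best model: {gap_best} points")
print(f"human vs. GPT-5 at maximum reasoning: {gap_gpt5} points")
print(f"frames: {frame_scale:.0f}x, parameters: {param_scale}x")
```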

"Simply scaling model size does not yield robust spatio-temporal understanding." TimeBlind, p. 8

Their taxonomy — and ours

TimeBlind structures the diagnostic into a three-tier hierarchy mirroring cognitive-science models of temporal cognition. The Burton Temporal Envelope Model, patented in May 2026 as CA 3,310,722, maps onto the same three tiers on the operational side:

| Tier | TimeBlind diagnostic layer | Burton operational layer |
| --- | --- | --- |
| Atomic Events | "What changed?" Recognize discrete events in time. | Event-Pressure Engine atomically anchors each scheduled or observed event with a deterministic timestamp. |
| Parametric Attributes | "How did it change?" Speed, force, magnitude. | Continuous pressure curves modulate band intensity (building → heavy → urgent → critical → welfare) based on time-to-event and elapsed-since-event parameters. |
| Structural Logic | "How do events relate?" Causality, Allen's 13 interval relations. | Cascade graphs encode multi-event dependencies; Person-of-Concern hierarchy enforces escalation order; cryptographic ledger preserves causality. |
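For readers unfamiliar with Allen's interval relations, the sketch below classifies how one time interval sits relative to another using the 13 standard relations. It illustrates the interval algebra only; the function name is hypothetical, and nothing here describes how the Burton cascade graphs or the TimeBlind benchmark are implemented.

```python
def allen_relation(a_start: float, a_end: float,
                   b_start: float, b_end: float) -> str:
    """Return the Allen relation of interval A relative to interval B."""
    if not (a_start < a_end and b_start < b_end):
        raise ValueError("intervals must have positive duration")

    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"

    # Intervals sharing at least one endpoint.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"

    # Strict containment.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"

    # Remaining case: partial overlap with no shared endpoints.
    return "overlaps" if a_start < b_start else "overlapped-by"


print(allen_relation(0, 2, 1, 3))  # A starts first and ends inside B
print(allen_relation(1, 2, 0, 3))  # A lies strictly inside B
```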

The mapping was not designed to match the paper; the patent predates the paper's public release. We read the alignment as independent support for the cognitive-science decomposition of temporal reasoning: two teams, working from opposite ends of the problem, arrived at the same three-tier structure.

The diagnostic vs. the deployed solution

TimeBlind (Diagnostic)

Passive observation. Tests what video-LLMs can perceive.
Minimal-pair videos under 30 seconds.
Static-shortcut detection across 600 curated instances.
Open-source dataset and evaluation code.
Identifies the gap. Does not propose how to close it.
Academic publication. arXiv 2602.00288.

Burton Family (Operational)

Active engagement. Drives what agents do against scheduled events.
Continuous pressure curves over arbitrary horizons (minutes to weeks).
Deterministic decisions reviewable against a wall clock.
Patent-protected. Closed source. ~31 bots deployed.
Closes the operational gap with a working production system.
Canadian Patent CA 3,310,722 + three companion filings.

These are siblings, not duplicates. TimeBlind asks, "Can the AI see time?" Our work asks, "Can the AI use time?" That the most advanced models tested cannot pass the recognition test makes the operational solution that much more valuable: the Burton family acts on time deterministically, in production, today, without relying on a model to perceive it.

Why this matters for buyers and integrators

If your team builds anything that depends on AI reasoning about when — clinical-care escalation, logistics SLAs, financial-position monitoring, driver-safety alerts, student-retention windows, courier networks — the TimeBlind finding tells you the substrate you've been planning around is structurally insufficient. A 48% accuracy ceiling on temporal compositionality is not a number you can engineer your way around with more frames or more parameters.

The deterministic temporal-awareness substrate that the Burton family runs on is the architectural alternative. It doesn't try to perceive time from video — it operates against a maintained model of event-pressure curves, escalates through configurable hierarchies, and produces auditable decisions that a human supervisor can verify against a clock.
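As a rough illustration of what a deterministic, clock-checkable decision can look like, the sketch below maps the time remaining before an event to one of the band names used on this page. Everything else is an assumption made for the example: the thresholds, the function name, and the choice to treat "welfare" as the band that applies once the event time has passed. It is not the patented method and does not model continuous pressure curves or escalation hierarchies.

```python
from datetime import datetime, timedelta, timezone


def pressure_band(now: datetime, event_time: datetime) -> str:
    """Map time-to-event to a band name (hypothetical thresholds)."""
    remaining = event_time - now
    if remaining < timedelta(0):
        # Assumption: after the event time has passed, switch to a
        # follow-up band driven by elapsed-since-event.
        return "welfare"
    if remaining <= timedelta(minutes=15):
        return "critical"
    if remaining <= timedelta(hours=1):
        return "urgent"
    if remaining <= timedelta(hours=4):
        return "heavy"
    return "building"


# Fixed timestamps make the result reproducible by anyone with a clock.
event = datetime(2026, 7, 1, 12, 0, tzinfo=timezone.utc)
for hours_before in (24, 3, 0.5, 0.1, -1):
    now = event - timedelta(hours=hours_before)
    print(now.isoformat(), pressure_band(now, event))
```

Because the output depends only on two timestamps, a supervisor can recompute any past decision from the recorded times and confirm it matches.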

The full patent family, the running prototype, and the deployed bot catalog are indexed on the main page. The patent family will be sold by sealed-bid auction this summer; auction open and close times will be announced publicly via @EventTimeNotificationBot on Telegram. Subscribe below to be notified when the auction opens.

🔔 Get notified when the auction opens

@EventTimeNotificationBot on Telegram is the Living Laboratories public broadcast bot. Scan the QR code or tap to start a conversation, then send /start. You will get one message, then quiet until the auction opens (or until something material happens with the patent family).

No spam. No marketing. Just the auction's open and close moments — and any other patent-family event that warrants a heads-up.

QR code linking to @EventTimeNotificationBot on Telegram

Scan to subscribe

For researchers and academic evaluators

If you're working on temporal reasoning, agentic AI, embodied AI, or the operational closure of the TimeBlind gap — we'd like to talk. Independent benchmarking, citation in continuation filings, or simple comparison of architectures are all topics we welcome.

📧 Contact Kevin Burton

Cite the source

@misc{li2026timeblindspatiotemporalcompositionalitybenchmark,
  title={TimeBlind: A Spatio-Temporal Compositionality Benchmark for Video LLMs},
  author={Baiqi Li and Kangyi Zhao and Ce Zhang and Chancharik Mitra and Jean de Dieu Nyandwi and Gedas Bertasius},
  year={2026},
  eprint={2602.00288},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.00288}
}

Paper: arxiv.org/abs/2602.00288 · Project page: baiqi-li.github.io/timeblind_project · Code & data: github.com/Baiqi-Li/TimeBlind · News coverage: Quantum Zeitgeist

TimeBlind is the independent intellectual property of its authors. This page is a third-party commentary on their published work. We thank the TimeBlind team for naming the problem so precisely.
