# SPRIND Next Frontier AI — Technical Fields (hypothesis + research)

> **Scope:** the SPRIND form's **technical** fields only, each filled with **(1) our hypothesis** and **(2) what the exact research will be**. Financial / compute-€ / team / commercial = **out of scope this pass**.
> **Two things are *not* invented here:** (a) the **Existing Artifacts** are results we have actually measured in this repo; (b) the **Momentum** table lists results we independently confirmed/falsified against published prior art. Everything else is explicitly framed as **hypothesis** or **research method** — we do **not** pre-commit an architecture, a technique, or a milestone list.
> **Citations:** local links to [`./papers/`](papers/).
> **Tiers:** `[PROVEN]` measured here · `[TOY]` toy/unit-test only · `[PRIOR-ART]` established external result · `[PROJECTED]` hypothesis/estimate · `[NEGATIVE]` measured null/falsification.

---

## The hypothesis (story spine)

- **The next frontier model is recurrent.** Transformers won the current S-curve by paying for attention whose cost grows quadratically with context length, so progress tracks whoever can burn the most H100s. **State-based recurrent networks** carry a fixed-size *evolving state*: constant compute/memory **per step**, unbounded context length, memory in the substrate. A different curve: "electric drive, not a better combustion engine."
- **It comes from many sides, not one trick.** We are a **toolbox of proven research**, equipped to assemble the best of everything — fast. We think in **classes of proven tools** and find *which assembly* wins.
- **The open question = the research:** *which* recurrent architecture, composed from today's best proven components, first reaches frontier-grade capability under European efficiency constraints — decided by **rigorous benchmarks and falsification tests**, not by pre-commitment.
- **Existing modules (attention/SSM/MLP) are used where relevant** — not discarded, not forced through one mechanism. Our spiking/kernel work is a **first probe + realizability evidence**, not the thesis.

---

## Momentum — velocity is the differentiator (grounded)

- A small team, together for a few weeks, using an AI-augmented **build → benchmark (vs `torch.compile`) → profile → falsify → literature-check** loop, independently **confirmed / rediscovered / falsified ≈11 published results** across GPU kernels, SNN neuroscience, spatial connectomics, and MoE routing between **11 and 30 May 2026 (~19 days)**. ~4 of 11 were *falsifications of field over-claims*.

| # | What we measured | Published prior art | Status |
|---|---|---|---|
| 1 | Fine-grained sparse connectivity gives ≈no GPU speedup; only block structure wins | [Sparsity Roofline (Shinn 2023)](papers/2023_Sparsity_Roofline_hardware_limits_sparse_networks_2310.00496.pdf) | re-derived |
| 2 | Naive sparse SNN kernels lose to compiled dense; only tiled split wins | [FlashLLM](papers/2023_FlashLLM_unstructured_sparsity_tensor_cores_2309.10285.pdf), [SparseRT](papers/2020_SparseRT_unstructured_sparsity_GPU_inference_2008.11849.pdf) | re-derived |
| 3 | SpectralAI RT-core "113–218×" → ≤2× vs a compiled baseline | SpectralAI preprint | `[NEGATIVE]` falsified |
| 4 | 3D wins SHD / 2D wins Yin-Yang (different mechanism) | [SpSNN (Landsmeer 2025)](papers/2025_Spatial_Spiking_Neural_Networks_2512.10011.pdf) | reproduced |
| 5 | Random sparse ≥ dense in SNN+SHD (+2pp); structured spatial +1pp | [Random Pruning (Liu 2022)](papers/2022_Unreasonable_Effectiveness_Random_Pruning_Liu_2202.02643.pdf) | new domain |
| 6 | Bare LIF + surrogate fails long-memory; richer cells recover it | [ELM (Spieler 2023)](papers/2023_Expressive_Leaky_Memory_Neuron_2306.16922.pdf) | rediscovered |
| 7 | Router *state* is the critical routing axis | [RMoE (Qiu 2024)](papers/2024_RMoE_Layerwise_Recurrent_Router_MoE_Qiu_2408.06793.pdf) | rediscovered |
| 8 | Stateful router 99% vs stateless 70% on cue-switch | [RIMs (Goyal 2019)](papers/2019_Recurrent_Independent_Mechanisms_RIMs_1909.10893.pdf), [Routing-Mamba](papers/2025_Routing_Mamba_MoE_projection_SSM_2506.18145.pdf) | `[TOY]` |
| 9 | One-shot pruning collapses SNN; needs sparse-aware retrain | [Lottery Ticket](papers/2018_Lottery_Ticket_Hypothesis_Frankle_Carbin_1803.03635.pdf), [RigL](papers/2020_RigL_Rigging_Lottery_Evci_1911.11134.pdf) | re-derived |
| 10 | Spike sets NOT temporally stable → killed a compression path | (literature silent) | `[NEGATIVE]` |
| 11 | LIF dynamics <1% of layer time → optimise matvec, not LIF | (implicit assumption) | `[NEGATIVE]` |

---

# Technical fields

## Project Title `[≤50 chars]`
- candidates: `State-based RNNs: the next frontier` · `The recurrent frontier — proven tools, assembled` · `RNNs that scale with state, not context`

## Short Description `[≤500]`
- **Hypothesis:** state-based recurrent networks are the next frontier — constant per-step compute (no context window), state-as-memory, and efficiency that fits European compute.
- **Research:** assemble a new recurrent architecture from a **toolbox of proven tool-classes** and find the best combination via **rigorous benchmarks + falsification tests** on efficiency × performance × unbounded context.
- **Grounding:** ~11 prior-art results validated/falsified in ~19 days; a working spiking prototype + GPU kernels.

## Frontier Dimension `[≤500]`
- **Model architecture** (in-scope: state-space / alternative architectures).
- **Hypothesis:** a state-based **recurrent** network opens a scaling axis transformers lack — **per-step compute need not grow with model size**, and **context is unbounded** (fixed evolving state, no re-attention).
- Built by **combining proven components**, not one new trick; existing optimised modules used **where relevant**.

## Core Idea & Architecture `[≤3000]`
- **Hypothesis (substrate):** recurrence is the right substrate — a fixed-size state evolves per timestep → streaming-native, constant compute/memory per step, memory implicit in the state ([Mamba](papers/2023_Mamba_selective_state_spaces_2312.00752.pdf), [S4](papers/2022_S4_structured_state_spaces_2111.00396.pdf)/[S5](papers/2023_S5_simplified_state_space_layers_2208.04933.pdf), [xLSTM](papers/2024_xLSTM_extended_long_short_term_memory_2405.04517.pdf), [HiPPO](papers/2020_HiPPO_recurrent_memory_polynomial_projections_2008.07669.pdf)).
- **Hypothesis (composition):** the architecture is **assembled from classes of proven tools**, composed only where each earns its place — *this is the search space, not a fixed design:*
  - *Recurrent substrate* — selective SSM / spiking / xLSTM-style cells.
  - *Conditional computation* — routing/MoE that activates few blocks per step ([Sparsely-Gated MoE](papers/2017_Sparsely_Gated_MoE_layer_1701.06538.pdf), [Switch](papers/2021_Switch_Transformers_2101.03961.pdf)); incl. **stateful/path-dependent** routing as one candidate (vs stateless [MoE-Mamba](papers/2024_MoE_Mamba_selective_SSM_mixture_experts_2401.04081.pdf)/[BlackMamba](papers/2024_BlackMamba_MoE_state_space_models_2402.01771.pdf)).
  - *Expressive recurrent neurons* — cells richer than a bare leaky integrator (multi-timescale / expressive-neuron class).
  - *Recursive / hierarchical reasoning* — depth decoupled from parameters ([GRAM](papers/2026_GRAM_Generative_Recursive_Reasoning_Baek_2605.19376.pdf), [HRM](papers/2025_HRM_Hierarchical_Reasoning_Model_Wang_2506.21734.pdf), [TRM](papers/2025_TRM_Tiny_Recursive_Model_Jolicoeur-Martineau_2510.04871.pdf)).
  - *Structured sparsity / efficiency* — GPU-exploitable (block-structured), not random fine-grained sparsity.
  - *Existing optimised modules* — attention / MLP, used where they help (hybrid precedent: [Jamba](papers/2024_Jamba_hybrid_Transformer_Mamba_MoE_2403.19887.pdf), [Nemotron-H](papers/2025_Nemotron_H_Hybrid_Mamba_Transformer_NVIDIA_2504.03624.pdf), [Nemotron 3 Super (hybrid + LatentMoE)](papers/2025_Nemotron_3_Super_LatentMoE_NVIDIA.pdf)).
- **Which members of which classes, and how they combine, is exactly what the research determines** (see *The Research*).
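
The substrate claim above (fixed-size state, constant work per step, no stored context) can be illustrated with a minimal sketch. It assumes a toy diagonal linear recurrence with hand-picked decay rates; `step`, `run_stream`, and the decay values are hypothetical names and numbers chosen for illustration, not a component of the prototype.

```python
def step(state, x, decays):
    """One update of a toy diagonal linear recurrence (one leaky channel per decay)."""
    # h_t = d * h_{t-1} + (1 - d) * x_t for each channel's decay d.
    return [d * h + (1.0 - d) * x for h, d in zip(state, decays)]


def run_stream(inputs, decays=(0.5, 0.9, 0.99, 0.999)):
    """Consume any iterable of floats while holding only len(decays) numbers."""
    state = [0.0] * len(decays)
    steps = 0
    for x in inputs:  # inputs may be a generator; the sequence itself is never stored
        state = step(state, x, decays)
        steps += 1
    return state, steps


# Same memory footprint after 10 steps and after 100,000 steps.
short_state, _ = run_stream(1.0 for _ in range(10))
long_state, _ = run_stream(1.0 for _ in range(100_000))
print(len(short_state), len(long_state))  # prints "4 4"
```

The different decay rates stand in for the multi-timescale idea: fast channels forget quickly, slow channels retain a longer summary. Whether such a state retains *enough* for frontier tasks is exactly what the research has to measure.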

## Technical Novelty `[≤2000]`
- **Hypothesis (novelty):** the next frontier model is reachable by **rigorously assembling proven recurrent tool-classes** — an under-explored combination space — rather than by scaling one architecture; and our **velocity + falsification discipline** is what makes finding the winning assembly feasible in months.
- **One candidate mechanism we will test:** **stateful / path-dependent routing** — the active sub-network is chosen from the accumulated recurrent state, not the current token; this switches the active blocks *mid-sequence* as context changes. Hypothesised to differ in **capability** (not just efficiency) from stateless token-wise SSM-MoE ([Routing-Mamba](papers/2025_Routing_Mamba_MoE_projection_SSM_2506.18145.pdf), [Swimba](papers/2026_Swimba_Switch_Mamba_MoE_SSM_2603.06938.pdf)).
- **Efficiency as paradigm, not tweak:** event-driven recurrence + GPU-exploitable structured sparsity (a different compute model), not "leaner MoE routing."
- **Open hypothesis (to prove/falsify):** sub-quadratic compute-vs-sequence-length for the routed/sparse recurrent model — empirical only, no formal claim.
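
The difference between token-wise and state-based routing can be made concrete with a small hand-built sketch. It assumes a hypothetical cue-switch sequence in which cue tokens `"A"` / `"B"` select the expert and the content token `"x"` carries no cue. The sequence, the two rule-based routers, and all names are illustrative assumptions; this is not the trained toy behind the `[TOY]` numbers, and its scores are not those measurements.

```python
CUE_TO_EXPERT = {"A": 0, "B": 1}  # hypothetical cue tokens


def stateless_routes(tokens):
    """Route each step from the current token only."""
    # A content token carries no cue, so the router falls back to expert 0.
    return [CUE_TO_EXPERT.get(tok, 0) for tok in tokens]


def stateful_routes(tokens):
    """Route each step from a carried state that remembers the last cue."""
    state = 0
    routes = []
    for tok in tokens:
        state = CUE_TO_EXPERT.get(tok, state)  # update on a cue, otherwise carry over
        routes.append(state)
    return routes


def accuracy(pred, target):
    return sum(p == t for p, t in zip(pred, target)) / len(target)


tokens = ["A", "x", "x", "x", "B", "x", "x", "x", "A", "x", "x", "x"]
target = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]  # expert set by the most recent cue

print(accuracy(stateful_routes(tokens), target))
print(accuracy(stateless_routes(tokens), target))
```

The stateless router is only wrong on the content tokens that follow a `"B"` cue, which is the mid-sequence switch the hypothesis is about. A learned router on real data is a much harder case than this rule-based one.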

## Technical Novelty Citation `[≤1000]`
- **Recurrent/SSM:** [Mamba](papers/2023_Mamba_selective_state_spaces_2312.00752.pdf), [Mamba-2](papers/2024_Mamba2_Transformers_are_SSMs_SSD_2405.21060.pdf), [S4](papers/2022_S4_structured_state_spaces_2111.00396.pdf), [S5](papers/2023_S5_simplified_state_space_layers_2208.04933.pdf), [xLSTM](papers/2024_xLSTM_extended_long_short_term_memory_2405.04517.pdf), [HiPPO](papers/2020_HiPPO_recurrent_memory_polynomial_projections_2008.07669.pdf), [Active Tuning/Otte](papers/2020_Active_Tuning_RNN_state_dynamics_Otte_2010.03958.pdf).
- **Conditional computation:** [Sparsely-Gated MoE](papers/2017_Sparsely_Gated_MoE_layer_1701.06538.pdf), [Switch](papers/2021_Switch_Transformers_2101.03961.pdf), [MoE-Mamba](papers/2024_MoE_Mamba_selective_SSM_mixture_experts_2401.04081.pdf), [BlackMamba](papers/2024_BlackMamba_MoE_state_space_models_2402.01771.pdf), [Routing-Mamba](papers/2025_Routing_Mamba_MoE_projection_SSM_2506.18145.pdf), [Swimba](papers/2026_Swimba_Switch_Mamba_MoE_SSM_2603.06938.pdf), [RMoE](papers/2024_RMoE_Layerwise_Recurrent_Router_MoE_Qiu_2408.06793.pdf), [RIMs](papers/2019_Recurrent_Independent_Mechanisms_RIMs_1909.10893.pdf), [σ-MoE/Csordás](papers/2023_sigma_MoE_approximating_two_layer_FFN_2310.10837.pdf).
- **Expressive neurons:** [ELM](papers/2023_Expressive_Leaky_Memory_Neuron_2306.16922.pdf), [Scaling-Laws-Recurrent-Expressive-Neurons](papers/2026_Scaling_Laws_Recurrent_Expressive_Neurons_2605.12049.pdf).
- **Recursive reasoning / test-time memory:** [GRAM](papers/2026_GRAM_Generative_Recursive_Reasoning_Baek_2605.19376.pdf), [HRM](papers/2025_HRM_Hierarchical_Reasoning_Model_Wang_2506.21734.pdf), [TRM](papers/2025_TRM_Tiny_Recursive_Model_Jolicoeur-Martineau_2510.04871.pdf), [Titans](papers/2025_Titans_Learning_to_Memorize_at_Test_Time_Behrouz_2501.00663.pdf).
- **Spiking basis / neuromorphic delineation:** [LSNN](papers/2018_LSNN_LSTM_learning_in_spiking_neurons_1803.09574.pdf), [ALIF/Yin](papers/2021_Yin_Corradi_Bohte_ALIF_SHD_2103.12593.pdf), [e-prop](papers/2019_eprop_alternatives_to_BPTT_recurrent_SNN_1901.09049.pdf), [SpikingBrain](papers/2025_SpikingBrain_Brain_inspired_Large_Models_2509.05276.pdf); [Loihi2](papers/2021_Loihi2_Orchard_2111.03746.pdf), [SpiNNaker2](papers/2021_SpiNNaker2_Mayr_2103.08392.pdf), [Tianjic](papers/2019_Tianjic_Pei_Nature.pdf).
- **Hybrid at scale:** [Jamba](papers/2024_Jamba_hybrid_Transformer_Mamba_MoE_2403.19887.pdf), [Nemotron-H](papers/2025_Nemotron_H_Hybrid_Mamba_Transformer_NVIDIA_2504.03624.pdf), [Nemotron 3](papers/2025_Nemotron_3_White_Paper_NVIDIA.pdf).

## Capability Gap Addressed `[≤1000]`
- **Infinite / no context window** — fixed evolving state, no re-attention, no quadratic blow-up; a stream can run "always-on." Not closable by scaling transformers.
- **Real-time / low-latency & edge** — constant per-step compute + energy fits streaming audio/sensor/video and European edge deployment.
- **Hypothesised capability (to test):** path-dependent computation — switching the active sub-network mid-sequence as context changes (`[TOY]` evidence: 99% vs 70%).
- **Reasoning under tiny budgets** — recursive cores ([GRAM](papers/2026_GRAM_Generative_Recursive_Reasoning_Baek_2605.19376.pdf): ~10M params beat a 671B model on constraint reasoning) → capability from architecture, not scale.
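
A back-of-envelope sketch of the "no context window" point: a Transformer decoder stores keys and values for every past token, while a recurrent layer keeps one state of fixed size. The model dimensions below are hypothetical round numbers, and the state formula assumes one `d_model × d_state` state per layer and ignores small extras such as convolution buffers.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Decode-time KV cache: keys and values (factor 2) for every past token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def recurrent_state_bytes(n_layers, d_model, d_state, bytes_per_elem=2):
    """Fixed recurrent state: one (d_model x d_state) block per layer."""
    return n_layers * d_model * d_state * bytes_per_elem


# Hypothetical configuration, 16-bit elements.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)
state = recurrent_state_bytes(n_layers=32, d_model=4096, d_state=16)

for seq_len in (1_000, 100_000, 10_000_000):
    kv = kv_cache_bytes(seq_len, **cfg)
    print(f"{seq_len:>10} tokens: KV cache {kv / 2**20:>10.1f} MiB, state {state / 2**20:.1f} MiB")
```

The KV cache grows linearly with the stream length while the state column stays constant, which is the memory side of the always-on claim. It says nothing about how much the fixed state can recall.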

## Existing Artifacts `[≤2000]` — measured, not hypothesised
- **Spiking SNN on SHD** `[PROVEN]`: 84–87% test accuracy at 20 epochs (ref ~90% at 150 epochs).
- **Stateful-routing toy** `[TOY]`: 99% (stateful) vs <70% (stateless) on a 13-step cue-switch unit test; end-to-end not yet run.
- **GPU kernels** (all graded vs `torch.compile` dense):
  - stateful-routing kernel (trainable);
  - spike-pool ("activation pool") kernel, non-routed decode: **1.73–1.84× per-step, 2.86–3.10× sustained** `[PROVEN]`;
  - multi-timestep tensor-core kernel: **3.43× (peak 3.67×)** vs compiled dense decode `[PROVEN]`;
  - block-sparse variant: **26.8×** at 99% sparsity (requires sparsity-aware retraining);
  - pre-inference matrix-shuffle pruning kernel.
- **~26× spiking inference speedup** `[PROVEN, binary-specific]` via look-up-table over (active block × spike pattern).
- **The ~11-item validated-vs-prior-art table** above (R&D velocity, not just point results).
- *(Repo + GitHub links + benchmark JSONs/visuals to attach.)*
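
Why a look-up table over spike patterns is binary-specific can be shown with a minimal CPU sketch. It assumes a generic scheme (split the inputs into small blocks, precompute each block's output contribution for every possible 0/1 pattern, then replace multiply-adds with table reads). The function names and the block size are hypothetical, and this is not the repo's GPU kernel.

```python
def dense_matvec(W, spikes):
    """Reference: sum the weight columns whose input spiked."""
    return [sum(w for w, s in zip(row, spikes) if s) for row in W]


def build_lut(W, block):
    """For each input block, precompute the output vector of every spike pattern."""
    n_in = len(W[0])
    assert n_in % block == 0, "sketch assumes the input width is a multiple of block"
    tables = []
    for start in range(0, n_in, block):
        table = []
        for pattern in range(1 << block):
            # Bit j of `pattern` says whether input start + j spiked.
            table.append([
                sum(row[start + j] for j in range(block) if (pattern >> j) & 1)
                for row in W
            ])
        tables.append(table)
    return tables


def lut_matvec(tables, spikes, block):
    """Matvec as one table read per active block; silent blocks cost nothing."""
    y = [0] * len(tables[0][0])
    for b, table in enumerate(tables):
        bits = spikes[b * block:(b + 1) * block]
        pattern = sum(bit << j for j, bit in enumerate(bits))
        if pattern == 0:
            continue  # no spike in this block
        for i, v in enumerate(table[pattern]):
            y[i] += v
    return y


W = [[1, 2, 3, 4], [5, 6, 7, 8]]
spikes = [1, 0, 1, 1]
tables = build_lut(W, block=2)
print(lut_matvec(tables, spikes, block=2), dense_matvec(W, spikes))
```

Each table holds `2**block` entries, so the trick only exists because a spike is one bit. A continuous activation has no finite pattern set to enumerate, which is why any transfer to continuous SSMs stays a projection.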

## Open Research Questions / Risks `[≤1000]` — these *are* the research
- **Can a routed/sparse recurrent net match a dense model's accuracy?** Open. `[NEGATIVE today]`: a routed SNN collapsed on SHD (38.6–51.8%) — the central question to settle.
- **Does the inference efficiency transfer to training and to continuous (non-spiking) substrates?** Open: training-time wall-clock speedup is unproven `[NEGATIVE today]`; the 26× LUT speedup is binary-specific, so any gain for continuous SSMs is only `[PROJECTED]`.
- **Is the compute genuinely sub-quadratic in sequence length?** Open — empirical only, no formal proof.
- **Does small-scale behaviour survive scale-up?** Open — to be checked with scaling curves.

## TRL Assessment `[≤1000]`
- Experimental research stage. **Validated artifacts exist** (GPU kernels parity-tested + benchmarked vs compiled dense; spiking SNN trained on SHD; routing toy) — see *Existing Artifacts*. *Formal overall/sub-component TRL mapping: team to assign (next pass) — not estimated here.*

## Compute Requirements `[≤1000]`
- *Out of scope this pass (resource/financial). Technical note: credible-path-to-frontier to be shown via scaling curves; no brute-force run.*

## KPIs / Benchmarks `[≤1000]` — measurement axes for the search
- **Accuracy parity** routed/sparse vs dense at matched FLOPs.
- **Latency + throughput** (tokens/s decode) vs compiled-dense Transformer & SSM baselines.
- **Energy / token**; **max context length / streaming stability** (constant memory over long streams).
- **Active-block count / sparsity**; **compute-vs-sequence-length curve** (test sub-quadratic hypothesis).
- *(Specific benchmark suite = chosen during the research, on standard long-context / associative-recall / streaming tasks.)*
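
One way the compute-vs-sequence-length curve could be read is as a slope on log-log axes: if cost behaves like `c * n**k`, a least-squares fit of `log(cost)` against `log(n)` estimates `k`, and the sub-quadratic hypothesis predicts `k < 2`. The sketch below assumes that reading; `scaling_exponent` is a hypothetical helper and the two cost series are synthetic, not measurements.

```python
import math


def scaling_exponent(seq_lens, costs):
    """Least-squares slope of log(cost) against log(seq_len)."""
    xs = [math.log(n) for n in seq_lens]
    ys = [math.log(c) for c in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


seq_lens = [256, 512, 1024, 2048, 4096]
quadratic = [n ** 2 for n in seq_lens]   # synthetic attention-like cost
linear = [3.0 * n for n in seq_lens]     # synthetic recurrent-like cost

print(round(scaling_exponent(seq_lens, quadratic), 3))
print(round(scaling_exponent(seq_lens, linear), 3))
```

On real timings the fit should cover a wide range of lengths and report the residuals, since fixed overheads flatten the curve at short lengths and can make a quadratic kernel look cheaper than it is.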

## The Research — what we will actually do `[Work Plan field, ≤4000]`
- **The research is a search, not a fixed build.** We **find the best-matching components and combinations for the next recurrent architecture through rigorous benchmarks and falsification tests.**
- **The loop** (the one that produced the Momentum results, now aimed at architecture discovery): implement a candidate → benchmark on the efficiency × performance × unbounded-context triad against an **adversarially-compiled** baseline → profile the *real* bottleneck → **try to falsify it** (fair baselines, ablation controls) → check the literature → keep only what survives; record what doesn't as a negative result.
- **Search space** = the proven tool-classes in *Core Idea*. Selection criterion = **efficiency × performance × unbounded context**.
- **No pre-committed architecture, technique, or milestone list** — the winning assembly (and the ruled-out dead ends) are *outputs* of the research, which become the Stage-1 technical report/preprint + experimental codebase.
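
The "benchmark against the strongest baseline, then try to falsify" step can be sketched as a small gate. It assumes median wall-clock timing and a speedup measured against the *fastest* available baseline; the helper names, the threshold, and the example timings are all hypothetical, not the team's harness or results.

```python
import statistics
import time


def median_seconds(fn, repeats=5):
    """Median wall-clock time of fn() over several runs."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


def speedup_vs_strongest(candidate_s, baselines_s):
    """Speedup against the fastest baseline, i.e. the hardest one to beat."""
    return min(baselines_s.values()) / candidate_s


def survives(candidate_s, baselines_s, min_speedup=1.0):
    """A candidate is kept only if it beats the strongest baseline by the margin."""
    return speedup_vs_strongest(candidate_s, baselines_s) > min_speedup


# Hypothetical timings in milliseconds: a large headline number against a weak
# baseline shrinks once a compiled baseline is included.
baselines = {"eager": 150.0, "compiled": 1.8}
print(baselines["eager"] / 1.0)               # speedup vs the weak baseline
print(speedup_vs_strongest(1.0, baselines))   # speedup vs the strongest baseline
print(survives(1.0, baselines, min_speedup=2.0))
```

A candidate that fails the gate is not discarded silently; under the loop above it is recorded as a negative result together with the baseline that beat it.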

## Team `[≤2000]`
- *Out of scope this pass.*

---

## References (by tool-class — local links)

**Recurrent / SSM substrate** — [Mamba (Gu & Dao 2023)](papers/2023_Mamba_selective_state_spaces_2312.00752.pdf) · [Mamba-2 (2024)](papers/2024_Mamba2_Transformers_are_SSMs_SSD_2405.21060.pdf) · [S4 (Gu 2021)](papers/2022_S4_structured_state_spaces_2111.00396.pdf) · [S5 (Smith 2023)](papers/2023_S5_simplified_state_space_layers_2208.04933.pdf) · [xLSTM (Beck 2024)](papers/2024_xLSTM_extended_long_short_term_memory_2405.04517.pdf) · [HiPPO (Gu 2020)](papers/2020_HiPPO_recurrent_memory_polynomial_projections_2008.07669.pdf) · [Active Tuning (Otte 2020)](papers/2020_Active_Tuning_RNN_state_dynamics_Otte_2010.03958.pdf) · [StateX (2025)](papers/2025_StateX_RNN_Recall_State_Expansion_2509.22630.pdf)

**Conditional computation / routing** — [Sparsely-Gated MoE (2017)](papers/2017_Sparsely_Gated_MoE_layer_1701.06538.pdf) · [Switch (2021)](papers/2021_Switch_Transformers_2101.03961.pdf) · [MoE-Mamba (2024)](papers/2024_MoE_Mamba_selective_SSM_mixture_experts_2401.04081.pdf) · [BlackMamba (2024)](papers/2024_BlackMamba_MoE_state_space_models_2402.01771.pdf) · [Routing-Mamba (2025)](papers/2025_Routing_Mamba_MoE_projection_SSM_2506.18145.pdf) · [Swimba (2026)](papers/2026_Swimba_Switch_Mamba_MoE_SSM_2603.06938.pdf) · [RMoE (Qiu 2024)](papers/2024_RMoE_Layerwise_Recurrent_Router_MoE_Qiu_2408.06793.pdf) · [RIMs (Goyal 2019)](papers/2019_Recurrent_Independent_Mechanisms_RIMs_1909.10893.pdf) · [σ-MoE (Csordás 2023)](papers/2023_sigma_MoE_approximating_two_layer_FFN_2310.10837.pdf) · [SwitchHead (Csordás 2023)](papers/2023_SwitchHead_accelerating_transformers_MoE_attention_2312.07987.pdf)

**Expressive / multi-timescale neurons** — [ELM (Spieler 2023)](papers/2023_Expressive_Leaky_Memory_Neuron_2306.16922.pdf) · [Scaling Laws for Recurrent Expressive Neurons (2026)](papers/2026_Scaling_Laws_Recurrent_Expressive_Neurons_2605.12049.pdf)

**Recursive reasoning / test-time memory** — [GRAM (Baek 2026)](papers/2026_GRAM_Generative_Recursive_Reasoning_Baek_2605.19376.pdf) · [HRM (2025)](papers/2025_HRM_Hierarchical_Reasoning_Model_Wang_2506.21734.pdf) · [TRM (2025)](papers/2025_TRM_Tiny_Recursive_Model_Jolicoeur-Martineau_2510.04871.pdf) · [Titans (Behrouz 2025)](papers/2025_Titans_Learning_to_Memorize_at_Test_Time_Behrouz_2501.00663.pdf)

**Spiking basis + neuromorphic** — [LSNN (2018)](papers/2018_LSNN_LSTM_learning_in_spiking_neurons_1803.09574.pdf) · [ALIF/Yin (2021)](papers/2021_Yin_Corradi_Bohte_ALIF_SHD_2103.12593.pdf) · [e-prop (2019)](papers/2019_eprop_alternatives_to_BPTT_recurrent_SNN_1901.09049.pdf) · [SpikingBrain (2025)](papers/2025_SpikingBrain_Brain_inspired_Large_Models_2509.05276.pdf) · [SHD dataset (2019)](papers/2019_Heidelberg_Spiking_Datasets_Cramer_1910.07407.pdf) · [Recurrent spiking robot control (Traub & Otte 2021)](papers/2021_Many_Joint_Robot_Recurrent_Spiking_NN_Traub_Otte_2104.04064.pdf) · [Loihi 2 (2021)](papers/2021_Loihi2_Orchard_2111.03746.pdf) · [SpiNNaker2 (2021)](papers/2021_SpiNNaker2_Mayr_2103.08392.pdf) · [Tianjic (2019)](papers/2019_Tianjic_Pei_Nature.pdf)

**Hybrid at scale** — [Jamba (2024)](papers/2024_Jamba_hybrid_Transformer_Mamba_MoE_2403.19887.pdf) · [Jamba-1.5 (2024)](papers/2024_Jamba_1_5_hybrid_Transformer_Mamba_at_scale_2408.12570.pdf) · [Nemotron-H (2025)](papers/2025_Nemotron_H_Hybrid_Mamba_Transformer_NVIDIA_2504.03624.pdf) · [Nemotron 3 (2025)](papers/2025_Nemotron_3_White_Paper_NVIDIA.pdf) · [Nemotron 3 Super — LatentMoE (2025)](papers/2025_Nemotron_3_Super_LatentMoE_NVIDIA.pdf) · [Nemotron Nano 2 (2025)](papers/2025_Nemotron_Nano_2_Hybrid_Mamba_Transformer_Reasoning_2508.14444.pdf)

**GPU sparse kernels / efficiency limits** — [FlashLLM (2023)](papers/2023_FlashLLM_unstructured_sparsity_tensor_cores_2309.10285.pdf) · [SparseRT (2020)](papers/2020_SparseRT_unstructured_sparsity_GPU_inference_2008.11849.pdf) · [FlashSparse (2024)](papers/2024_FlashSparse_SpMM_TC_m16n8k8_2412.11007.pdf) · [SparStencil (2025)](papers/2025_SparStencil_Sparse_TC_Stencil_2506.22969.pdf) · [Sparsity Roofline (Shinn 2023)](papers/2023_Sparsity_Roofline_hardware_limits_sparse_networks_2310.00496.pdf)

**Pruning / sparse training** — [Lottery Ticket (2018)](papers/2018_Lottery_Ticket_Hypothesis_Frankle_Carbin_1803.03635.pdf) · [RigL (2020)](papers/2020_RigL_Rigging_Lottery_Evci_1911.11134.pdf) · [Random Pruning (Liu 2022)](papers/2022_Unreasonable_Effectiveness_Random_Pruning_Liu_2202.02643.pdf)

**Spatial connectivity (probe)** — [SpSNN (Landsmeer 2025)](papers/2025_Spatial_Spiking_Neural_Networks_2512.10011.pdf)

**Capability probe** — [MQAR / Zoology (Arora 2023)](papers/2023_MQAR_Zoology_associative_recall_Arora_2312.04927.pdf)
