Publications.
Selected preprints, conference papers, and technical notes on the thermodynamics of intelligence and mechanistic interpretability.
Metanthropic Research
I publish the majority of my formal research under the Metanthropic charter. Access our full archives, including safety evaluations and interpretability logs.
SPECIFICATION: Metanthropic Neural Ablation via Attention Refraction (M-NAAR)
Introduces M-NAAR to resolve the 'Unlearning Trilemma.' By refracting attention away from high-entropy tokens rather than destroying weights, we achieve a 0.00 hallucination rate and robust deletion without lobotomizing the model.
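A minimal sketch of the core idea, assuming 'refraction' amounts to additively biasing attention logits away from high-entropy key positions; the function name, entropy threshold, and bias value are illustrative stand-ins, not the published M-NAAR mechanism:

```python
import numpy as np

def refract_attention(scores, token_entropy, threshold=2.0, bias=-1e9):
    """Bias attention logits away from high-entropy key positions.

    scores: (queries, keys) raw attention logits.
    token_entropy: (keys,) per-token entropy of the model's output
    distribution (a hypothetical stand-in for the real criterion).
    """
    mask = token_entropy > threshold            # keys to refract away from
    refracted = scores + np.where(mask, bias, 0.0)
    # softmax over keys: masked positions receive ~zero attention mass
    e = np.exp(refracted - refracted.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros((2, 4))                       # uniform logits for illustration
entropy = np.array([0.5, 3.1, 0.2, 2.7])        # tokens 1 and 3 exceed threshold
probs = refract_attention(scores, entropy)
```

The weights themselves are untouched; only the attention pattern is redirected, which is what distinguishes this family of approaches from weight-ablation unlearning.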
Specification for Latent Logic Topology & Soundness-Aware Calibration
Operationalizes LLMs as engines of 'Latent Causal Chains' to solve the RLVR Convergence Paradox. Introduces the Soundness-Aware Level (SAL), a microscopic metric that predicts post-alignment reasoning performance with 87% accuracy.
The Kinetic-Potential Information Disentanglement Protocol (KP-IDP)
Invalidates the dangerous conflation of Decodability with Causality. Introduces KP-IDP to distinguish 'Dark Computation' (Kinetic) from 'Phantom Readouts' (Potential), resolving the intervention-reversal paradox.
Module 003-CFG: Chronometric Flux Gating
A dynamic regularization protocol that eliminates Latent Manifold Collapse in Sparse Autoencoders. By treating feature importance as a temporal trajectory, CFG reduces feature absorption by 95% compared to Top-K baselines.
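A minimal sketch of the two ingredients the summary names, assuming a standard Top-K SAE encoder and treating the 'temporal trajectory' as an exponential moving average of per-feature firing rates; the `FluxGate` class, decay constant, and gating rule are hypothetical stand-ins, not the published CFG protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_sae_encode(x, W_enc, b_enc, k):
    """Standard Top-K SAE encoder: keep only the k largest pre-activations."""
    pre = x @ W_enc + b_enc
    idx = np.argsort(pre, axis=-1)[:, -k:]       # indices of the top-k features
    codes = np.zeros_like(pre)
    np.put_along_axis(codes, idx, np.take_along_axis(pre, idx, -1), -1)
    return np.maximum(codes, 0.0)

class FluxGate:
    """Hypothetical 'chronometric' gate: tracks each feature's firing rate
    over training steps, so regularization can act on the trajectory rather
    than on a single snapshot of feature importance."""
    def __init__(self, n_features, decay=0.99):
        self.ema = np.zeros(n_features)
        self.decay = decay
    def update(self, codes):
        firing = (codes > 0).mean(axis=0)        # per-feature firing rate
        self.ema = self.decay * self.ema + (1 - self.decay) * firing
        return self.ema

d, f, k = 8, 32, 4
x = rng.normal(size=(16, d))
W_enc, b_enc = rng.normal(size=(d, f)), np.zeros(f)
codes = topk_sae_encode(x, W_enc, b_enc, k)
gate = FluxGate(f)
ema = gate.update(codes)
```

The EMA gives each feature a temporal importance signal; a trainer could, for instance, penalize dictionary features whose trajectory decays toward zero instead of pruning them on a single-step criterion.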
PROJECT OBLIQUE-GUARD: Latent Geometry Stabilization
Demonstrates that adversarial vulnerability is a deterministic artifact of Superposition. Introduces the Oblique-Guard Layer to filter geometric exploits by treating them as unique digital signatures within the interference lattice.
Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability
Shows that SFT introduces the 'Knobe Effect' moral asymmetry, in which negative outcomes are judged as more intentional than positive ones. Proposes surgical Iso-Semantic Residual Injection (ISRI) to restore logical neutrality without degrading general reasoning.
Arvi 20B: Democratizing Reasoning with Efficient MoEs
An open-weight Mixture-of-Experts reasoning model. With 20.9B total parameters and only 3.6B active parameters, it rivals frontier models on math, coding, and agentic benchmarks through variable-effort reasoning.
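The total-versus-active parameter split comes from sparse expert routing: every token touches only its top-k experts, so most weights sit idle on any given forward pass. A toy sketch, assuming top-2 token-choice routing over dense experts; the dimensions, `tanh` expert, and gating details are illustrative, not the Arvi 20B architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, top_k=2):
    """Route each token to its top_k experts; only those experts' weights
    count as 'active' parameters for that token."""
    logits = x @ gate_w                               # (tokens, n_experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]  # per-token expert ids
    sel = np.take_along_axis(logits, chosen, -1)
    w = np.exp(sel - sel.max(-1, keepdims=True))      # softmax over the
    w /= w.sum(-1, keepdims=True)                     # selected experts only
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j, e in enumerate(chosen[t]):
            out[t] += w[t, j] * np.tanh(x[t] @ experts[e])
    return out, chosen

d, n_experts = 16, 8
x = rng.normal(size=(4, d))
gate_w = rng.normal(size=(d, n_experts))
experts = rng.normal(size=(n_experts, d, d))
out, chosen = moe_forward(x, gate_w, experts)
# Total expert parameters: n_experts * d * d; active per token: top_k * d * d.
```

With 8 experts and top-2 routing, only a quarter of the expert weights are exercised per token, which is the same mechanism behind a 20.9B-total / 3.6B-active model.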
MahenOCR: Commercial-Grade OCR with a 1B Parameter VLM
A 1B parameter vision-language model achieving state-of-the-art OCR through a unified end-to-end architecture. Utilizes Reinforcement Learning with Verifiable Rewards (RLVR) to eliminate cascaded module error propagation.
The Fragility of Guardrails: Cognitive Jamming and Repetition Collapse in Safety-Steered LLMs
A mechanistic audit of LLM residual streams using Sparse Autoencoders (SAEs). Demonstrates that aggressive safety-steering vectors often interfere with latent world-modeling circuits, triggering 'Cognitive Jamming' and 'Repetition Collapse'.
Dataset Distillation for the Pre-Training Era
Introduces Linear Gradient Matching (LGM) to condense massive datasets into a single synthetic image per class, revealing shared 'Platonic' representations across foundation models (CLIP, DINO-v2).
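A toy sketch of gradient matching in the linear setting the title suggests, assuming distillation means adjusting synthetic inputs so that a linear model's MSE gradient on them matches its gradient on the real data; the step rule, dimensions, and learning rate are illustrative, not the LGM procedure itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_grad(X, y, w):
    """Gradient of mean-squared error for a linear model with weights w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def distill_step(X_real, y_real, X_syn, y_syn, w, lr=1e-3):
    """One gradient-matching step: nudge the synthetic inputs so their
    gradient approaches the real data's gradient at the same weights."""
    diff = linear_grad(X_syn, y_syn, w) - linear_grad(X_real, y_real, w)
    r = X_syn @ w - y_syn
    # analytic gradient of ||g_syn - g_real||^2 with respect to X_syn
    grad_X = (4.0 / len(y_syn)) * (np.outer(r, diff) + np.outer(X_syn @ diff, w))
    return X_syn - lr * grad_X, float(diff @ diff)

d = 4
X_real, y_real = rng.normal(size=(64, d)), rng.normal(size=64)
X_syn, y_syn = rng.normal(size=(2, d)), rng.normal(size=2)   # 2 synthetic points
w = rng.normal(size=d)
losses = []
for _ in range(200):
    X_syn, loss = distill_step(X_real, y_real, X_syn, y_syn, w)
    losses.append(loss)
```

In the single-image-per-class setting, the role of `X_syn` is played by one learnable image per class; matching gradients of a fixed probe is what lets such a tiny synthetic set stand in for the full dataset.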
Announcing Metanthropic
Founding declaration of Metanthropic, a frontier research institution architecting deterministic AI systems where safety and reasoning are verifiable, intrinsic properties of intelligence.