Abstract
Standard practice in neural network optimization treats representational freedom as an unqualified good: more dimensions, fewer constraints, better generalization. This paper challenges that assumption.
We demonstrate that imposing hard structural boundaries on neural network representational space — specifically, Morse barrier functions that constrain activations to a bounded domain — increases Fisher information density (the amount of useful information per representational dimension) without degrading task performance. This result holds across architectures (MLP, CNN, Transformer) and scales (16-dimensional to 768-dimensional representations).
The mechanism is geometric: when a hard boundary eliminates the radial degree of freedom in representation space, the network is forced to encode discriminative information in angular structure. This “information channeling” effect produces representations that are simultaneously more compact and more informative — a form of forced parameter efficiency that emerges from topology, not from regularization.
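The radial-elimination mechanism can be illustrated with a toy numpy sketch (not from the paper; shapes and the unit radius are illustrative assumptions): after hard projection onto the unit sphere, activations retain no radial variance, so any discriminative signal must be carried by angular structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch of activations in a 16-dimensional representation space.
z = rng.normal(size=(128, 16))

# Hard boundary: project every activation onto the unit sphere,
# eliminating the radial degree of freedom.
z_proj = z / np.linalg.norm(z, axis=1, keepdims=True)

radii = np.linalg.norm(z_proj, axis=1)
print(radii.std())  # ~0: no radial variance left to encode information
```

With the radius pinned, the only per-sample geometry left to vary is direction, which is the "information channeling" effect described above.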
Key Results
The paper presents experimental results across five progressively challenging configurations:
| Experiment | Architecture | Δ Fisher Information | Accuracy Impact |
|---|---|---|---|
| MNIST / MLP (16D) | MLP | +104% (p = 2×10⁻⁶) | +0.14% |
| CIFAR-10 (32–256D) | CNN | +87% to +102% | +0.1–0.4% |
| MNIST (16–64D) | MLP | +508% to +642% | Preserved |
| GPT-2 (768D) | Transformer | +28.1% | PPL 26.7 (converging) |
| Mistral-7B Safety Probes | 7B Transformer | N/A (probe study) | 98.2% probe accuracy |
The External-Substrate Constraint Principle
The paper’s central architectural insight is that constraints must operate on a different substrate than the optimization objective to be effective. The Morse barrier operates on the representational geometry (external substrate), not on the loss function (same substrate).
Ablation studies confirm this distinction: L2 penalty, ArcFace loss, and other loss-function-based regularizers either fail to produce the density enhancement or produce qualitatively different effects. Only hard projection (projected gradient descent, PGD) and norm clipping (which converges to PGD behavior at high dimensionality) produce the characteristic combination of density enhancement and dimensional compression.
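The hard-versus-soft distinction can be sketched as follows; the function names, radius, and array shapes are illustrative assumptions, not the paper's implementation. A hard projection holds by construction after every update, while a soft penalty is just another loss term that a sufficiently strong task gradient can outweigh.

```python
import numpy as np

R = 1.0  # barrier radius (hypothetical value)

def project_to_ball(z, radius=R):
    """Hard projection (PGD-style): clip activation norms to the barrier
    radius. The optimizer cannot trade this off -- the bound always holds."""
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    return np.where(norms > radius, z * (radius / norms), z)

def l2_penalty(z, weight=1e-3):
    """Soft alternative: a penalty on the same substrate as the task loss.
    Nothing guarantees the constraint is satisfied at any given step."""
    return weight * np.sum(z ** 2)

z = np.random.default_rng(1).normal(scale=3.0, size=(4, 8))
z_hard = project_to_ball(z)
# Holds by construction, regardless of the loss being optimized:
assert np.all(np.linalg.norm(z_hard, axis=-1) <= R + 1e-9)
```

The asymmetry is structural: the projection runs outside the objective (external substrate), whereas the penalty lives inside it (same substrate).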
This finding has direct implications for safety certification: safety constraints that operate as loss penalties can be optimized away by a sufficiently capable model, while safety constraints that operate as hard structural boundaries cannot.
Implications for AI Safety
The paper includes a probe study on Mistral-7B examining how alignment techniques affect the linear separability of safe vs. unsafe representations:
- Base model: Safety-relevant concepts are linearly separable in representation space (98.2% probe accuracy)
- After RLHF: Linear separability degrades by 28.3 percentage points, with an 84% collapse in the separation gap between safe and unsafe clusters
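A linear probe of the kind used in this study can be sketched on synthetic data; the clusters, dimensionality, and least-squares classifier below are stand-ins (my assumptions) for the paper's actual hidden states and probe.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for frozen hidden states: two linearly separable
# 64-D clusters playing the role of "safe" vs. "unsafe" representations.
safe   = rng.normal(loc=+1.0, size=(200, 64))
unsafe = rng.normal(loc=-1.0, size=(200, 64))
X = np.vstack([safe, unsafe])
y = np.concatenate([np.ones(200), -np.ones(200)])

# Least-squares linear probe (a lightweight stand-in for logistic regression).
w, *_ = np.linalg.lstsq(X, y, rcond=None)
acc = np.mean(np.sign(X @ w) == y)
print(f"probe accuracy: {acc:.3f}")
```

High probe accuracy here depends entirely on the clusters staying linearly separable; if an intervention collapses the gap between them, the same probe degrades, which is the failure mode the RLHF result points at.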
This suggests that current alignment techniques (RLHF, DPO) may inadvertently destroy the geometric structure that makes safety monitoring tractable. Hard structural boundaries — which operate on geometry, not on the loss function — preserve this structure by construction.
Flatter Minima Without Regularization
Constrained representations converge to flatter minima of the loss landscape (+43% Hessian eigenvalue ratio), a property normally associated with better generalization. Unlike standard flatness-seeking methods (SAM, SWA, label smoothing), this flatness emerges as a geometric consequence of the boundary constraint, not as an optimization target. The boundary eliminates sharp minima near the representational periphery, naturally biasing optimization toward the interior of the feasible region.
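Sharpness metrics of this kind are typically computed from the top Hessian eigenvalues of the loss. A generic sketch (not the paper's code) is power iteration on finite-difference Hessian-vector products, shown here on a toy quadratic whose curvature spectrum is known in advance.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy quadratic loss L(w) = 0.5 * w^T A w with a known curvature spectrum.
A = np.diag([10.0, 1.0, 0.1])  # eigenvalues = curvatures along each axis
grad = lambda w: A @ w

def top_hessian_eigenvalue(grad, dim, iters=100, eps=1e-4):
    """Power iteration on finite-difference Hessian-vector products.
    Sharper minima -> larger top eigenvalue; flatter minima -> smaller."""
    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    w0 = np.zeros(dim)  # evaluate curvature at the minimum
    for _ in range(iters):
        hv = (grad(w0 + eps * v) - grad(w0 - eps * v)) / (2 * eps)
        v = hv / np.linalg.norm(hv)
    hv = (grad(w0 + eps * v) - grad(w0 - eps * v)) / (2 * eps)
    return float(v @ hv)

lam = top_hessian_eigenvalue(grad, dim=3)
print(lam)  # -> ~10.0, the curvature of the sharpest direction
```

An eigenvalue *ratio* like the one reported above would compare such estimates between constrained and unconstrained runs; the sketch only shows the per-model estimation step.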
Geometric Ratchet Effect
At the GPT-2 scale (768D), the barrier produces a “ratchet” effect: representational quality (measured by Fisher information density) is preserved during fine-tuning, even as the model adapts to new tasks. Without the barrier, fine-tuning degrades representational structure. With the barrier, the geometric constraint prevents the optimization process from “unwinding” the informative angular structure established during pre-training.
Connection to QAE Safety Kernel
The findings in this paper provide the theoretical foundation for the QAE Safety Kernel’s approach to constraint-based certification:
- Hard boundaries work better than soft penalties. The QAE kernel evaluates constraints as hard margins (pass/fail with continuous distance measurement), not as penalty terms in an optimization objective. The paper validates that this architectural choice is not merely a design preference — it produces measurably better outcomes.
- Constraints increase, not decrease, system capability. The counterintuitive result that bounded representations are more informative per dimension parallels the QAE kernel’s design: adding constraint channels does not degrade certification throughput or decision quality. It improves bottleneck identification and drift budget accuracy.
- External-substrate safety is structurally robust. The paper demonstrates that safety properties encoded as geometric constraints cannot be optimized away, while safety properties encoded as loss penalties can. This supports the QAE kernel’s architecture of evaluating safety on a separate substrate (the constraint graph) from the system’s primary optimization objective.
Citation
Tennant, W. (2026). Containment as Catalyst: Topological Bounds on Representation Geometry as a Mechanism for Forced Parameter Efficiency. figshare. https://doi.org/10.6084/m9.figshare.31742857