6.02.2026
Wall Attention: Length Generalization With Diagonal Gates
We generalize diagonal gating from linear RNNs to softmax attention, introducing Wall Attention - a data-dependent positional encoding that achieves strong length generalization and beats RoPE across the board.