Step through multi-head self-attention on image patches to see exactly how queries, keys, and values are computed, how attention scores become attention weights, and how multiple heads combine to produce the output. Everything runs entirely in the browser on tiny weight matrices.
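The steps above can be sketched in a few lines of NumPy. This is an illustrative reimplementation of the mechanics the demo visualizes, not the demo's actual code; the function name, weight shapes, and dimensions here are assumptions chosen for clarity.

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Multi-head self-attention over a sequence of patch embeddings.

    x: (num_patches, d_model) patch embeddings.
    Wq, Wk, Wv, Wo: (d_model, d_model) projection weights (illustrative shapes).
    """
    n, d = x.shape
    d_head = d // num_heads

    # 1. Project the input into queries, keys, and values.
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # 2. Split each projection into heads: (num_heads, n, d_head).
    split = lambda t: t.reshape(n, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)

    # 3. Scaled dot-product scores, then softmax over keys -> attention weights.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # 4. Weighted sum of values, concatenate heads, apply the output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ Wo, weights

# Tiny random weights, as in the demo's Random Weights mode.
rng = np.random.default_rng(0)
d_model, num_patches, num_heads = 16, 49, 4
x = rng.normal(size=(num_patches, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
out, weights = multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads)
print(out.shape, weights.shape)  # (49, 16) (4, 49, 49)
```

Each row of `weights` sums to 1: it is the softmax-normalized score of one query patch against every key patch, which is exactly what the demo's attention maps display.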
See also: Vision Models in Action for a full ViT forward pass on MNIST.
Draw something on the canvas (or pick a sample pattern), then click Run Attention to step through multi-head self-attention.
Choose between Random Weights (explore the mechanics) and Pretrained MNIST (see real learned attention patterns on handwritten digits). Adjust patch size, embedding dimension, and number of heads in the Architecture panel.
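The architecture parameters are linked by two divisibility constraints: the image side must divide evenly into patches, and the embedding dimension must divide evenly across heads. A minimal sketch of that relationship, assuming 28×28 MNIST-sized inputs (the concrete numbers are illustrative):

```python
def vit_shapes(image_size, patch_size, d_model, num_heads):
    """Derive sequence length and per-head width from the architecture settings."""
    assert image_size % patch_size == 0, "patch size must divide the image side"
    assert d_model % num_heads == 0, "heads must split the embedding evenly"
    num_patches = (image_size // patch_size) ** 2  # tokens entering attention
    d_head = d_model // num_heads                  # width of each head's Q/K/V
    return num_patches, d_head

# e.g. 28x28 image, 7x7 patches, 16-dim embeddings, 4 heads:
print(vit_shapes(28, 7, 16, 4))  # (16, 4)
```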