Don't Just Read the Paper, Implement It

PaperMan turns cutting-edge AI papers into hands-on coding challenges you can solve in your browser.

[Hero illustration: the arXiv page for "Attention Is All You Need" (Vaswani et al., 2017, arXiv:1706.03762), its core equation Attention(Q, K, V) = softmax(QKᵀ/√dₖ)V, and the PaperMan editor mid-implementation of scaled dot-product attention, with 3/5 tests passing.]

Used by engineers from

Google
Meta
OpenAI
DeepMind
Anthropic
NVIDIA
Microsoft
Apple

How it Works

1. Select a Paper

Choose from a curated list of top AI research papers.

2. Understand the Math

Grasp the underlying concepts with interactive explanations.

∂L/∂w = -2Σ(yᵢ - ŷᵢ)xᵢ
Attention(Q,K,V) = softmax(QKᵀ/√dₖ)V
σ(z) = 1 / (1 + e⁻ᶻ)
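
The formulas above are standard: the first, for instance, is the gradient of a squared-error loss L = Σ(yᵢ − ŷᵢ)² with predictions ŷᵢ = w·xᵢ. A quick NumPy sketch checks that analytic gradient against a finite difference (the variable names and data here are illustrative, not from PaperMan):

```python
import numpy as np

# Squared-error loss for a scalar weight w, with predictions y_hat = w * x
def loss(w, x, y):
    return np.sum((y - w * x) ** 2)

# Analytic gradient from the formula above: dL/dw = -2 * sum((y_i - y_hat_i) * x_i)
def grad(w, x, y):
    return -2.0 * np.sum((y - w * x) * x)

rng = np.random.default_rng(0)
x, y, w = rng.normal(size=5), rng.normal(size=5), 0.3

# Central finite difference as an independent check
eps = 1e-6
numeric = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
print(abs(grad(w, x, y) - numeric) < 1e-6)  # True
```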

3. Implement the Code

Build the model function by function in our live coding environment.

def attention(Q, K, V):
    # Scaled dot-product attention
    d_k = K.shape[-1]
    scores = Q @ K.T / d_k ** 0.5
    weights = softmax(scores)
    return weights @ V
✓ 3/3 tests passed
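
Outside the editor, the finished function is only a few lines of NumPy. Here is a self-contained sketch (the softmax helper and the shapes are illustrative assumptions, not PaperMan's reference solution):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries, d_k = 8
K = rng.normal(size=(6, 8))    # 6 keys
V = rng.normal(size=(6, 16))   # 6 values, d_v = 16
out = attention(Q, K, V)
print(out.shape)  # (4, 16)
```

Each output row is a weighted average of the value rows, so the result has one row per query and the value dimension as its width.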