An Energy-Based Model (EBM) does not directly output a probability or a prediction. Instead, it learns an energy function $E(x, y)$ that measures how well an input $x$ fits a possible output $y$. Lower energy means better compatibility, so correct outputs should receive lower energy than incorrect ones.

During inference, the goal is to find the $y$ with the lowest energy:
$$ y^* = \arg \min_y E(x, y) $$
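
The $\arg\min$ above can be sketched as a search over a finite set of candidate outputs. This is only a minimal illustration: the quadratic `energy` function, the helper `infer`, and the candidate grid are hypothetical stand-ins, not a trained model. In practice $E(x, y)$ would be a learned network, and the search over $y$ might use gradient descent or sampling instead of enumeration.

```python
def energy(x, y):
    """Hypothetical toy energy: lowest when y equals 2 * x."""
    return (y - 2.0 * x) ** 2


def infer(x, candidates):
    """Return the candidate y with the lowest energy for input x."""
    return min(candidates, key=lambda y: energy(x, y))


# Candidate outputs on a grid from -5.0 to 5.0 in steps of 0.1 (an assumption).
candidates = [k / 10 for k in range(-50, 51)]

y_star = infer(1.5, candidates)
print(y_star)  # 3.0
```

Enumeration like this only works when the candidate set is small, such as a list of class labels. For continuous or structured $y$, the minimization itself becomes the expensive part of inference.
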

| | Traditional Neural Network | Energy-Based Model |
|---|---|---|
| Idea | Learn a mapping $f(x)$ to directly predict $y$ | Learn an energy function $E(x, y)$ to evaluate how well $x$ matches $y$ |
| Goal | Make the predicted $y$ close to the true label | Give the correct $y$ lower energy than every incorrect $y$ |
| Inference | Compute $y = f(x)$ | Evaluate the energy for different candidate $y$ and choose the one with the lowest value |
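
One common way to express the EBM goal in the table is a hinge-style margin loss, which is zero only when the correct output's energy sits at least a margin below every other candidate's. The sketch below assumes that loss form; the function name `margin_loss`, the margin value, and the energy numbers are made up for illustration, and other training objectives exist.

```python
def margin_loss(energies, correct_idx, margin=1.0):
    """Hinge-style loss over candidate energies.

    Each incorrect candidate contributes a penalty unless its energy
    exceeds the correct candidate's energy by at least `margin`.
    """
    e_correct = energies[correct_idx]
    return sum(
        max(0.0, margin + e_correct - e)
        for i, e in enumerate(energies)
        if i != correct_idx
    )


# Correct candidate (index 0) is already well below the others: no penalty.
loss_good = margin_loss([0.2, 1.5, 3.0], correct_idx=0)

# Correct candidate has higher energy than a wrong one: positive penalty.
loss_bad = margin_loss([2.0, 1.5, 3.0], correct_idx=0)

print(loss_good, loss_bad)  # 0.0 1.5
```

Minimizing a loss of this shape pushes the energy of the correct $y$ down and the energies of competing $y$ up, which is what makes the lowest-energy choice at inference time meaningful.
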
Example: