Definition

An Energy-Based Model (EBM) does not directly output a probability or a prediction. Instead, it learns an energy function $E(x, y)$ that measures how well an input $x$ fits a possible output $y$, with correct outputs assigned lower energy.

Lower energy means better compatibility between $x$ and $y$.
During inference, the goal is to find the $y$ with the lowest energy:

$$ y^* = \arg \min_y E(x, y) $$
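When the set of candidate outputs is small, the $\arg\min$ above can be computed by simply evaluating the energy of every candidate. The following is a minimal sketch under assumed details: the scalar input, the `PROTOTYPES` table, and the squared-distance energy are hypothetical and chosen only for illustration, not taken from any particular model.

```python
# Hypothetical prototypes: one scalar "feature" per candidate label.
PROTOTYPES = {"cat": 1.0, "dog": 3.0, "bird": 5.0}


def energy(x: float, y: str) -> float:
    """Return E(x, y); a lower value means x and y are more compatible."""
    return (x - PROTOTYPES[y]) ** 2


def infer(x: float) -> str:
    """Return y* = argmin_y E(x, y) by enumerating every candidate label."""
    return min(PROTOTYPES, key=lambda y: energy(x, y))


print(infer(1.2))  # cat
print(infer(4.4))  # bird
```

In a real EBM the energy would be a learned function, for example a neural network, rather than a fixed distance, but the inference rule stays the same.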

Difference from Traditional Neural Networks

| | Traditional Neural Network | Energy-Based Model |
|---|---|---|
| Idea | Learn a mapping $f(x)$ that directly predicts $y$ | Learn an energy function $E(x, y)$ that evaluates how well $x$ matches $y$ |
| Goal | Make the predicted $y$ close to the true label | Make the correct $y$ have lower energy than incorrect ones |
| Inference | Compute $y = f(x)$ | Evaluate the energy for different $y$ and choose the one with the lowest value |

Example:

Traditional Neural Network:
- Given an image, the model directly outputs “cat”.
Energy-Based Model:
- Given a cat image, compute the energy between the image and each candidate label, then select the label with the lowest energy, here “cat”.
- When the output space is large or continuous, enumerating every possible $y$ is infeasible, so inference typically relies on optimization or sampling methods instead.
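For a continuous output, one common optimization-based approach is gradient descent on $y$ while $x$ is held fixed. The sketch below assumes a hypothetical quadratic energy whose minimum lies at $y = 2x$. The function names, step size, and iteration count are illustrative assumptions only.

```python
def energy(x: float, y: float) -> float:
    """Hypothetical energy, minimized when y = 2 * x."""
    return (y - 2.0 * x) ** 2


def energy_grad_y(x: float, y: float) -> float:
    """Analytic derivative dE/dy of the energy above."""
    return 2.0 * (y - 2.0 * x)


def infer(x: float, y_init: float = 0.0, lr: float = 0.1, steps: int = 200) -> float:
    """Approximate argmin_y E(x, y) by gradient descent on y; x stays fixed."""
    y = y_init
    for _ in range(steps):
        y -= lr * energy_grad_y(x, y)
    return y


y_star = infer(1.5)
print(round(y_star, 4))  # 3.0
```

With a learned energy the gradient would usually come from automatic differentiation, and the result may be a local rather than a global minimum.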