# [DL] Hyperplane Classifier

## Perceptron Algorithm

• Converges to a separating hyperplane if the data are linearly separable
• Does not terminate otherwise

Let $\thetab = (b, w_1, w_2, \cdots, w_d)^\top$ and $\xb = (1, x_1, x_2, \cdots, x_d)^\top$

Perceptron Algorithm

$$\thetab^{(t)} = \thetab^{(t-1)} + \eta_t\, \big(y^{(t)} - \hat{y}^{(t)}\big)\,\xb^{(t)}$$ where $$\hat{y}^{(t)} = {\rm sign}\big({\thetab^{(t-1)}}^\top \xb^{(t)}\big)$$
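The update rule above can be sketched in NumPy. This is a minimal illustration, not part of the notes: the toy OR-style data, the step size, and the `sign(0) = +1` tie-break are all assumptions made here.

```python
import numpy as np

def perceptron(X, y, eta=0.5, max_epochs=1000):
    """Perceptron on data with labels in {-1, +1}.

    A constant-1 feature is prepended so that
    theta = (b, w_1, ..., w_d) and x = (1, x_1, ..., x_d),
    matching the augmented vectors in the notes.
    """
    Xb = np.hstack([np.ones((len(X), 1)), X])
    theta = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_t, y_t in zip(Xb, y):
            y_hat = 1.0 if theta @ x_t >= 0 else -1.0  # sign(0) treated as +1
            if y_hat != y_t:
                theta += eta * (y_t - y_hat) * x_t     # update from the notes
                mistakes += 1
        if mistakes == 0:   # every point classified correctly: converged
            break
    return theta

# OR-style linearly separable toy data (illustrative assumption)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., 1.])
theta = perceptron(X, y)
```

Since $(y^{(t)} - \hat{y}^{(t)})$ is $\pm 2$ on a mistake and $0$ otherwise, the factor of 2 is simply absorbed into the learning rate.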

A short proof of the convergence of the perceptron algorithm: Here

#### Perceptron Algorithm as Stochastic Gradient Descent

Loss function (a rectified linear unit, ReLU, applied to the negative margin):
$$L\big(y^{(t)}, \thetab^\top \xb^{(t)}\big) = \max\big(0,\; -y^{(t)}\,\thetab^\top \xb^{(t)}\big)$$
Taking a subgradient step on this loss recovers the perceptron update (up to a constant factor): the subgradient is $-y^{(t)}\xb^{(t)}$ on misclassified points and $0$ otherwise.

Stochastic gradient descent is a stochastic approximation of the gradient descent algorithm: it replaces the actual gradient (computed from the entire data set) with an estimate computed from a randomly selected subset $S$ of the data.
$$\nabla J \approx \frac{1}{|S|} \sum_{i \in S} \nabla_\thetab\, L\big(y_i, f(\thetab, \xb_i)\big)$$
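A minimal sketch of this minibatch estimate for the ReLU-style perceptron loss $\max(0, -y\,\thetab^\top\xb)$. The batch size, step size, step count, and toy data below are assumptions for illustration; the subgradient $-y_i\,\xb_i$ is applied on points with non-positive margin.

```python
import numpy as np

def minibatch_grad(theta, Xb, y, S):
    """Subgradient of L = max(0, -y * theta^T x), averaged over indices S."""
    grad = np.zeros_like(theta)
    for i in S:
        if y[i] * (theta @ Xb[i]) <= 0:   # misclassified (or on the boundary)
            grad += -y[i] * Xb[i]         # subgradient -y x on that region
    return grad / len(S)

def sgd(X, y, eta=0.5, batch_size=2, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((len(X), 1)), X])  # x = (1, x_1, ..., x_d)
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        # random subset S of the data, sampled without replacement
        S = rng.choice(len(Xb), size=batch_size, replace=False)
        theta -= eta * minibatch_grad(theta, Xb, y, S)  # descent step
    return theta

# OR-style separable toy data (illustrative assumption)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., 1.])
theta = sgd(X, y, batch_size=4, steps=200)  # batch_size = n gives full-batch descent
```

With `batch_size` smaller than the data set, each step uses a noisy but cheap estimate of the full gradient, which is the trade-off the paragraph above describes.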