
🔐 Logistic Regression

Logistic regression is a linear model for binary classification. It estimates the probability that an input $\mathbf{x}$ belongs to the positive class.

Hypothesis

The model applies the logistic (sigmoid) function to a linear combination of the inputs:

$$h_{\boldsymbol{\theta}}(\mathbf{x}) = \sigma(\mathbf{x}^T\boldsymbol{\theta}) = \frac{1}{1 + e^{-\mathbf{x}^T\boldsymbol{\theta}}}.$$
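The hypothesis above can be sketched directly in NumPy; the function names `sigmoid` and `predict_proba` are illustrative choices, not part of any library API:

```python
import numpy as np

def sigmoid(z):
    # logistic function, written in a numerically stable form:
    # for z < 0, exp(z)/(1 + exp(z)) avoids overflow in exp(-z)
    return np.where(z >= 0, 1 / (1 + np.exp(-z)), np.exp(z) / (1 + np.exp(z)))

def predict_proba(X, theta):
    # h_theta(x) = sigma(x^T theta) for each row x of X
    return sigmoid(X @ theta)
```

With `theta = 0`, every input maps to a probability of 0.5, which matches the symmetry of the sigmoid around the origin.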

Cost Function

Parameters are learned by minimizing the logistic loss:

$$J(\boldsymbol{\theta}) = -\frac{1}{m} \sum_{i=1}^m \big[ y^{(i)} \log h_{\boldsymbol{\theta}}(\mathbf{x}^{(i)}) + (1 - y^{(i)}) \log \big(1 - h_{\boldsymbol{\theta}}(\mathbf{x}^{(i)})\big) \big].$$
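The cost function translates almost term by term into NumPy. This is a minimal sketch (the helper name `logistic_loss` and the small `eps` guard against `log(0)` are my additions, not part of the formula):

```python
import numpy as np

def logistic_loss(theta, X, y):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    m = len(y)
    h = 1 / (1 + np.exp(-X @ theta))  # h_theta(x) for every example
    eps = 1e-12                       # keeps log() away from exactly 0
    return -(1 / m) * np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```

As a sanity check, with `theta = 0` every prediction is 0.5, so the loss equals `log(2) ≈ 0.693` regardless of the labels.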

Example (scikit-learn)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# four one-dimensional training points with binary labels
X = np.array([[0], [1], [2], [3]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# probability of each class for a new input x = 1.5
proba = model.predict_proba([[1.5]])
print(proba)
```

Interpretation

  • Outputs a probability between 0 and 1.
  • The decision boundary lies where $h_{\boldsymbol{\theta}}(\mathbf{x}) = 0.5$, i.e. where $\mathbf{x}^T\boldsymbol{\theta} = 0$.
  • For multi-class problems, use one-vs-rest or a softmax regression extension.
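The one-vs-rest strategy from the last bullet can be sketched with scikit-learn's `OneVsRestClassifier`, which fits one binary logistic regression per class (the toy three-class data below is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# toy data: one feature, three well-separated clusters (classes 0, 1, 2)
X = np.array([[0.0], [0.5], [2.0], [2.5], [4.0], [4.5]])
y = np.array([0, 0, 1, 1, 2, 2])

# one binary classifier per class; prediction picks the most confident one
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(ovr.predict([[0.2], [2.2], [4.2]]))
```

Alternatively, fitting `LogisticRegression` directly on multi-class labels uses a softmax (multinomial) formulation in recent scikit-learn versions.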