

class GeneralizedLinearModel:
    def __init__(self, link, fit_intercept=True, tol=1e-5, max_iter=100):
        r"""
        A generalized linear model with maximum likelihood fit via
        iteratively reweighted least squares (IRLS).

        Notes
        -----
        The generalized linear model (GLM) [7]_ [8]_ assumes that each
        target/dependent variable :math:`y_i` in the target vector
        :math:`\mathbf{y} = (y_1, \ldots, y_n)` has been drawn independently
        from a pre-specified distribution in the exponential family [11]_ with
        unknown mean :math:`\mu_i`. The GLM models a (one-to-one, continuous,
        differentiable) function, *g*, of this mean value as a linear
        combination of the model parameters :math:`\mathbf{b}` and the
        observed covariates :math:`\mathbf{x}_i`:

        .. math::

            g(\mathbb{E}[y_i \mid \mathbf{x}_i]) =
                g(\mu_i) = \mathbf{b}^\top \mathbf{x}_i

        where *g* is known as the "link function" associated with the GLM. The
        choice of link function is informed by the instance of the exponential
        family the target is drawn from. Common examples, each the canonical
        link for its distribution:

        .. csv-table::
           :header: "Distribution", "Link", "Formula"
           :widths: 25, 20, 30

           "Normal", "Identity", ":math:`g(x) = x`"
           "Bernoulli", "Logit", ":math:`g(x) = \log(x) - \log(1 - x)`"
           "Binomial", "Logit", ":math:`g(x) = \log(x) - \log(n - x)`"
           "Poisson", "Log", ":math:`g(x) = \log(x)`"

        In the Binomial row, :math:`n` is the number of trials, not the number
        of observations.

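        The link table can be made concrete with a small NumPy sketch. It
        assumes each link is stored in a hypothetical dict holding *g*, its
        inverse :math:`g^{-1}` (which maps the linear predictor
        :math:`\eta_i = \mathbf{b}^\top \mathbf{x}_i` back to the mean
        :math:`\mu_i`), and its derivative :math:`g'`, which IRLS needs. The
        names ``LINKS``, ``"g"``, ``"inv"`` and ``"dg"`` are illustrative and
        are not part of this class's interface.

        ```python
        import numpy as np

        # Hypothetical link table: "g" maps the mean mu to the linear predictor
        # eta, "inv" maps eta back to mu, and "dg" is the derivative g'(mu).
        LINKS = {
            "identity": {
                "g": lambda mu: mu,
                "inv": lambda eta: eta,
                "dg": lambda mu: np.ones_like(mu),
            },
            "logit": {
                "g": lambda mu: np.log(mu) - np.log(1 - mu),
                "inv": lambda eta: 1 / (1 + np.exp(-eta)),
                "dg": lambda mu: 1 / (mu * (1 - mu)),
            },
            "log": {
                "g": lambda mu: np.log(mu),
                "inv": lambda eta: np.exp(eta),
                "dg": lambda mu: 1 / mu,
            },
        }

        # Read the GLM equation right to left: compute eta_i = b^T x_i for each
        # row of X, then map it to the mean through the inverse link.
        X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])  # column 0: intercept
        b = np.array([0.2, 0.7])
        eta = X @ b
        for name, link in LINKS.items():
            mu = link["inv"](eta)
            # Applying g to the recovered mean gives back the linear predictor.
            assert np.allclose(link["g"](mu), eta), name
            print(name, np.round(mu, 3))
        ```
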
        An iteratively reweighted least squares (IRLS) algorithm [9]_ can be
        used to find the maximum likelihood estimate of the model parameters
        :math:`\mathbf{b}` in any instance of the generalized linear model.
        IRLS is equivalent to Fisher scoring [10]_, a variant of the classical
        Newton-Raphson method for finding the zeros of the first derivative of
        the model log-likelihood that replaces the Hessian of the
        log-likelihood with its expectation (the negative Fisher information
        matrix). For canonical links the two coincide, so IRLS is then exactly
        Newton-Raphson.

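        The IRLS update can be sketched for the logit link (logistic
        regression), assuming a Bernoulli target and a design matrix
        :math:`\mathbf{X}` whose first column is all ones for the intercept.
        Each iteration forms the working response
        :math:`z_i = \eta_i + (y_i - \mu_i)\,g'(\mu_i)` and weights
        :math:`w_i = 1 / (g'(\mu_i)^2 V(\mu_i))`, where
        :math:`V(\mu) = \mu(1 - \mu)` is the Bernoulli variance function, then
        solves the weighted least squares problem
        :math:`(\mathbf{X}^\top W \mathbf{X})\,\mathbf{b} = \mathbf{X}^\top W \mathbf{z}`.
        The helper ``irls_logit`` is hypothetical and is not necessarily how
        this class implements its fit.

        ```python
        import numpy as np


        def irls_logit(X, y, tol=1e-5, max_iter=100):
            # Hypothetical helper: logistic regression fit by IRLS (Fisher scoring).
            b = np.zeros(X.shape[1])
            for _ in range(max_iter):
                eta = X @ b                    # linear predictor
                mu = 1 / (1 + np.exp(-eta))    # inverse of the logit link
                dg = 1 / (mu * (1 - mu))       # g'(mu) for the logit link
                var = mu * (1 - mu)            # Bernoulli variance function V(mu)
                w = 1 / (dg**2 * var)          # IRLS weights; equal to mu * (1 - mu) here
                z = eta + (y - mu) * dg        # working response
                # Weighted least squares step: solve (X^T W X) b_new = X^T W z.
                XtW = X.T * w                  # scales column i of X.T by w[i]
                b_new = np.linalg.solve(XtW @ X, XtW @ z)
                if np.linalg.norm(b_new - b) < tol:
                    return b_new
                b = b_new
            return b


        # Usage on simulated data with a known coefficient vector.
        rng = np.random.default_rng(0)
        X = np.column_stack([np.ones(500), rng.normal(size=500)])
        true_b = np.array([-0.5, 1.5])
        y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_b))))
        b_hat = irls_logit(X, y)
        print(b_hat)
        ```

        With a sample of this size the estimate should land near ``true_b``.
        The stopping rule loosely mirrors the ``tol`` and ``max_iter``
        arguments in the signature above, but that correspondence is an
        assumption of this sketch.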
        References
        ----------
        .. [7] Nelder, J., & Wedderburn, R. (1972). Generalized linear
           models. *Journal of the Royal Statistical Society, Series A
           (General), 135(3)*: 370–384.
        .. [8] https://en.wikipedia.org/wiki/Generalized_linear_model
        .. [9] https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
        .. [10] https://en.wikipedia.org/wiki/Scoring_algorithm
        .. [11] https://en.wikipedia.org/wiki/Exponential_family

        Parameters
        ----------
        link : {'identity', 'logit', 'log'}
            The link function to use during modeling.
        fit_intercept : bool
            Whether to fit an intercept term in addition to the model
            coefficients. Default is True.