| 3 | |
| 4 | |
| 5 | class GaussianNBClassifier: |
| 6 | def __init__(self, eps=1e-6): |
| 7 | r""" |
| 8 | A naive Bayes classifier for real-valued data. |
| 9 | |
| 10 | Notes |
| 11 | ----- |
| 12 | The naive Bayes model assumes the features of each training example |
| 13 | :math:`\mathbf{x}` are mutually independent given the example label |
| 14 | *y*: |
| 15 | |
| 16 | .. math:: |
| 17 | |
| 18 | P(\mathbf{x}_i \mid y_i) = \prod_{j=1}^M P(x_{i,j} \mid y_i) |
| 19 | |
| 20 | where :math:`M` is the number of features in the :math:`i^{th}` example |
| 21 | :math:`\mathbf{x}_i` and :math:`y_i` is the label associated with the |
| 22 | :math:`i^{th}` example. |
| 23 | |
| 24 | Combining the conditional independence assumption with a simple |
| 25 | application of Bayes' theorem gives the naive Bayes classification |
| 26 | rule: |
| 27 | |
| 28 | .. math:: |
| 29 | |
| 30 | \hat{y} &= \arg \max_y P(y \mid \mathbf{x}) \\ |
| 31 | &= \arg \max_y P(y) P(\mathbf{x} \mid y) \\ |
| 32 | &= \arg \max_y P(y) \prod_{j=1}^M P(x_j \mid y) |
| 33 | |
| 34 | In the final expression, the prior class probability :math:`P(y)` can |
| 35 | be specified in advance or estimated empirically from the training |
| 36 | data. |
| 37 | |
| 38 | In the Gaussian version of the naive Bayes model, the feature |
| 39 | likelihood is assumed to be normally distributed for each class: |
| 40 | |
| 41 | .. math:: |
| 42 | |
| 43 | \mathbf{x}_i \mid y_i = c, \theta \sim \mathcal{N}(\mu_c, \Sigma_c) |
| 44 | |
| 45 | where :math:`\theta` is the set of model parameters :math:`\{\mu_1, |
| 46 | \Sigma_1, \ldots, \mu_K, \Sigma_K\}` and :math:`K` is the number of |
| 47 | unique classes in the data. For each class :math:`c` (where |
| 48 | :math:`1 \leq c \leq K`), the mean :math:`\mu_c` and the diagonal covariance |
| 49 | :math:`\Sigma_c` are estimated via maximum likelihood (MLE) from the |
| 50 | training examples with label :math:`c`. |
| 51 | |
| 52 | Parameters |
| 53 | ---------- |
| 54 | eps : float |
| 55 | A value added to the variance to prevent numerical error. Default |
| 56 | is 1e-6. |
| 57 | |
| 58 | Attributes |
| 59 | ---------- |
| 60 | parameters : dict |
| 61 | Dictionary of model parameters: "mean", the `(K, M)` array of |
| 62 | feature means under each class, "sigma", the `(K, M)` array of |