        return np.vander(data, N=degree + 1, increasing=True)

    def fit(self, x_train: np.ndarray, y_train: np.ndarray) -> None:
        """
        Computes the polynomial regression model parameters using ordinary least squares
        (OLS) estimation:

            β = (XᵀX)⁻¹Xᵀy = X⁺y

        where X⁺ denotes the Moore-Penrose pseudoinverse of the design matrix X. This
        function computes X⁺ using singular value decomposition (SVD).

        References:
            - https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse
            - https://en.wikipedia.org/wiki/Singular_value_decomposition
            - https://en.wikipedia.org/wiki/Multicollinearity

        @param x_train: the predictor values x for model fitting
        @param y_train: the response values y for model fitting
        @raises ArithmeticError: if X isn't full column rank, then XᵀX is singular and
                                 the OLS solution β isn't unique

        >>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
        >>> y = x**3 - 2 * x**2 + 3 * x - 5
        >>> poly_reg = PolynomialRegression(degree=3)
        >>> poly_reg.fit(x, y)
        >>> poly_reg.params
        array([-5.,  3., -2.,  1.])
        >>> poly_reg = PolynomialRegression(degree=20)
        >>> poly_reg.fit(x, y)
        Traceback (most recent call last):
        ...
        ArithmeticError: Design matrix is not full rank, can't compute coefficients

        Make sure errors don't grow too large:
        >>> coefs = np.array([-250, 50, -2, 36, 20, -12, 10, 2, -1, -15, 1])
        >>> y = PolynomialRegression._design_matrix(x, len(coefs) - 1) @ coefs
        >>> poly_reg = PolynomialRegression(degree=len(coefs) - 1)
        >>> poly_reg.fit(x, y)
        >>> np.allclose(poly_reg.params, coefs, atol=10e-3)
        True
        """
        X = PolynomialRegression._design_matrix(x_train, self.degree)  # noqa: N806
        _, cols = X.shape
        if np.linalg.matrix_rank(X) < cols:
            raise ArithmeticError(
                "Design matrix is not full rank, can't compute coefficients"
            )

        # np.linalg.pinv() computes the Moore-Penrose pseudoinverse using SVD
        self.params = np.linalg.pinv(X) @ y_train

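The identity β = (XᵀX)⁻¹Xᵀy from the `fit` docstring can be cross-checked without NumPy. The following is a minimal sketch, not the SVD-based route that `fit` takes: it forms the normal equations (XᵀX)β = Xᵀy and solves them by Gauss-Jordan elimination in exact rational arithmetic. The names `design_matrix` and `solve_normal_equations` are hypothetical helpers introduced here for illustration and are not part of the original class.

```python
from fractions import Fraction


def design_matrix(xs, degree):
    """Vandermonde rows [1, x, x**2, ..., x**degree] with exact rational entries."""
    return [[Fraction(x) ** p for p in range(degree + 1)] for x in xs]


def solve_normal_equations(xs, ys, degree):
    """Solve (XᵀX) β = Xᵀy by Gauss-Jordan elimination over the rationals.

    Hypothetical helper for illustration only. Raises ArithmeticError when XᵀX
    is singular, i.e. when X does not have full column rank.
    """
    rows = design_matrix(xs, degree)
    n = degree + 1
    # Augmented matrix [XᵀX | Xᵀy]
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(n)]
        + [sum(row[i] * y for row, y in zip(rows, ys))]
        for i in range(n)
    ]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ArithmeticError("XᵀX is singular: design matrix is not full rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalise the pivot row, then clear this column in every other row
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The last column now holds β in increasing order of power
    return [aug[i][n] for i in range(n)]


xs = list(range(11))
ys = [x**3 - 2 * x**2 + 3 * x - 5 for x in xs]
beta = solve_normal_equations(xs, ys, degree=3)
print([int(b) for b in beta])  # [-5, 3, -2, 1]
```

On the doctest data this recovers the same coefficients as `fit`, and asking for degree 20 with only 11 points makes XᵀX singular, so the sketch raises an `ArithmeticError` in the same situation. In floating point, forming XᵀX squares the condition number of X, which is one reason to prefer an SVD-based pseudoinverse such as `np.linalg.pinv` in practice.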
    def predict(self, data: np.ndarray) -> np.ndarray:
        """