Filters a DataFrame according to its nullity, using some combination of 'top' and 'bottom' numerical and percentage values. Percentages and numerical thresholds can be specified simultaneously: for example, to get a DataFrame with columns of at least 75% completeness but with no more th
(df, filter=None, p=0, n=0)
| 72 | |
| 73 | |
| 74 | def nullity_filter(df, filter=None, p=0, n=0): |
| 75 | """ |
| 76 | Filters a DataFrame according to its nullity, using some combination of 'top' and 'bottom' numerical and |
| 77 | percentage values. Percentages and numerical thresholds can be specified simultaneously: for example, |
| 78 | to get a DataFrame with columns of at least 75% completeness but with no more than 5 columns, use |
| 79 | `nullity_filter(df, filter='top', p=.75, n=5)`. |
| 80 | |
| 81 | :param df: The DataFrame whose columns are being filtered. |
| 82 | :param filter: The orientation of the filter being applied to the DataFrame. One of "top", "bottom", |
| 83 | or None (default). If the filter argument is left unspecified or set to None, the DataFrame is returned |
| 84 | unchanged. |
| 85 | :param p: A completeness ratio cut-off. If non-zero, the filter will limit the DataFrame to columns with at least p |
| 86 | completeness. Input should be in the range [0, 1]. |
| 87 | :param n: A numerical cut-off. If non-zero, no more than this number of columns will be returned. |
| 88 | :return: The nullity-filtered `DataFrame`. |
| 89 | """ |
| 90 | _df = df |
| 91 | if filter == "top": |
| 92 | if p: |
| 93 | _df = _p_top_complete_filter(_df, p) |
| 94 | if n: |
| 95 | _df = _n_top_complete_filter(_df, n) |
| 96 | elif filter == "bottom": |
| 97 | if p: |
| 98 | _df = _p_bottom_complete_filter(_df, p) |
| 99 | if n: |
| 100 | _df = _n_bottom_complete_filter(_df, n) |
| 101 | return _df |
| 102 | |
| 103 | |
| 104 | def matrix(df, |
no test coverage detected
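
The dispatch in `nullity_filter` can be illustrated with a small standalone sketch. This is not the library's implementation: the private helpers (`_p_top_complete_filter` and its siblings) are not shown in the listing, so their behaviour here is an assumption inferred from their names. The sketch also uses a plain dict of lists as a stand-in for a DataFrame, with `None` marking a missing value, and the names `completeness` and `nullity_filter_sketch` are hypothetical.

```python
# Stand-in for a DataFrame: column name -> list of values, None marks a missing entry.
def completeness(values):
    """Fraction of non-missing entries in one column."""
    return sum(v is not None for v in values) / len(values)


def nullity_filter_sketch(columns, filter=None, p=0, n=0):
    """Hypothetical re-implementation of the dispatch in nullity_filter."""
    kept = dict(columns)
    if filter not in ("top", "bottom"):
        return kept  # no orientation given: nothing is filtered
    top = filter == "top"
    if p:
        # Assumed helper behaviour: "top" keeps ratio >= p, "bottom" keeps ratio <= p.
        kept = {k: v for k, v in kept.items()
                if (completeness(v) >= p if top else completeness(v) <= p)}
    if n:
        # Assumed helper behaviour: keep the n most (or least) complete columns,
        # preserving their original order.
        ranked = sorted(kept, key=lambda k: completeness(kept[k]), reverse=top)
        winners = set(ranked[:n])
        kept = {k: v for k, v in kept.items() if k in winners}
    return kept


data = {
    "a": [1, 2, 3, 4],           # 100% complete
    "b": [1, None, 3, 4],        # 75% complete
    "c": [None, None, 3, None],  # 25% complete
    "d": [1, 2, None, None],     # 50% complete
}

top = nullity_filter_sketch(data, filter="top", p=0.75, n=5)
bottom = nullity_filter_sketch(data, filter="bottom", n=1)
unfiltered = nullity_filter_sketch(data, p=0.75)
print(list(top), list(bottom), list(unfiltered))
# prints ['a', 'b'] ['c'] ['a', 'b', 'c', 'd']
```

As in the listing, the ratio cut-off is applied before the count cut-off, so `n` only ever trims the columns that already passed `p`. The last call shows that `p` and `n` have no effect when `filter` is left as `None`.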