hub / github.com/TheAlgorithms/Python / report_generator

Function report_generator

machine_learning/k_means_clust.py:218–357

Generate a clustering report from `predicted`, a dataframe with a predicted cluster column, and `fill_missing_report`, a dictionary of rules for filling in missing values in the final generated report (not used in modelling). The signature also takes `clustering_variables`, which the docstring does not describe.

(
    predicted: pd.DataFrame, clustering_variables: np.ndarray, fill_missing_report=None
)
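
The fill rules can be illustrated with a minimal sketch. It assumes `fill_missing_report` is a mapping from column name to fill value, which is what `DataFrame.fillna(value=...)` accepts; the data and the rule values here are hypothetical and not taken from the repository.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two features with gaps, plus the predicted cluster label.
predicted = pd.DataFrame(
    {
        "col1": [0.5, np.nan, 4.5],
        "col2": [100.0, 200.0, np.nan],
        "Cluster": [1, 1, 2],
    }
)

# Hypothetical rules: one fill value per column name.
fill_missing_report = {"col1": 0.0, "col2": -1.0}

# fillna returns a new frame; the input frame keeps its NaN values.
filled = predicted.fillna(value=fill_missing_report)
print(filled)
```

Note that the function body guards this step with `if fill_missing_report:`, so a falsy argument such as `0` (as passed in the doctest) or an empty dictionary skips the fill entirely.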

Source from the content-addressed store, hash-verified

def report_generator(
    predicted: pd.DataFrame, clustering_variables: np.ndarray, fill_missing_report=None
) -> pd.DataFrame:
    """
    Generate a clustering report given these two arguments:
        predicted - dataframe with predicted cluster column
        fill_missing_report - dictionary of rules on how we are going to fill in missing
        values for final generated report (not included in modelling);
    >>> predicted = pd.DataFrame()
    >>> predicted['numbers'] = [1, 2, 3]
    >>> predicted['col1'] = [0.5, 2.5, 4.5]
    >>> predicted['col2'] = [100, 200, 300]
    >>> predicted['col3'] = [10, 20, 30]
    >>> predicted['Cluster'] = [1, 1, 2]
    >>> report_generator(predicted, ['col1', 'col2'], 0)
               Features               Type   Mark           1           2
    0    # of Customers        ClusterSize  False    2.000000    1.000000
    1    % of Customers  ClusterProportion  False    0.666667    0.333333
    2              col1    mean_with_zeros   True    1.500000    4.500000
    3              col2    mean_with_zeros   True  150.000000  300.000000
    4           numbers    mean_with_zeros  False    1.500000    3.000000
    ..              ...                ...    ...         ...         ...
    99            dummy                 5%  False    1.000000    1.000000
    100           dummy                95%  False    1.000000    1.000000
    101           dummy              stdev  False    0.000000         NaN
    102           dummy               mode  False    1.000000    1.000000
    103           dummy             median  False    1.000000    1.000000
    <BLANKLINE>
    [104 rows x 5 columns]
    """
    # Fill missing values with given rules
    if fill_missing_report:
        predicted = predicted.fillna(value=fill_missing_report)
    predicted["dummy"] = 1
    numeric_cols = predicted.select_dtypes(np.number).columns
    report = (
        predicted.groupby(["Cluster"])[  # construct report dataframe
            numeric_cols
        ]  # group by cluster number
        .agg(
            [
                ("sum", "sum"),
                ("mean_with_zeros", lambda x: np.mean(np.nan_to_num(x))),
                ("mean_without_zeros", lambda x: x.replace(0, np.nan).mean()),
                (
                    "mean_25-75",
                    lambda x: np.mean(
                        np.nan_to_num(
                            sorted(x)[
                                round(len(x) * 25 / 100) : round(len(x) * 75 / 100)
                            ]
                        )
                    ),
                ),
                ("mean_with_na", "mean"),
                ("min", lambda x: x.min()),
                ("5%", lambda x: x.quantile(0.05)),
                ("25%", lambda x: x.quantile(0.25)),
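
The mean variants in the aggregation list differ only in how they treat zeros, NaN values, and the tails of each cluster. The sketch below restates three of the lambdas as named functions and applies them to a single column. The function names, the sample data, and the use of keyword-style named aggregation are assumptions made for illustration; the listing itself passes a list of `(name, function)` tuples.

```python
import numpy as np
import pandas as pd


def mean_with_zeros(x: pd.Series) -> float:
    # NaN is replaced by 0 and still counts in the denominator.
    return float(np.mean(np.nan_to_num(x)))


def mean_without_zeros(x: pd.Series) -> float:
    # Zeros become NaN, and Series.mean() skips NaN by default.
    return float(x.replace(0, np.nan).mean())


def mean_25_75(x: pd.Series) -> float:
    # Mean of the sorted values between the 25% and 75% positions.
    n = len(x)
    middle = sorted(x)[round(n * 25 / 100) : round(n * 75 / 100)]
    return float(np.mean(np.nan_to_num(middle)))


# Hypothetical values with one zero and one outlier.
values = pd.Series([0.0, 2.0, 4.0, 100.0])
# Hypothetical values with a gap, to contrast the NaN handling.
with_gap = pd.Series([1.0, np.nan, 5.0])

# Hypothetical clustered data; one row per cluster in the summary.
df = pd.DataFrame(
    {
        "Cluster": [1, 1, 1, 1, 2, 2],
        "col1": [0.0, 2.0, 4.0, 100.0, 10.0, 30.0],
    }
)
summary = df.groupby("Cluster")["col1"].agg(
    mean_with_zeros=mean_with_zeros,
    mean_without_zeros=mean_without_zeros,
    mean_25_75=mean_25_75,
)
print(summary)
```

On `values`, the trimmed variant keeps only the two middle entries, so the outlier of 100 no longer dominates the result. On `with_gap`, `mean_with_zeros` divides by three while the plain `mean` (named `mean_with_na` in the listing) divides by two.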

Callers

nothing calls this directly

Calls 3

count · Method · 0.80
copy · Method · 0.80
assign · Method · 0.80

Tested by

no test coverage detected