def create_global_daily_stats_df(start_date, end_date) -> pd.DataFrame:
    """
    Example df generated by this function:

    | event_timestamp  |  num_rides  | avg_ride_length |     created      |
    |------------------+-------------+-----------------+------------------|
    | 2021-03-17 19:00 |     99      |    0.889188     | 2021-03-24 19:38 |
    | 2021-03-18 19:00 |     52      |    0.979273     | 2021-03-24 19:38 |
    | 2021-03-19 19:00 |     66      |    0.976549     | 2021-03-24 19:38 |
    | 2021-03-20 19:00 |     84      |    0.273697     | 2021-03-24 19:38 |
    | 2021-03-21 19:00 |     89      |    0.438262     | 2021-03-24 19:38 |
    |                  |     ...     |       ...       |                  |
    | 2021-03-24 19:00 |     54      |    0.738860     | 2021-03-24 19:38 |
    | 2021-03-25 19:00 |     58      |    0.848397     | 2021-03-24 19:38 |
    | 2021-03-26 19:00 |     69      |    0.301552     | 2021-03-24 19:38 |
    | 2021-03-27 19:00 |     63      |    0.943030     | 2021-03-24 19:38 |
    | 2021-03-28 19:00 |     79      |    0.354919     | 2021-03-24 19:38 |
    """
    df_daily = pd.DataFrame(
        {
            "event_timestamp": [
                pd.Timestamp(
                    dt,
                    unit="ms",
                ).round("ms")
                for dt in pd.date_range(
                    start=start_date,
                    end=end_date,
                    freq="1D",
                    inclusive="left",
                    tz="UTC",
                )
            ]
        }
    )
    rows = df_daily["event_timestamp"].count()

    df_daily["num_rides"] = np.random.randint(50, 100, size=rows).astype(np.int32)
    df_daily["avg_ride_length"] = np.random.random(size=rows).astype(np.float32)

    # TODO: Remove created timestamp in order to test whether it's really optional
    df_daily["created"] = pd.to_datetime(pd.Timestamp.now(tz=None).round("ms"))
    return df_daily


def create_field_mapping_df(start_date, end_date) -> pd.DataFrame:
no test coverage detected
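
The `pd.date_range(..., inclusive="left")` call in `create_global_daily_stats_df` produces a half-open range: the start timestamp is kept and the end timestamp is dropped. A minimal sketch of that behavior follows; the dates are hypothetical values chosen for illustration, and the `inclusive` keyword assumes pandas 1.4 or newer.

```python
import pandas as pd

# Hypothetical dates chosen only for illustration.
start_date = pd.Timestamp("2021-03-17 19:00")
end_date = pd.Timestamp("2021-03-20 19:00")

# inclusive="left" keeps the start and drops the end (assumes pandas >= 1.4).
# Naive start/end values are localized to the requested time zone.
timestamps = pd.date_range(
    start=start_date,
    end=end_date,
    freq="1D",
    inclusive="left",
    tz="UTC",
)

# One entry per day in [start_date, end_date): the 17th, 18th and 19th.
print(len(timestamps))
print(timestamps[-1])
```

With these inputs the generated frame would have one row per day from the start date up to, but not including, the end date, so `rows` in the function above equals the number of whole days in the range.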