Enrich an entity dataframe with historical feature values for either training or batch scoring. This method joins historical feature data from one or more feature views to an entity dataframe by using a time travel join. Alternatively, features can be retrieved for a specific timestamp range without requiring an entity dataframe.
(
    self,
    entity_df: Optional[Union[pd.DataFrame, str]] = None,
    features: Union[List[str], FeatureService] = [],
    full_feature_names: bool = False,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
)
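To make the time travel join, the TTL cutoff, and the effect of `full_feature_names` more concrete, here is a minimal sketch in plain pandas. It is not how Feast's offline stores implement the join; it uses `pandas.merge_asof` as a stand-in. The helper `point_in_time_join`, the `feature_timestamp` column, the two-hour TTL, and all data values are assumptions made up for this illustration.

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical TTL, chosen only for this illustration.
TTL = timedelta(hours=2)

# Entity dataframe: entity keys plus the event_timestamp to look back from.
entity_df = pd.DataFrame(
    {
        "customer_id": [1, 2, 1],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 0),
            datetime(2021, 4, 12, 10, 0),
            datetime(2021, 4, 12, 15, 0),
        ],
    }
)

# Historical feature rows (hypothetical), stamped with when each value was recorded.
feature_df = pd.DataFrame(
    {
        "customer_id": [1, 1, 2],
        "feature_timestamp": [
            datetime(2021, 4, 12, 8, 30),
            datetime(2021, 4, 12, 9, 30),
            datetime(2021, 4, 12, 6, 0),
        ],
        "daily_transactions": [3, 5, 7],
    }
)


def point_in_time_join(entities, features, feature_view, ttl, full_feature_names=False):
    """Hypothetical helper: for each entity row, take the latest feature row
    at or before event_timestamp that is no older than ttl, else null."""
    feature_cols = [
        c for c in features.columns if c not in ("customer_id", "feature_timestamp")
    ]
    joined = pd.merge_asof(
        entities.sort_values("event_timestamp"),
        features.sort_values("feature_timestamp"),
        left_on="event_timestamp",
        right_on="feature_timestamp",
        by="customer_id",
        direction="backward",  # never look into the future
        tolerance=pd.Timedelta(ttl),  # values older than the TTL are ignored
    ).drop(columns="feature_timestamp")
    if full_feature_names:
        # Mirror the documented "feature_view__feature" naming.
        joined = joined.rename(columns={c: f"{feature_view}__{c}" for c in feature_cols})
    return joined


result = point_in_time_join(entity_df, feature_df, "customer_fv", TTL)
prefixed = point_in_time_join(
    entity_df, feature_df, "customer_fv", TTL, full_feature_names=True
)
```

In this sketch, customer 1 at 10:00 picks up the value recorded at 09:30. The other two rows come back null because their most recent feature values are older than the two-hour TTL. This is the null-value effect that a short TTL can cause.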
    def get_historical_features(
        self,
        entity_df: Optional[Union[pd.DataFrame, str]] = None,
        features: Union[List[str], FeatureService] = [],
        full_feature_names: bool = False,
        start_date: Optional[datetime] = None,
        end_date: Optional[datetime] = None,
    ) -> RetrievalJob:
        """Enrich an entity dataframe with historical feature values for either training or batch scoring.

        This method joins historical feature data from one or more feature views to an entity dataframe by using a time
        travel join. Alternatively, features can be retrieved for a specific timestamp range without requiring an entity
        dataframe.

        Each feature view is joined to the entity dataframe using all entities configured for the respective feature
        view. All configured entities must be available in the entity dataframe. Therefore, the entity dataframe must
        contain all entities found in all feature views, but the individual feature views can have different entities.

        Time travel is based on the configured TTL for each feature view. A shorter TTL will limit the
        amount of scanning that will be done in order to find feature data for a specific entity key. Setting a short
        TTL may result in null values being returned.

        Args:
            features: The list of features that should be retrieved from the offline store. These features can be
                specified either as a list of string feature references or as a feature service. String feature
                references must have format "feature_view:feature", e.g. "customer_fv:daily_transactions".
            entity_df (Optional[Union[pd.DataFrame, str]]): An entity dataframe is a collection of rows containing all entity
                columns (e.g., customer_id, driver_id) on which features need to be joined, as well as an event_timestamp
                column used to ensure point-in-time correctness. Either a Pandas DataFrame can be provided or a string
                SQL query. The query must be of a format supported by the configured offline store (e.g., BigQuery).
                If not provided, features will be retrieved for the specified timestamp range without entity joins.
            full_feature_names: If True, feature names will be prefixed with the corresponding feature view name,
                changing them from the format "feature" to "feature_view__feature" (e.g. "daily_transactions"
                changes to "customer_fv__daily_transactions").
            start_date (Optional[datetime]): Start date for the timestamp range when retrieving features without entity_df.
                Required when entity_df is not provided.
            end_date (Optional[datetime]): End date for the timestamp range when retrieving features without entity_df.
                By default, the current time is used.

        Returns:
            RetrievalJob which can be used to materialize the results.

        Raises:
            ValueError: Both or neither of features and feature_refs are specified.

        Examples:
            Retrieve historical features from a local offline store.

            >>> from datetime import datetime
            >>> from feast import FeatureStore, RepoConfig
            >>> import pandas as pd
            >>> fs = FeatureStore(repo_path="project/feature_repo")
            >>> entity_df = pd.DataFrame.from_dict(
            ...     {
            ...         "driver_id": [1001, 1002],
            ...         "event_timestamp": [
            ...             datetime(2021, 4, 12, 10, 59, 42),
            ...             datetime(2021, 4, 12, 8, 12, 10),
            ...         ],