hub / github.com/feast-dev/feast / create_saved_dataset

Method create_saved_dataset

sdk/python/feast/feature_store.py:1873–1938

Execute the provided retrieval job and persist its result in the given storage. The storage type (e.g., BigQuery or Redshift) must match the globally configured offline store. After the data has been persisted successfully, a saved dataset object carrying the dataset metadata is committed to the registry.

create_saved_dataset(
    self,
    from_: RetrievalJob,
    name: str,
    storage: SavedDatasetStorage,
    tags: Optional[Dict[str, str]] = None,
    feature_service: Optional[FeatureService] = None,
    allow_overwrite: bool = False,
) -> SavedDataset
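The order of operations described above (check the job's metadata, describe the dataset, then persist) can be sketched without Feast installed. This is only an illustration under stated assumptions: `StubMetadata`, `StubJob`, and `save_dataset_sketch` are hypothetical stand-ins, not Feast classes; the dataset is a plain dict; the registry commit is left out; and `FileExistsError` stands in for whatever error a real offline store raises when `allow_overwrite` is False and the target already exists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class StubMetadata:
    """Hypothetical stand-in for the metadata a retrieval job exposes."""
    features: List[str]
    keys: List[str]


@dataclass
class StubJob:
    """Hypothetical stand-in for a RetrievalJob that tracks written locations."""
    metadata: Optional[StubMetadata]
    full_feature_names: bool = False
    written: Set[str] = field(default_factory=set)

    def persist(self, storage: str, allow_overwrite: bool = False) -> None:
        # Assumption: refuse to replace existing data unless explicitly allowed.
        if storage in self.written and not allow_overwrite:
            raise FileExistsError(f"{storage} already holds a dataset")
        self.written.add(storage)


def save_dataset_sketch(
    job: StubJob,
    name: str,
    storage: str,
    tags: Optional[Dict[str, str]] = None,
    allow_overwrite: bool = False,
) -> dict:
    # Guard first: without metadata the features and join keys are unknown.
    if not job.metadata:
        raise ValueError(
            f"The RetrievalJob {type(job)} must implement the metadata property."
        )
    # Describe the dataset from the job's metadata before anything is written.
    dataset = {
        "name": name,
        "features": job.metadata.features,
        "join_keys": job.metadata.keys,
        "full_feature_names": job.full_feature_names,
        "storage": storage,
        "tags": tags or {},
    }
    # Only then write the job's result to the requested storage.
    job.persist(storage=storage, allow_overwrite=allow_overwrite)
    return dataset


job = StubJob(StubMetadata(["driver_stats:conv_rate"], ["driver_id"]))
dataset = save_dataset_sketch(job, "training_ds", "data/training_ds.parquet")
```

Calling `save_dataset_sketch` a second time with the same job and storage raises `FileExistsError` in this sketch unless `allow_overwrite=True` is passed, and a job whose `metadata` is `None` fails with `ValueError` before anything is written.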

Source

def create_saved_dataset(
    self,
    from_: RetrievalJob,
    name: str,
    storage: SavedDatasetStorage,
    tags: Optional[Dict[str, str]] = None,
    feature_service: Optional[FeatureService] = None,
    allow_overwrite: bool = False,
) -> SavedDataset:
    """
    Execute provided retrieval job and persist its outcome in given storage.
    Storage type (eg, BigQuery or Redshift) must be the same as globally configured offline store.
    After data successfully persisted saved dataset object with dataset metadata is committed to the registry.
    Name for the saved dataset should be unique within project, since it's possible to overwrite previously stored dataset
    with the same name.

    Args:
        from_: The retrieval job whose result should be persisted.
        name: The name of the saved dataset.
        storage: The saved dataset storage object indicating where the result should be persisted.
        tags (optional): A dictionary of key-value pairs to store arbitrary metadata.
        feature_service (optional): The feature service that should be associated with this saved dataset.
        allow_overwrite (optional): If True, the persisted result can overwrite an existing table or file.

    Returns:
        SavedDataset object with attached RetrievalJob

    Raises:
        ValueError if given retrieval job doesn't have metadata
    """
    if not flags_helper.is_test():
        warnings.warn(
            "Saving dataset is an experimental feature. "
            "This API is unstable and it could and most probably will be changed in the future. "
            "We do not guarantee that future changes will maintain backward compatibility.",
            RuntimeWarning,
        )

    if not from_.metadata:
        raise ValueError(
            f"The RetrievalJob {type(from_)} must implement the metadata property."
        )

    dataset = SavedDataset(
        name=name,
        features=from_.metadata.features,
        join_keys=from_.metadata.keys,
        full_feature_names=from_.full_feature_names,
        storage=storage,
        tags=tags,
        feature_service_name=feature_service.name if feature_service else None,
    )

    dataset.min_event_timestamp = from_.metadata.min_event_timestamp
    dataset.max_event_timestamp = from_.metadata.max_event_timestamp

    from_.persist(storage=storage, allow_overwrite=allow_overwrite)

    # Listing ends here; the method continues through line 1938 of feature_store.py.
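Outside of test mode the method emits a `RuntimeWarning` on every call. A caller who has accepted the experimental status might silence just that warning with the standard `warnings` module. The sketch below is an assumption-laden illustration: `emit_experimental_warning` is a hypothetical stand-in that raises a warning with the same category and opening text as the listing, and `call_quietly` is a hypothetical helper, not part of Feast.

```python
import warnings

# Opening text of the warning shown in the listing; filterwarnings treats it
# as a regular expression matched against the start of the message.
EXPERIMENTAL_MESSAGE = "Saving dataset is an experimental feature"


def emit_experimental_warning() -> None:
    """Hypothetical stand-in that warns the way the listing does."""
    warnings.warn(
        "Saving dataset is an experimental feature. "
        "This API is unstable and it could and most probably will be changed in the future.",
        RuntimeWarning,
    )


def call_quietly(func):
    """Run func with only the experimental-feature RuntimeWarning ignored."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message=EXPERIMENTAL_MESSAGE, category=RuntimeWarning
        )
        return func()


# Record what still gets through: the experimental warning is dropped,
# while an unrelated RuntimeWarning is reported as usual.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    call_quietly(emit_experimental_warning)
    call_quietly(lambda: warnings.warn("unrelated", RuntimeWarning))
```

Matching on the message prefix rather than ignoring every `RuntimeWarning` keeps other runtime warnings visible, and the filter is restored as soon as the `with` block exits.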

Callers 6

test_persist_does_not_overwrite · Function · 0.80

test_historical_retrieval_with_validation · Function · 0.80

test_historical_retrieval_fails_on_validation · Function · 0.80

test_logged_features_validation · Function · 0.80

test_e2e_validation_via_cli · Function · 0.80

test_historical_features_persisting · Function · 0.80

Calls 6

with_retrieval_job · Method · 0.95

_get_provider · Method · 0.95

SavedDataset · Class · 0.90

persist · Method · 0.45

retrieve_saved_dataset · Method · 0.45

apply_saved_dataset · Method · 0.45

Tested by 6

test_persist_does_not_overwrite · Function · 0.64

test_historical_retrieval_with_validation · Function · 0.64

test_historical_retrieval_fails_on_validation · Function · 0.64

test_logged_features_validation · Function · 0.64

test_e2e_validation_via_cli · Function · 0.64

test_historical_features_persisting · Function · 0.64