github.com/microsoft/qlib

Method setup_data

qlib/data/dataset/handler.py:634–664

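Set up the data when initialization is run multiple times. Parameters: `init_type` (str) is one of the `IT_*` types defined on the class; `enable_cache` (bool, default False) is not in the signature and reaches the parent `setup_data` through `**kwargs`. If `enable_cache` is True, the processed data is saved on disk and the handler loads the cached data directly from disk the next time `init` is called.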
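The docstring says that with `enable_cache=True` the processed data is saved on disk and loaded from disk the next time `init` is called. That is a load-or-build pattern; the sketch below is a generic pickle-based illustration of it. `load_or_build` and `expensive_processing` are hypothetical names, and qlib's own caching code is not shown on this page.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn, enable_cache=False):
    """Return processed data, reusing an on-disk pickle when caching is enabled.

    Hypothetical helper for illustration only; it is not part of qlib.
    """
    if enable_cache and os.path.exists(cache_path):
        # Cache hit: skip the expensive processing and load the stored result.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    if enable_cache:
        # Cache miss: persist the processed result for the next initialization.
        with open(cache_path, "wb") as f:
            pickle.dump(data, f)
    return data


calls = []


def expensive_processing():
    # Records each real build so cache hits are observable.
    calls.append("built")
    return {"feature": [1.0, 2.0, 3.0]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "handler_cache.pkl")
    first = load_or_build(path, expensive_processing, enable_cache=True)
    second = load_or_build(path, expensive_processing, enable_cache=True)

print(first == second, calls)  # True ['built']
```

The second call returns the pickled copy, so the build function runs only once; with `enable_cache=False` it would run on every call and nothing is written to disk.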

(self, init_type: str = IT_FIT_SEQ, **kwargs)


    IT_LS = "load_state"  # The state of the object has been loaded by pickle

    def setup_data(self, init_type: str = IT_FIT_SEQ, **kwargs):
        """
        Set up the data when initialization is run multiple times.

        Parameters
        ----------
        init_type : str
            The type `IT_*` listed above.
        enable_cache : bool
            Default value is False:

            - if `enable_cache` == True:

                the processed data will be saved on disk, and the handler will load the cached data from disk directly
                the next time `init` is called
        """
        # init raw data
        super().setup_data(**kwargs)

        with TimeInspector.logt("fit & process data"):
            if init_type == DataHandlerLP.IT_FIT_IND:
                self.fit()
                self.process_data()
            elif init_type == DataHandlerLP.IT_LS:
                self.process_data()
            elif init_type == DataHandlerLP.IT_FIT_SEQ:
                self.fit_process_data()
            else:
                raise NotImplementedError(f"This type of input is not supported: {init_type}")

        # TODO: Be able to cache handler data. Save the memory for data processing
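To see the dispatch in isolation, here is a self-contained stand-in that keeps only the `init_type` branching and records which methods run. It omits `super().setup_data(**kwargs)` and the `TimeInspector.logt` timer. `ToyHandler` is hypothetical, and the string values of `IT_FIT_SEQ` and `IT_FIT_IND` are assumptions (only `IT_LS = "load_state"` appears in the listing).

```python
class ToyHandler:
    """Hypothetical stand-in mirroring the init_type dispatch in setup_data."""

    IT_FIT_SEQ = "fit_seq"  # assumed value
    IT_FIT_IND = "fit_ind"  # assumed value
    IT_LS = "load_state"

    def __init__(self):
        self.calls = []

    # Stubs that only record that they ran.
    def fit(self):
        self.calls.append("fit")

    def process_data(self):
        self.calls.append("process_data")

    def fit_process_data(self):
        self.calls.append("fit_process_data")

    def setup_data(self, init_type=IT_FIT_SEQ):
        if init_type == self.IT_FIT_IND:
            # Fit first, then process in a separate pass.
            self.fit()
            self.process_data()
        elif init_type == self.IT_LS:
            # State came back with the pickled object, so only process.
            self.process_data()
        elif init_type == self.IT_FIT_SEQ:
            # Fit and process in a single combined pass.
            self.fit_process_data()
        else:
            raise NotImplementedError(f"This type of input is not supported: {init_type}")
        return self.calls


for it in (ToyHandler.IT_FIT_IND, ToyHandler.IT_LS, ToyHandler.IT_FIT_SEQ):
    print(it, ToyHandler().setup_data(it))
```

The loop shows `fit` followed by `process_data` for `IT_FIT_IND`, only `process_data` for `IT_LS`, and only `fit_process_data` for the default `IT_FIT_SEQ`; any other value raises `NotImplementedError`.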

Callers

nothing calls this directly

Calls (5)

fit · method · 0.95
process_data · method · 0.95
fit_process_data · method · 0.95
logt · method · 0.80
setup_data · method · 0.45
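The callees `fit`, `process_data` and `fit_process_data` imply two fitting strategies. Judging by the names `IT_FIT_IND` (independent) and `IT_FIT_SEQ` (sequential), which is an assumption since the processor code is not on this page, the first fits every processor against the same raw input, while the second fits each processor on the output of the one before it. The toy processors `Demean` and `MaxScale` and both helper functions are hypothetical; they only show how the two orders can give different results.

```python
class Demean:
    """Hypothetical processor: learns a mean, then subtracts it."""

    def fit(self, xs):
        self.mean = sum(xs) / len(xs)

    def __call__(self, xs):
        return [x - self.mean for x in xs]


class MaxScale:
    """Hypothetical processor: learns the max absolute value, then divides by it."""

    def fit(self, xs):
        self.scale = max(abs(x) for x in xs)

    def __call__(self, xs):
        return [x / self.scale for x in xs]


def fit_independent(procs, raw):
    # Assumed IT_FIT_IND style: every processor is fitted on the raw input.
    for p in procs:
        p.fit(raw)
    out = raw
    for p in procs:
        out = p(out)
    return out


def fit_sequential(procs, raw):
    # Assumed IT_FIT_SEQ style: each processor is fitted on the previous output.
    out = raw
    for p in procs:
        p.fit(out)
        out = p(out)
    return out


raw = [2.0, 4.0, 6.0]
print(fit_independent([Demean(), MaxScale()], raw))
print(fit_sequential([Demean(), MaxScale()], raw))  # [-1.0, 0.0, 1.0]
```

In the sequential version `MaxScale` learns its scale (2) from the already demeaned values, so the output spans [-1, 1]; in the independent version it learns a scale of 6 from the raw data, so the output only reaches about ±1/3.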

Tested by

no test coverage detected