Method execute

scrapegraphai/nodes/parse_node.py:62–129 (github.com/ScrapeGraphAI/Scrapegraph-ai)

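Executes the node's logic to parse the HTML document content and split it into chunks. Args: state (dict): the current state of the graph; the input keys are used to fetch the correct data from it. Returns: dict: the updated state, with the output key containing the parsed content chunks. Raises: KeyError if the input keys are not found in the state.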

(self, state: dict)
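
The state contract described above can be sketched in isolation. This is a minimal illustration, not the library's code: `fetch_inputs`, the key names `"doc"`, `"url"` and `"parsed_doc"`, and the sample values are all hypothetical; only the list-comprehension lookup and the `KeyError` behaviour follow the docstring and listing.

```python
def fetch_inputs(state: dict, input_keys: list) -> list:
    """Hypothetical helper: look up each input key in the graph state.

    Mirrors the `[state[key] for key in input_keys]` pattern, so a key
    that is absent from the state raises KeyError.
    """
    return [state[key] for key in input_keys]


# Hypothetical state: one document payload and its source URL.
state = {"doc": ["<html>...</html>"], "url": "https://example.com"}

docs, source = fetch_inputs(state, ["doc", "url"])

# The node then writes its result back under an output key
# ("parsed_doc" is an assumed name) and returns the same dict.
chunks = ["chunk one", "chunk two"]
state.update({"parsed_doc": chunks})
```

Because the lookup uses plain indexing rather than `dict.get`, a misconfigured input key fails immediately instead of passing `None` further down the graph.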

Source from the content-addressed store, hash-verified

        self.chunk_size = node_config.get("chunk_size")

    def execute(self, state: dict) -> dict:
        """
        Executes the node's logic to parse the HTML document content and split it into chunks.

        Args:
            state (dict): The current state of the graph. The input keys will be used to fetch the
                correct data from the state.

        Returns:
            dict: The updated state with the output key containing the parsed content chunks.

        Raises:
            KeyError: If the input keys are not found in the state, indicating that the
                necessary information for parsing the content is missing.
        """

        self.logger.info(f"--- Executing {self.node_name} Node ---")

        input_keys = self.get_input_keys(state)
        input_data = [state[key] for key in input_keys]
        docs_transformed = input_data[0]
        source = input_data[1] if self.parse_urls else None

        if self.parse_html:
            docs_transformed = Html2TextTransformer(
                ignore_links=False
            ).transform_documents(input_data[0])
            docs_transformed = docs_transformed[0]

            link_urls, img_urls = self._extract_urls(
                docs_transformed.page_content, source
            )

            chunks = split_text_into_chunks(
                text=docs_transformed.page_content,
                chunk_size=self.chunk_size - 250,
            )
        else:
            docs_transformed = docs_transformed[0]

            try:
                link_urls, img_urls = self._extract_urls(
                    docs_transformed.page_content, source
                )
            except Exception:
                link_urls, img_urls = "", ""

            chunk_size = self.chunk_size
            chunk_size = min(chunk_size - 500, int(chunk_size * 0.8))

            if isinstance(docs_transformed, Document):
                chunks = split_text_into_chunks(
                    text=docs_transformed.page_content,
                    chunk_size=chunk_size,
                )
            else:
                chunks = split_text_into_chunks(
                    text=docs_transformed, chunk_size=chunk_size
                )
        # (excerpt ends at source line 119; the method continues through line 129)
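
The two branches in the listing pass different effective chunk sizes to `split_text_into_chunks`. The sketch below isolates only that arithmetic so the margins are easier to compare. `effective_chunk_size` is a hypothetical helper, not part of the library, and the sample sizes are arbitrary.

```python
def effective_chunk_size(chunk_size: int, parse_html: bool) -> int:
    """Hypothetical helper reproducing the size arithmetic from the listing."""
    if parse_html:
        # HTML branch: a fixed margin of 250 is subtracted.
        return chunk_size - 250
    # Non-HTML branch: the smaller of (size - 500) and 80% of the size.
    return min(chunk_size - 500, int(chunk_size * 0.8))


for size in (1000, 2500, 8192):
    html_size = effective_chunk_size(size, parse_html=True)
    plain_size = effective_chunk_size(size, parse_html=False)
    print(f"{size}: html={html_size} non_html={plain_size}")
```

In the non-HTML branch the two candidates are equal at a configured size of 2500: below that the fixed 500 margin gives the smaller value, above it the 80% factor does. Since the listing reads `chunk_size` with `node_config.get("chunk_size")`, an unset value would be `None`, and either subtraction would then raise a `TypeError`.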

Callers

nothing calls this directly

Calls 5

_extract_urls · Method · 0.95

split_text_into_chunks · Function · 0.90

get_input_keys · Method · 0.80

update · Method · 0.80

info · Method · 0.45

Tested by

no test coverage detected