hub / github.com/nodejs/node / parse_email

Function parse_email

tools/gyp/pylib/packaging/metadata.py:278–451

Parse a distribution's metadata stored as email headers (e.g. from ``METADATA``). This function returns a two-item tuple of dicts. The first dict is of recognized fields from the core metadata specification. Fields that can be parsed and translated into Python's built-in types are converted appropriately.

(data: Union[bytes, str])
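The two-dict contract described above can be illustrated with a minimal, standard-library-only sketch. It is not the function's actual implementation: `split_headers`, `SINGLE_USE`, and `MULTI_USE` are hypothetical names, the field tables are deliberately tiny, and the dict keys are simply the lower-cased header names.

```python
import email.parser
import email.policy
from typing import Dict, List, Tuple, Union

# Hypothetical, deliberately tiny field tables for illustration only;
# the real function recognizes the full core metadata field set.
SINGLE_USE = {"name", "version"}
MULTI_USE = {"classifier"}


def split_headers(
    data: str,
) -> Tuple[Dict[str, Union[str, List[str]]], Dict[str, List[str]]]:
    """Split email-style headers into (recognized, everything else)."""
    raw: Dict[str, Union[str, List[str]]] = {}
    unparsed: Dict[str, List[str]] = {}
    parsed = email.parser.Parser(policy=email.policy.compat32).parsestr(data)

    for name in frozenset(parsed.keys()):
        name = name.lower()
        values = parsed.get_all(name) or []
        if name in SINGLE_USE and len(values) == 1:
            # Recognized single-use field that appears exactly once.
            raw[name] = values[0]
        elif name in MULTI_USE:
            # Recognized multiple-use field: always stored as a list.
            raw[name] = values
        else:
            # Unrecognized fields, and single-use fields that are repeated.
            unparsed[name] = values
    return raw, unparsed


sample = (
    "Name: demo\n"
    "Name: other\n"
    "Version: 1.0\n"
    "Classifier: A\n"
    "Classifier: B\n"
    "X-Custom: hi\n"
    "\n"
)
raw, unparsed = split_headers(sample)
```

Here the repeated `Name` header and the unknown `X-Custom` header both end up in the second dict, while `Version` and the two `Classifier` values land in the first.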

Source from the content-addressed store, hash-verified

def parse_email(data: Union[bytes, str]) -> Tuple[RawMetadata, Dict[str, List[str]]]:
    """Parse a distribution's metadata stored as email headers (e.g. from ``METADATA``).

    This function returns a two-item tuple of dicts. The first dict is of
    recognized fields from the core metadata specification. Fields that can be
    parsed and translated into Python's built-in types are converted
    appropriately. All other fields are left as-is. Fields that are allowed to
    appear multiple times are stored as lists.

    The second dict contains all other fields from the metadata. This includes
    any unrecognized fields. It also includes any fields which are expected to
    be parsed into a built-in type but were not formatted appropriately. Finally,
    any fields that are expected to appear only once but are repeated are
    included in this dict.

    """
    raw: Dict[str, Union[str, List[str], Dict[str, str]]] = {}
    unparsed: Dict[str, List[str]] = {}

    if isinstance(data, str):
        parsed = email.parser.Parser(policy=email.policy.compat32).parsestr(data)
    else:
        parsed = email.parser.BytesParser(policy=email.policy.compat32).parsebytes(data)

    # We have to wrap parsed.keys() in a set, because in the case of multiple
    # values for a key (a list), the key will appear multiple times in the
    # list of keys, but we're avoiding that by using get_all().
    for name in frozenset(parsed.keys()):
        # Header names in RFC are case insensitive, so we'll normalize to all
        # lower case to make comparisons easier.
        name = name.lower()

        # We use get_all() here, even for fields that aren't multiple use,
        # because otherwise someone could have e.g. two Name fields, and we
        # would just silently ignore it rather than doing something about it.
        headers = parsed.get_all(name) or []

        # The way the email module works when parsing bytes is that it
        # unconditionally decodes the bytes as ascii using the surrogateescape
        # handler. When you pull that data back out (such as with get_all() ),
        # it looks to see if the str has any surrogate escapes, and if it does
        # it wraps it in a Header object instead of returning the string.
        #
        # As such, we'll look for those Header objects, and fix up the encoding.
        value = []
        # Flag if we have run into any issues processing the headers, thus
        # signalling that the data belongs in 'unparsed'.
        valid_encoding = True
        for h in headers:
            # It's unclear if this can return more types than just a Header or
            # a str, so we'll just assert here to make sure.
            assert isinstance(h, (email.header.Header, str))

            # If it's a header object, we need to do our little dance to get
            # the real data out of it. In cases where there is invalid data
            # we're going to end up with mojibake, but there's no obvious, good
            # way around that without reimplementing parts of the Header object
            # ourselves.
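The `email` module behavior that the comments above rely on can be observed directly with the standard library. The sketch below shows that `keys()` repeats a header name once per occurrence, that `get_all()` is case-insensitive, and that non-ASCII bytes come back wrapped in a `Header` object under the `compat32` policy. The final recovery step is an assumption of this sketch (it presumes the bytes are valid UTF-8) and is not necessarily how the function itself handles the value.

```python
import email.header
import email.parser
import email.policy

# Text input: a repeated header name shows up once per occurrence in keys().
text = "Name: demo\nClassifier: A\nClassifier: B\n\n"
msg = email.parser.Parser(policy=email.policy.compat32).parsestr(text)
keys = msg.keys()
unique_keys = frozenset(keys)  # collapses the duplicate "Classifier" entry
classifiers = msg.get_all("classifier")  # lookup ignores header-name case

# Bytes input with non-ASCII data: the parser decodes as ASCII with
# surrogateescape, so get_all() hands back a Header object, not a str.
raw_bytes = "Name: café\n\n".encode("utf-8")
bmsg = email.parser.BytesParser(policy=email.policy.compat32).parsebytes(raw_bytes)
h = bmsg.get_all("name")[0]
is_header = isinstance(h, email.header.Header)

# One possible recovery, assuming the original bytes were UTF-8:
# decode_header() yields the original byte chunks, which we decode ourselves.
chunks = email.header.decode_header(h)
recovered = b"".join(chunk for chunk, _charset in chunks).decode("utf-8")
```

If the bytes were not valid UTF-8, the final `decode` call would raise `UnicodeDecodeError`, which is the kind of situation the `valid_encoding` flag in the listing is there to record.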

Callers 1

from_email · Method · 0.85

Calls 13

str · Function · 0.85

_parse_keywords · Function · 0.85

_parse_project_urls · Function · 0.85

_get_payload · Function · 0.85

pop · Method · 0.80

keys · Method · 0.65

decode · Method · 0.65

get · Method · 0.65

cast · Function · 0.50

get_all · Method · 0.45

append · Method · 0.45

setdefault · Method · 0.45

Tested by

no test coverage detected
