hub / github.com/nodejs/node / parse_email

Function parse_email

tools/gyp/pylib/packaging/metadata.py:278–451

Parse a distribution's metadata stored as email headers (e.g. from ``METADATA``). This function returns a two-item tuple of dicts. The first dict is of recognized fields from the core metadata specification. Fields that can be parsed and translated into Python's built-in types are converted appropriately.

(data: Union[bytes, str])
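The two-dict return contract can be illustrated with the standard library alone. The sketch below is not the function's implementation: `SINGLE_USE`, `MULTI_USE`, and `split_headers` are hypothetical names, and the field tables are trimmed to a handful of entries for illustration. It only shows how repeated single-use fields and unrecognized fields end up in the second dict.

```python
import email.parser
import email.policy
from typing import Dict, List, Tuple, Union

# Hypothetical, heavily trimmed field tables (assumption for this sketch).
SINGLE_USE = {"name", "version", "summary"}
MULTI_USE = {"classifier", "requires-dist"}


def split_headers(
    data: str,
) -> Tuple[Dict[str, Union[str, List[str]]], Dict[str, List[str]]]:
    """Split headers into (recognized, everything else)."""
    parsed = email.parser.Parser(policy=email.policy.compat32).parsestr(data)
    raw: Dict[str, Union[str, List[str]]] = {}
    unparsed: Dict[str, List[str]] = {}

    # keys() repeats a name once per occurrence, so deduplicate first
    # and fetch every value for that name with get_all().
    for name in frozenset(key.lower() for key in parsed.keys()):
        values = parsed.get_all(name) or []
        if name in SINGLE_USE and len(values) == 1:
            raw[name] = values[0]
        elif name in MULTI_USE:
            raw[name] = values
        else:
            # Unknown field, or a single-use field that was repeated.
            unparsed[name] = values
    return raw, unparsed


sample = (
    "Name: demo\n"
    "Version: 1.0\n"
    "Version: 2.0\n"
    "Classifier: A\n"
    "Classifier: B\n"
    "X-Custom: hi\n"
    "\n"
)
raw, unparsed = split_headers(sample)
```

Here `Version` appears twice, so it lands in `unparsed` together with the unrecognized `X-Custom` header, while `Name` and the repeatable `Classifier` field stay in `raw`.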

Source from the content-addressed store, hash-verified

def parse_email(data: Union[bytes, str]) -> Tuple[RawMetadata, Dict[str, List[str]]]:
    """Parse a distribution's metadata stored as email headers (e.g. from ``METADATA``).

    This function returns a two-item tuple of dicts. The first dict is of
    recognized fields from the core metadata specification. Fields that can be
    parsed and translated into Python's built-in types are converted
    appropriately. All other fields are left as-is. Fields that are allowed to
    appear multiple times are stored as lists.

    The second dict contains all other fields from the metadata. This includes
    any unrecognized fields. It also includes any fields which are expected to
    be parsed into a built-in type but were not formatted appropriately. Finally,
    any fields that are expected to appear only once but are repeated are
    included in this dict.

    """
    raw: Dict[str, Union[str, List[str], Dict[str, str]]] = {}
    unparsed: Dict[str, List[str]] = {}

    if isinstance(data, str):
        parsed = email.parser.Parser(policy=email.policy.compat32).parsestr(data)
    else:
        parsed = email.parser.BytesParser(policy=email.policy.compat32).parsebytes(data)

    # We have to wrap parsed.keys() in a set, because in the case of multiple
    # values for a key (a list), the key will appear multiple times in the
    # list of keys, but we're avoiding that by using get_all().
    for name in frozenset(parsed.keys()):
        # Header names in RFC are case insensitive, so we'll normalize to all
        # lower case to make comparisons easier.
        name = name.lower()

        # We use get_all() here, even for fields that aren't multiple use,
        # because otherwise someone could have e.g. two Name fields, and we
        # would just silently ignore it rather than doing something about it.
        headers = parsed.get_all(name) or []

        # The way the email module works when parsing bytes is that it
        # unconditionally decodes the bytes as ascii using the surrogateescape
        # handler. When you pull that data back out (such as with get_all() ),
        # it looks to see if the str has any surrogate escapes, and if it does
        # it wraps it in a Header object instead of returning the string.
        #
        # As such, we'll look for those Header objects, and fix up the encoding.
        value = []
        # Flag if we have run into any issues processing the headers, thus
        # signalling that the data belongs in 'unparsed'.
        valid_encoding = True
        for h in headers:
            # It's unclear if this can return more types than just a Header or
            # a str, so we'll just assert here to make sure.
            assert isinstance(h, (email.header.Header, str))

            # If it's a header object, we need to do our little dance to get
            # the real data out of it. In cases where there is invalid data
            # we're going to end up with mojibake, but there's no obvious, good
            # way around that without reimplementing parts of the Header object
            # ourselves.
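The listing stops before the decoding step, so the following is only a sketch of the behaviour the comments describe, not the function's own code. It shows that `BytesParser` with the `compat32` policy hands back an `email.header.Header` for a non-ASCII value, and one possible way to recover text from it. `recover_text` is a hypothetical helper, and the UTF-8-first, Latin-1-fallback strategy is an assumption made for illustration.

```python
import email.header
import email.parser
import email.policy
from typing import Tuple, Union


def recover_text(value: Union[email.header.Header, str]) -> Tuple[str, bool]:
    """Return (text, valid_encoding) for one value returned by get_all()."""
    if isinstance(value, str):
        return value, True

    parts = []
    valid_encoding = True
    # decode_header() on a Header yields (bytes, charset) pairs, which gives
    # back the original bytes hidden behind the surrogate escapes.
    for raw_bytes, _charset in email.header.decode_header(value):
        try:
            parts.append(raw_bytes.decode("utf-8", "strict"))
        except UnicodeDecodeError:
            # Assumed fallback: Latin-1 never fails, but may produce mojibake.
            parts.append(raw_bytes.decode("latin-1"))
            valid_encoding = False
    return "".join(parts), valid_encoding


parser = email.parser.BytesParser(policy=email.policy.compat32)

utf8_msg = parser.parsebytes("Author: José\n\n".encode("utf-8"))
utf8_header = utf8_msg.get_all("author")[0]
is_header = isinstance(utf8_header, email.header.Header)
utf8_text, utf8_ok = recover_text(utf8_header)

latin1_msg = parser.parsebytes(b"Author: Jos\xe9\n\n")
latin1_text, latin1_ok = recover_text(latin1_msg.get_all("author")[0])

ascii_msg = parser.parsebytes(b"Author: Jose\n\n")
ascii_text, ascii_ok = recover_text(ascii_msg.get_all("author")[0])
```

A pure-ASCII value comes back as a plain `str` and passes through unchanged, while the Latin-1 input is recovered but flagged, which corresponds to the case the `valid_encoding` flag in the listing is meant to signal.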

Callers 1

from_email · Method · 0.85

Calls 13

str · Function · 0.85
_parse_keywords · Function · 0.85
_parse_project_urls · Function · 0.85
_get_payload · Function · 0.85
pop · Method · 0.80
keys · Method · 0.65
decode · Method · 0.65
get · Method · 0.65
cast · Function · 0.50
get_all · Method · 0.45
append · Method · 0.45
setdefault · Method · 0.45

Tested by

no test coverage detected
