Function normalize_label

openmed/core/labels.py:670–708 (github.com/maziyarpanahi/openmed)

Normalize an entity label to the canonical taxonomy.

(label: str, lang: str = "en") -> str


```python
# Excerpt: re, OTHER, _key, _ALIAS_MAP and CANONICAL_LABELS are defined or
# imported elsewhere in openmed/core/labels.py.
def normalize_label(label: str, lang: str = "en") -> str:
    """Normalize an entity label to the canonical taxonomy.

    Accepts any of:
    - English lowercase ``snake_case`` (``first_name``)
    - Portuguese ``UPPERCASE`` no-separator (``FIRSTNAME``)
    - BIOES-tagged forms (``B-NAME``, ``I-EMAIL``)
    - Mixed case with arbitrary separators (``First-Name``, ``First Name``)

    Unknown labels fall through to ``OTHER`` rather than raising; callers
    that need strict checking should compare against ``CANONICAL_LABELS``
    explicitly.

    Args:
        label: Source label as emitted by a model or registered in a config.
        lang: ISO 639-1 language hint (currently unused but reserved for
            language-conditional disambiguation, e.g. mapping ambiguous
            tokens differently per locale).

    Returns:
        A canonical label in ``UPPER_SNAKE_CASE``.
    """
    if not label:
        return OTHER
    key = _key(label)
    if not key:
        return OTHER
    canonical = _ALIAS_MAP.get(key)
    if canonical is not None:
        return canonical
    # Inputs that are already canonical are normally caught by the alias
    # map above once separators are stripped (``ID_NUM`` -> key ``idnum``
    # and ``CREDIT_CARD`` -> key ``creditcard`` are both aliased). This
    # ``upper`` fallback handles any future canonical label not yet in
    # the alias map.
    upper = re.sub(r"[^A-Z0-9_]", "", label.upper().replace("-", "_").replace(" ", "_"))
    if upper in CANONICAL_LABELS:
        return upper
    return OTHER
```
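The last step before ``normalize_label`` gives up is the ``upper`` fallback. To show what that expression does on its own, the snippet below wraps it, unchanged, in a helper called ``upper_fallback``; the helper name is hypothetical and exists only for this illustration.

```python
import re


def upper_fallback(label: str) -> str:
    # Hypothetical helper wrapping the exact fallback expression from
    # normalize_label: uppercase, map "-" and " " to "_", then drop every
    # character outside [A-Z0-9_].
    return re.sub(r"[^A-Z0-9_]", "", label.upper().replace("-", "_").replace(" ", "_"))


for raw in ["credit-card", "First Name", "B-NAME", "id.num"]:
    print(raw, "->", upper_fallback(raw))
# credit-card -> CREDIT_CARD
# First Name -> FIRST_NAME
# B-NAME -> B_NAME
# id.num -> IDNUM
```

Two details stand out. First, ``B-NAME`` becomes ``B_NAME``, so BIOES prefixes are not removed at this stage; presumably they are handled earlier by ``_key`` and the alias lookup, which are not shown on this page. Second, separators other than ``-`` and space (such as ``.``) are dropped rather than turned into ``_``, so ``id.num`` yields ``IDNUM`` and cannot reach ``ID_NUM`` through this step alone.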

Callers 15

_entity_to_record · Function · 0.90
_canonical_type · Function · 0.90
_entity_to_merger_dict · Function · 0.90
_merger_dict_to_entity · Function · 0.90
_canonical_label · Function · 0.90
_copy_entities · Function · 0.90
make_entity · Function · 0.90
_copy_entities · Function · 0.90
_entity_to_merger_dict · Function · 0.90
_merger_dict_to_entity · Function · 0.90
_entity_to_record · Function · 0.90

Calls 2

_key · Function · 0.85
get · Method · 0.45
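``normalize_label`` delegates most of its work to ``_key`` and a ``get`` on ``_ALIAS_MAP``, and neither of their definitions appears on this page. The following self-contained sketch reproduces the function's control flow against small stand-ins so the whole pipeline can be run end to end. Everything other than that control flow is an assumption: the ``_key`` rule (strip a leading ``B-``/``I-``/``E-``/``S-`` tag, lowercase, keep only letters and digits), the alias entries, the contents of ``CANONICAL_LABELS``, and the hypothetical ``normalize_label_strict`` wrapper, which shows one possible way to do the strict checking the docstring recommends.

```python
import re

OTHER = "OTHER"

# Hypothetical stand-ins; the real tables live in openmed/core/labels.py.
CANONICAL_LABELS = frozenset({"FIRST_NAME", "NAME", "EMAIL", "CREDIT_CARD", OTHER})
_ALIAS_MAP = {
    "firstname": "FIRST_NAME",
    "name": "NAME",
    "email": "EMAIL",
    "creditcard": "CREDIT_CARD",
}

# Hypothetical BIOES rule: a single tag letter followed by a hyphen.
_BIOES_PREFIX = re.compile(r"^[BIES]-", re.IGNORECASE)


def _key(label: str) -> str:
    """Hypothetical key: drop a BIOES prefix, lowercase, keep only [a-z0-9]."""
    stripped = _BIOES_PREFIX.sub("", label.strip())
    return re.sub(r"[^a-z0-9]", "", stripped.lower())


def normalize_label(label: str, lang: str = "en") -> str:
    # Same control flow as the listing above; ``lang`` is unused there too.
    if not label:
        return OTHER
    key = _key(label)
    if not key:
        return OTHER
    canonical = _ALIAS_MAP.get(key)
    if canonical is not None:
        return canonical
    upper = re.sub(r"[^A-Z0-9_]", "", label.upper().replace("-", "_").replace(" ", "_"))
    return upper if upper in CANONICAL_LABELS else OTHER


def normalize_label_strict(label: str, lang: str = "en") -> str:
    # Hypothetical wrapper: an OTHER result is treated as an error unless
    # the caller actually passed an OTHER-like label.
    result = normalize_label(label, lang)
    if result == OTHER and _key(label) != "other":
        raise ValueError(f"unknown entity label: {label!r}")
    return result


for raw in ["first_name", "FIRSTNAME", "B-NAME", "I-EMAIL", "First Name", "zip-code"]:
    print(raw, "->", normalize_label(raw))
# first_name -> FIRST_NAME
# FIRSTNAME -> FIRST_NAME
# B-NAME -> NAME
# I-EMAIL -> EMAIL
# First Name -> FIRST_NAME
# zip-code -> OTHER
```

Collapsing every surface form to one lowercase, separator-free key is what lets a single alias entry (``firstname``) cover ``first_name``, ``FIRSTNAME`` and ``First Name`` at once. The tag-stripping rule assumed here is ambiguous for labels that genuinely begin with one letter and a hyphen (``E-MAIL`` would lose its ``E-``), so the real ``_key`` may well handle BIOES prefixes differently.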