hub / github.com/TheAlgorithms/Python / document_frequency

Function document_frequency

machine_learning/word_frequency_functions.py:64–83

Calculate the number of documents in a corpus that contain a given term. @params : term, the term to search each document for, and corpus, a collection of documents. Each document should be separated by a newline. @returns : the number of documents in the corpus that contain the term you are searching for, and the total number of documents in the corpus.

(term: str, corpus: str)

Source from the content-addressed store, hash-verified

62
63
64	def document_frequency(term: str, corpus: str) -> tuple[int, int]:
65	    """
66	    Calculate the number of documents in a corpus that contain a
67	    given term
68	    @params : term, the term to search each document for, and corpus, a collection of
69	             documents. Each document should be separated by a newline.
70	    @returns : the number of documents in the corpus that contain the term you are
71	               searching for and the number of documents in the corpus
72	    @examples :
73	    >>> document_frequency("first", "This is the first document in the corpus.\\nThIs\
74	is the second document in the corpus.\\nTHIS is \
75	the third document in the corpus.")
76	    (1, 3)
77	    """
78	    corpus_without_punctuation = corpus.lower().translate(
79	        str.maketrans("", "", string.punctuation)
80	    )  # strip all punctuation and replace it with ''
81	    docs = corpus_without_punctuation.split("\n")
82	    term = term.lower()
83	    return (len([doc for doc in docs if term in doc]), len(docs))
84
85
86	def inverse_document_frequency(df: int, n: int, smoothing=False) -> float:
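
Usage sketch

The listing relies on a module-level import of string that falls outside the lines shown. The sketch below reproduces the function body with that import so it runs on its own, then calls it on a three-document corpus. Because the match is `term in doc`, a term counts whenever it appears as a substring of a document, not only as a whole word. The helper document_frequency_whole_word is a hypothetical variant added here for comparison; it is not part of the repository.

```python
import string


def document_frequency(term: str, corpus: str) -> tuple[int, int]:
    """Same logic as the listing, reproduced so this sketch is self-contained."""
    corpus_without_punctuation = corpus.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    docs = corpus_without_punctuation.split("\n")
    term = term.lower()
    return (len([doc for doc in docs if term in doc]), len(docs))


def document_frequency_whole_word(term: str, corpus: str) -> tuple[int, int]:
    """Hypothetical variant (not in the repository): count whole-token matches only."""
    cleaned = corpus.lower().translate(str.maketrans("", "", string.punctuation))
    docs = cleaned.split("\n")
    term = term.lower()
    # doc.split() breaks each document into whitespace-separated tokens
    return (sum(1 for doc in docs if term in doc.split()), len(docs))


corpus = (
    "This is the first document in the corpus.\n"
    "ThIs is the second document in the corpus.\n"
    "THIS is the third document in the corpus."
)

print(document_frequency("first", corpus))           # (1, 3)
print(document_frequency("doc", corpus))             # (3, 3): "doc" is a substring of "document"
print(document_frequency_whole_word("doc", corpus))  # (0, 3): no token equals "doc"
```

Whether substring matching is the intended behavior is not stated in the source, so treat the whole-word variant as one possible alternative, not a correction.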

Callers

nothing calls this directly

Calls 1

splitMethod · 0.80

Tested by

no test coverage detected
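
A minimal sketch of what a test for this function could look like, assuming plain assert-based checks. The test name and the sample corpus are hypothetical, and the function body is copied from the listing so the block runs standalone. The checks pin down a few behaviors that follow from the implementation: matching is case-insensitive, punctuation is stripped from the corpus but not from the term, and every newline-separated segment counts as a document, including empty ones.

```python
import string


def document_frequency(term: str, corpus: str) -> tuple[int, int]:
    # Body copied from the listing so the checks below run on their own.
    corpus_without_punctuation = corpus.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    docs = corpus_without_punctuation.split("\n")
    term = term.lower()
    return (len([doc for doc in docs if term in doc]), len(docs))


def test_document_frequency_edge_cases() -> None:
    # Hypothetical test name and sample data, for illustration only.
    corpus = "A cat sat.\nA dog ran.\nThe CAT slept!"

    # Case-insensitive on both the term and the corpus.
    assert document_frequency("cat", corpus) == (2, 3)
    assert document_frequency("CAT", corpus) == (2, 3)

    # A trailing newline produces an extra, empty document.
    assert document_frequency("cat", corpus + "\n") == (2, 4)

    # Punctuation is removed from the corpus only, so a term
    # that contains punctuation cannot match.
    assert document_frequency("sat.", corpus) == (0, 3)

    # An empty corpus still splits into one (empty) document.
    assert document_frequency("cat", "") == (0, 1)


test_document_frequency_edge_cases()
```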