hub / github.com/TheAlgorithms/Python / document_frequency

Function document_frequency

machine_learning/word_frequency_functions.py:64–83

Calculates the number of documents in a corpus that contain a given term. Parameters: term, the term to search each document for; corpus, a collection of documents, with each document separated by a newline. Returns: the number of documents in the corpus that contain the term you are searching for, and the number of documents in the corpus.

(term: str, corpus: str)

Source from the content-addressed store, hash-verified

def document_frequency(term: str, corpus: str) -> tuple[int, int]:
    """
    Calculate the number of documents in a corpus that contain a
    given term
    @params : term, the term to search each document for, and corpus, a collection of
             documents. Each document should be separated by a newline.
    @returns : the number of documents in the corpus that contain the term you are
               searching for and the number of documents in the corpus
    @examples :
    >>> document_frequency("first", "This is the first document in the corpus.\\nThIs\
is the second document in the corpus.\\nTHIS is \
the third document in the corpus.")
    (1, 3)
    """
    corpus_without_punctuation = corpus.lower().translate(
        str.maketrans("", "", string.punctuation)
    )  # strip all punctuation and replace it with ''
    docs = corpus_without_punctuation.split("\n")
    term = term.lower()
    return (len([doc for doc in docs if term in doc]), len(docs))
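
The listing relies on a module-level `import string` and tests membership with `term in doc`, which is a substring check on each line. The sketch below restates the function so it runs on its own and shows what that implies for a short term. The sample corpus and the `document_frequency_whole_word` variant are hypothetical illustrations, not part of the repository.

```python
import string


def document_frequency(term: str, corpus: str) -> tuple[int, int]:
    # Same logic as the listing: lowercase, drop punctuation, split on newlines.
    cleaned = corpus.lower().translate(str.maketrans("", "", string.punctuation))
    docs = cleaned.split("\n")
    term = term.lower()
    return (len([doc for doc in docs if term in doc]), len(docs))


def document_frequency_whole_word(term: str, corpus: str) -> tuple[int, int]:
    # Hypothetical variant (an assumption, not in the source file):
    # count a document only if the term appears as a whitespace-delimited token.
    cleaned = corpus.lower().translate(str.maketrans("", "", string.punctuation))
    docs = cleaned.split("\n")
    term = term.lower()
    return (sum(term in doc.split() for doc in docs), len(docs))


# Made-up three-document corpus, one document per line.
corpus = "The cat sat.\nA category of things.\nNo match here."

# Substring matching: "cat" is also found inside "category".
print(document_frequency("cat", corpus))  # (2, 3)

# Token matching: only the first document contains the word "cat".
print(document_frequency_whole_word("cat", corpus))  # (1, 3)
```

Whether substring or whole-word matching is preferable depends on the use case; the original function uses the substring form.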

Callers

nothing calls this directly

Calls 1

split · Method · 0.80

Tested by

no test coverage detected