hub / github.com/scikit-learn/scikit-learn / transform

Method transform

sklearn/feature_extraction/text.py:865–893

Transform a sequence of documents to a document-term matrix.

(self, X)

Source from the content-addressed store, hash-verified

    def transform(self, X):
        """Transform a sequence of documents to a document-term matrix.

        Parameters
        ----------
        X : iterable over raw text documents, length = n_samples
            Samples. Each sample must be a text document (either bytes or
            unicode strings, file name or file object depending on the
            constructor argument) which will be tokenized and hashed.

        Returns
        -------
        X : sparse matrix of shape (n_samples, n_features)
            Document-term matrix.
        """
        if isinstance(X, str):
            raise ValueError(
                "Iterable over raw text documents expected, string object received."
            )

        self._validate_ngram_range()

        analyzer = self.build_analyzer()
        X = self._get_hasher().transform(analyzer(doc) for doc in X)
        if self.binary:
            X.data.fill(1)
        if self.norm is not None:
            X = normalize(X, norm=self.norm, copy=False)
        return _align_api_if_sparse(X)
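A minimal usage sketch of the method above, assuming it belongs to `HashingVectorizer` (as the caller names suggest) and that scikit-learn and NumPy are installed. The sample documents and the `n_features=2**10` setting are illustrative choices, not taken from the source.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer

# Illustrative documents (not from the source).
docs = ["the quick brown fox", "the lazy dog", "the quick dog"]

# The vectorizer is stateless, so transform can be called without fit.
vec = HashingVectorizer(n_features=2**10)
X = vec.transform(docs)

# With the default norm="l2", each non-empty row has unit Euclidean length.
row_norms = np.linalg.norm(X.toarray(), axis=1)

# With binary=True and norm=None, every stored entry is set to 1.
X_bin = HashingVectorizer(n_features=2**10, binary=True, norm=None).transform(docs)
binary_values = set(X_bin.data.tolist())

# A bare string is rejected; an iterable of documents is required.
try:
    vec.transform("a single string")
    message = None
except ValueError as exc:
    message = str(exc)

print(X.shape)
print(row_norms)
print(binary_values)
print(message)
```

Wrapping a single document in a list, as in `vec.transform(["a single string"])`, avoids the `ValueError`.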

Callers 5

test_hashing_vectorizer · Function · 0.95

test_hashed_binary_occurrences · Function · 0.95

test_vectorizer_unicode · Function · 0.95

test_nonnegative_hashing_vectorizer_result_indices · Function · 0.95

test_hashing_vectorizer_transform_without_fit · Function · 0.95

Calls 7

_get_hasher · Method · 0.95

normalize · Function · 0.90

_align_api_if_sparse · Function · 0.90

analyzer · Function · 0.85

_validate_ngram_range · Method · 0.80

build_analyzer · Method · 0.80

transform · Method · 0.45
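The call sequence listed above can be approximated with public scikit-learn pieces. This is a sketch under one assumption not shown on this page: that `_get_hasher` returns a `FeatureHasher` configured with the vectorizer's `n_features`, `dtype`, and `alternate_sign`, and `input_type="string"`. The documents and `n_features=2**8` are illustrative.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.preprocessing import normalize

# Illustrative documents (not from the source).
docs = ["the quick brown fox", "the lazy dog", "the quick dog"]

vec = HashingVectorizer(n_features=2**8)

# Step 1: build the analyzer (preprocessing, tokenization, n-grams).
analyzer = vec.build_analyzer()

# Step 2: hash the token streams.
# Assumption: this mirrors what the private _get_hasher helper constructs.
hasher = FeatureHasher(
    n_features=vec.n_features,
    input_type="string",
    dtype=vec.dtype,
    alternate_sign=vec.alternate_sign,
)
manual = hasher.transform(analyzer(doc) for doc in docs)

# Step 3: optional binarization, then optional row normalization.
if vec.binary:
    manual.data.fill(1)
if vec.norm is not None:
    manual = normalize(manual, norm=vec.norm, copy=False)

# Compare against the method's own output.
expected = vec.transform(docs)
same = np.allclose(manual.toarray(), expected.toarray())
print(same)
```

The comparison uses dense arrays only because the example is small; for realistic feature counts the matrices should stay sparse.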

Tested by 5

test_hashing_vectorizer · Function · 0.76

test_hashed_binary_occurrences · Function · 0.76

test_vectorizer_unicode · Function · 0.76

test_nonnegative_hashing_vectorizer_result_indices · Function · 0.76

test_hashing_vectorizer_transform_without_fit · Function · 0.76