hub / github.com/MaartenGr/BERTopic / SpacyBackend

Class SpacyBackend

bertopic/backend/_spacy.py:7–94

Spacy embedding model used for generating document and word embeddings. Arguments: embedding_model, a spaCy embedding model. To create a Spacy backend, create an nlp object and pass it to this backend.

Source from the content-addressed store, hash-verified

class SpacyBackend(BaseEmbedder):
    """Spacy embedding model.

    The Spacy embedding model used for generating document and
    word embeddings.

    Arguments:
        embedding_model: A spacy embedding model

    Examples:
    To create a Spacy backend, you need to create an nlp object and
    pass it through this backend:

    ```python
    import spacy
    from bertopic.backend import SpacyBackend

    nlp = spacy.load("en_core_web_md", exclude=['tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer'])
    spacy_model = SpacyBackend(nlp)
    ```

    To load in a transformer model use the following:

    ```python
    import spacy
    from thinc.api import set_gpu_allocator, require_gpu
    from bertopic.backend import SpacyBackend

    nlp = spacy.load("en_core_web_trf", exclude=['tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer'])
    set_gpu_allocator("pytorch")
    require_gpu(0)
    spacy_model = SpacyBackend(nlp)
    ```

    If you run into GPU or memory issues, please use:

    ```python
    import spacy
    from bertopic.backend import SpacyBackend

    spacy.prefer_gpu()
    nlp = spacy.load("en_core_web_trf", exclude=['tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer'])
    spacy_model = SpacyBackend(nlp)
    ```
    """

    def __init__(self, embedding_model):
        super().__init__()

        if "spacy" in str(type(embedding_model)):
            self.embedding_model = embedding_model
        else:
            raise ValueError(
                "Please select a correct Spacy model by either using a string such as 'en_core_web_md' "
                "or create a nlp model using: `nlp = spacy.load('en_core_web_md')`"
            )

    def embed(self, documents: List[str], verbose: bool = False) -> np.ndarray:
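
The constructor accepts a model based only on whether the string form of its type contains "spacy". The following is a minimal sketch of that check in isolation, not part of the library: `looks_like_spacy` and `FakeLanguage` are hypothetical names introduced here so the example runs without spaCy installed.

```python
def looks_like_spacy(obj) -> bool:
    """Mirror the name-based check used in SpacyBackend.__init__."""
    return "spacy" in str(type(obj))


# Hypothetical stand-in whose type reports a spaCy module path.
# A real nlp object from spacy.load(...) would be used in practice.
FakeLanguage = type("English", (), {"__module__": "spacy.lang.en"})

print(looks_like_spacy(FakeLanguage()))    # True: type is <class 'spacy.lang.en.English'>
print(looks_like_spacy("en_core_web_md"))  # False: a plain str takes the ValueError branch
```

Because the check is name-based, any object whose type string contains "spacy" passes, while a model name given as a plain string does not, even though the error message mentions strings. Presumably the string case is resolved by a caller before this class is constructed; that is an assumption, since the excerpt does not show it.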

Callers 1

select_backend · Function · 0.90

Calls

no outgoing calls

Tested by

no test coverage detected