hub / github.com/LargeWorldModel/LWM / insert_needle

Method insert_needle

scripts/eval_needle.py:162–197
(self, needle, context, depth_percent, context_length)

Source from the content-addressed store, hash-verified

160         return results
161
162     def insert_needle(self, needle, context, depth_percent, context_length):
163         tokens_needle = self.enc_tiktoken.encode(needle)
164         tokens_context = self.enc_tiktoken.encode(context)
165
166         # Reducing the context length by 150 buffer. This is to account for system message, the user question, and response.
167         context_length -= self.final_context_length_buffer
168
169         # If your context + needle are longer than the context length (which it will be), then reduce tokens from the context by the needle length
170         if len(tokens_context) + len(tokens_needle) > context_length:
171             tokens_context = tokens_context[:context_length - len(tokens_needle)]
172
173         if depth_percent == 100:
174             # If your depth percent is 100 (which means your needle is the last thing in the doc), throw it at the end
175             tokens_new_context = tokens_context + tokens_needle
176         else:
177             # Go get the position (in terms of tokens) to insert your needle
178             insertion_point = int(len(tokens_context) * (depth_percent / 100))
179
180             # tokens_new_context represents the tokens before the needle
181             tokens_new_context = tokens_context[:insertion_point]
182
183             # We want to make sure that we place our needle at a sentence break so we first see what token a '.' is
184             period_tokens = self.enc_tiktoken.encode('.')
185
186             # Then we iterate backwards until we find the first period
187             while tokens_new_context and tokens_new_context[-1] not in period_tokens:
188                 insertion_point -= 1
189                 tokens_new_context = tokens_context[:insertion_point]
190
191             # Once we get there, then add in your needle, and stick the rest of your context in on the other end.
192             # Now we have a needle in a haystack
193             tokens_new_context += tokens_needle + tokens_context[insertion_point:]
194
195         # Convert back to a string and return it
196         new_context = self.enc_tiktoken.decode(tokens_new_context)
197         return new_context
198
199     def generate_context(self, needle, trim_context, context_length, depth_percent):
200         context = self.insert_needle(needle, trim_context, depth_percent, context_length)
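To make the insertion logic in the listing easier to follow, here is a minimal standalone sketch of the same steps: trim the context to fit the token budget, pick an insertion point from depth_percent, back up to the nearest preceding period, and splice the needle in. It is an illustration under stated assumptions, not the repository's code: CharEncoder is a hypothetical one-token-per-character stand-in for self.enc_tiktoken, the function is free-standing rather than a method, and the buffer parameter is a hypothetical stand-in for self.final_context_length_buffer.

```python
class CharEncoder:
    """Hypothetical toy tokenizer: one token per character (not tiktoken)."""

    def encode(self, text):
        return [ord(ch) for ch in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)


def insert_needle(enc, needle, context, depth_percent, context_length, buffer=0):
    """Sketch of the listing's logic, with the encoder and buffer passed in explicitly."""
    tokens_needle = enc.encode(needle)
    tokens_context = enc.encode(context)

    # Reserve room for the prompt and response (assumed meaning of the buffer).
    context_length -= buffer

    # Trim the haystack so that haystack + needle fits the remaining budget.
    if len(tokens_context) + len(tokens_needle) > context_length:
        tokens_context = tokens_context[:context_length - len(tokens_needle)]

    # Depth 100 means the needle goes at the very end.
    if depth_percent == 100:
        return enc.decode(tokens_context + tokens_needle)

    # Nominal insertion point, as a token offset into the haystack.
    insertion_point = int(len(tokens_context) * (depth_percent / 100))

    # Walk backwards until the token just before the insertion point is a period,
    # or until the start of the haystack is reached.
    period_tokens = enc.encode(".")
    while insertion_point > 0 and tokens_context[insertion_point - 1] not in period_tokens:
        insertion_point -= 1

    new_tokens = (
        tokens_context[:insertion_point]
        + tokens_needle
        + tokens_context[insertion_point:]
    )
    return enc.decode(new_tokens)


haystack = "One. Two. Three. Four."
needle = " NEEDLE."
print(insert_needle(CharEncoder(), needle, haystack, depth_percent=50, context_length=100))
# prints "One. Two. NEEDLE. Three. Four."
```

With this toy encoder, depth 50 gives a nominal offset of 11, which falls inside "Three", so the loop backs up to offset 9, just after the period that ends "Two.". If no period precedes the nominal offset, the needle ends up at the very start of the context, which matches the behavior of the loop in the listing.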

Callers 1

generate_context · Method · 0.95

Calls 2

encode · Method · 0.45
decode · Method · 0.45

Tested by

no test coverage detected