        return output_response

    def ov_token_counter(self, text):

        """ Called twice in the inference generation loop to get the input_token_count and
        output_token_count. This step can be skipped by setting the OVConfig as follows:

        `from llmware.configs import OVConfig
        OVConfig().set_config("get_token_counts", False)`

        In our testing, the performance impact is negligible, but it may be different in your
        environment and use case.

        If this is set to False, then no token counts will be provided in the usage totals.
        """

        # Without a tokenizer there is nothing to count, so report zero tokens.
        if self.tokenizer:
            toks = len(self.tokenizer.encode(text))
        else:
            toks = 0

        return toks

    def prompt_engineer(self, query, context, inference_dict):
        """ Implemented by openvino_genai module """