Create a pipeline for inference.

Args:
    model_path: the path of a model. It can be one of the following options:

        - i) A local directory path of a turbomind model that was converted by the ``lmdeploy convert`` command or downloaded from ii) and iii).
(model_path: str,
backend_config: TurbomindEngineConfig | PytorchEngineConfig | None = None,
chat_template_config: ChatTemplateConfig | None = None,
log_level: str = 'WARNING',
max_log_len: int | None = None,
trust_remote_code: bool = False,
speculative_config: SpeculativeConfig | None = None,
**kwargs)
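The `log_level` parameter defaults to `'WARNING'`, and the docstring lists five accepted names. A minimal sketch of checking a level name before passing it to `pipeline` might look like the following. The helper `check_log_level` and the constant `ALLOWED_LOG_LEVELS` are hypothetical names introduced for illustration; they are not part of lmdeploy, and the strict case-sensitive match is an assumption.

```python
import logging

# The five level names listed in the docstring for `log_level`.
ALLOWED_LOG_LEVELS = ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")


def check_log_level(log_level: str = "WARNING") -> int:
    """Validate a level name and return its numeric value from `logging`.

    Hypothetical helper: the exact-match rule is an assumption, not
    lmdeploy's documented behaviour.
    """
    if log_level not in ALLOWED_LOG_LEVELS:
        raise ValueError(
            f"log_level must be one of {ALLOWED_LOG_LEVELS}, got {log_level!r}")
    # Each allowed name is also an integer constant on the logging module.
    return getattr(logging, log_level)
```

A caller could run `check_log_level(level)` first and then pass the same string as `log_level=level`.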
def pipeline(model_path: str,
             backend_config: TurbomindEngineConfig | PytorchEngineConfig | None = None,
             chat_template_config: ChatTemplateConfig | None = None,
             log_level: str = 'WARNING',
             max_log_len: int | None = None,
             trust_remote_code: bool = False,
             speculative_config: SpeculativeConfig | None = None,
             **kwargs):
    """Create a pipeline for inference.

    Args:
        model_path: the path of a model. It can be one of the following options:

            - i) A local directory path of a turbomind model that was
              converted by the ``lmdeploy convert`` command or downloaded
              from ii) and iii).
            - ii) The model_id of an lmdeploy-quantized model hosted
              inside a model repo on huggingface.co, such as
              ``InternLM/internlm-chat-20b-4bit``,
              ``lmdeploy/llama2-chat-70b-4bit``, etc.
            - iii) The model_id of a model hosted inside a model repo
              on huggingface.co, such as ``internlm/internlm-chat-7b``,
              ``Qwen/Qwen-7B-Chat``, ``baichuan-inc/Baichuan2-7B-Chat``
              and so on.
        backend_config: backend config instance. Defaults to None.
        chat_template_config: chat template configuration. Defaults to None.
        log_level: the log level, one of [``CRITICAL``, ``ERROR``,
            ``WARNING``, ``INFO``, ``DEBUG``].
        max_log_len: maximum number of prompt characters or prompt tokens
            printed in the log.
        trust_remote_code: whether to trust remote code from model repositories.
        speculative_config: speculative decoding configuration.
        **kwargs: additional keyword arguments passed to the pipeline.

    Returns:
        Pipeline: a pipeline instance for inference.

    Examples:

        .. code-block:: python

            # LLM
            import lmdeploy
            pipe = lmdeploy.pipeline('internlm/internlm-chat-7b')
            response = pipe(['hi','say this is a test'])
            print(response)

            # VLM
            from lmdeploy.vl import load_image
            from lmdeploy import pipeline, TurbomindEngineConfig, ChatTemplateConfig
            pipe = pipeline('liuhaotian/llava-v1.5-7b',
                            backend_config=TurbomindEngineConfig(session_len=8192),
                            chat_template_config=ChatTemplateConfig(model_name='vicuna'))
            im = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
            response = pipe([('describe this image', [im])])
            print(response)
    """  # noqa E501
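The docstring describes `model_path` as either a local directory or a model id hosted on huggingface.co. As a rough illustration of that distinction, the sketch below sorts a string into one of the two kinds. The helper `classify_model_path` and the `owner/name` pattern are assumptions made for this example; this is not how lmdeploy itself resolves the argument.

```python
import os
import re

# Hypothetical pattern for a hub model id of the form "owner/name".
_REPO_ID = re.compile(r"^[\w.\-]+/[\w.\-]+$")


def classify_model_path(model_path: str) -> str:
    """Return 'local' for an existing directory, 'hub' for an owner/name id.

    Illustrative only: not lmdeploy's actual resolution logic.
    """
    # Option i): a directory that already exists on disk.
    if os.path.isdir(model_path):
        return "local"
    # Options ii) and iii): a string shaped like a hub model id.
    if _REPO_ID.match(model_path):
        return "hub"
    raise ValueError(
        f"not a local directory or hub model id: {model_path!r}")
```

The local check runs first, so a directory whose relative path happens to look like `owner/name` is treated as local. That ordering is a design choice of this sketch only.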