OpenAI APIs - Vision#

RTP-LLM provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models. A complete reference for the API is available in the OpenAI API Reference.

Launch A Server#

Launch the server in your terminal and wait for it to initialize. Note that the checkpoint below (Qwen1.5-0.5B-Chat) is a small text-only placeholder; for the vision requests later on this page, point --checkpoint_path at a vision-language checkpoint such as Qwen2.5-VL.

[ ]:
import subprocess
from rtp_llm.utils.util import wait_sever_done, stop_server

port = 8090
server_process = subprocess.Popen(
    [
        "/opt/conda310/bin/python", "-m", "rtp_llm.start_server",
        "--checkpoint_path=/mnt/nas1/hf/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/6114e9c18dac0042fa90925f03b046734369472f/",
        "--model_type=qwen_2",
        f"--start_port={port}",
    ]
)
wait_sever_done(server_process, port)
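Under the hood, a readiness wait like wait_sever_done amounts to polling the port until the server accepts connections. A minimal, self-contained sketch of that idea (the host, timeout, and polling interval here are illustrative assumptions, not RTP-LLM internals):

```python
import socket
import time


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 30.0) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # something is listening on the port
        except OSError:
            time.sleep(0.5)  # server not up yet; retry shortly
    return False
```

A production helper would typically also check that the child process has not exited with an error while waiting.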

Using Python Requests#

[ ]:
import requests

url = f"http://localhost:{port}/v1/chat/completions"

data = {
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this photo."},
                {
                    "type": "image_url",
                    "image_url": {"url": "/mnt/nas1/hf/llava-v1.5-7b/1.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}

response = requests.post(url, json=data)
print(response.text)
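The example above passes a server-local file path as the image URL. The OpenAI vision format also allows embedding the image bytes directly as a base64 data: URL; whether your RTP-LLM version accepts data URLs is an assumption to verify, and the helper below only builds the request payload:

```python
import base64


def image_part_from_bytes(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style image_url content part from raw image bytes."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{b64}"},
    }


# Typical usage: read a local file and drop the part into the "content" list.
# part = image_part_from_bytes(open("photo.jpg", "rb").read())
part = image_part_from_bytes(b"\xff\xd8\xff")  # tiny stand-in for JPEG bytes
print(part["image_url"]["url"])
```

This keeps the request self-contained, at the cost of a larger JSON body.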

Using OpenAI Python Client#

[ ]:
from openai import OpenAI

# The local server does not authenticate, so any placeholder API key works.
client = OpenAI(base_url=f"http://localhost:{port}/v1", api_key="None")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this photo."},
                {
                    "type": "image_url",
                    "image_url": {"url": "/mnt/nas1/hf/llava-v1.5-7b/1.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
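When you are finished, shut the server down. The stop_server helper imported at the top of this page does this for RTP-LLM; a generic fallback for any subprocess.Popen handle is sketched below (the throwaway child process stands in for server_process):

```python
import subprocess
import sys


def shutdown(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Politely terminate a server subprocess, killing it if it hangs."""
    proc.terminate()  # ask the process to exit (SIGTERM on POSIX)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate if it did not exit in time
        proc.wait()
    return proc.returncode


# Demo with a throwaway child process standing in for the real server.
demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(shutdown(demo))
```

Forgetting this step leaves the model server holding the port and GPU memory until the process is reaped.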