Gemma-3-270M API Usage Guide

Service Endpoints

API docs: http://localhost:18003/docs
Health check: http://localhost:18003/health

API Call Examples

1. Health check

curl http://localhost:18003/health
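In scripts it is often useful to wait until the service is up before sending requests. The bash sketch below polls the health endpoint until it answers without an HTTP error status. It assumes only that /health returns a success status once the model is ready; the response body is not documented here, so it is not inspected. `wait_for_health`, its arguments and the default retry values are hypothetical.

```shell
# Hypothetical helper (bash): poll a health URL until it succeeds or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  local url=$1 attempts=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP status >= 400 as failure; -s: no progress output;
    # --max-time bounds each probe so a hung server cannot block the loop.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "service not healthy after $attempts attempts: $url" >&2
  return 1
}
```

Typical use: `wait_for_health http://localhost:18003/health 30 2 && echo ready`.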

2. Chat endpoint (non-streaming)

curl -X POST "http://localhost:18003/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is artificial intelligence?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
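A non-streaming call returns a single response body, and curl prints it whether or not the request succeeded. When calling the endpoint from a script, it helps to keep the body and the HTTP status apart. A minimal bash sketch, assuming success is signalled by HTTP 200 (status codes are not documented here); `post_json` is a hypothetical helper name.

```shell
# Hypothetical helper (bash): POST a JSON body, save the response, fail unless HTTP 200.
# Usage: post_json URL JSON_BODY OUTPUT_FILE
post_json() {
  local url=$1 body=$2 out=$3 code
  # -w prints the status code after the transfer; curl reports 000 when no
  # response arrived, and the fallback covers curl itself failing to run.
  code=$(curl -sS --max-time 120 -o "$out" -w '%{http_code}' \
    -H "Content-Type: application/json" -d "$body" "$url") || code=000
  if [ "$code" != "200" ]; then
    echo "request to $url failed (HTTP $code)" >&2
    return 1
  fi
}

# Example (requires the running service):
# post_json "http://localhost:18003/v1/chat/completions" \
#   '{"messages": [{"role": "user", "content": "What is artificial intelligence?"}], "max_tokens": 256}' \
#   reply.json && cat reply.json
```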

3. Chat endpoint (streaming)

curl -N -X POST "http://localhost:18003/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Tell me about Python"}
    ],
    "max_tokens": 300,
    "temperature": 0.7,
    "stream": true
  }'
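With "stream": true the response arrives incrementally. This guide does not specify the stream format; the sketch below assumes the common OpenAI-compatible Server-Sent Events layout, where each event is a line `data: <JSON chunk>` and the stream ends with `data: [DONE]`. Check this against the /docs page before relying on it. curl's `-N` (`--no-buffer`) flag makes chunks appear as soon as they arrive. `sse_data` is a hypothetical bash helper that prints the payload of each event and stops at the terminator.

```shell
# Hypothetical helper (bash): read an SSE stream on stdin and print each event's payload.
# Assumes OpenAI-style framing: "data: <json>" lines, terminated by "data: [DONE]".
sse_data() {
  local line payload
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}              # tolerate CRLF line endings
    case $line in
      data:*)
        payload=${line#data:}
        payload=${payload# }        # SSE allows one optional space after the colon
        if [ "$payload" = '[DONE]' ]; then
          break                     # assumed end-of-stream marker
        fi
        printf '%s\n' "$payload"
        ;;
    esac                            # blank lines and other SSE fields are ignored
  done
}

# Example (requires the running service):
# curl -N -sS -X POST "http://localhost:18003/v1/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d '{"messages": [{"role": "user", "content": "Tell me about Python"}], "stream": true}' \
#   | sse_data
```

The same helper applies to the streaming completion endpoint, under the same format assumption.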

4. Text completion endpoint (non-streaming)

curl -X POST "http://localhost:18003/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The future development trend of artificial intelligence is",
    "max_tokens": 200,
    "temperature": 0.8
  }'

5. Text completion endpoint (streaming)

curl -N -X POST "http://localhost:18003/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a poem about spring",
    "max_tokens": 150,
    "temperature": 0.9,
    "stream": true
  }'

6. Multi-turn conversation example

curl -X POST "http://localhost:18003/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant"},
      {"role": "user", "content": "Hello"},
      {"role": "assistant", "content": "Hello! How can I help you?"},
      {"role": "user", "content": "Tell me about machine learning"}
    ],
    "max_tokens": 300
  }'
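In the example above, the earlier turns, including the assistant's previous reply, are sent again in messages, which suggests the server keeps no conversation state and the client must resend the history on every call. A minimal bash sketch of that bookkeeping: turns are stored in an array and rendered as the JSON messages list. Building JSON from shell variables requires escaping, and `json_escape` here only handles backslash, double quote, newline, carriage return and tab. `HISTORY`, `add_message` and `messages_json` are hypothetical names. Extracting the assistant's reply from a response is left out because the response schema is not documented here; if `jq` is installed, `jq -n --arg` is a more robust way to build the JSON.

```shell
# Hypothetical helper (bash): escape \ " newline CR tab for a JSON string literal.
# Other control characters are not escaped.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}                    # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

HISTORY=()  # one JSON object per turn, oldest first

# Append a turn. ROLE (system, user or assistant) is inserted unescaped.
add_message() {
  HISTORY+=("{\"role\":\"$1\",\"content\":\"$(json_escape "$2")\"}")
}

# Print the whole history as a JSON array for the "messages" field.
messages_json() {
  local IFS=,
  printf '[%s]' "${HISTORY[*]}"
}

# Example (the curl call requires the running service):
# add_message system "You are a helpful AI assistant"
# add_message user "Hello"
# curl -X POST "http://localhost:18003/v1/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d "{\"messages\": $(messages_json), \"max_tokens\": 300}"
# Then: add_message assistant "<reply text>" before the next user turn.
```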

Main Parameters

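messages: list of messages (chat endpoint)
prompt: input text (completion endpoint)
max_tokens: maximum number of tokens to generate (default 512)
temperature: sampling temperature (0.0 to 2.0, default 0.7)
top_p: nucleus sampling threshold (0.0 to 1.0, default 0.9)
stream: whether to stream the output (true/false, default false)
stop: stop sequence(s) (a string or an array of strings)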
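Since stop accepts either a single string or an array, a bash sketch that builds a completion request body with top_p and several stop sequences passed as an array, omitting stop when none are given. `completion_payload` and `json_escape` are hypothetical helpers, the escaping is limited to backslash, double quote, newline, carriage return and tab, and the parameter values in the example are illustrative only.

```shell
# Hypothetical helper (bash): escape \ " newline CR tab for a JSON string literal.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}; s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}; s=${s//$'\r'/\\r}; s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Hypothetical helper: build a /v1/completions body with top_p and optional stop sequences.
# Usage: completion_payload PROMPT MAX_TOKENS TEMPERATURE TOP_P [STOP...]
completion_payload() {
  if [ $# -lt 4 ]; then
    echo "usage: completion_payload PROMPT MAX_TOKENS TEMPERATURE TOP_P [STOP...]" >&2
    return 2
  fi
  local prompt max_tokens=$2 temperature=$3 top_p=$4 stop="" s body
  prompt=$(json_escape "$1")
  shift 4
  # Join the remaining arguments into a comma-separated list of JSON strings.
  for s in "$@"; do
    stop+="${stop:+,}\"$(json_escape "$s")\""
  done
  body=$(printf '{"prompt":"%s","max_tokens":%s,"temperature":%s,"top_p":%s' \
    "$prompt" "$max_tokens" "$temperature" "$top_p")
  # Only send "stop" when at least one sequence was given.
  if [ $# -gt 0 ]; then
    body+=",\"stop\":[$stop]"
  fi
  printf '%s}' "$body"
}

# Example (the curl call requires the running service):
# curl -X POST "http://localhost:18003/v1/completions" \
#   -H "Content-Type: application/json" \
#   -d "$(completion_payload 'Write a poem about spring' 150 0.9 0.9 $'\n\n' 'THE END')"
```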
