Qwen2 (LLMs): Introduction, Installation and Usage, and Application Examples, a Detailed Guide


Introduction to Qwen2

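Qwen2 was released on June 6, 2024. It evolved from Qwen1.5 and provides pretrained and instruction-tuned models in five sizes (0.5B, 1.5B, 7B, 57B-A14B, and 72B). Beyond English and Chinese, it was trained on data in 27 additional languages. It reports state-of-the-art benchmark performance with significant gains in coding and mathematics, extends the context length of Qwen2-7B-Instruct and Qwen2-72B-Instruct to 128K tokens, and supports tool calling, RAG, role play, and AI agents.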

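After months of effort, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring you:
>> Pretrained and instruction-tuned models in five sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. Each size comes as a base model and an instruction-tuned model, and the instruction-tuned models are aligned with human preferences;
>> Multilingual support in both base and instruction-tuned models: besides English and Chinese, the training data covers 27 additional languages;
>> Strong results across a wide range of benchmark evaluations, with significantly improved coding and math performance; support for tool calling, RAG (retrieval-augmented generation), role play, AI agents, and more;
>> Stable support for a 32K context length across all models; Qwen2-7B-Instruct and Qwen2-72B-Instruct support context lengths of up to 128K tokens (additional configuration required).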

Official site: Qwen

GitHub: QwenLM/Qwen2 (Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.)

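1. News

2024.06.06: We released the Qwen2 series. Check out our blog!

2024.03.28: We released Qwen's first MoE model: Qwen1.5-MoE-A2.7B! At the moment, only HF transformers and vLLM support this model. We will soon add support for llama.cpp, mlx-lm, and others. Check out our blog for more information!

2024.02.05: We released the Qwen1.5 series.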

2. Performance

To be updated...


Installing and Using Qwen2

1. Installation

Qwen2 dense and MoE models require transformers>=4.40.0. We recommend using the latest version.

Warning: this is required because transformers has integrated the Qwen2 code since 4.37.0 and the Qwen2Moe code since 4.40.0.
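
The version requirement above can be checked before loading a model. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_transformers`) are hypothetical, and the parsing is deliberately simplified (it keeps only the leading integers of each version component, so a pre-release such as `4.40.0rc1` is treated as `4.40.0`).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.40.0' or '4.41.0.dev0' into a tuple of three integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad '4.40' to (4, 40, 0)
    return tuple(parts[:3])


def meets_minimum(installed, minimum="4.40.0"):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_transformers():
    """Describe whether the installed transformers can load Qwen2 models."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed, "4.40.0"):
        return f"transformers {installed}: Qwen2 dense and MoE models supported"
    if meets_minimum(installed, "4.37.0"):
        return f"transformers {installed}: Qwen2 dense models only, no Qwen2Moe"
    return f"transformers {installed}: too old for Qwen2"


print(check_transformers())
```

A plain tuple comparison is used here so that `4.4.0` is not mistaken for something newer than `4.40.0`, which a string comparison would get wrong.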

For GPU memory requirements and the corresponding throughput, see the results here.

2. Using the Models

T1. Using Hugging Face Transformers

Here is a code snippet showing how to use the chat model with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"
device = "cuda"  # the device to move the tokenized inputs onto

# Load the weights with the dtype stored in the checkpoint and let
# transformers place the layers on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation into a single prompt string, ending with the
# marker that tells the model to start the assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# generate() returns prompt + completion; drop the prompt tokens.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
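
To make the `apply_chat_template` step less opaque, here is a rough sketch of the kind of string it produces, assuming Qwen2's template follows the ChatML layout with `<|im_start|>` and `<|im_end|>` markers. `render_chatml` is a hypothetical helper written for illustration only; the authoritative template is the one shipped with the tokenizer and may differ in details such as default system prompts.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style rendering of a list of chat messages."""
    text = ""
    for message in messages:
        # Each turn is wrapped in start/end markers, with the role on the first line.
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
]
print(render_chatml(messages))
```

In real use, prefer `tokenizer.apply_chat_template` over any hand-written rendering, so that the prompt always matches what the model was trained on.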

For quantized models, we recommend the GPTQ and AWQ counterparts, namely Qwen2-7B-Instruct-GPTQ-Int8 and Qwen2-7B-Instruct-AWQ.

T2. Using ModelScope

We strongly recommend that users, especially those in mainland China, use ModelScope. snapshot_download can help you solve problems with downloading checkpoints.
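
A minimal sketch of downloading a checkpoint with `snapshot_download` follows. The wrapper `download_checkpoint` is a hypothetical helper, and the model ID `qwen/Qwen2-7B-Instruct` is an assumption about how the model is named on ModelScope; check the model page for the exact ID. The import is done lazily inside the function so the file can be loaded even where modelscope is not installed.

```python
def download_checkpoint(model_id="qwen/Qwen2-7B-Instruct", cache_dir=None):
    """Download a model snapshot from ModelScope and return its local directory.

    The default model_id is an assumption; verify it on ModelScope.
    """
    # Imported here so that defining this function does not require modelscope.
    from modelscope import snapshot_download

    return snapshot_download(model_id, cache_dir=cache_dir)


# Example (requires `pip install modelscope` and network access):
#   local_dir = download_checkpoint()
#   print(local_dir)
```

The returned directory can then be passed to `AutoModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained` in place of the Hugging Face model name.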

3. Model Inference

3.1. Running Locally


T1. Ollama

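Warning: you need ollama>=0.1.42.

Note: Ollama provides an OpenAI-compatible API, but it does not support function calling. For tool use, consider Qwen-Agent, which provides a wrapper for function calling on top of the API.

After installing ollama, you can start the ollama service with the following command: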

ollama serve
# Keep this service running whenever you use ollama

To pull a model checkpoint and run the model, use the ollama run command. You can choose the model size by adding a suffix to qwen2, such as :0.5b, :1.5b, :7b, or :72b:

ollama run qwen2:7b
# To exit, type "/bye" and press Enter

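You can also access the ollama service through its OpenAI-compatible API. Note that you need to (1) keep ollama serve running while using the API, and (2) run ollama run qwen2:7b before using the API so that the model checkpoint is ready.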

from openai import OpenAI

# Point the OpenAI client at the local ollama server.
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='qwen2:7b',
)
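
The snippet above stops at the API call. As a small sketch of reading the answer, the assistant's text in the OpenAI Python client lives under `choices[0].message.content`. The helper `first_reply` is hypothetical, and the `fake_completion` object is only a stand-in with the same attribute layout so the example runs without a server.

```python
from types import SimpleNamespace


def first_reply(chat_completion):
    """Return the assistant text of the first choice in a chat completion."""
    return chat_completion.choices[0].message.content


# Stand-in for a real response, for illustration only.
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="This is a test.")
        )
    ]
)
print(first_reply(fake_completion))
```

With a running server, `first_reply(chat_completion)` can be called on the object returned by `client.chat.completions.create`.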

For more details, visit ollama.ai.

T2. llama.cpp

Download the GGUF files we provide or create your own, then run the following command with the latest llama.cpp:

./main -m <path-to-file> -n 512 --color -i -cml -f prompts/chat-with-qwen.txt

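If you run into problems with quantized models on GPU, try passing the -fa argument to enable the flash attention implementation in the latest version of llama.cpp.

T3. MLX-LM

If you are on Apple Silicon, we also provide checkpoints compatible with mlx-lm. Look for models ending in MLX on the HuggingFace Hub, such as Qwen2-7B-Instruct-MLX.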
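
As a rough sketch of using such a checkpoint from Python, assuming mlx-lm is installed on an Apple Silicon machine and that the checkpoint is published under the Hub ID `Qwen/Qwen2-7B-Instruct-MLX` (the organization prefix is an assumption), the `load` and `generate` functions of `mlx_lm` can be combined as follows. `chat_with_mlx` is a hypothetical wrapper, and the import is lazy so the definition itself does not require mlx-lm.

```python
def chat_with_mlx(prompt, model_id="Qwen/Qwen2-7B-Instruct-MLX", max_tokens=256):
    """Generate one reply with an MLX checkpoint (Apple Silicon only)."""
    # Imported here so that defining this function does not require mlx-lm.
    from mlx_lm import generate, load

    model, tokenizer = load(model_id)
    messages = [{"role": "user", "content": prompt}]
    # Render the chat into a prompt string using the tokenizer's chat template.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=text, max_tokens=max_tokens)


# Example (requires `pip install mlx-lm` on Apple Silicon):
#   print(chat_with_mlx("Give me a short introduction to large language models."))
```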

T4. LMStudio

Qwen2 is supported by lmstudio.ai. You can use LMStudio directly with our GGUF files.

T5. OpenVINO

Qwen2 is supported by the OpenVINO toolkit. You can install and run this chatbot example on an Intel CPU, integrated GPU, or discrete GPU.

3.2. Web UI


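T1. Text Generation Web UI

You can use text-generation-webui directly to create a Web UI demo. If you use GGUF, remember to install the latest llama.cpp wheel with Qwen2 support.

T2. llamafile

Clone llamafile, run source install, and then follow the guide to create your own llamafile from a GGUF file. You can then run a single command, such as ./qwen.llamafile, to create a demo.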

4. Model Deployment


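Qwen2 is supported by multiple inference frameworks. Here we demonstrate the usage of vLLM and SGLang.

Warning: the OpenAI-compatible APIs provided by vLLM and SGLang currently do not support function calling. For tool use, Qwen-Agent provides a wrapper around these APIs to support function calling.

T1. vLLM

We recommend using vLLM>=0.4.0 to build an OpenAI-compatible API service. Start the server with a chat model, for example Qwen2-7B-Instruct: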

python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-7B-Instruct --model Qwen/Qwen2-7B-Instruct

Then use the chat API as shown in the examples below:

curl http://localhost:8000/v1/chat/completions     -H "Content-Type: application/json"     -d '{
    "model": "Qwen2-7B-Instruct",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me something about large language models."}
    ]
    }'

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="Qwen2-7B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ]
)
print("Chat response:", chat_response)
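
For environments where the `openai` package is not available, the same request can be built with the Python standard library alone. This is a sketch under the assumption that the vLLM server from the command above is listening on `http://localhost:8000`; `build_chat_request` and `send` are hypothetical helper names. Only the request construction runs here, so the example works without a server.

```python
import json
import urllib.request


def build_chat_request(user_text, model="Qwen2-7B-Instruct",
                       base_url="http://localhost:8000/v1"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Tell me something about large language models.")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Calling `send(request)` performs the actual HTTP call and mirrors what the curl command does.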

T2. SGLang

Note: SGLang currently does not support the Qwen2MoeForCausalLM architecture, so Qwen2-57B-A14B is not compatible.
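
One way to check a downloaded checkpoint against the note above is to look at the `architectures` field of its `config.json`, which is where Hugging Face checkpoints conventionally record the model class. The sketch below assumes that convention; the helper names and the `UNSUPPORTED_BY_SGLANG` set are hypothetical and reflect only the limitation stated in this guide, which may change in later SGLang releases.

```python
import json
from pathlib import Path

# Architectures this guide reports as unsupported by SGLang at the time of writing.
UNSUPPORTED_BY_SGLANG = {"Qwen2MoeForCausalLM"}


def unsupported_architectures(config):
    """Return the architectures in a config dict that SGLang cannot serve."""
    declared = set(config.get("architectures", []))
    return sorted(declared & UNSUPPORTED_BY_SGLANG)


def check_checkpoint(model_dir):
    """Read config.json from a local checkpoint directory and check it."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return unsupported_architectures(config)


dense = {"architectures": ["Qwen2ForCausalLM"]}
moe = {"architectures": ["Qwen2MoeForCausalLM"]}
print(unsupported_architectures(dense))
print(unsupported_architectures(moe))
```

An empty result means nothing in the config matches the known-unsupported set; it does not by itself guarantee that the model will load.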

Please install SGLang from source. As with vLLM, you need to launch a server and use the OpenAI-compatible API service. First, launch the server:

python -m sglang.launch_server --model-path Qwen/Qwen2-7B-Instruct --port 30000

You can then use it in Python as follows:

from sglang import function, system, user, assistant, gen, set_default_backend, RuntimeEndpoint

# Define a two-turn conversation; each gen() call stores its output under a name.
@function
def multi_turn_question(s, question_1, question_2):
    s += system("You are a helpful assistant.")
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=256))
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=256))

# Connect to the server launched above.
set_default_backend(RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_question.run(
    question_1="What is the capital of China?",
    question_2="List two local attractions.",
)

for m in state.messages():
    print(m["role"], ":", m["content"])

print(state["answer_1"])

5. Model Fine-tuning

We recommend using training frameworks such as Axolotl, Llama-Factory, and Swift to fine-tune models with SFT, DPO, PPO, and similar methods.


6. Docker

To simplify deployment, we provide Docker images with pre-built environments: qwenllm/qwen. You only need to install the driver and download the model files to launch demos and fine-tune models.

docker run --gpus all --ipc=host --network=host --rm --name qwen2 -it qwenllm/qwen:2-cu121 bash

Application Examples of Qwen2

To be updated...
