https://showlab.github.io/videollm-online/
git clone https://github.com/showlab/videollm-online

Ensure that Miniconda and Python ≥ 3.10 are installed, then run:
conda install -y pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers accelerate deepspeed peft editdistance Levenshtein tensorboard gradio moviepy submitit
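pip install flash-attn --no-build-isolation

The PyTorch installation also installs ffmpeg, but it is an old version that usually results in very low-quality preprocessing. Install the latest ffmpeg as follows: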
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
tar xvf ffmpeg-release-amd64-static.tar.xz
rm ffmpeg-release-amd64-static.tar.xz
mv ffmpeg-7.0.1-amd64-static ffmpeg

If you want to try our model with real-time streaming audio, please also clone ChatTTS:
pip install omegaconf vocos vector_quantize_pytorch cython
git clone https://github.com/2noise/ChatTTS
mv ChatTTS demo/rendering/

Launch the demo app:
python -m demo.app --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus

Or run the command-line demo:
python -m demo.cli --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus

@inproceedings{videollm-online,
author = {Joya Chen and Zhaoyang Lv and Shiwei Wu and Kevin Qinghong Lin and Chenan Song and Difei Gao and Jia-Wei Liu and Ziteng Gao and Dongxing Mao and Mike Zheng Shou},
title = {VideoLLM-online: Online Video Large Language Model for Streaming Video},
booktitle = {CVPR},
year = {2024},
}
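
As a quick sanity check on the installation steps above, the following is a minimal sketch rather than part of the repository. The function names `version_ge` and `check_setup` are hypothetical, the script assumes GNU `sort -V` is available, and it assumes the static ffmpeg build was unpacked to `./ffmpeg` so that the binary sits at `./ffmpeg/ffmpeg`.

```shell
#!/usr/bin/env bash
# Hypothetical helper script; not shipped with videollm-online.

# version_ge A B: succeeds when dotted version A is >= B.
# Relies on GNU `sort -V` (version sort); the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_setup: report on the prerequisites named in the install steps.
check_setup() {
    local py_version
    py_version="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 1
    if version_ge "$py_version" "3.10"; then
        echo "python $py_version: ok"
    else
        echo "python $py_version: need >= 3.10"
    fi

    # Assumption: the static build was moved to ./ffmpeg as in the steps above.
    if [ -x ./ffmpeg/ffmpeg ]; then
        ./ffmpeg/ffmpeg -version | head -n 1
    else
        echo "ffmpeg: ./ffmpeg/ffmpeg not found or not executable"
    fi

    # Confirm that PyTorch imports and whether it can see a CUDA device.
    python -c 'import torch; print("cuda available:", torch.cuda.is_available())'
}
```

Source the script from the repository root inside the conda environment and call `check_setup`; nothing runs until the function is invoked.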