2023-12-29 Local deployment of a low-spec large model: GPT-2




Local deployment of a low-spec large model: GPT-2

  • Preface
  • I. Building `ggml`
  • II. The Alibaba gospel
  • Summary


Preface

The problem: I wanted to deploy a large model locally, and it turned out the only one I could download without trouble was GPT-2. For reasons that shall go unnamed, Meta's LLaMA would not download; honestly, the main reason is that I don't want to pay for a VPN. Too expensive.

The approach: go through ggml. Annoyingly, MSYS2 does not package it, so everything hinges on whether you can reach https://github.com/ggerganov/ggml. If you can, download the source and build it yourself, which means facing the dreaded CMake again; if you can't use CMake, go learn it now. If the build errors out, I basically have no remedy; switch platforms instead. Linux generally works; on Windows it comes down to luck.

One more note: if you do have a VPN, you can install llama.cpp directly from MSYS2 and use the Llama 2 model, which is production grade.


I. Building ggml

Download the source from https://github.com/ggerganov/ggml and build it with CMake. In your build output directory you will find something like E:\clangC++\ggml-master\bin\gpt-2-alloc.exe; that executable is what loads and runs the model.
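A minimal build sketch, assuming git, CMake, and a C/C++ toolchain are on your PATH; the output layout and the model-download script path are as the repo shipped them in late 2023 and may have moved since:

$ git clone https://github.com/ggerganov/ggml
$ cd ggml
$ cmake -B build
$ cmake --build build --config Release
# example binaries such as gpt-2-alloc land under the build output (here, bin/)
# the repo also ships a helper script for fetching GGML-format GPT-2 weights:
$ bash examples/gpt-2/download-ggml-model.sh 345M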

E:\clangC++>gpt-2-alloc.exe -m e:\clangC++\voiceToText\ggml-model-gpt-2-345M.bin -p "why you are unhappy"
main: seed = 1703836306
gpt2_model_load: loading model from 'e:\clangC++\voiceToText\ggml-model-gpt-2-345M.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx   = 1024
gpt2_model_load: n_embd  = 1024
gpt2_model_load: n_head  = 16
gpt2_model_load: n_layer = 24
gpt2_model_load: ftype   = 1
gpt2_model_load: qntvr   = 0
gpt2_model_load: ggml tensor size = 368 bytes
gpt2_model_load: ggml ctx size = 969.97 MB
gpt2_model_load: memory size =   192.00 MB, n_mem = 24576
gpt2_model_load: model size  =   679.38 MB
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: compute buffer size: 5.00 MB
main: prompt: 'why you are unhappy'
main: number of tokens in prompt = 4, first 8 tokens: 22850 345 389 19283

why you are unhappy,' " says the man in charge of the camp. "I don't care about your feelings. I have to have my place."

If it sounds like this camp has always been a place for children, it's not. In the 1990s, many families with young children had to move to rural areas or abandon their homes. Many of these families found a way to survive, by selling their children to local children's charities and by selling their children to child-care facilities. But the problem got worse in the 2000s. The children of the camps no longer had places to return to.

Today, children are staying in the camps because the situation is worse than ever. "I want to go to school and have a decent life," says a boy from a small town called Sivak. "But they keep us here, because we don't want to leave. I think it's an obligation for the government to keep them here."<|endoftext|>

main:     load time =   783.07 ms
main:   sample time =   569.64 ms
main:  predict time = 28990.74 ms / 147.16 ms per token
main:    total time = 30494.97 ms
E:\clangC++>gpt-2-alloc.exe -m e:\clangC++\voiceToText\ggml-model-gpt-2-345M.bin -p "show me the Btree code" 
main: seed = 1703836410
gpt2_model_load: loading model from 'e:\clangC++\voiceToText\ggml-model-gpt-2-345M.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx   = 1024
gpt2_model_load: n_embd  = 1024
gpt2_model_load: n_head  = 16
gpt2_model_load: n_layer = 24
gpt2_model_load: ftype   = 1
gpt2_model_load: qntvr   = 0
gpt2_model_load: ggml tensor size = 368 bytes
gpt2_model_load: ggml ctx size = 969.97 MB
gpt2_model_load: memory size =   192.00 MB, n_mem = 24576
gpt2_model_load: model size  =   679.38 MB
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: compute buffer size: 5.00 MB
main: prompt: 'show me the Btree code'
main: number of tokens in prompt = 6, first 8 tokens: 12860 502 262 347 21048 2438

show me the Btree code" function is pretty cool, but it doesn't have any useful functionality:

def btree_to_btree(a: btree_type): '''Btree to btree''' a.btree_type = btree_type return a.btree_type def btree_to_set(a: btree_type): '''Btree to set''' a.btree_type = btree_type set(a) return a.btree_type
As you can see, there are only 2 methods available to use, btree_to_btree and btree_to_set. They both require a btree object to be created with them, and both only support Btree types.

However, both of these methods can be used to create a Btree of arbitrary types, and the latter uses a Btree object to represent the Btree type. Let's start by looking at what these

main:     load time =   771.24 ms
main:   sample time =   606.02 ms
main:  predict time = 30442.44 ms / 148.50 ms per token
main:    total time = 31970.19 ms

Just treat it as a toy.

II. The Alibaba gospel

Search for llama at https://www.modelscope.cn/home. I wept tears of joy: GJ closes a door on you, AL opens a window for you. Blessed are the model-training code drones.
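Model pages on ModelScope expose a plain git URL, so fetching weights is one clone away; the path below is a placeholder, to be replaced with whatever the model page actually shows:

$ git clone https://www.modelscope.cn/<namespace>/<model-name>.git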

Type this in MSYS2:

$ pacman -S mingw-w64-clang-x86_64-llama.cpp

Then:

E:\clangC++>llama.cpp
main: build = 980 (f31b5397)
main: seed  = 1703836946
error loading model: failed to open models/7B/ggml-model.bin: No such file or directory
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/7B/ggml-model.bin'
main: error: unable to load model

So the install succeeded, but there is no model yet; mine is still downloading, so I can't speak to the results.
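Once the download finishes, the invocation should mirror the GPT-2 run above; a sketch using the default path from the error message and a hypothetical prompt (-m, -p, and -n are standard llama.cpp options):

E:\clangC++>llama.cpp -m models\7B\ggml-model.bin -p "why you are unhappy" -n 128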


Summary

Thanks to the open-source community, deploying your own private large model has become foolproof. So hurry and deploy while the door is still open a crack.

The future belongs to us, and to you, but in the end it belongs to AI.

This is the latest wave of dividends after the Internet, the mobile Internet, WeChat, and Douyin.

I believe it will mint another batch of billionaires, and at the same time put more people out of work.

Whether you come to play or come half-starved, hop on and take a look; don't stay at the bottom of the well too long. My blessings to you.



