Serving a custom LLM
Plus Mixtral One-click Template, GGUF quantization, reverting HF commits, and Mistral v0.2
Hi folks, the latest video is out - covering how to cost-effectively serve a custom LLM to 100+ customers. Check it out on YouTube:
In other more technical updates:
Mixtral 8x7B One-click Runpod Template
Here’s the link. It works on an A6000, A100, or H100. [Powered by TGI with --quantize eetq 8-bit quantization.]
And this is a runpod affiliate link if you’re new to runpod and you’d like to support Trelis.
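If you’d rather launch TGI yourself rather than use the template, the setup looks roughly like this - a sketch, assuming the official TGI Docker image and the public Mixtral instruct repo; the image tag, port, and model ID are illustrative, so adjust to your setup:

```shell
# Sketch: serve Mixtral 8x7B with TGI and 8-bit eetq quantization.
# Assumes a GPU machine with the NVIDIA container toolkit installed.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --quantize eetq

# Once the server is up, query it:
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"What is Mixtral?","parameters":{"max_new_tokens":64}}'
```

Note that eetq quantization roughly halves the VRAM needed versus fp16, which is what makes a single A6000 viable.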
Making GGUFs no longer requires tokenizer.model.
Until now, quantizing a model into a GGUF file required a tokenizer.model file in the base model repo. Some models ship without tokenizer.model, which caused an issue that has just been resolved. TL;DR: you can now quantize most models even if they don’t have a tokenizer.model file. PR here.
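For context, the usual GGUF workflow with llama.cpp looks roughly like this - a sketch; the script and binary names have changed across llama.cpp versions, and the paths and quant type (Q4_K_M) are illustrative:

```shell
# Sketch: convert a HuggingFace model to GGUF, then quantize it.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Convert the HF checkpoint to an unquantized GGUF file
# (this is the step that previously failed without tokenizer.model):
python convert-hf-to-gguf.py /path/to/hf-model --outfile model-f16.gguf

# Quantize down to 4-bit:
./quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```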
Auto reverting a HuggingFace commit.
What happens if you accidentally delete files (like I did!)? Here’s the fix:
Use GIT_LFS_SKIP_SMUDGE=1 to clone only pointers to the LFS files (rather than downloading them), then reset to a specific commit hash and force push. Make sure you duplicate your space/dataset/model before testing this - here is a duplicator.
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/USER/REPO
git reset --hard HASH
git push --force
With thanks to radames on the HuggingFace forum here.
Mistral v0.2 bug fixed
There had been issues running the newly released Mistral v0.2 with transformers. That’s now fixed. The v0.2 model offers a 32k context window and is stronger than v0.1.
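If you hit the bug, pulling in the latest transformers should pick up the fix - assuming it has landed in a PyPI release; if not, installing from source gets you the main branch:

```shell
# Upgrade transformers to pick up the Mistral v0.2 fix:
pip install -U transformers

# Or, if the fix isn't in a release yet, install from source:
pip install git+https://github.com/huggingface/transformers
```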
That’s it for this ad hoc update, cheers, Ronan
~~~
Links: