Chat Templates
When using the OpenAI API, you send your user and assistant messages directly to the endpoint. Behind the scenes, OpenAI handles formatting those messages into the single prompt that is fed into the language model.
With open source models, it's typically up to the developer to convert messages into the correct chat format. However, HuggingFace provides a tokenizer method that makes this easier - called apply_chat_template.
1. First, check that a chat_template is defined in the model's tokenizer_config.json file.
2. Load the tokenizer.
3. Call tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True).

Setting add_generation_prompt=True appends the assistant's prefix (e.g. ### Assistant: for some models, or [/INST] for Llama), which cues the model to respond.
LLM performance is very sensitive to the prompt format, and using chat templates is one way to be sure that your messages are correctly formatted.
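Putting those steps together, here's a minimal sketch - the model id is just an example; substitute whichever chat model you are using:

```python
from transformers import AutoTokenizer

# The model id below is just an example; any model whose
# tokenizer_config.json defines a chat_template will work.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "What is the capital of France?"},
]

# tokenize=False returns the formatted prompt as a string (rather than
# token ids); add_generation_prompt=True appends the assistant prefix
# for templates that define one, cueing the model to respond.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# For Mistral 7B Instruct, this prints something like:
# <s>[INST] What is the capital of France? [/INST]
```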
Try out the public Llama 2 function calling model for yourself.
Function-Calling DeepSeek 33B (v3).
A version 3 (OpenAI format compatible) model is now available for purchase here.
This is likely the strongest open source model for function calling available right now. Models like Yi and Mixtral, even though stronger in general, are not as strong when it comes to function calling: coding ability helps function-calling performance (coding models see a lot of structured data). You can view a quick video of performance here.
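For context, "OpenAI format compatible" means function metadata is passed as JSON schema, the same shape the OpenAI API uses. Here's a rough sketch of that format - the function is hypothetical, and the exact request fields are whatever the model card specifies:

```python
# Hypothetical function metadata in the OpenAI JSON-schema format.
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city, e.g. Dublin",
                }
            },
            "required": ["location"],
        },
    }
]

messages = [
    {"role": "user", "content": "What's the weather like in Dublin?"}
]

# A function-calling model responds with a structured call rather than
# prose, along the lines of:
# {"name": "get_current_weather", "arguments": {"location": "Dublin"}}
```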
64k Context Length Mistral 7B Model.
I was asked some questions about the 64k context length fine-tuned Mistral model (original video here). I’ve made a quick follow-on video showing one-click pod setup and summarization of 200,000 characters.
Cheers, Ronan
~~~
Other Links: