I’ve started this newsletter to share ad hoc updates (not on a weekly or monthly schedule) on:
Trelis products (see Trelis.com).
General LLM updates (e.g. updates to commonly used packages, like transformers).
If you have purchased access to one of the Trelis repositories, you’ll still see detailed updates on GitHub via the commits and PRs I make. This newsletter will provide a more summarized, highlighted version of that content.
New models
Yes, Mixtral (Mistral’s MoE) has been released. I plan to make a video on that shortly. In the meantime, you may enjoy this MoE explainer from some months ago.
Mistral 7B v0.2 (rather than v0.1) is out. It’s quite strong and supports a context length of up to 32k tokens. You can find a function calling version on mart.trelis.com.
One of the strongest chat models out right now is SUSChat. You can also find a function calling version of that model on Trelis Mart.
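As a quick illustration of prompting one of these chat models with transformers, here’s a minimal sketch. The model id and prompt are just placeholders I’ve chosen for the example; the Trelis function calling versions have their own prompt formats, so check the relevant model card for the exact format.

from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative; substitute the model you want

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Summarise the key changes in Mistral 7B v0.2."},
]

# apply_chat_template formats the conversation using the template bundled with the tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)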
Transformers flash attention update
There’s a slight change to how flash attention is specified when loading models; it’s now done like this:
import torch
from transformers import AutoModelForCausalLM

base_model = "mistralai/Mistral-7B-Instruct-v0.2"  # example model id, swap in your own
cache_dir = None  # optionally set a local cache directory, e.g. "./models"

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    # quantization_config=bnb_config,  # uncomment (and define a BitsAndBytesConfig) to quantize; leave commented for full precision
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # trust_remote_code=True,
    attn_implementation="flash_attention_2",  # replaces the older use_flash_attention_2=True flag
    cache_dir=cache_dir,
)
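Once loaded, generation works as before, with flash attention applied transparently in the forward pass. A minimal usage sketch (the prompt and generation settings below are just placeholders):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(base_model, cache_dir=cache_dir)

inputs = tokenizer("What does flash attention speed up?", return_tensors="pt").to(model.device)

# Flash attention is used automatically during generation
output_ids = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))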
That’s it for this ad hoc newsletter.

Ronan