Monte Carlo Can Boost LLM Accuracy
In some cases, Monte Carlo Tree Search (MCTS) can boost a model's performance from Llama 3 8B level up to GPT-4 level.
In this video, I build the approach from scratch and show how it helps solve maths problems.
➡️ How it works:
- Start with a seed answer (e.g. "I don't know")
- Ask the LLM to suggest improvements to the seed answer
- Use the suggestions to generate 3 improved responses
- Randomly select one response and have the LLM rate it out of 100
- Calculate the UCT (Upper Confidence bounds applied to Trees) score for each node based on the rating (more on that in the vid)
- Traverse the tree following the nodes with the highest UCT
- Repeat the process, balancing exploiting good answers against exploring new ones (see the sketch below)
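For the curious, here's a minimal Python sketch of that loop, under a few assumptions: `ask_llm` is a hypothetical placeholder for whatever chat-completion client you use (not a real API), the rater is assumed to reply with a bare number, and the UCT score uses the standard formula of mean rating plus an exploration bonus c·√(ln(parent visits)/visits). The implementation in the video may differ in its details.

```python
import math
import random

# Hypothetical stand-in for your chat-completion client (OpenAI, llama.cpp, etc.)
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM of choice")

class Node:
    def __init__(self, answer, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_rating = 0.0  # sum of 0-100 ratings; mean = total / visits

    def uct(self, c=1.41):
        # Standard UCT: mean rating (exploitation) plus an exploration bonus.
        if self.visits == 0:
            return float("inf")  # always try unvisited children first
        exploit = self.total_rating / self.visits / 100.0  # scale mean to 0-1
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select(root):
    # Traverse the tree, following the child with the highest UCT score.
    node = root
    while node.children:
        node = max(node.children, key=lambda n: n.uct())
    return node

def expand(node, question, n_children=3):
    # Ask the LLM to critique the answer, then generate improved responses.
    critique = ask_llm(f"Question: {question}\nAnswer: {node.answer}\n"
                       "Suggest improvements to this answer.")
    for _ in range(n_children):
        improved = ask_llm(f"Question: {question}\nAnswer: {node.answer}\n"
                           f"Critique: {critique}\nWrite an improved answer.")
        node.children.append(Node(improved, parent=node))

def rate(node, question):
    # Have the LLM score the answer out of 100 (assumes a numeric-only reply).
    reply = ask_llm(f"Question: {question}\nAnswer: {node.answer}\n"
                    "Rate this answer from 0 to 100. Reply with the number only.")
    return float(reply.strip())

def backpropagate(node, rating):
    # Push the rating up to the root so ancestors' UCT scores stay current.
    while node is not None:
        node.visits += 1
        node.total_rating += rating
        node = node.parent

def mcts(question, iterations=8):
    root = Node("I don't know")  # seed answer
    for _ in range(iterations):
        leaf = select(root)
        expand(leaf, question)
        child = random.choice(leaf.children)  # rate one response at random
        backpropagate(child, rate(child, question))

    # Return the best-rated answer found anywhere in the tree.
    def mean(n):
        return n.total_rating / n.visits if n.visits else -1.0
    best, stack = root, [root]
    while stack:
        n = stack.pop()
        stack.extend(n.children)
        if mean(n) > mean(best):
            best = n
    return best.answer
```

Note the infinite UCT score for unvisited nodes: it forces the search to try every fresh response at least once before doubling down on a branch that already rated well, which is where the exploration/exploitation balance comes from.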
➡️ Why it's promising:
- Provides a programmatic way to vary prompts and refine answers
- Balances improving on good answers (exploitation) with trying out new options (exploration)
- Can significantly boost the accuracy of smaller models to approach or even exceed GPT-4 level
➡️ Potential limitations:
- Slow and expensive due to many LLM queries
- Not guaranteed to be correct if the LLM lacks the relevant knowledge
- Best suited to latency-tolerant, quality-critical tasks (not real-time applications)
Cheers, Ronan
Trelis Research
More Resources at Trelis.com/About