This week, it’s a video on open-source versions of GPT-4o!
🛠️ How the Hugging Face implementation by Andrés Marafioti works - it's a pipeline of four components:
- Voice Activity Detection: Identifies when speech starts/stops
- Speech-to-Text: Converts audio to text (e.g., Whisper models)
- Text-to-Text: Processes and generates responses (e.g., LLMs like Llama 3, Phi)
- Text-to-Speech: Converts text back to audio (e.g., Melo, Parler)
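To make the flow concrete, here is a minimal sketch of how the four stages could be chained. It is not the Hugging Face code: every name in it (`detect_speech`, `SpeechToSpeechPipeline`, the `fake_*` stages) is a hypothetical placeholder, the VAD is a toy energy threshold, and the other three stages are stubs where you would plug in real models such as Whisper, Llama 3 or Melo.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# NOTE: all names here are hypothetical placeholders for illustration,
# not the API of the Hugging Face implementation.


def detect_speech(frames: List[float], threshold: float = 0.1) -> Optional[Tuple[int, int]]:
    """Toy voice activity detection over per-frame energies.

    Returns (start, end) frame indices, end exclusive, spanning the first
    to the last frame above the threshold, or None if nothing is above it.
    """
    active = [i for i, energy in enumerate(frames) if energy > threshold]
    if not active:
        return None
    return active[0], active[-1] + 1


@dataclass
class SpeechToSpeechPipeline:
    speech_to_text: Callable[[List[float]], str]  # e.g. a Whisper wrapper
    text_to_text: Callable[[str], str]            # e.g. an LLM wrapper
    text_to_speech: Callable[[str], bytes]        # e.g. a Melo/Parler wrapper
    vad_threshold: float = 0.1

    def respond(self, frames: List[float]) -> Optional[bytes]:
        # 1. Voice Activity Detection: find where speech starts and stops.
        span = detect_speech(frames, self.vad_threshold)
        if span is None:
            return None  # silence only, nothing to answer
        start, end = span
        # 2. Speech-to-Text on the detected segment.
        transcript = self.speech_to_text(frames[start:end])
        # 3. Text-to-Text: generate a reply.
        reply = self.text_to_text(transcript)
        # 4. Text-to-Speech: turn the reply back into audio.
        return self.text_to_speech(reply)


# Stand-in stages so the sketch runs without any models installed.
def fake_stt(segment: List[float]) -> str:
    return f"heard {len(segment)} frames"


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(fake_stt, fake_llm, fake_tts)
audio = [0.0, 0.02, 0.5, 0.7, 0.3, 0.01]  # made-up per-frame energies
print(pipeline.respond(audio))  # b'You said: heard 3 frames'
```

Keeping each stage behind a plain callable is one way to swap models in and out (say, a smaller LLM on a laptop and a larger one on a GPU) without touching the rest of the chain.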
💻 Implementation Options:
- Local: Run on an Apple Silicon Mac (M1/M2/M3)
- Cloud: Use services like RunPod or Vast.ai
It's workable on a Mac with 8 GB of RAM, and decent if you have more than that. On a GPU, you can get high-quality conversations.
Cheers, Ronan
More Resources at Trelis.com/About