
Local Pocket TTS 260124

From Game in the Brain Wiki

πŸŽ™οΈ Local TTS Wiki: Pocket TTS on Ubuntu


πŸ› οΈ System Requirements

This guide is specifically optimized for the following stack:

GPU: AMD Radeon RX 7600 (RDNA3 / Navi 33)

OS: Ubuntu 24.04 LTS (Noble Numbat)

Platform: ROCm 6.x

πŸ” Phase 0: Pre-Flight Compatibility Check

Before you begin, run these commands to ensure your hardware is ready.

1. Check CPU Support

Pocket TTS requires AVX2 instructions for real-time CPU performance.

lscpu | grep -i "avx2"

2. Verify GPU Detection

Ensure your RX 7600 is visible to the system.

lspci -nn | grep -i vga

3. Check Kernel Version

AMD RDNA3 cards (RX 7600) require Kernel 6.2 or higher for stable support.

uname -r
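The three checks above can be combined into a single pass/fail report. This is a convenience sketch only; it reads system information and changes nothing.

```shell
# One-shot version of the three pre-flight checks. Nothing here modifies
# the system; it only reports PASS/FAIL for each requirement.
grep -qi avx2 /proc/cpuinfo && cpu=PASS || cpu=FAIL
echo "AVX2 support: $cpu"

lspci -nn 2>/dev/null | grep -qi vga && gpu=PASS || gpu=UNKNOWN
echo "GPU visible (lspci): $gpu"

kver=$(uname -r)
kmaj=${kver%%.*}
krest=${kver#*.}
kmin=${krest%%.*}
if [ "$kmaj" -gt 6 ] || { [ "$kmaj" -eq 6 ] && [ "$kmin" -ge 2 ]; }; then
  kern=PASS
else
  kern=FAIL
fi
echo "Kernel $kver (need >= 6.2): $kern"
```

If any line reports FAIL, resolve it before moving on to Phase 1.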

πŸ› οΈ Phase 1: Driver & ROCm Setup

The RX 7600 uses the gfx1102 architecture.

1. Install AMD ROCm 6.2

sudo apt update

Download the installer for Ubuntu 24.04

wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb

Install ROCm and Graphics components

sudo amdgpu-install --usecase=rocm,graphics --no-dkms

2. Hardware Access Permissions

Check if your user already has the necessary hardware permissions:

groups | grep -E 'render|video'

If output shows render and video: Skip to Phase 2.

If empty: Run the commands below and then log out/in.

sudo usermod -a -G render,video $LOGNAME

📦 Phase 2: Pocket TTS Installation

1. Install uv (Fast Package Manager)

curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
source ~/.bashrc

2. Environment Setup

Create and enter virtual environment

uv venv tts_env
source tts_env/bin/activate

Install ROCm-specific PyTorch for AMD GPUs (the rocm6.1 wheels ship their own ROCm libraries, so they run fine on a ROCm 6.2 host install)

uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
uv pip install pocket-tts

🚀 Phase 3: Launching the Service

Standard Launch

By default, the server runs on port 8000. Use a unique port (like 5821) to avoid conflicts.

uvx pocket-tts serve --port 5821

RX 7600 Hardware Override

If the server starts but uses the CPU instead of the GPU, use this environment variable to force RDNA3 compatibility:

export HSA_OVERRIDE_GFX_VERSION=11.0.0
uvx pocket-tts serve --port 5821
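To confirm the ROCm stack can see the card at all, you can query it with rocminfo (installed as part of Phase 1). On an RX 7600 the agent list should include a gfx11xx entry; the exact string shown (gfx1102, or gfx1100 once the override is exported) is an assumption, so treat this as a quick sanity probe rather than a definitive test.

```shell
# Hedged check: rocminfo ships with ROCm, so this only works after Phase 1.
if command -v rocminfo >/dev/null 2>&1; then
  gfx=$(rocminfo | grep -io 'gfx[0-9a-f]*' | head -n 1)
  echo "ROCm GPU target: ${gfx:-none detected}"
else
  gfx=""
  echo "rocminfo not found: finish Phase 1 first"
fi
```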

🧪 Phase 4: Verification

Create a script named check_gpu.py to confirm hardware acceleration is active:

import torch

print(f"Is ROCm/GPU available? {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device Name: {torch.cuda.get_device_name(0)}")
else:
    print("Warning: Running on CPU. Check Phase 3 for the RX 7600 Override.")

Run it with: python check_gpu.py

⚠️ Common Failures & Fixes

Symptom: amdgpu-lib32 unmet dependencies
Fix: Run the 32-bit architecture fix (see below).

Symptom: Permission Denied on /dev/kfd
Fix: Ensure you added the render/video groups and restarted your session.

Symptom: Address already in use
Fix: Launch with a different port, e.g. --port 5821.

Symptom: HSA_STATUS_ERROR_OUT_OF_RESOURCES
Fix: Close other GPU-heavy apps (browsers/games).
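For the "Address already in use" case, you can check who holds the port before launching. This sketch uses ss from iproute2 (standard on Ubuntu); the port number is just the server's default.

```shell
# Diagnose "Address already in use": see whether something is already
# listening on the server port before you launch.
port=8000
in_use=$(ss -ltn 2>/dev/null | grep -c ":$port ")
if [ "$in_use" -gt 0 ]; then
  echo "Port $port is busy: launch with --port 5821 instead"
else
  echo "Port $port looks free"
fi
```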

Detailed Fix: amdgpu-lib32 Unmet Dependencies

If you encounter errors regarding :i386 packages, enable the 32-bit architecture:

sudo dpkg --add-architecture i386
sudo apt update
sudo amdgpu-install --usecase=rocm,graphics --no-dkms

πŸŽ›οΈ Usage & Automation Guide

🔘 Create an Ubuntu App Button

This lets you launch the server from your App Grid or Dock without typing commands every time.

1. Create the Desktop Entry

Open your terminal and run this block of code (copy and paste the whole thing):

cat <<EOF > ~/.local/share/applications/pocket-tts.desktop
[Desktop Entry]
Name=Pocket TTS Server
Comment=Start local TTS with RX 7600 override
Exec=bash -c "export HSA_OVERRIDE_GFX_VERSION=11.0.0; $HOME/.local/bin/uvx pocket-tts serve --port 5821; read -p 'Server stopped. Press Enter to close...'"
Icon=audio-headphones
Terminal=true
Type=Application
Categories=Utility;Audio;
EOF

2. Activate It

Run chmod +x ~/.local/share/applications/pocket-tts.desktop

Press the Super (Windows) Key and search for Pocket TTS.

Right-click the icon and select Pin to Dash.

Note: When you click the icon, a terminal window will open. Minimize it to keep the server running. Close it to stop the server.

🛑 How to Stop Speaking

Since this is a streaming server, it will keep generating audio until finished. To interrupt it:

If using the Web Interface: Refresh the page (F5 or Ctrl+R). This immediately cuts the connection.

If using the Terminal: Click inside the terminal window running the server and press Ctrl+C. (You will need to restart the server to speak again).

🎨 How to Tweak the Tone

The AI reads purely based on text input. You can manipulate the "acting" using these methods:

Change the Voice

Use the --voice flag in the command line or select it in the UI.

kyle: Authoritative, standard American male.

fme: Softer, faster female.

af_heart: Often used for audiobooks, generally expressive.

"Prompt Engineering" with Punctuation

The AI breathes and pauses based on grammar.

Excitement/Anger: Use exclamation marks and short sentences! Like this!

Whispering/Sadness: Use lowercase text... with ellipses... and no hard stops...

Speed: Remove commas to make the sentence run on and feel rushed or panicked.

Pauses: Use [pause] (if supported) or simply add ... or -- to force a breath.
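The punctuation tricks above can be tried directly on the command line. The generate subcommand and its --text/--voice/--out flags below mirror the audiobook script later in this guide; the run() wrapper is just a guard so the calls are skipped on machines where uvx is not installed yet, and the exact output is not asserted.

```shell
# Sketch of "punctuation acting" on the command line. run() skips the
# call (instead of erroring) when uvx is not installed.
run() {
  if command -v uvx >/dev/null 2>&1; then
    "$@" || echo "command failed: $*"
  else
    echo "skipped (uvx not installed): $*"
  fi
}

export HSA_OVERRIDE_GFX_VERSION=11.0.0
run uvx pocket-tts generate --text "We did it! We actually did it!" --voice kyle --out excited.wav
run uvx pocket-tts generate --text "i suppose... it doesn't matter now..." --voice af_heart --out somber.wav
```

Compare the two WAV files: same model, same voices as listed above, but the punctuation alone changes the pacing and energy.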

🧵 Script: Stitching Files (Audiobook Maker)

To turn a long text file into a single MP3, use this Python script. It breaks text into paragraphs, generates audio for each, and stitches them together with ffmpeg.

Prerequisites: sudo apt install ffmpeg

The Script (make_audiobook.py):

import os
import shutil
import subprocess

# --- CONFIGURATION ---
INPUT_FILE = "my_book.txt"        # Your text file
OUTPUT_FILENAME = "audiobook.mp3" # Final result
VOICE = "af_heart"                # Change to your preferred voice
GPU_OVERRIDE = "11.0.0"           # For RX 7600
# ---------------------

def install_check():
    if shutil.which("ffmpeg") is None:
        print("Error: ffmpeg is not installed. Run 'sudo apt install ffmpeg'")
        raise SystemExit(1)

def read_text(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read().split('\n\n')

def generate_chunk(text, index):
    filename = f"chunk_{index:03d}.wav"
    print(f"Generating Part {index}...")
    cmd = ["uvx", "pocket-tts", "generate", "--text", text, "--voice", VOICE, "--out", filename]
    env = os.environ.copy()
    env["HSA_OVERRIDE_GFX_VERSION"] = GPU_OVERRIDE
    subprocess.run(cmd, env=env, check=True)
    return filename

def stitch_files(file_list):
    with open("files.txt", "w") as f:
        for audio in file_list:
            f.write(f"file '{audio}'\n")

    print("Stitching files together...")
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "files.txt",
                    "-c:a", "libmp3lame", "-q:a", "2", OUTPUT_FILENAME], check=True)

    # Cleanup
    for audio in file_list:
        os.remove(audio)
    os.remove("files.txt")


if __name__ == "__main__":
    install_check()
    paragraphs = read_text(INPUT_FILE)
    generated_files = []
    for i, p in enumerate(paragraphs):
        if len(p.strip()) > 0:
            try:
                generated_files.append(generate_chunk(p, i))
            except Exception as e:
                print(f"Failed on chunk {i}: {e}")
    stitch_files(generated_files)
    print(f"Done! Saved as {OUTPUT_FILENAME}")

📖 How to Embed Chapters & Stops

To make the file navigable (Next/Previous Chapter support), you have two options.

Option A: Separate Files (Easiest)

Instead of stitching them into one big MP3, keep the files separate. Modify the script above to remove the stitch_files function call at the end. You will end up with chunk_001.wav, chunk_002.wav, etc. Rename these to 01 - Chapter 1.wav, etc., and put them in a folder on your phone.
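The renaming step can be scripted. This sketch assumes the chunk_NNN.wav files left behind by the modified audiobook script are still in the current directory and that their zero-padded numbers sort in reading order; adjust the name pattern to taste.

```shell
# Rename chunk_NNN.wav files into player-friendly chapter names.
rename_chunks() {
  n=1
  for f in chunk_*.wav; do
    [ -e "$f" ] || continue   # no chunks generated yet: do nothing
    mv "$f" "$(printf '%02d - Chapter %d.wav' "$n" "$n")"
    n=$((n + 1))
  done
}
rename_chunks
```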

Option B: Metadata Chapters (Advanced)

If you want one single file with click-points, you must inject metadata after stitching.

Create a text file metadata.txt:

;FFMETADATA1
title=My AI Book
artist=Pocket TTS

[CHAPTER]
TIMEBASE=1/1000
START=0
END=60000
title=Chapter 1

[CHAPTER]
TIMEBASE=1/1000
START=60000
END=120000
title=Chapter 2

Run this command to combine them:

ffmpeg -i audiobook.mp3 -i metadata.txt -map_metadata 1 -codec copy audiobook_chapters.mp3
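The hard-coded 60000ms chapter boundaries above are just placeholders. Real START/END values can be computed from the chunk files with ffprobe (installed alongside ffmpeg). This sketch must run before the chunks are stitched and deleted; paste its output into metadata.txt after the title/artist lines.

```shell
# Emit [CHAPTER] blocks with real millisecond boundaries, one per chunk.
start=0
for f in chunk_*.wav; do
  [ -e "$f" ] || continue   # no chunks present: emit nothing
  dur_ms=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$f" \
    | awk '{printf "%d", $1 * 1000}')
  end=$((start + dur_ms))
  printf '[CHAPTER]\nTIMEBASE=1/1000\nSTART=%d\nEND=%d\ntitle=%s\n\n' "$start" "$end" "$f"
  start=$end
done
```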