Local Pocket TTS 260124

From Game in the Brain Wiki
Latest revision as of 12:20, 7 February 2026

🎙️ Local TTS Wiki: Pocket TTS on Ubuntu

Notice: This tutorial is based on the guide by 1littlecoder: This is the ONLY CPU TTS you need in 2026 (https://www.youtube.com/watch?v=KISuRY1WXSs)

🛠️ System Requirements

This guide is specifically optimized for the following stack:

GPU: AMD Radeon RX 7600 (RDNA3 / Navi 33)

OS: Ubuntu 24.04 LTS (Noble Numbat)

Platform: ROCm 6.x

🔍 Phase 0: Pre-Flight Compatibility Check

Before you begin, run these commands to ensure your hardware is ready.

1. Check CPU Support

Pocket TTS requires AVX2 instructions for real-time CPU performance.

lscpu | grep -i "avx2"
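The same check can be scripted, which is handy if you want a setup script to stop early on unsupported CPUs. This is a minimal sketch, assuming an x86 Linux system where /proc/cpuinfo lists features on a "flags" line; the helper names cpu_flags and has_avx2 are made up for illustration.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from /proc/cpuinfo-style text (first 'flags' line)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx2(cpuinfo_text: str) -> bool:
    """True only if 'avx2' appears as a whole flag (plain 'avx' does not count)."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        text = cpuinfo.read_text(encoding="utf-8", errors="replace")
        print("AVX2 supported" if has_avx2(text) else "AVX2 not found")
    else:
        print("/proc/cpuinfo not available on this system")
```

Matching whole flags avoids the false positive a loose substring search could give on similarly named features.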

2. Verify GPU Detection

Ensure your RX 7600 is visible to the system.

lspci -nn | grep -i vga

3. Check Kernel Version

AMD RDNA3 cards (RX 7600) require Kernel 6.2 or higher for stable support.

uname -r
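Comparing version strings by eye is easy to get wrong (6.10 is newer than 6.2), so here is a small sketch of the comparison done numerically. The 6.2 minimum comes from this guide; the function names are hypothetical.

```python
import platform
import re


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a `uname -r` string such as '6.8.0-41-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"Unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str, minimum: tuple = (6, 2)) -> bool:
    """Compare numerically, so that 6.10 counts as newer than 6.2."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r` on Linux
    try:
        verdict = "OK" if kernel_is_supported(release) else "too old for RDNA3"
    except ValueError:
        verdict = "could not be parsed"
    print(f"Kernel {release}: {verdict}")
```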

🛠️ Phase 1: Driver & ROCm Setup

The RX 7600 uses the gfx1102 architecture.

1. Install AMD ROCm 6.2

sudo apt update

# Download the installer for Ubuntu 24.04
wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
sudo apt install ./amdgpu-install_6.2.60204-1_all.deb

# Install ROCm and Graphics components
sudo amdgpu-install --usecase=rocm,graphics --no-dkms

2. Hardware Access Permissions

Check if your user already has the necessary hardware permissions:

groups | grep -E 'render|video'

If the output shows render and video: Skip to Phase 2.

If the output is empty: Run the command below, then log out and back in.

sudo usermod -a -G render,video $LOGNAME
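To confirm the group change from a script (for example, before launching the server), the membership check can be written as below. This is a sketch using only the standard library on Linux; REQUIRED_GROUPS mirrors the two groups named in this guide, and the function names are illustrative. Like the groups command, it reflects the current login session, so newly added groups only show up after logging out and back in.

```python
import grp
import os

REQUIRED_GROUPS = ("render", "video")  # groups named in this guide


def current_group_names() -> set:
    """Names of all groups the current process belongs to."""
    names = set()
    for gid in {*os.getgroups(), os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A group ID without a name entry; nothing to report for it.
            pass
    return names


def missing_groups(current: set, required=REQUIRED_GROUPS) -> list:
    """Return the required groups that are absent, in the order given."""
    return [name for name in required if name not in current]


if __name__ == "__main__":
    missing = missing_groups(current_group_names())
    if missing:
        print("Missing groups:", ", ".join(missing))
    else:
        print("render and video access is in place")
```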

📦 Phase 2: Pocket TTS Installation

1. Install uv (Fast Package Manager)

curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
source ~/.bashrc

2. Environment Setup

# Create and enter virtual environment
uv venv tts_env
source tts_env/bin/activate

# Install ROCm-specific PyTorch for AMD GPUs
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
uv pip install pocket-tts

🚀 Phase 3: Launching the Service

Standard Launch

By default, the server runs on port 8000. Use a unique port (like 5821) to avoid conflicts.

uvx pocket-tts serve --port 5821
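If you are unsure whether a port is already taken, you can test it before launching. The sketch below tries to bind the port on localhost; the candidate ports are arbitrary examples and the helper names are hypothetical, not part of Pocket TTS.

```python
import socket
from typing import Iterable, Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate port that is free, or None if all are busy."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    port = first_free_port([5821, 5822, 5823])
    print(f"Free port: {port}" if port else "All candidate ports are busy")
```

The result is only a snapshot: another program could still claim the port between the check and the server start.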

RX 7600 Hardware Override

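If the server starts but uses the CPU instead of the GPU, check two things. First, uvx runs pocket-tts in its own isolated environment, so it does not load the ROCm build of PyTorch installed into tts_env in Phase 2; running pocket-tts serve --port 5821 from the activated environment does. Second, set this environment variable so that ROCm treats the RX 7600 (gfx1102) as a gfx1100 device: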

export HSA_OVERRIDE_GFX_VERSION=11.0.0
uvx pocket-tts serve --port 5821

🧪 Phase 4: Verification

Create a script named check_gpu.py to confirm hardware acceleration is active:

import torch

# ROCm builds of PyTorch report the GPU through the torch.cuda API.
print(f"Is ROCm/GPU available? {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device Name: {torch.cuda.get_device_name(0)}")
else:
    print("Warning: Running on CPU. Check Phase 3 for the RX 7600 Override.")

Run it with: python check_gpu.py

⚠️ Common Failure Steps & Fixes

Symptom | Fix
amdgpu-lib32 unmet dependencies | Run the 32-bit architecture fix (see below).
Permission Denied on /dev/kfd | Ensure you added groups and restarted your session.
Address already in use | Use the --port 5821 flag.
HSA_STATUS_ERROR_OUT_OF_RESOURCES | Close other GPU-heavy apps (Browsers/Games).

Detailed Fix: amdgpu-lib32 Unmet Dependencies

If you encounter errors regarding :i386 packages, enable the 32-bit architecture:

sudo dpkg --add-architecture i386
sudo apt update
sudo amdgpu-install --usecase=rocm,graphics --no-dkms

🎛️ Usage & Automation Guide

🔘 Create an Ubuntu App Button

Create a launcher so you can start the server from your App Grid or Dock without typing commands every time.

1. Create the Desktop Entry

Open your terminal and run this block of code (copy and paste the whole thing):

cat <<EOF > ~/.local/share/applications/pocket-tts.desktop
[Desktop Entry]
Name=Pocket TTS Server
Comment=Start local TTS with RX 7600 override
Exec=bash -c "export HSA_OVERRIDE_GFX_VERSION=11.0.0; $HOME/.local/bin/uvx pocket-tts serve --port 5821; read -p 'Server stopped. Press Enter to close...'"
Icon=audio-headphones
Terminal=true
Type=Application
Categories=Utility;Audio;
EOF

2. Activate It

Run chmod +x ~/.local/share/applications/pocket-tts.desktop

Press the Super (Windows) Key and search for Pocket TTS.

Right-click the icon and select Pin to Dash.

Note: When you click the icon, a terminal window will open. Minimize it to keep the server running. Close it to stop the server.

🛑 How to Stop Speaking

Since this is a streaming server, it will keep generating audio until finished. To interrupt it:

If using the Web Interface: Refresh the page (F5 or Ctrl+R). This immediately cuts the connection.

If using the Terminal: Click inside the terminal window running the server and press Ctrl+C. (You will need to restart the server to speak again).

🎨 How to Tweak the Tone

The AI reads purely based on text input. You can manipulate the "acting" using these methods:

Change the Voice

Use the --voice flag in the command line or select it in the UI.

kyle: Authoritative, standard American male.

fme: Softer, faster female.

af_heart: Often used for audiobooks, generally expressive.

"Prompt Engineering" with Punctuation

The AI breathes and pauses based on grammar.

Excitement/Anger: Use exclamation marks and short sentences! Like this!

Whispering/Sadness: Use lowercase text... with ellipses... and no hard stops...

Speed: Remove commas to make the sentence run on and feel rushed or panicked.

Pauses: Use [pause] (if supported) or simply add ... or -- to force a breath.

🧵 Script: Stitching Files (Audiobook Maker)

To turn a long text file into a single MP3, use this Python script. It breaks text into paragraphs, generates audio for each, and stitches them together with ffmpeg.

Prerequisites: sudo apt install ffmpeg

The Script (make_audiobook.py):

import os
import shutil
import subprocess
import sys

# --- CONFIGURATION ---
INPUT_FILE = "my_book.txt"         # Your text file
OUTPUT_FILENAME = "audiobook.mp3"  # Final result
VOICE = "af_heart"                 # Change to your preferred voice
GPU_OVERRIDE = "11.0.0"            # For RX 7600
# ---------------------

def install_check():
    # Abort early if ffmpeg is not on the PATH.
    if shutil.which("ffmpeg") is None:
        print("Error: ffmpeg is not installed. Run 'sudo apt install ffmpeg'")
        sys.exit(1)

def read_text(file_path):
    # Split the book into paragraphs on blank lines.
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read().split('\n\n')

def generate_chunk(text, index):
    # Generate one WAV file per paragraph, with the RX 7600 override set.
    filename = f"chunk_{index:03d}.wav"
    print(f"Generating Part {index}...")
    cmd = ["uvx", "pocket-tts", "generate", "--text", text, "--voice", VOICE, "--out", filename]
    env = os.environ.copy()
    env["HSA_OVERRIDE_GFX_VERSION"] = GPU_OVERRIDE
    subprocess.run(cmd, env=env, check=True)
    return filename

def stitch_files(file_list):
    # Write the list file read by ffmpeg's concat demuxer.
    with open("files.txt", "w") as f:
        for audio in file_list:
            f.write(f"file '{audio}'\n")

    print("Stitching files together...")
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "files.txt", "-c:a", "libmp3lame", "-q:a", "2", OUTPUT_FILENAME], check=True)

    # Cleanup (only reached if ffmpeg succeeded)
    for audio in file_list:
        os.remove(audio)
    os.remove("files.txt")


if __name__ == "__main__":
    install_check()
    paragraphs = read_text(INPUT_FILE)
    generated_files = []
    # Number chunks from 1 so the first file is chunk_001.wav.
    for i, p in enumerate(paragraphs, start=1):
        if p.strip():
            try:
                generated_files.append(generate_chunk(p, i))
            except Exception as e:
                print(f"Failed on chunk {i}: {e}")
    stitch_files(generated_files)
    print(f"Done! Saved as {OUTPUT_FILENAME}")

📖 How to Embed Chapters & Stops

To make the file navigable (Next/Previous Chapter support), you have two options.

Option A: Separate Files (Easiest)

Instead of stitching them into one big MP3, keep the files separate. Modify the script above to remove the stitch_files function call at the end. You will end up with chunk_001.wav, chunk_002.wav, etc. Rename these to 01 - Chapter 1.wav, etc., and put them in a folder on your phone.
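Renaming by hand gets tedious for a long book, so the step can be automated. The following is a rough sketch, assuming every chunk_*.wav file in a folder is one chapter and that the zero-padded names sort in reading order; chapter_filename and rename_chunks are illustrative names.

```python
from pathlib import Path


def chapter_filename(number: int, suffix: str = ".wav") -> str:
    """Build a player-friendly name such as '01 - Chapter 1.wav'."""
    return f"{number:02d} - Chapter {number}{suffix}"


def rename_chunks(folder: Path) -> list:
    """Rename chunk_*.wav files in sorted order and return the new paths."""
    renamed = []
    chunks = sorted(folder.glob("chunk_*.wav"))
    for number, chunk in enumerate(chunks, start=1):
        target = chunk.with_name(chapter_filename(number))
        chunk.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    for path in rename_chunks(Path(".")):
        print(path.name)
```

Chapters are numbered by position, so a gap in the chunk numbers (for example, a skipped empty paragraph) does not leave a gap in the chapter numbers.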

Option B: Metadata Chapters (Advanced)

If you want one single file with click-points, you must inject metadata after stitching.

Create a text file metadata.txt:

;FFMETADATA1
title=My AI Book
artist=Pocket TTS

[CHAPTER]
TIMEBASE=1/1000
START=0
END=60000
title=Chapter 1

[CHAPTER]
TIMEBASE=1/1000
START=60000
END=120000
title=Chapter 2

Run this command to combine them:

ffmpeg -i audiobook.mp3 -i metadata.txt -map_metadata 1 -codec copy audiobook_chapters.mp3
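Writing START and END values by hand is error-prone once a book has more than a few chapters. As a sketch, the metadata file can be generated from a list of chapter titles and durations in milliseconds. It assumes the chapters play back to back starting at 0, and that durations are read from uncompressed PCM WAV files; build_ffmetadata and wav_duration_ms are hypothetical helper names.

```python
import wave


def escape(value: str) -> str:
    """Backslash-escape the characters that are special in FFMETADATA files."""
    for char in ("\\", "=", ";", "#", "\n"):
        value = value.replace(char, "\\" + char)
    return value


def wav_duration_ms(path) -> int:
    """Duration of a PCM WAV file in milliseconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wav:
        return round(1000 * wav.getnframes() / wav.getframerate())


def build_ffmetadata(title: str, artist: str, chapters) -> str:
    """chapters is a list of (chapter_title, duration_ms) in playback order."""
    lines = [";FFMETADATA1", f"title={escape(title)}", f"artist={escape(artist)}"]
    start = 0
    for chapter_title, duration_ms in chapters:
        end = start + duration_ms
        lines += [
            "",
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={escape(chapter_title)}",
        ]
        start = end  # the next chapter begins where this one ends
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_ffmetadata("My AI Book", "Pocket TTS",
                            [("Chapter 1", 60000), ("Chapter 2", 60000)])
    print(text, end="")
```

Save the returned text as metadata.txt before running the ffmpeg command. Measure the durations before the chunk files are deleted, since the stitching script removes them after a successful merge.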