Resume Renamer 260120
1. The Problem
Students and applicants rarely follow file naming conventions. You likely have a folder that looks like this:
Resume.pdf
CV_Final_v2.docx
MyResume(1).pdf
john_doe.pdf
This makes sorting by date or qualification impossible without opening every single file.
The Goal: Automatically rename these files based on their content to a standard format:
- YYMMDD Name Degree/Background.pdf
- Example: 250101 Juan Dela Cruz BS Information Technology.pdf
2. Requirements Checklist
Please ensure you have the following ready before starting.
[ ] Ubuntu 24.04 System.
[ ] Python 3.12+ (Pre-installed on Ubuntu 24.04).
[ ] Ollama installed locally (The AI engine).
[ ] A Small Language Model pulled (e.g., granite3.3:2b or llama3.2).
- Note: Small models are fast but can make mistakes. The script has logic to catch these, but a human review is always recommended.
[ ] Python Libraries: pdfplumber (for PDFs), python-docx (for Word), requests (to talk to Ollama).
[ ] No Images: The files must have embedded text. This script excludes OCR (Optical Character Recognition) to keep it fast and lightweight. Pure image scans will be skipped.
3. How the Script Works (The Logic)
This script acts as a "Project Manager" that hires two distinct specialists to process each file. It does not ask the AI for everything, because small models are unreliable with arithmetic and dates.
File Discovery:
- The script looks for .pdf and .docx files in the folder where the script is located.
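The discovery step can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name find_resume_files is hypothetical, and matching extensions case-insensitively (so Resume.PDF is also found) is an assumption.

```python
from pathlib import Path

# Extensions the guide says the script handles.
SUPPORTED_EXTENSIONS = {".pdf", ".docx"}


def find_resume_files(folder):
    """Return the .pdf and .docx files directly inside `folder`, sorted by path."""
    folder = Path(folder)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Inside the real script, the folder would typically be Path(__file__).resolve().parent, which is the directory the script itself sits in.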
Text Extraction:
- It pulls the raw text. If the text is shorter than 50 characters (likely an image scan), it skips the file.
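A rough sketch of the Word-file reader and the 50-character check is shown below. The names extract_docx_text and has_enough_text are hypothetical, and stripping whitespace before counting is an assumption. The python-docx import is placed inside the function so the length check can be tried without the library installed.

```python
MIN_TEXT_CHARS = 50  # threshold described in the step above


def extract_docx_text(filepath):
    """Return the paragraph text of a .docx file, one paragraph per line."""
    # Lazy import: the "docx" module is provided by the python-docx package.
    import docx

    document = docx.Document(filepath)
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


def has_enough_text(text, min_chars=MIN_TEXT_CHARS):
    """True if the extracted text is long enough to be worth processing."""
    # Assumption: surrounding whitespace does not count toward the threshold.
    return len(text.strip()) >= min_chars
```

Note that document.paragraphs covers body paragraphs only. A resume laid out entirely in tables may come back nearly empty and be skipped by this check.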
The Date Specialist (Python Regex):
- Logic: It scans the text for explicit years (e.g., "2023", "2024").
- Rule: It ignores the word "Present". Why? If a resume from 2022 says "2022 - Present", treating "Present" as today's year (2026) would give the old resume the wrong date. The script uses the highest explicitly printed year instead.
- Output: Sets the date to Jan 1st of the highest year found (e.g., 240101).
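The date rule above could look roughly like this. It is a sketch under stated assumptions rather than the script's exact code: the name date_prefix_from_text is hypothetical, and limiting matches to the years 1900 to 2099 is an assumption made to avoid picking up phone numbers and postal codes.

```python
import re

# Four-digit years from 1900 to 2099 that are not part of a longer digit run.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def date_prefix_from_text(text):
    """Return 'YY0101' for the highest year printed in `text`, or None if none is found."""
    years = [int(match) for match in YEAR_PATTERN.findall(text)]
    if not years:
        return None
    # The word "Present" is never matched, so only printed numbers count.
    return f"{max(years) % 100:02d}0101"


print(date_prefix_from_text("Intern, 2021 - 2022. Developer, 2022 - Present"))  # 220101
```

One limitation of this sketch: a future year printed in the text, such as an expected graduation year, would win. Capping at the current year is a possible refinement.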
The Content Specialist (Ollama AI):
- Logic: It sends the text to the local AI with strict instructions.
- Rule 1 (Priority): It looks for a Degree (e.g., "BS IT") first. It is forbidden from using "Intern" or "Student" if a degree is found.
- Rule 2 (Fallback): If the AI fails to find a name, the script grabs the first line of the document as a fallback.
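The two rules above might be wired together as in the sketch below. Everything here is illustrative: the prompt wording, the JSON reply shape, and the helper names (build_prompt, parse_reply, ask_ollama) are assumptions, not the script's actual code. The request goes to Ollama's local /api/generate endpoint on the default port 11434, with the 60-second timeout mentioned in the troubleshooting section.

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(resume_text, max_chars=3000):
    """Build a strict prompt asking for a JSON reply (wording is illustrative)."""
    return (
        "Extract the applicant's full name and degree from the resume below.\n"
        "Prefer a degree (e.g. 'BS Information Technology'). Do not answer "
        "'Intern' or 'Student' if a degree is present.\n"
        'Reply with JSON only: {"name": "...", "degree": "..."}\n\n'
        f"RESUME:\n{resume_text[:max_chars]}"
    )


def first_line_fallback(resume_text):
    """Return the first non-empty line of the document, or '' if there is none."""
    for line in resume_text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def parse_reply(reply_text, resume_text):
    """Parse the model's reply; fall back to the first line if no name came back."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    name = str(data.get("name") or "").strip() or first_line_fallback(resume_text)
    degree = str(data.get("degree") or "").strip()
    return name, degree


def ask_ollama(resume_text, model="granite3.3:2b", timeout=60):
    """Send the resume text to the local model and return (name, degree)."""
    import requests  # lazy import; installed with pip in the setup steps

    payload = {
        "model": model,
        "prompt": build_prompt(resume_text),
        "stream": False,   # one complete JSON response instead of a stream
        "format": "json",  # ask Ollama to constrain the reply to JSON
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=timeout)
    response.raise_for_status()
    return parse_reply(response.json().get("response", ""), resume_text)
```

Keeping the parsing and the fallback in plain Python means a malformed reply from a small model degrades to the first-line fallback instead of crashing the run.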
Sanitization & Renaming:
- It fixes "Spaced Names" (e.g., J O H N -> John).
- It ensures the filename isn't too long.
- It renames the file only if the name doesn't already exist.
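The clean-up steps above can be sketched like this. The function names, the 100-character limit, and the set of characters treated as unsafe are all assumptions chosen for illustration. The spaced-name rule here only fires on runs of three or more single letters, so initials such as "J P Morgan" are left alone.

```python
import re
from pathlib import Path

# Three or more single letters separated by single spaces, e.g. "J O H N".
_SPACED_LETTERS = re.compile(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b")
# Characters that are unsafe or awkward in filenames (assumed list).
_UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def collapse_spaced_letters(text):
    """Turn 'J O H N' into 'John'; leave normally spaced words untouched."""
    return _SPACED_LETTERS.sub(
        lambda m: m.group(0).replace(" ", "").capitalize(), text
    )


def build_filename(date_prefix, name, degree, extension, max_stem_chars=100):
    """Assemble 'YYMMDD Name Degree' plus extension, cleaned and length-limited."""
    parts = (date_prefix, collapse_spaced_letters(name), degree)
    stem = " ".join(part for part in parts if part)
    stem = _UNSAFE_CHARS.sub(" ", stem)
    stem = re.sub(r"\s+", " ", stem).strip()
    return stem[:max_stem_chars].rstrip() + extension


def rename_if_free(source, new_name):
    """Rename `source` within its folder unless `new_name` is already taken."""
    source = Path(source)
    target = source.with_name(new_name)
    if target.exists():
        return None  # never overwrite an existing file
    source.rename(target)
    return target


print(build_filename("250101", "Juan Dela Cruz", "BS Information Technology", ".pdf"))
```

Returning None when the target exists lets the caller log the collision and move on, which matches the "only if the name doesn't already exist" rule.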
4. Installation Guide (Ubuntu 24.04)
Open your terminal (Ctrl+Alt+T) and follow these steps exactly.
Step A: System Update
Ensure your system tools are fresh to avoid installation conflicts.
sudo apt update && sudo apt upgrade -y
Step B: Install Ollama & The Model
Install the Ollama Engine:
curl -fsSL https://ollama.com/install.sh | sh
Download the Brain (The Model):
- We use granite3.3:2b because it is very fast.
ollama pull granite3.3:2b
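To confirm the model is actually available, you can run ollama list in the terminal, or query the local server from Python. The sketch below assumes Ollama is running on its default port (11434) and uses its /api/tags endpoint; the helper names are hypothetical.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # lists locally installed models


def model_names(tags_payload):
    """Extract model names from the JSON returned by /api/tags."""
    return [model.get("name", "") for model in tags_payload.get("models", [])]


def is_model_available(wanted, names):
    """True if `wanted` matches an installed model, with or without its tag."""
    return any(n == wanted or n.split(":")[0] == wanted for n in names)


def fetch_installed_models(url=TAGS_URL, timeout=5):
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_names(json.load(response))
```

For example, is_model_available("granite3.3:2b", fetch_installed_models()) should be true after the pull above. A connection error from fetch_installed_models usually means the Ollama service is not running.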
Step C: Setup Python Environment
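Ubuntu 24.04 marks the system Python as externally managed, so pip refuses to install packages system-wide. Install the libraries inside a Virtual Environment (venv) instead. If the venv command below fails, install the module first with: sudo apt install python3-venv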
Create a Project Folder:
mkdir ~/resume_renamer && cd ~/resume_renamer
Create the Virtual Environment:
python3 -m venv venv
Activate the Environment:
source venv/bin/activate
- (You should see (venv) at the start of your command line now).
Install Required Libraries:
pip install requests pdfplumber python-docx
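A quick way to confirm that the three libraries landed in the active environment is a small import check like the one below. This is an optional sketch and the function name missing_packages is hypothetical. Note that python-docx is imported under the module name docx.

```python
import importlib.util

# Maps the importable module name to the name used with pip install.
REQUIRED_MODULES = {
    "requests": "requests",
    "pdfplumber": "pdfplumber",
    "docx": "python-docx",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of required packages that cannot be imported."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If anything is reported missing, check that (venv) is showing in your prompt and rerun the pip install command.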
Step D: Create the Script
Create the python file:
nano rename_resumes.py
Paste the Python code provided in the appendix below.
Save and exit: Press Ctrl+O, Enter, then Ctrl+X.
5. Running the Renamer
This script is portable. It works on the files sitting next to it.
Copy the Script: Copy rename_resumes.py into the folder that holds the resumes (e.g., ~/Documents/Student_CVs).
Open Terminal in that folder:
cd ~/Documents/Student_CVs
Activate your Python Environment (it lives in the project folder created in Step C, not in the resume folder):
source ~/resume_renamer/venv/bin/activate
Run the script:
python3 rename_resumes.py
6. Common Errors & Troubleshooting
| Error / Behavior | Why it happens | The Fix (Included in Script) |
|---|---|---|
| "Intern" instead of "Degree" | The Resume had "INTERN" in big bold letters. | The script's prompt explicitly forbids "Intern" if a Degree is found. |
| Wrong Date (e.g., 260101) | The resume said "2021-Present" and the script assumed "Present" = 2026. | We disabled "Present" logic. It now only trusts explicit numbers (e.g., 2021). |
| Spaced Names (J O H N) | PDF formatting added spaces between letters. | A Regex function detects single letters + spaces and collapses them. |
| Script Freezes | Ollama is overwhelmed. | We added a 60-second timeout and a 0.5s pause between files. |
| Skipped Files | The PDF is a scanned image (no text). | This is intended. You need an OCR tool for these (not included here). |
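The pause between files mentioned in the table could be implemented along these lines. This is a sketch only: process_all and handle_file are hypothetical names, and catching every exception per file (so one bad resume or a timed-out request does not stop the batch) is an assumption about the desired behavior.

```python
import time


def process_all(paths, handle_file, pause_seconds=0.5):
    """Call handle_file(path) for each path, pausing between files.

    Returns the list of paths whose handler raised an exception.
    """
    failed = []
    for index, path in enumerate(paths):
        if index > 0:
            time.sleep(pause_seconds)  # give Ollama a moment between requests
        try:
            handle_file(path)
        except Exception as error:  # e.g. a request timeout or a corrupt PDF
            print(f"  [ERROR] {path}: {error}")
            failed.append(path)
    return failed
```

The returned list gives you a short set of files to review by hand after the run.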
Appendix: The Python Script
Copy the code below into rename_resumes.py.
import pdfplumber  # third-party library, installed with pip in Step C


# --- IMPROVED FUNCTION: SMART PDF READER (Skips Forms & Signature Pages) ---
def get_smart_pdf_text(filepath):
    """
    Reads PDF pages but SKIPS pages that look like 'Application Forms'.
    Returns the text of the first 2 'valid' resume pages found.
    """
    valid_text = ""
    pages_read = 0
    # Phrases that indicate a page is a FORM, not a Resume
    skip_phrases = [
        "APPLICATION FOR EMPLOYMENT",
        "OFFICIAL USE ONLY",
        "DO NOT WRITE BELOW THIS LINE",
        "PERSONAL DATA SHEET",
        "APPLICANT'S SIGNATURE",  # Found on Page 2 of your file
        "FAMILY BACKGROUND",      # Found on Page 2 of your file
    ]
    try:
        with pdfplumber.open(filepath) as pdf:
            for page in pdf.pages:
                # extract_text() can return None for pages without text
                text = page.extract_text() or ""
                # CHECK: Is this page just a form?
                # We check if ANY of the skip phrases appear in the text
                upper_text = text.upper()
                is_form = any(phrase in upper_text for phrase in skip_phrases)
                if is_form:
                    print(" [INFO] Skipped a 'Form' page (found key phrase)...")
                    continue  # Skip this page, check the next one
                # If not a form, it's likely the resume. Keep it.
                valid_text += text + "\n"
                pages_read += 1
                # Stop after finding 2 valid pages of resume content
                if pages_read >= 2:
                    break
    except Exception as e:
        print(f" [ERROR] PDF Read Error: {e}")
        return ""
    return valid_text
# --------------------------------------