Copilot Instructions for Schloter Project
Project Overview
This project automates the extraction of transcripts from a specific YouTube channel, analyzes them to extract quotes, stores those quotes in a SQLite database, and serves them via an API. A separate script can call this API daily to post a quote to a Microsoft Teams channel.
Architecture & Key Components
- YouTube Transcript Extractor (`YouTube Transcript Extractor with line correction.py`):
  - Downloads German subtitles (manual preferred, auto-generated fallback) for all videos in a channel using `yt-dlp`.
  - Parses `.vtt` files, splits text into sentences, and saves them as `.txt` files.
- Quote API (`quotes_api.py`):
  - FastAPI app serving the `/quotes` endpoint.
  - Returns a random quote from the SQLite database, avoiding the last 20 served quotes.
- Database:
  - SQLite database (`quotes.db`) with a `quotes` table (`id`, `quote`).
- Teams Integration (planned):
  - A script will call the API and post the quote to a Teams channel (not yet implemented).
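
The extractor's parsing step (strip `.vtt` cue metadata, then split the text into sentences) might look roughly like the sketch below. The function name `vtt_to_sentences`, the regular expressions, and the skipping of repeated caption lines are assumptions for illustration; the actual script's "line correction" logic may differ.

```python
import re

# Matches WebVTT cue timing lines such as "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(r"^\d{2}:\d{2}(?::\d{2})?\.\d{3}\s+-->\s+")
# Inline tags such as <c> or <00:00:01.500>, common in auto-generated captions.
TAG = re.compile(r"<[^>]+>")


def vtt_to_sentences(vtt_text: str) -> list[str]:
    """Strip WebVTT metadata and return the caption text split into sentences."""
    lines: list[str] = []
    for raw in vtt_text.splitlines():
        line = TAG.sub("", raw).strip()
        if not line or line.startswith("WEBVTT") or TIMING.match(line):
            continue
        if line.startswith(("Kind:", "Language:", "NOTE")):
            continue
        # Assumption: rolling captions repeat the previous line; skip exact repeats.
        if lines and lines[-1] == line:
            continue
        lines.append(line)
    text = " ".join(lines)
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]
```

Writing one returned sentence per line to a `.txt` file would then match the storage convention used elsewhere in this project.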
Developer Workflows
- Extracting Transcripts: Run the transcript extractor script. It will create `.vtt` and `.txt` files in the `transcripts/` directory.
- Populating Quotes Database: (Manual/Scripted) Parse `.txt` files and insert quotes into the `quotes` table in `quotes.db`.
- Running the API: Start with `uvicorn quotes_api:app --reload`.
- Testing the API: Call `GET /quotes` to receive a random quote.
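
Since populating the database is described as manual or scripted, a minimal loader could look like the following. This is a sketch, not the project's actual script: the function name `load_quotes` and the column types are assumptions (the instructions only state that the `quotes` table has `id` and `quote` columns).

```python
import sqlite3
from pathlib import Path


def load_quotes(db_path: str, transcripts_dir: str) -> int:
    """Insert every non-empty line of each .txt file as one quote; return the count."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumed schema: only the column names (id, quote) come from the docs.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS quotes "
            "(id INTEGER PRIMARY KEY, quote TEXT NOT NULL)"
        )
        inserted = 0
        for txt_file in sorted(Path(transcripts_dir).glob("*.txt")):
            for line in txt_file.read_text(encoding="utf-8").splitlines():
                sentence = line.strip()
                if sentence:
                    conn.execute(
                        "INSERT INTO quotes (quote) VALUES (?)", (sentence,)
                    )
                    inserted += 1
        conn.commit()
        return inserted
    finally:
        conn.close()
```

Run from the project root, a call such as `load_quotes("quotes.db", "transcripts")` would load every parsed transcript. Note that running it twice inserts duplicates, since this sketch does no deduplication.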
Conventions & Patterns
- Always prefer manual subtitles over auto-generated for accuracy.
- Quotes are stored as one sentence per line in `.txt` files, then inserted into the database.
- The API avoids repeating the last 20 quotes by tracking IDs in memory.
- All scripts assume the working directory is the project root.
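
The non-repetition convention could be implemented with a bounded in-memory queue of recently served IDs. The sketch below uses only the standard library; `pick_quote`, `recent_ids`, and the fallback for small tables are illustrative assumptions, not necessarily how `quotes_api.py` does it.

```python
import random
import sqlite3
from collections import deque

RECENT_LIMIT = 20  # the "last 20 quotes" window
# In-memory only: the history is lost whenever the process restarts.
recent_ids: deque = deque(maxlen=RECENT_LIMIT)


def pick_quote(conn: sqlite3.Connection):
    """Return (id, quote) for a random, not recently served quote, or None if empty."""
    rows = conn.execute("SELECT id, quote FROM quotes").fetchall()
    if not rows:
        return None
    candidates = [row for row in rows if row[0] not in recent_ids]
    # Assumption: if every ID is in the recent window, fall back to all rows.
    chosen = random.choice(candidates or rows)
    recent_ids.append(chosen[0])  # the oldest ID drops out automatically
    return chosen
```

A `/quotes` handler could call `pick_quote` and return the result as JSON. Because the history lives in process memory, running several API workers would give each worker its own window.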
External Dependencies
- `yt-dlp` (for subtitle download, called via Python subprocess)
- `ffmpeg` (binary required for best `yt-dlp` results)
- `fastapi`, `uvicorn` (for API)
- `sqlite3` (for database)
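
As an illustration of calling `yt-dlp` through `subprocess`, the sketch below builds a subtitle-only command. The helper names, the output template, and the channel URL in the usage note are hypothetical. Passing both `--write-subs` and `--write-auto-subs` is assumed here to give the "manual preferred, auto-generated fallback" behavior; the real extractor may handle the fallback differently.

```python
import subprocess


def build_subtitle_command(channel_url: str, out_dir: str = "transcripts") -> list[str]:
    """Build a yt-dlp command that fetches German .vtt subtitles without the video."""
    return [
        "yt-dlp",
        "--skip-download",         # subtitles only, no media files
        "--write-subs",            # manually created subtitles
        "--write-auto-subs",       # auto-generated captions as well
        "--sub-langs", "de",
        "--sub-format", "vtt",
        "-o", f"{out_dir}/%(id)s.%(ext)s",  # hypothetical naming scheme
        channel_url,
    ]


def download_subtitles(channel_url: str, out_dir: str = "transcripts") -> None:
    """Run yt-dlp; raises CalledProcessError if the command exits with an error."""
    subprocess.run(build_subtitle_command(channel_url, out_dir), check=True)
```

For example, `download_subtitles("https://www.youtube.com/@example/videos")` (a placeholder URL) would write `.vtt` files into `transcripts/` when run from the project root.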
Example Data Flow
- Extractor downloads and parses subtitles → `.txt` files.
- Quotes are loaded into `quotes.db`.
- API serves random quotes.
- (Planned) Teams bot posts a quote daily.
Key Files
- `YouTube Transcript Extractor with line correction.py`: Transcript download and parsing logic.
- `quotes_api.py`: FastAPI app for serving quotes.
- `transcripts/`: Stores all subtitle and parsed text files.
- `quotes.db`: SQLite database of quotes.
Tips for AI Agents
- When adding new extraction or analysis logic, follow the pattern of sentence splitting and file naming in the extractor script.
- When extending the API, maintain the non-repetition logic for quotes.
- If adding Teams integration, use the API endpoint for quote retrieval.
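
Because the Teams integration is not yet implemented, the following is only a starting-point sketch under stated assumptions: that `GET /quotes` returns JSON containing a `quote` field, and that the target channel exposes a webhook URL accepting a JSON body with a `text` field. Neither is confirmed by this document, and all function names are hypothetical.

```python
import json
import urllib.request


def extract_quote(api_body: bytes) -> str:
    """Read the quote text from the API response (assumed shape: {"quote": "..."})."""
    return json.loads(api_body)["quote"]


def build_webhook_request(webhook_url: str, quote: str) -> urllib.request.Request:
    """Prepare a JSON POST for the webhook (assumed payload: {"text": "..."})."""
    payload = json.dumps({"text": quote}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_daily_quote(api_url: str, webhook_url: str) -> None:
    """Fetch one quote from the API and forward it to the webhook."""
    with urllib.request.urlopen(api_url, timeout=10) as resp:
        quote = extract_quote(resp.read())
    request = build_webhook_request(webhook_url, quote)
    with urllib.request.urlopen(request, timeout=10):
        pass  # a non-2xx response raises urllib.error.HTTPError
```

Fetching through the API rather than reading `quotes.db` directly keeps the non-repetition logic in one place. A scheduler such as cron could call `post_daily_quote` once per day.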