# Copilot Instructions for Schloter Project

## Project Overview

This project automates the extraction of transcripts from a specific YouTube channel, analyzes them to extract quotes, stores those quotes in a SQLite database, and serves them via an API. A separate script can call this API daily to post a quote to a Microsoft Teams channel.

## Architecture & Key Components

- **YouTube Transcript Extractor** (`YouTube Transcript Extractor with line correction.py`):
  - Downloads German subtitles (manual preferred, auto-generated as fallback) for all videos in a channel using `yt-dlp`.
  - Parses `.vtt` files, splits the text into sentences, and saves them as `.txt` files.
- **Quote API** (`quotes_api.py`):
  - FastAPI app serving the `/quotes` endpoint.
  - Returns a random quote from the SQLite database, avoiding the last 20 quotes served.
- **Database**:
  - SQLite database (`quotes.db`) with a `quotes` table (`id`, `quote`).
- **Teams Integration** (planned):
  - A script will call the API and post the quote to a Teams channel (not yet implemented).

## Developer Workflows

- **Extracting Transcripts**: Run the transcript extractor script. It creates `.vtt` and `.txt` files in the `transcripts/` directory.
- **Populating the Quotes Database**: (Manual/scripted) Parse the `.txt` files and insert quotes into the `quotes` table in `quotes.db`.
- **Running the API**: Start with `uvicorn quotes_api:app --reload`.
- **Testing the API**: Call `GET /quotes` to receive a random quote.

## Conventions & Patterns

- Always prefer manual subtitles over auto-generated ones for accuracy.
- Quotes are stored one sentence per line in `.txt` files, then inserted into the database.
- The API avoids repeating the last 20 quotes by tracking served IDs in memory.
- All scripts assume the working directory is the project root.
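The non-repetition convention can be sketched as follows. This is a minimal illustration, not the actual code in `quotes_api.py`: the `quotes` table and its `id`/`quote` columns come from the project description, while the function name `get_random_quote` and the use of a `deque` are assumptions for the sketch.

```python
import random
import sqlite3
from collections import deque

# In-memory record of the last 20 quote IDs served, per the project convention.
RECENT_LIMIT = 20
recent_ids: deque = deque(maxlen=RECENT_LIMIT)

def get_random_quote(conn: sqlite3.Connection) -> tuple[int, str]:
    """Return a random (id, quote) row, avoiding recently served IDs.

    Falls back to the full table if fewer than RECENT_LIMIT + 1
    quotes exist, so the API never returns an empty result.
    """
    rows = conn.execute("SELECT id, quote FROM quotes").fetchall()
    candidates = [r for r in rows if r[0] not in recent_ids] or rows
    quote_id, text = random.choice(candidates)
    recent_ids.append(quote_id)  # oldest ID drops out automatically at maxlen
    return quote_id, text
```

A FastAPI endpoint extending this would simply call `get_random_quote` inside the `/quotes` handler; keeping the tracking in module state matches the "in memory" convention above.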
## External Dependencies

- `yt-dlp` (subtitle download, called via a Python subprocess)
- `ffmpeg` (binary required for best `yt-dlp` results)
- `fastapi`, `uvicorn` (API)
- `sqlite3` (database; part of the Python standard library)

## Example Data Flow

1. The extractor downloads and parses subtitles → `.txt` files.
2. Quotes are loaded into `quotes.db`.
3. The API serves random quotes.
4. (Planned) A Teams bot posts a quote daily.

## Key Files

- `YouTube Transcript Extractor with line correction.py`: Transcript download and parsing logic.
- `quotes_api.py`: FastAPI app for serving quotes.
- `transcripts/`: Stores all subtitle and parsed text files.
- `quotes.db`: SQLite database of quotes.

## Tips for AI Agents

- When adding new extraction or analysis logic, follow the sentence-splitting and file-naming patterns of the extractor script.
- When extending the API, maintain the non-repetition logic for quotes.
- When adding Teams integration, retrieve quotes via the API endpoint.
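Step 2 of the data flow (loading quotes into `quotes.db`) is described as manual or scripted; a possible loader sketch is below. The `quotes` table schema, the one-sentence-per-line `.txt` format, and the `transcripts/` directory are from this document; the function name `load_quotes` is a hypothetical choice for illustration.

```python
import sqlite3
from pathlib import Path

def load_quotes(db_path: str, transcripts_dir: str) -> int:
    """Insert one quote per non-empty line from each .txt file into quotes.db.

    Returns the number of quotes inserted.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quotes "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, quote TEXT)"
    )
    count = 0
    for txt_file in sorted(Path(transcripts_dir).glob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            sentence = line.strip()
            if sentence:  # skip blank lines
                conn.execute("INSERT INTO quotes (quote) VALUES (?)", (sentence,))
                count += 1
    conn.commit()
    conn.close()
    return count
```

Run from the project root (per the working-directory convention), e.g. `load_quotes("quotes.db", "transcripts")`.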