# Copilot Instructions for Schloter Project
## Project Overview
This project automates the extraction of transcripts from a specific YouTube channel, analyzes them to extract quotes, stores those quotes in a SQLite database, and serves them via an API. A separate script (planned) will call this API daily to post a quote to a Microsoft Teams channel.
## Architecture & Key Components
- **YouTube Transcript Extractor** (`YouTube Transcript Extractor with line correction.py`):
  - Downloads German subtitles (manual preferred, auto-generated as fallback) for all videos in a channel using `yt-dlp`.
  - Parses the `.vtt` files, splits the text into sentences, and saves them as `.txt` files (see the sketch at the end of this section).
- **Quote API** (`quotes_api.py`):
  - FastAPI app serving a `/quotes` endpoint.
  - Returns a random quote from the SQLite database, avoiding the 20 most recently served quotes.
- **Database**:
  - SQLite database (`quotes.db`) with a `quotes` table (columns: `id`, `quote`).
- **Teams Integration** (planned):
  - A script will call the API and post the quote to a Teams channel (not yet implemented; see the sketch at the end of this document).
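
A minimal sketch of the extractor's two steps, not the script itself. The channel URL, function names, and the cue-filtering details are assumptions based on the description above; the `yt-dlp` flags are real, and with both subtitle flags set, `yt-dlp` prefers manual subtitles where available.

```python
import re
import subprocess
from pathlib import Path

CHANNEL_URL = "https://www.youtube.com/@example"  # placeholder, not the real channel

def download_subtitles(channel_url: str, out_dir: str = "transcripts") -> None:
    """Download German subtitles for every video in the channel via yt-dlp."""
    subprocess.run(
        [
            "yt-dlp",
            "--skip-download",      # subtitles only, no video files
            "--write-subs",         # manual subtitles (preferred)
            "--write-auto-subs",    # auto-generated fallback
            "--sub-langs", "de",
            "--sub-format", "vtt",
            "-o", f"{out_dir}/%(title)s.%(ext)s",
            channel_url,
        ],
        check=True,
    )

def vtt_to_sentences(vtt_path: Path) -> list[str]:
    """Strip WEBVTT headers/timestamps, then split the cue text into sentences."""
    cues: list[str] = []
    for line in vtt_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or "-->" in line or line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        if not cues or line != cues[-1]:  # drop consecutive duplicate cue lines
            cues.append(line)
    text = " ".join(cues)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```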
## Developer Workflows
- **Extracting Transcripts**: Run the transcript extractor script. It will create `.vtt` and `.txt` files in the `transcripts/` directory.
- **Populating the Quotes Database**: Parse the `.txt` files and insert one quote per line into the `quotes` table in `quotes.db`. This step is currently manual or scripted; a possible loader is sketched after this list.
- **Running the API**: Start with `uvicorn quotes_api:app --reload`.
- **Testing the API**: Call `GET /quotes` to receive a random quote.
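
A possible loader for the database-population step (hypothetical; the table definition matches the schema above, everything else is illustrative):

```python
import sqlite3
from pathlib import Path

def load_quotes(txt_dir: str = "transcripts", db_path: str = "quotes.db") -> None:
    """Insert one quote per non-empty line of every .txt file into quotes.db."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS quotes (id INTEGER PRIMARY KEY, quote TEXT)")
    for txt_file in sorted(Path(txt_dir).glob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            if line.strip():
                con.execute("INSERT INTO quotes (quote) VALUES (?)", (line.strip(),))
    con.commit()
    con.close()
```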
## Conventions & Patterns
- Always prefer manual subtitles over auto-generated for accuracy.
- Quotes are stored as one sentence per line in `.txt` files, then inserted into the database.
- The API avoids repeating the 20 most recently served quotes by tracking their IDs in memory (sketched after this list).
- All scripts assume the working directory is the project root.
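
The non-repetition convention, sketched with illustrative names (`quotes_api.py` is authoritative; the response shape and error handling here are assumptions):

```python
import random
import sqlite3
from collections import deque

from fastapi import FastAPI, HTTPException

app = FastAPI()
recent_ids = deque(maxlen=20)  # IDs of the last 20 served quotes, kept in memory

@app.get("/quotes")
def get_quote():
    con = sqlite3.connect("quotes.db")
    rows = con.execute("SELECT id, quote FROM quotes").fetchall()
    con.close()
    if not rows:
        raise HTTPException(status_code=404, detail="No quotes in database")
    # Exclude recently served quotes; fall back to all rows if too few exist.
    candidates = [r for r in rows if r[0] not in recent_ids] or rows
    quote_id, quote = random.choice(candidates)
    recent_ids.append(quote_id)
    return {"id": quote_id, "quote": quote}
```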
## External Dependencies
- `yt-dlp` (for subtitle download, called via Python subprocess)
- `ffmpeg` (binary required for best `yt-dlp` results)
- `fastapi`, `uvicorn` (for API)
- `sqlite3` (database access; part of the Python standard library, no install needed)
## Example Data Flow
1. Extractor downloads and parses subtitles → `.txt` files.
2. Quotes are loaded into `quotes.db`.
3. API serves random quotes.
4. (Planned) Teams bot posts a quote daily.
## Key Files
- `YouTube Transcript Extractor with line correction.py`: Transcript download and parsing logic.
- `quotes_api.py`: FastAPI app for serving quotes.
- `transcripts/`: Stores all subtitle and parsed text files.
- `quotes.db`: SQLite database of quotes.
## Tips for AI Agents
- When adding new extraction or analysis logic, follow the pattern of sentence splitting and file naming in the extractor script.
- When extending the API, maintain the non-repetition logic for quotes.
- If adding Teams integration, retrieve quotes through the API's `/quotes` endpoint rather than querying the database directly (see the sketch below).
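
Since the Teams script does not exist yet, here is one plausible shape for it, assuming a Teams Incoming Webhook URL, the `requests` package, and that the API's JSON response contains a `quote` field (all assumptions):

```python
import requests

API_URL = "http://localhost:8000/quotes"                # quotes_api.py served by uvicorn
WEBHOOK_URL = "https://example.webhook.office.com/..."  # placeholder webhook URL

def post_daily_quote() -> None:
    """Fetch a quote from the API and post it to a Teams channel via webhook."""
    quote = requests.get(API_URL, timeout=10).json()["quote"]  # assumes a 'quote' field
    # A plain-text payload is the simplest message an incoming webhook accepts.
    resp = requests.post(WEBHOOK_URL, json={"text": quote}, timeout=10)
    resp.raise_for_status()

if __name__ == "__main__":
    post_daily_quote()  # run once per day via cron or Windows Task Scheduler
```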