# Copilot Instructions for the Schloter Project

## Project Overview

This project automates the extraction of transcripts from a specific YouTube channel, analyzes them to extract quotes, stores those quotes in a SQLite database, and serves them via an API. A separate script can call this API daily to post a quote to a Microsoft Teams channel.

## Architecture & Key Components

- **YouTube Transcript Extractor** (`YouTube Transcript Extractor with line correction.py`):
  - Downloads German subtitles (manual preferred, auto-generated as fallback) for all videos in a channel using `yt-dlp`; see the download sketch after this list.
  - Parses the `.vtt` files, splits the text into sentences, and saves them as `.txt` files.
- **Quote API** (`quotes_api.py`):
  - FastAPI app serving the `/quotes` endpoint.
  - Returns a random quote from the SQLite database, avoiding the last 20 served quotes.
- **Database**:
  - SQLite database (`quotes.db`) with a `quotes` table (`id`, `quote`).
- **Teams Integration** (planned):
  - A script will call the API and post the quote to a Teams channel (not yet implemented).
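
A minimal sketch of the `yt-dlp` call implied by the extractor description above; the channel URL and output template are placeholders, and the exact flag combination is an assumption, not copied from the extractor script:

```python
"""Sketch: download German subtitles for every video in a channel.

The channel URL and output template below are placeholders, not
values from the repository.
"""
import subprocess

CHANNEL_URL = "https://www.youtube.com/@example-channel"  # placeholder

subprocess.run(
    [
        "yt-dlp",
        "--skip-download",    # subtitles only, no video downloads
        "--write-subs",       # manual subtitles, preferred for accuracy
        "--write-auto-subs",  # fall back to auto-generated captions
        "--sub-langs", "de",  # German only
        "--sub-format", "vtt",
        "-o", "transcripts/%(title)s.%(ext)s",
        CHANNEL_URL,
    ],
    check=True,
)
```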

## Developer Workflows

- **Extracting transcripts:** Run the transcript extractor script. It creates `.vtt` and `.txt` files in the `transcripts/` directory.
- **Populating the quotes database:** (manual/scripted) Parse the `.txt` files and insert the quotes into the `quotes` table in `quotes.db`; a population sketch follows this list.
- **Running the API:** Start it with `uvicorn quotes_api:app --reload`.
- **Testing the API:** Call `GET /quotes` to receive a random quote.
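
Because the population step is only described as manual/scripted, here is one possible sketch, assuming one sentence per line in `transcripts/*.txt` (per the conventions below); the `AUTOINCREMENT` column definition is an assumption about the schema:

```python
"""Sketch: load sentences from transcripts/*.txt into quotes.db.

Assumes one sentence per line; the exact schema details beyond
(id, quote) are assumptions.
"""
import sqlite3
from pathlib import Path

conn = sqlite3.connect("quotes.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS quotes "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, quote TEXT)"
)

for txt_file in Path("transcripts").glob("*.txt"):
    with txt_file.open(encoding="utf-8") as f:
        # One sentence per line; skip blank lines.
        sentences = [line.strip() for line in f if line.strip()]
    conn.executemany(
        "INSERT INTO quotes (quote) VALUES (?)",
        [(s,) for s in sentences],
    )

conn.commit()
conn.close()
```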

## Conventions & Patterns

- Always prefer manual subtitles over auto-generated ones for accuracy.
- Quotes are stored one sentence per line in `.txt` files, then inserted into the database.
- The API avoids repeating the last 20 quotes by tracking served IDs in memory (see the sketch after this list).
- All scripts assume the working directory is the project root.
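
A sketch of the non-repetition pattern, assuming the last 20 served IDs live in an in-memory `deque`; the real `quotes_api.py` may structure this differently:

```python
"""Sketch of the /quotes endpoint with last-20 tracking.

Illustrative only; the actual quotes_api.py may differ in detail.
"""
import sqlite3
from collections import deque

from fastapi import FastAPI, HTTPException

app = FastAPI()
recent_ids = deque(maxlen=20)  # last 20 served IDs; resets on restart


@app.get("/quotes")
def get_quote():
    conn = sqlite3.connect("quotes.db")
    try:
        query = "SELECT id, quote FROM quotes"
        if recent_ids:
            # Exclude recently served quotes from the random pick.
            placeholders = ",".join("?" * len(recent_ids))
            query += f" WHERE id NOT IN ({placeholders})"
        query += " ORDER BY RANDOM() LIMIT 1"
        row = conn.execute(query, tuple(recent_ids)).fetchone()
    finally:
        conn.close()
    if row is None:
        raise HTTPException(status_code=404, detail="No quotes available")
    recent_ids.append(row[0])
    return {"id": row[0], "quote": row[1]}
```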

## External Dependencies

- `yt-dlp` (subtitle download, called via a Python subprocess)
- `ffmpeg` (binary required for best `yt-dlp` results)
- `fastapi`, `uvicorn` (API)
- `sqlite3` (database; part of the Python standard library)

## Example Data Flow

1. The extractor downloads and parses subtitles → `.txt` files.
2. Quotes are loaded into `quotes.db`.
3. The API serves random quotes.
4. (Planned) A Teams bot posts a quote daily.

## Key Files

- `YouTube Transcript Extractor with line correction.py`: transcript download and parsing logic.
- `quotes_api.py`: FastAPI app for serving quotes.
- `transcripts/`: all subtitle (`.vtt`) and parsed text (`.txt`) files.
- `quotes.db`: SQLite database of quotes.

## Tips for AI Agents

- When adding new extraction or analysis logic, follow the sentence-splitting and file-naming patterns in the extractor script.
- When extending the API, preserve the non-repetition logic for quotes.
- When adding the Teams integration, retrieve quotes through the API endpoint rather than reading the database directly; one possible shape is sketched below.
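
Since the Teams script does not exist yet, the following is only one possible shape: fetch a quote from the API and post it via a Teams incoming webhook (which accepts a simple JSON `text` payload). Both URLs and the `quote` response field are placeholder assumptions:

```python
"""Sketch of the planned daily Teams post; not part of the repo.

API_URL, WEBHOOK_URL, and the "quote" response field are assumptions.
"""
import requests

API_URL = "http://localhost:8000/quotes"                # placeholder
WEBHOOK_URL = "https://example.webhook.office.com/..."  # placeholder

# Fetch a random quote from the running API.
quote = requests.get(API_URL, timeout=10).json()["quote"]

# Teams incoming webhooks accept a JSON payload with a "text" field.
resp = requests.post(WEBHOOK_URL, json={"text": quote}, timeout=10)
resp.raise_for_status()
```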