# Copilot Instructions for the Schloter Project

## Project Overview

This project automates the extraction of transcripts from a specific YouTube channel, analyzes them to extract quotes, stores those quotes in a SQLite database, and serves them via an API. A separate script can call this API daily to post a quote to a Microsoft Teams channel.

## Architecture & Key Components

- **YouTube Transcript Extractor** (`YouTube Transcript Extractor with line correction.py`):
  - Downloads German subtitles (manual preferred, auto-generated as fallback) for all videos in a channel using `yt-dlp`; see the download sketch after this list.
  - Parses the `.vtt` files, splits the text into sentences, and saves them as `.txt` files.
- **Quote API** (`quotes_api.py`):
  - FastAPI app serving the `/quotes` endpoint.
  - Returns a random quote from the SQLite database, avoiding the last 20 served quotes.
- **Database**:
  - SQLite database (`quotes.db`) with a `quotes` table (`id`, `quote`).
- **Teams Integration** (planned):
  - A script will call the API and post the quote to a Teams channel (not yet implemented).
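
A minimal sketch of the `yt-dlp` call implied by the extractor description above; the channel URL and output template are placeholders, and the exact flag combination is an assumption, not copied from the extractor script:

```python
"""Sketch: download German subtitles for every video in a channel.

The channel URL and output template below are placeholders, not
values from the repository.
"""
import subprocess

CHANNEL_URL = "https://www.youtube.com/@example-channel"  # placeholder

subprocess.run(
    [
        "yt-dlp",
        "--skip-download",    # subtitles only, no video downloads
        "--write-subs",       # manual subtitles, preferred for accuracy
        "--write-auto-subs",  # fall back to auto-generated captions
        "--sub-langs", "de",  # German only
        "--sub-format", "vtt",
        "-o", "transcripts/%(title)s.%(ext)s",
        CHANNEL_URL,
    ],
    check=True,
)
```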

## Developer Workflows

- **Extracting transcripts:** Run the transcript extractor script. It creates `.vtt` and `.txt` files in the `transcripts/` directory.
- **Populating the quotes database:** (manual/scripted) Parse the `.txt` files and insert the quotes into the `quotes` table in `quotes.db`; a population sketch follows this list.
- **Running the API:** Start it with `uvicorn quotes_api:app --reload`.
- **Testing the API:** Call `GET /quotes` to receive a random quote.
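
Because the population step is only described as manual/scripted, here is one possible sketch, assuming one sentence per line in `transcripts/*.txt` (per the conventions below); the `AUTOINCREMENT` column definition is an assumption about the schema:

```python
"""Sketch: load sentences from transcripts/*.txt into quotes.db.

Assumes one sentence per line; the exact schema details beyond
(id, quote) are assumptions.
"""
import sqlite3
from pathlib import Path

conn = sqlite3.connect("quotes.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS quotes "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, quote TEXT)"
)

for txt_file in Path("transcripts").glob("*.txt"):
    with txt_file.open(encoding="utf-8") as f:
        # One sentence per line; skip blank lines.
        sentences = [line.strip() for line in f if line.strip()]
    conn.executemany(
        "INSERT INTO quotes (quote) VALUES (?)",
        [(s,) for s in sentences],
    )

conn.commit()
conn.close()
```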

## Conventions & Patterns

- Always prefer manual subtitles over auto-generated ones for accuracy.
- Quotes are stored one sentence per line in `.txt` files, then inserted into the database.
- The API avoids repeating the last 20 quotes by tracking served IDs in memory (see the sketch after this list).
- All scripts assume the working directory is the project root.
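
A sketch of the non-repetition pattern, assuming the last 20 served IDs live in an in-memory `deque`; the real `quotes_api.py` may structure this differently:

```python
"""Sketch of the /quotes endpoint with last-20 tracking.

Illustrative only; the actual quotes_api.py may differ in detail.
"""
import sqlite3
from collections import deque

from fastapi import FastAPI, HTTPException

app = FastAPI()
recent_ids = deque(maxlen=20)  # last 20 served IDs; resets on restart


@app.get("/quotes")
def get_quote():
    conn = sqlite3.connect("quotes.db")
    try:
        query = "SELECT id, quote FROM quotes"
        if recent_ids:
            # Exclude recently served quotes from the random pick.
            placeholders = ",".join("?" * len(recent_ids))
            query += f" WHERE id NOT IN ({placeholders})"
        query += " ORDER BY RANDOM() LIMIT 1"
        row = conn.execute(query, tuple(recent_ids)).fetchone()
    finally:
        conn.close()
    if row is None:
        raise HTTPException(status_code=404, detail="No quotes available")
    recent_ids.append(row[0])
    return {"id": row[0], "quote": row[1]}
```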

## External Dependencies

- `yt-dlp` (subtitle download, called via a Python subprocess)
- `ffmpeg` (binary required for best `yt-dlp` results)
- `fastapi`, `uvicorn` (API)
- `sqlite3` (database; part of the Python standard library)

## Example Data Flow

1. The extractor downloads and parses subtitles → `.txt` files.
2. Quotes are loaded into `quotes.db`.
3. The API serves random quotes.
4. (Planned) A Teams bot posts a quote daily.

## Key Files

- `YouTube Transcript Extractor with line correction.py`: transcript download and parsing logic.
- `quotes_api.py`: FastAPI app for serving quotes.
- `transcripts/`: all subtitle (`.vtt`) and parsed text (`.txt`) files.
- `quotes.db`: SQLite database of quotes.

## Tips for AI Agents

- When adding new extraction or analysis logic, follow the sentence-splitting and file-naming patterns in the extractor script.
- When extending the API, preserve the non-repetition logic for quotes.
- When adding the Teams integration, retrieve quotes through the API endpoint rather than reading the database directly; one possible shape is sketched below.
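
Since the Teams script does not exist yet, the following is only one possible shape: fetch a quote from the API and post it via a Teams incoming webhook (which accepts a simple JSON `text` payload). Both URLs and the `quote` response field are placeholder assumptions:

```python
"""Sketch of the planned daily Teams post; not part of the repo.

API_URL, WEBHOOK_URL, and the "quote" response field are assumptions.
"""
import requests

API_URL = "http://localhost:8000/quotes"                # placeholder
WEBHOOK_URL = "https://example.webhook.office.com/..."  # placeholder

# Fetch a random quote from the running API.
quote = requests.get(API_URL, timeout=10).json()["quote"]

# Teams incoming webhooks accept a JSON payload with a "text" field.
resp = requests.post(WEBHOOK_URL, json={"text": quote}, timeout=10)
resp.raise_for_status()
```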