Bluesky Post Explainer

AI Agent that Explains Social Media Posts with Web Context and Source Citations

Python · FastAPI · React · SSE Streaming · Cross-Encoder · LiteLLM · Hexagonal Architecture

Project Overview

Making sense of social media posts through AI-powered context retrieval

Bluesky Post Explainer is an AI agent that takes any Bluesky post and generates a clear, sourced explanation: it searches the web for relevant context, reranks the results with a cross-encoder model, and synthesizes explanatory bullets with citations.

The system streams results progressively via Server-Sent Events, giving users real-time feedback as each pipeline stage completes — from post extraction to final explanation.

Problem Statement

Social media posts are often cryptic, reference niche topics, or assume context the reader doesn't have. Manually researching every unfamiliar post is time-consuming. This agent automates the entire process: extract → search → rank → explain.

Key Features

🔍 Smart Context Retrieval

LLM rewrites posts into focused search queries, then Tavily retrieves relevant web context

🎯 ML Reranking

Cross-encoder (ms-marco-MiniLM-L-6-v2) reranks results by semantic relevance to the post

⚡ Real-time Streaming

SSE delivers progressive updates at each pipeline stage — no waiting for full completion

🤖 Multi-LLM Routing

LiteLLM routes to Claude, GPT-4o, or Gemini based on availability and cost

System Architecture

Hexagonal (Ports & Adapters) with clean domain separation

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        ENTRYPOINTS                               │
│   React Frontend ──SSE──▶ FastAPI + SSE    │    CLI (argparse)   │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                          DOMAIN                                   │
│   ExplainPostUseCase  │  Ports (Abstract)  │  Entities (Models)  │
└──────────────────────────────┬──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                         ADAPTERS                                  │
│  BlueskyExtractor  │  TavilySearcher  │  CrossEncoderReranker   │
│  QueryRewriter     │  LiteLLMRouter   │  ExplanationGenerator   │
└─────────────────────────────────────────────────────────────────┘

The hexagonal architecture ensures the domain logic (use cases, entities) has zero external dependencies. All I/O goes through abstract ports, which makes the system highly testable and its components easy to swap (e.g., replace Tavily with another search engine without touching business logic).
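
As a rough illustration of the port/adapter split described above, the sketch below defines a search port and a canned in-memory adapter. The names `SearchPort`, `InMemorySearcher`, and `collect_context`, and the fields on `SearchResult`, are assumptions made for this example and may not match the project's actual definitions.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SearchResult:
    """Minimal stand-in for the domain entity; the real fields may differ."""
    url: str
    content: str


class SearchPort(Protocol):
    """Port: the only thing the domain knows about web search."""

    async def search(self, query: str) -> list[SearchResult]: ...


class InMemorySearcher:
    """Hypothetical test adapter that returns canned results, no network."""

    def __init__(self, results: list[SearchResult]) -> None:
        self._results = results

    async def search(self, query: str) -> list[SearchResult]:
        return list(self._results)


async def collect_context(searcher: SearchPort, query: str) -> list[str]:
    """Domain-side code depends on the port, never on a concrete adapter."""
    results = await searcher.search(query)
    return [r.url for r in results]


fake = InMemorySearcher([SearchResult("https://example.com/a", "some text")])
urls = asyncio.run(collect_context(fake, "any query"))
```

Because `collect_context` only sees the port, a Tavily-backed adapter and the in-memory fake are interchangeable from the domain's point of view.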

Pipeline Flow

1. Post Extraction

The extractor fetches the Bluesky post content (text, author, images) via the AT Protocol

2. Query Rewriting

LLM transforms the post into focused search queries for better retrieval

3. Web Search

Tavily API retrieves relevant web pages and articles

4. Cross-Encoder Reranking

ms-marco-MiniLM-L-6-v2 scores and reranks results by relevance

5. Explanation Generation

The routed LLM generates 3-5 explanatory bullets with source citations

💡 Streaming Architecture

Each pipeline stage emits an SSE event as it completes. The React frontend renders progressively — users see the post content immediately, then sources appear, then explanation bullets stream in one by one.
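
One way to picture the five stages and their progressive events is an async generator that yields an event as each stage finishes. This is only a sketch: `run_pipeline`, the stub stages, the string-based payloads, and the example post URL are all hypothetical stand-ins, not the project's `ExplainPostUseCase`.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable


async def run_pipeline(
    url: str,
    extract: Callable[[str], Awaitable[str]],
    rewrite: Callable[[str], Awaitable[list[str]]],
    search: Callable[[str], Awaitable[list[str]]],
    rerank: Callable[[str, list[str]], list[str]],
    explain: Callable[[str, list[str]], AsyncIterator[str]],
) -> AsyncIterator[tuple[str, Any]]:
    """Yield one (event_name, payload) pair as each stage completes."""
    post = await extract(url)
    yield "post", post

    # Run every rewritten query and pool the results.
    results: list[str] = []
    for query in await rewrite(post):
        results.extend(await search(query))
    yield "sources", results

    ranked = rerank(post, results)
    async for bullet in explain(post, ranked):
        yield "bullet", bullet
    yield "done", None


# Stub stages so the sketch runs without network access or models.
async def fake_extract(url: str) -> str:
    return "post text"


async def fake_rewrite(post: str) -> list[str]:
    return ["query one", "query two"]


async def fake_search(query: str) -> list[str]:
    return [f"result for {query}"]


def fake_rerank(post: str, results: list[str]) -> list[str]:
    return sorted(results)[:1]  # pretend the first result scores highest


async def fake_explain(post: str, ranked: list[str]) -> AsyncIterator[str]:
    for source in ranked:
        yield f"bullet citing {source}"


async def collect() -> list[str]:
    return [name async for name, _ in run_pipeline(
        "https://bsky.app/profile/example/post/123",  # made-up URL
        fake_extract, fake_rewrite, fake_search, fake_rerank, fake_explain,
    )]


event_names = asyncio.run(collect())
```

An HTTP entrypoint can then translate each yielded pair into one SSE message, while a CLI can simply print them.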

Technical Deep Dive

Key implementation details and engineering decisions

Cross-Encoder Reranking

Initial search results from Tavily are good but noisy. The cross-encoder adds a second relevance-scoring pass, so only the highest-scoring results reach the explanation step.

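from sentence_transformers import CrossEncoder

from src.domain.entities import SearchResult  # module path taken from the project layout

TOP_K = 5  # assumed cutoff; tune to taste

class CrossEncoderReranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        # Load the model once; it is reused for every rerank call.
        self.model = CrossEncoder(model_name)

    def rerank(self, query: str, results: list[SearchResult]) -> list[SearchResult]:
        if not results:
            return []  # nothing to score

        # Score each (query, document) pair jointly.
        pairs = [(query, r.content) for r in results]
        scores = self.model.predict(pairs)

        # Sort by score, highest first, and keep the top results.
        scored_results = sorted(zip(results, scores), key=lambda x: x[1], reverse=True)
        return [r for r, _ in scored_results[:TOP_K]]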

Why Cross-Encoder over Bi-Encoder?

Cross-encoders process each query-document pair jointly, which yields higher accuracy than bi-encoders (which encode the query and the documents separately). The trade-off is speed, but since we only rerank ~20 search results, latency is negligible (~50ms).

Multi-LLM Routing with LiteLLM

The system uses LiteLLM to abstract away provider differences, enabling seamless routing between Claude, GPT-4o, and Gemini with a unified interface.

import litellm

class LiteLLMRouter:
    """Unified interface for multiple LLM providers."""

    def __init__(self, model_list: list[dict]):
        self.router = litellm.Router(model_list=model_list)

    async def generate(self, messages: list[dict], **kwargs) -> str:
        response = await self.router.acompletion(
            model="gpt-4o",  # model_name alias from model_list; the router picks a deployment for it
            messages=messages,
            **kwargs
        )
        return response.choices[0].message.content

Claude

Best for nuanced explanations

GPT-4o

Fast + multimodal support

Gemini

Cost-effective fallback
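
For illustration, a `model_list` for the router above might look like the following. The aliases, the fallback order, the provider model strings, and the environment variable names are assumptions for this sketch, not the project's actual configuration.

```python
import os

# Hypothetical deployment list in the shape litellm.Router expects:
# "model_name" is the alias callers pass as `model=`, and
# "litellm_params" holds the provider-specific settings.
MODEL_LIST = [
    {
        "model_name": "gpt-4o",
        "litellm_params": {
            "model": "gpt-4o",
            "api_key": os.environ.get("OPENAI_API_KEY", ""),
        },
    },
    {
        "model_name": "claude",
        "litellm_params": {
            "model": "anthropic/claude-3-5-sonnet-20240620",
            "api_key": os.environ.get("ANTHROPIC_API_KEY", ""),
        },
    },
    {
        "model_name": "gemini",
        "litellm_params": {
            "model": "gemini/gemini-1.5-flash",
            "api_key": os.environ.get("GEMINI_API_KEY", ""),
        },
    },
]

# If a call to the "gpt-4o" alias fails, retry on these aliases in order.
FALLBACKS = [{"gpt-4o": ["claude", "gemini"]}]

# With litellm installed, these would be passed as:
#   litellm.Router(model_list=MODEL_LIST, fallbacks=FALLBACKS)
```

Keeping the aliases separate from the provider model strings means a provider can be swapped by editing this list alone.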

Server-Sent Events (SSE) Streaming

The FastAPI backend streams events to the React frontend as each pipeline stage completes:

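import asyncio
import json
from dataclasses import asdict

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
# ExplainRequest, extractor, searcher, reranker, and generator are defined elsewhere.

def sse(event: str, payload) -> str:
    """Format one SSE message; json.dumps keeps the data on a single line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

@app.post("/api/explain")
async def explain_post(request: ExplainRequest):
    async def event_stream():
        # Stage 1: Extract post
        post = await extractor.extract(request.url)
        yield sse("post", asdict(post))  # entities are plain dataclasses

        # Stage 2: Search for context (query rewriting omitted here for brevity)
        sources = await searcher.search(post.text)
        yield sse("sources", [asdict(s) for s in sources])

        # Stage 3: Rerank results in a worker thread (model inference is CPU-bound)
        ranked = await asyncio.to_thread(reranker.rerank, post.text, sources)

        # Stage 4: Generate explanation bullets
        async for bullet in generator.stream_bullets(post, ranked):
            yield sse("bullet", asdict(bullet))

        yield sse("done", {})

    return StreamingResponse(event_stream(), media_type="text/event-stream")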
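
On the receiving side, the stream is plain text that splits into events on blank lines. The parser below is a simplified sketch of that framing. It handles only the `event:` and `data:` fields with `\n` line endings, and the JSON payload field names are made up for the example.

```python
from typing import Iterator


def parse_sse(stream_text: str) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from raw SSE text.

    Simplified: ignores `id:`/`retry:` fields and comment lines.
    """
    for block in stream_text.split("\n\n"):
        event, data_lines = "message", []  # "message" is the SSE default event name
        for line in block.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:  # blocks without data carry no event
            yield event, "\n".join(data_lines)


raw = (
    'event: post\ndata: {"text": "hello"}\n\n'
    'event: bullet\ndata: {"text": "first bullet"}\n\n'
    "event: done\ndata: {}\n\n"
)
events = list(parse_sse(raw))
```

A browser client would do the same work incrementally on chunks read from the response body rather than on one complete string.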

Query Rewriting Strategy

Raw social media text makes poor search queries. The query rewriter transforms posts into focused, search-optimized queries:

❌ Raw post as query:

"wow the new paper from deepmind is insane, AGI is closer than we think 🤯"

✅ Rewritten queries:

1. "DeepMind latest research paper 2024"

2. "DeepMind AGI progress recent publication"

The system also supports a local CPU-based rewriter (Qwen2.5-0.5B) for environments without cloud API access, trading quality for zero-cost operation.
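
A rewriter along these lines needs two pieces: a prompt that asks for queries, and a parser that turns the model's numbered reply into a list. The sketch below shows both under stated assumptions: the prompt wording and the helper names `build_rewrite_prompt` and `parse_queries` are hypothetical, and the actual LLM call is left out.

```python
import re

# Hypothetical prompt template; the project's real prompt is not shown.
REWRITE_PROMPT = (
    "Rewrite the social media post below into at most {n} short web search "
    "queries, one per line, numbered.\n\nPost: {post}"
)


def build_rewrite_prompt(post_text: str, n: int = 2) -> str:
    """Fill the template with the post text and the desired query count."""
    return REWRITE_PROMPT.format(n=n, post=post_text.strip())


def parse_queries(llm_output: str) -> list[str]:
    """Extract queries from a numbered list such as '1. "some query"'."""
    queries = []
    for line in llm_output.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            # Drop surrounding whitespace and optional double quotes.
            queries.append(match.group(1).strip().strip('"'))
    return queries


reply = (
    '1. "DeepMind latest research paper 2024"\n\n'
    '2. "DeepMind AGI progress recent publication"'
)
queries = parse_queries(reply)
```

Parsing defensively matters here because LLM replies often add quotes, blank lines, or stray commentary around the list.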

Project Structure

Clean separation following hexagonal architecture principles

src/
├── domain/                    # Core logic — zero external dependencies
│   ├── entities.py            # Pure dataclasses (PostContent, SearchResult, etc.)
│   ├── exceptions.py          # Domain exception hierarchy
│   ├── ports.py               # Abstract interfaces (PostExtractorPort, SearchPort, etc.)
│   └── use_cases.py           # ExplainPostUseCase — orchestrates the pipeline
├── adapters/                  # Concrete implementations of ports
│   ├── bluesky_extractor.py   # AT Protocol post extraction
│   ├── tavily_searcher.py     # Tavily web search
│   ├── query_rewriter.py      # LLM-based search query rewriting (cloud)
│   ├── local_query_rewriter.py # CPU-based query rewriting (Qwen2.5-0.5B)
│   ├── crossencoder_reranker.py # ML reranking (sentence-transformers)
│   ├── litellm_router.py      # Multi-LLM routing via LiteLLM
│   └── explanation_generator.py # Bullet generation with citations
└── entrypoints/               # Interface layer
    ├── api.py                 # FastAPI + Server-Sent Events streaming
    └── cli.py                 # CLI interface

tests/                         # Unit tests (pytest)
frontend/                      # React (Vite + TypeScript)
eval/                          # Evaluation harness (test cases, judge, reports)

Quick Start

Get the agent running locally in minutes

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • API keys: Tavily (search), Anthropic or OpenAI (LLM)

Setup & Run

# Clone and setup backend
git clone https://github.com/raulprocha/bluesky-post-explainer.git
cd bluesky-post-explainer
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

# Setup frontend
cd frontend && npm install && cd ..

# Terminal 1: Backend (FastAPI)
uvicorn src.entrypoints.api:app --host 0.0.0.0 --port 8000

# Terminal 2: Frontend (React/Vite)
cd frontend && npm run dev

Open the frontend URL, paste your API keys and a Bluesky post URL, and watch the explanation stream in real time.

Key Design Decisions

Engineering trade-offs and rationale

Why Hexagonal Architecture?

Enables swapping any component (search engine, LLM provider, reranker) without touching domain logic. Critical for a system that depends on multiple third-party APIs that may change or fail.

Why SSE over WebSockets?

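The data flow is unidirectional (server → client). SSE is simpler, runs over plain HTTP (including HTTP/2), and doesn't require a persistent bidirectional connection. The browser's EventSource API also reconnects automatically, but it only issues GET requests, so a POST endpoint such as /api/explain has to be read with a streaming fetch, and any reconnection handled by the client.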

Why Cross-Encoder Reranking?

Search APIs return keyword-relevant results, but not always semantically relevant to the post's intent. The cross-encoder reranks by actual meaning overlap, significantly improving explanation quality.

Why Multi-LLM Routing?

No single LLM provider guarantees 100% uptime. Routing provides resilience, cost optimization (use cheaper models when quality is sufficient), and flexibility to adopt better models as they are released.

Why Local Query Rewriter Option?

Qwen2.5-0.5B runs on CPU with no API cost. Useful for development, testing, and environments with limited API budgets. Quality is lower but acceptable for most posts.