EMBEDDER

Overview

Summarizer-Embedder is a microservice that processes user activities into semantic vector representations for long-term project memory. This autonomous background service transforms activity events into a searchable knowledge base using modern AI/ML techniques.

Functionality

Core Tasks

1. Event Reception

Listens to Pub/Sub messages containing user activities (heartbeats, code changes, edits).

2. Summary Generation

Uses LLM (Gemini 2.0 Flash) to create concise, one-sentence activity summaries.

3. Vectorization

Transforms summaries into embeddings enabling semantic search.

4. Archival

Stores data in BigQuery for long-term retention and analysis.

Data Flow

Event Source (Pub/Sub)

Summarizer-Embedder

   ┌─────────────────┐
   │ 1. Parse event  │
   └────────┬────────┘
            ▼
   ┌─────────────────┐
   │ 2. LLM Summary  │ ← Gemini 2.0 Flash
   └────────┬────────┘
            ▼
   ┌─────────────────┐
   │ 3. Embedding    │ ← Gemini Embedding
   └────────┬────────┘
            ▼
   ┌─────────────────┐
   │ 4. Data Storage │
   └─────────────────┘
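The four stages above can be sketched as a single pipeline function. This is an illustrative outline, not the service's actual code: the `summarize`, `embed`, and `store` calls are injected placeholders for the Gemini and BigQuery clients, and `processEvent` is a hypothetical name.

```javascript
// Sketch of the four-stage pipeline with the AI and storage calls injected,
// so the flow (including the documented fallbacks) is visible end to end.
async function processEvent(body, { summarize, embed, store }) {
  // 1. Parse event: Pub/Sub push wraps the payload as base64 JSON.
  const payload = JSON.parse(
    Buffer.from(body.message.data, 'base64').toString('utf8')
  );
  const content = payload.content ?? payload.delta ?? payload.entity;

  // 2. LLM summary; on failure, fall back to the first 250 characters.
  let summary;
  try {
    summary = await summarize(content);
  } catch {
    summary = content.slice(0, 250);
  }

  // 3. Embedding; on failure, the record is saved without a vector.
  let embedding = null;
  try {
    embedding = await embed(summary);
  } catch {}

  // 4. Storage.
  await store({
    user_id: payload.user_id ?? payload.uid,
    ts: payload.ts,
    summary,
    embedding,
  });
  return { summary, embedding };
}
```

Injecting the external calls keeps the degradation logic (steps 2 and 3) unit-testable without network access.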

Technical Architecture

Technology Stack

  • Runtime: Node.js 18

  • Framework: Express.js

  • AI/ML: Google Cloud AI Platform

    • LLM: Gemini 2.0 Flash (summary generation)

    • Embeddings: Gemini Embedding Model

  • Storage:

    • BigQuery (data warehouse)

    • Cloud SQL MySQL (team metadata)

  • Deployment: Docker, Cloud Run

Components

1. HTTP Endpoint (POST /)

Receives Pub/Sub messages in the format:

{
  "message": {
    "data": "<base64-encoded-payload>"
  }
}

Payload contains:

  • user_id / uid – user identifier

  • ts – activity timestamp

  • content / delta / entity – activity content
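Decoding and normalizing that payload might look like the following. The field aliases (`user_id`/`uid`, `content`/`delta`/`entity`) come from the list above; the helper name is illustrative.

```javascript
// Decode the base64 message.data and normalize the aliased payload fields.
function parsePubSubMessage(body) {
  const raw = Buffer.from(body.message.data, 'base64').toString('utf8');
  const payload = JSON.parse(raw);
  return {
    userId: payload.user_id ?? payload.uid,   // user identifier
    ts: payload.ts,                           // activity timestamp
    content: payload.content ?? payload.delta ?? payload.entity, // activity content
  };
}
```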

2. LLM Processing

// Prompt: "Summarize in 1 sentence:\n{content}"
// Model: gemini-2.0-flash
// Timeout: 8s

Generates concise, contextual summary of user activity.

3. Embedding Generation

  • Task Type: RETRIEVAL_DOCUMENT

  • Parameters: Configuration optimized for semantic search

  • Fallback: If embedding fails, record is saved without vector

4. BigQuery Storage

Dataset: your_project.summaries

Schema:

id          STRING           (UUID)
team_id     STRING           (from Cloud SQL)
user_id     STRING
ts          TIMESTAMP
summary     STRING           (generated by LLM)
embedding   ARRAY<FLOAT64>   (semantic vector)

Ecosystem Integration

Upstream Dependencies

  • Event Source – activity events via Pub/Sub messaging

  • Cloud SQL – user and team metadata storage

  • Google Cloud AI Platform – LLM and embedding services

Downstream Capabilities

1. Semantic Search

Query: "Who worked on authentication last year?"
→ Similarity search in BigQuery ML
→ Returns relevant activities with context

2. Conversing with the Project

RAG (Retrieval-Augmented Generation):

  • Embedding user query

  • Finding similar activities

  • Context for LLM → response

3. Team Analytics

  • Activity trends

  • Domain expert identification

  • Project timeline
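The "find similar activities" step behind both semantic search and RAG boils down to comparing the query vector against stored embeddings. In production this runs server-side in BigQuery ML; the plain-JS cosine similarity below is only a conceptual illustration of that comparison.

```javascript
// Cosine similarity between two embedding vectors: 1 = identical direction,
// 0 = orthogonal (unrelated). Vectors are assumed to have equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Ranking stored summaries by this score against an embedded user query yields the "relevant activities with context" returned in step 1.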

Configuration

This section describes how to configure the microservice for your specific environment and requirements.

Environment Variables

# Optional
PORT=8080                    # HTTP port (default: 8080)
PROJECT_ID=my-gcp-project    # GCP project (auto-detected if empty)

# Required
DB_HOST=your-db-host        # Cloud SQL host
DB_USER=your-service-account     
DB_PASS=***
DB_NAME=your_database
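A startup check over these variables might be sketched as follows. The variable names and the 8080 default come from the list above; `loadConfig` itself is an illustrative helper, not the service's actual code.

```javascript
// Validate required env vars at startup and apply documented defaults.
function loadConfig(env) {
  const required = ['DB_HOST', 'DB_USER', 'DB_PASS', 'DB_NAME'];
  for (const key of required) {
    if (!env[key]) throw new Error(`Missing required env var: ${key}`);
  }
  return {
    port: Number(env.PORT ?? 8080),   // default: 8080
    projectId: env.PROJECT_ID,        // undefined → auto-detected on GCP
    db: {
      host: env.DB_HOST,
      user: env.DB_USER,
      pass: env.DB_PASS,
      name: env.DB_NAME,
    },
  };
}
```

Failing fast on missing database credentials surfaces misconfiguration at deploy time rather than on the first Pub/Sub push.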

Deployment (Cloud Run)

# Build & push
docker build -t gcr.io/PROJECT_ID/summarizer-embedder .
docker push gcr.io/PROJECT_ID/summarizer-embedder

# Deploy
gcloud run deploy summarizer-embedder \
  --image=gcr.io/PROJECT_ID/summarizer-embedder \
  --region=us-central1 \
  --set-env-vars="DB_HOST=...,DB_USER=...,DB_PASS=...,DB_NAME=..." \
  --no-allow-unauthenticated

Pub/Sub Subscription

gcloud pubsub subscriptions create summarizer-sub \
  --topic=report-events \
  --push-endpoint=https://summarizer-embedder-xxx.run.app/ \
  --ack-deadline=60

Error Handling

Resilience Strategy

1. Graceful degradation

  • If LLM fails → uses first 250 characters as summary

  • If embedding fails → saves record without vector

  • Always returns 200 OK to Pub/Sub (ack message)

2. Timeouts

  • LLM: 8s

  • Embedding: 8s

  • Prevents Pub/Sub queue blocking

3. Logging

✅ {uuid} → {team_id} (vec/no-vec)  # Success
✖ LLM: {error}                      # LLM error
✖ embedding: {error}                # Embedding error
❌ BQ insert error: {error}         # BigQuery error
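The timeout and fallback behavior above can be sketched with two small helpers. The names `withTimeout` and `fallbackSummary` are illustrative; the 8-second timeout and 250-character truncation are the values documented in this section.

```javascript
// Race a promise against a deadline; used to cap LLM and embedding calls
// (8s each) so a hung request cannot block the Pub/Sub queue.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Graceful degradation: if the LLM call fails or times out, fall back to
// the first 250 characters of the raw content as the summary.
function fallbackSummary(content) {
  return content.slice(0, 250);
}
```

Because the handler always returns 200 OK, these fallbacks decide what gets stored rather than whether the message is acked.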

Security

Data Protection

  • Authentication: Google Cloud IAM

  • Authorization: Service Account with minimal permissions:

    • bigquery.dataEditor (your_project.summaries)

    • cloudsql.client

    • aiplatform.user

  • Network: Cloud SQL via Private IP

  • Secrets: Env vars in Cloud Run (not in code)

Data Security

  • Embedding Configuration: Vector parameters are configurable based on your use case

  • Model Abstraction: AI services are accessed through Google Cloud APIs with proper authentication

  • Data Isolation: Each deployment maintains separate data boundaries

Monitoring & Observability

Key Metrics

  1. Throughput: Number of processed events/min

  2. Latency: End-to-end processing time

  3. Success rate: % of events with vector vs without

  4. Error rate: LLM, embedding, BigQuery errors

Cloud Monitoring:
- Request count (Cloud Run)
- Request latency p50/p95/p99
- Error rate by type
- BigQuery insert operations
- Cloud SQL connection pool

Development & Roadmap

Current Limitations

  • No retry logic for transient errors

  • Single region (us-central1)

  • No batching for embeddings

  • No event deduplication

Potential Improvements

1. Performance

  • Embedding batching (5-10 at a time)

  • Caching summaries for identical content

  • Connection pooling optimization

2. Reliability

  • Dead Letter Queue for failed messages

  • Exponential backoff retry

  • Circuit breaker for AI Platform

3. Features

  • Multi-language summaries

  • Custom embedding fine-tuning

  • Real-time similarity alerts

FAQ

Q: Why do we save records without embeddings?

A: We preserve complete activity history. Embeddings can be generated later via batch processing.

Q: How often are embeddings generated?

A: Real-time, with every event from Report API.

Q: Can the embedding model be changed?

A: Yes, but requires migration of existing vectors (dimensionality change).

Q: How long do we retain data?

A: BigQuery retention policy (currently: unlimited).


Version: 1.0.0 · Date: July 30, 2025 · Maintainer: Development Team