Semantic Search

NeuralRepo goes beyond keyword matching. Every idea is converted into a vector embedding that captures its meaning, enabling search by concept rather than exact words.

NeuralRepo uses the @cf/baai/bge-m3 model running on Cloudflare Workers AI:

| Property | Value |
| --- | --- |
| Model | `@cf/baai/bge-m3` |
| Dimensions | 1024 |
| Max input tokens | 8192 |
| Multilingual | Yes (100+ languages) |
| Runtime | Cloudflare Workers AI (on-device GPU) |

The BGE-M3 model was chosen for its strong multilingual performance, high dimension count for nuanced similarity, and native availability on Cloudflare’s infrastructure with zero cold start.
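As a sketch of what the embedding call might look like from a Worker: the model name and 1024-dimension output come from the table above, but the `run(model, { text })` shape is injected here as an interface (rather than a real `env.AI` binding) so the example is self-contained, and the response shape is an assumption.

```typescript
// Assumed response shape: one vector per input string.
interface EmbeddingResponse {
  data: number[][];
}

// Minimal embedding helper. The `ai` parameter stands in for the
// Workers AI binding; its exact type is an assumption.
async function embedText(
  ai: { run(model: string, input: { text: string[] }): Promise<EmbeddingResponse> },
  text: string
): Promise<number[]> {
  const res = await ai.run("@cf/baai/bge-m3", { text: [text] });
  const vector = res.data[0];
  if (!vector || vector.length !== 1024) {
    throw new Error(`expected a 1024-dimension vector, got ${vector?.length ?? 0}`);
  }
  return vector;
}
```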

Embeddings are stored in Cloudflare Vectorize with the following configuration:

| Setting | Value |
| --- | --- |
| Metric | Cosine similarity |
| Dimensions | 1024 |
| Metadata | `user_id`, `status`, `tags`, `created_at` |

Metadata is stored alongside each vector to enable filtered queries (e.g., searching only ideas whose status is building).
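A filtered query might look like the following sketch. The `query(vector, { topK, filter, returnMetadata })` shape mirrors the Vectorize API, but the index is injected as an interface here so the example runs standalone, and the `topK` value and filter contents are illustrative.

```typescript
interface QueryMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// Stand-in for the Vectorize index binding; the type is an assumption.
interface VectorIndex {
  query(
    vector: number[],
    options: { topK: number; filter?: Record<string, unknown>; returnMetadata?: boolean }
  ): Promise<{ matches: QueryMatch[] }>;
}

// Search only within one user's ideas that are in "building" status.
async function searchBuildingIdeas(
  index: VectorIndex,
  queryVector: number[],
  userId: string
): Promise<QueryMatch[]> {
  const { matches } = await index.query(queryVector, {
    topK: 10,
    filter: { user_id: userId, status: "building" },
    returnMetadata: true,
  });
  return matches;
}
```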

When a new idea is created, the embedding process runs asynchronously:

```
Idea Created
      ▼
Queue Message (idea_created)
      ▼
Text Normalization
      │ - Combine title + body
      │ - Trim whitespace
      │ - Collapse multiple spaces/newlines
      │ - Truncate to 8,192 characters
      ▼
Workers AI Embedding
      │ - POST @cf/baai/bge-m3
      │ - Returns 1024-dimension vector
      ▼
Vectorize Upsert
      │ - ID: idea_{user_id}_{idea_id}
      │ - Vector: 1024 floats
      │ - Metadata: user_id, status, tags
      ▼
Duplicate Detection
      │ - Query topK=5 for similar vectors
      │ - Flag pairs above similarity threshold
      ▼
Done
```
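The stages above can be sketched as a single consumer function. The dependency names (`embed`, `upsert`, `similar`) are invented for illustration, and the 0.75 default threshold reflects the duplicate-detection table later in this page; in the real Worker these would be the AI and Vectorize bindings.

```typescript
// Dependencies are injected so the sketch runs without Workers bindings.
interface PipelineDeps {
  embed(text: string): Promise<number[]>;
  upsert(id: string, values: number[], metadata: Record<string, unknown>): Promise<void>;
  similar(values: number[], topK: number): Promise<{ id: string; score: number }[]>;
}

interface Idea {
  id: string;
  userId: string;
  title: string;
  body: string;
  status: string;
  tags: string[];
}

// Normalize, embed, upsert, then run duplicate detection, in order.
async function handleIdeaCreated(
  deps: PipelineDeps,
  idea: Idea,
  threshold = 0.75
): Promise<{ id: string; score: number }[]> {
  const text = `${idea.title}\n${idea.body}`.trim().replace(/\s+/g, " ").slice(0, 8192);
  const vectorId = `idea_${idea.userId}_${idea.id}`;
  const values = await deps.embed(text);
  await deps.upsert(vectorId, values, {
    user_id: idea.userId,
    status: idea.status,
    tags: idea.tags,
  });
  const neighbors = await deps.similar(values, 5);
  // The new idea usually comes back as its own nearest neighbor; skip it.
  return neighbors.filter((n) => n.id !== vectorId && n.score >= threshold);
}
```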

When you search for ideas, the query goes through a hybrid pipeline:

```
Search Query
      ├──────────────────────────────┐
      ▼                              ▼
FTS5 Keyword Search          Vectorize Semantic Search
      │                              │ - Embed query text
      │                              │ - topK nearest neighbors
      │                              │ - Filter by user_id
      │                              │ - Optional: filter by status, tag
      ▼                              ▼
      └──────────────┬───────────────┘
                     ▼
              Rerank & Merge
                     │ - Combine FTS and vector results
                     │ - Deduplicate by idea ID
                     │ - Score: weighted blend of BM25 + cosine
                     ▼
              Return Results
                     │ - Ordered by combined score
                     │ - Includes similarity score per result
```

The hybrid approach ensures that exact keyword matches rank highly while semantically similar ideas (different words, same meaning) are also surfaced.
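The rerank-and-merge step can be sketched as a pure function. The 0.4/0.6 weights are illustrative assumptions (the actual blend is not specified above), and BM25 scores are assumed to be pre-normalized to the 0–1 range so they are comparable with cosine similarities.

```typescript
interface Hit {
  id: string;
  score: number;
}

// Blend FTS (BM25) and vector (cosine) hits into one ranked list,
// deduplicating by idea ID. Weights are illustrative, not specified.
function mergeResults(
  ftsHits: Hit[],
  vectorHits: Hit[],
  ftsWeight = 0.4,
  vectorWeight = 0.6
): Hit[] {
  const combined = new Map<string, number>();
  for (const h of ftsHits) {
    combined.set(h.id, (combined.get(h.id) ?? 0) + ftsWeight * h.score);
  }
  for (const h of vectorHits) {
    combined.set(h.id, (combined.get(h.id) ?? 0) + vectorWeight * h.score);
  }
  return [...combined.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

An idea matched by both legs accumulates both weighted scores, so exact keyword matches that are also semantically close rise to the top.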

When a new idea is embedded, NeuralRepo automatically checks for duplicates:

| Threshold | Action |
| --- | --- |
| >= 0.75 | Flagged as a potential duplicate; stored in `duplicate_detections` with `pending` status |
| 0.50 - 0.74 | Auto-linked as related; a relation is created automatically |
| < 0.50 | No action |

Duplicate detections are stored in the duplicate_detections table with the similarity score. Users can confirm (merge or link), dismiss, or ignore the detection.
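The thresholds transcribe directly into a small helper; the action names used here are invented for illustration.

```typescript
type DuplicateAction = "flag_duplicate" | "auto_link_related" | "none";

// Classify a cosine similarity score per the threshold table above.
function classifySimilarity(score: number): DuplicateAction {
  if (score >= 0.75) return "flag_duplicate";
  if (score >= 0.5) return "auto_link_related";
  return "none";
}
```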

Before embedding, text is normalized to improve consistency:

  1. Combine fields. Title and body are concatenated with a newline separator: {title}\n{body}.
  2. Trim. Leading and trailing whitespace is removed.
  3. Collapse whitespace. Multiple consecutive spaces or newlines are collapsed to a single space.
  4. Truncate. The combined text is truncated to 8,192 characters to fit within the model’s context window.

Ideas with only a title (no body) are still embedded. The title alone provides enough signal for meaningful similarity matching.
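The four steps above can be written as one pure function; this is a sketch, and the exact implementation is an assumption.

```typescript
// Normalize idea text before embedding: combine, trim, collapse, truncate.
function normalizeForEmbedding(title: string, body?: string): string {
  const combined = body ? `${title}\n${body}` : title; // 1. combine fields
  return combined
    .trim()                // 2. trim leading/trailing whitespace
    .replace(/\s+/g, " ")  // 3. collapse runs of spaces/newlines
    .slice(0, 8192);       // 4. truncate to 8,192 characters
}
```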

When an idea’s title or body is updated, a new embedding is generated:

  1. An idea_updated queue message is dispatched.
  2. The consumer re-normalizes and re-embeds the updated text.
  3. The existing Vectorize record is upserted (replaced) with the new vector.
  4. Duplicate detection runs again against the new embedding.

Metadata-only changes (status, tags) dispatch an idea_metadata_updated message that updates the Vectorize metadata without re-embedding.
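Since upserts replace whole records, a metadata-only update presumably re-upserts the stored vector with patched metadata. This sketch assumes a fetch-then-upsert approach and an injected index interface; the actual mechanism is not specified above.

```typescript
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, unknown>;
}

// Stand-in for the Vectorize index binding; the type is an assumption.
interface Index {
  getByIds(ids: string[]): Promise<VectorRecord[]>;
  upsert(records: VectorRecord[]): Promise<void>;
}

// Patch metadata on an existing record without re-embedding.
async function updateIdeaMetadata(
  index: Index,
  id: string,
  patch: Record<string, unknown>
): Promise<void> {
  const [existing] = await index.getByIds([id]);
  if (!existing) return; // nothing indexed yet; the backfill path covers this
  await index.upsert([
    { ...existing, metadata: { ...existing.metadata, ...patch } },
  ]);
}
```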

The backfill_vectors queue message triggers a full re-indexing of all ideas for a user. This is used for:

  • Account recovery after data issues
  • Model upgrades (if the embedding model changes)
  • Manual re-indexing triggered by support
```shell
# Admin-only: trigger backfill for a user
curl -X POST https://neuralrepo.com/__admin/backfill-vectors \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "user_abc123"}'
```