Duplicate Detection

When you save a new idea, NeuralRepo checks your existing ideas for potential duplicates using vector similarity. This keeps your repository clean without requiring you to remember every idea you have ever captured.

Duplicate detection runs automatically as part of the idea creation pipeline:

  1. A new idea is created and queued for processing.
  2. The queue worker generates an embedding using the @cf/baai/bge-m3 model.
  3. The embedding is compared against all existing idea embeddings in Cloudflare Vectorize.
  4. If an existing idea scores above the dedup threshold (default 0.75), a duplicate_detections record is created with status pending.
  5. If an existing idea scores above the related threshold (default 0.5) but below the dedup threshold, a related relation is created automatically instead.
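
The threshold logic in steps 4 and 5 can be sketched as a small function. This is an illustration only, not NeuralRepo's implementation: the function name `classify_match` and the sample scores are hypothetical, and it assumes "above" means strictly greater than the threshold.

```python
from typing import Optional

DEDUP_THRESHOLD = 0.75    # documented default
RELATED_THRESHOLD = 0.5   # documented default


def classify_match(
    score: float,
    dedup_threshold: float = DEDUP_THRESHOLD,
    related_threshold: float = RELATED_THRESHOLD,
) -> Optional[str]:
    """Return 'duplicate', 'related', or None for one similarity score.

    Assumption: "above" is treated as strictly greater than the threshold.
    """
    if score > dedup_threshold:
        return "duplicate"  # would create a pending duplicate_detections record
    if score > related_threshold:
        return "related"    # would create a related relation instead
    return None             # too dissimilar: nothing is recorded


# Hypothetical similarity scores between a new idea and three existing ideas.
scores = {"idea-a": 0.91, "idea-b": 0.62, "idea-c": 0.30}
outcomes = {idea: classify_match(score) for idea, score in scores.items()}
print(outcomes)
```

With the default thresholds, only the first sample score would produce a pending detection; the second would yield a related relation and the third nothing.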

The dashboard shows a Duplicates badge in the navigation when pending detections exist. Click it to open the duplicates review panel, which displays pairs of ideas side by side with their similarity score.

Every pending detection must be resolved with one of two actions: dismiss or merge.

If the detection is a false positive — the ideas are similar but distinct — dismiss it. The detection is marked as dismissed and will not appear again.

POST /api/v1/ideas/duplicates/1/dismiss
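
As a rough sketch, the same call could be prepared from Python with the standard library. Several details here are assumptions rather than documented behavior: that the `1` in the path is the detection ID, that the host is `https://neuralrepo.com`, and that the endpoint accepts the `X-API-Key` header used in the settings example on this page. The helper name `build_dismiss_request` is hypothetical.

```python
import urllib.request

BASE_URL = "https://neuralrepo.com"  # assumption: same host as the settings endpoint


def build_dismiss_request(detection_id: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that dismisses one detection."""
    url = f"{BASE_URL}/api/v1/ideas/duplicates/{detection_id}/dismiss"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-API-Key": api_key},  # assumption: same auth header as elsewhere
    )


request = build_dismiss_request(1, "nrp_YOUR_KEY")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```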

If the ideas are truly duplicates, merge them. Merging combines both ideas into one kept idea and archives the other.

The merge process:

  1. Bodies are concatenated — the absorbed idea’s body is appended to the kept idea’s body, separated by a horizontal rule.
  2. Tags are unioned — all tags from both ideas are combined (up to the 20-tag limit).
  3. A supersedes relation is created — the kept idea supersedes the absorbed idea.
  4. The absorbed idea is archived — it is removed from active views but retained in the database for history.
To merge from the command line:
# Merge: keep idea 42, absorb idea 58
nrepo merge 42 58
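
The four merge steps listed above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the actual merge code: the `Idea` class and `merge_ideas` function are hypothetical, the horizontal rule is assumed to be a Markdown `---`, and when the tag union exceeds 20 the sketch simply keeps the first 20 (the real tie-breaking rule is not documented here).

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_TAGS = 20  # documented tag limit


@dataclass
class Idea:
    id: int
    body: str
    tags: List[str] = field(default_factory=list)
    archived: bool = False


def merge_ideas(kept: Idea, absorbed: Idea) -> Dict[str, object]:
    """Merge `absorbed` into `kept` and return the resulting relation."""
    # 1. Append the absorbed body after a horizontal rule (Markdown '---' assumed).
    kept.body = f"{kept.body}\n\n---\n\n{absorbed.body}"
    # 2. Union the tags, preserving first-seen order, capped at the tag limit.
    kept.tags = list(dict.fromkeys(kept.tags + absorbed.tags))[:MAX_TAGS]
    # 3. Record that the kept idea supersedes the absorbed one.
    relation = {"type": "supersedes", "source": kept.id, "target": absorbed.id}
    # 4. Archive the absorbed idea rather than deleting it.
    absorbed.archived = True
    return relation


kept = Idea(42, "Original idea", ["ml", "search"])
absorbed = Idea(58, "Near-identical idea", ["search", "vectors"])
relation = merge_ideas(kept, absorbed)
print(kept.tags, relation, absorbed.archived)
```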

The dedup threshold controls how similar two ideas must be to trigger a duplicate detection. The related threshold sets the lower bound for creating a related relation automatically. Adjust both in your user settings.

| Setting | Range | Default | Effect |
| --- | --- | --- | --- |
| dedup_threshold | 0.1 - 0.9 | 0.75 | Higher = fewer detections, only near-exact matches |
| related_threshold | 0.1 - 0.9 | 0.50 | Ideas above this but below dedup get a related relation |

To change a threshold through the API:
curl -X PATCH https://neuralrepo.com/api/v1/user/me \
-H "X-API-Key: nrp_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"settings_json": {"dedup_threshold": 0.80}}'
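
For scripted use, a Python equivalent of this request might look like the sketch below. It builds the request without sending it and checks the value against the documented 0.1 to 0.9 range first. The helper name `build_threshold_request` is hypothetical, and treating the range as inclusive is an assumption.

```python
import json
import urllib.request

API_URL = "https://neuralrepo.com/api/v1/user/me"  # endpoint from the curl example
MIN_THRESHOLD, MAX_THRESHOLD = 0.1, 0.9            # documented range, assumed inclusive
KNOWN_SETTINGS = ("dedup_threshold", "related_threshold")


def build_threshold_request(api_key: str, **thresholds: float) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates threshold settings."""
    for name, value in thresholds.items():
        if name not in KNOWN_SETTINGS:
            raise ValueError(f"unknown setting: {name}")
        if not MIN_THRESHOLD <= value <= MAX_THRESHOLD:
            raise ValueError(
                f"{name} must be between {MIN_THRESHOLD} and {MAX_THRESHOLD}"
            )
    payload = json.dumps({"settings_json": thresholds}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


request = build_threshold_request("nrp_YOUR_KEY", dedup_threshold=0.80)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Validating locally is only a convenience; the server remains the authority on which values it accepts.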