Zotero Vectorize

Build and maintain a cross-platform local Zotero semantic index using metadata embeddings and PDF full-text chunk embeddings. Use when the user asks to vecto...
yckbz
AI · clawhub · v0.1.0 · 1 version · 100000 · Key: not required
★ 0 stars · 📥 554 downloads · 💾 62 installs · #latest

Overview

Zotero Vectorize

Build and maintain a local-first, cross-platform Zotero vector store for semantic search and RAG over bibliographic metadata and PDF full text.

Keep SKILL.md focused on workflow. Read the reference files only when needed:

  • references/config.md — paths, environment variables, output layout
  • references/data-format.md — JSON schemas and file naming
  • references/windows.md / macos.md / linux.md — platform-specific path defaults and notes
  • references/troubleshooting.md — common failures and recovery

Core rules

  • Treat Zotero as read-only input. Never modify the user’s Zotero database or attachment storage.
  • Prefer creating a database snapshot before reading.
  • For incremental updates: check first, report missing items, wait for user confirmation, then apply.
  • Before any update that rewrites store files: back up first, then write.
  • Backup retention for this skill is fixed: keep only the latest and previous backup per file.
  • Default output filenames are:
      • metadata_vectors.json
      • fulltext_vectors.json
      • vector_store_metadata.json
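
The "back up first, keep only the latest and previous backup" rule can be sketched as follows. This is a minimal illustration, not the contents of scripts/backup_with_retention.py: the function name, the backup directory, and the `<name>.<timestamp>.bak` naming scheme are all assumptions made for the example.

```python
import shutil
import time
from pathlib import Path


def backup_with_retention(store_file: Path, backup_dir: Path, keep: int = 2) -> Path:
    """Copy store_file into backup_dir, then prune to the newest `keep` copies."""
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme: <file name>.<nanosecond timestamp>.bak
    stamp = time.time_ns()
    target = backup_dir / f"{store_file.name}.{stamp}.bak"
    while target.exists():  # guard against a coarse clock returning the same stamp
        stamp += 1
        target = backup_dir / f"{store_file.name}.{stamp}.bak"
    shutil.copy2(store_file, target)

    # Sort this file's backups newest-first using the timestamp embedded in the name.
    backups = sorted(
        backup_dir.glob(f"{store_file.name}.*.bak"),
        key=lambda p: int(p.name.split(".")[-2]),
        reverse=True,
    )
    # Everything beyond the newest `keep` entries is deleted.
    for stale in backups[keep:]:
        stale.unlink()
    return target
```

With `keep=2`, a third backup of the same file removes the oldest one, so each store file always has at most a "latest" and a "previous" copy.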

Workflow decision tree

1) Detect or confirm paths

If the Zotero data directory, database path, or storage path is unknown:

  1. Read references/config.md
  2. Read the platform-specific reference (windows.md, macos.md, or linux.md)
  3. Run:
python scripts/detect_zotero_paths.py

If the detected paths are wrong, ask the user to open Zotero and use Show Data Directory, then rerun with explicit --data-dir, --db, or --storage-dir.
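
As a rough sketch of what path resolution involves, the function below applies explicit overrides first and falls back to defaults. It assumes the stock Zotero layout (a `Zotero` folder in the user's home directory containing `zotero.sqlite` and a `storage` subfolder); the function name and return shape are hypothetical and may differ from scripts/detect_zotero_paths.py.

```python
from pathlib import Path
from typing import Optional


def resolve_zotero_paths(
    data_dir: Optional[str] = None,
    db: Optional[str] = None,
    storage_dir: Optional[str] = None,
) -> dict:
    """Resolve Zotero paths; explicit arguments override the defaults."""
    # Assumption: the default data directory is "Zotero" under the user's home.
    base = Path(data_dir).expanduser() if data_dir else Path.home() / "Zotero"
    db_path = Path(db).expanduser() if db else base / "zotero.sqlite"
    storage = Path(storage_dir).expanduser() if storage_dir else base / "storage"
    return {
        "data_dir": base,
        "db": db_path,
        "storage_dir": storage,
        # Existence flags make it easy to tell the user which path is wrong.
        "db_exists": db_path.is_file(),
        "storage_exists": storage.is_dir(),
    }
```

The three parameters mirror the --data-dir, --db, and --storage-dir options, so a user-supplied value always wins over a guessed default.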

2) Create a database snapshot

Before full builds or incremental checks, snapshot the Zotero database:

python scripts/snapshot_zotero_db.py --output-dir <store-dir>

If snapshotting fails because SQLite is locked, ask the user to close Zotero and retry.
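
One way to take such a snapshot without writing to the source is Python's built-in `sqlite3` online backup API, sketched below. The function name and the snapshot file name are assumptions for illustration; the actual script may work differently (for example, by copying the file).

```python
import sqlite3
from pathlib import Path


def snapshot_db(db_path: Path, output_dir: Path) -> Path:
    """Copy a SQLite database into output_dir using the online backup API."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / "zotero_snapshot.sqlite"  # hypothetical file name

    # mode=ro opens the source read-only, so the Zotero database is never modified.
    src = sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True, timeout=5)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # raises sqlite3.OperationalError if the source stays locked
    finally:
        dst.close()
        src.close()
    return target
```

A caller can catch `sqlite3.OperationalError` and turn it into the "close Zotero and retry" prompt described above.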

3) Build the metadata vector store

Use this when the user asks to create or rebuild metadata embeddings for the Zotero library.

python scripts/build_metadata_vectors.py --output-dir <store-dir>

This writes metadata_vectors.json and refreshes both vector_store_metadata.json and README.md.
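
Conceptually, a metadata record pairs an item with the embedding of its bibliographic text. The sketch below shows that shape under stated assumptions: the field names (`itemID`, `text`, `vector`), the input item layout, and the pluggable `embed` callable are all hypothetical. The real schema is documented in references/data-format.md, and the real embedding model is whatever the skill is configured to load.

```python
from typing import Callable, Iterable, List, Sequence


def metadata_text(item: dict) -> str:
    """Join whichever bibliographic fields are present into one string to embed."""
    parts = [
        item.get("title"),
        "; ".join(item.get("creators", [])),
        item.get("date"),
        item.get("abstract"),
    ]
    return "\n".join(p for p in parts if p)  # empty or missing fields are skipped


def build_metadata_records(
    items: Iterable[dict],
    embed: Callable[[str], Sequence[float]],
) -> List[dict]:
    """Produce one record per item; `embed` stands in for the real model call."""
    records = []
    for item in items:
        text = metadata_text(item)
        records.append({"itemID": item["itemID"], "text": text, "vector": list(embed(text))})
    return records
```

Keeping the embedding function as a parameter makes the record-building logic testable without downloading a model.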

4) Build the full-text vector store

Use this when the user asks to create or rebuild PDF full-text embeddings.

python scripts/build_fulltext_vectors.py --output-dir <store-dir>

This scans Zotero PDF attachments, extracts text, chunks it, embeds each chunk, and writes fulltext_vectors.json.
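
The chunking step can be illustrated with a simple fixed-size, overlapping character window. This is only one possible strategy, shown as a sketch: the skill's actual chunk size, overlap, and splitting rules are not stated here, so the function name and default values below are assumptions.

```python
from typing import List


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")

    text = " ".join(text.split())  # collapse whitespace left over from PDF extraction
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reached the end
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some text twice.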

5) Check incremental updates

Use this when the user asks whether Zotero contains new items not yet added to the vector store.

python scripts/check_incremental_updates.py --output-dir <store-dir>

Report:

  • total top-level Zotero items
  • total PDF-parent items
  • current metadata/fulltext vector counts
  • missing metadata items
  • missing fulltext items

Do not update the store yet.
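
The check itself amounts to set differences between the item IDs in Zotero and the item IDs already in the store. A minimal sketch, assuming every store record carries an `itemID` field (the function name and report keys are hypothetical):

```python
from typing import Iterable, List


def incremental_report(
    top_level_ids: Iterable[int],
    pdf_parent_ids: Iterable[int],
    metadata_records: List[dict],
    fulltext_records: List[dict],
) -> dict:
    """Compare Zotero item IDs with the store; read-only, nothing is written."""
    top = set(top_level_ids)
    pdf = set(pdf_parent_ids)
    meta_ids = {r["itemID"] for r in metadata_records}
    full_ids = {r["itemID"] for r in fulltext_records}
    return {
        "total_top_level_items": len(top),
        "total_pdf_parent_items": len(pdf),
        "metadata_vector_count": len(meta_ids),
        "fulltext_vector_count": len(full_ids),
        "missing_metadata": sorted(top - meta_ids),
        "missing_fulltext": sorted(pdf - full_ids),
    }
```

Because the function only computes a report, it is safe to run before the user has confirmed anything.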

6) Apply incremental updates

Only run this after the user confirms the update.

python scripts/apply_incremental_updates.py --output-dir <store-dir>

This script:

  1. snapshots the DB
  2. backs up store files
  3. appends missing metadata/fulltext entries
  4. keeps only the latest and previous backup per file
  5. updates store metadata and README

Use --item-id to limit the update to specific items if the user wants a partial apply.
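
The append step, including an optional item filter in the spirit of --item-id, might look like the sketch below for the metadata store. It assumes the store is a JSON list of records keyed by `itemID`, which is a simplification; the function name and the write-to-temp-then-rename detail are illustrative choices, not a description of the actual script.

```python
import json
import os
from pathlib import Path
from typing import Iterable, List, Optional, Set


def append_missing(
    store_path: Path,
    new_records: Iterable[dict],
    only_item_ids: Optional[Set[int]] = None,
) -> List[int]:
    """Append records whose itemID is not yet stored; return the appended IDs."""
    existing = json.loads(store_path.read_text(encoding="utf-8")) if store_path.exists() else []
    known = {r["itemID"] for r in existing}
    appended = []
    for rec in new_records:
        if rec["itemID"] in known:
            continue  # existing entries are never overwritten
        if only_item_ids is not None and rec["itemID"] not in only_item_ids:
            continue  # partial apply: skip items the user did not select
        existing.append(rec)
        known.add(rec["itemID"])
        appended.append(rec["itemID"])

    # Write to a temporary file and rename it, so a crash cannot leave a half-written store.
    tmp = store_path.with_suffix(store_path.suffix + ".tmp")
    tmp.write_text(json.dumps(existing), encoding="utf-8")
    os.replace(tmp, store_path)
    return appended
```

The returned list is what would be reported back to the user as the itemIDs appended during the update.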

7) Verify the finished store

After any build or incremental update, verify counts and sizes:

python scripts/verify_vector_store.py --output-dir <store-dir>

Always report:

  • metadata item count
  • fulltext item count
  • fulltext chunk count
  • metadata file size
  • fulltext file size
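
The five figures above can be gathered with a few lines of standard-library code. The sketch assumes both store files are JSON lists, that every record has an `itemID`, and that the full-text file holds one record per chunk; these are assumptions for illustration, and references/data-format.md defines the real layout.

```python
import json
from pathlib import Path


def summarize_store(store_dir: Path) -> dict:
    """Collect item counts, chunk count, and file sizes for a finished store."""
    meta_path = store_dir / "metadata_vectors.json"
    full_path = store_dir / "fulltext_vectors.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    full = json.loads(full_path.read_text(encoding="utf-8"))
    return {
        "metadata_item_count": len({r["itemID"] for r in meta}),
        # Several chunks can belong to one item, so items and chunks are counted separately.
        "fulltext_item_count": len({r["itemID"] for r in full}),
        "fulltext_chunk_count": len(full),
        "metadata_file_bytes": meta_path.stat().st_size,
        "fulltext_file_bytes": full_path.stat().st_size,
    }
```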

Scripts

  • scripts/detect_zotero_paths.py — resolve default/current Zotero paths
  • scripts/snapshot_zotero_db.py — create a safe SQLite snapshot
  • scripts/build_metadata_vectors.py — full rebuild of metadata vectors
  • scripts/build_fulltext_vectors.py — full rebuild of PDF full-text vectors
  • scripts/check_incremental_updates.py — compare Zotero against current vector store
  • scripts/apply_incremental_updates.py — append missing items after user confirmation
  • scripts/backup_with_retention.py — back up store files and retain only the latest two states
  • scripts/verify_vector_store.py — report counts, sizes, and store metadata

Output expectations

When using this skill successfully, return concise operational summaries such as:

  • detected paths
  • snapshot path used
  • number of items/chunks written
  • current file sizes
  • whether any items are missing
  • which itemIDs were appended during incremental update

Escalation notes

Read references/troubleshooting.md when:

  • SQLite snapshot fails
  • HuggingFace/model download or local model loading fails
  • PDFs are missing or unreadable
  • full-text extraction is incomplete
  • file paths differ from defaults on the current OS

Version history

1 version in total

  • v0.1.0 (current)
    2026-03-29 19:45 · Safe · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
