Your personal RAG (Retrieval-Augmented Generation) document library backed by Google Drive.
Supports multiple Google Drive folders, interactive folder routing, incremental sync, a choice of Gemini or OpenAI for embeddings, and an optional connection to Qdrant.
FIRST verify that the required environment variables are set in /workspace/skills/filechat/.env:
- EMBEDDING_PROVIDER (either gemini or openai)
- GEMINI_API_KEY or OPENAI_API_KEY
- QDRANT_URL and QDRANT_API_KEY (if absent, it uses local disk-based JSON)

Create the .env file like this:
echo "EMBEDDING_PROVIDER=gemini" > ./skills/filechat/.env
echo "GEMINI_API_KEY=your_key_here" >> ./skills/filechat/.env
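The check described above can be sketched in a few lines of JavaScript. This is only an illustration, not the skill's own code: `parseEnv` and `checkEnv` are hypothetical helper names, and the sketch assumes the .env file holds plain KEY=VALUE lines. Only the variable names come from the list above.

```javascript
// Hypothetical helper: parse KEY=VALUE lines from the text of a .env file.
function parseEnv(text) {
  const env = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // not a KEY=VALUE line
    env[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return env;
}

// Hypothetical helper: returns a list of problems; an empty list means the
// embedding settings look usable. Qdrant variables are not checked because
// they are optional (the skill falls back to local JSON without them).
function checkEnv(env) {
  const problems = [];
  const provider = env.EMBEDDING_PROVIDER;
  if (provider !== 'gemini' && provider !== 'openai') {
    problems.push('EMBEDDING_PROVIDER must be "gemini" or "openai"');
  } else {
    const keyName = provider === 'gemini' ? 'GEMINI_API_KEY' : 'OPENAI_API_KEY';
    if (!env[keyName]) problems.push(`${keyName} is missing`);
  }
  return problems;
}

// Same content as the two echo commands above would produce.
const sample = 'EMBEDDING_PROVIDER=gemini\nGEMINI_API_KEY=your_key_here\n';
console.log(checkEnv(parseEnv(sample)));
```

In practice the text would come from reading the .env file; an empty result means it is safe to move on to authentication.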
Google Workspace Authentication:
Before running any commands, check if the system is authenticated by running:
npx @googleworkspace/cli auth status
If it returns an auth error or indicates no token, you MUST prompt the user to authenticate. Trigger the interactive login flow:
npx @googleworkspace/cli auth login --services drive
Wait for the user to complete the browser OAuth flow before proceeding.
The user can sync any number of folders. You manage them using folders.js.
- cd ./skills/filechat && node folders.js list
- node folders.js add "Taxes 2026" (auto-discovers the ID via gws drive files list if you don't know it)
- node folders.js default "Taxes 2026"

If the user asks to do something with a file/folder but doesn't specify which one, run node folders.js get-default to find the default ID. If no folders exist, ask them to set one up!
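The routing rule above (use the named folder, otherwise the default, otherwise ask) can be sketched as a small decision function. The registry shape and the names `resolveFolder`, `folders`, and `default` are assumptions made for illustration; the actual format used by folders.js is not documented here.

```javascript
// Hypothetical registry shape: { folders: { [name]: folderId }, default: name or null }.
function resolveFolder(registry, requestedName) {
  const names = Object.keys(registry.folders);
  // No folders registered yet: the user has to set one up first.
  if (names.length === 0) return { action: 'ask-user-to-add-folder' };
  // The user named a folder explicitly.
  if (requestedName) {
    const id = registry.folders[requestedName];
    return id ? { action: 'use', folderId: id } : { action: 'ask-user-which-folder' };
  }
  // Nothing specified: fall back to the default folder if one is set.
  const fallbackId = registry.default ? registry.folders[registry.default] : undefined;
  return fallbackId ? { action: 'use', folderId: fallbackId } : { action: 'ask-user-which-folder' };
}

// Placeholder folder ID, for illustration only.
const registry = { folders: { 'Taxes 2026': 'FOLDER_ID_PLACEHOLDER' }, default: 'Taxes 2026' };
console.log(resolveFolder(registry, undefined));
console.log(resolveFolder({ folders: {}, default: null }, undefined));
```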
When the user asks to "sync", "flush", or "update", you must run the ingestion script.
To sync a specific folder:
cd ./skills/filechat && node sync.js <FOLDER_ID>
To sync EVERYTHING (all folders in the registry):
cd ./skills/filechat && node sync-all.js
Note: Syncs are incremental and use a local cache. If a file hasn't been modified in Drive, the script skips it immediately and reports "0 chunks" embedded. This is NORMAL behavior. If you are debugging or testing, or the user specifically requests a hard flush, you MUST delete the cache files first:
rm ./skills/filechat/meta_<FOLDER_ID>.json
rm ./skills/filechat/vector_db_<FOLDER_ID>.json
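To make the "0 chunks" behavior concrete, here is a minimal sketch of how an incremental sync could decide what to skip. It assumes the cache maps each Drive file ID to the `modifiedTime` last seen; that shape and the name `planSync` are hypothetical, and the real meta file may store more or different fields.

```javascript
// Hypothetical cache shape: { [fileId]: modifiedTime } as recorded by the previous sync.
function planSync(driveFiles, cache) {
  const toEmbed = [];
  const skipped = [];
  for (const file of driveFiles) {
    // Unchanged since the last sync: nothing to download or embed.
    if (cache[file.id] === file.modifiedTime) skipped.push(file.id);
    else toEmbed.push(file.id);
  }
  return { toEmbed, skipped };
}

// Toy data for illustration only.
const files = [
  { id: 'a1', modifiedTime: '2026-01-05T10:00:00.000Z' },
  { id: 'b2', modifiedTime: '2026-02-01T08:30:00.000Z' },
];
const cache = { a1: '2026-01-05T10:00:00.000Z' };

console.log(planSync(files, cache)); // a1 is skipped, b2 is embedded
console.log(planSync(files, {}));    // empty cache (as after a hard flush): both are embedded
```

Under this assumption, deleting the cache files is what forces every file to be re-embedded.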
Query the local vector store or Qdrant for the target Folder ID to fetch relevant text chunks:
cd ./skills/filechat && node query.js <FOLDER_ID> "What does my medical discharge say?"
Use the snippets returned to answer the user.
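For intuition about what a query returns, the ranking step of a vector search can be sketched with cosine similarity. This is a generic illustration, not the code in query.js: the chunk shape (`fileId`, `text`, `vector`) and the function names are assumptions, and the three-dimensional vectors are toy values standing in for real embeddings.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query vector and keep the k best.
function topMatches(queryVector, chunks, k = 3) {
  return chunks
    .map((chunk) => ({
      fileId: chunk.fileId,
      text: chunk.text,
      score: cosineSimilarity(queryVector, chunk.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy chunks for illustration only.
const chunks = [
  { fileId: 'f1', text: 'chunk about a discharge summary', vector: [1, 0, 0] },
  { fileId: 'f2', text: 'chunk about a tax form', vector: [0, 1, 0] },
  { fileId: 'f3', text: 'chunk mentioning both', vector: [1, 1, 0] },
];
console.log(topMatches([1, 0, 0], chunks, 2)); // f1 ranks first, then f3
```

Each match carries a file ID alongside its snippet, which is what makes it possible to go from a relevant chunk back to the source file.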
Find the File ID using the query script, then download it:
gws drive files get --params '{"fileId": "<FILE_ID>", "alt": "media"}' --output /workspace/discharge.pdf
Reply using the media tag: MEDIA:/workspace/discharge.pdf.
If the user uploads a file and asks you to save it (or implicitly sends a file per your automatic processing rules):
Determine the target folder (node folders.js list).
Upload the file with gws:
```bash
gws drive files create --json '{"name": "filename.pdf", "parents": ["
```
Run node sync.js so the vector database chunks and embeds the file into the corresponding vector DB.

If the user asks you to verify the skill is working, or if you just set it up and want to ensure end-to-end functionality, follow these exact steps:
Run npx @googleworkspace/cli auth status. Ensure it shows a valid token.
```bash
npx @googleworkspace/cli drive files list --params '{"q": "'\''
```
(If this fails, check folder permissions or GWS credentials.)
```bash
rm -f ./skills/filechat/meta_
node ./skills/filechat/sync.js
```
(You should see files being downloaded, OCR'd, and chunks being embedded. If it says "0 chunks", verify the folder isn't empty.)
```bash
node ./skills/filechat/query.js
```
(You should see a list of "Top matches" with similarity scores and text snippets. If you do, the RAG pipeline is 100% operational!)