Knowledge Base Troubleshooting
Fix common issues with document uploads, web crawling, and processing. This guide covers failed documents, stuck processing, and retrieval problems.
Check status: Open your Knowledge Base to view document statuses and error messages.
Document Status Quick Reference
| Status | Badge | Meaning | Action |
|---|---|---|---|
| Pending | Clock (gray) | Queued | Wait |
| Processing | Spinner (blue) | In progress | Wait |
| Completed | Checkmark (green) | Ready | None |
| Failed | X (red) | Error | Fix & Retry |
| Stuck | Warning (yellow) | Timed out | Retry |
Failed Documents
Viewing Error Details
- Find the failed document in your knowledge base
- Click the info icon (ℹ️) next to the error badge
- View the full error message and metadata
Retrying Failed Documents
- Click the Retry button next to the failed document
- The document will re-process from the beginning
- Monitor status until completed
Common Failure Causes
"No content extracted"
Cause: Document has no extractable text.
Solutions:
- PDF: Ensure it's text-based, not scanned images
- URL: Check if page content loads without JavaScript
- Empty file: Verify file actually contains content
"Embedding failed" / "OpenAI API error"
Cause: Failed to generate vector embeddings.
Solutions:
- Wait a few minutes and retry (temporary API issues)
- Check if document contains valid text
- Contact support if the problem persists
"Invalid file format"
Cause: Unsupported or corrupted file.
Solutions:
- Convert to supported format (PDF, DOCX, TXT, MD, CSV)
- Re-export the file from original application
- Check file isn't password-protected
"File too large"
Cause: File exceeds 50 MB limit.
Solutions:
- Split into smaller files
- Remove embedded images/media
- Compress the document
"Failed to crawl URL"
Cause: Could not fetch web page.
Solutions:
- Verify URL is accessible in browser
- Check if site blocks crawlers (robots.txt)
- Ensure URL doesn't require login
- Try a different page from the same site
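The robots.txt check above can also be done locally before retrying. The sketch below uses Python's standard urllib.robotparser module. The function name is_crawl_allowed is illustrative, and the "*" user agent is an assumption, since this guide does not state which user agent the crawler sends.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules from text, no network access
    return parser.can_fetch(user_agent, url)


# Example rules: everything under /private/ is off limits to all crawlers.
robots_txt = "User-agent: *\nDisallow: /private/\n"
docs_allowed = is_crawl_allowed(robots_txt, "https://example.com/docs/intro")
private_allowed = is_crawl_allowed(robots_txt, "https://example.com/private/page")
```

To test a live site instead of a pasted string, RobotFileParser also offers set_url() followed by read(), which downloads the site's robots.txt before can_fetch() is called.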
Stuck Documents
Documents in "Processing" for more than 10 minutes are automatically marked as "Stuck" with a yellow warning badge.
Why Documents Get Stuck
| Cause | Likelihood | Solution |
|---|---|---|
| Server restart during processing | Common | Retry |
| Embedding API timeout | Occasional | Retry |
| Very large document | Occasional | Split file, retry |
| Database connection lost | Rare | Retry |
Fixing Stuck Documents
- Identify: Look for yellow "Stuck" badge
- Retry: Click the Retry button
- Monitor: Watch for completion
- Split: If it fails again, split the document into smaller files and upload them separately
Automatic Cleanup
The system runs a background cleanup process that:
- Checks for documents stuck > 30 minutes
- Marks them as "Failed" for manual retry
The cleanup runs automatically; no action is needed.
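The timing rules in this section (10 minutes to "Stuck", 30 minutes to cleanup) can be summarized in a few lines. This is a sketch, not the service's implementation: the function name effective_status is hypothetical, and it assumes both thresholds are measured from the moment processing started, which this guide does not specify.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(minutes=10)    # "Processing" longer than this shows "Stuck"
CLEANUP_AFTER = timedelta(minutes=30)  # background cleanup marks the document "Failed"


def effective_status(status: str, processing_started: datetime, now: datetime) -> str:
    """Return the status a document would show at time `now`.

    Assumption: both thresholds count from processing_started.
    """
    if status not in ("Processing", "Stuck"):
        return status  # Pending, Completed, and Failed are not affected by the timers
    elapsed = now - processing_started
    if elapsed > CLEANUP_AFTER:
        return "Failed"
    if elapsed > STUCK_AFTER or status == "Stuck":
        return "Stuck"
    return "Processing"


started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
after_15_minutes = effective_status("Processing", started, started + timedelta(minutes=15))
```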
Upload Issues
"Upload failed"
Check:
- File size under 50 MB?
- Supported format (PDF, DOCX, TXT, MD, CSV)?
- Internet connection stable?
- File not corrupted?
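The first two checks can be automated before a bulk upload. The sketch below is a local pre-check only and does not call any Knowledge Base API. The function name preflight is illustrative, and treating 1 MB as 1024 * 1024 bytes is an assumption about how the 50 MB limit is measured.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # 50 MB limit from this guide; assumes 1 MB = 1024 * 1024 bytes
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv"}  # formats listed in this guide


def preflight(path: str) -> list[str]:
    """Return the problems found for a file; an empty list means none were detected."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} is missing or is not a regular file"]

    problems = []
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file.suffix or '(none)'}")

    size = file.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_BYTES:
        problems.append("file exceeds the 50 MB limit")
    return problems
```

An empty result does not guarantee a successful upload: a file can pass these checks and still be corrupted, password-protected, or a scanned PDF with no extractable text.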
"Processing takes forever"
Normal processing times:
| Document Type | Expected Time |
|---|---|
| Small file (<1 MB) | 5-15 seconds |
| Medium file (1-10 MB) | 30-90 seconds |
| Large file (10-50 MB) | 2-5 minutes |
If processing exceeds these times:
- Large files can take longer, so allow a few extra minutes
- After 10 minutes in "Processing", the document shows the "Stuck" badge
- Retry the document once it is marked "Stuck"
"Document shows 0 chunks"
Cause: No content was extracted from the document.
Solutions:
- Check document isn't empty
- For PDFs: Ensure text-based, not scanned images
- For URLs: Verify content loads without JavaScript
- Try a different document to test
Web Crawling Issues
"Sitemap not found"
Solutions:
- Try the direct sitemap URL: https://site.com/sitemap.xml
- Check robots.txt for the sitemap location
- Use single URL mode instead
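The first two suggestions can be combined: read any Sitemap lines declared in robots.txt and fall back to the conventional /sitemap.xml path if none are declared. The sketch below uses Python's standard urllib.robotparser (site_maps() requires Python 3.8 or later). The function name sitemap_candidates is illustrative.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def sitemap_candidates(site_url: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs declared in robots.txt, or the conventional default."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps() or []  # site_maps() returns None when nothing is declared
    return declared or [urljoin(site_url, "/sitemap.xml")]


robots_txt = "User-agent: *\nDisallow: /tmp/\nSitemap: https://site.com/maps/main.xml\n"
candidates = sitemap_candidates("https://site.com/docs/", robots_txt)
```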
"Page blocked by robots.txt"
Cause: Website disallows crawling.
Solutions:
- Respect the site's wishes (it's their policy)
- Use file upload instead
- Contact site owner for permission
"No content extracted from URL"
Causes:
- JavaScript-rendered content (SPA)
- Page requires login
- Content in iframes
Solutions:
- Try a different page
- Use file upload for this content
- Export page to PDF and upload
"Crawl rate limited"
Cause: Too many requests too fast.
Solutions:
- Increase crawl delay (default is 2 seconds)
- Reduce number of pages per crawl
- Wait and retry later
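To illustrate what the crawl delay setting controls, here is a minimal sketch of a loop that pauses between consecutive requests. It is not the product's crawler: crawl_with_delay and its fetch and sleep parameters are hypothetical names, and only the 2-second default comes from this guide.

```python
import time
from typing import Callable, Dict, Iterable

DEFAULT_CRAWL_DELAY = 2.0  # seconds; the default named in this guide


def crawl_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = DEFAULT_CRAWL_DELAY,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in order, pausing delay_seconds between consecutive requests."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause is needed before the first request
        pages[url] = fetch(url)
    return pages
```

The trade-off is total crawl time. For a 100-page crawl, raising the delay from 2 to 5 seconds adds 99 pauses × 3 extra seconds, or about 5 minutes.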
Retrieval Issues
"Agent doesn't find information"
Check:
- Document processed? Status should be "Completed"
- Chunks exist? Check chunk count > 0
- RAG configured? Agent has knowledge base linked
- Question matches content? Test with exact phrases from doc
"Wrong information retrieved"
Solutions:
- Lower the minScore threshold (try 0.5)
- Increase topK (retrieve more chunks)
- Improve document quality
- Use more specific document content
See RAG Integration for configuration.
"Too much irrelevant information"
Solutions:
- Raise the minScore threshold (try 0.8)
- Decrease topK (retrieve fewer chunks)
- Split documents by topic
- Remove unrelated content
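The interaction between minScore and topK is easier to see with a small example. The sketch below assumes that minScore is an inclusive similarity threshold between 0 and 1 and that topK caps how many of the remaining chunks are returned. The function select_chunks and the sample scores are illustrative, and the actual retrieval pipeline may differ.

```python
from typing import List, Tuple

Chunk = Tuple[str, float]  # (chunk text, similarity score)


def select_chunks(scored: List[Chunk], min_score: float, top_k: int) -> List[Chunk]:
    """Drop chunks scoring below min_score, then keep the top_k highest-scoring ones."""
    kept = [chunk for chunk in scored if chunk[1] >= min_score]
    kept.sort(key=lambda chunk: chunk[1], reverse=True)
    return kept[:top_k]


# Made-up scores for a question about refunds.
scored = [
    ("refund policy", 0.91),
    ("shipping times", 0.62),
    ("office hours", 0.45),
    ("refund form", 0.83),
]
strict = select_chunks(scored, min_score=0.8, top_k=5)  # only the two refund chunks pass
loose = select_chunks(scored, min_score=0.5, top_k=5)   # "shipping times" now passes too
```

Under these assumptions, a higher minScore removes weak matches regardless of topK, while topK only matters when more chunks pass the threshold than it allows.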
Performance Issues
Slow Processing
For file uploads:
- Split large files into smaller ones
- Remove unnecessary images/media
- Use simpler formats (TXT vs PDF)
For web crawling:
- Reduce pages per crawl
- Use single URL mode for testing
- Increase crawl delay
High Failure Rate
If many documents are failing:
- Check for a common pattern in the error messages
- Verify that the account has valid API keys
- Try one document at a time
- Contact support if the failures look systematic
Error Messages Reference
| Error | Meaning | Fix |
|---|---|---|
| No content extracted | Empty or unreadable | Check file content |
| Embedding failed | API error | Retry later |
| Invalid file format | Unsupported type | Convert file |
| File too large | >50 MB | Split file |
| Crawl failed | URL unreachable | Verify URL |
| Rate limited | Too many requests | Wait & retry |
| Processing timed out | Took too long | Retry |
| Connection error | Network issue | Check connection |
Getting Help
Before Contacting Support
Gather this information:
- Document ID (shown in error details)
- Error message (full text)
- File type and size
- Steps to reproduce
- Screenshot of the issue
Self-Service Options
- Retry: Many issues resolve with a simple retry
- Re-upload: Delete and re-upload the document
- Alternative format: Try PDF → TXT conversion
- Split files: Break large documents into parts
Contact Support
If issues persist:
- Email: [email protected]
- Include: Error details, document ID, screenshots
Best Practices to Avoid Issues
Document Preparation
- Use text-based PDFs (not scans)
- Keep files under 10 MB when possible
- Use clear headings and structure
- Test with one file before bulk upload
Web Crawling
- Start with single URL to test
- Verify pages load without JavaScript
- Respect robots.txt and rate limits
- Select only relevant pages
Monitoring
- Check status after uploads/crawls
- Retry failed documents promptly
- Review chunk counts for expected values
- Test RAG retrieval periodically
Related Documentation
- Upload Documents - File upload guide
- Web Crawling - URL import guide
- Document Processing - Processing pipeline
- RAG Integration - Configure retrieval
Still having issues? Open the Knowledge Base to check your document statuses, or contact support at [email protected].