Knowledge Base Troubleshooting

Fix common issues with document uploads, web crawling, and processing. This guide covers failed documents, stuck processing, and retrieval problems.

Check status: Open your Knowledge Base to view document statuses and error messages.

Document Status Quick Reference

Status	Badge	Meaning	Action
Pending	Clock (gray)	Queued	Wait
Processing	Spinner (blue)	In progress	Wait
Completed	Checkmark (green)	Ready	None
Failed	X (red)	Error	Fix & Retry
Stuck	Warning (yellow)	Timed out	Retry

Failed Documents

Viewing Error Details

Find the failed document in your knowledge base
Click the info icon (ℹ️) next to the error badge
View the full error message and metadata

Retrying Failed Documents

Click the Retry button next to the failed document
The document will re-process from the beginning
Monitor status until completed

Common Failure Causes

"No content extracted"

Cause: Document has no extractable text.

Solutions:

PDF: Ensure it's text-based, not scanned images
URL: Check if page content loads without JavaScript
Empty file: Verify file actually contains content

"Embedding failed" / "OpenAI API error"

Cause: Failed to generate vector embeddings.

Solutions:

Wait a few minutes and retry; API issues are often temporary
Check that the document contains valid text
Contact support if the error persists
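
If you script your ingestion, the "wait and retry" advice can be sketched as exponential backoff. This is only an illustration under stated assumptions: `retry_with_backoff` and the `operation` callable are hypothetical names, not part of the product, and the delays are arbitrary.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    `operation` is a hypothetical callable that raises on failure.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

Doubling the wait gives a temporarily overloaded API time to recover instead of hitting it again immediately.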

"Invalid file format"

Cause: Unsupported or corrupted file.

Solutions:

Convert to supported format (PDF, DOCX, TXT, MD, CSV)
Re-export the file from original application
Check file isn't password-protected

"File too large"

Cause: File exceeds 50 MB limit.

Solutions:

Split into smaller files
Remove embedded images/media
Compress the document
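
For plain-text content, splitting can be done at paragraph boundaries so that each part stays readable on its own. The following is a minimal sketch, assuming UTF-8 text and blank lines between paragraphs; `split_text` is a hypothetical helper, not a product feature.

```python
def split_text(text, max_bytes):
    """Split text into parts of at most max_bytes (UTF-8), breaking on blank lines.

    A single paragraph larger than max_bytes is kept whole rather than cut.
    """
    parts, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate.encode("utf-8")) <= max_bytes or not current:
            current = candidate
        else:
            parts.append(current)  # current part is full; start a new one
            current = paragraph
    if current:
        parts.append(current)
    return parts
```

Each returned part can then be saved and uploaded as a separate file.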

"Failed to crawl URL"

Cause: Could not fetch web page.

Solutions:

Verify URL is accessible in browser
Check if site blocks crawlers (robots.txt)
Ensure URL doesn't require login
Try a different page from the same site
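
To check the robots.txt rule yourself, Python's standard `urllib.robotparser` can evaluate a robots.txt body against a URL. This is a sketch: the user agent the crawler sends is not stated in this guide, so the wildcard `"*"` is an assumption, and `is_allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the robots.txt body permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(robots, "https://site.com/private/page"))
print(is_allowed(robots, "https://site.com/docs"))
```

Paste the contents of the site's /robots.txt into `robots` to test the page you are trying to crawl.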

Stuck Documents

Documents in "Processing" for more than 10 minutes are automatically marked as "Stuck" with a yellow warning badge.

Why Documents Get Stuck

Cause	Likelihood	Solution
Server restart during processing	Common	Retry
Embedding API timeout	Occasional	Retry
Very large document	Occasional	Split file, retry
Database connection lost	Rare	Retry

Fixing Stuck Documents

Identify: Look for yellow "Stuck" badge
Retry: Click the Retry button
Monitor: Watch for completion
Split: If it fails again, try splitting the document

Automatic Cleanup

The system runs a background cleanup process that:

Checks for documents stuck for more than 30 minutes
Marks them as "Failed" so you can retry them manually

This happens automatically; no action is needed.
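
The two thresholds can be read as a simple timeline. The sketch below assumes both the 10-minute and the 30-minute limits are measured from the moment processing started, which this guide does not specify; `displayed_status` is a hypothetical name used only for illustration.

```python
from datetime import datetime, timedelta

STUCK_AFTER = timedelta(minutes=10)   # "Processing" longer than this shows "Stuck"
FAILED_AFTER = timedelta(minutes=30)  # cleanup then marks the document "Failed"


def displayed_status(started_at, now):
    """Return the status a still-unfinished document would show at time `now`."""
    elapsed = now - started_at
    if elapsed > FAILED_AFTER:
        return "Failed"
    if elapsed > STUCK_AFTER:
        return "Stuck"
    return "Processing"
```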

Upload Issues

"Upload failed"

Check:

File size under 50 MB?
Supported format (PDF, DOCX, TXT, MD, CSV)?
Internet connection stable?
File not corrupted?
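
The size and format checks can be run locally before uploading. This is a minimal sketch, assuming "50 MB" means 50 × 1024 × 1024 bytes and that the format is judged by file extension; `preflight` is a hypothetical helper and does not detect corruption.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # assumption: the 50 MB limit counted in MiB
SUPPORTED = {".pdf", ".docx", ".txt", ".md", ".csv"}


def preflight(filename, size_bytes):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED:
        problems.append("unsupported format")
    if size_bytes == 0:
        problems.append("empty file")
    elif size_bytes > MAX_BYTES:
        problems.append("file too large")
    return problems
```

For a file on disk, `preflight(path.name, path.stat().st_size)` with a `pathlib.Path` gives the same check.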

"Processing takes forever"

Normal processing times:

Document Type	Expected Time
Small file (<1 MB)	5-15 seconds
Medium file (1-10 MB)	30-90 seconds
Large file (10-50 MB)	2-5 minutes

If processing exceeds these times:

Large files take longer, so allow the full expected time
If processing exceeds 10 minutes, the document shows the "Stuck" badge
Retry the document if it is stuck

"Document shows 0 chunks"

Cause: No content was extracted from the document.

Solutions:

Check document isn't empty
For PDFs: Ensure text-based, not scanned images
For URLs: Verify content loads without JavaScript
Try a different document to test

Web Crawling Issues

"Sitemap not found"

Solutions:

Try direct sitemap URL: https://site.com/sitemap.xml
Check robots.txt for sitemap location
Use single URL mode instead
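
Sites often advertise their sitemap on a `Sitemap:` line in robots.txt. The sketch below pulls those URLs out of a robots.txt body; `sitemap_urls` is a hypothetical helper, and fetching the file is left to you.

```python
def sitemap_urls(robots_txt):
    """Return the URLs listed on `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "https:" stays intact.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```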

"Page blocked by robots.txt"

Cause: Website disallows crawling.

Solutions:

Respect the site's wishes (it's their policy)
Use file upload instead
Contact site owner for permission

"No content extracted from URL"

Causes:

JavaScript-rendered content (SPA)
Page requires login
Content in iframes

Solutions:

Try a different page
Use file upload for this content
Export page to PDF and upload

"Crawl rate limited"

Cause: Too many requests too fast.

Solutions:

Increase crawl delay (default is 2 seconds)
Reduce number of pages per crawl
Wait and retry later
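
A crawl delay is simply a pause between consecutive requests. The following sketch shows the idea, not the crawler's actual implementation; `crawl` and the `fetch` callable are hypothetical, and the 2-second default mirrors the value quoted above.

```python
import time


def crawl(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Fetch each URL in order, pausing delay_seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause needed before the first request
        results.append(fetch(url))
    return results
```

With a delay of `d` seconds, a crawl of `n` pages spends at least `(n - 1) * d` seconds waiting, which is why raising the delay and reducing the page count work together.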

Retrieval Issues

"Agent doesn't find information"

Check:

Document processed? Status should be "Completed"
Chunks exist? Check chunk count > 0
RAG configured? Agent has knowledge base linked
Question matches content? Test with exact phrases from doc

"Wrong information retrieved"

Solutions:

Lower minScore threshold (try 0.5)
Increase topK (retrieve more chunks)
Improve document quality
Use more specific document content

See RAG Integration for configuration.

"Too much irrelevant information"

Solutions:

Raise minScore threshold (try 0.8)
Decrease topK (retrieve fewer chunks)
Split documents by topic
Remove unrelated content
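
How `minScore` and `topK` interact can be illustrated with a small filter. This is a sketch under assumptions: it treats `minScore` as a floor on a 0 to 1 similarity score and `topK` as a cap on the number of chunks returned, and `select_chunks` is a hypothetical function, not the product's retrieval code.

```python
def select_chunks(scored_chunks, min_score, top_k):
    """Keep chunks scoring at least min_score, then return the top_k best.

    scored_chunks is a list of (chunk_text, score) pairs.
    """
    kept = [chunk for chunk in scored_chunks if chunk[1] >= min_score]
    kept.sort(key=lambda chunk: chunk[1], reverse=True)
    return kept[:top_k]


chunks = [("pricing", 0.91), ("refunds", 0.74), ("careers", 0.55), ("blog", 0.32)]
print(select_chunks(chunks, min_score=0.5, top_k=5))  # looser: three chunks pass
print(select_chunks(chunks, min_score=0.8, top_k=5))  # stricter: only "pricing"
```

Lowering the threshold or raising `topK` lets more chunks through; raising the threshold or lowering `topK` trims the weaker matches.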

Performance Issues

Slow Processing

For file uploads:

Split large files into smaller ones
Remove unnecessary images/media
Use simpler formats (TXT instead of PDF)

For web crawling:

Reduce pages per crawl
Use single URL mode for testing
Increase crawl delay

High Failure Rate

If many documents are failing:

Check for common pattern in errors
Verify account has valid API keys
Try one document at a time
Contact support if the issue is systematic

Error Messages Reference

Error	Meaning	Fix
`No content extracted`	Empty or unreadable	Check file content
`Embedding failed`	API error	Retry later
`Invalid file format`	Unsupported type	Convert file
`File too large`	>50 MB	Split file
`Crawl failed`	URL unreachable	Verify URL
`Rate limited`	Too many requests	Wait & retry
`Processing timed out`	Took too long	Retry
`Connection error`	Network issue	Check connection
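
If you collect error messages in a script or spreadsheet export, the table above can be turned into a lookup. This is a sketch only: `suggest_fix` is a hypothetical helper, the substring matching is an assumption about how the messages are worded, and the "Contact support" fallback is added here for unrecognized errors.

```python
FIXES = {
    "No content extracted": "Check file content",
    "Embedding failed": "Retry later",
    "Invalid file format": "Convert file",
    "File too large": "Split file",
    "Crawl failed": "Verify URL",
    "Rate limited": "Wait & retry",
    "Processing timed out": "Retry",
    "Connection error": "Check connection",
}


def suggest_fix(error_message):
    """Return the suggested fix for the first known error found in the message."""
    lowered = error_message.lower()
    for error, fix in FIXES.items():
        if error.lower() in lowered:
            return fix
    return "Contact support"  # fallback for errors not in the table
```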

Getting Help

Before Contacting Support

Gather this information:

Document ID (shown in error details)
Error message (full text)
File type and size
Steps to reproduce
Screenshot of the issue

Self-Service Options

Retry: Many issues resolve with a simple retry
Re-upload: Delete and re-upload the document
Alternative format: Try PDF → TXT conversion
Split files: Break large documents into parts

Contact Support

If issues persist:

Email: [email protected]
Include: Error details, document ID, screenshots

Best Practices to Avoid Issues

Document Preparation

Use text-based PDFs (not scans)
Keep files under 10 MB when possible
Use clear headings and structure
Test with one file before bulk upload

Web Crawling

Start with single URL to test
Verify pages load without JavaScript
Respect robots.txt and rate limits
Select only relevant pages

Monitoring

Check status after uploads/crawls
Retry failed documents promptly
Review chunk counts for expected values
Test RAG retrieval periodically

Related Documentation

Upload Documents - File upload guide
Web Crawling - URL import guide
Document Processing - Processing pipeline
RAG Integration - Configure retrieval

Still having issues? Open the Knowledge Base to check your document statuses, or contact support at [email protected].

