Admin Panel
Manage URLs to crawl and monitor system status with advanced controls
Add URL to Crawl
GitHub Actions Crawler
GitHub Actions Integration
Crawling now runs in GitHub Actions workflows rather than in Vercel serverless functions, which sidesteps the serverless limitations. This makes crawls more reliable and resolves authentication issues.
How it works:
- Add URLs to the crawl queue using the form on the left
- Click "Start GitHub Crawl" to trigger the workflows
- GitHub Actions crawls the URLs and stores the results in the database
- Files appear 2-3 minutes after the workflow completes
- Check the Actions tab of the GitHub repository for progress
Setup Instructions
To use the WebScraper, ensure these are configured:
1. Create a Supabase project at supabase.com
2. Add the SUPABASE_URL environment variable in the Vercel dashboard
3. Run the SQL schema from supabase-schema.sql in your Supabase SQL Editor
4. Check the system status above for confirmation
GitHub Actions Integration
Manual Crawling (Available Now):
1. Go to GitHub Actions
2. Click "🕷️ Crawl and Store Files" → "Run workflow"
3. Enter the URL to crawl and configure the options:
   - Recursive crawling: Enable to crawl subdirectories
   - Max depth: Set the crawl depth (1-5 levels)
4. Click "Run workflow"; files appear in 2-3 minutes
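The manual steps above correspond to a `workflow_dispatch` event, which GitHub's REST API can also create programmatically. The following is a minimal sketch of how such a request might be assembled, not the panel's actual implementation. The function name `buildDispatchRequest`, the workflow file name, the default branch `main`, and the input names `url`, `recursive`, and `max_depth` are all assumptions; the real names are whatever the workflow file declares.

```typescript
// Options mirroring the "Run workflow" form. Field names are hypothetical.
interface CrawlOptions {
  url: string;
  recursive: boolean;
  maxDepth: number; // the form allows 1-5
}

interface DispatchRequest {
  endpoint: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a request for GitHub's
// "create a workflow dispatch event" endpoint.
function buildDispatchRequest(
  owner: string,
  repo: string,
  workflowFile: string, // e.g. "crawl.yml" (assumed file name)
  token: string,
  options: CrawlOptions,
  ref: string = "main", // assumed default branch
): DispatchRequest {
  if (!Number.isInteger(options.maxDepth) || options.maxDepth < 1 || options.maxDepth > 5) {
    throw new RangeError("maxDepth must be an integer from 1 to 5");
  }
  if (!/^https?:\/\//i.test(options.url)) {
    throw new TypeError("url must start with http:// or https://");
  }
  return {
    endpoint:
      `https://api.github.com/repos/${owner}/${repo}` +
      `/actions/workflows/${encodeURIComponent(workflowFile)}/dispatches`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    // Input values are sent as strings here; input names are assumptions.
    body: JSON.stringify({
      ref,
      inputs: {
        url: options.url,
        recursive: String(options.recursive),
        max_depth: String(options.maxDepth),
      },
    }),
  };
}
```

The returned object can be passed to `fetch(req.endpoint, { method: req.method, headers: req.headers, body: req.body })`. Keeping the builder separate from the network call makes the validation easy to check without a token.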
🆕 New Recursive Crawling Features:
- Folder Discovery: Automatically finds and crawls subdirectories
- Depth Control: Set how deep to crawl (prevents infinite loops)
- Smart Detection: Recognizes folders vs files in directory listings
- Error Handling: Continues crawling even if some folders fail
- Performance: Limits subdirectories per level to prevent timeouts
Recommended settings: Depth 2-3 for most sites, Depth 1 for large sites
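To illustrate how depth control, error handling, and the subdirectory limit can fit together, here is a minimal sketch of a depth-limited recursive crawl. It is not the crawler's actual code. The names `crawlRecursive`, `ListDir`, and `maxFoldersPerDir` are hypothetical, the directory lister is injected so the logic can run without network access, the root listing is counted as depth 1, and the cap is applied per directory (one possible reading of "per level").

```typescript
// What a directory listing yields once folders and files are told apart.
interface Listing {
  files: string[];
  folders: string[];
}

// Injected lister (hypothetical); a real one would fetch and parse a listing page.
type ListDir = (url: string) => Promise<Listing>;

interface CrawlResult {
  files: string[];  // every file found
  failed: string[]; // folders whose listing could not be read
}

async function crawlRecursive(
  rootUrl: string,
  listDir: ListDir,
  maxDepth: number,
  maxFoldersPerDir: number = 10, // assumed cap; the real limit is not documented here
): Promise<CrawlResult> {
  const result: CrawlResult = { files: [], failed: [] };
  const visited = new Set<string>(); // guards against link cycles

  async function visit(url: string, depth: number): Promise<void> {
    // Depth control: stop below the configured level or on a repeat visit.
    if (depth > maxDepth || visited.has(url)) return;
    visited.add(url);

    let listing: Listing;
    try {
      listing = await listDir(url);
    } catch {
      // Error handling: record the failure and keep crawling other folders.
      result.failed.push(url);
      return;
    }

    result.files.push(...listing.files);
    // Performance: only descend into the first few subdirectories.
    for (const folder of listing.folders.slice(0, maxFoldersPerDir)) {
      await visit(folder, depth + 1);
    }
  }

  await visit(rootUrl, 1);
  return result;
}
```

With `maxDepth` set to 1, only the root listing is read; each additional level lets the crawl descend one folder deeper, which is why a low depth suits large sites.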
Troubleshooting
If newly crawled data doesn't appear:
- Add the Supabase environment variables in the Vercel Dashboard:
  - NEXT_PUBLIC_SUPABASE_URL
  - NEXT_PUBLIC_SUPABASE_ANON_KEY
- Redeploy after adding the variables
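A quick way to confirm the variables above are set is a small check that reports which ones are missing. This is a sketch under assumptions, not part of the panel: the helper name `findMissingEnvVars` is hypothetical, and it treats a blank value the same as an unset one.

```typescript
// The two variables named in the troubleshooting list above.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
] as const;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

On the server this could be called as `findMissingEnvVars(process.env)`. Next.js inlines `NEXT_PUBLIC_` variables into the client bundle at build time, which is why a redeploy is needed after changing them, and why a dynamic lookup like this one is only meaningful in server-side code.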
Manage Crawled URLs
💡 How to delete: Check the boxes next to URLs you want to delete, then click "Delete Selected". Make sure you've entered your admin password in the form above first.
No URLs found. Add some URLs to get started.