Admin Panel

Manage URLs to crawl and monitor system status with advanced controls

Add URL to Crawl

Enter the URL of a directory listing page (e.g., Apache directory index)

Default password: admin123

GitHub Actions Crawler

GitHub Actions Integration

Crawling now runs via GitHub Actions workflows instead of Vercel serverless functions, sidestepping their execution-time limits and the authentication issues they caused. This makes crawling more reliable.

How it works:

  • Add URLs to the crawl queue using the form on the left
  • Click "Start GitHub Crawl" to trigger workflows
  • GitHub Actions will crawl the URLs and store results in the database
  • Files appear 2-3 minutes after workflow completion
  • Check the GitHub repository's Actions tab for progress
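The "Start GitHub Crawl" button presumably fires a `workflow_dispatch` event through the GitHub REST API. A minimal Python sketch of that call, using only the standard library (the repository name, workflow file name, and input names below are placeholders, not this project's actual values):

```python
import json
import urllib.request

# Placeholders -- substitute your own repository and workflow file.
REPO = "your-user/webscraper"
WORKFLOW = "crawl-and-store.yml"

def build_dispatch_request(crawl_url: str, token: str) -> urllib.request.Request:
    """Build a workflow_dispatch POST for the crawl workflow."""
    endpoint = (
        f"https://api.github.com/repos/{REPO}"
        f"/actions/workflows/{WORKFLOW}/dispatches"
    )
    payload = json.dumps({"ref": "main", "inputs": {"url": crawl_url}}).encode()
    return urllib.request.Request(
        endpoint,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )

# To actually trigger the workflow (needs a token with `actions` scope):
# urllib.request.urlopen(build_dispatch_request("https://example.com/files/", token))
```

A successful dispatch returns HTTP 204 with no body; the run then shows up in the repository's Actions tab.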
Setup Instructions

To use the WebScraper, ensure these are configured:

  1. Create a Supabase project at supabase.com
  2. Add the SUPABASE_URL environment variable in the Vercel dashboard
  3. Run the SQL schema from supabase-schema.sql in your Supabase SQL Editor
  4. Check the system status above for confirmation
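The contents of supabase-schema.sql aren't shown here, but a schema for this kind of crawler might look roughly like the following. Table and column names are illustrative guesses only; run the real supabase-schema.sql from the repository.

```sql
-- Hypothetical sketch, not the project's actual schema.
create table if not exists crawled_files (
  id bigint generated always as identity primary key,
  source_url text not null,        -- directory listing the file came from
  file_url text not null unique,   -- direct link to the file
  filename text not null,
  crawled_at timestamptz not null default now()
);
```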
GitHub Actions Integration

Manual Crawling (Available Now):

  1. Go to GitHub Actions
  2. Click "🕷️ Crawl and Store Files" → "Run workflow"
  3. Enter the URL to crawl and configure options:
     • Recursive crawling: enable to crawl subdirectories
     • Max depth: set crawl depth (1-5 levels deep)
  4. Click "Run workflow" - files appear in 2-3 minutes
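The "Run workflow" form fields above correspond to `workflow_dispatch` inputs in the workflow file. A sketch of what that trigger section might look like (the workflow's actual input names and defaults may differ):

```yaml
# Hypothetical trigger section for the crawl workflow.
name: "🕷️ Crawl and Store Files"
on:
  workflow_dispatch:
    inputs:
      url:
        description: "Directory listing URL to crawl"
        required: true
      recursive:
        description: "Crawl subdirectories"
        type: boolean
        default: false
      max_depth:
        description: "Maximum crawl depth (1-5)"
        default: "2"
```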

🆕 New Recursive Crawling Features:

  • Folder Discovery: Automatically finds and crawls subdirectories
  • Depth Control: Set how deep to crawl (prevents infinite loops)
  • Smart Detection: Recognizes folders vs files in directory listings
  • Error Handling: Continues crawling even if some folders fail
  • Performance: Limits subdirectories per level to prevent timeouts

Recommended settings: Depth 2-3 for most sites, Depth 1 for large sites
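The features above can be sketched as a short depth-limited crawler. This is an illustration of the technique, not the workflow's actual code; the subdirectory cap and the trailing-slash heuristic for telling folders from files are assumptions:

```python
from urllib.parse import urljoin

MAX_SUBDIRS_PER_LEVEL = 20  # performance cap per level; value is an assumption

def crawl(url, fetch_links, max_depth=2, depth=1):
    """Depth-limited crawl of Apache-style directory listings.

    `fetch_links(url)` must return the hrefs found on a listing page.
    Smart detection: entries ending in "/" are treated as subdirectories.
    """
    try:
        links = fetch_links(url)
    except Exception:
        return []  # error handling: skip folders that fail, keep crawling
    files, subdirs = [], []
    for href in links:
        if href.startswith(("?", "/", "#")):  # skip sort links, parent dir, anchors
            continue
        (subdirs if href.endswith("/") else files).append(urljoin(url, href))
    results = files
    if depth < max_depth:  # depth control prevents infinite descent
        for sub in subdirs[:MAX_SUBDIRS_PER_LEVEL]:
            results += crawl(sub, fetch_links, max_depth, depth + 1)
    return results
```

Passing the fetcher in as a function keeps the traversal logic testable without network access.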

Troubleshooting

If new crawled data doesn't appear:

  • Add Supabase environment variables in the Vercel Dashboard:
     • NEXT_PUBLIC_SUPABASE_URL
     • NEXT_PUBLIC_SUPABASE_ANON_KEY
  • Redeploy after adding the variables
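A quick way to confirm the variables are present in a given environment is a small check like this sketch (variable names taken from the list above):

```python
import os

REQUIRED_VARS = [
    "NEXT_PUBLIC_SUPABASE_URL",
    "NEXT_PUBLIC_SUPABASE_ANON_KEY",
]

def missing_vars(env=os.environ):
    """Return the names of required Supabase variables not set in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Print a setup hint for anything missing.
for name in missing_vars():
    print(f"Missing: {name} -- add it in the Vercel dashboard, then redeploy")
```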
🚀 Open GitHub Actions

Manage Crawled URLs

💡 How to delete: Check the boxes next to URLs you want to delete, then click "Delete Selected". Make sure you've entered your admin password in the form above first.

No URLs found. Add some URLs to get started.