Web crawl
Turn a public URL into an Informly document by fetching the page, extracting the main text, and indexing it alongside your uploaded files.
Web crawl is the fastest way to bring a page on the open web into your Informly library. Instead of downloading a help article or product page and uploading it as a file, you hand Informly a URL and it does the rest. The result is a regular document — same status pipeline, same chunks, same chat behavior as anything you upload by hand.
This is the right choice for content that already lives on the web: help center articles, blog posts, public product pages, changelogs, and FAQs.
When to use a crawl instead of a file upload
| Use case | Pick |
|---|---|
| Public page on a site you control | Crawl |
| Article that exists only as a file on disk | File upload |
| Internal page behind a login | File upload or data source |
| Content that changes regularly | Crawl, then reprocess |
Run a crawl
1. Open the crawler. Go to Documents → Crawl URL.
2. Paste the URL. Enter the full URL, including https://. Informly fetches the page server-side, so the URL must be reachable from the public internet.
3. Set visibility. Choose public or private visibility, just like a file upload.
4. Start the crawl. Click Crawl. Informly fetches the page, extracts the main text, and queues the resulting document for processing.
The new document appears in the library immediately with a Pending badge and moves through the same Pending → Processing → Completed lifecycle as an uploaded file.
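Because the fetch happens on Informly's servers, a URL that works in your browser can still be unreachable to the crawler. The sketch below is one way to pre-check a URL locally before pasting it. The function name `check_crawl_url` is hypothetical, the rules are assumptions drawn from the requirements above (full URL with a scheme, publicly reachable host), and accepting `http://` alongside `https://` is also an assumption. It is not Informly's own validation.

```python
import ipaddress
from urllib.parse import urlsplit


def check_crawl_url(url: str) -> list[str]:
    """Return a list of problems that would likely stop a server-side fetch."""
    problems = []
    parts = urlsplit(url.strip())

    # The URL must be complete, including the scheme.
    # Assumption: http:// is tolerated as well as https://.
    if parts.scheme not in ("http", "https"):
        problems.append("URL must start with https:// (or http://)")

    host = parts.hostname or ""
    if not host:
        problems.append("URL has no hostname")
        return problems

    # Names that only resolve on your own machine or local network.
    if host == "localhost" or host.endswith(".local"):
        problems.append("hostname is only reachable on your own machine or network")

    # Literal IP addresses must be publicly routable.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # an ordinary DNS name, not a literal IP address
    if ip is not None and not ip.is_global:
        problems.append("IP address is private, loopback, or otherwise not public")

    return problems
```

An empty list means nothing obvious stands in the way. It does not guarantee the crawl will succeed, since the page may still sit behind a login or a firewall.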
What gets extracted
Informly pulls out the main body content and discards site furniture — navigation, sidebars, footers, cookie banners, and most ads. What's kept is the article-like text a reader would actually want.
If a page renders most of its content with JavaScript after load, the extracted text may be sparse. In that case, copy the rendered text and create the document via paste text instead.
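To judge in advance whether a page is likely to extract sparsely, you can count how many words are present in the raw HTML before any JavaScript runs. The following is a rough heuristic using only the Python standard library. The class and function names are hypothetical, and this is not how Informly's extractor works internally.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text present in the raw HTML, skipping script-like elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def static_word_count(html: str) -> int:
    """Count the words a fetcher would see without executing JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())
```

Feed it the HTML you get from "View source" (not the rendered DOM from developer tools). A count near zero on a page that looks full of text in the browser suggests the content is rendered client-side, and paste text is the safer route.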
Multi-page sites
Informly crawls one URL per request today. To ingest an entire help center or documentation site, run a separate crawl for each page you want indexed. This keeps each page in its own document, which makes citations more precise and reprocessing cheaper.
If you have a long list of URLs to ingest, work through them in batches. Each crawl is fast, and you can leave the library page and come back as the statuses update.
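Since each page needs its own crawl, it helps to clean the URL list first so you do not crawl the same page twice. Here is a minimal sketch that removes duplicates and splits the list into batches to work through by hand. The helper names and the batch size of 10 are arbitrary choices for illustration, and the code does not call any Informly API.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase the scheme and host and drop the #fragment, which is never sent to the server."""
    p = urlsplit(url.strip())
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path or "/", p.query, ""))


def plan_batches(urls, batch_size=10):
    """Return de-duplicated URLs in their original order, grouped into batches."""
    seen = set()
    unique = []
    for raw in urls:
        if not raw.strip():
            continue  # ignore blank lines in the input list
        url = normalize(raw)
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]
```

Two links that differ only by an anchor such as `#top` point at the same page, so they collapse into one entry and one document.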
Keeping crawled pages fresh
Crawled documents do not auto-refresh. If the source page changes, the document in Informly stays on the version it had when you crawled it.
To refresh:
1. Open the document. Find it in the library and click into the detail page.
2. Click Reprocess. Informly re-fetches the original URL, extracts the latest text, and re-indexes it.
For content that changes weekly or daily, consider connecting a data source instead — those sync automatically.
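Because nothing refreshes automatically, you need your own signal for when a reprocess is due. One option, sketched below, is to keep a fingerprint of each page's text and compare it on a schedule of your choosing. The function names and the in-memory `seen` dictionary are hypothetical, and fetching the page text is left to you.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Hash the text with whitespace collapsed so trivial reflows do not count as changes."""
    collapsed = " ".join(page_text.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def needs_reprocess(seen: dict, url: str, page_text: str) -> bool:
    """Record the page's current fingerprint and report whether it differs from the last one.

    The first sighting of a URL returns False: there is nothing to compare against yet.
    """
    current = fingerprint(page_text)
    previous = seen.get(url)
    seen[url] = current
    return previous is not None and previous != current
```

When `needs_reprocess` returns `True`, open the document in Informly and click Reprocess. To keep fingerprints between runs, you could save `seen` to a JSON file.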
Permissions and good behavior
Only crawl pages you own or have explicit permission to use. Pulling someone else's content into your AI without permission can violate their copyright and their site's terms of service. Stick to your own help center, your own docs, and content you've licensed.
If a site blocks Informly's crawler in robots.txt or responds with a login wall, the crawl fails and the reason is shown on the document detail page.
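You can check a site's robots.txt rules yourself before crawling. The sketch below uses Python's standard `urllib.robotparser` on the text of a robots.txt file. This page does not state the user-agent name Informly's crawler sends, so the example defaults to the wildcard `*`. Treat the result as an approximation, since a site may have a rule aimed at a specific crawler.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)
```

To use it, open `https://<your-site>/robots.txt` in a browser, copy the contents into a string, and pass the page URL you plan to crawl.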
When a crawl fails
Common failure modes and fixes:
| Cause | Fix |
|---|---|
| URL unreachable or returned an error | Confirm the URL loads in your browser, then retry. |
| Page is behind a login | Use file upload or a data source. |
| JavaScript-rendered content | Copy the rendered text and use paste text. |
| Page extracted but had no useful text | The page is mostly images or interactive elements, so it is not a good fit for crawling. |
What's next
Upload files
Add PDFs, Word docs, and pasted text alongside crawled pages.
Manage the library
Reprocess, archive, and bulk-edit documents.
Connect a data source
Sync from systems like Google Drive automatically instead of crawling page by page.
Assign to a widget
Let a widget answer from your crawled pages.