Content API

The Content API extracts structured data from a webpage: metadata, categorized links, raw HTML, a Markdown rendering, and sitemap links.

Endpoint

https://api.capturekit.dev/content

Example Request

GET https://api.capturekit.dev/content?access_key=<your-access-key>&url=https://capturekit.dev
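The same request can be assembled in Python with only the standard library. This is a minimal sketch rather than an official client: the helper names `build_content_url` and `fetch_content` are hypothetical, and `YOUR_ACCESS_KEY` is a placeholder.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://api.capturekit.dev/content"


def build_content_url(access_key: str, target_url: str, **options: str) -> str:
    """Return the GET URL for the Content API with percent-encoded query values."""
    params = {"access_key": access_key, "url": target_url, **options}
    return f"{ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_content(access_key: str, target_url: str, **options: str) -> dict:
    """Send the request and decode the JSON body (this performs a network call)."""
    request_url = build_content_url(access_key, target_url, **options)
    with urllib.request.urlopen(request_url, timeout=30) as response:
        return json.load(response)


# Building the URL does not touch the network.
print(build_content_url("YOUR_ACCESS_KEY", "https://capturekit.dev"))
```

Percent-encoding the target URL keeps any query string it carries from being confused with the API's own parameters.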

Response

{
	"success": true,
	"data": {
		"metadata": {
			"title": "CaptureKit - Turn any website into a screenshot with our powerful Screenshot API",
			"description": "CaptureKit is a powerful API for capturing screenshots, extracting HTML, gathering links, and summarizing content—all with a simple request.",
			"favicon": "https://capturekit.dev/favicon.ico",
			"ogImage": "https://capturekit-assets.s3.amazonaws.com/capturekit-og+(1).png"
		},
		"links": {
			"internal": [
				"https://capturekit.dev/",
				"https://capturekit.dev/dashboard",
				"https://capturekit.dev/pricing",
				"https://capturekit.dev/blog"
			],
			"external": [
				"https://docs.capturekit.dev",
				"https://zapier.com/apps/capturekit-website-screenshots-p/integrations",
				"https://www.nextupkit.com"
			],
			"social": [
				"https://github.com/CaptureKit-Web-Scraping-API",
				"https://x.com/capturekit"
			]
		},
		"html": "<html><body><h1>Hello, world!</h1></body></html>",
		"markdown": "CaptureKit - Turn any website into a screenshot with our powerful Screenshot API...",
		"sitemap": {
			"source": "https://capturekit.dev/sitemap.xml",
			"totalLinks": 3,
			"links": [
				"https://www.capturekit.dev/",
				"https://www.capturekit.dev/page-content",
				"https://www.capturekit.dev/ai"
			]
		}
	}
}
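A decoded response of this shape can be reduced to the fields a caller usually needs. The sketch below is illustrative only: `summarize_content` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response above rather than live API output.

```python
def summarize_content(payload: dict) -> dict:
    """Pull the title and per-category link counts out of a Content API response."""
    if not payload.get("success"):
        raise ValueError("request was not successful")
    data = payload["data"]
    links = data.get("links", {})
    return {
        "title": data.get("metadata", {}).get("title"),
        "link_counts": {kind: len(urls) for kind, urls in links.items()},
        # html and markdown are treated as optional keys in this sketch.
        "has_html": "html" in data,
        "has_markdown": "markdown" in data,
    }


# Trimmed sample with the same structure as the documented response.
sample = {
    "success": True,
    "data": {
        "metadata": {"title": "CaptureKit"},
        "links": {
            "internal": ["https://capturekit.dev/", "https://capturekit.dev/pricing"],
            "external": ["https://docs.capturekit.dev"],
            "social": ["https://x.com/capturekit"],
        },
        "html": "<html><body><h1>Hello, world!</h1></body></html>",
    },
}

print(summarize_content(sample))
```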

Parameters

url string Required
The URL of the webpage to capture.


access_key string Required
Your API access key. Can be provided via the access_key query parameter, x-access-key header, or request body.
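Sending the key in the `x-access-key` header keeps it out of the URL, which helps when URLs end up in logs. The following is a sketch under that assumption; `build_header_request` is a hypothetical helper and `YOUR_ACCESS_KEY` is a placeholder. The request object is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://api.capturekit.dev/content"


def build_header_request(access_key: str, target_url: str) -> urllib.request.Request:
    """Build a GET request that carries the key in the x-access-key header."""
    query = urllib.parse.urlencode({"url": target_url})
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        headers={"x-access-key": access_key},
        method="GET",
    )


request = build_header_request("YOUR_ACCESS_KEY", "https://capturekit.dev")
print(request.full_url)  # the key does not appear in the URL
```

Passing the object to `urllib.request.urlopen(request)` would perform the actual call.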


include_html boolean Optional Defaults to false
Include the raw HTML of the webpage in the response.


include_markdown boolean Optional Defaults to false
Include the Markdown of the webpage in the response.


use_defuddle boolean Optional Defaults to false
Use Defuddle to clean the page and extract its main content. Defuddle removes non-essential elements such as comments, sidebars, headers, and footers, leaving only the primary content. When enabled, the HTML in the response is processed through Defuddle before being returned.


delay number Optional Defaults to 0
Delay in seconds before capturing the page content (maximum 10).


wait_until string Optional
When to consider the page loaded and begin capturing. One of: load, domcontentloaded, networkidle0, networkidle2.


wait_for_selector string Optional
Wait for an element matching this selector to appear before capturing the content.
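The three timing parameters (`delay`, `wait_until`, `wait_for_selector`) can be checked on the client before a request is sent. The sketch below assumes the limits stated in this reference (a 10-second maximum delay and four `wait_until` values); `timing_params` is a hypothetical helper, and the `#main` selector is only an example.

```python
from typing import Optional

WAIT_UNTIL_VALUES = {"load", "domcontentloaded", "networkidle0", "networkidle2"}
MAX_DELAY_SECONDS = 10


def timing_params(
    delay: float = 0,
    wait_until: Optional[str] = None,
    wait_for_selector: Optional[str] = None,
) -> dict:
    """Validate the timing options and return them as query parameters."""
    if not 0 <= delay <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_SECONDS} seconds")
    if wait_until is not None and wait_until not in WAIT_UNTIL_VALUES:
        raise ValueError(f"wait_until must be one of {sorted(WAIT_UNTIL_VALUES)}")

    params = {}
    if delay:
        params["delay"] = str(delay)
    if wait_until is not None:
        params["wait_until"] = wait_until
    if wait_for_selector is not None:
        params["wait_for_selector"] = wait_for_selector
    return params


print(timing_params(delay=2, wait_until="networkidle2", wait_for_selector="#main"))
```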


selector string Optional
Capture a specific element on the page instead of the full page.


remove_selectors string Optional
A comma-separated list of selectors for elements to hide before capturing (e.g., ads, popups).


block_urls string Optional
Comma-separated list of URL patterns to block (e.g., “analytics,tracking,advertisement”). You can specify URLs, domains, or simple patterns like “.example.com/”.


block_resources string Optional
Comma-separated list of resource types to block (e.g., “image,stylesheet,font”). Available resource types: document, stylesheet, image, media, font, script, texttrack, xhr, fetch, eventsource, websocket, manifest, other. Useful for optimizing page loading speed before capturing web content.
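Because `block_urls` and `block_resources` are both comma-separated strings, it can help to build them from lists and check resource types against the set listed above. This is a sketch only; `blocking_params` is a hypothetical helper, not part of the API.

```python
from typing import Iterable

# Resource types listed in the block_resources description.
RESOURCE_TYPES = {
    "document", "stylesheet", "image", "media", "font", "script", "texttrack",
    "xhr", "fetch", "eventsource", "websocket", "manifest", "other",
}


def blocking_params(
    block_urls: Iterable[str] = (),
    block_resources: Iterable[str] = (),
) -> dict:
    """Join URL patterns and resource types into comma-separated query values."""
    url_patterns = list(block_urls)
    resource_types = list(block_resources)

    unknown = set(resource_types) - RESOURCE_TYPES
    if unknown:
        raise ValueError(f"unknown resource types: {sorted(unknown)}")

    params = {}
    if url_patterns:
        params["block_urls"] = ",".join(url_patterns)
    if resource_types:
        params["block_resources"] = ",".join(resource_types)
    return params


print(blocking_params(["analytics", "tracking"], ["image", "font"]))
```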


proxy string Optional
Specify a proxy server to route your request through. Supports HTTP, HTTPS, and SOCKS5 proxies. Format: http://username:password@proxy.com:PORT. Useful for bypassing geo-restrictions and rotating IPs.


cache boolean Optional Defaults to false
Cache the response.


cache_ttl number Optional Defaults to 2592000
How long to cache the response, in seconds. Minimum 3600 (1 hour), maximum 2592000 (30 days).
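The TTL bounds can be enforced before the request is sent. The sketch below assumes boolean parameters are passed as the lowercase string `true`, which this reference does not state explicitly; `cache_params` is a hypothetical helper.

```python
from typing import Optional

MIN_CACHE_TTL = 3600              # 1 hour
MAX_CACHE_TTL = 30 * 24 * 3600    # 2592000 seconds, i.e. 30 days


def cache_params(ttl_seconds: Optional[int] = None) -> dict:
    """Enable caching and, if a TTL is given, check it against the documented bounds."""
    params = {"cache": "true"}  # assumption: booleans are sent as "true"/"false"
    if ttl_seconds is not None:
        if not MIN_CACHE_TTL <= ttl_seconds <= MAX_CACHE_TTL:
            raise ValueError(
                f"cache_ttl must be between {MIN_CACHE_TTL} and {MAX_CACHE_TTL} seconds"
            )
        params["cache_ttl"] = str(ttl_seconds)
    return params


print(cache_params(86400))  # cache for one day
```

Omitting `ttl_seconds` leaves `cache_ttl` unset so the documented default applies.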


remove_cookie_banners boolean Optional Defaults to false
Automatically remove cookie banners before capturing.


viewport_width number Optional Defaults to 1280
The width of the browser viewport in pixels.


viewport_height number Optional Defaults to 1024
The height of the browser viewport in pixels.