What Is URL Extraction?
URL extraction is the process of scanning a block of text and pulling out all web addresses it contains. Whether the URLs include a full protocol like https:// or are bare domains like example.com, this tool finds and lists every one of them instantly.
This is useful for auditing links in documents, extracting references from articles, building link lists from web pages, and verifying URLs in code or configuration files. Everything runs entirely in your browser with no server processing.
How to Use This Tool
Enter Your Text
Type directly into the input editor, paste content with Ctrl+V, or upload/drag a .txt file containing text with URLs.
Toggle Unique Only
Enable the Unique only checkbox to remove duplicate URLs from the results. Disable it to see every occurrence.
Review Extracted URLs
Extracted URLs appear instantly in the output, one per line. The count and domain breakdown update in real time as you type.
Copy or Download
Use Copy to copy all extracted URLs to the clipboard, Download to save them as a .txt file, or Clear to reset.
Features Explained
Protocol & Bare Domain Detection
This tool extracts full URLs with any protocol scheme including http://, https://, ftp://, ws://, wss://, and compound schemes like git+ssh://. It also detects bare domains like example.com or docs.google.com/spreadsheets. Bare domain detection supports 60+ top-level domains including .com, .org, .net, .io, .dev, .ai, .co.uk, and many more.
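The two-pass idea described above can be sketched in a few lines. This is a simplified illustration, not the tool's actual pattern; the regexes and the small TLD list here are assumptions for the example:

```javascript
// Hypothetical extraction sketch: first match any scheme:// URL, then look for
// bare domains in the remaining text. Real-world patterns are more thorough.
const SCHEME_URL = /\b[a-z][a-z0-9+.-]*:\/\/[^\s<>"']+/gi;
const BARE_DOMAIN = /\b(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|org|net|io|dev|ai|co\.uk)(?:\/[^\s<>"']*)?/gi;

function extractUrls(text) {
  const withScheme = text.match(SCHEME_URL) ?? [];
  // Blank out scheme URLs so bare-domain matching doesn't re-match their hosts.
  const rest = text.replace(SCHEME_URL, " ");
  const bare = rest.match(BARE_DOMAIN) ?? [];
  return [...withScheme, ...bare];
}
```

Running the two passes separately keeps full URLs intact while still catching casually written domains in the surrounding prose.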
Smart Punctuation Handling
URLs at the end of sentences often have trailing punctuation like periods, commas, or closing parentheses. The tool intelligently strips trailing punctuation while preserving balanced parentheses inside URLs, such as Wikipedia disambiguation links.
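One way to implement this cleanup is to peel characters off the end of a match, keeping a closing ")" only when the URL contains a matching "(". This is an assumed approach for illustration, not necessarily the tool's exact rules:

```javascript
// Strip sentence punctuation from the end of a URL candidate, but keep a
// closing ")" when it balances an "(" inside the URL (Wikipedia-style links).
function stripTrailing(url) {
  let out = url;
  while (out.length > 0) {
    const last = out[out.length - 1];
    if (".,;:!?".includes(last)) {
      out = out.slice(0, -1);
    } else if (last === ")") {
      const opens = (out.match(/\(/g) ?? []).length;
      const closes = (out.match(/\)/g) ?? []).length;
      if (closes > opens) out = out.slice(0, -1);
      else break; // balanced: the ")" belongs to the URL
    } else {
      break;
    }
  }
  return out;
}
```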
Duplicate Removal
The Unique only checkbox deduplicates extracted URLs so each address appears only once in the output. This is useful when processing documents where the same link appears in multiple places.
Domain Breakdown
When URLs are found, a statistics panel shows the count of URLs per domain. This gives you a quick overview of which sites are most referenced in your text.
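A per-domain tally like this can be built with the standard `URL` parser; bare domains need a temporary scheme prefix before `URL()` will accept them. This is an assumed implementation for illustration:

```javascript
// Count extracted URLs per hostname for the statistics panel.
function domainCounts(urls) {
  const counts = {};
  for (const raw of urls) {
    let host;
    try {
      // Bare domains like "example.com/b" get a scheme so URL() can parse them.
      host = new URL(raw.includes("://") ? raw : `https://${raw}`).hostname;
    } catch {
      continue; // skip anything URL() cannot parse
    }
    counts[host] = (counts[host] ?? 0) + 1;
  }
  return counts;
}
```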
Real-Time Extraction
URLs are extracted instantly as you type or paste text. The extraction is memoized for performance, so only changes to the input text or the Unique only toggle trigger recalculation.
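Memoization here means caching the last result and recomputing only when an input changes, analogous to React's `useMemo` (whether the tool uses React is an assumption; the sketch below is framework-free):

```javascript
// Wrap an extraction function so it only recomputes when the text or the
// unique toggle actually changes between calls.
function memoizeExtraction(extract) {
  let lastText, lastUnique, lastResult;
  return (text, unique) => {
    if (text !== lastText || unique !== lastUnique) {
      lastText = text;
      lastUnique = unique;
      lastResult = extract(text, unique);
    }
    return lastResult;
  };
}
```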
File Upload & Drag and Drop
Upload a .txt file using the Upload button or drag and drop a text file directly onto the input area. Files up to 5MB are supported.
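The upload check can be reduced to two conditions: a .txt extension and the 5MB limit mentioned above. The browser-only wiring (FileReader, drag-and-drop events) is omitted; the validation sketch below is an assumption about how such a check might look:

```javascript
// Validate an upload candidate before reading it: .txt extension, <= 5 MB.
const MAX_BYTES = 5 * 1024 * 1024;

function isAcceptableUpload(name, sizeBytes) {
  return name.toLowerCase().endsWith(".txt") && sizeBytes <= MAX_BYTES;
}
```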
Who Is This Tool For?
SEO Specialists
Audit internal and external links in web pages, blog posts, and content to ensure link integrity and optimize site structure.
Content Creators
Extract all references and sources from articles, research documents, and notes to build citation lists and resource pages.
Developers
Pull URLs from log files, configuration files, API responses, and code comments for testing, migration, or debugging.
Researchers
Collect all referenced links from academic papers, reports, and web pages for literature reviews and source verification.
QA & Testers
Extract URLs from test documents and specifications to verify that all links are valid and pointing to the correct destinations.
Project Managers
Gather all resource links from meeting notes, project documents, and email threads into a clean, organized list.
Supported URL Formats
| Format | Example |
|---|---|
| HTTPS with path | https://example.com/path/to/page |
| HTTP with path | http://www.example.com/page |
| FTP link | ftp://files.example.com/pub/data.zip |
| WebSocket | ws://socket.example.com/chat |
| Secure WebSocket | wss://secure.example.com/stream |
| Compound scheme | git+ssh://git@example.com/repo/project.git |
| With query string | https://example.com/search?q=test&lang=en |
| With fragment | https://example.com/docs#section |
| With port | https://example.com:8080/api |
| Subdomain | https://docs.google.com/spreadsheets |
| Country-code TLD | https://example.co.uk/page |
| IP address | http://192.168.1.1:3000/api |
| With parentheses | https://en.wikipedia.org/wiki/URL_(disambiguation) |
| Bare domain | google.com |
| Bare with www | www.github.com |
| Bare with path | stackoverflow.com/questions/12345 |
| Bare subdomain | docs.google.com/spreadsheets |
Bare domains are validated against 60+ known top-level domains. Any scheme:// URL is extracted (http, https, ftp, ws, wss, git+ssh, etc.). Schemes without :// like mailto:, tel:, and data: are not extracted.
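The "scheme with ://" rule above can be captured by a single anchored test, which is what separates extractable URLs from mailto:, tel:, and data: URIs (a sketch of the rule, not the tool's source):

```javascript
// A candidate counts as a full URL only if its scheme is followed by "://".
function hasExtractableScheme(candidate) {
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(candidate);
}
```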
Tips for Extracting URLs
Paste entire web pages
Copy the full text of a web page (Ctrl+A, Ctrl+C) and paste it here. The tool will find all URLs buried in the content, navigation, and footer sections.
Process HTML source
Paste raw HTML source code to extract all href and src URLs. The tool will pull URLs from attributes, inline styles, and script references.
Use Unique only for clean lists
When auditing links or building resource lists, enable Unique only to automatically remove duplicates and get a clean set of distinct URLs.
Check the domain breakdown
The domain breakdown panel helps you quickly see which sites are most referenced. Useful for SEO audits, link analysis, and content reviews.
Handles trailing punctuation
URLs at the end of sentences are cleaned automatically. Trailing periods, commas, and unbalanced parentheses are stripped while preserving valid URL characters.
Bare domains detected
You don't need full https:// URLs. The tool also detects bare domains like google.com and www.example.com written casually in text.
Privacy & Security
This tool runs 100% in your browser. Your text and extracted URLs are never uploaded to any server. All extraction and filtering happens locally using JavaScript.
Your input is stored only in your browser's local storage so it persists when you refresh the page. You can clear it at any time using the “Clear” button. No cookies are used, no analytics track your text content, and no third-party services have access to what you type.