Extract Domain from URL¶

Zero-dependency Python snippets for extracting the domain from URLs using the standard library.

6 snippets available in this sub-category.

Simple¶

Extract netloc (host:port) from URL¶

url extract domain netloc host web

Extract netloc (host:port) from URL

from urllib.parse import urlparse

url = 'https://sub.example.com:8080/path?query=1'
parsed = urlparse(url)
print(parsed.netloc)  # sub.example.com:8080

Notes

netloc includes subdomains and port if present

Extract host (domain) only (without port)¶

url extract domain host web

Extract host (domain) only

from urllib.parse import urlparse

url = 'https://sub.example.com:8080/path'
parsed = urlparse(url)
host = parsed.hostname
print(host)  # sub.example.com

Notes

hostname omits port and brackets for IPv6

Extract port from URL¶

url extract port web

Extract port from URL

from urllib.parse import urlparse

url = 'https://example.com:8443/path'
parsed = urlparse(url)
port = parsed.port
print(port)  # 8443

Notes

Returns None if no port is specified

Complex¶

Extract registered domain (TLD split, basic)¶

url extract domain tld registered web

Extract registered domain (basic TLD split)

from urllib.parse import urlparse

def get_registered_domain(url):
    """Extract the registered domain (e.g., example.com) from a URL."""
    host = urlparse(url).hostname or ''
    parts = host.split('.')
    if len(parts) >= 2:
        return '.'.join(parts[-2:])
    return host

# Example usage
print(get_registered_domain('https://sub.example.co.uk'))  # co.uk (basic, not public suffix aware)
print(get_registered_domain('https://sub.example.com'))    # example.com

Notes

This is a basic split; for public suffix awareness, use tldextract (third-party)

Extract domain from IPv6 URL¶

url extract domain ipv6 netloc web

Extract domain from IPv6 URL

from urllib.parse import urlparse

url = 'http://[2001:db8::1]:8080/index.html'
parsed = urlparse(url)
print(parsed.hostname)  # 2001:db8::1
print(parsed.netloc)    # [2001:db8::1]:8080

Notes

hostname omits brackets for IPv6
netloc includes brackets and port

Edge Cases¶

Extract from URL with missing netloc or malformed URL¶

url extract domain edge-case malformed web

Extract from URL with missing netloc or malformed URL

from urllib.parse import urlparse

url = 'not-a-url'
parsed = urlparse(url)
print(parsed.netloc)  # ''
print(parsed.hostname)  # None

Notes

Returns empty string or None if not present

🔗 Cross-References¶

Reference: See 📂 Parse URL
Reference: See 📂 Validate URL
Reference: See 📂 Build URL

🏷️ Tags¶

url, extract, domain, host, port, tld, ipv6, edge-case, web, netloc

📝 Notes¶

Use urlparse for robust domain extraction
For public suffix/registered domain, use tldextract (third-party) for accuracy
Always validate URLs before extracting domains