Which characters are acceptable in a URL host?

There are four solutions to consider. The first points to the restrictions on valid host names. The second states that only those characters are allowed and links to the relevant RFC. The third explains that the answer depends on whether validation happens before or after URL escaping: if it takes place after all escaping and Punycode conversion are done, no further validation is needed, because the result already contains only the characters the older RFC permits. The fourth notes that DNS itself accepts names outside the hostname rules, so internal networks may use extra characters such as underscores.

Question:

I'm writing code to process URLs and want to make sure I haven't overlooked any unusual cases.

Besides A-Z, 0-9, “-”, and “.”, what other characters are valid in a host?

That is, anything that can appear in subdomains, or anything between :// and the first /.

Thanks!
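One detail worth knowing before validating: the segment between :// and the first / is what RFC 3986 calls the authority, and besides the host it can also carry user information (user@) and a port (:8080). Extracting the host first keeps those characters out of the check. A minimal sketch using Python's standard urllib.parse; the example URL is made up for illustration:

```python
from urllib.parse import urlsplit

# Hypothetical URL with user info and a port around the host.
url = "http://user@En.Wikipedia.org:8080/wiki/Hostname"
parts = urlsplit(url)

print(parts.netloc)    # the whole authority: user@En.Wikipedia.org:8080
print(parts.hostname)  # the host only, lowercased: en.wikipedia.org
print(parts.port)      # the port as an integer: 8080
```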


Solution 1:

See the restrictions on valid host names:

A hostname is a series of labels joined by dots, as in “en.wikipedia.org”. Each label must be 1 to 63 characters long, and the entire hostname, including the dots, must not exceed 253 characters (not counting an optional trailing dot), which corresponds to the 255-octet limit on names in DNS wire format.

According to the RFCs, hostname labels may contain only the ASCII letters ‘a’ through ‘z’ (case-insensitive), the digits ‘0’ through ‘9’, and the hyphen. Labels must not start or end with a hyphen, and no other symbols, punctuation, or whitespace are permitted.
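These rules translate fairly directly into a check. A minimal sketch in Python, assuming the host has already been extracted from the URL and converted to ASCII; is_valid_hostname is a hypothetical helper name, not a library function:

```python
import re

# One label: 1-63 characters of letters, digits, or hyphens,
# not starting or ending with a hyphen (1 + up to 61 + 1 = 63).
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_valid_hostname(host: str) -> bool:
    """Check a host against the letters-digits-hyphen rules (hypothetical helper)."""
    if host.endswith("."):
        host = host[:-1]  # tolerate one trailing dot (fully qualified form)
    # 253 characters of text, i.e. the 255-octet limit in DNS wire format.
    if not host or len(host) > 253:
        return False
    # fullmatch rejects empty labels (e.g. "a..b") and any other characters.
    return all(LABEL_RE.fullmatch(label) for label in host.split("."))

print(is_valid_hostname("en.wikipedia.org"))     # True
print(is_valid_hostname("under_score.example"))  # False
```

Dotted IPv4 literals such as 192.168.0.1 also pass this check, since each label is all digits; handle IP literals separately if that distinction matters.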


Solution 2:

No, that is all that is allowed.

For the details, see RFC 1034 at http://www.ietf.org/rfc/rfc1034.txt.


Solution 3:


It depends on where the validation happens. If you validate user input before URL escaping, the host can include large portions of Unicode beyond ASCII.

See the Wikipedia article on internationalized domain names at http://en.wikipedia.org/wiki/Internationalized_domain_name.

If you validate after all escaping and Punycode conversion are complete, the check is redundant: the encoding already guarantees that the result contains only the characters the older RFC permits.
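To see what a host looks like after Punycode conversion, Python's built-in idna codec (which implements the older IDNA 2003 rules) turns a Unicode host into its ASCII form. A minimal sketch; the host bücher.example is only an illustration and to_ascii_host is a hypothetical helper:

```python
import re

def to_ascii_host(host: str) -> str:
    """Convert a Unicode host name to its ASCII (Punycode) form via the idna codec."""
    return host.encode("idna").decode("ascii")

ascii_host = to_ascii_host("bücher.example")
print(ascii_host)  # xn--bcher-kva.example

# For this converted name, only letters, digits, hyphens, and dots remain.
assert re.fullmatch(r"[A-Za-z0-9.-]+", ascii_host)
```

One caveat about this particular codec: it does not apply the STD3 ASCII rules, so a name that is already ASCII (for example one containing an underscore) passes through unchanged. The guarantee described above applies to the labels that were actually Punycode-encoded.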


Solution 4:

Keep in mind that DNS itself permits names outside the Internet hostname rules. DNS servers can accept and answer queries for names containing arbitrary 8-bit binary data without violating the DNS wire protocol.

Internal LAN names may therefore follow different rules, for example allowing underscores in host names.
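If internal names must be accepted as well, one option is to make the character set a policy switch. A minimal sketch, assuming underscores are the only extra character your network needs; is_acceptable_host and allow_underscore are hypothetical names chosen for illustration:

```python
import re

# Strict labels follow the Internet hostname rules.
STRICT_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
# Relaxed labels also admit underscores: an assumed local policy, not an RFC rule.
RELAXED_LABEL = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9_])?")

def is_acceptable_host(host: str, allow_underscore: bool = False) -> bool:
    """Validate host labels, optionally allowing underscores for internal names."""
    pattern = RELAXED_LABEL if allow_underscore else STRICT_LABEL
    host = host[:-1] if host.endswith(".") else host  # drop one trailing dot
    if not host or len(host) > 253:
        return False
    return all(pattern.fullmatch(label) for label in host.split("."))

print(is_acceptable_host("build_server.lan"))                         # False
print(is_acceptable_host("build_server.lan", allow_underscore=True))  # True
```

Keeping the strict pattern as the default means public hostnames are still checked against the RFC rules, and the relaxed behavior has to be requested explicitly.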
