URL stands for "Uniform Resource Locator". It is the Internet address of a resource, commonly called a web address. A typical URL looks like https://www.example.com/path/index.html.
As long as a resource can be accessed via the Internet, it must have a corresponding URL. One URL corresponds to one resource, but the same resource may correspond to multiple URLs.
URLs are the foundation of the Internet. The Internet is "interconnected" because web pages can point to other URLs through links. With a click, a user can jump from one URL to another and reach different websites.
Components of the URL
A URL consists of multiple parts. The following is a fairly complex URL; real URLs usually do not have this many parts.
Let's take a look at the various parts of this URL.
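As a rough sketch, the parts can be pulled out with Python's standard `urllib.parse` module. The URL below is an assumption assembled from the pieces named in this section (`https://`, `www.example.com`, port 80, `/path/index.html`, `key1=value1`, `#anchor`); the `key2=value2` parameter is made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL, assembled from the parts discussed in this section.
url = "https://www.example.com:80/path/index.html?key1=value1&key2=value2#anchor"

parts = urlsplit(url)

# Map the section's terminology onto the fields urlsplit() exposes.
components = {
    "protocol": parts.scheme,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "anchor": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

Note that `urlsplit()` calls the protocol the "scheme" and the anchor the "fragment", which are the names used in the URL specifications.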
The protocol is the method the browser uses to request resources from the server. In the example above it is the https:// part, which means that the HTTPS protocol is used.
The Internet supports multiple protocols, so a URL must specify which one it uses. The default is HTTP: if you omit the protocol and enter www.example.com directly in the browser address bar, the browser accesses http://www.example.com by default. HTTPS is the encrypted version of HTTP, and for security reasons more and more websites use it.
The protocol names HTTP and HTTPS are followed by a colon and two slashes (://). This is not necessarily the case for other protocols. For example, the mail address protocol mailto: has only a colon after the protocol name.
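The difference between the two forms shows up when a URL is parsed. The following is a minimal sketch using `urllib.parse.urlsplit`; the mail address `foo@example.com` is a hypothetical value chosen for illustration.

```python
from urllib.parse import urlsplit

web = urlsplit("https://www.example.com/")
mail = urlsplit("mailto:foo@example.com")  # hypothetical address

# With "://", the text after the slashes is parsed as the host (netloc).
print(web.scheme, repr(web.netloc))

# With only ":", there is no host part; the rest is treated as the path.
print(mail.scheme, repr(mail.netloc), mail.path)
```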
The host is the name of the website or server where the resource is located, also known as the domain name. In the examples in this section, the host is www.example.com.
Some hosts do not have a domain name but only an IP address, such as
192.168.2.15. This situation often occurs in local area networks.
The same domain name may host multiple websites at the same time, distinguished by port. A port is an integer; it can be understood simply as the visitor telling the server which website they want. The default port for HTTP is 80 (for HTTPS it is 443). If the port is omitted, the server returns the website on the default port.
The port immediately follows the domain name, separated by a colon, such as www.example.com:80.
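The fallback to a default port can be sketched as follows. The helper name `effective_port` and the `DEFAULT_PORTS` table are assumptions made for this example; the table uses the standard defaults of 80 for HTTP and 443 for HTTPS.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two protocols discussed here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port of a URL, or the protocol default if omitted."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://www.example.com:8080/"))
print(effective_port("http://www.example.com/"))
print(effective_port("https://www.example.com/"))
```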
The path is the location of the resource on the website. For example, the path /path/index.html points to the web page file index.html under the /path subdirectory of the website.
In the early days of the Internet, paths were physical locations that actually existed. Now since the server can simulate these locations, the path is just a virtual location.
The path may include only a directory and no file name, such as /foo/, and even the trailing slash can be omitted. In that case the server usually serves the index.html file in that directory by default (equivalent to requesting /foo/index.html), but it may do something else, such as listing all the files in the directory, depending on the server settings. Generally speaking, visiting www.example.com is likely to return the web file www.example.com/index.html.
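The default-file behaviour can be illustrated with a small sketch. This is not how any particular server is implemented; the function `resolve_request_path` and its `directories` argument are hypothetical, and real servers make this configurable.

```python
def resolve_request_path(path, directories, index_name="index.html"):
    """Map a request path to a file path, the way a simple static server might.

    `directories` is the set of directory paths the (hypothetical) site has,
    written without a trailing slash, e.g. {"/foo"}.
    """
    normalized = path.rstrip("/")  # "/foo/" and "/foo" both become "/foo"
    if normalized == "" or normalized in directories:
        # A directory (or the site root) was requested: serve its index file.
        return normalized + "/" + index_name
    return path


site_directories = {"/foo"}
print(resolve_request_path("/foo/", site_directories))
print(resolve_request_path("/foo", site_directories))
print(resolve_request_path("", site_directories))
```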
Query parameters are additional information provided to the server. They come after the path, separated from it by a question mark (?).
There can be one or more groups of query parameters. Each group is a key-value pair made up of a key name and a key value joined by an equal sign (=). For example, in the key-value pair key1=value1, key1 is the key name and value1 is the key value. Multiple groups of parameters are joined with &.
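A short sketch of reading and building a query string with `urllib.parse`. The second pair, `key2=value2`, is a made-up value added so that the `&` separator appears.

```python
from urllib.parse import parse_qs, urlencode

query = "key1=value1&key2=value2"  # hypothetical query string

# parse_qs() splits on "&" and "=", collecting the values for each key in a list
# (a key may legitimately appear more than once).
params = parse_qs(query)
print(params)  # {'key1': ['value1'], 'key2': ['value2']}

# urlencode() goes the other way: key-value pairs to a query string.
rebuilt = urlencode({"key1": "value1", "key2": "value2"})
print(rebuilt)  # key1=value1&key2=value2
```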
The anchor is a named position inside the web page. It is written as # plus the anchor name at the end of the URL, such as #anchor. After the browser loads the page, it automatically scrolls to the anchor. An anchor name is defined by the id attribute of a web page element. For details, see the chapter "Element Properties".
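Separating the anchor from the rest of a URL can be sketched with `urllib.parse.urldefrag`, which calls the anchor a "fragment". The URL is the example address used elsewhere in this section with `#anchor` appended.

```python
from urllib.parse import urldefrag

# urldefrag() returns the URL without the anchor, plus the anchor name itself.
url, anchor = urldefrag("https://www.example.com/path/index.html#anchor")

print(url)
print(anchor)
```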
Only the following characters can be used directly in the various components of a URL.
- 26 English letters (both uppercase and lowercase)
- 10 Arabic numerals
- Hyphen (-), period (.), underscore (_), and tilde (~)
In addition, 18 characters are reserved characters of the URL and can only appear in their designated positions. For example, a question mark (?) marks the beginning of the query parameters; used as ordinary data elsewhere, it can cause the URL to be parsed incorrectly. If you want to use these reserved characters in other parts of the URL, you must use their escaped form.
The way to escape URL characters is to add a percent sign (
%) in front of the hexadecimal ASCII code of these characters. The following are these 18 characters and their escaped forms.
- !: %21
- #: %23
- $: %24
- &: %26
- ': %27
- (: %28
- ): %29
- *: %2A
- +: %2B
- ,: %2C
- /: %2F
- :: %3A
- ;: %3B
- =: %3D
- ?: %3F
- @: %40
- [: %5B
- ]: %5D
For example, if the file name of a web page is foo?bar.html, that is, the file name contains a question mark, then it needs to be written as foo%3Fbar.html.
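The escaping rule is mechanical enough to sketch in a few lines. The helper `percent_escape` is a hypothetical name for this example; `urllib.parse.quote` is the standard-library function that does the same job for whole strings, shown here escaping the file name foo?bar.html.

```python
from urllib.parse import quote


def percent_escape(char):
    """Escape one ASCII character as "%" plus its two-digit hexadecimal code."""
    return "%" + format(ord(char), "02X")


# The 18 reserved characters.
RESERVED = "!#$&'()*+,/:;=?@[]"
table = {char: percent_escape(char) for char in RESERVED}

for char, escaped in table.items():
    print(char, escaped)

# quote() with safe="" escapes every reserved character it meets.
escaped_name = quote("foo?bar.html", safe="")
print(escaped_name)  # foo%3Fbar.html
```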
The legal characters of a URL can also be escaped in this way, but it is not recommended. For example, the hexadecimal ASCII code of the letter a is 61, so its escaped form is %61, and www.apple.com can be written as www.%61pple.com, which the browser still recognizes.
It is worth noting that the escape form of spaces is
%20. For those file names that contain spaces, this escaping is necessary.
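Both points can be checked with `urllib.parse`. The file name `my file.html` is a hypothetical example of a name containing a space.

```python
from urllib.parse import quote, unquote

# Decoding an unnecessarily escaped legal character gives back the plain form.
decoded = unquote("www.%61pple.com")
print(decoded)  # www.apple.com

# A space is not a legal URL character, so it is escaped as %20.
escaped = quote("my file.html")
print(escaped)  # my%20file.html
```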
Other characters that are neither legal nor reserved (such as Chinese characters) theoretically do not need to be escaped by hand and can be written directly in the URL, such as www.example.com/中国.html; the browser escapes them automatically before sending the request to the server. The escaped form is built from the character's UTF-8 encoding in hexadecimal: the digits are split into groups of two, and a percent sign (%) is added in front of each group.
For example, the UTF-8 hexadecimal encoding of the Chinese character 中 is e4b8ad. Split into groups of two digits, its URL-escaped form is %e4%b8%ad. In other words, wherever the character 中 appears in a URL, it is sent as %e4%b8%ad. Therefore, the URL www.example.com/中国.html is sent as www.example.com/%e4%b8%ad%e5%9b%bd.html. Here the escaped form of 中 is %e4%b8%ad and the escaped form of 国 is %e5%9b%bd.
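The UTF-8 escaping steps can be reproduced as a sketch: encode the text as UTF-8, then put a percent sign in front of each byte's two hexadecimal digits. The standard `quote()` function does the same but emits uppercase hexadecimal digits, which are equivalent.

```python
from urllib.parse import quote, unquote

# Step 1: the UTF-8 bytes of the character, shown as hexadecimal.
hex_bytes = "中".encode("utf-8").hex()
print(hex_bytes)  # e4b8ad

# Step 2: a percent sign in front of every two hexadecimal digits (one byte).
escaped = "".join(f"%{byte:02x}" for byte in "中国".encode("utf-8"))
print(escaped)  # %e4%b8%ad%e5%9b%bd

# The standard-library equivalent (uppercase hex) and the reverse operation.
print(quote("中国.html"))
print(unquote(escaped))  # 中国
```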
Absolute URL and relative URL
There are two types of URLs: absolute URLs and relative URLs.
Absolute URL means that the location of a resource can be determined only by the URL itself. This means that the URL must contain the complete information of the resource, including the protocol, host, path, etc. The previous examples are absolute URLs.
A relative URL does not contain all the information about the location of the resource; it must be combined with the location of the current web page to locate the resource. For example, suppose the URL of the current web page is https://www.example.com/path/index.html and the page refers to a resource with the URL a.html. This is a relative URL, because a.html alone is not enough to locate the resource. The browser assumes that a.html is in the same subdirectory as the current URL and thus obtains the absolute URL https://www.example.com/path/a.html.
If a relative URL starts with a slash (/), it is resolved from the root directory of the website. Otherwise, it is resolved from the current directory. For example, the relative URL /foo/bar.html refers to the foo subdirectory of the website root directory, while foo/bar.html refers to the foo subdirectory of the current directory.
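The resolution rules above can be tried out with `urllib.parse.urljoin`, which combines a base URL with a relative one in roughly the way a browser does. This sketch uses the current-page URL from the example above.

```python
from urllib.parse import urljoin

current = "https://www.example.com/path/index.html"

# No leading slash: resolved against the current directory (/path/).
same_dir = urljoin(current, "a.html")
sub_dir = urljoin(current, "foo/bar.html")

# Leading slash: resolved against the root directory of the website.
from_root = urljoin(current, "/foo/bar.html")

print(same_dir)   # https://www.example.com/path/a.html
print(sub_dir)    # https://www.example.com/path/foo/bar.html
print(from_root)  # https://www.example.com/foo/bar.html
```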
URLs can also use two special abbreviations to indicate specific locations.
- .: indicates the current directory, such as ./a.html (the file a.html in the current directory)
- ..: indicates the parent directory, such as ../a.html (the file a.html in the parent directory)
These two abbreviations can be chained; for example, ../../ refers to the directory two levels up.
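A sketch of how the two abbreviations resolve, again using `urllib.parse.urljoin`. The base URL here is a hypothetical page two directories deep, chosen so that `../../` has somewhere to go.

```python
from urllib.parse import urljoin

# Hypothetical current page, two directory levels below the root.
current = "https://www.example.com/path/to/index.html"

here = urljoin(current, "./a.html")         # current directory
parent = urljoin(current, "../a.html")      # one level up
two_up = urljoin(current, "../../a.html")   # two levels up

print(here)    # https://www.example.com/path/to/a.html
print(parent)  # https://www.example.com/path/a.html
print(two_up)  # https://www.example.com/a.html
```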
Absolute URLs can also use these two abbreviations. For example, www.example.com/./index.html is equivalent to www.example.com/index.html, because the . here refers to the current directory of the root directory, that is, the root directory itself.
The <base> tag specifies the base against which all relative URLs in the web page are resolved. A web page can have only one <base> tag, and it must be placed inside <head>. It is a standalone tag with no closing tag. The following is an example.
<head>
  <base href="https://www.example.com/files/" target="_blank" />
</head>
The href attribute of the <base> tag gives the base URL, and the target attribute specifies how links are opened (see the chapter "Links"). Given the base https://www.example.com/files/, the relative URL foo.html is converted into the absolute URL https://www.example.com/files/foo.html.
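The effect of the base URL can be sketched with `urllib.parse.urljoin`. The page URL below is a hypothetical address for the page that contains the `<base>` tag; it is included only to show that it no longer takes part in the calculation.

```python
from urllib.parse import urljoin

page_url = "https://www.example.com/path/index.html"  # hypothetical page address
base_href = "https://www.example.com/files/"          # href of the <base> tag

# Without <base>, foo.html would resolve against the page's own URL.
without_base = urljoin(page_url, "foo.html")

# With <base>, it resolves against the href value instead.
with_base = urljoin(base_href, "foo.html")

print(without_base)  # https://www.example.com/path/foo.html
print(with_base)     # https://www.example.com/files/foo.html
```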
Note that the <base> tag must have at least one of the href attribute or the target attribute.
<base href="http://foo.com/app/" />
<base target="_blank" />
Once <base> is set, it applies to the entire web page. To make an individual link behave differently, you have to use an absolute link instead of a relative one. Pay special attention to anchors: with <base> set, an anchor is resolved against <base>, not against the URL of the current web page.
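The anchor behaviour is easy to miss, so here is a minimal sketch of it with `urllib.parse.urljoin`. Both URLs are illustrative: the page address is hypothetical, and the base is the href value from the example earlier in this section.

```python
from urllib.parse import urljoin

page_url = "https://www.example.com/path/index.html"  # hypothetical page address
base_href = "https://www.example.com/files/"          # href of the <base> tag

# Without <base>, a link to "#anchor" stays on the current page.
anchor_without_base = urljoin(page_url, "#anchor")

# With <base>, the same link points at the base URL plus the anchor,
# so clicking it navigates away from the current page.
anchor_with_base = urljoin(base_href, "#anchor")

print(anchor_without_base)  # https://www.example.com/path/index.html#anchor
print(anchor_with_base)     # https://www.example.com/files/#anchor
```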