HTML Introduction

Overview

HTML is the language of web pages: it defines their structure and content. When a browser visits a website, it downloads the HTML code from the server and then renders it as a web page.

HTML stands for "HyperText Markup Language". It was invented in the early 1990s by Tim Berners-Lee, a physicist at CERN. Its defining feature is support for hyperlinks: clicking a link jumps to another web page, and these links connect pages into the World Wide Web.

In 1999, HTML 4.01 was released and became a widely accepted HTML standard. In 2014, HTML 5 was published, and it is the basis of the HTML in use today.

Web development in the browser involves three technologies: HTML, CSS and JavaScript. HTML defines the structure and content of a web page, CSS style sheets define its appearance, and JavaScript defines how the page interacts with the user. HTML is the foundation of web development: CSS and JavaScript only take effect on top of HTML, and even without them HTML alone can display basic content. This tutorial only introduces the HTML language.
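
As a rough illustration of this division of labor, the sketch below puts all three technologies in one small page. The id "greeting", the color, and the click behavior are made up for this example.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Three layers</title>
    <!-- CSS: controls how the paragraph looks -->
    <style>
      p { color: navy; }
    </style>
  </head>
  <body>
    <!-- HTML: the structure and content -->
    <p id="greeting">Hello World</p>
    <!-- JavaScript: reacts when the user clicks the paragraph -->
    <script>
      document.getElementById("greeting").addEventListener("click", function () {
        this.textContent = "Clicked";
      });
    </script>
  </body>
</html>
```

If the style and script elements are removed, the page still displays "Hello World", only without the color and the click behavior.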

Below is the HTML source code of a simple web page.

<!DOCTYPE html>
<html lang="zh-CN">
  <head>
    <meta charset="utf-8" />
    <title>Web page title</title>
  </head>
  <body>
    <p>Hello World</p>
  </body>
</html>

The above code can be saved as a file named hello.html. When a browser opens this local file, it displays the text "Hello World".

The "View page source" item in the browser's right-click menu displays the HTML source code of the current web page.

Basic concepts of web pages

Tags

The HTML code of a web page is composed of many different tags. To learn the HTML language is to learn the usage of various tags.

The following is an example of a tag.

<title>Web page title</title>

In the above code, <title> and </title> are a pair of tags.

A tag tells the browser how to handle the content it encloses. The content of a tag is what the browser renders and displays on the web page.

Tags are placed inside a pair of angle brackets (such as <title>). Most tags appear in pairs, consisting of a start tag and an end tag. The end tag has a slash before the tag name (such as </title>). However, some tags are not used in pairs: they have only a start tag and no end tag, such as the <meta> tag in the example in the previous section.

<meta charset="utf-8" />

In the above code, the <meta> tag has no closing tag </meta>.

A tag is usually used alone like this because the tag itself is enough to do its job and needs no content between a start tag and an end tag. In practice, such tags are mainly used to give the browser a hint or to request some special processing.
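
A few other standard tags that are used alone are sketched below. The file name demo.jpg and the text are placeholders.

```html
<!-- <br> forces a line break inside a paragraph -->
<p>First line<br />Second line</p>
<!-- <hr> draws a horizontal divider between blocks -->
<hr />
<!-- <img> embeds an image; there is nothing to put "inside" it -->
<img src="demo.jpg" alt="Demo image" />
```

The trailing slash is optional in HTML, so <br> and <br /> are parsed the same way.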

Tags can be nested.

<div><p>hello world</p></div>

In the above code, the <div> tag contains a <p> tag.

When nesting, tags must be closed in the correct order, innermost first. Tags must not overlap; otherwise the rendering result may be unexpected.

<div><p>hello world</div></p>

The above code is nested incorrectly: the tags are closed in the wrong order.

HTML tag names are not case sensitive. For example, <title> and <TITLE> are the same tag. However, it is common practice to use lowercase.
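
A minimal illustration of this rule, using the <p> tag as an arbitrary example:

```html
<!-- Both lines are parsed as the same p element -->
<P>hello world</P>
<p>hello world</p>
```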

In addition, the HTML language ignores indentation and line breaks. The following three forms render identically.

<title>Web page title</title>

<title>
  Web page title
</title>

<title>Web page
title</title>

Furthermore, the HTML code of an entire web page can be written on a single line; the browser parses it just the same and produces the same result. For this reason, developers sometimes minify the source code into one line before publishing a page, to reduce the number of bytes transmitted.

The style effects of various web pages, such as content indentation and line wrapping, are mainly realized by CSS.
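
As a small sketch of how CSS takes over these jobs, the two paragraphs below use the standard CSS properties text-indent and white-space through the style attribute. The text and the values are arbitrary choices for this example.

```html
<!-- text-indent indents the first line of the paragraph -->
<p style="text-indent: 2em;">This paragraph starts with an indented first line.</p>
<!-- white-space: pre-line keeps the line break written in the source -->
<p style="white-space: pre-line;">line one
line two</p>
```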

Elements

When the browser renders a web page, it parses the HTML source code into a tree of tags, in which each tag is a node. Such a node is called an element of the page. "Tag" and "element" are therefore nearly synonymous but used in different contexts: "tag" describes the source code, while "element" describes the node from a programming point of view. For example, the <p> tag corresponds to the p element of the web page.

The nested tags constitute the hierarchical relationship of web page elements.

<div><p>hello world</p></div>

In the above code, the div element contains a p element. The upper element is also called the "parent element", and the lower element is also called the "child element", that is, div is the parent element of p, and p is the child element of div.
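
The same relationship extends to deeper nesting. The sketch below uses made-up content to show one parent with two children.

```html
<!-- body is the parent of div; div is the parent of both p elements -->
<body>
  <div>
    <p>first child of div</p>
    <p>second child of div</p>
  </div>
</body>
```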

Block-level elements, inline elements

All elements can be divided into two categories: block-level elements (block) and inline elements (inline).

A block-level element occupies its own area by default: it automatically starts on a new line and takes up 100% of the available width.

<p>hello</p>
<p>world</p>

In the above code, the p element is a block-level element, so the browser will display the content in two lines.

Inline elements are on the same line as other elements by default, and no line breaks are generated. For example, span is an inline element, usually used to specify a special style for certain text.

<span>hello</span> <span>world</span>

In the above code, the span element is an inline element, so the browser displays both pieces of content on the same line.
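
The two kinds of element are often combined. In the following sketch (the text and the color are arbitrary), the span stays within the line of its surrounding paragraph, while each p starts a new line.

```html
<p>The word <span style="color: red;">hello</span> stays inside this line.</p>
<p>This second paragraph starts on a new line.</p>
```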

Attributes

Attributes are additional information about a tag. They are separated from the tag name, and from each other, by spaces.

<img src="demo.jpg" width="500" />

In the above code, the <img> tag has two attributes: src and width.

An attribute can be given a value with the equals sign. For example, demo.jpg in the above example is the value of the src attribute. Attribute values are generally enclosed in double quotation marks. The quotes are not strictly required, but using them is always recommended.

Note that attribute names are not case sensitive: onclick and onClick are the same attribute.
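
The following sketch shows two spellings of the earlier <img> example that a browser parses the same way. The second form is shown only for comparison.

```html
<!-- Recommended: lowercase names, double-quoted values -->
<img src="demo.jpg" width="500" />
<!-- Parsed the same way: uppercase names, single-quoted values -->
<IMG SRC='demo.jpg' WIDTH='500' />
```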

HTML provides a large number of attributes to customize the behavior of tags. For details, please refer to the chapter "Attributes of Elements".

Basic tags for web pages

A web page that complies with the HTML syntax standard should follow the basic structure below.

<!DOCTYPE html>
<html lang="zh-CN">
  <head>
    <meta charset="utf-8" />
    <title></title>
  </head>
  <body></body>
</html>

No matter how complicated a web page is, it is derived from the basic structure above.

As mentioned earlier, the indentation and line wrapping of HTML code has no effect on the browser. Therefore, the above code can be written in one line, and the rendering result remains unchanged. The above is written in separate lines just to improve readability.
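
For comparison, this is the same basic structure written on a single line:

```html
<!DOCTYPE html><html lang="zh-CN"><head><meta charset="utf-8" /><title></title></head><body></body></html>
```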

Here are the main tags of this basic structure. They form the skeleton of the web page.

<!doctype>

The first tag of a webpage is usually <!doctype>, which indicates the document type and tells the browser how to parse the webpage.

In general, it is enough to declare the doctype as html, as shown below. The browser will then process the web page according to the rules of HTML 5.

<!doctype html>

Sometimes the declaration is written in capitals to distinguish it from ordinary HTML tags, because <!doctype> is not really a tag but more like a processing instruction.

<!DOCTYPE html>

<html>

The <html> tag is the top-level container of the web page, that is, the top node of the tag tree. It is also known as the root element, and all other elements are its descendants. A web page can have only one <html> tag.

The lang attribute of this tag indicates the default language of the web page content.

<html lang="zh-CN"></html>

The above code indicates that the content of the web page is in Chinese. If the content is in English, zh-CN should be changed to en. For a more detailed introduction, see the chapter "Attributes of Elements".
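
A few other values of the lang attribute are sketched below for comparison. They are standard language tags, chosen here only as examples, and a real page would contain just one of these lines.

```html
<!-- English, with no region specified -->
<html lang="en"></html>
<!-- English as used in the United States -->
<html lang="en-US"></html>
<!-- Japanese -->
<html lang="ja"></html>
```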

<head>

The <head> tag is a container tag that holds the meta information of the web page. Its content does not appear on the page itself, but provides additional information for rendering the page.

<!DOCTYPE html>
<html>
  <head>
    <title>Web page title</title>
  </head>
</html>

<head> is the first child element of <html>. If the page does not contain <head>, the browser will automatically create one.

<head> generally has the following seven kinds of child elements, which will be introduced one by one later.

- <meta>: Sets the metadata of the web page.
- <link>: Links to an external style sheet.
- <title>: Sets the title of the web page.
- <style>: Holds an embedded style sheet.
- <script>: Includes a script.
- <noscript>: Holds the content to display when the browser does not support scripts.
- <base>: Sets the base for resolving relative URLs inside the web page.
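
The sketch below puts all seven kinds of child element into one <head>. The URL and the file names (style.css, app.js, no-script.css) are placeholders invented for this example.

```html
<head>
  <meta charset="utf-8" />
  <title>Web page title</title>
  <!-- Relative URLs below are resolved against this base -->
  <base href="https://www.example.com/" />
  <link rel="stylesheet" href="style.css" />
  <style>
    body { margin: 0; }
  </style>
  <script src="app.js"></script>
  <!-- Used only when scripts are unavailable -->
  <noscript>
    <link rel="stylesheet" href="no-script.css" />
  </noscript>
</head>
```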

<meta>

The <meta> tag sets or describes the metadata of the web page, and it must be placed inside <head>. Each <meta> tag is one piece of metadata, and a web page can have several of them. By convention, <meta> tags are placed at the very top of the content of <head>.

Whatever the kind of web page, the following two <meta> tags are generally worth including.

<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>Page Title</title>
</head>

In the above example, the first <meta> tag indicates that the web page is encoded in UTF-8, and the second <meta> tag lets the web page scale automatically to fit the screen of a mobile phone.

The main attributes of the <meta> tag are introduced in turn below.

(1) charset attribute

The charset attribute of the <meta> tag specifies the character encoding of the web page. This attribute is very important: if it is set incorrectly, the browser may fail to decode the page and display garbled characters.

<meta charset="utf-8" />

The above code declares that the web page is encoded in UTF-8. Although developers can use other encoding methods, the correct approach is almost always UTF-8.

Note that the encoding declared here must match the actual encoding of the file: if utf-8 is declared, the web page should be saved in UTF-8. If utf-8 is declared but the file is actually saved in another encoding (such as GB2312), the browser will not transcode it automatically, and the web page may be displayed as garbled characters.

(2) name attribute, content attribute

The name attribute of the <meta> tag gives the name of a piece of metadata, and the content attribute gives its value. Used together, they specify one piece of metadata for the web page.

<head>
  <meta name="description" content="HTML Language Introduction" />
  <meta name="keywords" content="HTML,tutorial" />
  <meta name="author" content="张三" />
</head>

The above code contains three pieces of metadata: description describes the content of the web page, keywords lists keywords for the content, and author names the author of the web page.

There are many kinds of metadata, most of which involve the internal workings of the browser or specific usage scenarios, so I won't introduce them one by one here. Here are some examples.

<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="application-name" content="Application Name" />
<meta name="generator" content="program" />
<meta name="subject" content="your document's subject" />
<meta name="referrer" content="no-referrer" />

(3) http-equiv attribute, content attribute

The http-equiv attribute of the <meta> tag supplies the equivalent of a header field of the HTTP response, and the content attribute gives the value of that field. These two attributes relate to the HTTP protocol and are an advanced topic, so I won't cover them in detail here.

<meta http-equiv="Content-Security-Policy" content="default-src 'self'" />

The above code sets a Content-Security-Policy for the page, much as the HTTP response field of the same name would.

Here are some other examples.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="refresh" content="30" />
<meta http-equiv="refresh" content="30;URL='http://website.com'" />

<title>

The <title> tag specifies the title of the web page, which is displayed in the title bar or tab of the browser window.

<head>
  <title>Web page title</title>
</head>

Search engines display the title of each web page based on this tag. It can influence how a web page ranks in search engines, so it should be written carefully to reflect the topic of the page.

No other tags can be placed inside the <title> tag, only plain text without formatting.

<body>

The <body> tag is a container tag used to place the main content of the web page. The content of the page displayed by the browser is placed inside it. It is the second child element of <html>, immediately after <head>.

<html>
  <head>
    <title>Web page title</title>
  </head>
  <body>
    <p>hello world</p>
  </body>
</html>

Spaces and line breaks

The HTML language has its own rules for handling whitespace. Whitespace at the beginning and end of a tag's content is ignored.

<p>  hello world   </p>

In the above code, the space before hello and the space after world are ignored by the browser.

Multiple consecutive whitespace characters (including the tab character \t) in a tag's content are merged into a single space by the browser.

<p>hello      world</p>

In the above code, there are multiple consecutive spaces between hello and world, and the browser will merge them into one. The result of web page rendering is that there is only one space between hello and world.

The browser also replaces line breaks (\n) and carriage returns (\r) in the text with spaces.

<p>hello


world</p>

In the above code, there are multiple line breaks between hello and world. The browser will replace them with spaces, and then merge the multiple spaces into one. The result of web page rendering is that there is a space between hello and world.

This means that line breaks in the HTML source code do not produce line breaks on the rendered page.
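
To get a visible line break, the markup has to ask for one. Two common ways are sketched below, assuming the standard <br> tag and separate <p> elements.

```html
<!-- A <br> tag forces a line break inside one paragraph -->
<p>hello<br />world</p>

<!-- Two block-level paragraphs each start on their own line -->
<p>hello</p>
<p>world</p>
```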

Comments

The HTML code can contain comments, and the browser will automatically ignore the comments. The comment starts with <!-- and ends with -->. The following is an example of a comment.

<!-- This is a comment -->

A comment can span multiple lines, and any HTML inside it has no effect.

<!--
  <p>hello world</p>
-->

The above code is a commented-out block. The code inside it has no effect: the browser does not parse it, let alone render it.

Comments help readers understand the code, and it is good practice to add one before a complex block of code.
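
A comment placed before a block might look like the following sketch. The structure and the text are invented for this example.

```html
<!-- Page header: site title followed by a short tagline -->
<div>
  <p>My Site</p>
  <p>Notes on learning HTML</p>
</div>
```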