String

This chapter introduces the changes and enhancements ES6 makes to strings; the next chapter introduces the new methods of string objects.

Unicode representation of characters

ES6 strengthens support for Unicode. A character can be written in the form \uxxxx, where xxxx is the character's Unicode code point in hexadecimal.

"\u0061";
// "a"

However, this notation only covers characters whose code points lie between \u0000 and \uFFFF. A character outside this range must be written as two such escapes, that is, as a surrogate pair of two 16-bit code units.

"\uD842\uDFB7";
// "𠮷"

"\u20BB7";
// "₻7"

The above code shows that if \u is followed directly by a value exceeding 0xFFFF (such as \u20BB7), JavaScript interprets it as \u20BB followed by 7. \u20BB is the character ₻, which many fonts cannot display, so the output may look like a blank followed by a 7.

ES6 improves on this: as long as the code point is placed in curly braces, the character is interpreted correctly.

"\u{20BB7}";
// "𠮷"

"\u{41}\u{42}\u{43}";
// "ABC"

let hello = 123;
hell\u{6F}; // 123

"\u{1F680}" === "\uD83D\uDE80";
// true

In the above code, the last example shows that the brace notation is equivalent to the four-byte UTF-16 encoding.
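
The relationship between the brace notation and a surrogate pair can be checked directly. The following is a minimal sketch; the variable names are illustrative and not part of the original text.

```javascript
// Compare the brace notation with an explicit surrogate pair.
const braceForm = "\u{20BB7}";
const surrogateForm = "\uD842\uDFB7";

console.log(braceForm === surrogateForm); // true

// length counts UTF-16 code units, so this single character reports 2.
console.log(braceForm.length); // 2

// codePointAt reads the whole code point; charCodeAt reads one code unit.
console.log(braceForm.codePointAt(0).toString(16)); // "20bb7"
console.log(braceForm.charCodeAt(0).toString(16)); // "d842"
console.log(braceForm.charCodeAt(1).toString(16)); // "dfb7"
```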

With this notation, JavaScript has 6 ways to represent a character: the literal character itself and the five escaped forms below.

"\z" === "z"; // true
"\172" === "z"; // true
"\x7A" === "z"; // true
"\u007A" === "z"; // true
"\u{7A}" === "z"; // true

String iterator interface

ES6 adds an iterator interface to strings (see the chapter "Iterator" for details), so that strings can be traversed by for...of loops.

for (let codePoint of "foo") {
  console.log(codePoint);
}
// "f"
// "o"
// "o"

Besides traversing the string, the biggest advantage of this iterator is that it recognizes code points larger than 0xFFFF, which a traditional for loop cannot.

let text = String.fromCodePoint(0x20bb7);

for (let i = 0; i < text.length; i++) {
  console.log(text[i]);
}
// ""
// ""

for (let i of text) {
  console.log(i);
}
// "𠮷"

In the above code, the string text has only one character, but the for loop will think it contains two characters (neither can be printed), and the for...of loop will correctly recognize this character.
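
Because the spread syntax and Array.from also use the string iterator, they can count and list code points rather than code units. This is a small sketch with illustrative variable names, not code from the original chapter.

```javascript
// One ASCII letter, one character above 0xFFFF, one ASCII letter.
const sample = "a\u{20BB7}b";

// length counts UTF-16 code units.
console.log(sample.length); // 4

// Spreading uses the string iterator, so it yields whole code points.
console.log([...sample].length); // 3

// Array.from also iterates by code point; map each one to its hex value.
const hexCodePoints = Array.from(sample, (ch) => ch.codePointAt(0).toString(16));
console.log(hexCodePoints); // ["61", "20bb7", "62"]
```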

Direct input of U+2028 and U+2029

JavaScript strings accept characters entered directly as well as their escaped forms. For example, the Unicode code point of "中" is U+4e2d; you can enter this Chinese character directly in a string or enter its escaped form \u4e2d, and the two are equivalent.

"中" === "\u4e2d"; // true

However, JavaScript stipulates that 5 characters cannot be used directly in a string; only their escaped forms can be used.

- U+005C: backslash (reverse solidus)
- U+000D: carriage return
- U+2028: line separator
- U+2029: paragraph separator
- U+000A: line feed

For example, a backslash cannot be directly included in a string, and it must be escaped and written as \\ or \u005c.

There is nothing wrong with this rule in itself. The trouble is that the JSON format allows U+2028 (line separator) and U+2029 (paragraph separator) to appear directly in a string. As a result, JSON output by a server could be valid JSON and still report an error when it was evaluated as JavaScript source code.

const json = '"\u2028"';
eval(json); // SyntaxError before ES2019

The JSON format has been frozen (RFC 7159) and cannot be modified. To eliminate this error, ES2019 allows U+2028 (line separator) and U+2029 (paragraph separator) to be entered directly in JavaScript strings.

const PS = eval("'\u2029'");

Under ES2019, the above code does not report an error.

Note that template strings already allow these two characters to be entered directly. Regular expressions still do not allow them to be entered directly, which is not a problem because JSON cannot contain regular expressions.
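
The behavior can be checked with a short sketch, assuming an engine that implements ES2019 or later. The variable names are illustrative, and the JSON text is built at run time so that the source itself contains no raw U+2028.

```javascript
// Build the JSON text at run time: a quote, a raw U+2028, a quote.
const lineSeparator = String.fromCodePoint(0x2028);
const jsonText = '"' + lineSeparator + '"';

// The JSON grammar permits a raw U+2028 inside a string.
const parsed = JSON.parse(jsonText);
console.log(parsed === "\u2028"); // true

// Evaluating the same text as JavaScript source needs an ES2019+ engine.
let evaluated;
try {
  evaluated = eval(jsonText);
} catch (error) {
  evaluated = error.name; // "SyntaxError" on engines older than ES2019
}
console.log(evaluated === "\u2028"); // true on ES2019+ engines
```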

JSON.stringify() transformation

According to the standard, JSON data must be UTF-8 encoded. However, before ES2019 the JSON.stringify() method could return a string that does not conform to the UTF-8 standard.

Specifically, the UTF-8 standard stipulates that code points between 0xD800 and 0xDFFF cannot be used alone and must be paired. For example, \uD834\uDF06 is two code points, but they must be paired to represent the character 𝌆. This pairing is a workaround for representing characters whose code points are greater than 0xFFFF. Using either \uD834 or \uDF06 alone is illegal, as is reversing the order, because \uDF06\uD834 corresponds to no character.

The problem with JSON.stringify() was that it could return a single code point between 0xD800 and 0xDFFF.

JSON.stringify("\u{D834}"); // "\u{D834}"

In order to ensure that valid UTF-8 characters are returned, ES2019 changed the behavior of JSON.stringify(). If it encounters a single code point between 0xD800 and 0xDFFF, or a non-existent pairing form, it will return an escaped string and leave it to the application to decide the next step.

JSON.stringify("\u{D834}"); // '"\\ud834"'
JSON.stringify("\uDF06\uD834"); // '"\\udf06\\ud834"'
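
The following sketch, assuming an ES2019 or later engine, contrasts a lone surrogate with a valid pair and shows that parsing restores the original string in both cases. The variable names are illustrative.

```javascript
// A lone high surrogate is serialized as an escape sequence.
const lone = JSON.stringify("\uD834");
console.log(lone === '"\\ud834"'); // true
console.log(lone.length); // 8: two quotes plus the six characters \ud834

// A valid surrogate pair is passed through unchanged.
const paired = JSON.stringify("\uD834\uDF06");
console.log(paired.length); // 4: two quotes plus two UTF-16 code units

// Parsing restores the original string in both cases.
console.log(JSON.parse(lone) === "\uD834"); // true
console.log(JSON.parse(paired) === "\uD834\uDF06"); // true
```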

Template string

In the traditional JavaScript language, the output template is usually written like this (the jQuery method is used below).

$("#result").append(
  "There are <b>" +
    basket.count +
    "</b> " +
    "items in your basket, " +
    "<em>" +
    basket.onSale +
    "</em> are on sale!"
);

The above writing method is quite cumbersome and inconvenient. ES6 introduces template strings to solve this problem.

$("#result").append(`
  There are <b>${basket.count}</b> items
   in your basket, <em>${basket.onSale}</em>
  are on sale!
`);

The template string is an enhanced version of the string, identified by backticks (`). It can be used as an ordinary string, it can also be used to define a multi-line string, or embed variables in the string.

// Ordinary string
`In JavaScript '\n' is a line-feed.`;

// Multi-line string
`In JavaScript this is
 not legal.`;

console.log(`string text line 1
string text line 2`);

// Embed variables in the string
let name = "Bob",
  time = "today";
`Hello ${name}, how are you ${time}?`;

The template strings in the above code are all represented by backticks. If you need to use backticks in the template string, use a backslash to escape it.

let greeting = `\`Yo\` World!`;

If you use a template string to represent a multi-line string, all spaces and indentation will be preserved in the output.

$("#list").html(`
<ul>
  <li>first</li>
  <li>second</li>
</ul>
`);

In the above code, all spaces and newlines of the template string are preserved; for example, there will be a newline before the <ul> tag. If you don't want this newline, you can use the trim method to remove it.

$("#list").html(
  `
<ul>
  <li>first</li>
  <li>second</li>
</ul>
`.trim()
);

To embed variables in the template string, you need to write the variable name in ${}.

function authorize(user, action) {
  if (!user.hasPrivilege(action)) {
    throw new Error(
      // The traditional way of writing is
      // 'User '
      // + user.name
      // + ' is not authorized to do '
      // + action
      // + '.'
      `User ${user.name} is not authorized to do ${action}.`
    );
  }
}

Arbitrary JavaScript expressions can be placed inside the braces, operations can be performed, and object properties can be referenced.

let x = 1;
let y = 2;

`${x} + ${y} = ${x + y}`;
// "1 + 2 = 3"

`${x} + ${y * 2} = ${x + y * 2}`;
// "1 + 4 = 5"

let obj = { x: 1, y: 2 };
`${obj.x + obj.y}`;
// "3"

Functions can also be called in the template string.

function fn() {
  return "Hello World";
}

`foo ${fn()} bar`;
// foo Hello World bar

If the value in the braces is not a string, it will be converted to a string according to the general rules. For example, if there is an object in the braces, the object's toString method will be called by default.
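
A short sketch of this conversion follows. The point object and the variable names are made up for illustration.

```javascript
// An object with its own toString method.
const point = {
  x: 1,
  y: 2,
  toString() {
    return `(${this.x}, ${this.y})`;
  },
};

// The custom toString is used when the object is interpolated.
const described = `Point: ${point}`;
console.log(described); // "Point: (1, 2)"

// Arrays and plain objects fall back to their default toString.
const listText = `${[1, 2, 3]}`;
const plainText = `${{}}`;
console.log(listText); // "1,2,3"
console.log(plainText); // "[object Object]"

// null and undefined are converted to their names.
const emptyText = `${null} and ${undefined}`;
console.log(emptyText); // "null and undefined"
```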

If the variable in the template string is not declared, an error will be reported.

// Variable place is not declared
let msg = `Hello, ${place}`;
// report an error

Because the code inside the braces is executed as JavaScript, a string literal placed inside the braces is output as is.

`Hello ${"World"}`;
// "Hello World"

Template strings can even be nested.

const tmpl = (addrs) => `
  <table>
  ${addrs
    .map(
      (addr) => `
    <tr><td>${addr.first}</td></tr>
    <tr><td>${addr.last}</td></tr>
  `
    )
    .join("")}
  </table>
`;

In the above code, another template string is embedded inside the ${} of the outer template string. The usage is as follows.

const data = [
  { first: "<Jane>", last: "Bond" },
  { first: "Lars", last: "<Croft>" },
];

console.log(tmpl(data));
// <table>
//
// <tr><td><Jane></td></tr>
// <tr><td>Bond</td></tr>
//
// <tr><td>Lars</td></tr>
// <tr><td><Croft></td></tr>
//
// </table>

If you need to keep a reference to the template string itself and evaluate it only when needed, you can write it as a function.

let func = (name) => `Hello ${name}!`;
func("Jack"); // "Hello Jack!"

In the above code, the template string is written as the return value of a function. Executing this function is equivalent to executing this template string.

Example: template compilation

Next, let's look at an example of generating a real template from a template string.

let template = `
<ul>
  <% for(let i=0; i <data.supplies.length; i++) {%>
    <li><%= data.supplies[i] %></li>
  <%} %>
</ul>
`;

The above code places a conventional template inside the template string. This template uses <%...%> to hold JavaScript code and <%= ... %> to output JavaScript expressions.

How to compile this template string?

One idea is to convert it into a JavaScript expression string.

echo("<ul>");
for (let i = 0; i < data.supplies.length; i++) {
  echo("<li>");
  echo(data.supplies[i]);
  echo("</li>");
}
echo("</ul>");

This conversion uses regular expressions.

let evalExpr = /<%=(.+?)%>/g;
let expr = /<%([\s\S]+?)%>/g;

template = template
  .replace(evalExpr, "`); \n echo( $1 ); \n echo(`")
  .replace(expr, "`); \n $1 \n echo(`");

template = "echo(`" + template + "`);";

Then, wrap the template in a function and return it.

let script = `(function parse(data){
  let output = "";

  function echo(html){
    output += html;
  }

  ${template}

  return output;
})`;

return script;

Assemble the above content into a template compilation function compile.

function compile(template) {
  const evalExpr = /<%=(.+?)%>/g;
  const expr = /<%([\s\S]+?)%>/g;

  template = template
    .replace(evalExpr, "`); \n echo( $1 ); \n echo(`")
    .replace(expr, "`); \n $1 \n echo(`");

  template = "echo(`" + template + "`);";

  let script = `(function parse(data){
    let output = "";

    function echo(html){
      output += html;
    }

    ${template}

    return output;
  })`;

  return script;
}

The usage of the compile function is as follows.

let parse = eval(compile(template));
div.innerHTML = parse({ supplies: ["broom", "mop", "cleaner"] });
// <ul>
// <li>broom</li>
// <li>mop</li>
// <li>cleaner</li>
// </ul>
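
As a variation, the same two regular expressions can feed the Function constructor instead of eval, so that a callable function is returned directly. This is a hedged sketch: compileToFunction and render are hypothetical names, the compact template is made up, and like the eval version it must only be used with trusted template text.

```javascript
// Hypothetical variant of compile(): returns a function instead of source text.
function compileToFunction(template) {
  const evalExpr = /<%=(.+?)%>/g;
  const expr = /<%([\s\S]+?)%>/g;

  // Turn the template into a sequence of echo(...) calls and raw statements.
  const body = template
    .replace(evalExpr, "`); \n echo( $1 ); \n echo(`")
    .replace(expr, "`); \n $1 \n echo(`");

  // Build a function of one parameter, data, that collects the echoed output.
  return new Function(
    "data",
    'let output = "";\n' +
      "function echo(html) { output += html; }\n" +
      "echo(`" + body + "`);\n" +
      "return output;"
  );
}

const render = compileToFunction(
  "<ul><% for (const item of data.supplies) { %><li><%= item %></li><% } %></ul>"
);

const html = render({ supplies: ["broom", "mop"] });
console.log(html);
// <ul><li>broom</li><li>mop</li></ul>
```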

Tagged template

Template strings can do more than the above. A template string can immediately follow the name of a function, and that function is then called to process the template string. This feature is called a "tagged template".

alert`hello`;
// Equivalent to
alert(["hello"]);

A tagged template is not actually a template but a special form of function call. The "tag" is the function, and the template string immediately following it is its argument.

However, if the template string contains variables, the call is no longer that simple. The template string is first split into multiple arguments, and then the function is called.

let a = 5;
let b = 10;

tag`Hello ${a + b} world ${a * b}`;
// Equivalent to
tag(["Hello ", " world ", ""], 15, 50);

In the above code, there is a tag name tag in front of the template string, which is a function. The return value of the entire expression is the return value of the tag function after processing the template string.

The function tag receives multiple parameters in turn.

function tag(stringArr, value1, value2) {
  // ...
}

// Equivalent to

function tag(stringArr, ...values) {
  // ...
}

The first parameter of the tag function is an array whose members are the parts of the template string that are not replaced by variables. In other words, variable substitution occurs between the first and second members of the array, between the second and third members, and so on.

The other parameters of the tag function are the values of the template string's variables after substitution. In this example, the template string contains two variables, so tag receives two further parameters, value1 and value2.

The actual values of all parameters of the tag function are as follows.

- First parameter: ['Hello ', ' world ', '']
- Second parameter: 15
- Third parameter: 50

In other words, the tag function is actually called in the following form.

tag(["Hello ", " world ", ""], 15, 50);

We can write the code of the tag function as needed. The following is a way to write the tag function, and the result of the operation.

let a = 5;
let b = 10;

function tag(s, v1, v2) {
  console.log(s[0]);
  console.log(s[1]);
  console.log(s[2]);
  console.log(v1);
  console.log(v2);

  return "OK";
}

tag`Hello ${a + b} world ${a * b}`;
// "Hello "
// " world "
// ""
// 15
// 50
// "OK"
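
To see exactly what a tag function receives, a tag can simply return its arguments. The inspect function below is a hypothetical helper written for this illustration.

```javascript
// Hypothetical tag that records what it was called with.
function inspect(strings, ...values) {
  return {
    strings: [...strings], // literal parts, with escapes processed
    raw: [...strings.raw], // literal parts, as written in the source
    values, // results of the ${...} expressions
  };
}

const a = 5;
const b = 10;
const result = inspect`Hello ${a + b} world ${a * b}`;

console.log(result.strings); // ["Hello ", " world ", ""]
console.log(result.values); // [15, 50]

// There is always exactly one more literal part than there are values.
console.log(result.strings.length === result.values.length + 1); // true
```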

Here is a more complicated example.

let total = 30;
let msg = passthru`The total is ${total} (${total * 1.05} with tax)`;

function passthru(literals) {
  let result = "";
  let i = 0;

  while (i < literals.length) {
    result += literals[i++];
    if (i < arguments.length) {
      result += arguments[i];
    }
  }

  return result;
}

msg; // "The total is 30 (31.5 with tax)"

The above example shows how to put each parameter back together according to the original position.

The passthru function can also be written with a rest parameter, as follows.

function passthru(literals, ...values) {
  let output = "";
  let index;
  for (index = 0; index < values.length; index++) {
    output += literals[index] + values[index];
  }

  output += literals[index];
  return output;
}

An important application of tagged templates is filtering HTML strings to prevent users from entering malicious content.

let message = SaferHTML`<p>${sender} has sent you a message.</p>`;

function SaferHTML(templateData) {
  let s = templateData[0];
  for (let i = 1; i < arguments.length; i++) {
    let arg = String(arguments[i]);

    // Escape special characters in the substitution.
    s += arg.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

    // Don't escape special characters in the template.
    s += templateData[i];
  }
  return s;
}

In the above code, the sender variable is often provided by the user. After being processed by the SaferHTML function, the special characters inside will be escaped.

let sender = '<script>alert("abc")</script>'; // Malicious code
let message = SaferHTML`<p>${sender} has sent you a message.</p>`;

message;
// <p>&lt;script&gt;alert("abc")&lt;/script&gt; has sent you a message.</p>
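
The same idea can be written with a rest parameter and a lookup table that also escapes quotes. This is only a sketch: safeHtml, escapeHtml and HTML_ESCAPES are hypothetical names, and the character set covered here is an assumption rather than a complete sanitizer.

```javascript
// Characters to escape and their HTML entities (an illustrative set).
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape one substitution value.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Tag function: literal parts are kept as is, substitutions are escaped.
function safeHtml(strings, ...values) {
  return strings.reduce(
    (out, part, index) =>
      out + part + (index < values.length ? escapeHtml(values[index]) : ""),
    ""
  );
}

const sender = '<script>alert("abc")</script>';
const message = safeHtml`<p>${sender} has sent you a message.</p>`;
console.log(message);
// <p>&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt; has sent you a message.</p>
```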

Another application of tagged templates is multilingual conversion (internationalization).

i18n`Welcome to ${siteName}, you are visitor number ${visitorNumber}!`;
// "Welcome to xxx, you are the xxxx visitor!"
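
One possible shape for such an i18n tag is sketched below. It is an assumption for illustration only: the translations table, the numbered-placeholder key format and the German text are all made up, and a real library would handle much more (plurals, number formatting, locale selection).

```javascript
// Hypothetical translation table keyed by the template with numbered placeholders.
const translations = new Map([
  [
    "Welcome to {0}, you are visitor number {1}!",
    "Willkommen bei {0}, Sie sind Besucher Nummer {1}!",
  ],
]);

function i18n(strings, ...values) {
  // Rebuild a lookup key such as "Welcome to {0}, ..." from the literal parts.
  const key = strings.reduce(
    (acc, part, index) => acc + part + (index < values.length ? `{${index}}` : ""),
    ""
  );

  // Fall back to the untranslated key when no entry exists.
  const pattern = translations.has(key) ? translations.get(key) : key;

  // Fill each numbered placeholder with the matching substitution value.
  return pattern.replace(/\{(\d+)\}/g, (match, index) => String(values[Number(index)]));
}

const siteName = "example.com";
const visitorNumber = 42;
const greeting = i18n`Welcome to ${siteName}, you are visitor number ${visitorNumber}!`;
console.log(greeting);
// "Willkommen bei example.com, Sie sind Besucher Nummer 42!"
```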

The template string itself cannot replace template libraries such as Mustache, because it has no conditionals or loops, but with a tag function you can add these features yourself.

// The following hashTemplate function
// is a custom template processing function
let libraryHtml = hashTemplate`
  <ul>
    #for book in ${myBooks}
      <li><i>#{book.title}</i> by #{book.author}</li>
    #end
  </ul>
`;

In addition, you can even use tagged templates to embed other languages in the JavaScript language.

jsx`
  <div>
    <input
      ref='input'
      onChange='${this.handleChange}'
      defaultValue='${this.state.value}' />
      ${this.state.value}
   </div>
`;

The above code uses the jsx function to convert a DOM string into a React object. A specific implementation of the jsx function can be found on GitHub (https://gist.github.com/lygaret/a68220defa69174bdec5).

The following is a hypothetical example of running Java code in JavaScript code through the java function.

java`
class HelloWorldApp {
  public static void main(String[] args) {
    System.out.println("Hello World!"); // Display the string.
  }
}
`;
HelloWorldApp.main();

The first parameter of the template processing function (the array of template strings) also has a raw property.

console.log`123`;
// ["123", raw: Array[1]]

In the above code, the argument that console.log receives is actually an array. The array has a raw property, which holds the original strings with their escape sequences left unprocessed.

Please see the example below.

tag`First line\nSecond line`;

function tag(strings) {
  console.log(strings.raw[0]);
  // strings.raw[0] is "First line\\nSecond line"
  // Print out "First line\nSecond line"
}

In the above code, the first parameter of the tag function, strings, has a raw property that also points to an array. The members of this array correspond one-to-one with those of the strings array. For example, if the strings array is ["First line\nSecond line"], then the strings.raw array is ["First line\\nSecond line"]. The only difference between the two is that backslashes in the raw strings are themselves escaped, so strings.raw treats \n as the two characters \ and n rather than as a newline. This design makes it easy to get the original template as it was written, before escape processing.
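
The difference between the two arrays can be confirmed with a small sketch. cookedAndRaw is a hypothetical tag written for this illustration; String.raw is the built-in tag that returns the raw form.

```javascript
// Hypothetical tag that returns the processed and the raw first segment.
function cookedAndRaw(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

const parts = cookedAndRaw`First line\nSecond line`;

// The processed string contains a real newline character.
console.log(parts.cooked.includes("\n")); // true

// The raw string contains a backslash followed by n instead.
console.log(parts.raw.includes("\n")); // false
console.log(parts.raw.includes("\\n")); // true

// So the raw string is exactly one character longer.
console.log(parts.raw.length - parts.cooked.length); // 1

// The built-in String.raw tag produces the same raw text.
console.log(String.raw`First line\nSecond line` === parts.raw); // true
```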

Limitations of template strings

As mentioned earlier, other languages can be embedded in a tagged template. However, template strings process escape sequences by default, which makes some other languages impossible to embed.

For example, the LaTeX language can be embedded in a tagged template.

function latex(strings) {
  // ...
}

let document = latex`
\newcommand{\fun}{\textbf{Fun!}} // works normally
\newcommand{\unicode}{\textbf{Unicode!}} // report an error
\newcommand{\xerxes}{\textbf{King!}} // report an error

Breve over the h goes \u{h}ere // report an error
`;

In the above code, the template string assigned to the variable document is completely legal LaTeX, but the JavaScript engine reports an error. The reason lies in string escaping.

The template string treats \u00FF and \u{42} as Unicode escapes, so \unicode fails to parse; likewise it treats \x56 as a hexadecimal escape, so \xerxes reports an error. In other words, \u and \x have special meanings in LaTeX, but JavaScript tries to process them as escapes.

To solve this problem, ES2018 relaxed the restrictions on string escapes in tagged templates. If an illegal escape is encountered, the corresponding string is undefined instead of an error being reported, and the original string can still be obtained from the raw property.

function tag(strs) {
  strs[0] === undefined;
  strs.raw[0] === "\\unicode and \\u{55}";
}
tag`\unicode and \u{55}`;

In the above code, the template string would originally have reported an error, but because the restriction on string escapes was relaxed, no error is reported. The JavaScript engine sets the first element of the array to undefined, but the raw property still provides the original string, so the tag function can still process it.

Note that this relaxation applies only when a tagged template parses the string. For a template string without a tag, an error is still reported.

let bad = `bad escape sequence: \unicode`; // report an error
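
A tag function can take advantage of this by falling back to the raw text whenever the processed string is undefined. The sketch below assumes an ES2018 or later engine; cookedOrRaw is a hypothetical helper name.

```javascript
// Hypothetical tag: use the processed string when it exists, else the raw one.
function cookedOrRaw(strings) {
  return strings.map((cooked, index) =>
    cooked === undefined ? strings.raw[index] : cooked
  );
}

// \unicode is an illegal escape, so the processed segment is undefined
// and the raw text is returned instead.
const latexParts = cookedOrRaw`\unicode and \u{55}`;
console.log(latexParts[0] === "\\unicode and \\u{55}"); // true

// With only legal escapes, the processed string is used as usual.
const validParts = cookedOrRaw`\u{55} is valid`;
console.log(validParts[0]); // "U is valid"
```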