Asynchronous applications of Generator functions

Asynchronous programming is extremely important to the JavaScript language. The JavaScript execution environment is single-threaded; without asynchronous programming, the language would constantly block and be unusable. This chapter introduces how Generator functions can be used to handle asynchronous operations.

Traditional methods

Before ES6 was born, there were about four methods for asynchronous programming.

- Callback functions
- Event listening
- Publish/subscribe
- Promise objects

The Generator function brings JavaScript asynchronous programming to a whole new stage.

Basic concepts

Asynchronous

The so-called "asynchronous" simply means that a task is not completed in one continuous run. It can be understood as the task being artificially divided into two segments: the first segment runs, then other tasks run, and when the task is ready, execution comes back for the second segment.

For example, suppose a task is to read a file for processing. The first segment of the task sends a request to the operating system to read the file. The program then performs other tasks, waits for the operating system to return the file, and then executes the second segment of the task (processing the file). This discontinuous execution is called asynchronous.

Correspondingly, continuous execution is called synchronous. Because it runs continuously and no other task can be inserted, the program can only wait while the operating system reads the file from the hard disk.

Callback

JavaScript implements asynchronous programming with callback functions. A callback function writes the second segment of the task as a separate function, which is called when the task resumes. Its English name, callback, literally means "call back".

Reading a file for processing is written like this.

fs.readFile("/etc/passwd", "utf-8", function (err, data) {
  if (err) throw err;
  console.log(data);
});

In the above code, the third argument of readFile is the callback function, i.e., the second segment of the task. The callback will not execute until the operating system returns the file /etc/passwd.

An interesting question: why does Node stipulate that the first parameter of the callback function must be an error object err (null if there is no error)?

The reason is that execution is divided into two stages. By the time the first stage finishes, the task's original context has ended, so errors thrown afterwards can no longer be caught there; they can only be passed into the second stage as a parameter.
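This error-first convention can be followed in your own asynchronous functions as well. A minimal sketch (divideAsync is a made-up example, not a Node API):

```javascript
// A hypothetical async function following Node's error-first convention:
// the callback's first parameter is reserved for an error (or null).
function divideAsync(a, b, callback) {
  setTimeout(function () {
    if (b === 0) return callback(new Error("division by zero"));
    callback(null, a / b);
  }, 0);
}

divideAsync(10, 2, function (err, result) {
  if (err) throw err;
  console.log(result); // 5
});
```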

Promise

There is nothing wrong with the callback function itself; the problem arises when multiple callbacks are nested. Suppose file B must be read after file A. The code looks like this.

fs.readFile(fileA, "utf-8", function (err, data) {
  fs.readFile(fileB, "utf-8", function (err, data) {
    // ...
  });
});

It is not hard to imagine that reading more than two files in sequence produces multiple levels of nesting. The code grows horizontally rather than vertically, and soon becomes messy and unmanageable. Because the multiple asynchronous operations are strongly coupled, modifying one operation may force changes to the callbacks above and below it. This situation is called "callback hell".

The Promise object was proposed to solve this problem. It is not a new syntactic feature but a new way of writing code that turns nested callbacks into chained calls. Reading multiple files in sequence with Promise is written as follows.

var readFile = require("fs-readfile-promise");

readFile(fileA)
  .then(function (data) {
    console.log(data.toString());
  })
  .then(function () {
    return readFile(fileB);
  })
  .then(function (data) {
    console.log(data.toString());
  })
  .catch(function (err) {
    console.log(err);
  });

In the above code, I used the fs-readfile-promise module, whose purpose is to return a Promise-based version of readFile. Promise provides the then method to attach callbacks and the catch method to capture errors thrown during execution.

As you can see, Promise is just an improvement on the callback function. With the then method, the two stages of an asynchronous task become clearer. Beyond that, there is nothing new.

The biggest problem with Promise is verbosity: the original task is wrapped in Promise, and no matter what the operation is, the code is a pile of thens at first glance, obscuring the original semantics.

So, is there a better way to write it?

Generator function

Coroutine

Traditional programming languages have long had solutions for asynchronous programming (in fact, multitasking solutions). One of them is called a "coroutine", meaning multiple threads cooperating with each other to complete asynchronous tasks.

A coroutine is a bit like a function and a bit like a thread. Its running process is roughly as follows.

- Step one: coroutine A starts executing.
- Step two: partway through, coroutine A pauses, and the right of execution transfers to coroutine B.
- Step three: (after a while) coroutine B hands the right of execution back.
- Step four: coroutine A resumes execution.

The coroutine A of the above process is an asynchronous task because it is executed in two (or more) stages.

For example, the coroutine to read the file is written as follows.

function* asyncJob() {
  // ... other code
  var f = yield readFile(fileA);
  // ... other code
}

The function asyncJob above is a coroutine, and its secret lies in the yield command: execution pauses there, and the right of execution is handed to other coroutines. In other words, the yield command is the dividing line between the two stages of the asynchronous task.

The coroutine suspends when it encounters the yield command and continues from where it stopped once the right of execution returns. Its biggest advantage is that the code reads very much like a synchronous operation; remove the yield commands and it would be exactly the same.
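The hand-off described above can be sketched with two generator functions and a tiny manual scheduler (coroutineA and coroutineB are illustrative names, not part of any API):

```javascript
var trace = [];

function* coroutineA() {
  trace.push("A: first half");
  yield; // pause; the right of execution goes elsewhere
  trace.push("A: second half");
}

function* coroutineB() {
  trace.push("B: runs while A is paused");
}

// A manual scheduler alternating between the coroutines
var a = coroutineA();
var b = coroutineB();
a.next(); // run A up to its yield
b.next(); // run B while A is suspended
a.next(); // resume A from where it paused

console.log(trace);
// ["A: first half", "B: runs while A is paused", "A: second half"]
```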

Coroutine implementation with Generator functions

The Generator function is the implementation of the coroutine in ES6. The biggest feature is that it can hand over the execution rights of the function (that is, suspend execution).

The entire Generator function is an encapsulated asynchronous task, or a container for asynchronous tasks. Where asynchronous operations need to be suspended, use the yield statement to indicate. The execution method of the Generator function is as follows.

function* gen(x) {
  var y = yield x + 2;
  return y;
}

var g = gen(1);
g.next(); // {value: 3, done: false}
g.next(); // {value: undefined, done: true}

In the above code, calling the Generator function returns an internal pointer (i.e., an iterator) g. This is another difference from ordinary functions: calling it does not return a result but a pointer object. Calling g's next method moves the internal pointer (i.e., executes the first stage of the asynchronous task) until the first yield statement is encountered. The example above runs up to x + 2.

In other words, the next method executes the Generator function in stages. Each call to next returns an object representing the current stage (its value and done properties). The value property is the value of the expression after the yield statement, representing the value of the current stage; the done property is a boolean indicating whether the Generator function has finished executing, i.e., whether there is another stage.

Generator function data exchange and error handling

The Generator function can suspend execution and resume execution, which is the fundamental reason why it can encapsulate asynchronous tasks. In addition, it has two features that make it a complete solution for asynchronous programming: data exchange inside and outside the function body and error handling mechanism.

The value attribute of the return value of next is the output data of the Generator function; the next method can also accept parameters and input data into the body of the Generator function.

function* gen(x) {
  var y = yield x + 2;
  return y;
}

var g = gen(1);
g.next(); // {value: 3, done: false}
g.next(2); // {value: 2, done: true}

In the above code, the first next method's value property returns the value of the expression x + 2, namely 3. The second next method takes the argument 2, which is passed into the Generator function as the result of the previous stage's asynchronous task and received by the variable y in the function body. Therefore, this step's value property returns 2 (the value of y).

Error handling code can also be deployed inside the Generator function to capture errors thrown outside the function.

function* gen(x) {
  try {
    var y = yield x + 2;
  } catch (e) {
    console.log(e);
  }
  return y;
}

var g = gen(1);
g.next();
g.throw("Something went wrong");
// Something went wrong

In the last line above, an error thrown outside the Generator function via the pointer object's throw method can be caught by the try...catch block inside the function body. This means the code that throws and the code that handles the error are separated in time and space, which is undoubtedly important for asynchronous programming.

Asynchronous task encapsulation

Let's take a look at how to use the Generator function to perform a real asynchronous task.

var fetch = require("node-fetch");

function* gen() {
  var url = "https://api.github.com/users/github";
  var result = yield fetch(url);
  console.log(result.bio);
}

In the above code, the Generator function encapsulates an asynchronous operation: it first reads a remote interface, then parses the JSON-formatted data. As mentioned earlier, this code looks very much like a synchronous operation, apart from the added yield command.

The method to execute this code is as follows.

var g = gen();
var result = g.next();

result.value
  .then(function (data) {
    return data.json();
  })
  .then(function (data) {
    g.next(data);
  });

In the above code, first execute the Generator function to obtain the iterator object, then use the next method (second line) to execute the first stage of the asynchronous task. Since fetch returns a Promise object, the next call to next must be made inside a then method.

As you can see, although the Generator function expresses asynchronous operations very concisely, the process management is not convenient (that is, when to execute the first stage and when to execute the second stage).

Thunk function

The Thunk function is one way to execute Generator functions automatically.

Parameter evaluation strategy

The Thunk function was born in the 1960s.

At that time, programming languages had just emerged and computer scientists were still studying how best to write compilers. One point of contention was the "evaluation strategy": when should a function's parameters be evaluated?

var x = 1;

function f(m) {
  return m * 2;
}

f(x + 5);

The above code first defines the function f and then passes it the expression x + 5. When should this expression be evaluated?

One opinion is "call by value", that is, before entering the function body, calculate the value of x + 5 (equal to 6), and then pass this value to the function f. The C language uses this strategy.

f(x + 5);
// When calling by value, it is equivalent to
f(6);

Another opinion is "call by name", that is, the expression x + 5 is directly passed into the function body, and it is evaluated only when it is used. The Haskell language uses this strategy.

f(x + 5);
// When calling by name, it is equivalent to
(x + 5) * 2;

Which one is better, call by value or call by name?

The answer is that each has pros and cons. Call by value is relatively simple, but a parameter is evaluated even when it is never actually used, which may cause a performance loss.

function f(a, b) {
  return b;
}

f(3 * x * x - 2 * x - 1, x);

In the above code, the first parameter of the function f is a complex expression, but the function body never uses it; evaluating this parameter is unnecessary. For this reason, some computer scientists favored "call by name", evaluating only at the point of use.

The meaning of Thunk function

Compilers usually implement "call by name" by putting the parameter into a temporary function and passing that temporary function into the function body. This temporary function is called a Thunk function.

function f(m) {
  return m * 2;
}

f(x + 5);

// Equivalent to

var thunk = function () {
  return x + 5;
};

function f(thunk) {
  return thunk() * 2;
}

In the above code, the parameter x + 5 of the function f is replaced by a function. Wherever the original parameters are used, just evaluate the Thunk function.

This is the definition of the Thunk function: it is an implementation strategy for "call by name", used to replace an expression.
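A small sketch of why this amounts to call by name: the expression is not evaluated until the thunk is invoked, so changes to x before that point are visible (the variable names here are illustrative):

```javascript
var x = 1;

// The expression x + 5 is wrapped in a thunk instead of being evaluated now
var thunk = function () {
  return x + 5;
};

function f(t) {
  return t() * 2; // evaluation happens here, at the point of use
}

x = 10; // the thunk has not run yet, so this change affects the result
console.log(f(thunk)); // 30
```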

Thunk function of JavaScript language

JavaScript is a call-by-value language, so its Thunk functions have a different meaning. In JavaScript, a Thunk function replaces not an expression but a multi-parameter function, turning it into a single-parameter function that accepts only a callback function as its parameter.

// The normal version of readFile (multi-parameter version)
fs.readFile(fileName, callback);

// Thunk version of readFile (single parameter version)
var Thunk = function (fileName) {
  return function (callback) {
    return fs.readFile(fileName, callback);
  };
};

var readFileThunk = Thunk(fileName);
readFileThunk(callback);

In the above code, the readFile method of the fs module is a multi-parameter function, and the two parameters are the file name and the callback function. After the converter is processed, it becomes a single-parameter function, which only accepts the callback function as a parameter. This single-parameter version is called the Thunk function.

Any function, as long as the parameter has a callback function, can be written in the form of a Thunk function. Below is a simple Thunk function converter.

// ES5 version
var Thunk = function (fn) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    return function (callback) {
      args.push(callback);
      return fn.apply(this, args);
    };
  };
};

// ES6 version
const Thunk = function (fn) {
  return function (...args) {
    return function (callback) {
      return fn.call(this, ...args, callback);
    };
  };
};

Use the above converter to generate the Thunk function of fs.readFile.

var readFileThunk = Thunk(fs.readFile);
readFileThunk(fileA)(callback);

Here is another complete example.

function f(a, cb) {
  cb(a);
}
const ft = Thunk(f);

ft(1)(console.log); // 1

Thunkify module

For production environments, the Thunkify module is recommended as the converter.

The first is installation.

$ npm install thunkify

The usage is as follows.

var thunkify = require("thunkify");
var fs = require("fs");

var read = thunkify(fs.readFile);
read("package.json")(function (err, str) {
  // ...
});

The source code of Thunkify is very similar to the simple converter in the previous section.

function thunkify(fn) {
  return function () {
    var args = new Array(arguments.length);
    var ctx = this;

    for (var i = 0; i < args.length; ++i) {
      args[i] = arguments[i];
    }

    return function (done) {
      var called;

      args.push(function () {
        if (called) return;
        called = true;
        done.apply(null, arguments);
      });

      try {
        fn.apply(ctx, args);
      } catch (err) {
        done(err);
      }
    };
  };
}

The main addition in its source code is a checking mechanism: the variable called ensures that the callback runs only once. This design is relevant to the Generator functions below. See the example.

function f(a, b, callback) {
  var sum = a + b;
  callback(sum);
  callback(sum);
}

var ft = thunkify(f);
var print = console.log.bind(console);
ft(1, 2)(print);
// 3

In the above code, because thunkify only allows the callback function to be executed once, only one line of result is output.

Generator function process management

You may ask: what is the Thunk function good for? The answer is that it used to have little use, but with ES6's Generator functions, the Thunk function can now be used for automatic flow management of Generator functions.

The Generator function can be executed automatically.

function* gen() {
  // ...
}

var g = gen();
var res = g.next();

while (!res.done) {
  console.log(res.value);
  res = g.next();
}

In the above code, the Generator function gen will automatically perform all steps.

However, this does not suit asynchronous operations. If the previous step must finish before the next step can run, the automatic execution above does not work. This is where the Thunk function comes in handy. Take reading files as an example: the following Generator function encapsulates two asynchronous operations.

var fs = require("fs");
var thunkify = require("thunkify");
var readFileThunk = thunkify(fs.readFile);

var gen = function* () {
  var r1 = yield readFileThunk("/etc/fstab");
  console.log(r1.toString());
  var r2 = yield readFileThunk("/etc/shells");
  console.log(r2.toString());
};

In the above code, the yield command transfers the program's right of execution out of the Generator function, so a mechanism is needed to hand the right of execution back to the Generator function.

That mechanism is the Thunk function, because it can return the right of execution to the Generator function from inside the callback. To ease understanding, let's first see how to run the above Generator function manually.

var g = gen();

var r1 = g.next();
r1.value(function (err, data) {
  if (err) throw err;
  var r2 = g.next(data);
  r2.value(function (err, data) {
    if (err) throw err;
    g.next(data);
  });
});

In the above code, the variable g is the internal pointer of the Generator function, which indicates the current execution step. The next method is responsible for moving the pointer to the next step and returning the information of that step (the value attribute and the done attribute).

Looking closely at the above code, you can see that executing the Generator function amounts to repeatedly passing the same callback to the value property returned by next. This lets us automate the process with recursion.

Automatic process management of Thunk functions

The real power of the Thunk function lies in the automatic execution of the Generator function. The following is a Generator executor based on the Thunk function.

function run(fn) {
  var gen = fn();

  function next(err, data) {
    var result = gen.next(data);
    if (result.done) return;
    result.value(next);
  }

  next();
}

function* g() {
  // ...
}

run(g);

The run function in the above code is an automatic executor for Generator functions. The internal next function is the Thunk's callback. It first moves the pointer to the Generator function's next step (the gen.next method), then checks whether the Generator function has finished (the result.done property). If not, it passes next into the Thunk function (the result.value property); otherwise it simply exits.

With this executor, running a Generator function is much more convenient: no matter how many asynchronous operations it contains, just pass the Generator function to run. The prerequisite, of course, is that every asynchronous operation is a Thunk function; that is, whatever follows each yield command must be a Thunk function.

var g = function* () {
  var f1 = yield readFileThunk("fileA");
  var f2 = yield readFileThunk("fileB");
  // ...
  var fn = yield readFileThunk("fileN");
};

run(g);

In the above code, the function g encapsulates n asynchronous file-reading operations; executing the run function completes them all automatically. In this way, asynchronous operations can not only be written like synchronous ones but also be launched with a single line of code.
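The same executor can be exercised without touching the filesystem. In the sketch below, delayThunk is a hypothetical stand-in for readFileThunk, and run is repeated so the example is self-contained:

```javascript
function run(fn) {
  var gen = fn();

  function next(err, data) {
    var result = gen.next(data);
    if (result.done) return;
    result.value(next);
  }

  next();
}

// delayThunk returns a thunk that delivers its value asynchronously
function delayThunk(value) {
  return function (callback) {
    setTimeout(function () {
      callback(null, value);
    }, 0);
  };
}

var log = [];
run(function* () {
  log.push(yield delayThunk("a"));
  log.push(yield delayThunk("b"));
  console.log(log); // ["a", "b"]
});
```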

The Thunk function is not the only solution for automatic execution of the Generator function. Because the key to automatic execution is that there must be a mechanism to automatically control the flow of the Generator function, receive and return the execution rights of the program. Callback functions can do this, as can Promise objects.

co module

Basic usage

The co module is a small tool released by the well-known programmer TJ Holowaychuk in June 2013 for the automatic execution of Generator functions.

Below is a Generator function to read two files in sequence.

var gen = function* () {
  var f1 = yield readFile("/etc/fstab");
  var f2 = yield readFile("/etc/shells");
  console.log(f1.toString());
  console.log(f2.toString());
};

The co module lets you skip writing an executor for the Generator function.

var co = require("co");
co(gen);

In the above code, the Generator function will be automatically executed as long as the co function is passed in.

The co function returns a Promise object, so you can use the then method to add a callback function.

co(gen).then(function () {
  console.log("Generator function execution completed");
});

In the above code, when the Generator function is finished, a line of prompt will be output.

Principle of co module

Why can co automatically execute the Generator function?

As mentioned earlier, a Generator function is a container for asynchronous operations. Automatic execution requires a mechanism that hands back the right of execution automatically once an asynchronous operation has a result.

There are two ways to do this.

(1) Callback function. Wrap the asynchronous operation into a Thunk function, and return the execution right in the callback function.

(2) Promise object. Wrap the asynchronous operation into a Promise object, and use the then method to return the right of execution.

The co module actually wraps both automatic executors (Thunk functions and Promise objects) into one module. The prerequisite for using co is that only a Thunk function or a Promise object may follow a yield command in the Generator function. (If the members of an array or object are all Promise objects, co can also be used; see the examples below.)

The previous section introduced the Thunk-based automatic executor. Now let's look at the executor based on Promise objects, which is necessary for understanding the co module.

Automatic execution based on Promise objects

Still use the above example. First, wrap the readFile method of the fs module into a Promise object.

var fs = require("fs");

var readFile = function (fileName) {
  return new Promise(function (resolve, reject) {
    fs.readFile(fileName, function (error, data) {
      if (error) return reject(error);
      resolve(data);
    });
  });
};

var gen = function* () {
  var f1 = yield readFile("/etc/fstab");
  var f2 = yield readFile("/etc/shells");
  console.log(f1.toString());
  console.log(f2.toString());
};

Then, manually execute the above Generator function.

var g = gen();

g.next().value.then(function (data) {
  g.next(data).value.then(function (data) {
    g.next(data);
  });
});

Manual execution is really just adding callback functions layer by layer with the then method. Once you understand this, you can write an automatic executor.

function run(gen) {
  var g = gen();

  function next(data) {
    var result = g.next(data);
    if (result.done) return result.value;
    result.value.then(function (data) {
      next(data);
    });
  }

  next();
}

run(gen);

In the above code, as long as the Generator function has not been executed to the last step, the next function calls itself to achieve automatic execution.
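This executor can also be exercised without reading files; the generator below yields plain resolved Promises as an illustrative stand-in for readFile (run is repeated so the sketch is self-contained):

```javascript
function run(gen) {
  var g = gen();

  function next(data) {
    var result = g.next(data);
    if (result.done) return result.value;
    result.value.then(function (data) {
      next(data);
    });
  }

  next();
}

var results = [];
run(function* () {
  var a = yield Promise.resolve(1);
  results.push(a);
  var b = yield Promise.resolve(a + 1);
  results.push(b);
  console.log(results); // [1, 2]
});
```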

Source code of co module

co is an extension of the automatic executor above. Its source code is only a few dozen lines and is quite simple.

First, the co function accepts the Generator function as a parameter and returns a Promise object.

function co(gen) {
  var ctx = this;

  return new Promise(function (resolve, reject) {});
}

In the returned Promise object, co first checks whether the parameter gen is a Generator function. If it is, execute the function and get an internal pointer object; if it is not, return and change the state of the Promise object to resolved.

function co(gen) {
  var ctx = this;

  return new Promise(function (resolve, reject) {
    if (typeof gen === "function") gen = gen.call(ctx);
    if (!gen || typeof gen.next !== "function") return resolve(gen);
  });
}

Next, co wraps the next method of the internal pointer object of the Generator function into an onFulfilled function. This is mainly to be able to catch the thrown error.

function co(gen) {
  var ctx = this;

  return new Promise(function (resolve, reject) {
    if (typeof gen === "function") gen = gen.call(ctx);
    if (!gen || typeof gen.next !== "function") return resolve(gen);

    onFulfilled();
    function onFulfilled(res) {
      var ret;
      try {
        ret = gen.next(res);
      } catch (e) {
        return reject(e);
      }
      next(ret);
    }
  });
}

Finally, there is the key next function, which calls itself repeatedly.

function next(ret) {
  if (ret.done) return resolve(ret.value);
  var value = toPromise.call(ctx, ret.value);
  if (value && isPromise(value)) return value.then(onFulfilled, onRejected);
  return onRejected(
    new TypeError(
      "You may only yield a function, promise, generator, array, or object, " +
        'but the following object was passed: "' +
        String(ret.value) +
        '"'
    )
  );
}

In the above code, the body of the next function contains only four lines.

In the first line, check whether it is the last step of the Generator function, and return if it is.

The second line ensures that the return value of each step is a Promise object.

In the third line, use the then method to add a callback function to the return value, and then call the next function again through the onFulfilled function.

In the fourth line, when the parameter does not meet requirements (it is neither a Thunk function nor a Promise object), the state of the Promise object changes to rejected, terminating execution.
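For reference, the part of toPromise that handles thunks can be sketched roughly as follows. This is a simplified illustration, not co's actual source; thunkToPromise is an illustrative name, and the real toPromise also converts arrays, objects, and generators:

```javascript
// Convert a thunk (a function expecting an error-first callback)
// into a Promise. A minimal sketch under stated assumptions.
function thunkToPromise(fn) {
  var ctx = this;
  return new Promise(function (resolve, reject) {
    fn.call(ctx, function (err, res) {
      if (err) return reject(err);
      // extra callback arguments beyond the first result are dropped here
      resolve(res);
    });
  });
}

thunkToPromise(function (callback) {
  callback(null, 42);
}).then(function (value) {
  console.log(value); // 42
});
```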

Handling concurrent asynchronous operations

co supports concurrent asynchronous operations, which allows certain operations to be performed at the same time and waits until they are all completed before proceeding to the next step.

In that case, all the concurrent operations should be placed in an array or object following the yield statement.

// How to write an array
co(function* () {
  var res = yield [Promise.resolve(1), Promise.resolve(2)];
  console.log(res);
}).catch(onerror);

// How to write the object
co(function* () {
  var res = yield {
    1: Promise.resolve(1),
    2: Promise.resolve(2),
  };
  console.log(res);
}).catch(onerror);

Here is another example.

co(function* () {
  var values = [n1, n2, n3];
  yield values.map(somethingAsync);
});

function* somethingAsync(x) {
  // do something async
  return y;
}

The above code allows three concurrent somethingAsync asynchronous operations, and waits until they are all completed before proceeding to the next step.

Example: Processing Stream

Node provides the Stream mode for reading and writing data. Its characteristic is that only part of the data is processed at a time; the data is divided into chunks processed in sequence, like a "stream" of data. This is very useful for processing large-scale data. Streams use the EventEmitter API and emit three events.

- data event: the next chunk of data is ready.
- end event: the entire "data stream" has been processed.
- error event: an error occurred.

Using Promise.race(), we can determine which of the three events occurs first, and only when the data event fires first do we process the next chunk. A while loop then completes the reading of all the data.

const co = require("co");
const fs = require("fs");

const stream = fs.createReadStream("./les_miserables.txt");
let valjeanCount = 0;

co(function* () {
  while (true) {
    const res = yield Promise.race([
      new Promise((resolve) => stream.once("data", resolve)),
      new Promise((resolve) => stream.once("end", resolve)),
      new Promise((resolve, reject) => stream.once("error", reject)),
    ]);
    if (!res) {
      break;
    }
    stream.removeAllListeners("data");
    stream.removeAllListeners("end");
    stream.removeAllListeners("error");
    valjeanCount += (res.toString().match(/valjean/gi) || []).length;
  }
  console.log("count:", valjeanCount); // count: 1120
});

The above code reads the text of Les Misérables in Stream mode, using the stream.once method to attach, for each chunk, a one-time callback to each of the data, end, and error events. The variable res has a value only when the data event fires; the code then tallies the occurrences of the word valjean in each chunk.