FrankTheDevop

Node.js

Script in Node.js to iterate a directory and extract information from its files

Frank

Hi everyone,

after building the template last time, I want to show you how to put the individual pieces together.
Based on a task I had at hand, I chose the example of iterating over a directory and working with the files in it.
The exact task was:
– Iterate a directory
– find all JSON files in it
– read them
– extract all objects in them
– extract the property email from them
– extract the unique domains of the email addresses
– count how often each domain occurs
– write this information to a summary file for further processing / display

'use strict'

const Promise = require('bluebird')
const fs = require('fs');
const path = require('path');
const util = require('util');

// Promisify only readdir and writeFile as we don't need more
const readdirAsync = Promise.promisify(fs.readdir);
const writeFileAsync = Promise.promisify(fs.writeFile);

// Commandline handling
const optionDefinitions = [
  { name: 'folder', alias: 'f', type: String }
]
const commandLineArgs = require('command-line-args')
const options = commandLineArgs(optionDefinitions)

// Add the path to your files
const folder = options.folder
// e.g. '/Users/$Yourusername/Downloads/customerdata';

// This will hold all entries from all files
//  Not unique
const all = []

// Read all files in our directory
return readdirAsync(folder)
  .then(files => {
    files
      .map(entry => {
        if(entry.indexOf('.') > 0 && path.extname(entry) === '.json') {
          // In case there are .json files in the folder that are not in JSON format
          try {
            // path.resolve makes the path absolute, so require also works for a relative --folder
            const temp = require(path.resolve(folder, entry))
            // I know for sure that all entries have a filled email column so I can just split here
            // and extract the domain name without checking
            temp.map(entry => (all.push(entry.email.split('@')[1])))
          } catch (e) {}
          return null
        }
      })

      console.log(all)
  })
  .then(() => {
    // Create a unique array
    // Use the new Set feature of ES 6
   return Promise.resolve([...new Set(all)])
  })
  .then(allUnique => {
    // Get a list of unique entries with the number of times it appears
    let newList = []

    allUnique.map(entry => {
      const t = all.filter(innerEntry => innerEntry === entry)
      newList.push({name: entry, count: t.length})
    })

    return Promise.resolve(newList)
  })
  .then(allUnique => {
    allUnique.sort((a,b) => b.count - a.count)
    return Promise.resolve(allUnique)
   })
  .then(allUnique => {
    let content = allUnique.reduce((a, b) => a + `${b.name};${b.count}\n`, '')
    return writeFileAsync(path.join(folder, 'all.txt'), content)
  })
  .then(data => {
    console.log('Wrote file successfully')
  })
  .catch(err => {
    console.log('An error occurred:', err)
  })

You can find the repository for it here.

If you are looking for the explanation, continue reading. Otherwise be happy with the template and change it to your heart's desire ;).

This one is a bit longer, but stay with me, we will go through it together.
First we have the standard block where we import all required libraries in Line 3-6.

Then we convert the asynchronous, callback-based functions readdir and writeFile to Promises (we promisify them) for easier
and more elegant handling in Line 9-10.
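As a side note, newer Node.js versions ship Promise-based variants of these functions in fs.promises, so the same two helpers can be had without bluebird. The following is only a minimal sketch, assuming Node.js 10 or later. The names readdirAsync and writeFileAsync are reused from the script, while the temporary directory and the file content are made up for the demo:

```javascript
'use strict'

const fs = require('fs')
const os = require('os')
const path = require('path')

// fs.promises exposes Promise-returning versions of the callback-based fs functions
const readdirAsync = fs.promises.readdir
const writeFileAsync = fs.promises.writeFile

// Demo: write a file into a fresh temporary directory, then list that directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'fs-promises-demo-'))

const demo = writeFileAsync(path.join(demoDir, 'all.txt'), 'example.com;2\n')
  .then(() => readdirAsync(demoDir))
  .then(files => {
    console.log(files) // [ 'all.txt' ]
    return files
  })
```

The rest of the promise chain stays the same, only the way the two functions are obtained differs.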

Next comes the handling of command line (CLI) parameters in Line 12 – 21, as we did before.

We define an array all, which will receive all email domains from the files we read (not unique), in Line 25.

Now we have everything together to start:
We read the directory content in Line 28.
In Line 29 it returns an array of all files found.
With Line 30-31 we start iterating over all elements of the array, and with Line 32 we make sure that only files ending with .json are accepted; all others are ignored.
Line 34-41 is a bit of cheating: Node.js is able to require a JSON file. So instead of reading the file, parsing it and handling it all myself, I use the functionality of require.
In case there is a JSON file that cannot be parsed, I wrap it in a try/catch so that the script continues on an error.
Line 39 does a few things at once:
– With .map I iterate over all entries in the file
– I know each object contains an email property, therefore I act on it without checking
– An email address has the form username@domainname.domainextension. I need the domain name and extension, so I split the email property at the @ and take the second half, which is the domain part
– Each of these domain parts is pushed into the array for further processing
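If you cannot be sure about your data, the same steps can be written a bit more defensively. The following is a sketch under my own assumptions, not the code from the repository: it reads the files with fs.readFileSync and JSON.parse instead of require, skips entries without a usable email and lowercases the domain. The helper names extractDomain and collectDomains and the sample files are hypothetical:

```javascript
'use strict'

const fs = require('fs')
const os = require('os')
const path = require('path')

// Return the part after the last '@', or null if the value is not a usable address
function extractDomain (email) {
  if (typeof email !== 'string') return null
  const at = email.lastIndexOf('@')
  if (at <= 0 || at === email.length - 1) return null
  return email.slice(at + 1).toLowerCase()
}

// Read every .json file in `folder` and collect the email domains (not unique)
function collectDomains (folder) {
  const domains = []
  fs.readdirSync(folder)
    .filter(entry => path.extname(entry).toLowerCase() === '.json')
    .forEach(entry => {
      let records
      try {
        records = JSON.parse(fs.readFileSync(path.join(folder, entry), 'utf8'))
      } catch (e) {
        return // not valid JSON, skip this file
      }
      if (!Array.isArray(records)) return
      records.forEach(record => {
        const domain = extractDomain(record && record.email)
        if (domain) domains.push(domain)
      })
    })
  return domains
}

// Demo with made-up sample data in a temporary directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'customerdata-'))
fs.writeFileSync(path.join(demoDir, 'a.json'), JSON.stringify([
  { email: 'anna@example.com' },
  { email: 'bob@Example.com' },
  { name: 'no email here' }
]))
fs.writeFileSync(path.join(demoDir, 'broken.json'), '{ not json')
fs.writeFileSync(path.join(demoDir, 'notes.txt'), 'ignored')

const domains = collectDomains(demoDir)
console.log(domains) // [ 'example.com', 'example.com' ]
```

Reading the file explicitly also avoids the module cache of require, which would otherwise return the old content if a file changes while the process is running.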

After all the processing I print a debug output in Line 45.

JavaScript ES 6 introduced some nice new features. Two of them are the Set (a collection of unique values) and the spread operator. In Line 50 I return a new array that is created by spreading the Set,
so in short: in one line I get a unique array of domains.
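Seen in isolation, the de-duplication looks like this. The sample values are made up; only the variable names all and allUnique are taken from the script:

```javascript
'use strict'

// Made-up sample data: domains as collected from the files (with duplicates)
const all = ['example.com', 'test.org', 'example.com', 'mail.net', 'test.org']

// A Set keeps each value only once; the spread operator turns it back into an array
const allUnique = [...new Set(all)]

console.log(allUnique) // [ 'example.com', 'test.org', 'mail.net' ]
```

A Set remembers the insertion order, so the unique array keeps the order in which each domain first appeared.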

In the next function we create a new array of objects with the domain name and the number of occurrences. For that we iterate over each entry of the unique array in Line 56
and use the filter method of the non-unique array with all entries in Line 57. The filter method returns an array, so I can create the object with the number of occurrences easily by using array.length in Line 58.
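The filter approach walks through the complete array once per unique domain, which is fine for a few files. For larger data sets a single pass with a Map could be used instead. This is an alternative sketch, not what the script does; the function name countDomains and the sample data are hypothetical:

```javascript
'use strict'

// Made-up sample data (not unique)
const all = ['example.com', 'test.org', 'example.com', 'mail.net', 'example.com']

// Count every domain in one pass over the array
function countDomains (domains) {
  const counts = new Map()
  for (const domain of domains) {
    counts.set(domain, (counts.get(domain) || 0) + 1)
  }
  // Same shape as in the script: [{ name, count }, ...]
  return Array.from(counts, ([name, count]) => ({ name, count }))
}

const newList = countDomains(all)
console.log(newList)
```

With this variant the separate step that creates the unique array is not needed, because the keys of the Map are already unique.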

After I have the array with the number of occurrences, I want to see it sorted. The sort function allows us to provide a function that defines how to sort. And thanks to (5) & (6) I found a short way to do it, as you see
in Line 64.
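To make the compare function a bit more tangible, here is a small sketch with made-up data. The tie-break by name and the copy of the array are my additions for the example; the script itself sorts in place and only compares the count:

```javascript
'use strict'

const list = [
  { name: 'test.org', count: 1 },
  { name: 'example.com', count: 3 },
  { name: 'mail.net', count: 1 }
]

// A negative result puts a before b, so b.count - a.count sorts descending by count.
// If the counts are equal (difference 0), fall back to the alphabetical order of the name.
// sort changes the array in place, therefore we sort a copy here.
const sorted = [...list].sort((a, b) => b.count - a.count || a.name.localeCompare(b.name))

console.log(sorted.map(entry => entry.name)) // [ 'example.com', 'mail.net', 'test.org' ]
```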

In the last function I use the array.reduce function to create a string from the objects, one line per domain in the form name;count. You can see this post-processing step in Line 68.
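Here is the reduce step on its own, next to an equivalent variant with map and join that some people find easier to read. The sample data is made up:

```javascript
'use strict'

const list = [
  { name: 'example.com', count: 3 },
  { name: 'test.org', count: 1 }
]

// reduce starts with an empty string and appends one "name;count" line per entry
const viaReduce = list.reduce((text, entry) => text + `${entry.name};${entry.count}\n`, '')

// The same result: build the lines first, then glue them together
const viaJoin = list.map(entry => `${entry.name};${entry.count}\n`).join('')

console.log(viaReduce === viaJoin) // true
console.log(viaReduce)
```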

All that is left is to write the data to a file as you see in Line 69.

This is followed by a simple message to signal that the script has successfully finished (Line 72) or the output of the error if one occurred in Line 75.

I hope I could help you save time again in your race against the clock and you found the explanations useful.

Yours sincerely,
Frank

Sources:
(1) How to escape Callback Hell
(2) Explanation of Node.js CLI Argument handling
(3) Explanation of Node.js CLI Argument handling II
(4) My own short example of a template for Node.js CLI Argument handling
(5) Sorting an array
(6) Sorting an array of objects by their property
(7) How to write to a file in Node.js
(8) How to avoid making mistakes with Promises
(9) Repository for the script
(10) Escape Callback Hell with Promises
(11) My article about how to convert (promisify) an async function with callback to a Promise based one

Commandline tools with Nodejs

admin

Hi everyone,

sometimes you need a small tool, but you have been working in Node.js for an extended period of time and don't want to
switch languages and lose time and momentum. You want to build it quickly, but correctly, so that you can reuse it in one way or another at a later point.
This is what today is about.

I will quickly show you how to build a template for command line arguments and how to handle them comfortably, so that you have this off your plate.

Here is the code for it:

'use strict'

const commandLineArgs = require('command-line-args')

// Commandline handling
const optionDefinitions = [
  { name: 'folder', alias: 'f', type: String }
]

const options = commandLineArgs(optionDefinitions)

// Add the path to your files
const folder = options.folder
// '/Users/$YourUsername/Downloads/customerdata';
console.log(`Given folder:${folder}`)

Explanation:

I use the npm package command-line-args to be able to handle command line arguments easily.
Line 3: First we import the command-line-args package.
Line 6: Then we define the options we want to be able to use. I chose an option folder with the type String.

Line 10: After we have defined them, we feed them to commandLineArgs. It parses the arguments for us and returns an object with the result.

Line 13: In that result we have properties with the names we defined in our options, and we can read them like we are used to.

If you save it as template.js, the following syntax is supported on the commandline:

node template.js --folder $YourFolder
node template.js --folder=$YourFolder
node template.js -f $YourFolder

As you can see, we have defined both a long and a short form of the parameter.
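If you want to try the three call variants without installing a package, the following sketch does something comparable with parseArgs from Node's built-in util module. Please note that this is a swapped-in alternative and not the command-line-args API, and that it assumes a reasonably recent Node.js version (parseArgs was added in 16.17 / 18.3). The function name readFolderOption and the check for a missing folder are my own additions:

```javascript
'use strict'

const { parseArgs } = require('node:util')

// Parse an argument list; by default the arguments after "node template.js"
function readFolderOption (args = process.argv.slice(2)) {
  const { values } = parseArgs({
    args,
    options: {
      // long form --folder, short form -f, value is a string
      folder: { type: 'string', short: 'f' }
    }
  })
  if (!values.folder) {
    throw new Error('Missing required option: --folder <path>')
  }
  return values.folder
}

// The three variants from above, passed in directly for the demo
console.log(readFolderOption(['--folder', '/tmp/customerdata']))
console.log(readFolderOption(['--folder=/tmp/customerdata']))
console.log(readFolderOption(['-f', '/tmp/customerdata']))
```

All three calls print /tmp/customerdata. Passing the arguments as a parameter keeps the function easy to test, in a real script you would simply call readFolderOption() without arguments.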

You can find the file on https://github.com/FrankTheDevop/cli-template too.
You can find the npm package on https://www.npmjs.com/package/command-line-args and its repo on https://github.com/75lb/command-line-args.

I hope I could help you save some time in research and trial & error with this short nugget.

Yours Sincerely,
Frank

Sources:
(1) https://flaviocopes.com/node-cli-args/
(2) https://code-maven.com/argv-raw-command-line-arguments-in-nodejs
(3) https://codeburst.io/need-for-promises-and-rookie-mistakes-to-avoid-when-using-promises-9cabba215e04

How to promisify with bluebird

Frank

Hey @everyone,

this will be a quick tip / reference. Sometimes I get asked about the syntax to promisify only one function with the bluebird Promise Library. I will show you an example to promisify the readdir method of the fs package:

'use strict'
const fs = require('fs')
const Promise = require('bluebird')

const readdirAsync = Promise.promisify(fs.readdir)

That's it already. Before, your code looked something like this:

fs.readdir(myPath, (err, files) => {
  // Handle the files
})

If you needed to do further asynchronous operations, you ended up in callback hell (1).

Promisifying it makes the syntax clearer and more elegant:

readdirAsync(myPath)
.then(files => {
  // Handle the files
})

You see, now you are in the promise chain, which makes elegant and readable code easier to achieve.
And you don't have to be extra careful about using Promises and callbacks at the same time.
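If you wonder what promisifying does under the hood, here is a simplified sketch. The function promisifyOne is a hypothetical, stripped-down re-implementation for illustration only (bluebird's version does more). For comparison the sketch also uses util.promisify, which is built into Node.js since version 8. The temporary directory and the file name are made up for the demo:

```javascript
'use strict'

const fs = require('fs')
const os = require('os')
const path = require('path')
const util = require('util')

// Wrap a function that expects a Node-style callback (err, result) as last argument
function promisifyOne (fn) {
  return (...args) => new Promise((resolve, reject) => {
    fn(...args, (err, result) => {
      if (err) return reject(err)
      resolve(result)
    })
  })
}

const readdirManual = promisifyOne(fs.readdir)
const readdirBuiltin = util.promisify(fs.readdir)

// Demo: both variants list the same fresh temporary directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'promisify-demo-'))
fs.writeFileSync(path.join(demoDir, 'a.txt'), 'hello')

const demo = Promise.all([readdirManual(demoDir), readdirBuiltin(demoDir)])
  .then(([manual, builtin]) => {
    console.log(manual, builtin) // [ 'a.txt' ] [ 'a.txt' ]
    return [manual, builtin]
  })
```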

Yours sincerely,
Frank

(1) https://blog.syntonic.io/2017/07/07/escaping-callback-hell-util-promisify/

Node.js Tooling I – Processmanager PM2

Frank

The purpose of this post is to help you get started with tools for Node.js in general and Loopback specifically, to ease your life as a developer and operator.

Prerequisite

You need a Node.js based API to follow along with this article (PM2 supports other languages too).

PM2 Processmanager

The PM2 Processmanager is a mighty one: it has many different options and various integrations and even supports online monitoring.

In this article I present the basic usage of it so you can get started fast. In a later article I will help you migrate to Docker and add the online monitoring of your processes.

Installation

The Installation is as easy as npm install pm2 -g. It is important that you install it globally.

How to use it

You can manually start an API by issuing pm2 start app.js, but then you have to specify all parameters on every start. That is fine for a quick test, but I recommend writing a small configuration file for it named process.yml.

Configuration Syntax

PM2 offers multiple syntax variants for the configuration file. Currently they support JavaScript, JSON and YAML format. I prefer to create this configuration in YAML, so I will present it to you in this syntax. For the others please have a look at Process Configuration File Syntax (1).

process.yml

Most often my configuration looks like this:

apps:
  - script: path_to_startup_script
    name: api_name
    exec_mode: cluster
    instances: 1
    env:
      DEBUG_FD: 1
      DEBUG_COLORS: 1

 
apps is the root element.

Each script entry refers to an app or API that should be started. Here you define the path to the file that starts your API.

You can either define one configuration file per app/API or you can define a whole stack you want to start with multiple entries.

Personally I use one configuration file per API in development and one configuration for a stack in the staging environment before I deploy to Docker.

The name defines the name you will see in the process manager when you list the running processes.

With exec_mode and the following instances things get interesting. If you define the mode fork you can only start up one instance of this app/API. But if you define cluster then you're able to scale the app/API with one single command (pm2 scale api_name 2, for example) and PM2 will load balance it for you!

instances defines how many concurrent instances of this app/API you want to launch at startup. I normally set this to 1 and adjust on the fly according to the needs.
This way I can already get a first idea of the need to scale.

With env you can specify environment variables you want to set.
DEBUG_FD: 1 changes the output stream to process.stdout. It is read by the debug module that Express and Loopback use for logging, not by Node.js itself.
DEBUG_COLORS: 1 will add colors to the PM2 log output. This is handy because you see at first glance whether a log message is an error or not.

These are not all the possible attributes for the configuration file. If you want tighter control, have a look at the Configuration File Attributes (2).

After this explanation you will find my configurations for an Express based API and a Loopback based API.

Express Example

apps:
  - script: bin/www
    name: api
    exec_mode: cluster
    instances: 1
    env:
      DEBUG_FD: 1
      DEBUG_COLORS: 1
      NODE_ENV: staging

Loopback Example

apps:
  - script: server/server.js
    name: loopback_api
    exec_mode: cluster
    instances: 1
    env:
      DEBUG_FD: 1
      DEBUG_COLORS: 1

Commands

After we defined the configuration we need to interact with the Processmanager to start and stop our APIs and check the log output.

Listing currently running Processes

To view all currently running Processes issue pm2 list at a console.

Start an API with a process.x

Starting your API with a configuration file is as easy as the command pm2 start process.x (x is config.js, json or yml).
After this command PM2 starts your API with the specified configuration and outputs its list of currently running processes.

Stopping an API

You can stop your API with pm2 stop process.x.
It is important to know that PM2 just stops your API then, but won't remove it from the list of processes that are prepared to run. If you want to remove it cleanly and make sure you have a clean slate on the next start, you have to destroy it.

Destroying a prepared to run API entry

To remove an API entry from the prepared to run list you issue pm2 delete process.x

Check the Logs

To check all logs without filtering to one API you issue pm2 logs.

If you want to filter the logs to one specific API you can add its name to the command like this: pm2 logs name_of_your_api

If you have any questions, post them in the comments. Or feel free to send me an email.

Yours sincerely,

Frank

Sources:
(1) http://pm2.keymetrics.io/docs/usage/application-declaration/
(2) http://pm2.keymetrics.io/docs/usage/application-declaration/#attributes-available