socket.io failed connections when testing with Artillery

I'm setting up a node.js socket.io server and need to support around 10k users. This guide says it should be possible to get up to 55k concurrent connections.

I'm hosting on an Ubuntu 20.04 EC2 instance with 4 vCPUs and 16 GB RAM. The socket.io server code for node.js (reduced for clarity) is:

import * as express from "express";
import { createServer } from "http";
import * as fs from 'fs';
import * as https from 'https';
import * as path from 'path';
import { Server } from "socket.io";
import * as os from 'os';
import * as sticky from 'sticky-session'; // Required to make use of multiple processors

let httpServer;
const app = express();

const port = 443;

// Certificate
const privateKey = fs.readFileSync('MYPATH/privkey.pem', 'utf8');
const certificate = fs.readFileSync('MYPATH/cert.pem', 'utf8');
const ca = fs.readFileSync('MYPATH/chain.pem', 'utf8');

const credentials = {
  key: privateKey,
  cert: certificate,
  ca: ca
};

httpServer = https.createServer(credentials, app);

const io = new Server(httpServer, {
  cors: {
    origin: [
      MYORIGINS
    ]
  }
});


io.on('connection', (socket) => {
  // MY ENDPOINTS
});


if (!sticky.listen(httpServer, port)) {
  // Master code
  httpServer.once('listening', function() {
    console.log(`server started on port ${port}`);
  });
} else {
  // Worker code
}

I've followed the steps in the guide above doing:

sudo nano /etc/security/limits.d/custom.conf and adding the following lines:

* soft nofile 1048576
* hard nofile 1048576
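To check whether the raised limit actually reaches the running node.js process, something like the following could be used. It is only a sketch: the function names are my own, it assumes Linux (it reads /proc/self/limits), and it assumes the value is numeric rather than "unlimited". As far as I understand, limits.d entries are applied at login through PAM, so a process started another way (for example as a systemd service) may still run with the old limit.

```javascript
const fs = require("node:fs");

// Extract the soft and hard "Max open files" values from the text of a
// /proc/<pid>/limits file. Returns null if the line is not present.
function parseOpenFileLimits(limitsText) {
  const label = "Max open files";
  const line = limitsText.split("\n").find((l) => l.startsWith(label));
  if (!line) return null;
  const [soft, hard] = line.slice(label.length).trim().split(/\s+/);
  return { soft: Number(soft), hard: Number(hard) };
}

// Read the limits of the current node.js process (Linux only).
function currentOpenFileLimits() {
  return parseOpenFileLimits(fs.readFileSync("/proc/self/limits", "utf8"));
}
```

Calling currentOpenFileLimits() from inside the server at startup and logging the result would show whether the process really has 1048576 available.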

And sudo nano /etc/sysctl.d/net.ipv4.ip_local_port_range.conf adding:

net.ipv4.ip_local_port_range = 10000 65535
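As a sanity check on where a figure like 55k comes from, here is the arithmetic for that range as a small sketch (the function name is my own). My understanding is that this setting limits outgoing connections, so it matters on the machine that opens the connections (the one running Artillery) rather than on the server, which accepts everything on port 443.

```javascript
// Number of source ports available for outgoing connections, given an
// ip_local_port_range value such as "10000 65535".
function ephemeralPortCount(range) {
  const [low, high] = range.trim().split(/\s+/).map(Number);
  if (!Number.isInteger(low) || !Number.isInteger(high) || low > high) {
    throw new Error(`invalid port range: "${range}"`);
  }
  return high - low + 1; // both ends of the range are inclusive
}

console.log(ephemeralPortCount("10000 65535")); // 55536
```

So a single client IP connecting to a single server IP and port can hold at most about 55k connections at once, whatever the server is capable of.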

I've also read through and tried the settings from this article.

I am testing with Artillery with the following YAML config:

config:
  target: "MYURL"
  phases:
    - duration: 20
      arrivalRate: 250
  engines:
    socketio-v3:
      transports: ["websocket"]

scenarios:
  - name: My load testing
    engine: socketio-v3
    flow:
      - think: 20

I am getting really varied results. With exactly those settings I get 0 vusers.failed, which is good, but that only gives me 5000 users (250 new users per second for 20 seconds).

If I increase either the duration or the arrival rate to give a higher overall number of users, I start to get significant numbers of vusers.failed. For example, creating 20k users gives 9k failed, and creating 50k users gives 49k failed!
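For clarity, this is how I am working out the user counts from the config, written as a rough sketch. The function names are my own, and the peak figure assumes each user connects instantly and stays connected only for the length of the think step, which is a simplification.

```javascript
// Total virtual users created by constant-rate phases: each phase creates
// arrivalRate new users per second for duration seconds.
function totalVirtualUsers(phases) {
  return phases.reduce((sum, p) => sum + p.duration * p.arrivalRate, 0);
}

// Approximate peak number of users connected at the same time during one
// phase, assuming each user stays connected for sessionSeconds.
function peakConcurrentUsers(phase, sessionSeconds) {
  return phase.arrivalRate * Math.min(phase.duration, sessionSeconds);
}

const phase = { duration: 20, arrivalRate: 250 };
console.log(totalVirtualUsers([phase])); // 5000
console.log(peakConcurrentUsers(phase, 20)); // 5000
```

Under these assumptions, raising the duration increases the total number of users but not the number connected at once, while raising the arrival rate increases both.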

I'm not quite sure exactly what vusers.failed means (I couldn't find documentation).

I'm using sticky-session on advice from this article as I want all 4 CPU cores to be used. That seems to run 4x instances of the httpServer. I am not 100% sure if this is needed as I am never seeing CPU use go above 20% or so however hard I hit it with users.
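My understanding from the sticky-session README is that it balances connections by client IP address so that a client always reaches the same worker. The sketch below illustrates that idea only. It is not sticky-session's actual code, and the function name and hash are my own assumptions.

```javascript
// Pick a worker index for a client address so that the same address always
// maps to the same worker. Illustrative only; not sticky-session's real code.
function workerIndexFor(remoteAddress, workerCount) {
  let hash = 0;
  for (const ch of remoteAddress) {
    // Simple multiplicative string hash, kept as an unsigned 32-bit integer.
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash % workerCount;
}

// Every connection from one address lands on the same worker.
const first = workerIndexFor("203.0.113.7", 4);
const second = workerIndexFor("203.0.113.7", 4);
console.log(first === second); // true
```

If routing really works this way, a load test run from a single machine would send every connection to one worker, which might be related to the low CPU use I'm seeing.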

What does vusers.failed mean exactly?

Why am I getting vusers.failed errors like this?

What do I need to change?

Do I need the sticky sessions and is that what's required to use all 4 processor cores on the server?

Thanks very much!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
