🚀 How Node.js Handles Millions of Requests: The Architecture Behind the Magic

Faizan Rasheed
Published on Feb 02, 2026 · 4 min read

Node.js isn’t just “fast”: it’s smart by design.
While many believe raw multithreading is the only path to scale, Node.js demonstrates that a well-architected, single-threaded event loop can sustain very high request rates by leveraging a suite of powerful, behind-the-scenes concepts. The exact numbers depend on the workload and hardware, and reaching millions of requests in practice means running many such processes side by side.

In this deep dive, we’ll explore exactly how Node.js achieves this — not through magic, but through intentional engineering.

🔗 Keep-Alive: Reusing Connections for Speed

When you open an HTTP connection, there’s overhead. Node.js uses HTTP Keep-Alive to reuse TCP connections across multiple requests, drastically reducing latency and system resource usage.

Example:
Without Keep-Alive, each HTTP request performs a TCP handshake:

[Client] ---- SYN ---> [Server]
[Client] <-- SYN+ACK -- [Server]
[Client] ---- ACK ---> [Server]
[Request/Response]
[FIN, ACK] (Connection closed)

With Keep-Alive:

const http = require('http');
const server = http.createServer((req, res) => {
    res.end('Hello, Keep-Alive!');
});

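server.keepAliveTimeout = 5000; // Keep an idle connection open for 5 seconds before closing it
server.headersTimeout = 60000;  // Allow up to 60 seconds to receive the complete request headers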
server.listen(3000);

Now, the same connection handles multiple requests — ideal for API servers and microservices.
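
Keep-Alive matters on the client side too, for example when one Node.js service calls another. The sketch below is an illustration rather than a benchmark: countConnections is a hypothetical helper (not part of Node.js) that starts a throwaway local server, sends a few sequential requests through a given http.Agent, and reports how many TCP connections the server accepted.

```javascript
const http = require('http');

// Hypothetical demo helper: sends `count` sequential GET requests through
// `agent` and resolves with the number of TCP connections the server saw.
function countConnections(agent, count) {
  return new Promise((resolve, reject) => {
    let connections = 0;
    const server = http.createServer((req, res) => res.end('ok'));
    server.on('connection', () => { connections += 1; });

    // Port 0 asks the OS for any free port.
    server.listen(0, '127.0.0.1', async () => {
      const { port } = server.address();
      try {
        for (let i = 0; i < count; i++) {
          await new Promise((done, fail) => {
            http.get({ host: '127.0.0.1', port, agent }, (res) => {
              res.resume();          // Drain the body so the socket can be released
              res.on('end', done);
            }).on('error', fail);
          });
        }
        resolve(connections);
      } catch (err) {
        reject(err);
      } finally {
        agent.destroy();             // Close any idle pooled sockets
        server.close();
      }
    });
  });
}

// Usage: compare an agent that reuses sockets with one that does not.
// countConnections(new http.Agent({ keepAlive: true }), 3).then(console.log);
// countConnections(new http.Agent({ keepAlive: false }), 3).then(console.log);
```

An agent created with keepAlive: true should report fewer connections than one created with keepAlive: false, because idle sockets go back into the agent's pool instead of being closed after each response.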

⏳ Backpressure: Preventing Memory Overload

In stream-based operations, if data arrives faster than it can be processed, memory can balloon. Node.js handles backpressure naturally using streams and the .pipe() method.

Example: Reading a large file and sending it over HTTP

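const fs = require('fs');
const { pipeline } = require('stream');
const server = require('http').createServer();

server.on('request', (req, res) => {
    const src = fs.createReadStream('./bigfile.csv');
    // pipeline() applies backpressure the same way .pipe() does, and also
    // destroys both streams if either fails (missing file, client disconnect)
    pipeline(src, res, (err) => {
        if (err) console.error('Stream failed:', err.message);
    });
});

server.listen(3000);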

If the network is slow, fs.ReadStream pauses until res is ready for more data — no buffering explosions.
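
It can help to see what that automatic handling looks like when done by hand. The following is a minimal sketch, assuming a deliberately slow destination: writeAll and createSlowSink are hypothetical names chosen for illustration. It relies on two documented stream behaviors: write() returns false once the internal buffer reaches highWaterMark, and 'drain' fires when it is safe to write again.

```javascript
const { Writable } = require('stream');
const { once } = require('events');

// Writes every chunk to `dest`, pausing whenever its internal buffer is full.
// Resolves with the number of times it had to wait for 'drain'.
async function writeAll(dest, chunks) {
  let waits = 0;
  for (const chunk of chunks) {
    // write() returns false once buffered data reaches highWaterMark.
    if (!dest.write(chunk)) {
      waits += 1;
      await once(dest, 'drain');
    }
  }
  dest.end();
  await once(dest, 'finish');
  return waits;
}

// Hypothetical slow consumer: a tiny 4-byte buffer and asynchronous writes.
function createSlowSink(received) {
  return new Writable({
    highWaterMark: 4,
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback); // Simulate a destination that takes time
    },
  });
}

const received = [];
writeAll(createSlowSink(received), ['aaaa', 'bbbb', 'cccc'])
  .then((waits) => console.log(`waited for drain ${waits} time(s)`, received));
```

The producer never gets more than one buffer ahead of the consumer, which is the same guarantee the built-in piping helpers provide.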

✈️ Single-Flight: Deduplicating Concurrent Work

Also known as request coalescing, this is an application-level pattern rather than something Node.js does for you: if multiple users request the same expensive operation simultaneously, your code runs it once and shares the result with every caller.

Example: Caching with single-flight

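const cache = new Map();

// `db` is assumed to be a database client whose query() returns a promise.
function fetchUserData(userId) {
    // Cache the promise, not the resolved value: callers that arrive while
    // the query is still running receive the same pending promise.
    if (cache.has(userId)) return cache.get(userId);

    console.log(`Fetching ${userId} from DB...`);
    const promise = db.query('SELECT * FROM users WHERE id = ?', [userId])
        .catch((err) => {
            cache.delete(userId); // Do not keep failed lookups in the cache
            throw err;
        });
    cache.set(userId, promise);
    return promise;
}

// Multiple concurrent calls only trigger ONE DB query
Promise.all([
    fetchUserData(1),
    fetchUserData(1),
    fetchUserData(1)
]).then(console.log);
// Logs: "Fetching 1 from DB..." only once.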

Libraries like async-cache-dedupe or dataloader (from the GraphQL ecosystem) formalize this pattern.

📦 Zero-Copy: Minimizing Memory Copies

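In network operations, copying data between buffers costs CPU and memory. Zero-copy techniques such as the sendfile() system call let the kernel move file data from disk to the network without intermediate user-space buffers. Node.js core does not expose sendfile() for HTTP responses, so the closest built-in option is streaming, which keeps memory bounded by moving the file in small chunks.

Example: Serving a file with streams (bounded memory, minimal copying)

const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
    const file = fs.createReadStream('./video.mp4');
    file.pipe(res); // Uses stream backpressure; each chunk still passes through user space
}).listen(3000);

When true kernel-level zero-copy matters, a common option is to put a reverse proxy such as nginx, which supports sendfile, in front of Node.js and let it serve static files.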

🧵 Workers / Cluster: Scaling Across CPU Cores

Though Node.js runs your JavaScript on a single thread, the cluster module forks multiple processes that share one listening port, and worker_threads allows CPU-intensive tasks to run in parallel on separate threads.

Example: Using cluster to utilize all CPU cores

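const cluster = require('cluster');
const os = require('os');

// cluster.isPrimary replaces the deprecated cluster.isMaster (Node.js 16+)
if (cluster.isPrimary) {
    const numCPUs = os.cpus().length;
    // Fork one worker process per CPU core
    for (let i = 0; i < numCPUs; i++) cluster.fork();
} else {
    require('./app'); // Your HTTP server
}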

Each fork handles its share of connections, so throughput can scale with the number of CPU cores.
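
For the worker_threads side, here is a minimal sketch of moving a CPU-bound loop off the main thread. The sumInWorker helper and the summing task are hypothetical stand-ins for real work. The worker source is inlined as a string and run with the eval option only to keep the example in a single file. A real project would normally point Worker at a separate module.

```javascript
const { Worker } = require('worker_threads');

// Source of the worker, inlined as a string so the sketch is a single file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.limit; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Hypothetical helper: sums 1..limit on a separate thread so the main
// event loop stays free to serve requests in the meantime.
function sumInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

sumInWorker(100).then((sum) => console.log(sum)); // prints 5050
```

Starting a thread has a cost, so this pays off for work that takes noticeably longer than the startup itself. For many small jobs, a reusable pool of workers is the usual next step.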

⚙️ Microtasks & Macrotasks: Event Loop Scheduling

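The Node.js event loop prioritizes tasks:

  • The process.nextTick queue and microtasks (Promise callbacks, queueMicrotask) run as soon as the current callback finishes, before the loop moves on. The nextTick queue is drained first, then the microtask queue.
  • Macrotasks (timers, I/O, setImmediate) run in loop phases.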

Example: Execution order

setTimeout(() => console.log('timer'), 0);
Promise.resolve().then(() => console.log('promise'));
process.nextTick(() => console.log('nextTick'));

// Output order (when run as a CommonJS script):
// nextTick
// promise
// timer

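Understanding this helps avoid starvation: a callback that keeps scheduling more nextTick or microtask work prevents the loop from ever reaching timers and I/O. It also helps you prioritize urgent callbacks.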
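
The same reasoning applies to long synchronous work. One way to keep the loop responsive, sketched below under the assumption that the work can be split into chunks, is to process a slice and then hand control back with setImmediate. The sumInChunks function is a hypothetical example, not a Node.js API.

```javascript
// Sums a large array in slices, yielding to the event loop between slices
// so that pending I/O callbacks and timers get a chance to run.
function sumInChunks(numbers, chunkSize = 1000) {
  return new Promise((resolve) => {
    let total = 0;
    let index = 0;

    function runChunk() {
      const end = Math.min(index + chunkSize, numbers.length);
      for (; index < end; index++) total += numbers[index];

      if (index < numbers.length) {
        setImmediate(runChunk); // Macrotask: lets other callbacks run first
      } else {
        resolve(total);
      }
    }

    runChunk();
  });
}

const numbers = Array.from({ length: 2500 }, (_, i) => i + 1);
sumInChunks(numbers).then((total) => console.log(total)); // prints 3126250
```

Using process.nextTick or a promise chain here would not help, because those queues are drained completely before the loop continues. setImmediate defers each slice to a later pass through the loop.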

✂️ AbortController: Canceling Unnecessary Operations

Available as a built-in global since Node.js 15, AbortController lets you cancel pending requests, avoiding wasted work.

Example: Timeout for a fetch request

// AbortController (Node.js 15+) and fetch (Node.js 18+) are built-in globals,
// so the abort-controller and node-fetch packages are no longer needed.
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5000);

fetch('https://slow.api/resource', { signal: controller.signal })
    .then(res => res.json())
    .finally(() => clearTimeout(timeout))
    .catch(err => {
        if (err.name === 'AbortError') console.log('Request timed out');
        else console.error('Request failed:', err); // Do not swallow other errors
    });

Great for user navigation interruptions or SLA-bound services.
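
Cancellation is not limited to HTTP calls. Several Node.js core APIs accept a signal option, including the promise-based timers. The sketch below assumes a hypothetical slowJob function in which a 60-second timer stands in for real work. It shows the signal both cancelling the wait and clearing the underlying timer.

```javascript
const { setTimeout: delay } = require('timers/promises');

// Hypothetical job: waits `ms` milliseconds (standing in for slow work)
// unless the signal is aborted first.
async function slowJob(ms, signal) {
  try {
    await delay(ms, undefined, { signal });
    return 'completed';
  } catch (err) {
    if (err.name === 'AbortError') return 'cancelled';
    throw err; // Anything else is a real failure
  }
}

const controller = new AbortController();
const job = slowJob(60_000, controller.signal);

controller.abort(); // Cancel right away; the 60-second timer is cleared
job.then((status) => console.log(status)); // prints "cancelled"
```

Passing one signal down through every layer of a request handler lets a single abort() stop all the work started on behalf of that request.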

🎯 The Single-Threaded Event Loop: Knowing When and How to Work

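Node.js’s secret weapon isn’t raw parallelism; it’s non-blocking I/O managed by an event loop that waits on many connections at once instead of blocking on any single one. Through libuv, network I/O is handed to the operating system’s event notification mechanisms (epoll, kqueue, IOCP), file system calls run on a small thread pool, and JavaScript only runs when there’s real work.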

This architecture reduces context-switching overhead, memory per connection, and synchronization complexity — making it ideal for:

  • High-traffic APIs
  • Real-time systems (WebSocket servers)
  • Scalable SaaS backends
  • Data-intensive streaming applications
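
To make the non-blocking model concrete, here is a rough sketch in which timers stand in for network calls. fakeRequest and runConcurrently are hypothetical names, and the timings vary by machine. The point is only that waiting on many operations costs about as much wall time as waiting on one.

```javascript
const { setTimeout: delay } = require('timers/promises');

// Hypothetical I/O call: resolves after `ms` without blocking the thread.
async function fakeRequest(id, ms) {
  await delay(ms);
  return id;
}

// Starts `count` simulated requests at once and measures total wall time.
async function runConcurrently(count, ms) {
  const started = Date.now();
  const ids = await Promise.all(
    Array.from({ length: count }, (_, id) => fakeRequest(id, ms))
  );
  return { ids, elapsedMs: Date.now() - started };
}

runConcurrently(50, 50).then(({ ids, elapsedMs }) => {
  // 50 waits of 50 ms each overlap, so this is far below 50 * 50 = 2500 ms.
  console.log(`${ids.length} requests finished in ${elapsedMs} ms`);
});
```

A thread-per-request design would need 50 threads to overlap these waits. Here a single thread does it, because no JavaScript runs while the operations are pending.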

💡 Conclusion: Architecture > Raw Threads

Node.js shows that with a well-designed concurrency model, you can achieve massive scale without traditional multithreading headaches. By combining Keep-Alive, backpressure, single-flight, zero-copy, clustering, and smart scheduling, a Node.js deployment can serve very high request volumes efficiently, with every process built on the same single-threaded foundation.

Master these concepts, and you’ll not only build faster applications — you’ll understand why they’re fast.
