Node.js Application Monitoring: A Comprehensive Implementation Guide

Farouk Ben. - Founder at Odown

Node.js has become a cornerstone technology for building high-performance, scalable web applications. However, its event-driven, non-blocking architecture introduces unique monitoring challenges that differ from traditional server environments. While understanding the business impact of website reliability provides the rationale for investing in monitoring, this guide focuses on the technical implementation specific to Node.js applications.

Key Performance Metrics for Node.js Applications

Effective Node.js monitoring requires tracking metrics that reflect its unique runtime characteristics. Unlike traditional thread-based servers, Node.js operates on a single-threaded event loop model with asynchronous I/O operations, necessitating specialized monitoring approaches.

Core Runtime Metrics

1. CPU Usage

Node.js applications are single-threaded for JavaScript execution but use a libuv-managed thread pool for operations such as file I/O, DNS lookups, and some crypto work. Monitoring CPU usage helps identify processing bottlenecks:

javascript

// Basic CPU usage monitoring implementation
const os = require('os');

function getCpuUsage() {
  const cpus = os.cpus();
  let totalIdle = 0;
  let totalTick = 0;
  for (const cpu of cpus) {
    for (const type in cpu.times) {
      totalTick += cpu.times[type];
    }
    totalIdle += cpu.times.idle;
  }
  return {
    idle: totalIdle / cpus.length,
    total: totalTick / cpus.length,
    usage: 100 - (totalIdle / totalTick * 100)
  };
}

// Track CPU usage over time
let lastCpuUsage = getCpuUsage();
setInterval(() => {
  const currentCpuUsage = getCpuUsage();
  const idleDiff = currentCpuUsage.idle - lastCpuUsage.idle;
  const totalDiff = currentCpuUsage.total - lastCpuUsage.total;
  const usagePercentage = 100 - (idleDiff / totalDiff * 100);

  console.log(`CPU Usage: ${usagePercentage.toFixed(2)}%`);

  // Alert on high CPU usage
  if (usagePercentage > 85) {
    notifyHighCpuUsage(usagePercentage);
  }
  lastCpuUsage = currentCpuUsage;
}, 5000);

Recommended Thresholds:

  • Warning: >70% sustained CPU usage
  • Critical: >85% sustained CPU usage
  • Alert trigger: >80% for 3+ consecutive measurement intervals (see the sketch below)
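
The sampling loop above alerts on any single reading over 85%. To implement the "3+ consecutive intervals" rule, you can track a streak counter between samples. A minimal sketch, assuming the threshold constants and the notifyHighCpuUsage hook are placeholders you adapt to your own alerting setup:

javascript

// Sketch: only alert after N consecutive intervals above the threshold
const CPU_ALERT_THRESHOLD = 80;  // percent
const CPU_ALERT_INTERVALS = 3;   // consecutive measurements
let highCpuIntervals = 0;

function checkSustainedCpuUsage(usagePercentage) {
  if (usagePercentage > CPU_ALERT_THRESHOLD) {
    highCpuIntervals += 1;
    if (highCpuIntervals >= CPU_ALERT_INTERVALS) {
      // Placeholder alerting hook - wire this to your alerting module
      notifyHighCpuUsage(usagePercentage);
      highCpuIntervals = 0; // Avoid re-alerting on every subsequent interval
    }
  } else {
    highCpuIntervals = 0; // Reset the streak when usage drops back down
  }
}

// Call checkSustainedCpuUsage(usagePercentage) from the setInterval loop above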

2. Memory Consumption

Node.js applications have a default heap size limit (configurable with the --max-old-space-size flag), and exceeding it crashes the process with an out-of-memory error. Track both total memory usage and the breakdown of different memory types:

javascript

// Memory usage monitoring
function getMemoryUsage() {
  const memoryUsage = process.memoryUsage();
  return {
    rss: memoryUsage.rss / 1024 / 1024, // Resident Set Size in MB
    heapTotal: memoryUsage.heapTotal / 1024 / 1024, // Total size of the allocated heap
    heapUsed: memoryUsage.heapUsed / 1024 / 1024, // Actual memory used during execution
    external: memoryUsage.external / 1024 / 1024, // Memory used by C++ objects bound to JavaScript
    arrayBuffers: memoryUsage.arrayBuffers / 1024 / 1024 // ArrayBuffers and SharedArrayBuffers
  };
}

// Rolling window of recent heap measurements used for leak detection
const memoryHistoryArray = [];

setInterval(() => {
  const memory = getMemoryUsage();
  console.log(`Memory RSS: ${memory.rss.toFixed(2)}MB | Heap Used: ${memory.heapUsed.toFixed(2)}MB / ${memory.heapTotal.toFixed(2)}MB`);

  // Check for potential memory leaks based on heap growth pattern
  memoryHistoryArray.push(memory.heapUsed);
  if (memoryHistoryArray.length > 10) {
    memoryHistoryArray.shift();
    const isConstantlyGrowing = memoryHistoryArray.every((value, index, array) =>
      index === 0 || value >= array[index - 1] * 1.01 // 1% growth threshold
    );
    if (isConstantlyGrowing && memory.heapUsed > 500) { // 500MB threshold
      notifyPotentialMemoryLeak(memory, memoryHistoryArray);
    }
  }
}, 30000);

Recommended Thresholds:

  • Warning: >70% of max old space size
  • Critical: >85% of max old space size (the sketch below shows how to read this limit at runtime)
  • Heap growth pattern: Alert on consistent upward trend without garbage collection drops
  • RSS growth: Alert when exceeding 3x the initial RSS after application warmup
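
The "% of max old space size" thresholds require knowing the actual heap limit, which varies with the Node.js version and the --max-old-space-size flag. One way to read it at runtime is sketched below with the built-in v8 module; the 70%/85% values mirror the thresholds above and the console output is a placeholder for your alerting hook:

javascript

// Sketch: compute heap utilization against the real heap limit
const v8 = require('v8');

function getHeapUtilization() {
  const { heap_size_limit } = v8.getHeapStatistics(); // Max heap size in bytes
  const { heapUsed } = process.memoryUsage();
  return (heapUsed / heap_size_limit) * 100;
}

setInterval(() => {
  const utilization = getHeapUtilization();
  if (utilization > 85) {
    console.error(`CRITICAL: heap at ${utilization.toFixed(2)}% of max old space size`);
  } else if (utilization > 70) {
    console.warn(`WARNING: heap at ${utilization.toFixed(2)}% of max old space size`);
  }
}, 30000);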

3. Event Loop Lag Monitoring

The event loop is the heart of a Node.js application. Monitoring its lag helps identify when the application is becoming unresponsive:

javascript

// Event loop lag monitoring
function monitorEventLoopLag() {
  let lastCheck = Date.now();
  setInterval(() => {
    const now = Date.now();
    const lag = now - lastCheck - 100; // We expect ~100ms between checks
    console.log(`Event Loop Lag: ${lag}ms`);
    if (lag > 200) { // More than 200ms of lag
      notifyEventLoopLag(lag);
    }
    lastCheck = now;
  }, 100);
}
monitorEventLoopLag();

For more accurate monitoring, consider using specialized libraries like loopbench or toobusy-js:

javascript

// Node.js event loop monitoring with toobusy-js
const toobusy = require('toobusy-js');

// Set maximum lag to 100ms
toobusy.maxLag(100);

// Express middleware to track event loop lag and respond with 503 when overloaded
app.use((req, res, next) => {
  if (toobusy()) {
    // Record the overload incident
    recordEventLoopOverload(toobusy.lag());
    // Respond with 503 Service Unavailable
    res.status(503).send("Server is too busy right now. Please try again later.");
    return;
  }
  next();
});

// Periodically log event loop lag
setInterval(() => {
  console.log(`Current event loop lag: ${toobusy.lag()}ms`);
}, 1000);
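
If you would rather avoid an extra dependency, Node.js 12+ ships an event loop delay histogram in perf_hooks. A minimal sketch using monitorEventLoopDelay follows; the 20ms resolution, the reporting interval, and the notifyEventLoopLag hook are assumptions to adapt:

javascript

// Built-in event loop delay monitoring via perf_hooks
const { monitorEventLoopDelay } = require('perf_hooks');

const histogram = monitorEventLoopDelay({ resolution: 20 }); // Sample every 20ms
histogram.enable();

setInterval(() => {
  // Histogram values are reported in nanoseconds
  const meanMs = histogram.mean / 1e6;
  const p99Ms = histogram.percentile(99) / 1e6;
  console.log(`Event loop delay - mean: ${meanMs.toFixed(2)}ms, p99: ${p99Ms.toFixed(2)}ms`);

  if (p99Ms > 200) {
    notifyEventLoopLag(p99Ms); // Placeholder alerting hook
  }
  histogram.reset(); // Start a fresh window for the next interval
}, 10000);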

Recommended Thresholds:

  • Warning: >100ms event loop lag
  • Critical: >200ms event loop lag
  • Request rejection threshold: Typically 500-1000ms, depending on application type

Application-Level Metrics

1. HTTP Request Metrics

For web applications, tracking request metrics provides insights into performance and usage patterns:

javascript

// Express middleware for request monitoring
const requestMonitoring = (req, res, next) => {
  // Track request start time
  const startTime = process.hrtime();
  // Track response size
  let responseSize = 0;
  const originalWrite = res.write;
  const originalEnd = res.end;

  res.write = function (chunk) {
    if (chunk) {
      responseSize += chunk.length;
    }
    return originalWrite.apply(res, arguments);
  };

  res.end = function (chunk) {
    if (chunk) {
      responseSize += chunk.length;
    }
    // Calculate duration
    const duration = getDurationInMs(startTime);
    // Log request details
    const requestLog = {
      method: req.method,
      url: req.url,
      statusCode: res.statusCode,
      duration: duration,
      responseSize: responseSize,
      userAgent: req.get('User-Agent'),
      timestamp: new Date().toISOString()
    };
    // Record request metrics
    recordRequestMetrics(requestLog);
    // Track slow requests
    if (duration > 1000) {
      notifySlowRequest(requestLog);
    }
    return originalEnd.apply(res, arguments);
  };

  next();
};

// Helper function to calculate duration in milliseconds
function getDurationInMs(startTime) {
  const diff = process.hrtime(startTime);
  return (diff[0] * 1e3) + (diff[1] * 1e-6);
}

// Apply middleware to Express app
app.use(requestMonitoring);

Recommended Thresholds:

  • Average response time: Alert on 50% increase from baseline
  • Slow requests: >1000ms for standard APIs, >3000ms for complex operations
  • Error rate: >1% of total requests (see the rolling-window sketch after this list)
  • Status code anomalies: Sudden increase in non-200 responses
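
The response-time and error-rate thresholds above assume you aggregate the per-request logs into a rolling window. The sketch below shows one possible in-memory implementation of the recordRequestMetrics placeholder used in the middleware; the window size, the naive first-window baseline, and the console alerts are assumptions, and in production you would push these metrics to a time-series store instead:

javascript

// Sketch: rolling error-rate and latency tracking over a fixed window
const requestWindow = []; // { timestamp, duration, isError }
const WINDOW_MS = 5 * 60 * 1000; // 5-minute window
let baselineAvgDuration = null;  // Captured after the first full window

function recordRequestMetrics(requestLog) {
  requestWindow.push({
    timestamp: Date.now(),
    duration: requestLog.duration,
    isError: requestLog.statusCode >= 500
  });
}

setInterval(() => {
  const cutoff = Date.now() - WINDOW_MS;
  // Drop entries that have fallen out of the window
  while (requestWindow.length && requestWindow[0].timestamp < cutoff) {
    requestWindow.shift();
  }
  if (requestWindow.length === 0) return;

  const errorCount = requestWindow.filter(r => r.isError).length;
  const errorRate = errorCount / requestWindow.length;
  const avgDuration = requestWindow.reduce((sum, r) => sum + r.duration, 0) / requestWindow.length;

  if (baselineAvgDuration === null) {
    baselineAvgDuration = avgDuration; // Naive baseline: first full window after warmup
  }

  if (errorRate > 0.01) {
    console.error(`Error rate ${(errorRate * 100).toFixed(2)}% exceeds 1% threshold`);
  }
  if (avgDuration > baselineAvgDuration * 1.5) {
    console.warn(`Average response time ${avgDuration.toFixed(0)}ms is 50% above baseline`);
  }
}, 60 * 1000);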

2. Database Connection Metrics

Most Node.js applications rely heavily on databases, making connection monitoring critical:

javascript

// MongoDB connection pool monitoring example
const mongoose = require('mongoose');

// Monitor connection events
mongoose.connection.on('connected', () => {
  console.log('MongoDB connected');
  startMongoConnectionMonitoring();
});
mongoose.connection.on('error', (err) => {
  console.error('MongoDB connection error:', err);
  notifyDatabaseConnectionError(err);
});
mongoose.connection.on('disconnected', () => {
  console.log('MongoDB disconnected');
  notifyDatabaseDisconnection();
});

// Monitor connection pool stats
function startMongoConnectionMonitoring() {
  setInterval(async () => {
    try {
      const adminDb = mongoose.connection.db.admin();
      const serverStatus = await adminDb.serverStatus();
      const connectionStats = {
        current: serverStatus.connections.current,
        available: serverStatus.connections.available,
        totalCreated: serverStatus.connections.totalCreated,
        utilization: serverStatus.connections.current /
          (serverStatus.connections.current + serverStatus.connections.available) * 100
      };
      console.log(`MongoDB Connections: ${connectionStats.current} active, ${connectionStats.available} available (${connectionStats.utilization.toFixed(2)}% utilization)`);
      // Alert on high connection utilization
      if (connectionStats.utilization > 85) {
        notifyHighConnectionUtilization(connectionStats);
      }
    } catch (error) {
      console.error('Error monitoring MongoDB connections:', error);
    }
  }, 30000);
}

Recommended Thresholds:

  • Connection pool utilization: >85%
  • Connection errors: >0 over 5-minute period
  • Connection reset frequency: >3 reconnects per hour (see the sketch after this list)
  • Query timeout rate: >0.1% of total queries
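
The reconnect-frequency threshold can be tracked directly from the mongoose connection events shown above. A small sketch that counts reconnects within a sliding one-hour window; the notifyFrequentReconnects hook is a placeholder for your alerting module:

javascript

// Sketch: track reconnect frequency over a sliding one-hour window
const reconnectTimestamps = [];
const RECONNECT_WINDOW_MS = 60 * 60 * 1000; // 1 hour
const RECONNECT_THRESHOLD = 3;

mongoose.connection.on('reconnected', () => {
  const now = Date.now();
  reconnectTimestamps.push(now);

  // Keep only reconnects that happened within the last hour
  while (reconnectTimestamps.length && reconnectTimestamps[0] < now - RECONNECT_WINDOW_MS) {
    reconnectTimestamps.shift();
  }

  if (reconnectTimestamps.length > RECONNECT_THRESHOLD) {
    notifyFrequentReconnects(reconnectTimestamps.length); // Placeholder alerting hook
  }
});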

3. External Service Dependencies

Track performance and availability of external services your application depends on:

javascript

// HTTP client instrumentation example
const axios = require('axios');
const originalRequest = axios.request;

// Wrap axios requests with monitoring
axios.request = function monitoredRequest(config) {
  const startTime = process.hrtime();
  const service = extractServiceName(config.url);

  return originalRequest.call(this, config)
    .then(response => {
      const duration = getDurationInMs(startTime);
      // Record successful dependency call
      recordDependencyCall({
        service,
        url: config.url,
        method: config.method,
        statusCode: response.status,
        duration,
        successful: true
      });
      // Alert on slow dependencies
      if (duration > 1000) {
        notifySlowDependency(service, config.url, duration);
      }
      return response;
    })
    .catch(error => {
      const duration = getDurationInMs(startTime);
      // Record failed dependency call
      recordDependencyCall({
        service,
        url: config.url,
        method: config.method,
        statusCode: error.response ? error.response.status : 0,
        duration,
        successful: false,
        errorMessage: error.message
      });
      // Alert on dependency failures
      notifyDependencyFailure(service, config.url, error);
      throw error;
    });
};

// Extract service name from URL
function extractServiceName(url) {
  try {
    const parsedUrl = new URL(url);
    return parsedUrl.hostname;
  } catch (e) {
    return 'unknown';
  }
}

Recommended Thresholds:

  • Response time: >1000ms average
  • Error rate: >1% of requests to a specific service
  • Availability: <99.5% over 5-minute window
  • Circuit breaker trigger: 5 consecutive failures (a minimal breaker sketch follows this list)
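
To act on the circuit breaker threshold, you can wrap outbound calls in a small state machine that opens after five consecutive failures and retries after a cooldown. A minimal sketch; the cooldown period and function names are assumptions, and libraries such as opossum provide a production-ready alternative:

javascript

// Sketch: minimal circuit breaker around an async dependency call
function createCircuitBreaker(fn, { failureThreshold = 5, cooldownMs = 30000 } = {}) {
  let consecutiveFailures = 0;
  let openedAt = null;

  return async function callWithBreaker(...args) {
    // While open, fail fast until the cooldown has elapsed
    if (openedAt !== null && Date.now() - openedAt < cooldownMs) {
      throw new Error('Circuit breaker is open');
    }
    try {
      const result = await fn(...args);
      consecutiveFailures = 0; // Success closes the breaker
      openedAt = null;
      return result;
    } catch (error) {
      consecutiveFailures += 1;
      if (consecutiveFailures >= failureThreshold) {
        openedAt = Date.now(); // Open the breaker
      }
      throw error;
    }
  };
}

// Usage: wrap a dependency call (URL is a placeholder)
// const getUser = createCircuitBreaker((id) => axios.get(`https://users.example.com/${id}`));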

Memory Leak Detection Strategies

Memory leaks are among the most common issues in long-running Node.js applications. Implement these strategies to detect and address them:

1. Heap Snapshot Analysis

Use the heapdump module (or the built-in v8.writeHeapSnapshot() on Node.js 12+) to capture heap snapshots for analysis:

javascript

// Heap snapshot management
const heapdump = require('heapdump');
const fs = require('fs');

// Enable heap snapshot creation on signal
process.on('SIGUSR2', () => {
  const filename = `${process.cwd()}/heapdump-${Date.now()}.heapsnapshot`;
  heapdump.writeSnapshot(filename, (err) => {
    if (err) console.error('Failed to create heapdump:', err);
    else console.log(`Heap snapshot written to ${filename}`);
  });
});

// Automatically create snapshots on threshold breach
let lastHeapUsed = 0;
const heapGrowthThreshold = 100; // MB

function checkHeapGrowth() {
  const memoryUsage = process.memoryUsage();
  const heapUsedMB = memoryUsage.heapUsed / 1024 / 1024;
  if (lastHeapUsed > 0 && heapUsedMB - lastHeapUsed > heapGrowthThreshold) {
    console.warn(`Significant heap growth detected: ${heapUsedMB.toFixed(2)}MB (increased by ${(heapUsedMB - lastHeapUsed).toFixed(2)}MB)`);
    const filename = `${process.cwd()}/heapdump-growth-${Date.now()}.heapsnapshot`;
    heapdump.writeSnapshot(filename, (err) => {
      if (err) console.error('Failed to create automatic heapdump:', err);
      else {
        console.log(`Growth-triggered heap snapshot written to ${filename}`);
        notifyAutomaticHeapdump(filename, heapUsedMB, lastHeapUsed);
      }
    });
  }

  lastHeapUsed = heapUsedMB;
}

// Check heap every 15 minutes in production environments
if (process.env.NODE_ENV === 'production') {
  setInterval(checkHeapGrowth, 15 * 60 * 1000);
}

2. Garbage Collection Metrics

Track garbage collection frequency and duration to identify potential issues:

javascript

// Tracking garbage collection with gc-stats
const gcStats = require('gc-stats')();

let gcMetrics = {
  totalTime: 0,
  count: 0,
  scavengeCount: 0,
  markSweepCount: 0,
  incrementalMarkingCount: 0,
  weakCallbackCount: 0
};

gcStats.on('stats', (stats) => {
  // gc-stats reports pause times in nanoseconds; convert to milliseconds
  const pauseMs = stats.pause / 1e6;

  // Update metrics
  gcMetrics.totalTime += pauseMs;
  gcMetrics.count += 1;

  // Update specific GC type counts
  switch (stats.gctype) {
    case 1: // Scavenge (Minor GC)
      gcMetrics.scavengeCount += 1;
      break;
    case 2: // Mark/Sweep/Compact (Major GC)
      gcMetrics.markSweepCount += 1;
      break;
    case 4: // Incremental Marking
      gcMetrics.incrementalMarkingCount += 1;
      break;
    case 8: // Weak/Phantom callback processing
      gcMetrics.weakCallbackCount += 1;
      break;
  }

  // Log GC activity
  console.log(`GC: type=${stats.gctype}, pause=${pauseMs.toFixed(2)}ms, heapBefore=${(stats.before.totalHeapSize / 1024 / 1024).toFixed(2)}MB, heapAfter=${(stats.after.totalHeapSize / 1024 / 1024).toFixed(2)}MB`);

  // Alert on concerning GC patterns
  if (pauseMs > 200) {
    notifyLongGCPause(stats);
  }
});

// Report GC metrics periodically
setInterval(() => {
  const gcPercentage = gcMetrics.totalTime / (5 * 60 * 1000) * 100;
  console.log(`GC Summary: ${gcMetrics.count} collections in last 5 minutes, total time: ${gcMetrics.totalTime.toFixed(2)}ms (${gcPercentage.toFixed(2)}% of time)`);

  // Alert if garbage collection is taking too much time
  if (gcPercentage > 10) {
    notifyExcessiveGC(gcMetrics);
  }

  // Reset metrics
  gcMetrics = {
    totalTime: 0,
    count: 0,
    scavengeCount: 0,
    markSweepCount: 0,
    incrementalMarkingCount: 0,
    weakCallbackCount: 0
  };
}, 5 * 60 * 1000);

3. Memory Growth Pattern Analysis

Implement trend analysis to detect consistent memory growth patterns:

javascript

// Memory trend analysis
const memoryHistory = {
  timestamps: [],
  measurements: [],
  maxSize: 60 // Store an hour of data at 1-minute intervals
};

function recordMemoryUsage() {
  const memoryUsage = process.memoryUsage();
  const heapUsedMB = memoryUsage.heapUsed / 1024 / 1024;
  memoryHistory.timestamps.push(Date.now());
  memoryHistory.measurements.push(heapUsedMB);

  // Keep history within size limit
  if (memoryHistory.timestamps.length > memoryHistory.maxSize) {
    memoryHistory.timestamps.shift();
    memoryHistory.measurements.shift();
  }

  // Only analyze after collecting enough data points
  if (memoryHistory.measurements.length >= 10) {
    analyzeMemoryTrend();
  }
}

function analyzeMemoryTrend() {
  // Calculate linear regression to detect growth pattern
  const n = memoryHistory.measurements.length;
  let sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  for (let i = 0; i < n; i++) {
    sumX += i;
    sumY += memoryHistory.measurements[i];
    sumXY += i * memoryHistory.measurements[i];
    sumXX += i * i;
  }
  const slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  const growthRatePerHour = slope * 60; // Slope is MB/minute at 1-minute sampling; convert to MB/hour

  // Detect consistent growth patterns that indicate potential leaks
  if (growthRatePerHour > 10) { // 10MB/hour growth threshold
    notifyMemoryGrowthPattern(growthRatePerHour, memoryHistory);
  }
}

// Record memory usage every minute
setInterval(recordMemoryUsage, 60 * 1000);

Setting Up Effective Node.js Monitoring with Odown

Implementing a comprehensive Node.js monitoring solution with Odown involves integrating several components to track both external availability and internal health metrics.

Application Health Check Endpoint

Start by implementing a health check endpoint that provides internal status information:

javascript

// health.js - Express health check route
const os = require('os');
const router = require('express').Router();

// Basic health status
router.get('/', (req, res) => {
  res.status(200).json({
    status: 'UP',
    timestamp: new Date()
  });
});

// Detailed health metrics
router.get('/details', (req, res) => {
  const memoryUsage = process.memoryUsage();
  res.status(200).json({
    status: 'UP',
    timestamp: new Date(),
    uptime: process.uptime(),
    memory: {
      rss: memoryUsage.rss / 1024 / 1024,
      heapTotal: memoryUsage.heapTotal / 1024 / 1024,
      heapUsed: memoryUsage.heapUsed / 1024 / 1024,
      external: memoryUsage.external / 1024 / 1024,
      memoryUtilization: memoryUsage.heapUsed / memoryUsage.heapTotal
    },
    cpu: {
      count: os.cpus().length,
      load: os.loadavg()
    },
    system: {
      platform: process.platform,
      arch: process.arch,
      nodeVersion: process.version
    }
  });
});

// Dependency health checks (checkDatabaseConnection, checkRedisConnection, and
// checkExternalAPI are application-specific probes you implement yourself)
router.get('/dependencies', async (req, res) => {
  try {
    const dependencyChecks = await Promise.allSettled([
      checkDatabaseConnection(),
      checkRedisConnection(),
      checkExternalAPI()
    ]);
    const dependencies = {
      database: dependencyChecks[0].status === 'fulfilled' ? dependencyChecks[0].value : { status: 'DOWN', error: dependencyChecks[0].reason },
      cache: dependencyChecks[1].status === 'fulfilled' ? dependencyChecks[1].value : { status: 'DOWN', error: dependencyChecks[1].reason },
      externalApi: dependencyChecks[2].status === 'fulfilled' ? dependencyChecks[2].value : { status: 'DOWN', error: dependencyChecks[2].reason }
    };

    // Overall status is UP only if all critical dependencies are UP
    const status = dependencies.database.status === 'UP' ? 'UP' : 'DOWN';
    res.status(status === 'UP' ? 200 : 503).json({
      status,
      timestamp: new Date(),
      dependencies
    });
  } catch (error) {
    res.status(500).json({
      status: 'ERROR',
      error: error.message
    });
  }
});

// Export the router so it can be mounted with app.use('/health', require('./health'))
module.exports = router;

Odown HTTP Check Configuration

Configure Odown to monitor your application's health endpoints:

javascript

// Example Odown monitor configuration
const nodejsMonitor = {
  name: "Node.js Application Monitoring",
  type: "http",
  url: "https://api.example.com/health",
  method: "GET",
  checkFrequency: 60, // Check every 60 seconds
  locations: ["us-east", "eu-west", "asia-east"],
  alertThreshold: 2, // Alert after two consecutive failures
  assertions: [
    { type: "statusCode", comparison: "equals", value: 200 },
    { type: "responseTime", comparison: "lessThan", value: 500 },
    { type: "jsonBody", path: "$.status", comparison: "equals", value: "UP" }
  ]
};

// Dependency health check
const dependencyMonitor = {
  name: "Node.js Dependencies Health",
  type: "http",
  url: "https://api.example.com/health/dependencies",
  method: "GET",
  checkFrequency: 120, // Check every 2 minutes
  locations: ["us-east"],
  assertions: [
    { type: "statusCode", comparison: "equals", value: 200 },
    { type: "jsonBody", path: "$.dependencies.database.status", comparison: "equals", value: "UP" },
    { type: "jsonBody", path: "$.dependencies.cache.status", comparison: "equals", value: "UP" }
  ]
};

Microservice Dependency Tracking

For applications with microservice architectures, implement dependency tracking between services:

javascript

// microservice-tracking.js
const opentracing = require('opentracing');
const axios = require('axios');

// Initialize tracer (using a specific implementation like Jaeger)
const tracer = initTracer('user-service');

// Track outgoing HTTP requests
function instrumentAxios() {
  const originalRequest = axios.request;
  axios.request = function tracedRequest(config) {
    const span = tracer.startSpan('http_request');

    // Add span context to headers
    const tracingHeaders = {};
    tracer.inject(span.context(), opentracing.FORMAT_HTTP_HEADERS, tracingHeaders);
    config.headers = {
      ...config.headers,
      ...tracingHeaders
    };

    // Add request details to span
    span.setTag('http.url', config.url);
    span.setTag('http.method', config.method);
    span.setTag('service.name', extractServiceName(config.url));

    const startTime = Date.now();
    return originalRequest.call(this, config)
      .then(response => {
        const duration = Date.now() - startTime;
        span.setTag('http.status_code', response.status);
        span.setTag('response.time', duration);
        span.finish();
        return response;
      })
      .catch(error => {
        const duration = Date.now() - startTime;
        span.setTag('http.status_code', error.response ? error.response.status : 0);
        span.setTag('response.time', duration);
        span.setTag('error', true);
        span.setTag('error.message', error.message);
        span.finish();

        throw error;
      });
  };
}

// Express middleware to extract and create spans
function traceMiddleware(req, res, next) {
  let span;

  // Try to extract parent span context from request headers
  const parentSpanContext = tracer.extract(opentracing.FORMAT_HTTP_HEADERS, req.headers);
  if (parentSpanContext) {
    span = tracer.startSpan('http_server', { childOf: parentSpanContext });
  } else {
    span = tracer.startSpan('http_server');
  }

  // Add request details to span
  span.setTag('http.url', req.url);
  span.setTag('http.method', req.method);
  span.setTag('service.name', 'user-service');

  // Store span in request for later use
  req.span = span;

  // Finish span on response completion
  let finished = false;
  const finishSpan = () => {
    if (finished) return; // Guard against finishing the span twice ('finish' then 'close')
    finished = true;
    span.setTag('http.status_code', res.statusCode);
    span.finish();
  };
  res.on('finish', finishSpan);
  res.on('close', finishSpan);

  next();
}

// Apply middleware to Express app
app.use(traceMiddleware);

This distributed tracing implementation allows you to visualize service dependencies and track performance across service boundaries.
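
The initTracer() call in the example above is left abstract. One possible way to wire it up is sketched below using the jaeger-client package; the sampler and reporter settings are assumptions to adapt, and newer projects may prefer the OpenTelemetry SDK instead:

javascript

// Sketch: initializing an OpenTracing-compatible Jaeger tracer
const { initTracer: initJaegerTracer } = require('jaeger-client');

function initTracer(serviceName) {
  const config = {
    serviceName,
    sampler: { type: 'const', param: 1 }, // Sample every request; tune for production
    reporter: { logSpans: false, agentHost: 'localhost', agentPort: 6832 }
  };
  return initJaegerTracer(config, {});
}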

Troubleshooting Common Node.js Performance Issues

Event Loop Blocking Detection

Identify operations that block the event loop and cause application unresponsiveness:

javascript

// event-loop-monitor.js
const blocked = require('blocked');

// Monitor event loop blocking
blocked((ms) => {
  console.warn(`Event loop blocked for ${ms}ms`);

  if (ms < 100) {
    // Minor blocking, log and record metrics
    recordEventLoopBlocking('minor', ms);
  } else if (ms < 500) {
    // Moderate blocking, alert for investigation
    recordEventLoopBlocking('moderate', ms);
    notifyEventLoopBlocking(ms);
  } else {
    // Severe blocking, high-priority alert
    recordEventLoopBlocking('severe', ms);
    notifyEventLoopBlockingEmergency(ms);
    // Capture diagnostic information
    captureDiagnostics();
  }
}, { threshold: 50 }); // Detect blocks over 50ms

// Capture diagnostic information during severe event loop blocks
function captureDiagnostics() {
  // Record CPU profile for 5 seconds
  const profiler = require('v8-profiler-node8');
  const profileName = `cpu-profile-${Date.now()}`;
  profiler.startProfiling(profileName);
  setTimeout(() => {
    const profile = profiler.stopProfiling(profileName);
    // Save profile to disk
    const fs = require('fs');
    fs.writeFileSync(`${profileName}.cpuprofile`, JSON.stringify(profile));
    // Cleanup
    profile.delete();
    // Notify about profile creation
    notifyDiagnosticsCreated(profileName);
  }, 5000);
}

Memory Leak Identification

Implement more advanced memory leak detection beyond basic growth tracking:

javascript

// memory-leak-detectors.js
const memwatch = require('@airbnb/node-memwatch');
const heapdump = require('heapdump');

// Listen for memory leak events
memwatch.on('leak', (info) => {
  console.warn('Memory leak detected:', info);

  // Generate heap snapshot
  const filename = `${process.cwd()}/heapdump-leak-${Date.now()}.heapsnapshot`;
  heapdump.writeSnapshot(filename, (err) => {
    if (err) {
      console.error('Failed to create leak heapdump:', err);
    } else {
      console.log(`Leak-triggered heap snapshot written to ${filename}`);
      // Notify about leak detection
      notifyMemoryLeak(info, filename);
    }
  });
});

// Track memory stats between garbage collections
memwatch.on('stats', (stats) => {
  console.log('Memory stats:', stats);
  // Alert on concerning memory patterns
  if (stats.current_base > stats.estimated_base * 1.5) {
    notifyMemoryBaseDrift(stats);
  }
});

// Leak detection using object growth tracking
const objectCounts = new Map();
let lastSampled = Date.now();

function trackObjectCounts() {
  try {
    // Sample object counts
    const objects = getObjectCounts();
    const now = Date.now();

    // Check for significant increases
    for (const [type, count] of Object.entries(objects)) {
      const previous = objectCounts.get(type) || 0;
      // Calculate growth rate per hour
      const growthRate = (count - previous) / ((now - lastSampled) / 3600000);
      // Alert on high growth rates for significant object counts
      if (previous > 1000 && count > previous * 1.2 && growthRate > 1000) {
        notifyObjectTypeGrowth(type, previous, count, growthRate);
      }
      // Update counts
      objectCounts.set(type, count);
    }
    lastSampled = now;
  } catch (error) {
    console.error('Error tracking object counts:', error);
  }
}

function getObjectCounts() {
  // V8 heap statistics - note that this is a simplified version
  // In real implementations, use v8-heap-snapshot or similar libraries
  const v8 = require('v8');
  const stats = v8.getHeapStatistics();
  return {
    total_heap_size: stats.total_heap_size,
    used_heap_size: stats.used_heap_size,
    heap_size_limit: stats.heap_size_limit
  };
}

// Sample object counts every 15 minutes
setInterval(trackObjectCounts, 15 * 60 * 1000);

CPU Profiling for Performance Bottlenecks

Implement on-demand CPU profiling to identify performance bottlenecks:

javascript

// cpu-profiler.js
const v8Profiler = require('v8-profiler-node8');
const fs = require('fs');

// Set up profiling endpoints for on-demand CPU profiling
function setupProfilingEndpoints(app) {
  // Secure with API key to prevent unauthorized profiling
  const API_KEY = process.env.PROFILING_API_KEY || 'development-only-key';

  app.post('/debug/cpu-profile', (req, res) => {
    // Verify API key
    if (req.headers['x-api-key'] !== API_KEY) {
      return res.status(401).json({ error: 'Unauthorized' });
    }

    const duration = parseInt(req.query.duration || '30', 10);
    const profileName = `cpu-profile-${Date.now()}`;

    // Start CPU profiling
    console.log(`Starting CPU profile: ${profileName} for ${duration} seconds`);
    v8Profiler.startProfiling(profileName, true);

    // Stop profiling after the specified duration
    setTimeout(() => {
      const profile = v8Profiler.stopProfiling(profileName);

      // Save profile to disk
      const profilePath = `${process.cwd()}/profiles/${profileName}.cpuprofile`;
      // Ensure directory exists
      fs.mkdirSync(`${process.cwd()}/profiles`, { recursive: true });
      // Write profile to file
      fs.writeFileSync(profilePath, JSON.stringify(profile));
      // Clean up
      profile.delete();

      // The HTTP response was already sent below, so only log completion here
      console.log(`CPU profile saved to ${profilePath}`);
    }, duration * 1000);

    // Respond immediately; the profile file becomes available once profiling completes
    res.json({
      status: 'Profiling started',
      profile: profileName,
      duration: duration
    });
  });

  // Endpoint to list available profiles
  app.get('/debug/cpu-profiles', (req, res) => {
    // Verify API key
    if (req.headers['x-api-key'] !== API_KEY) {
      return res.status(401).json({ error: 'Unauthorized' });
    }

    const profilesDir = `${process.cwd()}/profiles`;
    // Create directory if it doesn't exist
    if (!fs.existsSync(profilesDir)) {
      fs.mkdirSync(profilesDir, { recursive: true });
    }

    // Read directory and filter for CPU profiles
    const files = fs.readdirSync(profilesDir)
      .filter(file => file.endsWith('.cpuprofile'))
      .map(file => ({
        name: file.replace('.cpuprofile', ''),
        path: `${profilesDir}/${file}`,
        size: fs.statSync(`${profilesDir}/${file}`).size,
        created: fs.statSync(`${profilesDir}/${file}`).mtime
      }));

    res.json(files);
  });
}

module.exports = { setupProfilingEndpoints };

Handling Uncaught Exceptions and Promise Rejections

Proper error handling is essential for reliable Node.js applications:

javascript

// error-handling.js
const fs = require('fs');

// Create error log directory
const errorLogDir = `${process.cwd()}/logs`;
fs.mkdirSync(errorLogDir, { recursive: true });

// Track uncaught exceptions
process.on('uncaughtException', (error) => {
  // Get stack trace
  const stack = error.stack || new Error().stack;

  // Create detailed error log
  const errorLog = {
    timestamp: new Date().toISOString(),
    type: 'uncaughtException',
    error: {
      message: error.message,
      name: error.name,
      stack: stack
    },
    process: {
      pid: process.pid,
      uptime: process.uptime(),
      memory: process.memoryUsage()
    }
  };

  // Log to console
  console.error('Uncaught Exception:', error);

  // Write to log file
  const logFile = `${errorLogDir}/uncaught-exception-${Date.now()}.json`;
  fs.writeFileSync(logFile, JSON.stringify(errorLog, null, 2));

  // Report error to monitoring system
  reportFatalError(errorLog);

  // Graceful shutdown
  gracefulShutdown('uncaughtException')
    .catch(shutdownError => {
      console.error('Error during graceful shutdown:', shutdownError);
      process.exit(1);
    });
});

// Track unhandled promise rejections
process.on('unhandledRejection', (reason, promise) => {
  // Create error log
  const errorLog = {
    timestamp: new Date().toISOString(),
    type: 'unhandledRejection',
    error: {
      message: reason instanceof Error ? reason.message : String(reason),
      stack: reason instanceof Error ? reason.stack : 'No stack trace available',
      reason: reason instanceof Error ? reason : String(reason)
    },
    process: {
      pid: process.pid,
      uptime: process.uptime(),
      memory: process.memoryUsage()
    }
  };

  // Log to console
  console.error('Unhandled Promise Rejection:', reason);

  // Write to log file
  const logFile = `${errorLogDir}/unhandled-rejection-${Date.now()}.json`;
  fs.writeFileSync(logFile, JSON.stringify(errorLog, null, 2));

  // Report error to monitoring system
  reportNonFatalError(errorLog);
});

// Graceful shutdown function
async function gracefulShutdown(reason) {
  console.log(`Initiating graceful shutdown due to ${reason}...`);

  // Log shutdown event
  const shutdownLog = {
    timestamp: new Date().toISOString(),
    type: 'shutdown',
    reason: reason,
    process: {
      pid: process.pid,
      uptime: process.uptime(),
      memory: process.memoryUsage()
    }
  };

  // Write shutdown log
  const logFile = `${errorLogDir}/shutdown-${Date.now()}.json`;
  fs.writeFileSync(logFile, JSON.stringify(shutdownLog, null, 2));

  // Close database connections
  try {
    console.log('Closing database connections...');
    await closeDatabaseConnections();
  } catch (error) {
    console.error('Error closing database connections:', error);
  }

  // Close other resources (Redis, etc.)
  try {
    console.log('Closing other resources...');
    await closeOtherResources();
  } catch (error) {
    console.error('Error closing other resources:', error);
  }

  // Let existing requests finish (for HTTP servers)
  if (global.server) {
    console.log('Closing HTTP server...');
    await new Promise((resolve) => {
      global.server.close(resolve);
    });
  }

  console.log('Graceful shutdown complete.');
  process.exit(1);
}

Integrating with Alerting Systems

Configure alerts for different monitoring metrics based on their severity and impact:

javascript

// alerting.js
const axios = require('axios');

// Alert levels
const ALERT_LEVELS = {
  INFO: 'info',
  WARNING: 'warning',
  ERROR: 'error',
  CRITICAL: 'critical'
};

// Configure alert destinations
const alertConfig = {
  slack: {
    webhookUrl: process.env.SLACK_WEBHOOK_URL,
    enabled: true
  },
  email: {
    apiKey: process.env.EMAIL_API_KEY,
    recipients: process.env.ALERT_EMAIL_RECIPIENTS?.split(',') || [],
    enabled: true
  },
  pagerDuty: {
    serviceKey: process.env.PAGERDUTY_SERVICE_KEY,
    enabled: process.env.NODE_ENV === 'production'
  }
};

// Send alert to configured destinations
async function sendAlert(level, title, details) {
  const timestamp = new Date().toISOString();
  const environment = process.env.NODE_ENV || 'development';
  const serviceName = process.env.SERVICE_NAME || 'nodejs-app';

  console.log(`ALERT [${level}]: ${title}`);
  const alertPromises = [];

  // Send to Slack
  if (alertConfig.slack.enabled && alertConfig.slack.webhookUrl) {
    alertPromises.push(
      axios.post(alertConfig.slack.webhookUrl, {
        text: `[${environment.toUpperCase()}] [${level.toUpperCase()}] ${title}`,
        attachments: [
          {
            color: getColorForLevel(level),
            fields: [
              {
                title: 'Service',
                value: serviceName,
                short: true
              },
              {
                title: 'Environment',
                value: environment,
                short: true
              },
              {
                title: 'Timestamp',
                value: timestamp,
                short: true
              },
              {
                title: 'Details',
                value: typeof details === 'object' ? JSON.stringify(details, null, 2) : details
              }
            ]
          }
        ]
      }).catch(error => {
        console.error('Error sending Slack alert:', error.message);
      })
    );
  }

  // Send to PagerDuty for critical alerts
  if (alertConfig.pagerDuty.enabled && alertConfig.pagerDuty.serviceKey && level === ALERT_LEVELS.CRITICAL) {
    alertPromises.push(
      axios.post('https://events.pagerduty.com/v2/enqueue', {
        routing_key: alertConfig.pagerDuty.serviceKey,
        event_action: 'trigger',
        payload: {
          summary: `[${environment.toUpperCase()}] ${title}`,
          source: serviceName,
          severity: 'critical',
          timestamp: timestamp,
          custom_details: details
        }
      }).catch(error => {
        console.error('Error sending PagerDuty alert:', error.message);
      })
    );
  }

  // Wait for all alerts to be sent
  await Promise.all(alertPromises);
}

// Helper function to get color based on alert level
function getColorForLevel(level) {
  switch (level) {
    case ALERT_LEVELS.INFO:
      return '#3498db';
    case ALERT_LEVELS.WARNING:
      return '#f39c12';
    case ALERT_LEVELS.ERROR:
      return '#e74c3c';
    case ALERT_LEVELS.CRITICAL:
      return '#c0392b';
    default:
      return '#95a5a6';
  }
}

// Alert functions for different scenarios
function alertHighMemoryUsage(memoryUsage) {
  // memoryUsage is the raw process.memoryUsage() object (values in bytes)
  const heapUsedMB = memoryUsage.heapUsed / 1024 / 1024;
  const heapTotalMB = memoryUsage.heapTotal / 1024 / 1024;
  const title = `High Memory Usage: ${heapUsedMB.toFixed(2)}MB / ${heapTotalMB.toFixed(2)}MB (${(memoryUsage.heapUsed / memoryUsage.heapTotal * 100).toFixed(2)}%)`;
  sendAlert(
    memoryUsage.heapUsed / memoryUsage.heapTotal > 0.85 ? ALERT_LEVELS.CRITICAL : ALERT_LEVELS.WARNING,
    title,
    memoryUsage
  );
}

function alertEventLoopLag(lag) {
  const title = `Event Loop Lag: ${lag.toFixed(2)}ms`;
  const level = lag > 500 ? ALERT_LEVELS.CRITICAL : (lag > 100 ? ALERT_LEVELS.WARNING : ALERT_LEVELS.INFO);
  sendAlert(level, title, { lag });
}

// Export alert functions
module.exports = {
  alertHighMemoryUsage,
  alertEventLoopLag,
  alertHighCpuUsage: (usage) => {
    sendAlert(
      usage > 90 ? ALERT_LEVELS.CRITICAL : ALERT_LEVELS.WARNING,
      `High CPU Usage: ${usage.toFixed(2)}%`,
      { cpuUsage: usage }
    );
  },
  alertMemoryLeak: (info, snapshotPath) => {
    sendAlert(
      ALERT_LEVELS.CRITICAL,
      'Memory Leak Detected',
      { ...info, snapshotPath }
    );
  },
  alertHighErrorRate: (rate, timeWindow) => {
    sendAlert(
      rate > 0.1 ? ALERT_LEVELS.CRITICAL : ALERT_LEVELS.ERROR,
      `High Error Rate: ${(rate * 100).toFixed(2)}%`,
      { rate, timeWindow }
    );
  },
  alertDatabaseConnectionIssue: (error) => {
    sendAlert(
      ALERT_LEVELS.CRITICAL,
      'Database Connection Issue',
      { error: error.message, stack: error.stack }
    );
  },
  // Generic handlers used by monitoring.js for uncaught errors and long GC pauses
  alertError: (error) => {
    sendAlert(
      ALERT_LEVELS.ERROR,
      `Unhandled Error: ${error instanceof Error ? error.message : String(error)}`,
      { stack: error instanceof Error ? error.stack : undefined }
    );
  },
  alertLongGCPause: (stats) => {
    sendAlert(
      ALERT_LEVELS.WARNING,
      `Long GC Pause: ${(stats.pause / 1e6).toFixed(2)}ms`,
      stats
    );
  }
};

Complete Node.js Monitoring Implementation

Combining all the components creates a comprehensive monitoring solution:

javascript

// monitoring.js - Main monitoring integration module
const os = require('os');
const process = require('process');
const memwatch = require('@airbnb/node-memwatch');
const blocked = require('blocked');
const gcStats = require('gc-stats')();

// Import custom modules
const alerts = require('./alerting');
const { setupProfilingEndpoints } = require('./cpu-profiler');

// Initialize monitoring
function initializeMonitoring(app) {
  // Set up health check endpoints
  setupHealthChecks(app);
  // Set up profiling endpoints
  setupProfilingEndpoints(app);
  // Set up metrics collection
  setupMetricsCollection();
  // Configure error tracking
  setupErrorTracking();
  // Set up memory monitoring
  setupMemoryMonitoring();
  // Set up event loop monitoring
  setupEventLoopMonitoring();
  // Set up garbage collection monitoring
  setupGCMonitoring();
  // Log initialization
  console.log('Node.js monitoring initialized');
}

// Set up health check endpoints
function setupHealthChecks(app) {
  // Import and use health check router
  const healthRouter = require('./health');
  app.use('/health', healthRouter);
}

// Set up metrics collection
function setupMetricsCollection() {
  // Basic system metrics
  setInterval(() => {
    const memoryUsage = process.memoryUsage();
    const cpuUsage = getCpuUsagePercentage();

    // Record metrics
    recordMetrics({
      timestamp: Date.now(),
      memory: {
        rss: memoryUsage.rss / 1024 / 1024,
        heapTotal: memoryUsage.heapTotal / 1024 / 1024,
        heapUsed: memoryUsage.heapUsed / 1024 / 1024,
        external: memoryUsage.external / 1024 / 1024,
        memoryUtilization: memoryUsage.heapUsed / memoryUsage.heapTotal
      },
      cpu: {
        usage: cpuUsage,
        load: os.loadavg()
      },
      system: {
        uptime: process.uptime()
      }
    });

    // Check for high resource usage
    if (memoryUsage.heapUsed / memoryUsage.heapTotal > 0.8) {
      alerts.alertHighMemoryUsage(memoryUsage);
    }
    if (cpuUsage > 80) {
      alerts.alertHighCpuUsage(cpuUsage);
    }
  }, 30000); // Every 30 seconds
}

// Calculate CPU usage percentage
function getCpuUsagePercentage() {
  // This is a simplified implementation
  // For production, use a more sophisticated approach with multiple samples
  return os.loadavg()[0] * 100 / os.cpus().length;
}

// Set up error tracking
function setupErrorTracking() {
  // Track global unhandled errors
  process.on('uncaughtException', (error) => {
    console.error('Uncaught Exception:', error);
    // Record error
    recordError('uncaughtException', error);
    // Send alert
    alerts.alertError(error);
    // Attempt graceful shutdown
    process.exit(1);
  });
  process.on('unhandledRejection', (reason, promise) => {
    console.error('Unhandled Rejection:', reason);
    // Record error
    recordError('unhandledRejection', reason);
    // Send alert
    alerts.alertError(reason);
  });
}

// Set up memory monitoring
function setupMemoryMonitoring() {
  // Monitor for memory leaks
  memwatch.on('leak', (info) => {
    console.warn('Memory leak detected:', info);
    // Record leak
    recordMemoryLeak(info);
    // Send alert
    alerts.alertMemoryLeak(info);
  });
  // Monitor heap stats
  memwatch.on('stats', (stats) => {
    recordMemoryStats(stats);
  });
}

// Set up event loop monitoring
function setupEventLoopMonitoring() {
  // Monitor event loop blocking
  blocked((ms) => {
    console.warn(`Event loop blocked for ${ms}ms`);
    // Record blocking
    recordEventLoopBlocked(ms);
    // Alert on significant blocking
    if (ms > 100) {
      alerts.alertEventLoopLag(ms);
    }
  }, { threshold: 50 });
}

// Set up garbage collection monitoring
function setupGCMonitoring() {
  gcStats.on('stats', (stats) => {
    // Record GC stats
    recordGCStats(stats);
    // Alert on long GC pauses (gc-stats reports pause in nanoseconds)
    if (stats.pause / 1e6 > 200) {
      alerts.alertLongGCPause(stats);
    }
  });
}

// Record metrics (implement based on your metrics storage)
function recordMetrics(metrics) {
  // This would connect to your metrics storage system
  // Example: Prometheus, InfluxDB, etc.
  console.log('Metrics recorded:', metrics);
}

// Record errors
function recordError(type, error) {
  // Log error to your error tracking system
  console.error(`Error recorded (${type}):`, error);
}

// Record memory leak
function recordMemoryLeak(info) {
  console.warn('Memory leak recorded:', info);
}

// Record memory stats
function recordMemoryStats(stats) {
  console.log('Memory stats recorded:', stats);
}

// Record event loop blocking
function recordEventLoopBlocked(duration) {
  console.warn('Event loop blocked:', duration);
}

// Record garbage collection stats
function recordGCStats(stats) {
  console.log('GC stats recorded:', stats);
}

// Export the monitoring initialization function
module.exports = {
  initializeMonitoring
};
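
A possible way to wire this module into an application entry point is sketched below; the server.js filename, port, and route registrations are placeholders for your own application:

javascript

// server.js - example wiring (names and port are assumptions)
const express = require('express');
const { initializeMonitoring } = require('./monitoring');

const app = express();

// Register monitoring before application routes so health and debug
// endpoints are available even if later route setup fails
initializeMonitoring(app);

// ... register your application routes here ...

// Expose the server globally so gracefulShutdown() can close it
global.server = app.listen(process.env.PORT || 3000, () => {
  console.log(`Server listening on port ${process.env.PORT || 3000}`);
});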

Conclusion

Monitoring Node.js applications effectively requires a multifaceted approach that addresses the unique characteristics of the Node.js runtime. By implementing the strategies outlined in this guide, you can gain comprehensive visibility into your application's health, performance, and resource utilization.

Key takeaways from this guide include:

  1. Focus on Node.js-specific metrics like event loop lag, memory patterns, and garbage collection behavior that directly impact application performance.
  2. Implement both external availability monitoring with Odown and internal health metrics collection to get a complete picture of application reliability.
  3. Use specialized monitoring for memory leak detection, CPU profiling, and event loop blocking to address common Node.js performance challenges.
  4. Configure intelligent alerting thresholds based on application characteristics and business impact to ensure appropriate response to issues.
  5. Integrate monitoring with your CI/CD pipeline and development workflow to catch performance issues before they reach production.

By combining these approaches, you can build a robust monitoring system that helps maintain optimal performance for your Node.js applications while enabling rapid troubleshooting when issues arise.