How to Monitor LEMP

For many years, your typical WordPress, Drupal and Joomla stack comprised a Linux server and an Apache web server. PHP ran as a module inside Apache whilst MySQL hummed along in the background.

Things have progressed since then. PHP moved outside Apache and into a FastCGI Process Manager (PHP-FPM). This came with several advantages, including adaptive process spawning and advanced process management. Different pools of workers can now run with different permissions, configurations and chroot environments. Last but not least, Apache is often replaced by the increasingly popular Nginx in what’s now called a LEMP stack. We’re going to look at how to monitor LEMP.

How to monitor LEMP stacks

Taken collectively, those changes have “dismantled” the stack. In the past, all services ran from a single server. Now we can split various components into separate virtual machines or containers (for example, a stack could contain 1 web server, 2 PHP servers and 2 MySQL servers).

Newer and more dispersed options bring different, and greater, monitoring requirements. In light of that, we wanted to share our very own LEMP (Linux, Nginx, MySQL and PHP) stack monitoring checklist. It’s what we use here at Server Density, and we hope sysadmins out there find it useful. Please let us know if we’ve missed something.

Monitor the Linux Server

As Linux sits at the core of our stack, we need to ensure that basic system metrics (load, disk usage and swap usage) stay within healthy ranges and that all default system services are running as expected. These are the alerts I configure in Server Density for all my Linux servers:


Metric	Comment	Suggested Alert
Device availability	If the device is not sending any metrics, it might have crashed, or the network might be down.	No data over 300s
Load	If load is too high, you will experience performance degradation on the service.	Load > 4
Memory	If free memory is running low, the server will start swapping, causing performance degradation.	Free mem < 32MB
Swap	If swap usage is high (see line above) you will experience performance degradation.	Swap usage > 256MB
Disk usage	Don’t let your partitions get full.	Disk usage > 90%
NTP offset	If the clock gets out of sync we get all sorts of inconsistencies. We should get an alert should that happen.	NTP offset > 0.5s
Process cron	Check that default basic services, such as cron, are running.	process count < 1
Process atd	atd for scheduled jobs	process count != 1
Process ntpd	ntpd for time synchronization	process count != 1
Process sshd	sshd for remote access	process count != 1
Process rsyslogd	rsyslogd for system logs	process count != 1

Note that some metrics like CPU and network don’t have any alerts. These metrics can change a lot, depending on time of the day and our usage patterns. I do include them as graphs in the dashboards for troubleshooting purposes, but creating an alert for them would be somewhat counter-productive.
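As a rough illustration of how the thresholds in the table above could be evaluated outside Server Density, here is a minimal Python sketch. The metric names (`load`, `free_mem_mb`, `procs_cron` and so on) and the `triggered_alerts` helper are made up for this example; only the thresholds come from the checklist.

```python
# Thresholds are taken from the checklist table. The metric names (keys)
# are hypothetical: adapt them to whatever your collector reports.
RULES = [
    ("seconds_since_last_data", lambda v: v > 300, "No data over 300s"),
    ("load", lambda v: v > 4, "Load > 4"),
    ("free_mem_mb", lambda v: v < 32, "Free mem < 32MB"),
    ("swap_used_mb", lambda v: v > 256, "Swap usage > 256MB"),
    ("disk_usage_pct", lambda v: v > 90, "Disk usage > 90%"),
    # The clock can drift in either direction, so compare the absolute offset.
    ("ntp_offset_s", lambda v: abs(v) > 0.5, "NTP offset > 0.5s"),
    ("procs_cron", lambda v: v < 1, "cron process count < 1"),
    ("procs_atd", lambda v: v != 1, "atd process count != 1"),
    ("procs_ntpd", lambda v: v != 1, "ntpd process count != 1"),
    ("procs_sshd", lambda v: v != 1, "sshd process count != 1"),
    ("procs_rsyslogd", lambda v: v != 1, "rsyslogd process count != 1"),
]


def triggered_alerts(sample):
    """Return the labels of all rules that fire for one metrics sample.

    Metrics missing from the sample are skipped rather than treated as alerts.
    """
    return [
        label
        for name, is_breached, label in RULES
        if name in sample and is_breached(sample[name])
    ]


if __name__ == "__main__":
    sample = {"load": 5.2, "free_mem_mb": 512, "disk_usage_pct": 91, "procs_cron": 0}
    for alert in triggered_alerts(sample):
        print("ALERT:", alert)
```

CPU and network are deliberately absent from the rules, in line with the note above.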

Monitor Nginx

Next on the list is Nginx. We’ve already covered Nginx (see How to Monitor Nginx) and even held a webinar with them (see Deep dive tutorial: Monitoring Nginx). I won’t be recommending any specific alerts here as these depend on your website.

Here are some graphs I use for monitoring Nginx on a personal (and very much forlorn!) WordPress blog:


Nginx connections (per second) shows:

Requests per second received
Connections opened (multiple requests can happen within the same connection thanks to HTTP KeepAlive)
Connections dropped (because we hit the limit or don’t have enough resources)
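Nginx reports these values as ever-growing counters (accepts, handled and requests), so a per-second graph needs two readings taken some time apart. The sketch below shows one way to derive the rates, assuming each reading is a dictionary with those three keys. The `per_second_rates` helper and the key names are assumptions for this example, not part of any plugin. Dropped connections are taken as the gap between accepted and handled connections.

```python
def per_second_rates(previous, current, interval_seconds):
    """Turn two readings of the Nginx counters into per-second rates.

    Each reading is a dict with the (assumed) keys "accepts", "handled"
    and "requests". Counters restart from zero when Nginx restarts, so a
    caller should discard any pair of readings that spans a restart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    # Connections that were accepted but never handled count as dropped.
    dropped_before = previous["accepts"] - previous["handled"]
    dropped_now = current["accepts"] - current["handled"]

    return {
        "requests_per_s": (current["requests"] - previous["requests"]) / interval_seconds,
        "connections_opened_per_s": (current["accepts"] - previous["accepts"]) / interval_seconds,
        "connections_dropped_per_s": (dropped_now - dropped_before) / interval_seconds,
    }


if __name__ == "__main__":
    before = {"accepts": 1000, "handled": 1000, "requests": 1500}
    after = {"accepts": 1060, "handled": 1054, "requests": 1800}
    print(per_second_rates(before, after, 60))
```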

Nginx requests status shows what Nginx is doing at any given moment:

Total number of connections
Connections in reading status (Nginx is reading the request)
Connections in writing status (Nginx is writing the response)
Connections in waiting status (idle keep-alive connections waiting for the next request; a slow PHP or MySQL backend tends to show up as connections stuck in writing status instead)

Reminder: Nginx configuration

Enable the Nginx stub_status module in your site configuration file (probably /etc/nginx/sites-enabled/default) and restart the service to apply changes:

location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    allow ::1/128;
    deny all;
}

Retrieve the metrics on the configured URL:

$ curl http://localhost/nginx_status
Active connections: 1
server accepts handled requests
1157920 1157920 1257226
Reading: 0 Writing: 1 Waiting: 0

This is the exact same method our Server Density Nginx plugin uses.
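If you want to consume this output from your own scripts, the text is easy to parse. Below is a minimal Python sketch, not the plugin's actual code. The function names and the returned dictionary keys are my own choices, and the URL is the one configured above.

```python
import re
from urllib.request import urlopen


def fetch_stub_status(url="http://localhost/nginx_status"):
    """Download the raw status page (URL as configured in the location block)."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("ascii")


def parse_stub_status(text):
    """Parse the plain-text stub_status page into a dict of integers."""
    active = re.search(r"Active connections:[ \t]*(\d+)", text)
    # The three counters sit on their own line, below the
    # "server accepts handled requests" header.
    counters = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(r"Reading:[ \t]*(\d+)[ \t]+Writing:[ \t]*(\d+)[ \t]+Waiting:[ \t]*(\d+)", text)
    if not (active and counters and states):
        raise ValueError("unexpected stub_status output")

    accepts, handled, requests = (int(v) for v in counters.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


if __name__ == "__main__":
    print(parse_stub_status(fetch_stub_status()))
```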

Monitor PHP FPM performance

As we saw earlier, Nginx graphs can hint that the web server is waiting on the backend. But how do we keep an eye on PHP performance itself? The PHP-FPM engine exposes some metrics of its own. Configuring this is a bit more cumbersome than for Nginx. Here are the steps we use:

Configure PHP FPM status (probably /etc/php5/fpm/pool.d/www.conf):

pm.status_path = /phpfpm_status

ping.path = /phpfpm_ping

;ping.response = pong

Expose these metrics through Nginx using your site configuration file (probably /etc/nginx/sites-enabled/default) and restart the service to apply changes:

location /phpfpm_status {
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
    include fastcgi_params;
    access_log off;
    allow 127.0.0.1;
    allow ::1/128;
    deny all;
}

If you run PHP FPM on a different server (or on localhost but through a TCP socket), change the fastcgi_pass line accordingly, for example to fastcgi_pass 127.0.0.1:9000; and if you have multiple PHP servers, you can expose their metrics under different URLs.

Retrieve the metrics on the configured URL:

$ curl http://localhost/phpfpm_status
pool:                 www
process manager:      dynamic
start time:           02/Nov/2015:12:42:01 +0000
start since:          5456390
accepted conn:        1025508
listen queue:         0
max listen queue:     0
listen queue len:     0
idle processes:       2
active processes:     1
total processes:      3
max active processes: 5
max children reached: 3
slow requests:        0
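The status page is a simple list of "name: value" lines, so turning it into something a script can work with takes only a few lines. Here is a minimal Python sketch. The `parse_fpm_status` name is made up for this example, and the dictionary keys are simply the labels PHP FPM prints.

```python
def parse_fpm_status(text):
    """Parse the plain-text PHP FPM status page into a dict.

    Keys are the labels as printed (for example "listen queue").
    Purely numeric values become integers, everything else stays a string.
    """
    status = {}
    for line in text.splitlines():
        # Split on the first colon only: the "start time" value contains colons.
        key, separator, value = line.partition(":")
        if not separator:
            continue  # skip blank or unexpected lines
        value = value.strip()
        status[key.strip()] = int(value) if value.isdigit() else value
    return status


if __name__ == "__main__":
    sample = """pool:                 www
process manager:      dynamic
start time:           02/Nov/2015:12:42:01 +0000
start since:          5456390
listen queue:         0
max children reached: 3
"""
    print(parse_fpm_status(sample))
```

PHP FPM can also serve the same page as JSON if you append ?json to the URL, which saves the parsing step.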

It’d be risky to recommend specific alerts for PHP. That’s because metric values tend to vary depending on the intricacies of each environment. Still, configuring these might be a good idea:

Metric	Comment	Suggested Alert
Uptime (start since)	If the process is respawning or crashing too often we want to be notified. Any server restarts will cause alerts, of course.	Uptime < 300s
Max children reached	Each time the pool hits its pm.max_children limit, new requests have to wait for a free worker. If this happens often (over a period of time) we will experience a performance degradation.	Max reached per minute > 10
Listen queue	Queued up requests from Nginx indicate performance degradation.	Listen queue > 10
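To make the three suggested alerts concrete, here is a minimal Python sketch that checks them against two readings of the status page taken some seconds apart. It assumes each reading is a dictionary keyed by the labels PHP FPM prints ("start since", "max children reached", "listen queue"). The `fpm_alerts` helper is hypothetical, and the thresholds are the ones from the table, which you will likely want to tune.

```python
def fpm_alerts(previous, current, interval_seconds):
    """Evaluate the suggested PHP FPM alerts from two status readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    alerts = []

    # "start since" is the pool uptime in seconds.
    if current["start since"] < 300:
        alerts.append("Uptime < 300s")

    # "max children reached" is a counter, so compare two readings.
    # A restart resets it to zero, hence the max() guard against negatives.
    delta = max(0, current["max children reached"] - previous["max children reached"])
    reached_per_minute = delta * 60 / interval_seconds
    if reached_per_minute > 10:
        alerts.append("Max children reached per minute > 10")

    # "listen queue" is the number of requests currently waiting.
    if current["listen queue"] > 10:
        alerts.append("Listen queue > 10")

    return alerts


if __name__ == "__main__":
    before = {"start since": 5456330, "max children reached": 3, "listen queue": 0}
    after = {"start since": 5456390, "max children reached": 20, "listen queue": 12}
    print(fpm_alerts(before, after, 60))
```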

I was tempted to add an alert on slow requests here, but refrained from doing so as that would steer this checklist into application monitoring territory (a separate topic). Here is a detail of a PHP-FPM requests graph in Server Density:

[Figure: PHP-FPM requests graph in Server Density]

Monitor MySQL database

The final component in the stack is MySQL (some folks out there use PostgreSQL). We think MySQL deserves an article in its own right, so we recently published this comprehensive list of our favourite MySQL alerts (see How to Monitor MySQL). Let us know what you think!
