
Monitoring VFS Slab Cache Pressure on Lightweight WP Nodes

Optimizing Directory Traversal for Ideko Theme Asset Loading

I recently moved a set of production instances to a new kernel branch, specifically migrating from the 5.15 LTS to the 6.1 series on Debian 12. The hardware is consistent across the fleet: AMD EPYC 7313P 16-Core processors, 128GB DDR4 ECC RAM, and a RAID 1 array consisting of two Samsung PM9A3 NVMe drives. One specific site, utilizing the Ideko - Modern & Lightweight WordPress Theme, began showing a subtle drift in Time to First Byte (TTFB). The baseline TTFB was 120ms; post-migration, it hovered around 165ms. This was not a failure, but a measurable degradation in efficiency.

The environment runs PHP 8.3 via FPM, with Nginx 1.24 as the reverse proxy. Ideko is, by design, a minimalist theme, which makes any performance variance more visible. It does not suffer from the bloat common in multipurpose frameworks, but it does rely on the standard WordPress template hierarchy. This means that, on every request, WordPress performs multiple filesystem lookups to check whether a template part exists in the child theme before falling back to the parent theme.

The Diagnostic Path: Slab Cache and I/O Wait

I began by checking for resource contention. CPU utilization was under 5%, and memory pressure was non-existent. Standard tools like top or htop provided no insight. I turned to iostat to monitor the NVMe performance.

iostat -xz 1

The %util for the nvme0n1 and nvme1n1 devices remained near 0.1%. Average service time (svctm) was in the microsecond range. This confirmed that the underlying block storage was not the bottleneck. The issue was higher up the stack, within the Linux Virtual File System (VFS) layer.

I executed slabtop to examine the kernel’s memory allocation for filesystem objects. In Linux, the VFS uses the slab allocator to manage caches for dentries (directory entries) and inodes.

slabtop -o

I noticed the dentry cache was consuming over 4GB of RAM, with millions of active objects. While the Ideko theme itself is small, the filesystem on this node contains several hundred thousand files left behind by multiple legacy staging environments. When WooCommerce is active, the number of directory traversals increases significantly because of the deeply nested template overrides in /wp-content/themes/ideko/woocommerce/.

Every call to locate_template() triggers a series of stat() or access() syscalls. In the 6.1 kernel, the default slab reclamation behavior appeared less aggressive, which left the dentry cache bloated and fragmented. When PHP-FPM workers resolved paths, the kernel seemed to spend more cycles walking these dentry structures than it had under the previous kernel.

Deconstructing the Directory Lookup

In a WordPress context, the theme engine executes file_exists() checks frequently. For a theme like Ideko, which prioritizes modularity, a single page load might involve checking for 40 to 60 different file paths. If the kernel's dentry cache is fragmented, the lookup time for each file increases.

Consider the following execution flow within the WordPress core when the Ideko theme is active:

  1. Check child-theme/header.php
  2. If not found, check parent-theme/header.php
  3. Check child-theme/template-parts/content-page.php
  4. If not found, check parent-theme/template-parts/content-page.php
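The lookup order above can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not WordPress's own locate_template(): the function name and signature are hypothetical, os.path.exists() stands in for PHP's file_exists(), and each check is counted as one metadata syscall.

```python
import os
from typing import Optional, Sequence, Tuple


def locate_template(names: Sequence[str], child_dir: str, parent_dir: str) -> Tuple[Optional[str], int]:
    """Hypothetical model of the child-then-parent lookup.

    Returns (first matching path or None, number of existence checks made).
    Each os.path.exists() call stands in for one PHP file_exists(), i.e. one
    stat()/access() syscall in the real workload.
    """
    checks = 0
    for name in names:
        # The child theme is always consulted first, then the parent theme.
        for base in (child_dir, parent_dir):
            checks += 1
            candidate = os.path.join(base, name)
            if os.path.exists(candidate):
                return candidate, checks
    return None, checks
```

A template part that only exists in the parent theme costs two checks, and a name found nowhere costs two checks per candidate, so a page that touches dozens of parts quickly turns into dozens of syscalls.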

This is amplified when WooCommerce templates are rendered, because WooCommerce checks several dozen paths for potential theme overrides. If each stat() call takes 0.5ms instead of 0.05ms due to cache misses or slow traversal, that is 0.45ms of extra latency per lookup; across roughly 45 to 65 lookups, the cumulative effect on TTFB is 20-30ms.
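The arithmetic behind that estimate is easy to check directly. The helper below is illustrative only: its name is my own, the 0.5 ms and 0.05 ms figures come from the paragraph above, and the call counts are the 40 to 60 lookups per page mentioned earlier.

```python
def added_latency_ms(stat_calls: int, slow_ms: float, fast_ms: float) -> float:
    """Extra time spent in metadata lookups when each call slows from fast_ms to slow_ms."""
    return stat_calls * (slow_ms - fast_ms)


# 0.45 ms of extra latency per call, across the 40-60 lookups cited for a page load.
for calls in (40, 50, 60):
    print(calls, round(added_latency_ms(calls, 0.5, 0.05), 1))
# prints: 40 18.0 / 50 22.5 / 60 27.0 (one pair per line)
```

The theme's own 40 to 60 checks account for 18 to 27 ms; WooCommerce's additional override checks push the count toward the upper end, consistent with the 20-30 ms range.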

I used vmstat -m to look at the memory usage of the dentry and inode_cache slabs specifically. The total object count for the dentry cache was disproportionately high compared to the number of files the active PHP-FPM workers were actually accessing.

Kernel Tuning and VFS Cache Pressure

The Linux kernel has a parameter called vfs_cache_pressure, which controls the tendency of the kernel to reclaim memory used for caching of directory and inode objects. The default value is 100. Increasing this value makes the kernel more likely to reclaim these objects, keeping the cache lean but potentially increasing disk I/O. Decreasing it keeps more dentries in memory.
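Under the hood, sysctl -w vm.vfs_cache_pressure=N writes to /proc/sys/vm/vfs_cache_pressure: the dots in a sysctl name map to directories under /proc/sys. The sketch below reproduces that mapping for use in a monitoring script. The function names are my own, writing requires root, and neither sysctl -w nor a direct write survives a reboot (a persistent value belongs in a file under /etc/sysctl.d/).

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name such as vm.vfs_cache_pressure to its /proc/sys file."""
    return Path(root, *name.split("."))


def read_sysctl_int(name: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl, equivalent to `sysctl -n name`."""
    return int(sysctl_path(name, root).read_text().strip())


def write_sysctl_int(name: str, value: int, root: str = "/proc/sys") -> None:
    """Set an integer sysctl for the running kernel only, like `sysctl -w name=value` (root)."""
    sysctl_path(name, root).write_text(f"{value}\n")
```

The root parameter exists so the functions can be pointed at a scratch directory for testing; in production it stays at /proc/sys.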

Given the 128GB of RAM available, memory was not the issue; the efficiency of the cache traversal was. I experimented with lowering the pressure to keep the relevant dentries for the Ideko theme and WooCommerce assets "hot" in the cache.

sysctl -w vm.vfs_cache_pressure=50

However, this did not yield the expected improvement. The issue was the sheer volume of "cold" dentries from other sites on the server polluting the cache. I needed the kernel to be more aggressive in clearing out unused dentries to keep the search space small. I increased the pressure.

sysctl -w vm.vfs_cache_pressure=200

This reduced the total dentry count significantly over several hours. Monitoring with slabtop showed a much tighter allocation. The TTFB for the Ideko-based site dropped to 135ms.
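Between slabtop snapshots, /proc/sys/fs/dentry-state offers a cheaper view of the same trend. According to the kernel's fs sysctl documentation, its fields are nr_dentry, nr_unused, age_limit, want_pages, nr_negative and a reserved value. The sketch below (function names are my own) parses the file and reports the share of dentries that are unused, which is the portion the kernel is able to reclaim.

```python
from typing import Dict

# Field order as documented for /proc/sys/fs/dentry-state.
DENTRY_FIELDS = ("nr_dentry", "nr_unused", "age_limit", "want_pages", "nr_negative", "dummy")


def parse_dentry_state(text: str) -> Dict[str, int]:
    """Map the whitespace-separated values in dentry-state to their field names."""
    values = [int(v) for v in text.split()]
    return dict(zip(DENTRY_FIELDS, values))


def unused_ratio(state: Dict[str, int]) -> float:
    """Fraction of allocated dentries that are currently unused."""
    total = state["nr_dentry"]
    return state["nr_unused"] / total if total else 0.0


def read_dentry_state(path: str = "/proc/sys/fs/dentry-state") -> Dict[str, int]:
    with open(path) as fh:
        return parse_dentry_state(fh.read())
```

Sampling read_dentry_state() every few minutes after raising vfs_cache_pressure should show nr_dentry falling as the cold entries from other sites are reclaimed.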

PHP-FPM Path Optimization

The next step was at the application layer. PHP’s realpath_cache is designed to store the resolved paths of files to avoid repeated VFS lookups. For a site using the Ideko theme, the realpath_cache_size needs to be large enough to accommodate all template parts and plugin files.

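In php.ini, realpath_cache_size defaults to 4096K and realpath_cache_ttl to 120 seconds. I increased the size to 16M and the TTL to 600 seconds, so resolved paths stay cached longer between requests.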

realpath_cache_size = 16M
realpath_cache_ttl = 600
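To judge whether 16M is generous or tight for a given install, it helps to count how many paths the site can resolve. The sketch below is a rough proxy under stated assumptions: it counts directories plus .php files under a tree (PHP caches resolved directory components as well as files) and sums their path lengths, but it ignores PHP's per-entry bookkeeping overhead, so the real footprint is larger. The function names are my own. For the actual number, PHP's realpath_cache_size() reports the bytes currently in use from inside a running worker.

```python
import os
from typing import Tuple

# php.ini shorthand multipliers (K, M, G are powers of 1024).
_SUFFIX = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def ini_bytes(value: str) -> int:
    """Convert a php.ini shorthand size such as '4096K' or '16M' to bytes."""
    value = value.strip()
    unit = value[-1].upper()
    if unit in _SUFFIX:
        return int(value[:-1]) * _SUFFIX[unit]
    return int(value)


def path_working_set(root: str, exts: Tuple[str, ...] = (".php",)) -> Tuple[int, int]:
    """Return (entry count, bytes of path strings) for directories and matching files.

    A lower bound only: real realpath cache entries carry extra overhead per path.
    """
    entries = 0
    path_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        entries += 1  # the directory itself is a resolvable path component
        path_bytes += len(dirpath.encode())
        for name in filenames:
            if name.endswith(exts):
                entries += 1
                path_bytes += len(os.path.join(dirpath, name).encode())
    return entries, path_bytes
```

Running path_working_set("/var/www/html/wp-content") and comparing the byte total against ini_bytes("16M") gives a quick sanity check; if the path bytes alone approach the limit, the cache is too small.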

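Furthermore, I adjusted two OPcache settings. With opcache.revalidate_path enabled, OPcache re-resolves include_path lookups instead of reusing the path it has already cached; PHP ships with it disabled, so setting it to 0 makes that default explicit. opcache.file_update_protection (2 seconds by default) stops OPcache from caching files modified within the last N seconds; setting it to 0 is safe only when deployments replace files atomically. The trade-off of fewer syscalls in exchange for manual cache clears after updates belongs to opcache.validate_timestamps rather than revalidate_path: with validate_timestamps off, OPcache stops checking scripts for changes and must be reset (for example with opcache_reset() or an FPM reload) after each deploy. In a stable production environment for a lightweight theme, that is usually an acceptable trade-off.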

opcache.revalidate_path = 0
opcache.file_update_protection = 0

After applying these changes, I used strace on a single PHP-FPM worker (briefly, so as not to impact production) to count the number of lstat calls.

strace -p [PID] -e trace=lstat 2>&1 | wc -l

Before the optimization, a single request triggered 114 lstat calls. After the path cache adjustments and kernel tuning, this was reduced to 28. The TTFB settled at 118ms, slightly better than the pre-migration baseline.
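Piping strace into wc -l also counts strace's own attach and detach lines, and on newer glibc releases lstat() and stat() may reach the kernel as newfstatat or statx rather than lstat, so a narrow trace filter can undercount. One way to get a more robust count is to save the trace (strace -o trace.log -p PID) and tally every call by name. The sketch below does that; the function names and the METADATA_CALLS set are my own assumptions about which syscalls count as path lookups.

```python
import re
from collections import Counter
from typing import Iterable

# Optional leading spaces, optional "[pid N]" prefix (strace -f), optional
# timestamp or bare PID column (-t/-tt/-r or -f with -o), then "syscall(".
_CALL = re.compile(r"^\s*(?:\[pid\s+\d+\]\s+)?(?:\d[\d:.]*\s+)?([a-z_][a-z0-9_]*)\(")

# Path-based metadata syscalls that stat()/lstat()/access() may map to (assumption).
METADATA_CALLS = frozenset({
    "stat", "lstat", "newfstatat", "fstatat64", "statx",
    "access", "faccessat", "faccessat2",
})


def count_syscalls(lines: Iterable[str]) -> Counter:
    """Count syscalls by name, ignoring strace status lines and '<... resumed>' halves."""
    counts: Counter = Counter()
    for line in lines:
        match = _CALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def metadata_total(counts: Counter) -> int:
    """Sum only the path lookup syscalls; fd-based fstat() is excluded."""
    return sum(n for name, n in counts.items() if name in METADATA_CALLS)
```

For example, metadata_total(count_syscalls(open("trace.log"))) gives a per-capture total that stays comparable across glibc versions, which matters when comparing numbers taken before and after a distribution upgrade.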

NVMe Interrupt Distribution

Finally, I looked at the NVMe interrupt handling. High-performance drives like the PM9A3 can generate a high volume of interrupts, which can cause CPU "jitter" that affects latency-sensitive PHP processes. I verified the interrupt distribution across CPU cores.

grep nvme /proc/interrupts

The interrupts were mostly handled by Core 0. I used irqbalance to distribute these across the EPYC’s physical cores to ensure that the PHP-FPM workers, which were affinity-bound to specific cores, were not being interrupted by I/O completion events from the NVMe controller.
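Reading the raw grep output across many cores is tedious, so the distribution can be summarized per CPU instead. A minimal sketch, assuming the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ with a count per CPU); the function names are my own, and "nvme" is matched as a plain substring of each row.

```python
from typing import List


def irq_totals_per_cpu(text: str, match: str = "nvme") -> List[int]:
    """Sum interrupt counts per CPU for every /proc/interrupts row containing `match`."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        if match not in line:
            continue
        fields = line.split()
        # fields[0] is the IRQ number ("98:"); the next ncpu fields are per-CPU counts.
        for i, value in enumerate(fields[1:1 + ncpu]):
            if value.isdigit():
                totals[i] += int(value)
    return totals


def busiest_share(totals: List[int]) -> float:
    """Fraction of matching interrupts handled by the single busiest CPU."""
    total = sum(totals)
    return max(totals) / total if total else 0.0
```

A busiest_share() close to 1.0 corresponds to the "mostly Core 0" pattern described above; after redistribution it should fall well below that, although what counts as even depends on how many queues the controller exposes.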

By aligning the kernel's VFS management with the PHP application's directory lookup patterns, the performance of the Ideko theme was stabilized. The key takeaway is that "lightweight" software still depends on the underlying kernel's ability to navigate the filesystem efficiently. Metadata overhead is often more significant than raw data throughput in a WordPress environment.

To maintain this performance, monitor the slab cache regularly. If the dentry slab grows excessively without a corresponding increase in active file handles, it is time to force a reclamation or adjust the cache pressure.

# Force a one-time reclamation of dentries and inodes
sync; echo 2 > /proc/sys/vm/drop_caches
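The rule of thumb above (a dentry slab that grows without a matching rise in active file handles) can be turned into a simple check. The sketch compares nr_dentry from /proc/sys/fs/dentry-state with the allocated-handle count in /proc/sys/fs/file-nr (whose three fields are allocated handles, free allocated handles, and the maximum). The dentries-per-handle ratio, the baseline, and the 2x tolerance are my own hypothetical heuristic, not a kernel metric; the baseline would be recorded while the node is known to be healthy.

```python
def parse_nr_dentry(text: str) -> int:
    """First field of /proc/sys/fs/dentry-state: total allocated dentries."""
    return int(text.split()[0])


def parse_file_nr(text: str) -> int:
    """Allocated file handles: the first of the three fields in /proc/sys/fs/file-nr."""
    allocated, _free, _maximum = (int(v) for v in text.split())
    return allocated


def dentries_per_handle(nr_dentry: int, allocated_handles: int) -> float:
    return nr_dentry / max(allocated_handles, 1)


def should_reclaim(nr_dentry: int, allocated_handles: int,
                   baseline_ratio: float, tolerance: float = 2.0) -> bool:
    """Hypothetical rule: flag when the ratio exceeds tolerance x the healthy baseline."""
    return dentries_per_handle(nr_dentry, allocated_handles) > baseline_ratio * tolerance


def check_node(baseline_ratio: float) -> bool:
    """Read both /proc files (no root needed for reading) and apply the rule."""
    with open("/proc/sys/fs/dentry-state") as fh:
        nr_dentry = parse_nr_dentry(fh.read())
    with open("/proc/sys/fs/file-nr") as fh:
        handles = parse_file_nr(fh.read())
    return should_reclaim(nr_dentry, handles, baseline_ratio)
```

When check_node() starts returning True, raising vfs_cache_pressure is the gentler response; drop_caches remains the one-off fallback, since it also discards useful cached metadata that must then be re-read.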

Avoid relying on default kernel parameters for high-memory systems. The defaults are tuned for general-purpose workloads, not for high-concurrency PHP workers performing thousands of filesystem checks per second. Keep the path resolution cache large and the kernel dentry slab lean.