Ephemeral port exhaustion and TIME_WAIT socket accumulation

Debugging PHP-FPM connection drops via nf_conntrack tables

The Observation

A subtle anomaly appeared during a routine audit of the HAProxy load balancer metrics routing traffic to a cluster of application nodes. The backend servers were reporting a steady 0.05% connection drop rate during periods of sustained, yet nominal, load. The infrastructure utilizes Debian 12, running Nginx 1.24 and PHP-FPM 8.2. CPU utilization on the application nodes remained stable at 18%, and memory consumption was well within physical limits, with no swapping observed.

The application layer hosts an instance of the Digitax - Elementor Digital Store WooCommerce WordPress Theme. This specific environment handles digital product delivery, which involves issuing software license keys. Each order completion triggers a backend process that validates the generated license key against an external, third-party licensing API before dispatching the email payload to the customer.

Standard top-level metrics did not indicate any resource starvation. The database query execution times were under 15 milliseconds. The application logic was executing within expected latency thresholds. However, the intermittent connection failures indicated a bottleneck at the network transport layer or within the kernel's network stack configuration.

Network Transport Layer Analysis

When connection drops occur without corresponding CPU or memory saturation, the investigation must shift to the operating system's network socket states. I bypassed standard application logs and queried the socket statistics directly using the ss utility.

ss -s

The summary output revealed a significant skew in socket states.

Total: 34102 (kernel 34210)
TCP:   32104 (estab 410, closed 31502, orphaned 12, synrecv 0, timewait 31401/0), ports 0

Transport Total     IP        IPv6
*         34210     -         -
RAW       0         0         0
UDP       12        8         4
TCP       32104     32100     4
INET      32116     32108     8
FRAG      0         0         0

The system was holding 31,401 sockets in the TIME_WAIT state.

To understand the scope of this accumulation, I examined the configured local port range, which dictates the number of ephemeral ports available for outbound connections.

cat /proc/sys/net/ipv4/ip_local_port_range
32768   60999

The kernel was configured to use ports 32,768 through 60,999 for outgoing connections, an inclusive range of 28,232 ephemeral ports. The ss output showed 31,502 closed sockets, 31,401 of them in TIME_WAIT. A local port can be reused only toward a different destination address and port, so the connections to the single licensing API endpoint had effectively exhausted the outbound port pool for that destination. When a PHP-FPM worker attempted to open a new connection to the external licensing API, the kernel could not allocate a local port. The connection attempt failed immediately, the failure propagated back up the stack as an application error, and the load balancer ultimately recorded a dropped connection under its upstream timeout configuration.
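The size of the pool follows directly from the two numbers in ip_local_port_range, counting both endpoints. A minimal Python sketch of that arithmetic, using the values shown above (the function name is my own, not a kernel or library API):

```python
def ephemeral_pool_size(port_range_text: str) -> int:
    """Return the number of ports in an inclusive 'low high' range string."""
    low, high = (int(field) for field in port_range_text.split())
    return high - low + 1


# Values copied from the command outputs above.
pool = ephemeral_pool_size("32768\t60999")
time_wait_sockets = 31401

print(f"ephemeral ports available: {pool}")
print(f"sockets in TIME_WAIT:      {time_wait_sockets}")
print(f"pool oversubscribed:       {time_wait_sockets > pool}")
```

The same function accepts the contents of /proc/sys/net/ipv4/ip_local_port_range read from a live host.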

The Mechanics of TIME_WAIT

To address this, it is necessary to examine the TCP state machine precisely as defined by RFC 793. The TIME_WAIT state is not an error; it is an intentional, required component of the Transmission Control Protocol teardown process.

When a client (in this case, the PHP application) initiates the closure of a TCP connection, it sends a FIN packet to the server. The server acknowledges this with an ACK and subsequently sends its own FIN. The client acknowledges the server's FIN with a final ACK. At the moment the client sends this final ACK, the socket enters the TIME_WAIT state.
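The teardown sequence just described can be written as a small transition table. This is a simplified Python model of the active-close path only; the event names are my own labels, and it leaves out simultaneous close and the case where the peer's FIN and ACK arrive in one segment.

```python
from enum import Enum, auto


class State(Enum):
    ESTABLISHED = auto()
    FIN_WAIT_1 = auto()
    FIN_WAIT_2 = auto()
    TIME_WAIT = auto()
    CLOSED = auto()


# (current state, event) -> next state, for the side that closes first.
ACTIVE_CLOSE = {
    (State.ESTABLISHED, "send FIN"): State.FIN_WAIT_1,
    (State.FIN_WAIT_1, "recv ACK"): State.FIN_WAIT_2,
    (State.FIN_WAIT_2, "recv FIN, send ACK"): State.TIME_WAIT,
    (State.TIME_WAIT, "2MSL timer expires"): State.CLOSED,
}


def walk(events, start=State.ESTABLISHED):
    """Apply each event in order and return the list of states visited."""
    state = start
    path = [state]
    for event in events:
        state = ACTIVE_CLOSE[(state, event)]
        path.append(state)
    return path


events = ["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timer expires"]
for visited in walk(events):
    print(visited.name)
```

Only the side that sends the first FIN passes through TIME_WAIT, which is why the state accumulates on the application node rather than on the API server.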

The purpose of TIME_WAIT is twofold. First, it ensures that the remote end receives the final ACK. If the final ACK is lost in transit, the server will retransmit its FIN. The client must maintain the socket state to be able to reply with another ACK. If the client closed the socket immediately, it would respond with an RST (Reset), which the server might interpret as an error.

Second, it prevents delayed segments from one connection from being accepted by a later connection utilizing the exact same four-tuple (source IP, source port, destination IP, destination port).

The duration of the TIME_WAIT state is defined as 2 times the Maximum Segment Lifetime (2MSL). In the Linux kernel, this value is hardcoded and cannot be altered via standard sysctl parameters without recompiling the kernel.

I inspected the Linux kernel source code for the TCP implementation (include/net/tcp.h).

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                                  * state, about 60 seconds     */

Every connection initiated by the PHP application to the external API consumes an ephemeral port for the duration of the request (approximately 100 milliseconds), plus a mandatory 60-second penalty phase in the TIME_WAIT state after the connection is closed.

If the application pool makes 500 requests to the external API per second, the math dictates the port consumption rate. 500 requests/second * 60 seconds = 30,000 ports held in TIME_WAIT.

Since the available port range holds only 28,232 ports, port exhaustion is guaranteed under this workload profile.
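The same arithmetic can be turned around to find the highest request rate a given pool can sustain. A rough Python sketch, assuming every request goes to the same destination IP and port, each request holds its port for about 100 milliseconds, and each closed socket then sits in TIME_WAIT for 60 seconds (the function names are illustrative):

```python
TIME_WAIT_SECONDS = 60  # TCP_TIMEWAIT_LEN from the kernel source quoted above


def ports_held(requests_per_second: float, request_seconds: float = 0.1) -> float:
    """Average ports occupied: in-flight requests plus sockets in TIME_WAIT."""
    return requests_per_second * (request_seconds + TIME_WAIT_SECONDS)


def max_sustainable_rate(pool_size: int, request_seconds: float = 0.1) -> float:
    """Highest steady request rate that keeps port usage within the pool."""
    return pool_size / (request_seconds + TIME_WAIT_SECONDS)


default_pool = 60999 - 32768 + 1  # inclusive default range

print(f"ports held at 500 req/s: {ports_held(500):.0f}")
print(f"max rate for {default_pool} ports: {max_sustainable_rate(default_pool):.1f} req/s")
```

With the default range, the ceiling sits just under 470 requests per second to a single endpoint, below the 500 requests per second used in the example.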

Analyzing the Application Request Logic

The fundamental issue is the rate of connection establishment and termination. I isolated the application code responsible for the external API calls. The logic was located within a custom function appended to the theme's order processing hooks.

// wp-content/themes/digitax/inc/api/class-license-validator.php
public function validate_license_key( $key ) {
    $endpoint = 'https://api.external-licensing-provider.com/v1/validate';
    $payload = json_encode( array( 'license_key' => $key ) );

    $ch = curl_init( $endpoint );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_POST, true );
    curl_setopt( $ch, CURLOPT_POSTFIELDS, $payload );
    curl_setopt( $ch, CURLOPT_HTTPHEADER, array(
        'Content-Type: application/json',
        'Authorization: Bearer ' . $this->api_token
    ) );
    curl_setopt( $ch, CURLOPT_TIMEOUT, 5 );

    $response = curl_exec( $ch );
    $http_code = curl_getinfo( $ch, CURLINFO_HTTP_CODE );
    curl_close( $ch );

    if ( $http_code === 200 ) {
        return true;
    }
    return false;
}

The PHP curl implementation initializes a completely new TCP connection for every function invocation. Upon reaching curl_close( $ch ), the connection is actively torn down by the client, placing the socket into the TIME_WAIT state on the Debian host.

In a basic WooCommerce theme, external calls might be infrequent, making this pattern acceptable. However, for a digital delivery platform processing concurrent orders and verifying multiple keys per order, the lack of connection reuse (HTTP Keep-Alive) at the application layer becomes a system-level bottleneck.
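The difference between opening a connection per call and reusing one can be observed on a single machine. The following Python sketch is not the theme's PHP code; it starts a throwaway local HTTP server (a stand-in for the licensing API) and counts how many TCP connections the server accepts under each pattern. All class and function names are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows keep-alive on the server side
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        with CountingHandler.lock:
            CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(requests: int, reuse: bool) -> int:
    """Send `requests` GETs and return how many TCP connections were accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    shared = http.client.HTTPConnection("127.0.0.1", port) if reuse else None
    try:
        for _ in range(requests):
            conn = shared if reuse else http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", "/")
            conn.getresponse().read()
            if not reuse:
                conn.close()  # client closes first, so the client side gets TIME_WAIT
        if shared is not None:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("connection per request:", count_connections(5, reuse=False))
    print("one reused connection: ", count_connections(5, reuse=True))
```

Each connection counted in the first pattern leaves one socket in TIME_WAIT on the client when it closes; the second pattern leaves a single one no matter how many requests are sent.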

I used tcpdump to capture the raw packets exchanged during the API call and inspect the connection lifecycle.

tcpdump -i eth0 -nn -A -s 0 'host api.external-licensing-provider.com and tcp port 443' > dump.txt

While the payload was encrypted via TLS, the connection lifecycle was visible in the TCP flags.

14:02:01.102000 IP 10.0.1.5.48201 > 198.51.100.10.443: Flags [S], seq 102930102, win 64240, options [mss 1460,sackOK,TS val 1029301 ecr 0,nop,wscale 7], length 0
14:02:01.122000 IP 198.51.100.10.443 > 10.0.1.5.48201: Flags [S.], seq 482910201, ack 102930103, win 65160, options [mss 1460,sackOK,TS val 482910 ecr 1029301,nop,wscale 7], length 0
14:02:01.122010 IP 10.0.1.5.48201 > 198.51.100.10.443: Flags [.], ack 1, win 502, options [nop,nop,TS val 1029302 ecr 482910], length 0
... [TLS Handshake and Data Transfer] ...
14:02:01.210000 IP 10.0.1.5.48201 > 198.51.100.10.443: Flags [F.], seq 1020, ack 4092, win 502, options [nop,nop,TS val 1029400 ecr 482950], length 0
14:02:01.230000 IP 198.51.100.10.443 > 10.0.1.5.48201: Flags [F.], seq 4092, ack 1021, win 510, options [nop,nop,TS val 482960 ecr 1029400], length 0
14:02:01.230010 IP 10.0.1.5.48201 > 198.51.100.10.443: Flags [.], ack 4093, win 502, options [nop,nop,TS val 1029420 ecr 482960], length 0

The first FIN ([F.] at 14:02:01.210000) is sent by 10.0.1.5, the application node, and the final ACK of the exchange also comes from it. This confirms that the application node performs the active close and therefore bears the TIME_WAIT penalty.
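Identifying the active closer can be automated by finding which side sends the first segment with the FIN flag set. A small Python sketch, assuming tcpdump's default one-line text format and a capture containing a single connection (a mixed capture would need grouping by four-tuple first); the sample lines are shortened copies of the capture above.

```python
import re

# Matches the "IP src.port > dst.port: Flags [..]" part of a tcpdump line.
LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]+)\]"
)


def active_closer(capture: str):
    """Return the source IP of the first segment carrying FIN, or None."""
    for line in capture.splitlines():
        match = LINE.search(line)
        if match and "F" in match.group("flags"):
            return match.group("src")
    return None


capture = """\
14:02:01.102000 IP 10.0.1.5.48201 > 198.51.100.10.443: Flags [S], seq 102930102, length 0
14:02:01.210000 IP 10.0.1.5.48201 > 198.51.100.10.443: Flags [F.], seq 1020, ack 4092, length 0
14:02:01.230000 IP 198.51.100.10.443 > 10.0.1.5.48201: Flags [F.], seq 4092, ack 1021, length 0
"""

print("active closer:", active_closer(capture))
```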

Examining Kernel Memory Structures

It is pertinent to review the memory footprint of these sockets. While memory was not the primary bottleneck in this specific occurrence, holding 30,000 sockets in TIME_WAIT utilizes kernel memory.

I analyzed the slab allocator statistics to view the memory consumed by TCP structures.

grep -i tcp /proc/slabinfo
tcp_bind_bucket_cache     402    402     64   64    1 : tunables  120   60    8 : slabdata      7      7      0
tcp_bind2_bucket_cache    201    201     64   64    1 : tunables  120   60    8 : slabdata      4      4      0
tw_sock_TCP             31401  31401    256   16    1 : tunables  120   60    8 : slabdata   1963   1963      0

The tw_sock_TCP cache holds the TIME_WAIT socket structures. In the Linux kernel, a socket in TIME_WAIT does not utilize the full struct tcp_sock, which is quite large. Instead, it uses a smaller, optimized structure defined in include/net/inet_timewait_sock.h. An abridged view follows; the exact fields and their order vary between kernel versions.

struct inet_timewait_sock {
    /* Common definitions for all sockets */
    struct sock_common  __tw_common;
    
    /* Timewait specific fields */
    int                 tw_timeout;
    volatile unsigned char tw_substate;
    unsigned char       tw_rcv_wscale;
    
    /* Routing parameters */
    __be16              tw_sport;
    __be32              tw_daddr __attribute__((aligned(INET_TIMEWAIT_ADDRCMP_ALIGN_BYTES)));
    __be32              tw_rcv_saddr;
    __be16              tw_dport;
    
    /* Sequence numbers */
    __u32               tw_snd_nxt;
    __u32               tw_rcv_nxt;
    __u32               tw_ts_recent;
    long                tw_ts_recent_stamp;
    
    /* Additional state... */
};

As shown in the slabinfo output, each tw_sock_TCP object occupies 256 bytes on this architecture. 31,401 sockets * 256 bytes = 8,038,656 bytes, or roughly 8 MB. The memory overhead is negligible. The issue remains strictly the depletion of the port space.
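The footprint can be computed straight from a slabinfo line. A minimal Python sketch, assuming the column order name, active_objs, num_objs, objsize, objperslab, pagesperslab; the type and function names are my own.

```python
from typing import NamedTuple


class SlabCache(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int
    objperslab: int
    pagesperslab: int


def parse_slab_line(line: str) -> SlabCache:
    """Parse the first six whitespace-separated columns of a slabinfo row."""
    fields = line.split()
    return SlabCache(fields[0], *(int(value) for value in fields[1:6]))


line = (
    "tw_sock_TCP             31401  31401    256   16    1 "
    ": tunables  120   60    8 : slabdata   1963   1963      0"
)
cache = parse_slab_line(line)
object_bytes = cache.active_objs * cache.objsize

print(f"{cache.name}: {object_bytes} bytes ({object_bytes / 2**20:.2f} MiB)")
```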

Conntrack Table Contention

A secondary consequence of excessive TIME_WAIT sockets involves nf_conntrack, the netfilter connection tracking system utilized by iptables and NAT.

When a socket is in TIME_WAIT, the corresponding entry in the conntrack table must also be maintained. I queried the current conntrack state.

cat /proc/sys/net/netfilter/nf_conntrack_count
34210

This matched the total transport count. I then checked the maximum limit.

cat /proc/sys/net/netfilter/nf_conntrack_max
65536

While the conntrack table was not entirely full (34,210 out of 65,536), it was operating at over 50% capacity solely due to dead connections. The conntrack table utilizes a hash table data structure. As the table fills, hash collisions increase, which forces the kernel to traverse linked lists to find matching connection entries, marginally increasing CPU system time (%sy) for every packet processed by the network interface.

I examined the conntrack hash table configuration.

cat /sys/module/nf_conntrack/parameters/hashsize
16384

With 34,210 entries distributed across 16,384 hash buckets, the average bucket holds about 2.1 connections. Conntrack hashes each connection under both its original and reply tuples, so the chains the kernel walks on every lookup are roughly twice that long.

A specific TIME_WAIT connection entry in the conntrack table (/proc/net/nf_conntrack) looks like this:

ipv4     2 tcp      6 58 TIME_WAIT src=10.0.1.5 dst=198.51.100.10 sport=48201 dport=443 src=198.51.100.10 dst=10.0.1.5 sport=443 dport=48201 [ASSURED] mark=0 zone=0 use=2

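The remaining timeout (58 seconds) is conntrack's own TIME_WAIT timer, governed by net.netfilter.nf_conntrack_tcp_timeout_time_wait (120 seconds by default). It runs independently of the kernel's 60-second TCP_TIMEWAIT_LEN, so a conntrack entry can outlive the socket it describes.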
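For bulk analysis, an entry like the one above can be split into its fields. A Python sketch, assuming the field layout of the sample line (layer 3 name and number, layer 4 name and number, remaining timeout, state, then key=value pairs); the function name and dictionary keys are hypothetical.

```python
def parse_conntrack_entry(line: str) -> dict:
    """Extract protocol, timeout, state, and both tuples from one entry."""
    fields = line.split()
    pairs = [field.split("=", 1) for field in fields[6:] if "=" in field]
    return {
        "l3proto": fields[0],
        "l4proto": fields[2],
        "timeout_seconds": int(fields[4]),
        "state": fields[5],
        # The first src/dst/sport/dport group is the original direction,
        # the second group is the expected reply direction.
        "original": dict(pairs[:4]),
        "reply": dict(pairs[4:8]),
    }


line = (
    "ipv4     2 tcp      6 58 TIME_WAIT src=10.0.1.5 dst=198.51.100.10 "
    "sport=48201 dport=443 src=198.51.100.10 dst=10.0.1.5 sport=443 "
    "dport=48201 [ASSURED] mark=0 zone=0 use=2"
)
entry = parse_conntrack_entry(line)

print(entry["state"], entry["timeout_seconds"], entry["original"])
```

Counting entries by state and destination with a parser like this shows how much of the table is occupied by closed connections to a single endpoint.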

Network Protocol Manipulation and Port Ranging

The most immediate method to alleviate ephemeral port exhaustion is to simply increase the available pool. The default range of 32768 to 60999 is standard across many Linux distributions.

TCP port numbers are 16-bit unsigned integers, so the highest valid port is 65,535. Ports 0 through 1023 are reserved privileged ports. Ports 1024 through 10000 are often reserved for specific application listeners (e.g., MySQL on 3306, Redis on 6379, alternative web servers on 8080).

I expanded the local port range to the widest span that avoids the privileged ports. Because this range overlaps ports used by local listeners such as 3306, 6379, and 8080, those ports should be listed in net.ipv4.ip_local_reserved_ports so the kernel never hands them out as ephemeral ports.

sysctl -w net.ipv4.ip_local_port_range="1024 65535"

This change increased the ephemeral port pool from 28,232 to 64,512. However, enlarging the pool does not address the underlying architectural defect. If the request rate to the licensing API increases from 500 requests per second to 1,200 requests per second, the math guarantees failure again: 1,200 requests/second * 60 seconds = 72,000 required ports. The 64,512-port pool will be exhausted.

Kernel Parameter Adjustments for Socket Recycling

The Linux network stack provides specific tunable parameters to alter the strict behavior of the RFC 793 TIME_WAIT specification.

tcp_max_tw_buckets

The kernel maintains a limit on the total number of TIME_WAIT sockets it will track globally.

cat /proc/sys/net/ipv4/tcp_max_tw_buckets
131072

If the number of TIME_WAIT sockets exceeds this value, the kernel immediately destroys the socket instead of putting it into TIME_WAIT and logs a warning message (TCP: time wait bucket table overflow). While setting this limit to a very low number (e.g., 5000) would prevent port exhaustion by forcefully pruning closed sockets, it violates the TCP protocol and can lead to data corruption if delayed segments arrive and are matched to a newly allocated connection that happened to use the identical four-tuple.

tcp_tw_reuse

The precise mechanism designed to safely mitigate this scenario is tcp_tw_reuse.

cat /proc/sys/net/ipv4/tcp_tw_reuse
2

On current kernels, including the 6.1 kernel that ships with Debian 12, the value 2 is the default and means that TIME_WAIT sockets are reused for outgoing connections on the loopback interface only. A value of 1 allows reuse on all interfaces, and 0 disables reuse.

When tcp_tw_reuse is enabled (1), the kernel allows a new outgoing connection to allocate an ephemeral port that is currently occupied by a socket in the TIME_WAIT state, provided that the new connection is initiating communication with the same destination IP and port, and the TCP timestamp of the new connection is strictly greater than the most recent timestamp recorded for the old connection.

This relies on TCP Timestamps (RFC 1323/7323) being enabled. I verified the timestamp configuration.

cat /proc/sys/net/ipv4/tcp_timestamps
1

With timestamps enabled, the kernel can safely distinguish between a delayed segment from the old connection and a valid segment from the new connection based on the monotonic timestamp value. If a delayed segment arrives from the previous connection, its timestamp will be older, and the kernel will discard it via the Protection Against Wrapped Sequences (PAWS) mechanism.

By enabling tcp_tw_reuse = 1, the 30,000 sockets in TIME_WAIT targeting 198.51.100.10:443 become immediately available for reallocation by new PHP-FPM processes attempting to connect to the same external API.
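The two checks described in this section can be modeled in a few lines. This is a simplified Python illustration of the rules as stated above, not a transcription of the kernel code; the function and parameter names are my own, and timestamps are treated as 32-bit counters that may wrap.

```python
MASK32 = 0xFFFFFFFF


def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b, allowing for wraparound."""
    return ((a - b) & MASK32) > 0x7FFFFFFF


def paws_rejects(segment_tsval: int, ts_recent: int) -> bool:
    """PAWS: drop a segment whose timestamp is older than the last one accepted."""
    return ts_before(segment_tsval, ts_recent)


def may_reuse_time_wait(tw_reuse: int, is_loopback: bool,
                        timestamps_enabled: bool,
                        last_ts: int, new_ts: int) -> bool:
    """Decide whether an outgoing connection may take over a TIME_WAIT port."""
    if not timestamps_enabled or tw_reuse == 0:
        return False
    if tw_reuse == 2 and not is_loopback:
        return False  # mode 2 only covers loopback traffic
    return ts_before(last_ts, new_ts)  # new timestamp must be strictly newer


# A delayed segment from the old connection (TS 1029400) arrives after the
# socket last recorded TS 1029420, so it is discarded.
print(paws_rejects(1029400, 1029420))
# With tcp_tw_reuse = 1, a later connection to the API may reuse the port.
print(may_reuse_time_wait(1, False, True, last_ts=1029420, new_ts=1030000))
# With the default of 2, the same non-loopback connection may not.
print(may_reuse_time_wait(2, False, True, last_ts=1029420, new_ts=1030000))
```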

tcp_tw_recycle

Historically, engineers would often enable tcp_tw_recycle alongside reuse. I explicitly verified that this parameter does not exist in this environment.

sysctl -a | grep tcp_tw_recycle
# No output

The tcp_tw_recycle parameter aggressively recycled TIME_WAIT sockets for both incoming and outgoing connections based on cached timestamps. However, it caused severe connection drops for clients behind NAT devices because multiple clients behind the same public IP address would have desynchronized timestamps. The kernel would drop valid SYN packets from NAT'd clients if their timestamp was lower than a previous client's timestamp. Consequently, tcp_tw_recycle was completely removed from the Linux kernel in version 4.12.
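The NAT failure mode is easy to reproduce in a toy model. The Python sketch below keeps one "last seen timestamp" per source IP, which is the essence of the behavior described above; the real implementation applied the check only within a limited time window, and the class name, addresses, and timestamp values here are invented for illustration.

```python
class RecycleModel:
    """Toy model of per-source-IP timestamp checking on incoming SYNs."""

    def __init__(self):
        self.last_ts_by_ip = {}

    def accept_syn(self, src_ip: str, tsval: int) -> bool:
        last = self.last_ts_by_ip.get(src_ip)
        if last is not None and tsval < last:
            return False  # looks like an old duplicate, so the SYN is dropped
        self.last_ts_by_ip[src_ip] = tsval
        return True


server = RecycleModel()
nat_public_ip = "203.0.113.7"  # two clients share this address behind NAT

# Client A has been up for a long time and carries a high timestamp.
print("client A accepted:", server.accept_syn(nat_public_ip, 900_000))
# Client B booted recently; its lower timestamp looks stale to the server.
print("client B accepted:", server.accept_syn(nat_public_ip, 120_000))
```

The server cannot tell the two clients apart by address, so a perfectly valid SYN from the second client is silently discarded.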

Modifying the Application Layer Execution

While tcp_tw_reuse masks the symptom, the optimal resolution is to modify the application transport behavior. PHP's curl bindings allow for the enforcement of persistent connections if the target server supports HTTP/1.1 Keep-Alive.

However, in the context of PHP-FPM, variables and handles do not persist across requests. When a PHP script finishes execution, all internal resources, including curl handles and their associated TCP sockets, are destroyed by the Zend Engine's memory manager.

Therefore, standard HTTP Keep-Alive is ineffective within a single stateless PHP-FPM worker processing sequential, disjointed requests. To maintain a persistent connection pool from PHP to the external API, an intermediary proxy must be utilized.

Rather than having PHP call the external API directly, I reconfigured the local Nginx instance to act as a local proxy in front of the external API, using Nginx's keepalive directive in the upstream block.

I constructed a specific Nginx virtual host for internal routing.

# /etc/nginx/conf.d/internal_api_proxy.conf
upstream external_api {
    server api.external-licensing-provider.com:443;
    keepalive 64;
}

server {
    listen 127.0.0.1:8080;
    server_name localhost;

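    location / {
        proxy_pass https://external_api;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host api.external-licensing-provider.com;
        proxy_ssl_server_name on;
        # Without this, the SNI name defaults to the upstream block name ("external_api").
        proxy_ssl_name api.external-licensing-provider.com;
    }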
}

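The keepalive 64 directive instructs each Nginx worker process to keep up to 64 idle connections to the external API server open for reuse. Connections are opened on demand rather than in advance, and once the idle cache is full the least recently used connection is closed.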
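The effect of a bounded idle cache can be illustrated with a small model. This Python sketch is only an approximation of the idea (reuse an idle connection when one exists, close the least recently used one when the cache overflows); it is not how Nginx is implemented, and every name in it is hypothetical.

```python
from collections import deque


class IdleConnectionCache:
    """Bounded cache of idle upstream connections."""

    def __init__(self, max_idle: int):
        self.max_idle = max_idle
        self.idle = deque()
        self.opened = 0
        self.closed = 0

    def acquire(self) -> str:
        if self.idle:
            return self.idle.pop()  # reuse the most recently released connection
        self.opened += 1            # nothing idle: a new TCP/TLS handshake
        return f"conn-{self.opened}"

    def release(self, conn: str) -> None:
        self.idle.append(conn)
        if len(self.idle) > self.max_idle:
            self.idle.popleft()     # evict the least recently used connection
            self.closed += 1


# 1,000 sequential requests need a single upstream connection.
sequential = IdleConnectionCache(max_idle=64)
for _ in range(1000):
    sequential.release(sequential.acquire())
print("sequential:", sequential.opened, "opened,", sequential.closed, "closed")

# A burst of 100 concurrent requests opens 100 connections and keeps 64.
burst = IdleConnectionCache(max_idle=64)
in_flight = [burst.acquire() for _ in range(100)]
for conn in in_flight:
    burst.release(conn)
print("burst:", burst.opened, "opened,", burst.closed, "closed,", len(burst.idle), "idle")
```

Connections closed on overflow are closed by the proxy, so a burst that far exceeds the cache size can still produce some TIME_WAIT sockets on the external interface.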

I then modified the PHP application code to route requests through the local Nginx proxy.

// wp-content/themes/digitax/inc/api/class-license-validator.php
public function validate_license_key( $key ) {
    $endpoint = 'http://127.0.0.1:8080/v1/validate'; // Modified endpoint
    $payload = json_encode( array( 'license_key' => $key ) );

    $ch = curl_init( $endpoint );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    // ... existing options ...
    curl_setopt( $ch, CURLOPT_TIMEOUT, 5 );

    $response = curl_exec( $ch );
    $http_code = curl_getinfo( $ch, CURLINFO_HTTP_CODE );
    curl_close( $ch );

    if ( $http_code === 200 ) {
        return true;
    }
    return false;
}

The sequence of events is now fundamentally altered.

  1. The PHP script initiates a connection to 127.0.0.1:8080.
  2. This local connection uses the loopback interface, so it never touches the physical network and adds very little latency.
  3. Nginx receives the request and forwards it over an idle persistent TLS connection to the external API, opening a new one only when none is available.
  4. When the PHP script terminates and closes the local socket, that loopback socket enters TIME_WAIT. However, because it is on the loopback interface, the default tcp_tw_reuse = 2 automatically reuses it without manual intervention.
  5. The external TCP connection from Nginx to the API server remains established and open, eliminating the TLS handshake overhead and preventing any TIME_WAIT accumulation on the public network interface.

Monitoring the Structural Shift

After applying the Nginx upstream proxy configuration and deploying the modified application logic, I re-evaluated the socket states.

watch -n 1 'ss -s'
Total: 420 (kernel 510)
TCP:   382 (estab 290, closed 50, orphaned 0, synrecv 0, timewait 42/0), ports 0

Transport Total     IP        IPv6
*         510       -         -
RAW       0         0         0
UDP       12        8         4
TCP       382       380       2
INET      394       388       6
FRAG      0         0         0

The TIME_WAIT count plummeted from 31,401 to 42. The estab (established) connection count stabilized around 290, reflecting the active Nginx worker connections and the 64 persistent upstream sockets.

I checked the load balancer metrics. The 0.05% connection drop rate disappeared completely. The ephemeral port pool was no longer under pressure. Furthermore, because the TLS handshake was bypassed for subsequent API calls by the Nginx proxy, the average execution time of the validate_license_key function dropped from 120ms to 45ms, accelerating the total checkout processing speed for the application.

Analyzing the Packet Headers Post-Implementation

To confirm the persistent nature of the connections, I performed a final packet capture on the external network interface.

tcpdump -i eth0 -nn -l -A -s 0 'host api.external-licensing-provider.com and tcp port 443' | grep -E 'Flags \[[SF]'

Over a five-minute observation window during active order processing, the output remained blank. Zero SYN packets and zero FIN packets were exchanged with the external provider. Data transmission occurred entirely within the established streams.

The TCP headers observed during data exchange confirmed standard payload delivery and acknowledgment without connection state transitions.

14:45:01.100000 IP 10.0.1.5.54320 > 198.51.100.10.443: Flags [P.], seq 1024:2048, ack 4096, win 502, options [nop,nop,TS val 1045200 ecr 492100], length 1024
14:45:01.115000 IP 198.51.100.10.443 > 10.0.1.5.54320: Flags [.], ack 2048, win 510, options [nop,nop,TS val 492115 ecr 1045200], length 0

The [P.] (Push and Acknowledgment) flags indicated active data transfer. With the upstream connections held open, TIME_WAIT sockets no longer accumulated on the external interface, which resolved the root issue through connection reuse rather than by enlarging the port pool.

Resolution Configuration

Apply the following modifications to /etc/sysctl.conf to optimize the network stack for high-frequency short-lived connections when application-layer pooling is not immediately feasible.

net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_timestamps = 1
net.netfilter.nf_conntrack_max = 262144