The server is on a Gbit link. ethtool confirms 1000 Mbps on the interface.
And yet every SFTP transfer caps out somewhere around 800 KB/s.
Not 80 MB/s — 800 kilobytes per second. Less than 1% of the theoretical capacity.
The problem isn't bandwidth. It's a stack of bad defaults: a congestion control algorithm designed for year-2000 networks, microscopically small NIC ring buffers, and application socket buffers sized for DSL lines. Here are the five tweaks that fixed it.
TCP BBR: replacing CUBIC
CUBIC has been Linux's default congestion control algorithm since 2006. It works by growing the congestion window (cwnd) until it detects a packet loss, then cutting it in half. On modern low-latency, high-bandwidth networks, this loss-triggered reaction is a problem: a single dropped packet (thermal noise, NIC buffer overflow) causes a 50% throughput drop followed by a slow recovery.
BBR (Bottleneck Bandwidth and Round-trip propagation time), developed by Google in 2016, doesn't react to loss — it models the actual link throughput by continuously measuring bandwidth and minimum RTT. It holds the window at the optimal level without waiting for drops. On a Gbit link under real traffic, the difference is immediately noticeable.
# /etc/sysctl.d/99-bbr.conf
net.ipv4.tcp_congestion_control = bbr # BBR replaces CUBIC
net.ipv4.tcp_slow_start_after_idle = 0 # keep cwnd between download pauses
net.ipv4.tcp_no_metrics_save = 1 # ignore TCP metrics from previous sessions
net.ipv4.tcp_fastopen = 3 # TCP Fast Open client+server (-1 RTT on reconnects)
net.ipv4.tcp_mtu_probing = 1 # MTU probing if ICMP is blocked upstream
net.ipv4.tcp_wmem = 4096 262144 16777216 # TCP write buffer: default 256 KB, max 16 MB
net.core.netdev_max_backlog = 5000 # packet queue before kernel processing at 1 Gbit/s
net.core.netdev_budget = 600 # packets processed per NAPI cycle
# Apply immediately (persists across reboots via the file)
sysctl -p /etc/sysctl.d/99-bbr.conf
# Verify
sysctl net.ipv4.tcp_congestion_control
# → net.ipv4.tcp_congestion_control = bbr
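The tcp_wmem maximum above (16 MB) is meant to cover the bandwidth-delay product (BDP) with headroom. A quick sketch of the arithmetic — the 20 ms RTT is an assumption for illustration:

```shell
# BDP = link rate x RTT: the amount of data that must be in flight
# to keep the pipe full. RTT here is an assumed value.
RATE_BITS=1000000000   # 1 Gbit/s
RTT_MS=20              # assumed round-trip time
BDP_BYTES=$(( RATE_BITS / 8 / 1000 * RTT_MS ))
echo "BDP: $BDP_BYTES bytes"   # → BDP: 2500000 bytes (~2.4 MB)
```

A 16 MB ceiling leaves room for higher-RTT paths; on a pure LAN with sub-millisecond RTT, a much smaller maximum would already suffice.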
tcp_slow_start_after_idle deserves special attention for SFTP.
By default, Linux resets cwnd to 10 after a period of inactivity on a TCP connection.
An interactive SFTP session alternates activity and pauses — every time you start
a new transfer after browsing a directory, TCP restarts in slow start.
Disabling this avoids that unnecessary reset.
lsmod | grep bbr
# → tcp_bbr 20480 1
# If absent: modprobe tcp_bbr
NIC ring buffers: the main cause of drops
This is probably the highest-impact change, and the least documented one. A NIC ring buffer is a memory region between the network card and the kernel. The NIC deposits received packets there. If the buffer fills before the kernel processes them, new packets are silently dropped.
The default on most Ethernet cards is 256 or 512 RX entries. At 1 Gbit/s with 1500-byte frames, that's ~83,000 packets per second to absorb. A 256-entry buffer fills in about 3 milliseconds if the kernel is temporarily busy elsewhere — faster still with smaller frames. The result: invisible drops, a collapsing CUBIC cwnd, and stalled throughput.
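The fill-time figure is simple arithmetic; a sketch for the worst case of sustained line-rate traffic with full-size frames:

```shell
# Time for a 256-entry RX ring to fill at 1 Gbit/s with 1500-byte frames.
# Smaller frames mean more packets per second and a proportionally faster fill.
RATE_BITS=1000000000
FRAME_BITS=$(( 1500 * 8 ))
RING_ENTRIES=256
PPS=$(( RATE_BITS / FRAME_BITS ))            # ~83,333 packets per second
FILL_US=$(( RING_ENTRIES * 1000000 / PPS ))  # microseconds until the ring is full
echo "${PPS} pps, ring full after ~${FILL_US} us"
```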
# Check current state
ethtool -g enp2s0f0
# Ring parameters for enp2s0f0:
# Pre-set maximums:
# RX: 4096
# TX: 4096
# Current hardware settings:
# RX: 256 ← too small
# TX: 256 ← too small
# Apply (takes effect immediately, lost on reboot)
ethtool -G enp2s0f0 rx 4096 tx 4096
# Verify
ethtool -g enp2s0f0
# Current hardware settings:
# RX: 4096
# TX: 4096
To make the setting persistent, add it to /etc/network/interfaces
after the interface configuration:
# In /etc/network/interfaces, after dns-search (or at the end of the iface block)
post-up ethtool -G enp2s0f0 rx 4096 tx 4096
Replace enp2s0f0 with your actual interface name
(ip link show to list). Not all NICs support 4096 —
ethtool -g shows the maximums under "Pre-set maximums".
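On distributions that don't use ifupdown (no /etc/network/interfaces), a udev rule is one alternative: it fires whenever the device appears, including at boot. A sketch — the interface name and the ethtool path are assumptions to adjust for your system:

```
# /etc/udev/rules.d/71-nic-rings.rules (sketch; adjust interface name and path)
ACTION=="add", SUBSYSTEM=="net", KERNEL=="enp2s0f0", RUN+="/usr/sbin/ethtool -G enp2s0f0 rx 4096 tx 4096"
```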
# Monitor NIC drops in real time
watch -n1 'ethtool -S enp2s0f0 | grep -i drop'
# rx_missed_errors and rx_fifo_errors indicate ring buffer drops
ProFTPD and Apache: application-level buffers
Once the kernel and NIC layers are tuned, application buffers become the next bottleneck. ProFTPD and Apache defaults are sized for modest connections — they haven't been meaningfully updated since a time when Gbit/s was reserved for carrier backbones.
ProFTPD — socket buffers
ProFTPD exposes SocketOptions to control socket buffer sizes at the TCP level.
Default values (rcvbuf and sndbuf) are typically 87 KB or less.
Bumping to 1 MB lets the kernel hold more data in flight without waiting on client ACKs:
# In /etc/proftpd/proftpd.conf, after UseSendFile on
SocketOptions rcvbuf 1048576 sndbuf 1048576
systemctl reload proftpd
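One caveat worth knowing: buffer sizes requested explicitly via setsockopt() — which is what SocketOptions does — are silently capped by net.core.wmem_max and net.core.rmem_max; the tcp_wmem/tcp_rmem maximums only govern autotuned buffers. A quick check, as a sketch:

```shell
# Explicit SO_SNDBUF/SO_RCVBUF requests are capped at these ceilings;
# the stock wmem_max is often ~208 KB, which would defeat the 1 MB setting.
for f in wmem_max rmem_max; do
  val=$(cat /proc/sys/net/core/$f 2>/dev/null || echo "unavailable")
  echo "net.core.$f = $val (need >= 1048576)"
done
```

If they are below 1 MB, raising them (net.core.wmem_max = 1048576 and net.core.rmem_max = 1048576 in the same sysctl file) lets the ProFTPD request take effect.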
Apache — send buffer
Apache has its own SendBufferSize directive that controls the TCP send
buffer size for HTTP connections. Without it, Apache uses the system default,
often 87 KB. On a Gbit link serving large static files or substantial API responses,
this is a real constraint.
# In /etc/apache2/apache2.conf (global level)
SendBufferSize 1048576
apache2ctl configtest && systemctl reload apache2
TLS: session cache and OCSP Stapling
A full TLS handshake costs 1 to 2 extra RTTs. Over HTTPS, a client that reconnects frequently — a browser reopening connections, a transfer tool that reconnects, monitoring probes — multiplies these RTTs unnecessarily. The TLS session cache allows resuming an existing session without a full handshake.
OCSP Stapling solves a different problem: without stapling, the browser must contact the CA's OCSP server to verify certificate validity. That's an additional external round-trip, potentially slow, on every new connection. With stapling, Apache prefetches the OCSP response and includes it in the TLS handshake — the client doesn't need to contact the CA.
# In /etc/apache2/mods-enabled/ssl.conf
# Shared memory session cache: 10 MB (~40,000 simultaneous sessions)
SSLSessionCache shmcb:${APACHE_RUN_DIR}/ssl_scache(10485760)
SSLSessionCacheTimeout 3600
# OCSP Stapling: Apache prefetches and caches the OCSP response
SSLUseStapling on
SSLStaplingCache shmcb:/var/run/apache2/ssl_stapling(2097152)
SSLStaplingReturnResponderErrors off
apache2ctl configtest && systemctl reload apache2
SSLStaplingReturnResponderErrors off matters: without it, if the upstream
OCSP server is temporarily unreachable, Apache returns the error to the client instead
of falling back gracefully. Not what you want in production.
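The ~40,000 figure in the session cache comment is back-of-envelope arithmetic, assuming shmcb stores roughly 250 bytes per cached session (the exact per-entry footprint varies):

```shell
# Rough shmcb capacity: cache size divided by an assumed per-session footprint.
CACHE_BYTES=10485760   # 10 MB, matching the SSLSessionCache size above
PER_SESSION=250        # assumed average bytes per cached session
echo "~$(( CACHE_BYTES / PER_SESSION )) sessions"
```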
Results and verification
After applying all these changes, SFTP throughput went from ~800 KB/s to several tens of MB/s on the same Gbit connection. The bottleneck was never the network — it was the accumulation of bad defaults at each layer of the stack.
# BBR active
sysctl net.ipv4.tcp_congestion_control
# → bbr
# NIC ring buffers
ethtool -g enp2s0f0 | grep -A5 "Current hardware"
# → RX: 4096 / TX: 4096
# Per-connection TCP stats (retransmissions)
ss -ti
# → the retrans:X/Y field should stay at 0 — it is omitted entirely
#   when a connection has never retransmitted
# SFTP throughput test (1 GB of incompressible data)
dd if=/dev/urandom of=/tmp/testfile bs=1M count=1024
sftp user@server <<< $'put /tmp/testfile /tmp/testfile'
# Verify OCSP Stapling
openssl s_client -connect your-domain.com:443 -status < /dev/null 2>&1 | grep -i "OCSP"
# → OCSP Response Status: successful (0x0)
# Verify TLS session resumption
openssl s_client -connect your-domain.com:443 -reconnect 2>&1 | grep "Reused"
# → Reused, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Conclusion
These settings should have been the defaults for the kernel and daemons long ago. BBR has been stable since 2016. A 256-entry ring buffer on a Gbit card makes no sense. An 87 KB socket buffer on a modern connection is a relic.
The technical difficulty is low — each change is a single line. The problem is that these defaults produce no visible error: throughput is just "disappointing", logs show nothing, and you spend hours looking at the application before checking the network stack.
One caveat: adapt the values to your actual topology. tcp_wmem at 16 MB
on a machine with 512 MB RAM and 1000 concurrent connections may create other problems.
These parameters are calibrated for a dedicated server with few simultaneous connections
on a local or datacenter network.