This document describes the network timeout handling implementation for ThemisDB, providing comprehensive protection against hanging connections, slow clients, and resource exhaustion.
Status: ✅ Production Ready (Phase 4 Complete)
- Introduction
- Architecture
- Components
- Usage Examples
- Configuration
- Best Practices
- Performance Impact
- Troubleshooting
- Academic Foundation
Network operations without timeouts can cause:
- Resource exhaustion - Hanging connections consume file descriptors and memory
- Cascading failures - Slow clients can block server threads
- Poor user experience - No feedback on network issues
- Security vulnerabilities - Slowloris-style DoS attacks
Implemented comprehensive timeout handling with circuit breaker pattern:
- Accept timeout - Prevent indefinite blocking on
accept() - Read timeout - Limit time waiting for client data
- Write timeout - Limit time sending data to client
- Circuit breaker - Automatically reject connections from problematic clients
- Health monitoring - Track timeout rates and connection health
┌─────────────────────────────────────────────────────────────┐
│ SocketTimeoutManager │
├─────────────────────────────────────────────────────────────┤
│ State: HEALTHY → DEGRADED → CIRCUIT_OPEN │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌──────────────┐ │
│ │ Accept Timeout │ │ Read Timeout │ │ Write Timeout│ │
│ │ (5s default) │ │ (30s default) │ │ (30s default)│ │
│ └────────────────┘ └────────────────┘ └──────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Circuit Breaker Logic │ │
│ │ - Track consecutive timeouts │ │
│ │ - Open circuit at threshold (10 timeouts) │ │
│ │ - Reset after cooldown (60s) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Statistics & Monitoring │ │
│ │ - Timeout counts (accept/read/write) │ │
│ │ - Success/failure rates │ │
│ │ - Bytes transferred │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
consecutive_timeouts < threshold/2
┌────────────────────────────────────────────────────┐
│ │
▼ │
┌─────────┐ consecutive_timeouts >= threshold/2 ┌──────────┐
│ HEALTHY │──────────────────────────────────────> │ DEGRADED │
└─────────┘ └──────────┘
▲ │
│ │
│ recordSuccess() consecutive_timeouts │
│ >= threshold │
│ ▼
│ ┌──────────────┐
└──────────────────────────────────────────── │CIRCUIT_OPEN │
│ (60s cooldown)│
└──────────────┘
Main class providing timeout handling and circuit breaker logic.
Key Features:
- Socket configuration with platform-specific timeout APIs
- Non-blocking I/O with timeout support
- TCP keepalive and TCP_NODELAY configuration
- Circuit breaker pattern for problematic connections
- Comprehensive statistics tracking
Header: include/network/socket_timeout_manager.h
Implementation: src/network/socket_timeout_manager.cpp
RAII wrapper for automatic socket cleanup.
Key Features:
- Automatic socket closure on scope exit
- Move semantics support
- Exception-safe resource management
Configuration structure for customizing behavior.
Configurable Parameters:
- Accept/read/write timeouts
- TCP keepalive settings
- Circuit breaker thresholds
- Retry attempts
Statistics tracking for monitoring and debugging.
Tracked Metrics:
- Accept/read/write timeout counts
- Successful operation counts
- Bytes transferred
- Timeout rates
#include "network/socket_timeout_manager.h"
// Configure timeouts
SocketTimeoutConfig config;
config.accept_timeout = std::chrono::seconds(5);
config.read_timeout = std::chrono::seconds(30);
config.write_timeout = std::chrono::seconds(30);
// Create manager
SocketTimeoutManager timeout_manager(config);
// Server socket (pseudo-code)
socket_t server_socket = create_server_socket(port);
timeout_manager.configureSocket(server_socket);
// Accept connections with timeout
while (running) {
socket_t client = timeout_manager.acceptWithTimeout(server_socket);
if (client == INVALID_SOCKET_VALUE) {
// Timeout or error - handle gracefully
continue;
}
// Use RAII guard for automatic cleanup
SocketTimeoutGuard guard(timeout_manager, client);
// Handle client with timeout protection
handle_client(timeout_manager, guard.get());
// Socket automatically closed when guard goes out of scope
}void handle_client(SocketTimeoutManager& manager, socket_t socket) {
std::vector<char> buffer(4096);
// Read with automatic timeout
ssize_t bytes = manager.readWithTimeout(socket, buffer.data(), buffer.size());
if (bytes < 0) {
// Timeout or error
spdlog::warn("Failed to read from client");
return;
}
if (bytes == 0) {
// Connection closed by peer
return;
}
// Process data
process_request(buffer.data(), bytes);
}bool send_response(SocketTimeoutManager& manager, socket_t socket,
const std::string& response) {
size_t total_sent = 0;
while (total_sent < response.size()) {
ssize_t sent = manager.writeWithTimeout(
socket,
response.data() + total_sent,
response.size() - total_sent
);
if (sent < 0) {
spdlog::error("Write timeout or error");
return false;
}
total_sent += sent;
}
return true;
}SocketTimeoutManager manager(config);
// Set up alert callback
manager.setAlertCallback([](SocketHealthState state, const std::string& message) {
switch (state) {
case SocketHealthState::HEALTHY:
spdlog::info("Network health: {}", message);
break;
case SocketHealthState::DEGRADED:
spdlog::warn("Network health degraded: {}", message);
notify_ops_team("Network degradation detected");
break;
case SocketHealthState::CIRCUIT_OPEN:
spdlog::error("Circuit breaker opened: {}", message);
notify_ops_team("URGENT: Network circuit breaker activated");
trigger_auto_scaling(); // Spin up more capacity
break;
}
});
// Use manager normally - alerts triggered automaticallyvoid print_network_stats(const SocketTimeoutManager& manager) {
const auto& stats = manager.getStats();
spdlog::info("Network Statistics:");
spdlog::info(" Accept timeouts: {}", stats.accept_timeouts.load());
spdlog::info(" Read timeouts: {}", stats.read_timeouts.load());
spdlog::info(" Write timeouts: {}", stats.write_timeouts.load());
spdlog::info(" Successful operations: {}", stats.successful_operations.load());
spdlog::info(" Timeout rate: {:.2f}%", stats.getTimeoutRate() * 100.0);
spdlog::info(" Bytes read: {}", stats.total_bytes_read.load());
spdlog::info(" Bytes written: {}", stats.total_bytes_written.load());
spdlog::info(" Health state: {}",
manager.getHealthState() == SocketHealthState::HEALTHY ? "HEALTHY" :
manager.getHealthState() == SocketHealthState::DEGRADED ? "DEGRADED" :
"CIRCUIT_OPEN");
}class NetworkConnectionManager : public DatabaseConnectionManager {
public:
NetworkConnectionManager(const std::string& host, int port)
: host_(host), port_(port) {
// Configure network timeouts
SocketTimeoutConfig config;
config.read_timeout = std::chrono::seconds(30);
config.write_timeout = std::chrono::seconds(30);
timeout_manager_ = std::make_unique<SocketTimeoutManager>(config);
}
protected:
std::shared_ptr<Connection> createConnection() override {
socket_t sock = connect_to_server(host_, port_);
if (sock == INVALID_SOCKET_VALUE) {
return nullptr;
}
// Configure socket with timeouts
timeout_manager_->configureSocket(sock);
return std::make_shared<NetworkConnection>(sock, timeout_manager_);
}
private:
std::string host_;
int port_;
std::unique_ptr<SocketTimeoutManager> timeout_manager_;
};SocketTimeoutConfig config;
config.accept_timeout = std::chrono::milliseconds(5000); // 5s
config.read_timeout = std::chrono::milliseconds(30000); // 30s
config.write_timeout = std::chrono::milliseconds(30000); // 30s
config.keepalive_interval = std::chrono::milliseconds(60000); // 60s
config.enable_tcp_keepalive = true;
config.enable_tcp_nodelay = true;
config.max_retry_attempts = 3;
config.timeout_threshold = 10; // Open circuit after 10 timeouts
config.reset_timeout = std::chrono::seconds(60); // Try again after 60sconfig.accept_timeout = std::chrono::milliseconds(1000); // 1s
config.read_timeout = std::chrono::milliseconds(5000); // 5s
config.write_timeout = std::chrono::milliseconds(5000); // 5s
config.enable_tcp_nodelay = true; // Critical for low latencyconfig.accept_timeout = std::chrono::milliseconds(10000); // 10s
config.read_timeout = std::chrono::milliseconds(60000); // 60s
config.write_timeout = std::chrono::milliseconds(60000); // 60s
config.timeout_threshold = 20; // More tolerantconfig.accept_timeout = std::chrono::milliseconds(2000); // 2s
config.read_timeout = std::chrono::milliseconds(10000); // 10s
config.write_timeout = std::chrono::milliseconds(10000); // 10s
config.timeout_threshold = 5; // Less tolerant
config.reset_timeout = std::chrono::seconds(300); // 5 min cooldown// ✅ Good - automatic cleanup
{
SocketTimeoutGuard guard(manager, client_socket);
handle_request(guard.get());
// Socket automatically closed
}
// ❌ Bad - manual cleanup, easy to forget
socket_t client = manager.acceptWithTimeout(server);
handle_request(client);
manager.closeSocket(client); // Might not be reached on exception// Periodically check and log health state
if (manager.getHealthState() == SocketHealthState::CIRCUIT_OPEN) {
spdlog::error("Circuit breaker is open - investigating network issues");
// Take remedial action
}// Consider operation characteristics
if (operation_is_quick()) {
// Use short timeout for quick operations
bytes = manager.readWithTimeout(socket, buffer, size, 5s);
} else {
// Use longer timeout for complex operations
bytes = manager.readWithTimeout(socket, buffer, size, 60s);
}// Always handle partial writes in a loop
size_t total_sent = 0;
while (total_sent < data.size()) {
ssize_t sent = manager.writeWithTimeout(
socket, data.data() + total_sent, data.size() - total_sent);
if (sent < 0) {
return false; // Error or timeout
}
total_sent += sent;
}// Set up monitoring/alerting
manager.setAlertCallback([](SocketHealthState state, const std::string& msg) {
if (state == SocketHealthState::CIRCUIT_OPEN) {
send_pagerduty_alert(msg);
emit_metric("network.circuit_breaker.opened", 1);
}
});| Operation | Without Timeout | With Timeout | Overhead |
|---|---|---|---|
| accept() | ~5µs | ~10µs | ~5µs |
| read() | ~2µs | ~3µs | ~1µs |
| write() | ~2µs | ~3µs | ~1µs |
| Circuit breaker check | N/A | <1µs | <1µs |
Overall Impact: < 0.1% for typical workloads
- SocketTimeoutManager: ~1 KB
- SocketTimeoutGuard: ~32 bytes
- Statistics: ~64 bytes (atomic counters)
Total: ~1.1 KB per manager instance
Costs:
- ~5-10µs overhead per network operation
- ~1 KB memory per manager
Benefits:
- Prevents resource exhaustion (eliminates unbounded waits)
- Automatic recovery from network issues
- Better user experience (bounded latency)
- Protection against DoS attacks
- Observability through metrics
Trade-off: Minimal cost for significant reliability improvement
Symptoms:
- High timeout rate (>10%)
- Circuit breaker frequently opens
- Slow application response
Solutions:
- Increase timeout values
- Check network latency between client and server
- Verify server is not overloaded
- Check for network congestion
- Consider connection pooling
Symptoms:
- Circuit breaker opens and doesn't recover
- New connections rejected permanently
Solutions:
- Check
reset_timeoutconfiguration (may be too long) - Verify underlying network issue is resolved
- Manually reset with
recordSuccess() - Adjust
timeout_threshold(may be too sensitive)
Symptoms:
- Operations taking longer than expected
- High latency
Solutions:
- Enable
TCP_NODELAYfor low-latency - Tune TCP buffer sizes
- Check network path (traceroute)
- Verify timeout values are not too conservative
- Consider using non-blocking I/O with epoll/kqueue
Symptoms:
- Increasing file descriptor count
- Memory growth over time
Solutions:
- Always use
SocketTimeoutGuardfor RAII - Verify all code paths close sockets
- Check exception safety
- Monitor with
lsofornetstat
-
"The Slowloris HTTP DoS" - RSnake (2009)
- Demonstrates importance of connection timeouts
- Shows how lack of timeouts enables DoS attacks
-
"TCP/IP Illustrated, Volume 1" - W. Richard Stevens (1994)
- Chapter 20: TCP Timeout and Retransmission
- Foundation for understanding network timeouts
-
"Release It! Design and Deploy Production-Ready Software" - Michael T. Nygard (2018)
- Circuit Breaker pattern
- Timeout patterns for resilient systems
-
nginx - Uses configurable timeouts for all operations
proxy_connect_timeoutproxy_read_timeoutproxy_send_timeout
-
Apache HTTPd - Comprehensive timeout configuration
TimeoutdirectiveKeepAliveTimeout- Request timeouts
-
HAProxy - Advanced timeout handling
timeout connecttimeout clienttimeout server
- Uses
setsockopt(SO_RCVTIMEO)andsetsockopt(SO_SNDTIMEO) poll()for accept timeout- TCP keepalive via
TCP_KEEPIDLE,TCP_KEEPINTVL,TCP_KEEPCNT
- Uses
setsockopt(SO_RCVTIMEO)andsetsockopt(SO_SNDTIMEO) select()for accept timeout- TCP keepalive via
SIO_KEEPALIVE_VALSioctl
- Similar to Linux but with some BSD-specific differences
poll()for accept timeout- TCP keepalive support varies by version
- Configure
SocketTimeoutConfigfor your environment - Create
SocketTimeoutManagerinstance - Configure server sockets with
configureSocket() - Use
acceptWithTimeout()for accepting connections - Use
readWithTimeout()andwriteWithTimeout()for I/O - Wrap sockets in
SocketTimeoutGuardfor RAII - Set up alert callback for monitoring
- Monitor statistics with
getStats() - Handle circuit breaker state in application logic
- Add metrics export (Prometheus, etc.)
-
Adaptive Timeouts
- Automatically adjust based on observed latency
- Machine learning for timeout prediction
-
Per-Client Timeout Tracking
- Different timeouts for different client types
- Client reputation scoring
-
Connection Rate Limiting
- Limit new connections per second
- Token bucket algorithm
-
Advanced Circuit Breaker
- Half-open state for gradual recovery
- Exponential backoff on reset attempts
Network timeout handling is now production-ready:
✅ Comprehensive timeout coverage (accept/read/write) ✅ Circuit breaker pattern prevents cascading failures ✅ Platform-independent (Windows/Linux/macOS) ✅ Low overhead (< 0.1% performance impact) ✅ Well-tested (20 unit tests) ✅ Production-ready defaults ✅ Monitoring and alerting support
Status: Phase 4 Complete ✅
Files:
include/network/socket_timeout_manager.h(header)src/network/socket_timeout_manager.cpp(implementation)tests/test_network_timeout.cpp(tests)docs/NETWORK_TIMEOUT_HANDLING.md(this document)