A high-performance, fault-tolerant distributed file system written in C with support for data replication, TLS encryption, and web-based management.
- Data Replication: Automatic replication across multiple data servers for fault tolerance
- TLS Encryption: Optional TLS-secured communication between all components
- Web Interface: Modern web-based client for file management with user authentication
- Heartbeat Monitoring: Automatic health checking of data servers
- Connection Pooling: Efficient connection management for improved performance
- Block-based Storage: Efficient storage using configurable block sizes
- Thread Pool: Concurrent request handling for improved throughput
┌─────────────────┐
│   Web Client    │
│   (HTTP 8080)   │
└────────┬────────┘
         │
┌────────▼────────┐      ┌──────────────────┐
│   Client/HTTP   │◄────►│ Metadata Server  │
│     Server      │      │   (Port 9000)    │
└────────┬────────┘      └────────┬─────────┘
         │                        │
         │                    Heartbeat
         │                   (Port 9001)
         │                        │
    ┌────▼────────────────────────▼────┐
    │       Data Servers (8000+)       │
    │  ┌──────┐   ┌──────┐   ┌──────┐  │
    │  │ DS 1 │   │ DS 2 │   │ DS 3 │  │
    │  └──────┘   └──────┘   └──────┘  │
    └──────────────────────────────────┘
- Metadata Server (metaser): Manages file metadata, block locations, and coordinates writes
- Data Servers (ser): Store actual file data blocks with replication
- CLI Client (cli): Command-line interface for file operations
- HTTP Server (client_http): Web interface with authentication
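As a rough illustration of how the metadata server could use heartbeats to decide which data servers are eligible for reads and writes, here is a minimal sketch. The `ds_status` struct, both function names, and the timeout parameter are hypothetical and are not taken from the project's source.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical record: one data server as the metadata server might track it. */
typedef struct {
    int port;          /* data server port, e.g. 8000 */
    time_t last_seen;  /* time of the most recent heartbeat */
} ds_status;

/* A server counts as alive if its last heartbeat is at most timeout_sec old. */
bool ds_is_alive(const ds_status *ds, time_t now, int timeout_sec)
{
    return difftime(now, ds->last_seen) <= (double)timeout_sec;
}

/* Copy the ports of alive servers into out (capacity max_out); return the count. */
int ds_pick_alive(const ds_status *all, int n, time_t now, int timeout_sec,
                  int *out, int max_out)
{
    int count = 0;
    for (int i = 0; i < n && count < max_out; i++) {
        if (ds_is_alive(&all[i], now, timeout_sec))
            out[count++] = all[i].port;
    }
    return count;
}
```

Passing `now` explicitly, rather than calling `time(NULL)` inside the function, keeps the check deterministic and easy to test.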
- GCC compiler
- OpenSSL development libraries
- pthreads support
- Make
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install build-essential libssl-dev
RHEL/CentOS:
sudo yum install gcc openssl-devel
macOS:
brew install openssl
# Build all components
make all
# Build individual components
make cli # Build CLI client
make metaser # Build metadata server
make ser # Build data server
make client_http # Build HTTP server
# Clean build artifacts
make clean
Binaries are created in the build/ directory.
Edit config/dfs.conf to configure the system:
[global]
block_size=1024 # Block size in bytes
metadata_file=database/metadata.txt
last_seen_file=database/lastseen.csv
[metadata]
listen_addr=0.0.0.0
listen_port=9000
heartbeat_port=9001
tls_enabled=false
[data]
bind_addr=0.0.0.0
port=8000 # Base port (increment for multiple servers)
data_file=database/my_file.txt
log_dir=database/log
metadata_host=127.0.0.1
metadata_port=9000
[client]
metadata_host=127.0.0.1
metadata_port=9000
output_file=out/cli/myfile.txt
Edit config/users.csv to add users for the web interface:
username,password,root_path
alice,password123,database/users/alice
bob,password456,database/users/bob
Restrict access to the credentials file:
chmod 600 config/users.csv
Start the metadata server first:
./build/metaser config/dfs.conf
Start multiple data servers on different ports for replication:
# Terminal 1
./build/ser config/dfs.conf 8000
# Terminal 2
./build/ser config/dfs.conf 8001
# Terminal 3
./build/ser config/dfs.conf 8002
Run the CLI client:
./build/cli config/dfs.conf
Options:
- Lookup block: Query block location
- Write block: Upload data to replicas
- Read entire file: Download complete file
Start the HTTP server:
./build/client_http config/dfs.conf
Access at http://localhost:8080
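Every binary above takes config/dfs.conf as its argument. As a minimal sketch of how one line of that file could be parsed, the code below handles the `key=value # comment` form used in the sample configuration. The function names `conf_trim` and `parse_conf_line` are hypothetical, and the sketch assumes that values never contain a `#` character.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the trimmed start. */
char *conf_trim(char *s)
{
    while (isspace((unsigned char)*s)) s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return s;
}

/*
 * Parse one line of the form "key=value # comment" in place.
 * Returns true and sets *key and *value on success. Returns false for blank
 * lines, comment-only lines, [section] headers, and lines without '='.
 */
bool parse_conf_line(char *line, char **key, char **value)
{
    char *hash = strchr(line, '#');
    if (hash) *hash = '\0';              /* drop the trailing comment */

    char *s = conf_trim(line);
    if (*s == '\0' || *s == '[') return false;

    char *eq = strchr(s, '=');
    if (!eq) return false;
    *eq = '\0';                          /* split into key and value */

    *key = conf_trim(s);
    *value = conf_trim(eq + 1);
    return **key != '\0';
}
```

The parser modifies the buffer it is given, so callers should pass a writable copy of each line. Section headers such as `[metadata]` are reported as non-matching, and tracking the current section is left to the caller.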
Lookup Block:
LOOKUP <filename> <block_id>
Get File Map:
GET_FILE_MAP <filename>
Write Block:
WRITE_BLOCK <filename> <block_id>
Store Block:
PUT BLOCK <block_id>
<data>
Retrieve Block:
GET BLOCK <block_id>
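The request formats above can be built and parsed with standard C string functions. The following is a minimal sketch under several assumptions that the list above does not confirm: requests are newline-terminated, fields are separated by single spaces, filenames contain no whitespace, and block IDs are integers. The `block_request` struct and both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of a "<COMMAND> <filename> <block_id>" request. */
typedef struct {
    char cmd[16];
    char filename[256];
    long block_id;
} block_request;

/* Write "LOOKUP <filename> <block_id>\n" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_lookup(char *buf, size_t cap, const char *filename, long block_id)
{
    int n = snprintf(buf, cap, "LOOKUP %s %ld\n", filename, block_id);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parse a LOOKUP or WRITE_BLOCK request line into req.
 * Returns 1 on success, 0 if the line does not match either format. */
int parse_block_request(const char *line, block_request *req)
{
    /* Field widths keep sscanf from overflowing the fixed-size buffers. */
    if (sscanf(line, "%15s %255s %ld",
               req->cmd, req->filename, &req->block_id) != 3)
        return 0;
    return strcmp(req->cmd, "LOOKUP") == 0 ||
           strcmp(req->cmd, "WRITE_BLOCK") == 0;
}
```

`GET_FILE_MAP` takes only a filename, and `PUT BLOCK` is followed by a data payload, so both would need their own parsing paths.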
.
├── build/ # Compiled binaries
├── config/
│ ├── dfs.conf # System configuration
│ └── users.csv # User credentials
├── database/
│ ├── metadata.txt # File metadata
│ ├── lastseen.csv # Server health status
│ ├── log/ # Data server block storage
│ └── users/ # User file storage
├── include/ # Header files
│ ├── common/ # Shared utilities
│ ├── clint/ # Client headers
│ ├── dataserver/ # Data server headers
│ └── metadata/ # Metadata server headers
├── src/ # Source files
│ ├── common/ # Shared code
│ ├── clint_res/ # Client implementation
│ ├── dataserver_res/# Data server implementation
│ └── metadata_res/ # Metadata server implementation
└── webclient/ # Web interface files
To enable TLS encryption:
- Generate certificates:
# Create CA
openssl genrsa -out ca-key.pem 4096
openssl req -new -x509 -days 365 -key ca-key.pem -out ca.pem
# Create server certificate
openssl genrsa -out server-key.pem 4096
openssl req -new -key server-key.pem -out server.csr
openssl x509 -req -days 365 -in server.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out server-cert.pem
- Update config/dfs.conf (adjust the paths to where your certificate and key files are stored):
[metadata]
tls_enabled=true
tls_cert_file=config/certs/metadata_cert.pem
tls_key_file=config/certs/metadata_key.pem
tls_ca_file=config/certs/ca.pem
- Ensure metadata server is running first
- Check firewall settings
- Verify ports in configuration
- Check config/users.csv format
- Ensure username and password match
- Verify file permissions
- Verify metadata file exists
- Check data server logs
- Ensure data servers are sending heartbeats
- Increase thread pool size in code
- Use faster storage for data servers
- Enable connection pooling
- Consider SSD storage for metadata
- common/: Shared utilities (logging, config, TLS, protocol, thread pool)
- clint_res/: Client-side operations (read, write, lookup)
- metadata_res/: Metadata management, heartbeat handling
- dataserver_res/: Data storage and heartbeat sending
- Update protocol in include/common/protocol.h
- Implement handlers in respective modules
- Update client code to use new features
- Test with all components running
- No automatic data rebalancing
- No file deletion via CLI (web interface only)
- No file versioning
- Fixed block size per configuration
- No distributed locking for concurrent writes
- Manual failover required
- Block Size: Larger blocks reduce metadata overhead but increase minimum transfer size
- Replication Factor: More replicas improve availability but increase storage and network overhead
- Thread Pool: Adjust in source for higher concurrency
- Connection Pool: Reuses connections for better performance
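The block size and replication trade-offs above come down to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the replica count is a caller-supplied parameter because this document does not state a replication factor.

```c
#include <stdint.h>

/* Number of fixed-size blocks needed to hold file_size bytes (ceiling division).
 * Each block needs one metadata entry, so this is also the metadata entry count. */
uint64_t block_count(uint64_t file_size, uint64_t block_size)
{
    if (block_size == 0) return 0;  /* invalid configuration */
    return file_size / block_size + (file_size % block_size != 0);
}

/* Total payload bytes stored when every block is kept on `replicas` servers. */
uint64_t replicated_bytes(uint64_t file_size, uint64_t replicas)
{
    return file_size * replicas;
}
```

With block_size=1024 from the sample configuration, a 10,000-byte file needs 10 blocks. Raising the block size to 4096 cuts that to 3 blocks, at the cost of a larger minimum transfer. Keeping 3 replicas of the same file stores 30,000 bytes in total.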
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
See LICENSE.md for details.
- Passwords: Currently stored in plain text - implement hashing before production use
- TLS: Strongly recommended for production deployments
- Authentication: Web interface uses simple token-based auth
- File Permissions: Restrict access to config and database directories
- Network: Use firewall rules to restrict access to trusted hosts
- Password hashing (bcrypt/argon2)
- Data integrity checksums
- Automatic failover and rebalancing
- File deletion and rename operations
- Distributed locking mechanism
- Metrics and monitoring endpoints
- Docker containerization
- Backup and restore utilities
- Admin CLI tools
- Write-ahead logging for crash recovery
For issues, questions, or contributions, please open an issue on the repository.