Skip to content

MathewsVinoy/Distributed-File-System

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

125 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Distributed File System (DFS)

A high-performance, fault-tolerant distributed file system written in C with support for data replication, TLS encryption, and web-based management.

Features

  • Data Replication: Automatic replication across multiple data servers for fault tolerance
  • TLS Encryption: Secure communication between all components with optional TLS support
  • Web Interface: Modern web-based client for file management with user authentication
  • Heartbeat Monitoring: Automatic health checking of data servers
  • Connection Pooling: Efficient connection management for improved performance
  • Block-based Storage: Efficient storage using configurable block sizes
  • Thread Pool: Concurrent request handling for improved throughput

Architecture

┌─────────────────┐
│   Web Client    │
│   (HTTP 8080)   │
└────────┬────────┘
         │
┌────────▼────────┐      ┌──────────────────┐
│  Client/HTTP    │◄────►│ Metadata Server  │
│    Server       │      │   (Port 9000)    │
└────────┬────────┘      └────────┬─────────┘
         │                        │
         │                   Heartbeat
         │                  (Port 9001)
         │                        │
    ┌────▼────────────────────────▼────┐
    │      Data Servers (8000+)        │
    │  ┌──────┐  ┌──────┐  ┌──────┐  │
    │  │ DS 1 │  │ DS 2 │  │ DS 3 │  │
    │  └──────┘  └──────┘  └──────┘  │
    └──────────────────────────────────┘

Components

  1. Metadata Server (metaser): Manages file metadata, block locations, and coordinates writes
  2. Data Servers (ser): Store actual file data blocks with replication
  3. CLI Client (cli): Command-line interface for file operations
  4. HTTP Server (client_http): Web interface with authentication

Prerequisites

  • GCC compiler
  • OpenSSL development libraries
  • pthreads support
  • Make

Installation

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install build-essential libssl-dev

RHEL/CentOS:

sudo yum install gcc openssl-devel

macOS:

brew install openssl

Building

# Build all components
make all

# Build individual components
make cli          # Build CLI client
make metaser      # Build metadata server
make ser          # Build data server
make client_http  # Build HTTP server

# Clean build artifacts
make clean

Binaries are created in the build/ directory.

Configuration

Edit config/dfs.conf to configure the system:

[global]
block_size=1024                    # Block size in bytes
metadata_file=database/metadata.txt
last_seen_file=database/lastseen.csv

[metadata]
listen_addr=0.0.0.0
listen_port=9000
heartbeat_port=9001
tls_enabled=false

[data]
bind_addr=0.0.0.0
port=8000                          # Base port (increment for multiple servers)
data_file=database/my_file.txt
log_dir=database/log
metadata_host=127.0.0.1
metadata_port=9000

[client]
metadata_host=127.0.0.1
metadata_port=9000
output_file=out/cli/myfile.txt

User Management

Edit config/users.csv to add users for the web interface:

username,password,root_path
alice,password123,database/users/alice
bob,password456,database/users/bob

⚠️ SECURITY WARNING: Passwords are stored in plain text. Use strong passwords and restrict file permissions:

chmod 600 config/users.csv

Running the System

1. Start Metadata Server

./build/metaser config/dfs.conf

2. Start Data Servers

Start multiple data servers on different ports for replication:

# Terminal 1
./build/ser config/dfs.conf 8000

# Terminal 2
./build/ser config/dfs.conf 8001

# Terminal 3
./build/ser config/dfs.conf 8002

3. Use the CLI Client

./build/cli config/dfs.conf

Options:

  1. Lookup block: Query block location
  2. Write block: Upload data to replicas
  3. Read entire file: Download complete file

4. Start Web Interface

./build/client_http config/dfs.conf

Access at http://localhost:8080

API Reference

Metadata Server Protocol

Lookup Block:

LOOKUP <filename> <block_id>

Get File Map:

GET_FILE_MAP <filename>

Write Block:

WRITE_BLOCK <filename> <block_id>

Data Server Protocol

Store Block:

PUT BLOCK <block_id>
<data>

Retrieve Block:

GET BLOCK <block_id>

File Structure

.
├── build/              # Compiled binaries
├── config/
│   ├── dfs.conf       # System configuration
│   └── users.csv      # User credentials
├── database/
│   ├── metadata.txt   # File metadata
│   ├── lastseen.csv   # Server health status
│   ├── log/           # Data server block storage
│   └── users/         # User file storage
├── include/           # Header files
│   ├── common/        # Shared utilities
│   ├── clint/         # Client headers
│   ├── dataserver/    # Data server headers
│   └── metadata/      # Metadata server headers
├── src/               # Source files
│   ├── common/        # Shared code
│   ├── clint_res/     # Client implementation
│   ├── dataserver_res/# Data server implementation
│   └── metadata_res/  # Metadata server implementation
└── webclient/         # Web interface files

TLS Configuration

To enable TLS encryption:

  1. Generate certificates:
# Create CA
openssl genrsa -out ca-key.pem 4096
openssl req -new -x509 -days 365 -key ca-key.pem -out ca.pem

# Create server certificate
openssl genrsa -out server-key.pem 4096
openssl req -new -key server-key.pem -out server.csr
openssl x509 -req -days 365 -in server.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out server-cert.pem
  1. Update config/dfs.conf:
[metadata]
tls_enabled=true
tls_cert_file=config/certs/metadata_cert.pem
tls_key_file=config/certs/metadata_key.pem
tls_ca_file=config/certs/ca.pem

Troubleshooting

Connection Refused

  • Ensure metadata server is running first
  • Check firewall settings
  • Verify ports in configuration

Authentication Failed

  • Check config/users.csv format
  • Ensure username and password match
  • Verify file permissions

Block Not Found

  • Verify metadata file exists
  • Check data server logs
  • Ensure data servers sent heartbeats

Performance Issues

  • Increase thread pool size in code
  • Use faster storage for data servers
  • Enable connection pooling
  • Consider SSD storage for metadata

Development

Code Structure

  • common/: Shared utilities (logging, config, TLS, protocol, thread pool)
  • clint_res/: Client-side operations (read, write, lookup)
  • metadata_res/: Metadata management, heartbeat handling
  • dataserver_res/: Data storage and heartbeat sending

Adding New Features

  1. Update protocol in include/common/protocol.h
  2. Implement handlers in respective modules
  3. Update client code to use new features
  4. Test with all components running

Known Limitations

  • No automatic data rebalancing
  • No file deletion via CLI (web interface only)
  • No file versioning
  • Fixed block size per configuration
  • No distributed locking for concurrent writes
  • Manual failover required

Performance Tuning

  • Block Size: Larger blocks reduce metadata overhead but increase minimum transfer size
  • Replication Factor: More replicas improve availability but increase storage and network overhead
  • Thread Pool: Adjust in source for higher concurrency
  • Connection Pool: Reuses connections for better performance

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

See LICENSE.md for details.

Security Considerations

⚠️ Important Security Notes:

  1. Passwords: Currently stored in plain text - implement hashing before production use
  2. TLS: Strongly recommended for production deployments
  3. Authentication: Web interface uses simple token-based auth
  4. File Permissions: Restrict access to config and database directories
  5. Network: Use firewall rules to restrict access to trusted hosts

Future Enhancements

  • Password hashing (bcrypt/argon2)
  • Data integrity checksums
  • Automatic failover and rebalancing
  • File deletion and rename operations
  • Distributed locking mechanism
  • Metrics and monitoring endpoints
  • Docker containerization
  • Backup and restore utilities
  • Admin CLI tools
  • Write-ahead logging for crash recovery

Support

For issues, questions, or contributions, please open an issue on the repository.

About

Developed a distributed file system from scratch using C and TCP socket programming that enables clients to store and retrieve files across multiple servers. Implemented core file operations such as upload, download, list, and delete, with reliable client–server communication. The system handles concurrent client requests and demonstrates concepts

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors