Architecture: Survey Response Sync Engine

1. Architecture Overview

I chose a repository-based, offline-first architecture built on Kotlin Multiplatform (KMP). This approach keeps a single source of truth for business logic while using Room for data persistence and platform-native APIs for connectivity checks.

Key Components:

  • Data Layer (Source of Truth):
    • Room Database: Stores SurveyResponse entities locally. I chose Room for its robustness, transaction support, and TypeConverter capabilities, which are essential for handling complex, nested survey data stored as JSON strings.
    • DAO: Provides atomic operations for querying the sync queue and updating statuses.
  • Engine Layer (Business Logic):
    • SyncManager: The core orchestrator. It manages the sync loop, enforces concurrency limits (Mutex), and handles error classification.
    • NetworkMonitor: An abstraction for connectivity checks, allowing the engine to react to network changes without coupling to platform-specific APIs (ConnectivityManager on Android, Reachability on iOS).
  • API Layer:
    • SyncApi: An interface defining the contract for data upload, allowing for easy mocking and substitution of the real backend.
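To make the contracts above more concrete, here is a minimal sketch of how SyncApi and NetworkMonitor might look, together with an in-memory fake that drops the network after a fixed number of uploads. The member names (upload, isConnected), the fields of SurveyResponse, the FlakySyncApi fake, and the syncAll loop are all assumptions for illustration, not the project's actual code. The suspend modifiers a real implementation would likely use are left out to keep the sketch dependency-free.

```kotlin
// Hypothetical shapes: the document names these types but not their members.
data class SurveyResponse(val id: String, val payloadJson: String)

class NetworkException(message: String) : Exception(message)

interface SyncApi {
    // Assumed signature; a real version would probably be a suspend function.
    fun upload(response: SurveyResponse)
}

interface NetworkMonitor {
    val isConnected: Boolean
}

// In-memory fake that simulates the network dropping after `failAfter` uploads.
class FlakySyncApi(private val failAfter: Int) : SyncApi {
    val uploaded = mutableListOf<String>()

    override fun upload(response: SurveyResponse) {
        if (uploaded.size >= failAfter) throw NetworkException("connection lost")
        uploaded += response.id
    }
}

// Uploads queued items in order, stops at the first network failure,
// and returns the ids that are still waiting to be synced.
fun syncAll(queue: List<SurveyResponse>, api: SyncApi, monitor: NetworkMonitor): List<String> {
    val remaining = queue.map { it.id }.toMutableList()
    for (item in queue) {
        if (!monitor.isConnected) break
        try {
            api.upload(item)
            remaining.remove(item.id)
        } catch (e: NetworkException) {
            break // leave this item and all later ones queued for the next attempt
        }
    }
    return remaining
}

fun main() {
    val queue = (1..5).map { SurveyResponse("r$it", "{}") }
    val api = FlakySyncApi(failAfter = 3)
    val alwaysOnline = object : NetworkMonitor {
        override val isConnected = true
    }
    val remaining = syncAll(queue, api, alwaysOnline)
    println("uploaded=${api.uploaded} remaining=$remaining")
    // prints: uploaded=[r1, r2, r3] remaining=[r4, r5]
}
```

Because the loop only sees the two interfaces, the "network drops after 3rd item" case runs entirely in memory.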

Why this architecture?

  • Separation of Concerns: The engine doesn't know about UI or specific network libraries. It only knows about the DAO and the generic SyncApi.
  • Testability: By injecting dependencies (SurveyDao, SyncApi), the entire sync logic can be tested with in-memory fakes, covering edge cases like "network drops after 3rd item" without flaky integration tests.
  • Resilience: The state machine (DRAFT -> PENDING -> SYNCING -> SYNCED/FAILED) is designed so that no data is lost or duplicated. If the app crashes, PENDING items remain in the DB and are still queued on the next launch.
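The state machine can be sketched as an explicit transition table. The five states come from the description above. The retry edge (FAILED -> PENDING) and the crash-recovery edge (SYNCING -> PENDING) are assumptions added for illustration, as are the helper names canTransition and recoverAfterCrash. Re-queuing an interrupted SYNCING item only avoids duplicates if the server deduplicates by response id, which is also assumed here.

```kotlin
enum class SyncStatus { DRAFT, PENDING, SYNCING, SYNCED, FAILED }

// Allowed transitions. FAILED -> PENDING (retry) and SYNCING -> PENDING
// (crash recovery) are assumptions, not taken from the original design.
val allowedTransitions: Map<SyncStatus, Set<SyncStatus>> = mapOf(
    SyncStatus.DRAFT to setOf(SyncStatus.PENDING),
    SyncStatus.PENDING to setOf(SyncStatus.SYNCING),
    SyncStatus.SYNCING to setOf(SyncStatus.SYNCED, SyncStatus.FAILED, SyncStatus.PENDING),
    SyncStatus.FAILED to setOf(SyncStatus.PENDING),
    SyncStatus.SYNCED to emptySet()
)

fun canTransition(from: SyncStatus, to: SyncStatus): Boolean =
    to in allowedTransitions.getValue(from)

// On startup, anything still marked SYNCING was interrupted mid-upload,
// so it is put back in the queue instead of being left in limbo.
fun recoverAfterCrash(statuses: Map<String, SyncStatus>): Map<String, SyncStatus> =
    statuses.mapValues { (_, status) ->
        if (status == SyncStatus.SYNCING) SyncStatus.PENDING else status
    }

fun main() {
    println(canTransition(SyncStatus.PENDING, SyncStatus.SYNCING)) // true
    println(canTransition(SyncStatus.DRAFT, SyncStatus.SYNCED))    // false
    val recovered = recoverAfterCrash(
        mapOf("a" to SyncStatus.SYNCING, "b" to SyncStatus.SYNCED)
    )
    println(recovered) // {a=PENDING, b=SYNCED}
}
```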

2. Media File Uploads & Compression

To handle media:

  • Storage: Use the attachments list in SurveyResponse to store local file paths.
  • Pre-processing: Before upload, the SyncManager would invoke a MediaCompressor (interface), either as a blocking call or as a suspending one. To avoid stalling the sync loop, compression should happen before the item enters the sync queue (e.g., at "Submit" time) or run as an independent step in the sync process (State: COMPRESSING -> PENDING).
  • Multipart Upload: The SyncApi.upload method would be updated to accept List<File>.
  • Resumability: For large files, I would implement chunked uploads or use a dedicated background transfer service (WorkManager on Android, NSURLSession on iOS) instead of doing it directly in the foreground-bound coroutine scope.
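For the chunked-upload option, the core bookkeeping is small: split the file into fixed-size byte ranges and, after an interruption, resend only the ranges the server has not acknowledged. The sketch below shows that idea in isolation. Chunk, planChunks, and remainingChunks are hypothetical names, and the actual transfer and acknowledgement protocol is out of scope.

```kotlin
// One contiguous byte range of a file to upload.
data class Chunk(val index: Int, val offset: Long, val length: Long)

// Splits a file of `fileSize` bytes into ranges of at most `chunkSize` bytes.
fun planChunks(fileSize: Long, chunkSize: Long): List<Chunk> {
    require(fileSize >= 0) { "fileSize must not be negative" }
    require(chunkSize > 0) { "chunkSize must be positive" }
    val chunks = mutableListOf<Chunk>()
    var offset = 0L
    while (offset < fileSize) {
        val length = minOf(chunkSize, fileSize - offset)
        chunks += Chunk(chunks.size, offset, length)
        offset += length
    }
    return chunks
}

// After an interruption, only chunks the server has not acknowledged are resent.
fun remainingChunks(plan: List<Chunk>, acknowledged: Set<Int>): List<Chunk> =
    plan.filter { it.index !in acknowledged }

fun main() {
    val plan = planChunks(fileSize = 10_500_000, chunkSize = 4_000_000)
    plan.forEach(::println)
    // Chunk(index=0, offset=0, length=4000000)
    // Chunk(index=1, offset=4000000, length=4000000)
    // Chunk(index=2, offset=8000000, length=2500000)
    println(remainingChunks(plan, acknowledged = setOf(0)).map { it.index }) // [1, 2]
}
```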

3. Network Detection Risks & Mitigation

Scenario: The NetworkMonitor reports isConnected = true because the device has a signal, but the upstream connection is dead (captive portal, heavy packet loss, or DNS failure).

Mitigation:

  • Application-Layer Ping: Don't rely solely on the OS signal. Attempt a lightweight HEAD request to a known endpoint (e.g., /health) before starting a batch.
  • Adaptive Timeout: If the first request times out, assume the network is "zombie" and back off exponentially, even if the OS says "Connected".
  • Traffic Analysis: If SyncApi throws NetworkException several times in a row, force a "cooldown" period during which the engine ignores the "Connected" signal to save battery.
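The backoff and cooldown rules can be combined in one small piece of state that the engine consults before each batch. The following is a sketch under assumed defaults (1 s base delay, 60 s cap, cooldown after 5 consecutive failures, 5 minute cooldown). The class name ZombieNetworkGuard and all of its parameters are hypothetical. The health-check request itself is not shown.

```kotlin
// Tracks consecutive network failures and decides how long to wait.
// All thresholds are illustrative defaults, not values from the project.
class ZombieNetworkGuard(
    private val baseDelayMs: Long = 1_000,
    private val maxDelayMs: Long = 60_000,
    private val failuresBeforeCooldown: Int = 5,
    private val cooldownMs: Long = 300_000
) {
    private var consecutiveFailures = 0
    private var cooldownUntilMs = 0L

    fun onSuccess() {
        consecutiveFailures = 0
        cooldownUntilMs = 0L
    }

    // Records a failure and returns how long to wait before the next attempt.
    fun onNetworkFailure(nowMs: Long): Long {
        consecutiveFailures++
        if (consecutiveFailures >= failuresBeforeCooldown) {
            cooldownUntilMs = nowMs + cooldownMs
            return cooldownMs
        }
        // Exponential backoff: base * 2^(failures - 1), capped at maxDelayMs.
        val shift = (consecutiveFailures - 1).coerceAtMost(20)
        return (baseDelayMs shl shift).coerceAtMost(maxDelayMs)
    }

    // The OS "connected" signal is ignored while a cooldown is active.
    fun maySync(osReportsConnected: Boolean, nowMs: Long): Boolean =
        osReportsConnected && nowMs >= cooldownUntilMs
}

fun main() {
    val guard = ZombieNetworkGuard()
    val delays = (1..5).map { guard.onNetworkFailure(nowMs = 0) }
    println(delays)                                // [1000, 2000, 4000, 8000, 300000]
    println(guard.maySync(true, nowMs = 10_000))   // false, still cooling down
    println(guard.maySync(true, nowMs = 300_000))  // true, cooldown has elapsed
}
```

Passing the clock in as nowMs keeps the guard deterministic in tests.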

4. Remote Troubleshooting

To diagnose issues without device access, I would log structured events to a remote logging service (e.g., Firebase Crashlytics or Sentry):

  • Sync Session Start/End: "Sync started with 10 items."
  • Item Result: "Item uuid succeeded/failed. Duration: 200ms."
  • Error Details: "Upload failed. Error: SocketTimeout. Signal Strength: Weak."
  • Queue State: "Remaining pending: 50. Oldest item age: 48 hours."

This data reveals patterns (e.g., "Sync always fails on 2G networks" or "Particular survey structure causes 500 errors").
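One way to keep these events structured rather than free-form strings is to model each one as a type and flatten it to key-value fields just before handing it to the logging backend. The sketch below assumes hypothetical event and field names (SyncEvent, toFields, "item_count", and so on) and deliberately stops short of any vendor SDK call.

```kotlin
// Hypothetical event model mirroring the four log categories listed above.
sealed interface SyncEvent {
    data class SessionStarted(val itemCount: Int) : SyncEvent
    data class ItemResult(val itemId: String, val succeeded: Boolean, val durationMs: Long) : SyncEvent
    data class UploadFailed(val itemId: String, val error: String, val signalStrength: String) : SyncEvent
    data class QueueState(val pending: Int, val oldestItemAgeHours: Long) : SyncEvent
}

// Flattens an event into key-value fields that a remote logger can index.
fun SyncEvent.toFields(): Map<String, Any> = when (this) {
    is SyncEvent.SessionStarted -> mapOf(
        "event" to "sync_session_started", "item_count" to itemCount
    )
    is SyncEvent.ItemResult -> mapOf(
        "event" to "item_result", "item_id" to itemId,
        "succeeded" to succeeded, "duration_ms" to durationMs
    )
    is SyncEvent.UploadFailed -> mapOf(
        "event" to "upload_failed", "item_id" to itemId,
        "error" to error, "signal_strength" to signalStrength
    )
    is SyncEvent.QueueState -> mapOf(
        "event" to "queue_state", "pending" to pending,
        "oldest_item_age_hours" to oldestItemAgeHours
    )
}

fun main() {
    val events = listOf(
        SyncEvent.SessionStarted(itemCount = 10),
        SyncEvent.UploadFailed("uuid-1", error = "SocketTimeout", signalStrength = "Weak"),
        SyncEvent.QueueState(pending = 50, oldestItemAgeHours = 48)
    )
    // A real sink would forward these maps to the logging service.
    events.forEach { println(it.toFields()) }
}
```

Typed fields such as signal_strength or oldest_item_age_hours are what make queries like "failures on weak signal" possible on the backend.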

5. GPS & Field Boundaries

Challenges:

  • Drift & Accuracy: In rural areas, GPS accuracy can fluctuate (10m-50m error).
  • Satellite Lock: Getting a fix takes time, draining battery.
  • Validation:
    • Timestamp correlation: Ensure points are captured sequentially.
    • Speed checks: Reject points that imply impossible travel speeds (e.g., jumping 1km in 1 second).
    • Polygon Closure: Ensure the first and last points are close enough to close the loop.
    • Area Calculation: Real-time feedback ("Area: 0.001 hectares? Too small, try again").
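The validation checks above can be sketched with a haversine distance and a shoelace area on a local flat projection. This is an approximation that is reasonable for field-sized polygons, not a survey-grade calculation. GpsPoint, the function names, and the thresholds used in main are assumptions for illustration.

```kotlin
import kotlin.math.PI
import kotlin.math.abs
import kotlin.math.asin
import kotlin.math.cos
import kotlin.math.sin
import kotlin.math.sqrt

data class GpsPoint(val latDeg: Double, val lonDeg: Double, val timestampMs: Long)

private const val EARTH_RADIUS_M = 6_371_000.0

private fun Double.toRad(): Double = this * PI / 180.0

// Great-circle distance between two points (haversine formula).
fun distanceMeters(a: GpsPoint, b: GpsPoint): Double {
    val sinLat = sin((b.latDeg - a.latDeg).toRad() / 2)
    val sinLon = sin((b.lonDeg - a.lonDeg).toRad() / 2)
    val h = sinLat * sinLat + cos(a.latDeg.toRad()) * cos(b.latDeg.toRad()) * sinLon * sinLon
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))
}

// Timestamp correlation: every point must be later than the previous one.
fun isSequential(points: List<GpsPoint>): Boolean =
    points.zipWithNext().all { (prev, next) -> next.timestampMs > prev.timestampMs }

// Speed check: no pair of consecutive points may imply travel faster than maxSpeedMps.
fun hasPlausibleSpeeds(points: List<GpsPoint>, maxSpeedMps: Double): Boolean =
    points.zipWithNext().all { (prev, next) ->
        val seconds = (next.timestampMs - prev.timestampMs) / 1000.0
        seconds > 0.0 && distanceMeters(prev, next) / seconds <= maxSpeedMps
    }

// Polygon closure: first and last points must be within the tolerance.
fun isClosed(points: List<GpsPoint>, toleranceMeters: Double): Boolean =
    points.size >= 3 && distanceMeters(points.first(), points.last()) <= toleranceMeters

// Shoelace area after projecting to local metres around the centroid.
fun areaHectares(points: List<GpsPoint>): Double {
    if (points.size < 3) return 0.0
    val lat0 = points.map { it.latDeg }.average()
    val lon0 = points.map { it.lonDeg }.average()
    val xy = points.map {
        val x = (it.lonDeg - lon0).toRad() * cos(lat0.toRad()) * EARTH_RADIUS_M
        val y = (it.latDeg - lat0).toRad() * EARTH_RADIUS_M
        x to y
    }
    var twiceArea = 0.0
    for (i in xy.indices) {
        val (x1, y1) = xy[i]
        val (x2, y2) = xy[(i + 1) % xy.size]
        twiceArea += x1 * y2 - x2 * y1
    }
    return abs(twiceArea) / 2 / 10_000
}

fun main() {
    val side = 0.0009 // roughly 100 m of latitude
    val walk = listOf(
        GpsPoint(0.0, 0.0, 0),
        GpsPoint(0.0, side, 60_000),
        GpsPoint(side, side, 120_000),
        GpsPoint(side, 0.0, 180_000),
        GpsPoint(0.0, 0.0, 240_000)
    )
    println("sequential=${isSequential(walk)}")
    println("plausible=${hasPlausibleSpeeds(walk, maxSpeedMps = 10.0)}")
    println("closed=${isClosed(walk, toleranceMeters = 15.0)}")
    println("areaHa=${areaHectares(walk)}") // about 1 hectare for this square
}
```

The closure tolerance and maximum speed would need tuning against the 10m-50m accuracy range noted above.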

6. Retrospective: What I'd Do Differently

  • WorkManager Integration: Currently, the sync must be triggered manually or by an app lifecycle event. I would add platform-specific background scheduling (WorkManager/BackgroundTasks) to retry automatically when connectivity returns.
  • Dependency Injection: I would introduce Koin or a simple DI container to manage the AppDatabase and SyncManager lifecycle, rather than manual instantiation.
  • Flow-based Status: I would expose a Flow<SyncState> (Idle, Syncing, Error) from SyncManager so the UI can show a progress bar ("Uploading 3 of 10...").