Project
Org-wide Data Centralization into a Data Lake
AFOUR Technologies
Sep 2022 – Jun 2023
Python · PySpark · ETL · Data Engineering
Implemented PySpark pipelines to aggregate data from multiple sources into a central analytics lake for reporting and observability.
Problem
Organizational data was fragmented across disparate systems and databases, making analytics inconsistent, hard to trust, and expensive to reconcile.
Solution
Built PySpark ETL pipelines to ingest, transform, and centralize data into a single analytics lake with standardized schemas and repeatable ingestion flows.
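The core of this approach is mapping each source's fields onto one standard schema before landing records in the lake. A minimal, Spark-free sketch of that standardization step is below; in the actual pipeline each mapper would be a PySpark transformation over DataFrames, and all source names and field names here are hypothetical.

```python
# Simplified illustration of schema standardization across sources.
# Source names ("crm", "billing") and all field names are hypothetical.

TARGET_FIELDS = ("customer_id", "amount_usd", "event_date")

def standardize_crm(record: dict) -> dict:
    # Map a CRM export row onto the lake's standard schema.
    return {
        "customer_id": record["CustomerID"],
        "amount_usd": float(record["DealValue"]),
        "event_date": record["ClosedOn"],
    }

def standardize_billing(record: dict) -> dict:
    # Map a billing-system row onto the same standard schema.
    return {
        "customer_id": record["cust_id"],
        "amount_usd": record["amount_cents"] / 100.0,
        "event_date": record["billed_at"][:10],  # keep the date part of an ISO timestamp
    }

def ingest(sources: dict) -> list[dict]:
    # Route every source through its mapper, yielding one uniform dataset
    # ready to be written to the lake.
    mappers = {"crm": standardize_crm, "billing": standardize_billing}
    rows = []
    for name, records in sources.items():
        rows.extend(mappers[name](r) for r in records)
    return rows
```

Keeping one mapper per source makes ingestion repeatable: adding a new system means adding one mapping function, while downstream reporting continues to query a single, stable schema.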
Impact
- Improved consistency of analytics by consolidating fragmented sources into a single lake.
- Reduced time spent reconciling data across teams through standardized ingestion and transformation.