Project
Org-wide Data Centralization into a Data Lake
AFOUR Technologies
Sep 2022 – Jun 2023
Python · PySpark · ETL · Data Engineering
Implemented PySpark pipelines to aggregate data from multiple sources into a central analytics lake for reporting and observability.
Problem
Organizational data was fragmented across disparate systems and databases, making analytics inconsistent, hard to trust, and expensive to reconcile.
Solution
Built PySpark ETL pipelines to ingest, transform, and centralize data into a single analytics lake with standardized schemas and repeatable ingestion flows.
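The core of this approach is mapping each source's fields onto one standard schema before landing records in the lake. A minimal, Spark-free sketch of that standardization step is below; in the actual pipeline each mapper would be a PySpark transformation over DataFrames, and all source names and field names here are hypothetical.

```python
# Simplified illustration of schema standardization across sources.
# Source names ("crm", "billing") and all field names are hypothetical.

TARGET_FIELDS = ("customer_id", "amount_usd", "event_date")

def standardize_crm(record: dict) -> dict:
    # Map a CRM export row onto the lake's standard schema.
    return {
        "customer_id": record["CustomerID"],
        "amount_usd": float(record["DealValue"]),
        "event_date": record["ClosedOn"],
    }

def standardize_billing(record: dict) -> dict:
    # Map a billing-system row onto the same standard schema.
    return {
        "customer_id": record["cust_id"],
        "amount_usd": record["amount_cents"] / 100.0,
        "event_date": record["billed_at"][:10],  # keep the date part of an ISO timestamp
    }

def ingest(sources: dict) -> list[dict]:
    # Route every source through its mapper, yielding one uniform dataset
    # ready to be written to the lake.
    mappers = {"crm": standardize_crm, "billing": standardize_billing}
    rows = []
    for name, records in sources.items():
        rows.extend(mappers[name](r) for r in records)
    return rows
```

Keeping one mapper per source makes ingestion repeatable: adding a new system means adding one mapping function, while downstream reporting continues to query a single, stable schema.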
Impact
- Improved consistency of analytics by consolidating fragmented sources into a single lake.
- Reduced time spent reconciling data across teams through standardized ingestion and transformation.