Exclusion Attributes

2022

Project Summary

Our three most critical files were poorly written legacy code and contained inaccurate analysis.

My tasks

As a Data Engineer, I re-engineered one of our critical files that contained exclusion attributes. My main objective was to create a faster, well-documented, and easy-to-understand codebase that could be easily shared and maintained. Here are some of the technical aspects of the project:

Developed an efficient and abstract framework that all three exclusion tables in our system now use.
Developed consistent and efficient tests that cover the majority of the cases, reducing the likelihood of regression errors.
Cut the code from 3,000 unshareable lines to 42, reducing complexity and making it easier to read and maintain.
Re-architected the system, reducing the number of files read from 17 to 6, improving performance and efficiency.
Decreased the pipeline's running time by 75%; other pipelines that adopted my framework saw their running times drop by around 50%.
Utilized technical skills and experience in Python, Spark, SQL, Linux, Workflow Orchestration, Hadoop, Data Governance, GDPR, Agile, and SAFe to deliver the project.
Adhered to best practices in Data Governance, GDPR, Agile, and SAFe, making the project more stable and reliable.
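
To illustrate the idea of one abstract framework shared by several exclusion tables, here is a minimal sketch in plain Python. It is not the project's code: the real pipeline ran on Spark, and every class name, column name, and rule below (ExclusionTable, OptedOut, Minor, run_all, "opted_out", "age") is a hypothetical stand-in chosen for the example. The point it shows is that each table supplies only its own rule, while reading the input and building the output happen once in shared code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

Row = Dict[str, object]


class ExclusionTable(ABC):
    """Shared pipeline logic; subclasses supply only the rule (hypothetical design)."""

    name: str

    @abstractmethod
    def is_excluded(self, row: Row) -> bool:
        """Return True when this row belongs in the exclusion table."""

    def build(self, rows: Iterable[Row]) -> List[Row]:
        # Same output shape for every table: the row id plus the table name.
        return [
            {"id": row["id"], "exclusion": self.name}
            for row in rows
            if self.is_excluded(row)
        ]


class OptedOut(ExclusionTable):
    # Hypothetical rule: the customer asked not to be contacted.
    name = "opted_out"

    def is_excluded(self, row: Row) -> bool:
        return bool(row.get("opted_out"))


class Minor(ExclusionTable):
    # Hypothetical rule: the customer is under 18.
    name = "minor"

    def is_excluded(self, row: Row) -> bool:
        age = row.get("age")
        return age is not None and age < 18


def run_all(rows: Iterable[Row], tables: List[ExclusionTable]) -> Dict[str, List[Row]]:
    # Materialize the input once and reuse it for every table,
    # instead of each table re-reading its own copy of the source.
    shared = list(rows)
    return {table.name: table.build(shared) for table in tables}


customers = [
    {"id": 1, "age": 34, "opted_out": True},
    {"id": 2, "age": 16, "opted_out": False},
    {"id": 3, "age": 51, "opted_out": False},
]
result = run_all(customers, [OptedOut(), Minor()])
```

With this structure, adding a third exclusion table means writing one small subclass, and a test for a new rule only needs a handful of sample rows.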

Through my technical skills, experience, and expertise in Data Engineering, I delivered a faster, more efficient, and maintainable codebase for the Exclusion Attributes project.