Exclusion Attributes
2022
Project Summary
Our three most critical files were poorly written legacy code that produced inaccurate analysis.
My tasks
As a Data Engineer, I re-engineered one of our critical files that contained exclusion attributes. My
main objective was to create a faster, well-documented, and easy-to-understand codebase that could be
easily shared and maintained. Here are some of the technical aspects of the project:
- Developed an efficient and abstract framework that all three exclusion tables in our system now use.
- Developed consistent and efficient tests that cover the majority of the cases, reducing the
likelihood of regression errors.
- Reduced the code from 3,000 unshareable lines to 42, cutting complexity and making it easy to read
and maintain.
- Re-architected the system, reducing the number of files read from 17 to 6, improving performance and
efficiency.
- Decreased the pipeline's running time by 75%; other pipelines that adopted my framework saw
their running times drop by around 50%.
- Applied technical skills and experience in Python, Spark, SQL, Linux, Workflow Orchestration,
Hadoop, Data Governance, GDPR, Agile, and SAFe to deliver the project.
- Adhered to best practices in Data Governance, GDPR, Agile, and SAFe, making the project more stable
and reliable.
Drawing on my Data Engineering experience, I delivered a faster, more efficient, and more
maintainable codebase for the Exclusion Attributes project.