Work Experience
Data Engineer
Leap Commerce • March 2023 - Present • Mumbai
* End-to-End Data Warehousing Architecture: Built the company's data warehousing architecture on Azure from scratch, establishing a Data Lake (staging layer), a Data Warehouse (consumption layer), and data ingestion pipelines as the foundation for data-driven initiatives.
* SAP Data Integration: Designed and implemented a data warehousing solution for weekly, hourly, and historical loads of SAP data, which feeds the Sales vs. Targets Dashboard used to drive business decisions for brands across the organization.
* Efficient ETL with Azure Data Factory: Developed ETL pipelines in Azure Data Factory for incremental loading of SAP data into the Data Warehouse, automating loading, cleansing, and transformation and significantly cutting the time to insight for the dashboards that depend on this data.
* Digital Marketing Data Integration: Designed a data warehousing solution for incremental loading of digital marketing data, which powers the Commercial and Marketing Dashboard for a consumer goods customer.
* RPA-Powered Data Ingestion: Built Azure Data Factory pipelines to incrementally load data downloaded from various e-commerce platforms by a UiPath-based RPA tool; after cleansing and transformation, this data feeds the Power BI reports behind the Commercial & Marketing Dashboard.
* Competitor Data Extraction: Developed web scraping scripts in Python and Selenium to extract competitors' data for advanced analytics, providing insight into market dynamics and competitive intelligence (see the sketch below).
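For illustration, here is a minimal sketch of the kind of Selenium scraper described in the last bullet. The URL, CSS selectors, and output path are hypothetical placeholders, not the actual scripts used at Leap Commerce.

```python
# Minimal competitor-scraping sketch (illustrative; selectors and URL are made up).
import csv

from selenium import webdriver
from selenium.webdriver.common.by import By


def scrape_listings(url: str, out_path: str) -> None:
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without a visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        rows = []
        # Placeholder selectors; a real page needs its own locators.
        for card in driver.find_elements(By.CSS_SELECTOR, ".product-card"):
            name = card.find_element(By.CSS_SELECTOR, ".title").text
            price = card.find_element(By.CSS_SELECTOR, ".price").text
            rows.append({"name": name, "price": price})
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["name", "price"])
            writer.writeheader()
            writer.writerows(rows)
    finally:
        driver.quit()


if __name__ == "__main__":
    scrape_listings("https://example.com/products", "competitors.csv")
```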
Data Engineer
Quantiphi • October 2021 - March 2023 • Bengaluru
* API Development for Data Consumption: Designed and developed APIs that let users fetch validated data by parameter and receive it as a real-time streaming response, improving data accessibility and usability for end users (see the sketch below).
* Secure Data Access with Azure Authentication: Implemented Azure authentication on the endpoints, enforcing region-based access control for end users and strengthening data security and compliance.
* Ingestion Tracking Mechanism: Developed an ingestion tracking mechanism that monitors every file's journey from the staging layer to the validation layer, adding transparency and accountability to the data pipeline.
* Ingestion Metrics Extraction: Built a metrics extraction system that derives KPIs from the data ingestion pipeline; these KPIs feed Tableau dashboards in the curated layer, enabling data-driven insights.
* DICOM Data Extraction: Spearheaded development of an edge device that extracts DICOM images from PACS/NAS, enabling seamless integration of medical imaging data into the pipeline.
* ETL Pipeline for DICOM Data: Architected ETL ingestion pipelines that perform de-identification and text redaction on DICOM data, then transfer the processed data to a GCS bucket for further processing.
* Quality Check Model APIs: Developed user-friendly APIs for approving or rejecting Quality Check models, with flexible selection of the metadata tags targeted for de-identification and text redaction within the ETL pipeline.
* GCP Datastore Management: Worked extensively with GCP Datastore, using its APIs to query and update Datastore kinds and to optimize data management processes.
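As an illustration of the streaming-response pattern in the first bullet, here is a minimal sketch assuming FastAPI. The framework choice, region check, and row source are hypothetical stand-ins; the real service enforced region-based access through Azure authentication.

```python
# Minimal streaming-API sketch (illustrative; endpoint, regions, and data source are made up).
import json
from typing import Iterator

from fastapi import FastAPI, HTTPException, Query
from fastapi.responses import StreamingResponse

app = FastAPI()


def fetch_rows(dataset: str, region: str) -> Iterator[dict]:
    # Placeholder generator; a real service would stream validated records here.
    yield {"dataset": dataset, "region": region, "value": 42}


@app.get("/data/{dataset}")
def stream_data(dataset: str, region: str = Query(...)):
    # Stand-in for the Azure-based region check; allowed regions are hypothetical.
    if region not in {"us", "eu"}:
        raise HTTPException(status_code=403, detail="Region not permitted")

    def generate() -> Iterator[bytes]:
        for row in fetch_rows(dataset, region):
            yield (json.dumps(row) + "\n").encode("utf-8")

    # Stream newline-delimited JSON rather than buffering the full result.
    return StreamingResponse(generate(), media_type="application/x-ndjson")
```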
Data Engineer Intern
Quantiphi • June 2021 - October 2021 • Bengaluru
* On-Premise De-Identification Solution: Played a pivotal role in developing an on-premise de-identification solution for deployment at hospital sites, designed to ingest DICOM data into the cloud while handling sensitive medical information securely.
* Metadata De-Identification: To meet strict HIPAA and PHI regulations, built ETL ingestion pipelines for the on-premise edge device that process DICOM data and apply predefined rulesets to de-identify metadata tags, guaranteeing patient privacy and compliance (a sketch follows below).
* Integration with ML Models: Worked closely with the machine learning team to integrate their containerized models into the data pipeline, enabling automated text redaction and strengthening the confidentiality of patient information in the cloud.
* User-Friendly Interface: Partnered with the UI team on a user-friendly interface that lets users customize and apply their own rulesets, ensuring data compliance before ingestion into the cloud.

This internship let me make meaningful contributions to cutting-edge projects that blend healthcare data security with advanced data engineering, and I am eager to keep making a positive impact in this field.
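For illustration, a minimal sketch of rule-based DICOM metadata de-identification using pydicom. The ruleset below is a hypothetical placeholder, not the actual HIPAA/PHI configuration deployed on the edge device.

```python
# Minimal DICOM de-identification sketch (illustrative ruleset, not production rules).
import pydicom

# Hypothetical ruleset: DICOM tag keyword -> replacement value.
RULESET = {
    "PatientName": "ANONYMIZED",
    "PatientID": "ANONYMIZED",
    "PatientBirthDate": "",
}


def deidentify(in_path: str, out_path: str) -> None:
    ds = pydicom.dcmread(in_path)
    for keyword, replacement in RULESET.items():
        if keyword in ds:  # only rewrite tags present in this file
            setattr(ds, keyword, replacement)
    ds.save_as(out_path)


if __name__ == "__main__":
    deidentify("study.dcm", "study_deid.dcm")
```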
Education
Rajeev Gandhi Technical University, Bhopal
Computer Science & Engineering, B.Tech • July 2017 - July 2021