Badges
Certifications
Work Experience
Consultant
Deloitte•  November 2020 - Present
• Developed and set up an end-to-end, scalable ETL data pipeline on Azure.
• Built a robust extraction flow using Azure Data Factory that handles both incremental and full data loads on a daily basis.
• Developed a generic, reusable framework to implement Slowly Changing Dimensions (Types 1 & 2) and Facts, with change data capture handling, using PySpark and SQL in Azure Databricks (see the sketch below).
• Implemented Azure Delta Lake to improve the performance of the above framework, cutting run time to half of the previous Spark SQL based framework.
• Implemented Dynamic Data Masking on Azure Synapse to mask the PII present in the data.
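Below is a minimal sketch of the kind of SCD Type 2 merge such a Databricks framework performs, using the Delta Lake MERGE API. The table and column names (dim_customer, customer_id, attr_hash), the input path, and the tracking columns are illustrative assumptions, not the actual framework.

```python
# A minimal SCD Type 2 sketch with Delta Lake on Databricks.
# dim_customer is assumed to carry is_current, effective_from, effective_to,
# and attr_hash (a precomputed hash of the tracked attributes).
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Incremental extract landed by the upstream ingestion job (path is hypothetical).
updates = (
    spark.read.parquet("/mnt/raw/customer_incremental")
    .withColumn("effective_from", F.current_date())
)

dim = DeltaTable.forName(spark, "dim_customer")
current = dim.toDF().filter("is_current = true")

# Changed rows need two actions: expire the old version and insert the new one.
# Union the updates with a NULL-keyed copy so the new versions of changed rows
# fall through MERGE's not-matched branch and get inserted.
changed = (
    updates.alias("s")
    .join(current.alias("t"), F.col("s.customer_id") == F.col("t.customer_id"))
    .where(F.col("s.attr_hash") != F.col("t.attr_hash"))
    .select("s.*")
)
staged = (
    updates.withColumn("merge_key", F.col("customer_id"))
    .unionByName(changed.withColumn("merge_key", F.lit(None).cast("string")))
)

(
    dim.alias("t")
    .merge(staged.alias("s"), "t.customer_id = s.merge_key AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.attr_hash <> s.attr_hash",
        set={"is_current": "false", "effective_to": "s.effective_from"},
    )
    .whenNotMatchedInsert(
        values={
            "customer_id": "s.customer_id",
            "attr_hash": "s.attr_hash",
            "effective_from": "s.effective_from",
            "effective_to": "null",
            "is_current": "true",
        }
    )
    .execute()
)
```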
Big Data Engineer
RedDoorz•  January 2020 - November 2020
• Developed and set up an end-to-end, scalable ETL data pipeline on AWS.
• Developed a robust incremental/full extraction flow using NiFi.
• Developed a reusable validation framework in Python for cleaning and massaging the data.
• Developed a reusable framework in Python to implement SCD1/SCD2 and incremental/full data loads to Redshift tables (see the sketch below).
• Developed complex, optimized SQL queries to implement business logic.
• Created well-optimized tables following AWS Redshift best practices, reducing query and load times.
• Created an AWS Step Functions workflow to automate the whole data pipeline, triggered on a schedule via AWS CloudWatch.
• Developed an automated daily business summary report in Python covering daily, monthly, and previous-month business metrics and the comparisons between them.
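Redshift SCD loads of this kind typically follow the staging-table upsert pattern; the sketch below shows an assumed SCD Type 1 variant in Python with psycopg2. The table names, S3 path, IAM role, and connection string are placeholders, not details of the actual pipeline.

```python
# Staging-table upsert (SCD Type 1) into Redshift: COPY the increment into a
# temp table, delete the rows being overwritten, then append everything staged.
import psycopg2

UPSERT_SQL = """
-- Load the incremental extract from S3 into a session-scoped staging table.
CREATE TEMP TABLE stage_bookings (LIKE dim_bookings);

COPY stage_bookings
FROM 's3://example-bucket/incremental/bookings/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
FORMAT AS PARQUET;

-- SCD1: overwrite existing rows, then append the rest.
DELETE FROM dim_bookings
USING stage_bookings
WHERE dim_bookings.booking_id = stage_bookings.booking_id;

INSERT INTO dim_bookings SELECT * FROM stage_bookings;
"""

def run_upsert(dsn: str) -> None:
    """Run the staged upsert; psycopg2 commits the transaction when the
    connection context exits without an exception."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(UPSERT_SQL)

if __name__ == "__main__":
    run_upsert(
        "host=example-cluster.redshift.amazonaws.com port=5439 "
        "dbname=dw user=etl_user password=change-me"
    )
```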
Technical Associate
Genpact•  July 2017 - December 2019
• Created a data extraction framework in Python to extract flat files (CSV, Excel, XML, TXT) from an FTP server.
• Developed a reusable data validation framework for cleaning and massaging the data using Spark.
• Developed a reusable framework to implement business logic and perform SCD1/SCD2 using Spark SQL.
• Developed complex SQL queries to implement business logic.
• Developed a framework to load full/incremental data to AWS Redshift using Spark SQL.
• Developed a Scala framework to process multiple Excel files using the Apache POI library.
• Developed a Spark framework, an improvement over the Scala one, to process Excel files using the Crealytics library.
• Created tables in AWS Redshift following best practices, improving query and data-loading performance.
• Created AWS Data Pipeline scripts to orchestrate all jobs per batch.
• Created an AWS Lambda function to automate EMR cluster creation/termination and trigger the AWS Data Pipeline (see the sketch below).
• Created an AWS Step Functions workflow to orchestrate the whole process.
• Developed an AWS Database Migration Service pipeline to replicate 100 GB of historical data.
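EMR automation of the kind described above is usually handled by a small boto3 Lambda; the sketch below shows an assumed version that launches a transient, self-terminating Spark cluster. The cluster name, release label, subnet, roles, and S3 paths are hypothetical, not the actual configuration.

```python
# A minimal sketch of a Lambda handler that spins up a transient EMR cluster
# and lets it terminate itself once the submitted step finishes.
import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    """Create a transient Spark cluster for one scheduled batch run."""
    response = emr.run_job_flow(
        Name="etl-batch-cluster",
        ReleaseLabel="emr-5.29.0",
        Applications=[{"Name": "Spark"}],
        LogUri="s3://example-logs/emr/",
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "Ec2SubnetId": "subnet-0123456789abcdef0",
            # Auto-terminate the cluster once there are no more steps to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "run-daily-etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "s3://example-code/daily_etl.py"],
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"cluster_id": response["JobFlowId"]}
```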
Education
Greater Noida Institute Of Technology
Computer Science & Engineering, B.Tech•  August 2013 - June 2017
Links
Skills