Work Experience
Senior Software Engineer
Zomato • January 2020 - Present
Data Lakehouse at Blinkit:
> Built a robust data lakehouse replicating over 1,000 source tables (10 TB of data)
> Reduced the refresh frequency of lake tables from 3 hours to 30 minutes
> Built support for creating derived tables over our Presto data lake with ACID capabilities
> Upgraded Debezium from v1.3 to v1.9.4 and migrated Kafka Connect from EC2 to Kubernetes
> Built a robust FSM-based web UI to make onboarding tables into the data lake self-served
> Patched lake data to fix discrepancies in over 25 tables
> Migrated Airflow with over 1,500 DAGs from v1.10 to v2.13
> Reduced Redshift storage by 1 TB by identifying unused tables and automating their pruning
Data Governance at Grofers:
> Introduced and drove adoption of a robust data cataloging tool (DataHub by LinkedIn)
> Set up one-click deployment of the tool using Docker and Kubernetes Helm charts
> Integrated the data catalog with the ETL pipeline (Airflow) and BI tool (Redash) to automate catalog generation
> Migrated DataHub with over 4,000 datasets from v0.6 to v0.8.18
Data Warehouse at Grofers:
> Optimized our data warehouse (Redshift) for cost and query performance
BI Reporting:
> Primary owner of the BI tool used at Grofers
> Built multiple features on top of existing OSS to make data more accessible across the company
Data Engineering Team:
> Debugged ETL pipeline issues causing data loss
> Added features to existing internal BI tools
> Built multiple Slack bots to automate reporting and monitoring
> Reduced Aurora IOPS cost by 10% by optimizing queries
> Reduced Xplenty (ETL pipeline) node-hour consumption by 50 hours per day
> Reduced Redshift disk space usage by 15%
> Migrated the ETL pipeline from Xplenty to an in-house Source Replication Pipeline
Software Engineer Intern
PolicyBazaar • May 2019 - August 2019
> Worked as a Big Data Developer Intern
> Built a scalable, configurable big data analytics platform
> Wrote scripts for data cleansing and data warehouse preparation
> Restructured the data warehouse to reduce query times
> Coordinated with the front-end and back-end teams on design patterns
> Technologies: Apache Hadoop, Hive, PySpark, Elasticsearch, Kafka
Education
University School of Information and Communication Technology (USICT), GGSIPU, Delhi
B.Tech, Information Technology • August 2016 - Present