Narayana Nelluri

United States

@nvenkat42

Data Engineer / Data Scientist

Badges

Problem Solving
Python
Days of Code
SQL

Work Experience

  • Senior Data Engineer

    Deloitte • September 2022 - Present • Dallas, TX - Remote

    Leading the design and implementation of a comprehensive data warehousing solution that integrates data from multiple sources. Optimized ETL processes using Databricks and PySpark, improving data load times by 40%. Developed and deployed application frameworks for data quality checks and auditing that track job metrics, increasing pipeline efficiency by 20% and reducing processing time by 15%. Enhancing existing applications to meet new business requirements and streamlining processes by merging the code repositories of similar processes, improving pipeline completion time by 32% and storage efficiency by 15%. Developing applications using Databricks, Airflow, SageMaker, and S3 to load data from source systems into the Order Management System. Led the migration of 50+ dashboards from Looker to Tableau, reducing users' navigation time by 30%. Developed Tableau and Looker dashboards for Platform Insights to predict and forecast the infrastructure platform budget. Designed and developed data pipelines using Airflow, Spark, and Hive that read millions of events hourly from Kafka and ingest them into the Hive data warehouse, surfacing infrastructure KPIs and metrics to stakeholders. Developed and deployed an API client that consumes logs and generates insights for the internal resource allocation platform, reducing manual effort and time by 25% and supporting better decision-making. Developed a pipeline that loads 100K+ JSON files with 900+ columns into Hive daily, along with Tableau dashboards that support informed decisions, minimize losses by maximizing infrastructure utilization, and forecast the budget.

  • Lead Data Engineer

    Wipro Inc • September 2021 - August 2022 • Austin, TX - Remote

    Led the Ads Platform production support team in maintaining and improving the performance of streaming applications that process 1B+ events per day. Enhanced the applications to track pipeline status and enable proactive monitoring, improving pipeline efficiency and reducing issues by 65%, improving SLA by 50%, and preventing $100K+ in revenue loss per month. Enhanced applications using Airflow, EKS, S3, EMR, Time Series DB, and Cymatics to track processing delays and identify areas for improvement, ensuring that the applications met high standards of performance and reliability. Designed and implemented a data validation process ensuring accuracy and consistency across multiple databases. Collaborated with cross-functional teams to identify business gaps and develop data solutions that ensure data consistency.

  • Data Engineer

    Tata Consultancy Services Ltd • September 2018 - September 2021 • Tampa, FL

    Designed and developed data pipelines using Kafka, Azure Data Factory, Azure Data Lake, and Snowpipe to ingest XML/JSON data into the Snowflake data warehouse. Developed analytics pipelines using ADF, PySpark, and Databricks to ETL data into Snowflake for business insights. Led the effort to enforce data quality by applying business logic and implementing checks with the iCEDQ tool. Worked closely with stakeholders to understand business requirements, perform gap analysis for the data catalog and data lineage, and ensure that metadata is curated and updated regularly. Developed Python utilities to automate metadata curation and keep the Alation catalog up to date. Maintained documentation for data pipelines, processes, and best practices to ensure efficient knowledge transfer and onboarding of new team members. Managed ETL pipelines that processed 8K-10K batch jobs per day and 200K+ batch executions per month, moving over a million records each day from data warehouse platforms to AML target systems. Improved process efficiency by writing Python automation scripts, saving 90% of the time previously spent on manual processes. Designed and implemented automated Spark pipelines for daily batch ingestion from Hive to MongoDB, transforming and loading data for analysis and visualization. Led data acquisition and loading from various sources into MongoDB to build data visualizations and support machine learning applications for promotional marketing campaigns targeting credit card customers.

  • Data Analyst

    Epathusa • July 2018 - August 2018 • Des Moines, IA

    Designed and implemented data pipelines to extract, transform, and load data from various sources into a centralized repository using SQL Server Integration Services (SSIS). Developed and maintained real-time dashboards and reports in Power BI and Tableau, giving stakeholders actionable insights into key business metrics and trends.

Education

  • Southern Arkansas University

    Master of Science • August 2016 - May 2018 • GPA: 3.9

Skills

S3
EC2
EMR
CloudWatch
ADF
ADLS
Databricks
Event Hubs
Git
JIRA
Confluence
Jenkins
CI/CD
Airflow
SQL
Hadoop
Hive
MongoDB
HBase
Snowflake
PySpark
Kafka
Spark SQL
Python
Scala
Java
Python (Intermediate)
Data Structures
Python (Advanced)