Work Experience
Senior Data Engineer
Adobe • October 2021 - Present
● Leading Data Engineering and ML Engineering teams for DSP (Demand Side Platform).
● Successfully leading multiple projects for Data, ML, and Product teams.
● Developed and improved Kafka Spark Streaming pipelines for collecting ad impression data in real time.
● Developed and optimized pipelines to send real-time metrics, such as impressions, spend, and clicks, to the DSP UI.
● Led the Qubole to EMR migration project for ML and Data teams, and provided assistance to other teams.
● Assisted teams in improving query/job performance using Spark.
● Configured an EMR cluster with auto scaling for ad hoc Spark, Hive, and Presto jobs/queries.
● Managed and enhanced performance and monitoring of the job scheduling tool.
● Added features to the job scheduling platform, including data push to S3, Snowflake, Hive, etc.
● Enabled data pipelines for ML model-related and product-related dashboards.
● Established data pipelines for reporting on advertisement platforms.
● Facilitated data exchange pipelines with external partners and vendors.
● Managing ML model training pipelines, model serving, and model monitoring dashboards.
● Deployed new ML models on a K8s cluster and automated model training and inference pipelines.
● Conducted several proofs of concept (POCs) for tools, technologies, and platforms.
● Identified and cleaned up unused compute and storage resources.
Data Engineer 3
PayPal • May 2021 - October 2021
● Resolved issues in existing data pipelines to keep data flowing smoothly.
● Onboarded new tables from PayPal properties (e.g., Venmo, Xoom) to Hadoop clusters, expanding data capabilities.
● Extracted data from multiple databases and ingested it into the Hadoop cluster, ensuring accurate and reliable data.
● Automated data transformations and developed robust data pipelines, leveraging UC4 and a custom framework to enhance efficiency and accuracy.
● Improved the performance of slow-running Spark SQL jobs by optimizing query execution.
● Processed real-time transaction data from Kafka using Spark Streaming.
Tools Used: Python, HDFS, Spark Streaming, Spark SQL, PySpark, Hive, Impala, Kafka, Shell Script, Crontab.
Senior Software Engineer
Freshworks • November 2019 - May 2021
● Developed and improved Big Data pipelines on Spark, Hive, Impala, AWS, and RDS.
● Refactored the existing data pipelines to enhance code maintainability and performance.
● Developed a data pipeline to process feedback data for model training, monitoring, and report generation.
● Gathered Machine Learning requirements from Data Scientists and developed data pipelines to provide appropriate data for model training.
● Developed data archival and purging solutions to remove personally identifiable information (PII).
● Designed database schemas, classified fields, and migrated data.
● Coordinated with different solution providers to modernize existing Data Platforms.
● Worked on proofs of concept (POCs) to evaluate technologies and platforms that could help in modernizing existing platforms.
● Developed and conducted learning sessions on Big Data and related technologies.
Tools Used: Scala, Python, Java, Spark Streaming, Spark SQL, PySpark, Hive, Impala, Kafka, RDS, S3, EFS, CloudWatch, EC2, DynamoDB, GitHub, Shell Script, Crontab, Databricks.
Specialist Programmer
Infosys Ltd • May 2016 - November 2019
Education
NIT, Allahabad (Motilal Nehru National Institute Of Technology)
Computer Science & Engineering, B.Tech • 2012 - 2016