Diksha Singh

India

@singhdikku

Badges

Problem Solving
Python
Sql

Certifications

Work Experience

  • Big Data Engineer

    GAP Inc • August 2017 - Present

    • Worked on multiple analytics projects during my tenure at the current company.
    • Optimized prices for all brands and markets operated by the company using Hadoop, Hive, Pig, Python, PySpark, Shell, MySQL, the Denodo Platform, a RabbitMQ cluster and Azure Cloud.
    • Optimized product packing using Hadoop, Hive, Python, PySpark, Shell, a RabbitMQ cluster and R.
    • Worked closely with the data science team to understand product requirements and deliver results that support their model building.
    • Used a Kafka producer to ingest raw data into Kafka topics and ran a Spark Streaming application to process real-time customer purchase data; transformed the raw data to build customer loyalty programs based on purchases and to predict a customer's tendency to return to the store based on the number of purchases made in a year.
    • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames and saved it in Parquet format in HDFS (a sketch follows this list).
    • Developed ETLs to pull data from various sources and transform it for reporting applications using Hive, Denodo and Power BI.
    • Handled real-time streaming data from different sources using Flume, with HDFS as the destination.
    • Used Oozie and the CA Workload Automation (CAWA) tool for scheduling workflows on the cluster.
    • Used the Oozie and CAWA frameworks extensively for automating daily import jobs.
    • Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
    • Worked with SQL and NoSQL databases: Oracle, MongoDB, DB2, MySQL.
    • Collaborated with the infrastructure, network, database, application and analysis teams to ensure data quality and availability.
    • Consumed data from sources such as RabbitMQ queues, web services (JSON format) and MySQL, inserted it into Hive using different consumption methods, processed it through multiple layers (Raw, Transformation, Consolidation, Aggregation), sent it to the SAS and Gurobi engines (Azure) for optimization, consumed the results back into the Hadoop cluster and exposed them to different UIs for the business.
    • Optimized data storage in Hive using partitioning and bucketing on both managed and external tables.
    • Created the pipeline for real-time customer purchase feed data using Kafka and Spark Streaming.
    • Experienced in Agile methodologies, Scrum stories and sprints in a Python-based environment, along with data analytics, data wrangling and Excel data extracts.
    • Supported various reporting teams and gained experience with the data visualization tool Power BI.
    • Created logical and physical data models for relational (OLTP) systems and dimensional (OLAP) star-schema models for fact and dimension tables using Erwin.
    • Imported real-time data into Hadoop using Kafka and implemented a daily CA Workload Automation job for it.
    • Wrote multiple UDFs in Pig and Python for data processing.
    • Implemented Python data analysis using Pandas, Matplotlib, Seaborn, TensorFlow and NumPy.
    • Designed data aggregations in Hive for ETL to process data per business requirements.
    • Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
    • Performed real-time/stream processing with Apache Storm and Apache Spark.
    • Fetched live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
    • Involved in the complete big data flow of the application, from upstream data ingestion into HDFS to processing the data in HDFS using Spark Streaming.
    • Integrated Kafka with Spark Streaming for high-speed data processing.
    • Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
    • Performed ETL operations between the data warehouse and HDFS.
    • Migrated streaming and static RDBMS data into the Hadoop cluster from dynamically generated files using Flume and Sqoop.
    • Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
    • Performed aggregations and queries, writing data back to the OLTP system directly or through Sqoop.
    • Loaded large RDBMS datasets into the big data platform using Sqoop.
    • Handled data exchange between HDFS and different web applications and databases using Flume and Sqoop.
    • Loaded files into HDFS from Teradata and from HDFS into Hive.
    • Transformed relational database and legacy tables into HDFS and HBase tables using Sqoop, and vice versa.
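
    The Kafka-to-Parquet flow above can be illustrated with a minimal PySpark sketch. This is not the production job: it assumes Spark Structured Streaming rather than the original RDD/DStream code, and the broker address, topic name, event schema and HDFS paths are hypothetical placeholders.

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import from_json, col
      from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

      spark = SparkSession.builder.appName("purchase-feed").getOrCreate()

      # Hypothetical schema for a customer purchase event
      purchase_schema = StructType([
          StructField("customer_id", StringType()),
          StructField("store_id", StringType()),
          StructField("amount", DoubleType()),
          StructField("purchased_at", TimestampType()),
      ])

      # Subscribe to an assumed Kafka topic carrying raw purchase events
      raw = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "customer_purchases")
             .load())

      # Kafka values arrive as bytes; cast to string and parse the JSON payload
      purchases = (raw.selectExpr("CAST(value AS STRING) AS json")
                   .select(from_json(col("json"), purchase_schema).alias("p"))
                   .select("p.*"))

      # Persist the parsed events to HDFS in Parquet format
      query = (purchases.writeStream
               .format("parquet")
               .option("path", "hdfs:///data/raw/purchases")
               .option("checkpointLocation", "hdfs:///checkpoints/purchases")
               .start())

      query.awaitTermination()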

  • Big data Developer

    Tata Consultancy Services • July 2015 - August 2017

    • Involved in data management and manipulation through different MapReduce jobs to parse raw data, populate staging tables and store refined data in partitioned tables.
    • Installed and configured Hadoop clusters in test and production environments.
    • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
    • Imported and exported data between RDBMS, local and UNIX file systems and HDFS/HBase using Sqoop and HDFS commands.
    • Moved all log/text files generated by various products into HDFS.
    • Handled importing data from various data sources, performed transformations using Hive and Pig and loaded the data into HDFS.
    • Loaded data into and out of HDFS using Hadoop file system commands.
    • Ingested real-time and near-real-time streaming data into HDFS.
    • Created Pig tables, loaded data and wrote Pig scripts that run internally as MapReduce jobs.
    • Participated in coding application programs in Java and other scripting languages.
    • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
    • Wrote a MapReduce program to find all subscribers who made STD calls for more than 60 hours in order to offer them a discount (a sketch of this logic follows this list).
    • Wrote a MapReduce program to enrich the question data with the user data.
    • Wrote MapReduce jobs in Pig and Hive to analyze and transform the data as required.
    • Wrote Pig scripts and Hive queries for processing and storing data.
    • Processed and analyzed data from Hive tables using HiveQL.
    • Created Hive tables to store processed results in a tabular format.
    • Wrote script files for processing data and loading it into HDFS.
    • Analyzed requirements to set up a cluster.
    • Installed and configured Hadoop MapReduce and HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
    • Analyzed data by running Hive queries and Pig scripts to understand user behavior, using relational operators such as union, join, filter, order and limit.
    • Involved in defining job flows and in managing and reviewing log files.
    • Wrote MapReduce code that takes log/text files as input, parses them and structures them in tabular format to facilitate effective querying of the log data.
    • Used ZooKeeper and Oozie operational services for scheduling workflows on the cluster.
    • Used Flume to collect, aggregate and store web log data from different sources and pushed it to HDFS.
    • Developed multiple Pig scripts for data preprocessing and staging data in HDFS for further analysis.
    • Used the Oozie framework extensively for automating daily import jobs.
    • Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
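
    As a rough illustration of the STD-call discount logic mentioned above, here is a minimal PySpark sketch; the original job was written as Java MapReduce, and the input path, delimiter and column layout below are assumptions rather than the actual call-record format.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("std-call-discount").getOrCreate()

      # Assumed call-detail-record layout: subscriber_id|call_type|duration_seconds
      cdrs = (spark.read
              .option("delimiter", "|")
              .csv("hdfs:///data/cdrs/",
                   schema="subscriber_id STRING, call_type STRING, duration_seconds LONG"))

      # Keep only STD calls, total the duration per subscriber, and flag anyone above 60 hours
      eligible = (cdrs.filter(F.col("call_type") == "STD")
                  .groupBy("subscriber_id")
                  .agg(F.sum("duration_seconds").alias("total_seconds"))
                  .filter(F.col("total_seconds") > 60 * 3600))

      # Write the discount candidates back to HDFS
      eligible.write.mode("overwrite").parquet("hdfs:///data/discount_candidates/")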

  • Web Developer

    Infochord Technologies Pvt Ltd • June 2014 - December 2014

    • Designed and developed the database architecture and implemented it successfully.
    • Involved in creating a simplified user interface.
    • Wrote the business logic and tested the website after completion.
    • Involved in maintaining different products.

Education

  • Banasthali Vidyapith

    Computer Science & Engineering, B.Tech • July 2011 - May 2015

Skills
