Work Experience
Data Analyst Intern
Chicago Transit Authority • May 2019 - April 2020
• Cleaned approximately 100,000 records of unstructured data in Python using Pandas and NumPy data-wrangling techniques.
• Analyzed the Records Center's data and generated ad-hoc reports to requirements using SQL, Amazon Redshift, and Microsoft Access.
• Built a Power BI search application that extracts and mines data from the SharePoint site into a dashboard, improving cross-inventory search by 85% through slicers and filters.
• Published reports and scheduled automated refreshes via the Power BI Gateway, connecting Power BI Desktop to the SharePoint site.
• Created Power BI charts and dashboards to provide meaningful insights into the Records Center's inventory.
• Programmed SharePoint workflows with the REST API to automate processes such as request and disposal of CTA records.
Environment: SQL, Microsoft Access, Excel, SharePoint, Power BI, Python, Amazon Redshift
System/Data Engineer
Tata Consultancy Services Ltd • November 2014 - July 2018
• Coded in Perl, Unix shell, and Python and documented artifacts including designs and unit test plans, gaining hands-on experience in every stage of the SDLC on an Agile-driven project.
• Installed, upgraded, and deployed SUSE Linux physical and virtual machines on a live network hosting 1,000+ network elements, including SAN and LUN configuration for a High Availability clustering setup, using the Linux command line.
• Re-engineered the installation of High Availability solutions on Linux machines using Python, reducing overall setup time by 40%.
• Developed Unix shell scripts to automate legacy procedures for backing up and restoring Linux physical and virtual machines in a High Availability cluster, reducing live-server downtime by 60%.
• Developed an application using AWS core components such as Amazon EC2, S3, ELB, Auto Scaling, and IAM; worked with the AWS CLI, CloudFormation templates, and Python scripts using the Boto3 library to interact with AWS services.
• Analyzed the Hadoop cluster and worked with big data analytics tools including Pig, Hive, Spark, and Sqoop.
• Used the Spark API on a Cloudera VM with Hadoop YARN to perform analytics on data in Hive.
• Imported and exported data between HDFS and sources such as SQL Server, Oracle, and CSV/flat files on local and external file systems using Sqoop.
• Performed ETL data cleansing, integration, and transformation using Pig; managed data from disparate sources.
• Exported analyzed data to relational databases using Sqoop for visualization and reporting for the R&D team; developed Bash scripts to load log files from an FTP server into Hive tables.
• Created Tableau dashboards using calculations and parameters and generated reports with clear visualizations.
• Accessed and transformed extremely large datasets through filtering, grouping, aggregation, joining, blending, and statistical calculation.
• Implemented Tableau features such as joins, data blending, Python integration, and dashboard actions, developing customized calculations where required.
• Created dashboard actions: filter actions to pass information to other Tableau worksheets and connect them in analytical flows, URL actions to link data views to external resources, and highlight actions for rich visual interaction between views on a sheet.
• Tested new and upgraded Tableau Desktop features, including new visualizations, forecasting, data blending, parallelized dashboards, hyperlink objects, and color-coded tabs.
• Performance-tuned reports with large queries using concurrent query execution to prevent performance degradation.
• Developed PySpark and Spark SQL code to build workflows that aggregate user metrics from multiple sources, processing around 100-150 million records.
• Developed ETL jobs, writing Spark and Hive scripts; rerouted Hive workflows through Spark, reducing workflow time by 40%.
• Built Spark jobs for ad-hoc querying and daily processing of log files.
• Modeled a NoSQL CouchDB database as a caching layer and content repository to improve application performance.
Environment: Unix/Linux, Bash/Shell Scripting, Python, Agile, SDLC, SUSE Linux High Availability, Disaster Recovery, Spark, Hadoop, HDFS, PySpark, Pig, Hive
Education
Illinois Institute of Technology, Chicago
Information Technology, MS • August 2018 - May 2020
Gandhi Institute of Technology and Management
Electronics and Instrumentation Engineering, B.Tech • June 2010 - April 2014