Work Experience
Site Reliability Engineer
Deutsche Bank Global Technology, Inc. • October 2023 - Present • Cary, NC, USA
• Monitored the health and performance of systems and tracked application and service SLOs using GCP Monitoring, Geneos, Splunk, and New Relic; configured alerts to catch issues before they caused outages.
• Responded to incidents within SLA, conducted root cause analyses, implemented preventive measures, and updated documentation for configurations, procedures, and troubleshooting guides.
• Participated in on-call rotations, ensuring 24/7 support for production systems.
• Developed monitoring and observability dashboards that were later adopted by multiple applications across the company.
• Automated multiple reports using Python and Linux scripting, reducing service requests by more than 20%.
• Created multiple Task Scheduler jobs for repetitive manual tasks, reducing manual effort by around 15 hours per month.
• Designed and maintained disaster recovery plans and procedures, and tested and validated them periodically.
• Configured and monitored Kubernetes resources such as pods, deployments, services, and ingress controllers.
• Managed Kubernetes infrastructure as code with tools such as Terraform and Helm, and used kubectl and container logs to investigate incidents and root causes.
• Conducted thorough post-incident reviews to identify root causes, developed preventive measures, and shared lessons learned with relevant teams to improve overall system reliability.
• Supported development teams to ensure smooth and reliable deployment of new features and updates.
• Analyzed system performance data, identified bottlenecks, optimized performance proactively, and planned capacity to handle peak loads.
Systems Engineer
Tata Consultancy Services • November 2017 - February 2022 • Chennai, India
• Automated manual tasks such as log rotation, compression, and error checking using Python and Unix scripting, freeing up 30% of SRE time and reducing human error.
• Designed and maintained monitoring systems and alert configurations, improving incident response times.
• Supported Java-based applications by ensuring optimal performance and availability of JVMs, web servers, and application servers, drawing on extensive experience in application debugging and troubleshooting.
• Collaborated on the development of a React and Node.js web application for asset tracking, which received accolades for innovation.
• Built and optimized data pipelines using Spark and AWS services for high-volume data transformations and analytics, including data selection, cleaning, and storing results in S3 and Redshift.
• Wrote complex SQL queries for ETL development on large datasets, and integrated data from multiple sources and file types, including JSON, Parquet, Avro, flat files, web APIs, XML, and SQL and NoSQL databases.
Education
The University of Texas at Arlington
Master of Science in Data Science • January 2022 - May 2023 • GPA: 3.9
Andhra University
Electrical Engineering & Computer Science, BE • August 2013 - May 2017