Badges
Certifications
Work Experience
Software Engineer
Modak Analytics LLP • February 2022 - Present
Built a data engineering model for a health-sector client to extract data from a PostgreSQL database and store it in GCS storage, organized by patient admission date. The database held more than 20 tables spread across various schemas; some tables contained more than 5 million records and others fewer. Technologies used were PySpark, Kafka, an ETL tool (StreamSets), and GCS bucket storage. For tables with more than 5 million records, the data was loaded into a Kafka producer by a Spark streaming application written in Python and then loaded into GCS storage using StreamSets, which gave around 95% accuracy with the lowest time complexity. For tables with fewer than 5 million records, the data was loaded directly into GCS using StreamSets, which gave around 98% accuracy.
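The routing rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual code: the function names, the route labels, the bucket name, and the `admission_date=` path layout are all hypothetical, and the treatment of a table with exactly 5 million records (sent down the direct path here) is an assumption, since the description does not cover that case.

```python
from datetime import date

# Threshold taken from the description: tables with more than 5 million
# records go through Kafka first; smaller tables are loaded directly.
LARGE_TABLE_THRESHOLD = 5_000_000


def choose_route(record_count: int) -> str:
    """Pick a load path for a table based on its record count.

    The route labels are hypothetical names used only in this sketch.
    """
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count > LARGE_TABLE_THRESHOLD:
        # Large table: Spark streaming job -> Kafka -> StreamSets -> GCS.
        return "kafka_then_streamsets"
    # Assumption: a table at or below the threshold takes the direct path.
    return "streamsets_direct"


def gcs_prefix(bucket: str, schema: str, table: str, admitted_on: date) -> str:
    """Build a date-based GCS prefix (hypothetical layout) for one table."""
    return (
        f"gs://{bucket}/{schema}/{table}/"
        f"admission_date={admitted_on.isoformat()}/"
    )


if __name__ == "__main__":
    # Hypothetical table sizes, for illustration only.
    for name, count in [("patients", 7_200_000), ("wards", 1_500)]:
        print(name, choose_route(count))
    print(gcs_prefix("example-bucket", "public", "patients", date(2022, 3, 1)))
```

Keeping the threshold in a single constant makes the split between the two paths easy to adjust if table volumes change.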
Education
KIIT, Bhubaneswar (Kalinga Institute of Industrial Technology)
Electronics and Computer Science, B.Tech
Links
Skills
arpanmitra53 has not updated skills details yet.