Data Engineer

Location:

Kothrud, Maharashtra, 411038, India

Salary:

Posted:

September 20, 2025


Resume:

Nikita Godase

*.*+ Years as Data Engineer

Citiustech

Contact info

954-***-****

****************@*****.***

India, Pune, Kothrud pune

Education

University of Pune

India, Pune

**** - ****

Skills

SQL

Python

Big Data

Data Warehousing

Data Mining

ETL

Apache Hadoop

Apache Spark

NoSQL

Data Modeling

Cloud Computing

Data Visualization

Data Analysis

Business Intelligence

PySpark

Databricks

Data Pipeline

Data Migration

Data Validation

Azure Blob Storage

Professional summary

3.6+ years as a Data Engineer at Citiustech, from February 2022 to date

Data Engineer with 3.6+ years of experience in Data Engineering, Big Data technologies, and Data Analysis. Proficient in Python, PySpark, Databricks SQL, Hadoop, Hive, and Azure Cloud, with expertise in building and managing scalable data pipelines using Azure Data Factory. Skilled in Data Mining, Data Preparation, Data Modeling, and ETL processes, ensuring high-quality data flow for analytical and business needs. Experienced in handling large datasets, implementing Machine Learning algorithms, and working with cloud platforms for data storage and processing. Strong knowledge of proofs of concept (PoC) and gap analysis to drive data-driven solutions for enterprise applications.

Experience

Data Engineer December 2023 - Now

Tibsovo (Pharmaceutical Services), United Kingdom

Project Title: USA – UK Pharma Data Pipeline

Domain: Pharmaceutical

Technical Skill Sets: PySpark, Python, SQL, Pandas, AWS (Lambda, S3, Glue, SNS/SQS, Step Functions, Redshift), ETL

Project Overview

• Built a robust AWS-based data pipeline for processing pharmaceutical sales data from multiple external sources.

• Implemented Medallion Architecture (Bronze, Silver, Gold) for data quality, enrichment, and delivery.

• Automated data ingestion, validation, transformation, and delivery to Amazon Redshift for analytics and reporting.

• Reduced manual intervention and accelerated decision-making.

Azure Data Factory

Azure Data Pipeline

Azure Logic Apps

Azure Data Lake

AWS: S3, Glue, RDS, Redshift, SNS, SQS, Lambda, Step Functions, EventBridge, CloudWatch

Power BI, Excel, DAX

Snowflake

Git

Docker

NoSQL

Hobbies

Traveling

Cooking

Awards

Best Performer in Team, May 2024

Languages

English, Hindi, Marathi

Personal info

Date of birth:

6 December 1999

Place of birth:

Satara

Nationality:

India

Outcomes

• 80% reduction in manual data handling.

• 40% improvement in processing time.

• Enabled real-time broker-wise sales reporting.

• Handled 1 TB/month of sales data with scalable, reusable pipelines.

Challenges

• Frequent schema drift across multiple broker data formats.

• Processing and optimizing large file sizes.

• Integrating legacy systems with AWS-native solutions.

• Achieving cost-efficiency while scaling.

Solution & Architecture

Bronze Layer – Raw Data Ingestion

1. External API → Lambda trigger → raw data stored in S3.

2. S3 event notifications → SNS/SQS → Lambda for validation; errors stored in an error bucket.

Silver Layer – Validation & Enrichment

3. AWS Glue jobs for transformation, validation, schema mapping, and error logging.

Gold Layer – Aggregation & Delivery

4. Partitioned & enriched datasets stored in S3 by broker/region.

5. Data loaded into Amazon Redshift for analytics & reporting.
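The Bronze-layer hand-off in step 2 can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes S3 event notifications are published to SNS and delivered to SQS with raw message delivery disabled, so each SQS record body is an SNS envelope whose `Message` field holds the S3 event JSON. The function names `extract_s3_objects` and `route`, and the `.csv` suffix check, are hypothetical placeholders.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_event):
    """Return (bucket, key) pairs from an SQS-triggered Lambda event.

    Assumes S3 -> SNS -> SQS with raw message delivery disabled, so each
    SQS record body is an SNS envelope carrying the S3 event in "Message".
    """
    objects = []
    for record in sqs_event.get("Records", []):
        envelope = json.loads(record["body"])
        s3_event = json.loads(envelope["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def route(objects, allowed_suffix=".csv"):
    """Split objects into valid ones and ones destined for the error bucket.

    The suffix check is a stand-in for real validation rules (assumption).
    """
    valid = [o for o in objects if o[1].endswith(allowed_suffix)]
    errors = [o for o in objects if not o[1].endswith(allowed_suffix)]
    return valid, errors
```

In a real Lambda handler, the `valid` list would be passed on to the Silver-layer Glue job and the `errors` list copied to the error bucket; both steps are omitted here because they depend on the deployment.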

Role & Responsibilities

• Designed & developed ETL workflows in AWS Glue and PySpark.

• Created PySpark transformation logic and managed Delta tables in S3.

• Implemented query optimizations (partitioning, caching).

• Managed schema evolution and dynamic ingestion workflows.

• Applied Medallion Architecture best practices for data layering.
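Handling schema drift across broker formats, as listed in the challenges and responsibilities above, often comes down to mapping each source's column names onto one canonical schema before the Silver layer. The sketch below shows the idea in plain Python rather than PySpark so that it runs without a cluster. The canonical columns, the alias table, and the `normalize_record` helper are all hypothetical and are not taken from the project.

```python
CANONICAL_COLUMNS = ("broker_id", "region", "units_sold", "sale_date")

# Hypothetical aliases that different broker files might use.
COLUMN_ALIASES = {
    "brokerid": "broker_id",
    "broker": "broker_id",
    "territory": "region",
    "qty": "units_sold",
    "units": "units_sold",
    "date": "sale_date",
}


def normalize_record(record):
    """Map a raw record onto the canonical schema.

    Returns (normalized, unknown). Canonical columns missing from the
    source are filled with None; unrecognized columns are returned
    separately so that drift can be logged instead of silently dropped.
    """
    normalized = dict.fromkeys(CANONICAL_COLUMNS)
    unknown = {}
    for name, value in record.items():
        cleaned = name.strip().lower().replace(" ", "_")
        target = COLUMN_ALIASES.get(cleaned, cleaned)
        if target in normalized:
            normalized[target] = value
        else:
            unknown[name] = value
    return normalized, unknown
```

For example, `normalize_record({"Broker": "B1", "Qty": 5, "Discount": 0.1})` fills `broker_id` and `units_sold`, leaves `region` and `sale_date` as `None`, and reports `Discount` as an unknown column.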

Data Engineer February 2022 - November 2023

Kaiser Permanente, United States

Project Title: End-to-End Doctor-Patient Transcription Pipeline (Azure)

Domain: Healthcare

Technical Skill Sets: Azure API Management, Azure Data Lake Gen2, Azure Functions, Azure Databricks, Azure Synapse Analytics, Power BI, Delta Lake, Python, Spark

Project Overview

• Designed and implemented a secure, scalable data pipeline to process doctor-patient conversation transcripts from Whisper/OpenAI.

• Adopted Medallion Architecture (Bronze–Silver–Gold layers) for traceability, quality, and performance.

• Enabled ingestion, transformation, anonymization, and analytics for clinical voice-to-text data.

• Delivered structured, query-ready datasets to downstream analytics tools for KPI tracking and research insights.

Outcomes

• Real-time and historical tracking of patient care trends.

• Improved decision-making using diagnosis frequency and consultation quality metrics.

• Fully automated and scalable pipeline with plug-and-play integration for new clinics/doctors.

• HIPAA-compliant PHI/PII anonymization for secure data sharing.

Challenges

• Processing unstructured transcripts with overlapping speaker dialogues.

• Ensuring HIPAA compliance with secure storage and masking of sensitive patient data.

• Harmonizing inconsistent and semi-structured metadata from multiple clinic sources.
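The masking of sensitive patient data mentioned above can be illustrated with a small pattern-based sketch. The resume does not state which anonymization method the project used, so this is only an assumption for illustration: it masks email addresses and US-style phone numbers with regular expressions, and the `mask_transcript` helper and placeholder tokens are hypothetical. Real HIPAA de-identification covers many more identifier types than these two.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# US-style phone numbers such as 555-123-4567 or (555) 123-4567.
PHONE_RE = re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")


def mask_transcript(text):
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

Names, dates, and record numbers would need additional handling (for example, a named-entity recognition step), which is outside the scope of this sketch.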
