Sandeep N
Data Engineer
Chicago, IL +1-779-***-**** *****************@*****.***
SUMMARY
Data Engineer with 6+ years of experience designing enterprise ETL pipelines, data migration frameworks, and cloud-based data platforms.
Strong expertise in data extraction, cleansing, transformation, and migration from legacy ERP systems to modern data platforms including SAP HANA-compatible architectures.
Hands-on experience with AWS Glue, Azure Data Factory, Spark, Python, SQL, and DataStage-style ETL pipelines.
Proven ability to identify data quality issues, resolve duplication, and enforce data governance frameworks across enterprise datasets.
Experience collaborating with architecture, product, and business teams to deliver scalable data pipelines and enterprise data solutions.
Experience supporting AI-driven data workflows and preparing data for ML use cases.
Worked with modern data lake architectures including Delta Lake and Apache Iceberg.
Strong understanding of data modeling, warehousing concepts, and distributed system design.
SKILLS
Programming Languages: Python, SQL, Scala, Java, T-SQL, Shell Scripting, Unix
Big Data Technologies: Apache Spark (Core, SQL, Streaming), Hadoop, Hive, Pig, Sqoop, Flume, MapReduce, Apache Kafka, Apache Flink, Apache Airflow, YARN, ZooKeeper
Frameworks & Libraries: PySpark, FastAPI, Django, Pandas, NumPy, SQLAlchemy, Confluent Kafka
ETL & Orchestration Tools: Azure Data Factory, AWS Glue, Matillion, SSIS, Apache Airflow, dbt, Step Functions, Azure Logic Apps
Azure Cloud Services: Azure Databricks, Azure Data Lake Gen2, Azure SQL Database, Synapse Analytics, Event Hubs, Stream Analytics, Azure Functions, Azure VMs, Azure Monitor
AWS Cloud Services: S3, Lambda, EC2, Kinesis, RDS, Redshift, Athena, EMR, Glue, DynamoDB, Step Functions, Route 53, CloudWatch, ECS, API Gateway, SNS, Elasticsearch, IAM
Data Warehouses & Databases: Snowflake, Redshift, Azure Synapse, Azure SQL, MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, DynamoDB, Cosmos DB, SQL Server
CI/CD & DevOps Tools: Docker, Azure DevOps, AWS CodePipeline, Jenkins, Terraform, Kubernetes (AKS, ECS)
Version Control & Agile: Git, GitHub, GitLab, Jira, Azure Boards
Operating Systems: Windows, Linux
EXPERIENCE
Cigna Healthcare, US | Senior Data Engineer | Jan 2024 – Present
Led data migration and transformation pipelines supporting enterprise system modernization initiatives.
Designed automated data extraction and cleansing frameworks using AWS Glue, Python, and Spark to process legacy ERP datasets.
Developed data conversion pipelines transforming legacy system data into SAP HANA-compatible formats for downstream ERP integration.
Built data profiling and validation frameworks to identify duplicate records, missing attributes, and schema inconsistencies.
Created data mapping documentation linking legacy source attributes to SAP master data objects such as business partners and transactional entities.
Implemented data quality dashboards and monitoring scripts using SQL and CloudWatch to track migration accuracy and completeness.
Automated ETL workflows integrating multiple enterprise systems including relational databases, APIs, and file-based datasets.
Collaborated with enterprise architects and business stakeholders to define migration strategies and data governance policies.
Developed SQL-based validation scripts and reconciliation reports ensuring 100% reconciliation between legacy and target SAP datasets during migration cycles.
Established end-to-end observability with CloudWatch metrics, logs, dashboards, and SNS alerts, improving reliability and MTTR.
Optimized Spark jobs using partitioning, broadcast joins, caching, and tuning, improving performance by ~30%.
Enforced enterprise security controls using IAM, KMS encryption, Secrets Manager, and least-privilege access.
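The SQL-based reconciliation work described above can be sketched roughly as follows. This is a minimal illustration, not the production framework: the table and column names (legacy_orders, target_orders, order_id, amount) are hypothetical, and an in-memory SQLite database stands in for the actual source and target systems.

```python
import sqlite3

def reconcile(conn, source_table, target_table, key, value_col):
    """Compare row counts and per-key values between a source and target table."""
    cur = conn.cursor()
    src_count = cur.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt_count = cur.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    # Keys present in source but missing from, or differing in, the target.
    mismatches = cur.execute(f"""
        SELECT s.{key}
        FROM {source_table} s
        LEFT JOIN {target_table} t ON s.{key} = t.{key}
        WHERE t.{key} IS NULL OR s.{value_col} <> t.{value_col}
    """).fetchall()
    return {
        "source_rows": src_count,
        "target_rows": tgt_count,
        "mismatched_keys": [row[0] for row in mismatches],
    }

# Demo with in-memory tables standing in for legacy and migrated datasets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy_orders (order_id INTEGER, amount REAL);
    CREATE TABLE target_orders (order_id INTEGER, amount REAL);
    INSERT INTO legacy_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO target_orders VALUES (1, 10.0), (2, 99.0);
""")
report = reconcile(conn, "legacy_orders", "target_orders", "order_id", "amount")
print(report)  # order 2 differs in value, order 3 is missing from the target
```

A migration cycle is "100% reconciled" when the counts match and `mismatched_keys` is empty; anything else feeds the exception reports.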
Wipro, India | Data Engineer I | Jan 2022 – Dec 2022
Developed ETL pipelines to extract and transform legacy enterprise system data for enterprise data platforms.
Built Spark-based pipelines for large-scale data transformation and cleansing of transactional datasets.
Implemented incremental loads and CDC-based ingestion frameworks ensuring accurate synchronization between source and target systems.
Assisted in evaluation and implementation of ETL tools and frameworks supporting enterprise data migration initiatives.
Developed SQL validation scripts to ensure data accuracy, reconciliation, and completeness across migration cycles.
Documented data extraction methodologies, data mapping templates, and ETL testing frameworks used across migration projects.
Collaborated with cross-functional teams including data architects, SAP integration teams, and analytics teams.
Collaborated in Agile Scrum teams, working closely with product owners, QA, and analytics stakeholders.
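The incremental/CDC-style ingestion pattern mentioned above reduces to a watermark plus an upsert: pull only rows changed since the last sync point, merge them by key, and advance the watermark. The sketch below is illustrative only; the record fields (id, name, updated_at) are made up for the example.

```python
from datetime import datetime

def incremental_load(rows, target, watermark):
    """Upsert rows changed after the watermark; return the new watermark."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    for row in changed:
        target[row["id"]] = row  # insert or overwrite by primary key
    if changed:
        watermark = max(r["updated_at"] for r in changed)
    return watermark

# Target store keyed by primary key; watermark marks the last sync point.
target = {}
watermark = datetime(2024, 1, 1)

source_rows = [
    {"id": 1, "name": "alpha", "updated_at": datetime(2023, 12, 30)},  # before watermark: skipped
    {"id": 2, "name": "beta",  "updated_at": datetime(2024, 1, 5)},    # changed: loaded
    {"id": 3, "name": "gamma", "updated_at": datetime(2024, 1, 9)},    # changed: loaded
]

watermark = incremental_load(source_rows, target, watermark)
print(sorted(target), watermark)  # only ids 2 and 3 land; watermark advances to Jan 9
```

In a real pipeline the watermark is persisted between runs and the `updated_at` filter is pushed down to the source query, so each cycle moves only the delta.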
Aktrix, India | Data Engineer | Apr 2019 – Dec 2021
Designed distributed Spark ETL pipelines processing large datasets across enterprise systems.
Built Python scripts automating data extraction, transformation, and data cleansing processes.
Implemented data warehouse schemas and ETL frameworks for enterprise analytics platforms.
Optimized Spark workloads using partitioning, caching, and parallel processing techniques.
Built automated monitoring pipelines ensuring data integrity, error detection, and pipeline reliability.
Supported data integration initiatives between multiple enterprise applications and analytics systems.
Supported cross-functional collaboration with architecture and business teams.
Implemented CloudWatch logging, metrics, and alarms for proactive monitoring and incident detection.
Refactored Spark jobs using partition pruning and in-memory caching, reducing execution time by ~40%.
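The automated integrity monitoring described above boils down to running a set of named checks over each batch and surfacing the ones that fail. A minimal sketch, with hypothetical check names and record fields chosen for illustration:

```python
def run_checks(rows, checks):
    """Run named integrity checks over a batch and collect the failures."""
    failures = []
    for name, check in checks.items():
        if not check(rows):
            failures.append(name)
    return failures

batch = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},  # missing value: should trip the null check
    {"id": 2, "amount": 5.0},   # duplicate id: should trip the uniqueness check
]

checks = {
    "non_empty": lambda rows: len(rows) > 0,
    "no_null_amount": lambda rows: all(r["amount"] is not None for r in rows),
    "unique_ids": lambda rows: len({r["id"] for r in rows}) == len(rows),
}

failed = run_checks(batch, checks)
print(failed)  # ['no_null_amount', 'unique_ids']
```

In a production pipeline the failure list would feed an alerting channel (e.g. CloudWatch alarms or SNS, as in the bullets above) rather than a print statement.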
EDUCATION
Master of Business Administration
Lewis University - 2024
Bachelor's in Computer Engineering
Vardhaman College of Engineering - 2018