
Data Engineer Senior

Location: United States
Salary: $56/hr
Posted: April 15, 2025


Resume:

SANDEEP CHUNDU

Senior Data Engineer

+1-816-***-**** *************.****@*****.*** http://www.linkedin.com/in/sandeepchundu

PROFESSIONAL SUMMARY:

Senior Data Engineer with over 10 years of experience designing, building, and optimizing scalable data solutions across leading cloud platforms, including AWS, Azure, and GCP.

Proficient in building real-time data analytics pipelines using Spark, PySpark, Kafka, AWS Glue and Airflow.

Expertise in data warehousing (Snowflake, Redshift, Synapse, BigQuery) for efficient data analysis.

Hands-on experience in ETL development and data transformation using AWS Data Pipeline, Apache NiFi, Talend, and Google Dataflow, optimizing workflows for structured and unstructured data.

Expertise in Python, SQL, T-SQL, Bash, PowerShell, and Java for data engineering and automation tasks.

Strong proficiency in big data technologies such as Apache Hadoop, Hive, Spark SQL, and MapReduce, driving distributed computing solutions for large-scale data processing.

Well-versed in machine learning and AI frameworks, including Amazon SageMaker, TensorFlow, and Scikit-Learn, with experience in feature engineering and predictive analytics.

Deep understanding of cloud storage solutions such as Amazon S3, Azure Data Lake Storage (ADLS), and Google Cloud Storage (GCS), ensuring secure, reliable, and cost-effective data storage.

Adept at Infrastructure as Code (IaC) and DevOps methodologies, leveraging Terraform, AWS CloudFormation and Azure DevOps to automate deployments and enhance system reliability.

Proficient in containerization and orchestration using Docker, Kubernetes (K8s), Amazon ECS, AWS Fargate, GKE, and AKS, improving application scalability and performance.

Extensive knowledge of database management systems, including MongoDB, MySQL, PostgreSQL, SQL Server, AWS RDS, and Azure Cosmos DB, focusing on data modeling and query optimization.

Experience in security and compliance frameworks, implementing IAM roles, OAuth 2.0, RBAC, GDPR compliance, and cloud security audits to safeguard sensitive data.

Skilled in monitoring and logging using AWS CloudWatch, ELK Stack (Elasticsearch, Logstash, Kibana), and distributed tracing for performance tuning and issue resolution.

Proficient in translating complex data into actionable insights using QuickSight, Power BI, Tableau, and Jupyter.

Strong background in workflow orchestration and automation, leveraging Apache Airflow, AWS Step Functions, and Apache Luigi to streamline business operations.

Experienced in Agile and Scrum methodologies, utilizing JIRA and project management tools to drive collaboration and efficient execution of data-driven initiatives.

Adept at identifying inefficiencies, optimizing workflows, and implementing innovative data strategies that align with business goals.

TECHNICAL SKILLS:

Cloud Platforms & Services

AWS Glue, Data Pipeline, Redshift, Lake Formation, S3, Kinesis, RDS, Aurora, DynamoDB, IAM, CloudWatch, Lambda, Step Functions, CloudFormation, CDK, EMR, QuickSight, Azure Data Factory (ADF), HDInsight, Synapse Analytics, ADLS, Blob Storage, Cosmos DB, AKS, Machine Learning, Purview, Key Vault, DevOps, GCP Dataproc, Pub/Sub, Cloud Functions, Apache Beam, Dataflow, BigQuery, GCS, GKE.

Programming Languages & Scripting

Python, SQL, T-SQL, Bash, PowerShell, Java

Big Data & Distributed Computing

Apache Spark, PySpark, Spark SQL, Apache Hadoop, Apache Hive, Apache Kafka, Apache Beam, MapReduce, Apache Pig

Data Warehousing & Databases

Snowflake, Amazon Redshift, Azure Synapse Analytics, BigQuery, MongoDB, MySQL, PostgreSQL, SQL Server, Hive, AWS RDS, Amazon Aurora, Azure Cosmos DB

ETL & Data Pipelines

AWS Glue, AWS Data Pipeline, Apache NiFi, Talend, Apache Airflow, Apache Luigi, Azure Data Factory, Google Dataflow

Machine Learning & AI

Amazon SageMaker, TensorFlow, Scikit-Learn, Azure Machine Learning, Feature Engineering

Infrastructure as Code & DevOps

Terraform, AWS CloudFormation, AWS CDK, Ansible, Azure DevOps, Jenkins, CI/CD Pipelines, BitBucket, Git, GitHub

Containerization & Orchestration

Docker, Kubernetes (K8s), Amazon ECS, AWS Fargate, GKE, AKS

Security & Compliance

AWS IAM, Azure Key Vault, OAuth 2.0, Role-Based Access Control (RBAC), GDPR Compliance, Cloud Security Audits

Monitoring & Logging

AWS CloudWatch, ELK Stack (Elasticsearch, Logstash, Kibana), Distributed Tracing

Data Visualization & Reporting

Amazon QuickSight, Power BI, Tableau, Jupyter Notebooks, ELK Stack (Kibana)

Data Modeling & Query Optimization

Data Warehousing Techniques, Data Transformation & Preprocessing, SQL Query Optimization, Data Imputation Strategies, Indexing & Partitioning Strategies

Workflow Orchestration & Automation

Apache Airflow, Apache Luigi, AWS Step Functions

Project Management Tools

JIRA, Agile/Scrum Methodologies

PROFESSIONAL EXPERIENCE:

Client: Molina Healthcare, Bothell, WA November 2023 - Present

Role: Senior Data Engineer

Responsibilities:

Developed and optimized ETL pipelines using Azure Data Factory (ADF) to automate data ingestion, transformation, and orchestration.

Implemented Apache Spark on Azure HDInsight and optimized distributed data processing workflows.

Engineered real-time data streaming solutions using Apache Kafka, improving event-driven architectures and low-latency analytics.
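
For illustration, a minimal PySpark Structured Streaming sketch of this kind of Kafka consumer is shown below; the broker address, topic name, event schema, and sink paths are placeholder assumptions rather than details of the actual pipelines.

# Illustrative sketch: consume JSON events from Kafka with Spark Structured Streaming.
# Broker address, topic name, schema, and output paths are assumptions for demonstration only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-claims-stream").getOrCreate()

event_schema = (StructType()
                .add("claim_id", StringType())
                .add("amount", DoubleType())
                .add("event_time", TimestampType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker address
       .option("subscribe", "claims-events")                # assumed topic
       .load())

# Parse the Kafka value payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Write the parsed stream to a Parquet sink for downstream analytics.
query = (events.writeStream
         .format("parquet")
         .option("path", "/mnt/datalake/claims")             # assumed ADLS mount
         .option("checkpointLocation", "/mnt/checkpoints/claims")
         .start())
query.awaitTermination()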

Developed PowerShell and Azure CLI scripts to automate cloud administration, resource management, and security configurations.

Designed and maintained data warehousing solutions using Azure Synapse Analytics, improving data accessibility and reporting efficiency.

Implemented Azure Data Lake Storage (ADLS) and Azure Blob Storage for scalable and cost-effective data storage solutions.

Leveraged Python, Pandas, and NumPy for data transformation, statistical analysis, and preprocessing tasks.
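
A small, hypothetical pandas/NumPy preprocessing sketch of the kind of transformation work described above; the file name and column names are invented for the example.

# Illustrative pandas/NumPy preprocessing sketch; file path and column names are assumptions.
import pandas as pd
import numpy as np

df = pd.read_csv("members.csv", parse_dates=["enrollment_date"])

# Basic cleansing: drop exact duplicates and normalize a categorical column.
df = df.drop_duplicates()
df["plan_type"] = df["plan_type"].str.strip().str.upper()

# Impute a numeric column with its median and derive a log-scaled feature.
df["monthly_premium"] = df["monthly_premium"].fillna(df["monthly_premium"].median())
df["log_premium"] = np.log1p(df["monthly_premium"])

# Simple aggregation for downstream reporting.
summary = df.groupby("plan_type")["monthly_premium"].agg(["count", "mean", "median"])
print(summary)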

Designed data streaming architectures to support real-time analytics and improve business intelligence capabilities.

Managed source code and collaborative development using BitBucket, ensuring version control and streamlined deployments.

Deployed Apache Spark and PySpark on Azure Databricks for big data processing and advanced analytics.

Built and managed Azure Cosmos DB for high-performance NoSQL database solutions supporting real-time applications.

Leveraged Azure Machine Learning to integrate predictive analytics and AI-driven insights into business intelligence solutions.

Developed CI/CD pipelines in Azure DevOps to automate deployment processes, reducing manual intervention and improving efficiency.

Deployed and orchestrated containerized applications using Azure Kubernetes Service (AKS) to enhance scalability and reliability.

Integrated ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging, monitoring, and real-time data visualization.

Automated cloud resource provisioning with Terraform, ensuring scalable and repeatable infrastructure deployment.

Designed and executed data modeling strategies to optimize SQL Server and relational databases for analytical workloads.

Wrote complex T-SQL queries for data extraction, transformation, and query performance tuning.

Created interactive Power BI dashboards to visualize and analyze large datasets, supporting business decision-making.

Participated in Agile sprints for collaborative problem-solving, including planning and daily stand-ups.

Established Azure Purview for data governance, ensuring compliance, security, and effective metadata management.

Configured Azure Key Vault to securely manage sensitive data, secrets, and credentials within cloud environments.

Environment: ADF, Azure HDInsight, Apache Spark, Apache Kafka, PowerShell, Azure CLI, Azure Synapse Analytics, ADLS, Azure Blob Storage, Python, Pandas, NumPy, BitBucket, Azure Databricks, PySpark, Azure Cosmos DB, Azure Machine Learning, Azure DevOps, AKS, ELK Stack, Terraform, SQL Server, T-SQL, Power BI, Azure Purview, Azure Key Vault.

Client: Global Atlantic Financial Group, Indianapolis, IN April 2021 - October 2023

Role: Data Engineer

Responsibilities:

Designed and optimized ETL pipelines using AWS Glue and AWS Data Pipeline, improving data ingestion and transformation efficiency.

Built and maintained data warehousing solutions using Amazon Redshift, ensuring scalable and high-performance analytics.

Developed data lake architectures with AWS Lake Formation and Amazon S3, enabling efficient data storage and retrieval.

Engineered real-time data streaming solutions with AWS Kinesis, enhancing event-driven processing capabilities.

Managed data modeling for structured and unstructured data across Amazon RDS, DynamoDB, and Amazon Aurora.

Developed large-scale distributed data processing pipelines using Apache Spark, PySpark, and Spark SQL.

Implemented Apache Hadoop and Apache Hive for batch data processing and distributed querying.

Built and deployed machine learning models using Amazon SageMaker, TensorFlow, and Scikit-Learn, leveraging feature engineering for model optimization.
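
A minimal scikit-learn sketch of the feature-engineering-plus-training pattern referenced here (the SageMaker and TensorFlow pieces are omitted); the dataset, feature columns, and target are placeholder assumptions.

# Illustrative scikit-learn sketch: feature engineering and model training in one pipeline.
# Column names, the target variable, and the CSV source are placeholder assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("policies.csv")
X, y = df.drop(columns=["lapsed"]), df["lapsed"]

numeric = ["age", "annual_premium"]
categorical = ["product_line", "state"]

# Scale numeric features and one-hot encode categoricals before the model.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([("prep", preprocess),
                  ("clf", GradientBoostingClassifier())])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))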

Deployed ML models to production, ensuring scalability, monitoring, and retraining strategies.

Designed and implemented data governance policies, ensuring GDPR compliance, RBAC, and security best practices.

Enforced cloud security measures using AWS IAM, protecting data integrity and access controls.

Configured AWS CloudWatch for real-time logging, monitoring, and distributed tracing, improving system observability.

Automated serverless data orchestration using AWS Lambda and AWS Step Functions, reducing operational overhead.
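
An illustrative AWS Lambda handler of the sort a Step Functions task state might invoke; the event fields and the Glue job name are assumptions, not the actual workflow.

# Illustrative AWS Lambda handler that a Step Functions task state could invoke.
# The event fields (bucket, key) and the target Glue job name are assumptions.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Expect the state machine to pass the S3 object that just landed.
    bucket = event["bucket"]
    key = event["key"]

    # Kick off a Glue job against the new object and return its run id
    # so a later state can poll for completion.
    run = glue.start_job_run(
        JobName="nightly-curation-job",                       # assumed job name
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"jobRunId": run["JobRunId"], "source": f"s3://{bucket}/{key}"}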

Managed Apache Airflow and Apache Luigi for workflow orchestration, optimizing pipeline scheduling and execution.
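
A minimal Apache Airflow DAG sketch showing the scheduling pattern described above; the DAG id, schedule, and task callables are placeholders.

# Illustrative Airflow DAG sketch; task names, schedule, and callables are assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**_):
    print("pull source data")

def transform(**_):
    print("clean and aggregate")

def load(**_):
    print("load to Redshift")

with DAG(
    dag_id="daily_curation",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency: extract, then transform, then load.
    t_extract >> t_transform >> t_load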

Deployed and managed Kubernetes and Docker for containerized data engineering workloads.

Developed Infrastructure as Code (IaC) using AWS CloudFormation, AWS CDK, and Ansible, ensuring repeatable cloud deployments.

Led data migration projects, seamlessly transitioning on-premises databases to AWS cloud-native solutions.

Implemented CI/CD pipelines using Jenkins, automating data pipeline testing and deployment.

Optimized distributed data processing with Amazon EMR, ensuring cost-effective computing power.

Designed and developed serverless computing architectures for scalable and event-driven workloads.

Integrated Amazon QuickSight for data visualization and interactive business intelligence reporting.

Automated cloud security audits to enforce best practices in AWS compliance and governance.

Established role-based access control (RBAC) frameworks for managing sensitive data permissions.

Implemented data streaming architectures with AWS Kinesis, ensuring low-latency data ingestion.

Optimized Spark SQL queries for big data analysis and improved cluster performance.

Developed AWS Glue-based ETL jobs, reducing data processing times and improving scalability.
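
A condensed AWS Glue (PySpark) job sketch illustrating the general shape of such ETL jobs; the catalog database, table, column mappings, and S3 paths are assumed for the example.

# Illustrative AWS Glue (PySpark) job sketch; catalog names and output paths are assumptions.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog, rename and cast columns, then write partitioned Parquet.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_zone", table_name="transactions")

mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("txn_id", "string", "transaction_id", "string"),
              ("amt", "double", "amount", "double"),
              ("txn_dt", "string", "transaction_date", "string")])

glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://curated-bucket/transactions/",
                        "partitionKeys": ["transaction_date"]},
    format="parquet")

job.commit()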

Built multi-region data pipelines to support disaster recovery and high availability.

Implemented cost optimization strategies for cloud resource utilization, reducing AWS expenses.

Led cross-functional collaboration between data science, engineering, and DevOps teams to improve AI/ML deployment strategies.

Drove innovation in cloud-native data engineering, implementing best practices to enhance performance, security, and scalability.

Environment: AWS Glue, AWS Data Pipeline, Amazon Redshift, AWS Lake Formation, Amazon S3, AWS Kinesis, Amazon RDS, DynamoDB, Amazon Aurora, Apache Spark, PySpark, Spark SQL, Apache Hadoop, Apache Hive, Amazon SageMaker, TensorFlow, Scikit-Learn, AWS IAM, AWS CloudWatch, AWS Lambda, AWS Step Functions, Apache Airflow, Apache Luigi, Jenkins, Amazon EMR, Amazon QuickSight.

Client: Ford, Dearborn, MI December 2018 - March 2021

Role: Data Engineer

Responsibilities:

Implemented Apache Spark on Cloud Dataproc to accelerate distributed data processing and analytics.

Designed Cloud Pub/Sub architectures for real-time data streaming, ensuring seamless event-driven data ingestion.

Deployed Google Cloud Functions to automate data transformations and enhance workflow efficiency.

Designed and optimized ETL workflows using Apache Beam and Dataflow, ensuring efficient real-time data processing on the Google Cloud Platform (GCP).
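
An illustrative Apache Beam (Python SDK) pipeline sketch of the streaming pattern described above; the Pub/Sub subscription, windowing interval, and BigQuery table are placeholder assumptions.

# Illustrative Apache Beam streaming sketch; subscription, project, and table are assumptions.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(streaming=True)  # pass --runner=DataflowRunner etc. at launch

with beam.Pipeline(options=options) as p:
    (p
     | "ReadEvents" >> beam.io.ReadFromPubSub(subscription="projects/demo/subscriptions/events")
     | "Parse" >> beam.Map(json.loads)
     | "Window" >> beam.WindowInto(FixedWindows(60))                  # 1-minute windows
     | "KeyByVehicle" >> beam.Map(lambda e: (e["vehicle_id"], 1))
     | "CountPerMinute" >> beam.CombinePerKey(sum)
     | "Format" >> beam.Map(lambda kv: {"vehicle_id": kv[0], "events": kv[1]})
     | "WriteBQ" >> beam.io.WriteToBigQuery(
           "demo:telemetry.event_counts",
           schema="vehicle_id:STRING,events:INTEGER",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))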

Developed and maintained Google BigQuery solutions for large-scale data warehousing, enabling high-performance analytics.
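
A brief google-cloud-bigquery sketch showing the kind of analytical query served from BigQuery; the project, dataset, table, and columns are invented for illustration.

# Illustrative BigQuery client sketch; dataset, table, and query columns are assumptions.
from google.cloud import bigquery

client = bigquery.Client()  # uses the environment's default GCP credentials and project

query = """
    SELECT plant_code, COUNT(*) AS build_events
    FROM `demo_project.manufacturing.telemetry`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY plant_code
    ORDER BY build_events DESC
"""

# Run the query and print each aggregated row.
for row in client.query(query).result():
    print(row.plant_code, row.build_events)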

Utilized Google Cloud Storage (GCS) for secure and scalable storage, optimizing data aggregation and retrieval processes.

Managed containerized applications using Google Kubernetes Engine (GKE), improving system scalability and reliability.

Leveraged Terraform for Infrastructure-as-Code (IaC), enabling automated provisioning and management of cloud resources.

Wrote and optimized complex SQL queries for data extraction, transformation, and performance tuning.

Designed microservices architectures to modularize data services, improving system maintainability and scalability.

Applied containerization techniques to deploy and manage workloads efficiently across cloud environments.

Worked in an Agile/Scrum framework, utilizing JIRA for sprint planning, task tracking, and project management.

Used Git and GitHub for version control, facilitating seamless collaboration and code management.

Automated deployment pipelines using Jenkins and CI/CD best practices, ensuring smooth production rollouts.

Implemented OAuth 2.0 authentication mechanisms to secure data access and enforce authorization policies.

Developed interactive Tableau dashboards to visualize key business metrics and insights from structured data.

Created and refined data models to enhance data warehousing strategies and support scalable analytics.

Engineered robust data transformation and processing pipelines to standardize data formats and improve accuracy.

Environment: Apache Spark, Cloud Dataproc, Cloud Pub/Sub, Google Cloud Functions, Apache Beam, Dataflow, GCP, BigQuery, GCS, GKE, Terraform, SQL, Microservices, JIRA, Git, GitHub, Jenkins, CI/CD, OAuth 2.0, Tableau.

Client: Merck Pharma, Branchburg, NJ April 2017 - November 2018

Role: Data Engineer

Responsibilities:

Developed and maintained AWS S3, AWS EC2, and AWS Redshift infrastructures for scalable and secure cloud computing solutions.

Utilized Snowflake and Talend to streamline data integration, ensuring efficient storage and retrieval for analytical processing.

Engineered high-performance batch processing and real-time data streaming solutions using Apache Kafka and Apache Spark.

Employed PySpark and MapReduce for distributed data wrangling, aggregation, and transformation across large-scale datasets.
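
A short PySpark batch-wrangling sketch representative of this type of aggregation work; the S3 paths and columns are assumptions.

# Illustrative PySpark batch-wrangling sketch; source paths and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-wrangling").getOrCreate()

orders = spark.read.parquet("s3://raw-zone/orders/")

# Standardize, filter, and aggregate before writing a curated dataset.
curated = (orders
           .withColumn("order_date", F.to_date("order_ts"))
           .filter(F.col("status") == "COMPLETED")
           .groupBy("order_date", "region")
           .agg(F.count("*").alias("orders"),
                F.sum("amount").alias("revenue")))

curated.write.mode("overwrite").partitionBy("order_date").parquet("s3://curated-zone/orders_daily/")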

Designed and implemented ETL workflows and data pipelines, optimizing data preprocessing, transformation, and ingestion across diverse data sources.

Created automated data engineering pipelines to support real-time analytics, reducing processing latency for business intelligence applications.

Utilized Pig scripts for complex data transformation tasks, ensuring seamless integration with Hadoop-based data lakes.

Managed Git-based version control to track changes and facilitate collaboration in Agile and DevOps environments.

Designed NoSQL and SQL database schemas, optimizing query performance for MongoDB, MySQL, and Hive-based storage solutions.

Implemented Apache Hadoop and HDFS for optimized distributed data storage, enabling high-throughput analytics.

Developed machine learning models leveraging Scikit-Learn and predictive analytics techniques to generate actionable insights.

Optimized data storage structures and retrieval strategies, enhancing system efficiency and minimizing computational costs.

Designed scalable data architectures supporting big data analytics, enabling efficient data retrieval and computation.

Led the implementation of real-time data streaming frameworks, improving data availability for predictive analytics and AI-driven applications.

Built interactive reports and dashboards using Jupyter Notebooks and data visualization tools to communicate insights effectively.

Environment: AWS S3, AWS EC2, AWS Redshift, Snowflake, Talend, Apache Kafka, Apache Spark, PySpark, MapReduce, ETL, Pig, Git, MongoDB, MySQL, Hive, Apache Hadoop, HDFS, Scikit-Learn, Jupyter Notebooks.

Client: PalTech, Hyderabad, India March 2015 - December 2016

Role: Jr. Data Engineer

Responsibilities:

Managed and maintained AWS RDS and S3 storage solutions to efficiently store, retrieve and secure structured and unstructured data.

Performed data cleaning, preprocessing, and statistical analysis using Python (Pandas, NumPy).

Leveraged Excel functionality, including PivotTables, VLOOKUP, and the Analysis ToolPak, to analyze and visualize large datasets.

Built Jupyter Notebooks for exploratory data analysis (EDA), enabling efficient debugging and visualization of complex datasets.

Designed and optimized ETL processes to streamline data extraction, transformation and loading, ensuring high-quality data for analysis and reporting.

Developed and fine-tuned SQL queries for data retrieval, query optimization, and performance enhancement across PostgreSQL and Redshift databases.

Applied data wrangling techniques to refine raw data into structured formats, enhancing usability for business intelligence teams.

Designed data transformation workflows to normalize, aggregate, and enrich data for downstream applications and reporting.

Created automated pipelines for data extraction and ingestion, reducing manual effort and improving data accessibility.

Developed data visualization reports and dashboards to present insights, trends, and KPIs to stakeholders.

Followed best practices for data preprocessing, cleansing, and validation to maintain data integrity.

Applied data imputation strategies to handle missing values, improving data completeness and reliability for decision-making.
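
A small illustrative imputation sketch using pandas with scikit-learn's SimpleImputer; the file and column names are placeholders for the example.

# Illustrative imputation sketch; the file and column split are assumptions for demonstration.
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("survey_responses.csv")

numeric_cols = ["age", "income"]
categorical_cols = ["region", "segment"]

# Median for skewed numeric fields, most frequent value for categoricals.
df[numeric_cols] = SimpleImputer(strategy="median").fit_transform(df[numeric_cols])
df[categorical_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[categorical_cols])

# Verify that no missing values remain.
print(df.isna().sum())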

Enhanced database management by implementing indexing and partitioning strategies, improving storage efficiency and retrieval speed.

Collaborated with cross-functional teams in an Agile and Scrum environment, ensuring efficient sprint planning and project execution.

Maintained version control using Git, facilitating seamless collaboration and code deployment across data engineering teams.

Environment: AWS RDS, S3, Python, Pandas, NumPy, Excel, Jupyter Notebooks, ETL, SQL, PostgreSQL, Redshift, Git.


