Nandini Bathini
****************@*****.*** | +1-806-***-****
LinkedIn
Summary
• Data Engineer with 5+ years of expertise in designing, optimizing, and deploying scalable data pipelines and architectures across enterprise-level environments. Skilled in Big Data technologies such as Hadoop, Spark, and Hive for efficient data processing and analytics.
• Proficient in Python, Java, and SQL, with hands-on experience in data warehousing solutions including Snowflake, Redshift, and Azure Synapse, enabling structured data storage, query optimization, and efficient analytics.
• Expert in ETL development, leveraging tools like Informatica, Talend, SSIS, and Apache Airflow to streamline data integration, transformation, and automation, ensuring high-quality, real-time insights.
• Extensive experience in cloud platforms—AWS, Azure, and GCP—building serverless ETL pipelines with AWS Glue, Redshift, and Lambda, optimizing processing speed, and reducing latency.
• Strong DevOps and CI/CD expertise, utilizing Jenkins, GitHub, Docker, Terraform, and Kubernetes to automate deployments, enhance scalability, and ensure efficient workflow management.
• Hands-on experience with real-time data streaming technologies such as Apache Kafka, Amazon Kinesis, and Spark Streaming, improving data ingestion, reliability, and low-latency analytics.
• Proven ability in implementing scalable architectures, automating workflows, and optimizing distributed systems, enabling data-driven decision-making, advanced analytics, and AI/ML-driven business intelligence.
Skills
• Programming Languages: Python, Java, Scala, SQL, PL/SQL, T-SQL, Unix Shell Scripting
• Big Data Technologies: Hadoop, Spark, Hive, Pig, Sqoop, MapReduce, HBase, EMR
• Methodologies: SDLC, Agile, Waterfall
• DevOps Tools: Git, Jenkins, Docker, Kubernetes
• ETL Tools: SSIS, Informatica PowerCenter, Erwin, Talend, DataStage
• Databases: SQL Server, PostgreSQL, MySQL, Oracle, Snowflake, DynamoDB, MongoDB
• Pipelines: Apache Airflow, AWS Step Functions, Luigi, Prefect, Oozie
• Streaming Technologies: Amazon Kinesis, Spark Streaming, Apache Kafka
• Libraries: Pandas, NumPy, Matplotlib, SciPy, Scrapy, TensorFlow, PyTorch, Scikit-learn, NLTK, Plotly, Keras, PyMC3
• Data Visualization: Microsoft Excel, Power BI, Tableau, IBM Cognos, QlikView, QuickSight, Seaborn, SSRS
• Cloud Platforms: AWS (EC2, S3, Redshift, Glue), Azure (Azure Data Factory, Azure Databricks), GCP
• Data Warehousing: Amazon (DynamoDB, RDS, Athena), Azure (Synapse, Blob, Data Lake), BigQuery, Teradata, Snowflake
Education
Texas Tech University
Master's in Computer and Information Sciences
• Specialized in Big Data Engineering — focus on distributed processing, cloud data platforms, and large-scale pipeline design.
Certifications
• AWS Certified Cloud Practitioner
• Databricks Certified Data Engineer Associate
Experience
Meta Dec 2024 – Present
Data Engineer
• Engineered and optimized high-throughput data pipelines using Python and Meta’s internal data infrastructure, improving data processing efficiency by 30% and enabling seamless integration across petabyte-scale distributed systems.
• Automated 80% of routine data pipeline tasks using Python, resolving over 200 data integrity issues.
• Architected high-performance Spark-based solutions within Meta’s data framework, leveraging AWS Glue for serverless ETL, cutting transformation latency by 40% and improving data flow efficiency across massive datasets.
• Developed and maintained over 20 automated data orchestration workflows in Apache Airflow, utilizing complex DAGs to streamline data pipeline execution and decrease processing time by 40% for intricate data transformation tasks within Meta’s analytical ecosystem.
• Architected and implemented event-driven data solutions using AWS Lambda, integrated with Meta’s real-time data streaming platforms and AWS services (API Gateway, SNS, SQS, S3, CloudWatch), enabling immediate data-driven actions and notifications.
• Engineered and optimized high-performance SQL queries in Snowflake and Meta’s data warehousing systems, performing comprehensive data analysis to eliminate discrepancies across large-scale fact and dimension tables and applying star and snowflake schema principles to maximize efficiency and accuracy.
• Automated deployment processes using Jenkins, GitHub, and Bash/Shell scripting within CI/CD pipelines, accelerating continuous integration and delivery, reducing deployment time by 50%, and ensuring robust and reliable releases within Meta’s production environment.
CVS Health Jul 2020 – Jun 2023
Data Engineer
• Established and deployed data pipelines with Azure Synapse Analytics and Data Factory, automating data integration and transformation operations, increasing query performance by 40%, and optimizing analytical workflows.
• Streamlined data processing in Python and Pandas, using MongoDB for efficient storage and retrieval, reducing data manipulation time by 30% for large-scale datasets.
• Implemented big data processing on Hadoop using YARN and MapReduce, improving resource allocation and reducing job execution time by 35% for large-scale data analysis activities.
• Implemented scalable, high-performance NoSQL solutions on Azure Cosmos DB, delivering 30% faster data retrieval and seamless scalability for cloud-based applications.
• Orchestrated complex data pipelines using Apache Airflow, automating workflows and reducing task execution time by 35%, ensuring seamless data integration and monitoring.
• Integrated Docker with CI/CD pipelines to automate containerized application deployments, decreasing release times by 50% and ensuring consistent environments and shorter delivery cycles.
• Engineered and deployed robust ETL processes utilizing SSIS to automate data extraction, transformation, and loading, increasing data integration efficiency by 30% across several corporate systems.
• Built and deployed distributed messaging solutions using Kafka for real-time data streaming and ZooKeeper for cluster management, increasing system stability and lowering message delivery latency by 25%.
• Tuned and improved large-scale data processing workflows using Hive and HiveQL for querying and Pig and Pig Latin for data transformation, achieving a 30% increase in processing speed in a distributed environment.
• Developed and optimized scalable ETL pipelines in Azure Databricks, integrating with Data Lake Storage to process petabyte-scale datasets, improving performance by 40% in distributed analytics.
• Designed advanced machine learning workflows on Databricks leveraging PySpark and MLlib, enabling predictive analytics and reducing model training time by 35% in production environments.