Nandini Bathini
****************@*****.*** | +1-806-***-****
LinkedIn
Summary
• Data Engineer with 5+ years of expertise in designing, optimizing, and deploying scalable data pipelines and architectures across enterprise-level environments. Skilled in Big Data technologies such as Hadoop, Spark, and Hive for efficient data processing and analytics.
• Proficient in Python, Java, and SQL, with hands-on experience in data warehousing solutions including Snowflake, Redshift, and Azure Synapse, enabling structured data storage, query optimization, and efficient analytics.
• Expert in ETL development, leveraging tools like Informatica, Talend, SSIS, and Apache Airflow to streamline data integration, transformation, and automation, ensuring high-quality, real-time insights.
• Extensive experience in cloud platforms—AWS, Azure, and GCP—building serverless ETL pipelines with AWS Glue, Redshift, and Lambda, optimizing processing speed, and reducing latency.
• Strong DevOps and CI/CD expertise, utilizing Jenkins, GitHub, Docker, Terraform, and Kubernetes to automate deployments, enhance scalability, and ensure efficient workflow management.
• Hands-on experience with real-time data streaming technologies such as Apache Kafka, Amazon Kinesis, and Spark Streaming, improving data ingestion, reliability, and low-latency analytics.
• Proven ability in implementing scalable architectures, automating workflows, and optimizing distributed systems, enabling data-driven decision-making, advanced analytics, and AI/ML-driven business intelligence.
Skills
• Programming Languages: Python, Java, Scala, SQL, PL/SQL, T-SQL, Unix Shell Scripting
• Big Data Technologies: Hadoop, Spark, Hive, Pig, Sqoop, MapReduce, HBase, EMR
• Methodologies: SDLC, Agile, Waterfall
• DevOps Tools: Git, Jenkins, Docker, Kubernetes
• ETL Tools: SSIS, Informatica PowerCenter, Erwin, Talend, DataStage
• Databases: SQL Server, PostgreSQL, MySQL, Oracle, Snowflake, DynamoDB, MongoDB
• Pipelines: Apache Airflow, AWS Step Functions, Luigi, Prefect, Oozie
• Streaming Technologies: Amazon Kinesis, Spark Streaming, Apache Kafka
• Libraries: Pandas, NumPy, Matplotlib, SciPy, Scrapy, TensorFlow, PyTorch, Scikit-learn, NLTK, Plotly, Keras, PyMC3
• Data Visualization: Microsoft Excel, Power BI, Tableau, IBM Cognos, QlikView, QuickSight, Seaborn, SSRS
• Cloud Platforms: AWS (EC2, S3, Redshift, Glue), Azure (Azure Data Factory, Azure Databricks), GCP
• Data Warehousing: Amazon (DynamoDB, RDS, Athena), Azure (Synapse, Blob, Data Lake), BigQuery, Teradata, Snowflake
Education
Texas Tech University
Master's in Computer and Information Sciences
• Specialized in Big Data Engineering — focus on distributed processing, cloud data platforms, and large-scale pipeline design.
Certifications
• AWS Certified Cloud Practitioner
• Databricks Certified Data Engineer Associate
Experience
Meta Dec 2024 – Present
Data Engineer
• Engineered and optimized high-throughput data pipelines using Python and Meta’s internal data infrastructure, improving data processing efficiency by 30% and enabling seamless integration across petabyte-scale distributed systems.
• Automated 80% of routine data pipeline tasks using Python, resolving over 200 data integrity issues.
• Architected high-performance Spark-based solutions within Meta’s data framework, leveraging AWS Glue for serverless ETL, cutting transformation latency by 40% and improving data flow efficiency across massive datasets.
• Developed and maintained over 20 automated data orchestration workflows in Apache Airflow, utilizing complex DAGs to streamline data pipeline execution and decrease processing time by 40% for intricate data transformation tasks within Meta’s analytical ecosystem.
• Architected and implemented event-driven data solutions using AWS Lambda, integrated with Meta’s real-time data streaming platforms and AWS services (API Gateway, SNS, SQS, S3, CloudWatch), enabling immediate data-driven actions and notifications.
• Engineered and optimized high-performance SQL queries in Snowflake and Meta’s data warehousing systems, performing comprehensive data analysis to eliminate discrepancies across large-scale fact and dimension tables and applying star and snowflake schema principles to maximize efficiency and accuracy.
• Automated deployment processes using Jenkins, GitHub, and Bash/Shell scripting within CI/CD pipelines, accelerating continuous integration and delivery, reducing deployment time by 50%, and ensuring robust and reliable releases within Meta’s production environment.
CVS Health Jul 2020 – Jun 2023
Data Engineer
• Established and deployed data pipelines with Azure Synapse Analytics and Data Factory, automating data integration and transformation operations, increasing query performance by 40%, and optimizing analytical workflows.
• Streamlined data processing in Python and Pandas, using MongoDB for efficient storage and retrieval, reducing data manipulation time by 30% for large-scale datasets.
• Implemented big data processing on Hadoop using YARN and MapReduce, improving resource allocation and reducing job execution time by 35% for large-scale data analysis activities.
• Implemented scalable, high-performance NoSQL solutions on Azure Cosmos DB, delivering 30% faster data retrieval and seamless scalability for cloud-based applications.
• Orchestrated complex data pipelines using Apache Airflow, automating workflows and reducing task execution time by 35%, ensuring seamless data integration and monitoring.
• Integrated Docker with CI/CD pipelines to automate containerized application deployments, decreasing release times by 50% and ensuring consistent environments and shorter delivery cycles.
• Engineered and deployed robust ETL processes utilizing SSIS to automate data extraction, transformation, and loading, increasing data integration efficiency by 30% across several corporate systems.
• Built and deployed distributed messaging solutions using Kafka for real-time data streaming and ZooKeeper for cluster management, increasing system stability and lowering message delivery latency by 25%.
• Tuned and improved large-scale data processing workflows using Hive and HiveQL for querying and Pig and Pig Latin for data transformation, achieving a 30% increase in processing speed in a distributed environment.
• Developed and optimized scalable ETL pipelines in Azure Databricks, integrating with Data Lake Storage to process petabyte-scale datasets, improving performance by 40% in distributed analytics.
• Designed advanced machine learning workflows on Databricks leveraging PySpark and MLlib, enabling predictive analytics and reducing model training time by 35% in production environments.