Sai Prateek Muddasani
*******@***********.*** +1-313-***-**** Michigan, USA LinkedIn
Profile Summary
Experienced Data Engineer with 3+ years of expertise in building scalable data pipelines, ETL workflows, and big data solutions using Python, SQL, and Scala. Skilled in Apache Spark, Hadoop, Airflow, and Kafka, with hands-on experience across AWS (S3, Redshift, Glue), Azure (ADF, Synapse), and GCP (BigQuery). Strong in data modeling, warehousing, and real-time processing. Proficient in ML/NLP integration, CI/CD, and DevOps tools like Docker, Git, and Terraform.
Skills
Programming & Scripting: Python, SQL, Scala, Bash
Big Data Ecosystem: Apache Spark, Hadoop, Hive, Kafka, Flink, Pig
ETL & Data Orchestration: Apache Airflow, AWS Glue, Azure Data Factory, dbt, Informatica
Cloud Platforms & Services: Amazon Web Services (S3, Redshift, Glue, EMR, Lambda, RDS), Microsoft Azure (Synapse Analytics, Azure Data Lake, Azure Data Factory), Google Cloud (BigQuery, Dataflow, Cloud Storage)
Data Warehousing & Modeling: Snowflake, Amazon Redshift, Google BigQuery, Star & Snowflake Schema, Dimensional Modeling, OLAP/OLTP, Databricks
Databases & Analytics: PostgreSQL, MySQL, MongoDB, Cassandra, SQL Server, Pandas, NumPy, Tableau, Power BI
DevOps & CI/CD Tools: Docker, Jenkins, Git, GitHub Actions, Terraform, Kubernetes
Machine Learning & NLP: Scikit-learn, TensorFlow, spaCy, NLTK, Pandas, NumPy, MLflow
Version Control & Collaboration: Git, GitHub, Bitbucket, JIRA, Confluence
Soft Skills: Problem-solving, Communication, Team Collaboration, Agile/Scrum Methodologies
Professional Experience
Data Engineer, LTIMindtree Aug 2024 – Present USA
Developed a real-time data pipeline using Apache Kafka and Apache Spark (Scala) to process 500K+ daily user events, enabling customer behavior analysis and contributing to a 7–10% uplift in sales conversions.
Leveraged Hadoop and MapReduce to process over 5 TB of historical transactional data, reducing batch processing time by 35% and improving data consistency and accuracy across systems.
Designed and optimized a Snowflake data warehouse to support scalable querying of structured and semi-structured data, improving reporting performance by 40% and enabling faster analytics turnaround.
Automated ETL workflows using Python, SQL, and Apache Airflow for ingestion from 6+ data sources, decreasing manual data handling efforts by 60–70% and enhancing pipeline reliability.
Built dynamic dashboards in Power BI, delivering critical business metrics to stakeholders and reducing reporting lead time from 1–2 days to less than 4 hours.
Automated CI/CD pipelines with Azure DevOps and Git, reducing release time by 25%, and implemented Prometheus/Grafana monitoring to enable proactive failure detection and ensure system uptime across production environments.
Data Engineer, Capgemini Feb 2021 – Jul 2023 India
Implemented end-to-end CI/CD pipelines using Azure DevOps, automating deployment of data pipelines, integrating with Git repositories for source control, and enabling faster, error-free releases across environments.
Managed Azure DevOps Boards to track epics, user stories, tasks, and defects, enhancing agile sprint planning, collaboration, and cross-functional visibility, resulting in a 25% improvement in sprint deliverable completion.
Engineered scalable ETL pipelines utilizing Azure Data Factory (ADF) and Azure Databricks, enabling high-volume data ingestion, transformation, and orchestration from sources including on-premises SQL servers and cloud APIs.
Implemented Delta Lake architecture on Azure Synapse Analytics and Databricks, supporting ACID transactions, time travel, and schema enforcement, thereby improving data quality, auditability, and query performance.
Orchestrated hybrid data integration solutions using Azure Data Gateway, Event Grid, and Logic Apps to seamlessly connect cloud-native and on-premises systems, enabling real-time and batch data processing.
Developed high-performance data transformation logic using PySpark in Azure Databricks, reducing end-to-end data processing time by 40% and enhancing the overall throughput of data analytics workloads.
Designed and maintained data lake storage strategies using Azure Data Lake Storage Gen2, ensuring secure, cost-effective, and scalable data storage for both structured and unstructured data.
Implemented monitoring and alerting mechanisms using Azure Monitor and Log Analytics to proactively track pipeline health, identify bottlenecks, and ensure SLA compliance.
Education
Master of Science in Information Systems, Central Michigan University Aug 2023 – May 2025 USA
Bachelor of Technology in Computer Science, Guru Nanak Institutions Technical Campus Jun 2017 – Jul 2021 India
Certifications
Career Essentials for Data Analysis - Microsoft and LinkedIn Learning
AWS Cloud Practitioner Essentials - AWS Learning
Lean Six Sigma Green Belt - Central Michigan University