MOHAN SAI
Cincinnati, Ohio | Open to relocation
513-***-**** ********.****@*****.*** linkedin.com/in/mohansai3/ github.com/jastimi
Summary
Results-driven Data Engineer with 5 years of experience architecting and implementing scalable cloud-based data solutions using AWS and Azure services. Skilled in designing batch and real-time pipelines, with deep expertise in distributed processing using Apache Spark and Scala for big data workloads. Proficient in Snowflake for high-performance cloud data warehousing and in MapReduce for large-scale batch processing. Strong collaboration and analytical skills, with a consistent track record of delivering impactful data solutions aligned with business goals.
Skills
•AWS Services: S3, EC2, EMR, Redshift, Athena, Glue, RDS, Lambda, Kinesis, SNS, SQS, AMI, IAM, CloudFormation
•Big Data & ETL Tools: Apache Spark, Delta Lake, Hive, Kafka, Airflow, Snowflake, Informatica, DataStage, Talend, Apache NiFi, SSIS, DBT
•Databases: Oracle, Microsoft SQL Server, MySQL, Teradata, Apache Cassandra, MongoDB
•Programming Languages: Java, Scala, Python, R, SQL, Shell Scripting
•Machine Learning and Data Science: Pandas, NumPy, Scikit-learn, TensorFlow, Amazon SageMaker
•Cloud & DevOps: AWS, Azure, Azure Data Factory, Azure Data Lake, Docker, Kubernetes, Jenkins, GitHub
•Others: Agile (Scrum), SDLC, Tableau, Power BI, Excel, Linux, Eclipse, JIRA
•Soft Skills: Teamwork and Collaboration, Communication, Leadership, Adaptability, Problem Solving
Experience
KeyBank, Ohio Data Engineer Feb 2025 – Present
•Reduce manual reporting effort by 60% by orchestrating a serverless data pipeline that integrates AWS Lambda and Amazon Athena for automated log processing and near real-time analytics on S3-stored datasets.
•Enhance real-time data decision-making by implementing Apache Kafka for high-throughput streaming pipelines, supporting instant processing of 1M+ customer transactions and loan operations.
•Streamline cloud-based CI/CD operations by automating data pipeline deployments with AWS CodePipeline, GitHub Actions, Jenkins, and Apache Airflow, ensuring continuous integration and cutting deployment time by 60%.
•Enable strategic financial insights by designing scalable Amazon Redshift environments and interactive Power BI dashboards, empowering 50+ business users to visualize and act on real-time financial and transaction data.
Cleveland-Cliffs, Ohio Data Engineer Aug 2024 – Dec 2024
•Implemented fine-grained S3 access controls using AWS Lambda and DynamoDB, securing access to 10+ TB of S3 data and reducing unauthorized access incidents across multiple steel production systems.
•Leveraged Apache Flink for real-time stream processing of manufacturing data from sensors, enabling instant decision-making on production floors and improving operational efficiency through proactive issue detection.
•Built reusable ETL framework in Databricks using PySpark and Hive, reducing processing time by 40% and enabling scalable analytics.
Cognizant, India Data Engineer Aug 2021 – Jul 2023
•Developed and optimized ETL pipelines in Databricks using PySpark and Spark SQL to process terabytes of semi-structured insurance data from Amazon S3, improving query performance by 60% and accelerating claims analysis.
•Implemented real-time data integration using Amazon Kinesis Data Streams and AWS Glue, orchestrated with Apache Airflow to streamline message-driven workflows and reduce latency in global insurance data processing pipelines across MetLife systems.
•Utilized DBT (Data Build Tool) to transform 70+ raw data tables into clean, documented models, standardizing data modeling workflows and increasing data reliability and trust across analytics and business intelligence teams.
•Deployed workloads on Amazon EKS using Kubernetes, improving scalability and boosting resource utilization by 30% for cloud apps.
ACT Fibernet, India Data Engineer Jun 2019 – Jul 2021
•Designed and implemented scalable data architectures using Apache Kafka and Apache Spark, optimizing the flow of broadband performance metrics and usage data, which enhanced user experience and improved network performance by 35%.
•Built and automated robust ETL workflows with SQL Server Integration Services (SSIS) to extract data from CRM systems and network logs, reducing manual processing time and improving data pipeline efficiency by 50%.
•Developed reusable PySpark modules in Databricks, standardizing data transformation logic across multiple teams and domains, resulting in 30% faster deployment of data workflows and improved cross-team collaboration.
Education
University of Cincinnati Cincinnati, Ohio
Master of Science, Information Technology, CGPA: 3.94/4.00 Dec 2024
Certifications
•AWS Certified Data Engineer – Associate
•Microsoft Certified: Fabric Analytics Engineer Associate
•Microsoft Certified: Power BI Data Analyst Associate