
Sai Krishna Janapala

DATA ENGINEER

San Francisco, CA | Email: ***************@*****.*** | Phone: 970-***-****

PROFESSIONAL SUMMARY

•5+ years of experience as a Data Engineer with a strong track record of designing and optimizing ETL pipelines, implementing advanced analytics solutions, and leveraging cloud technologies for efficient data processing.

•Proficient in a wide range of technologies, including Azure Data Factory, PySpark, AWS Glue, Apache Kafka, Snowflake, Tableau, Power BI, Python, and machine learning frameworks such as TensorFlow and Scikit-learn.

•Demonstrated ability to manage large-scale data projects, process and analyze massive datasets, and develop predictive models that improve business outcomes.

•Adept at collaborating with cross-functional teams and utilizing Agile methodologies to deliver high-quality, data-driven solutions.

•Skilled in optimizing SQL queries, maintaining data integrity and compliance, and automating deployments using CI/CD pipelines.

•Strong background in healthcare, finance, and e-commerce domains with a focus on driving operational efficiency, enhancing decision-making, and ensuring system reliability and performance.

Technical Skills:

Methodologies:

SDLC, Agile, Waterfall.

Programming Languages:

Python, SQL, R.

Packages:

NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn.

Visualization Tools:

Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP), Amazon QuickSight, Qlik Sense.

IDEs:

Visual Studio Code, PyCharm, Jupyter Notebook, IntelliJ.

Database:

MySQL, PostgreSQL, MongoDB, SQL Server.

Data Engineering Concepts:

Apache Spark, Apache Hadoop, Apache Kafka, Apache Beam, PySpark, ETL/ELT.

Cloud Platforms:

Azure, AWS (EC2, S3, Lambda, Glue, Athena, SNS, RDS, EMR), GCP.

Other Technical Skills:

Data Lake, SSIS, SSRS, SSAS, Docker, Kubernetes, Jenkins, Confluence, Terraform, Informatica, Talend, Snowflake, Google BigQuery, Data Quality and Governance, Machine Learning Algorithms, Natural Language Processing, Big Data, Advanced Analytics, Statistical Methods, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving.

Version Control Tools:

Git, GitHub.

PROFESSIONAL EXPERIENCE

Mastercard May 2023 - Present

Data Engineer

•Leveraged cloud services to improve performance and scalability, achieving a 20% increase in ROI through cost-effective strategies and efficient resource management.

•Optimized cloud expenses by 30% using Databricks, streamlining data operations and enhancing advanced analytics capabilities to boost organizational efficiency.

•Collaborated with stakeholders to design and optimize data models and storage systems, utilizing MongoDB for unstructured data and PostgreSQL for structured data, improving operational efficiency by 25%.

•Streamlined data validation processes and improved data reliability by 25% through the implementation of DBT, ensuring high data quality with automated testing and validation workflows.

•Orchestrated data workflows using Apache Airflow by managing Directed Acyclic Graphs (DAGs), increasing pipeline efficiency by 35% (a minimal DAG sketch of this orchestration pattern follows this section).

•Designed and implemented ETL pipelines using Apache Spark and Python, automating data ingestion and transformation, improving data flow efficiency by 40%.

•Developed and optimized data pipelines for batch and real-time data processing using Apache Kafka and PySpark, enabling faster data processing and reducing latency by 50% (see the streaming sketch following this section).

•Reduced deployment time by 40% by optimizing CI/CD pipelines with Jenkins, enhancing operational readiness and minimizing downtime.

•Created interactive dashboards using Tableau, integrating with optimized PostgreSQL queries to deliver real-time retail analytics insights, improving decision-making efficiency and stakeholder satisfaction by 30%.

•Optimized complex SQL queries and data models in PostgreSQL, reducing data retrieval times by 40% and enhancing dashboard responsiveness for real-time retail analytics.

•Developed and deployed machine learning models for demand forecasting and price optimization using Python, TensorFlow, and scikit-learn, integrated with the retail analytics dashboard, achieving 92% accuracy in sales predictions and driving an 8% increase in profit margins.

•Documented data pipelines and architecture in Git and Confluence, improving collaboration efficiency by 30% and streamlining analytics dashboard integrations.

•Collaborated with cross-functional teams to implement enterprise metadata frameworks and data governance policies, ensuring compliance with institutional and regulatory standards.
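
Illustrative sketch (referenced in the Airflow bullet above): a minimal Apache Airflow DAG showing the kind of orchestration pattern described there. The DAG name, schedule, and script paths are assumptions for demonstration only, not taken from any Mastercard pipeline.

```python
# Minimal Airflow 2.x DAG sketch: three dependent tasks run daily.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_retail_etl",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # Ingest raw files, transform with Spark, then refresh reporting tables.
    ingest = BashOperator(
        task_id="ingest_raw_data",
        bash_command="python /opt/jobs/ingest.py",            # assumed script path
    )
    transform = BashOperator(
        task_id="transform_with_spark",
        bash_command="spark-submit /opt/jobs/transform.py",   # assumed script path
    )
    refresh = BashOperator(
        task_id="refresh_reporting_tables",
        bash_command="python /opt/jobs/refresh_reports.py",   # assumed script path
    )

    # Upstream/downstream operators define the DAG edges.
    ingest >> transform >> refresh
```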
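
Illustrative sketch (referenced in the Kafka/PySpark bullet above): a minimal PySpark Structured Streaming job that reads events from Kafka and aggregates them in time windows. The broker address, topic name, payload schema, and console sink are assumptions for the sketch, not details of the actual pipeline.

```python
# Minimal PySpark Structured Streaming sketch: Kafka source -> windowed aggregation.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

# Assumed JSON payload schema for incoming events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
    .option("subscribe", "transactions")               # assumed topic name
    .load()
)

# Parse the Kafka value bytes as JSON and flatten the struct.
events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", event_schema).alias("e"))
    .select("e.*")
)

# Aggregate per 5-minute event-time window with a 10-minute watermark for late data.
windowed = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"))
    .agg(F.sum("amount").alias("total_amount"))
)

query = (
    windowed.writeStream
    .outputMode("update")
    .format("console")    # swap for a Delta/Parquet sink in a real pipeline
    .start()
)
query.awaitTermination()
```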

Merck Jan 2020 - Dec 2021

Data Engineer

•Architected scalable ETL pipelines using Azure Data Factory, automating ingestion, transformation, and integration of multi-source healthcare datasets into Azure Data Lake Storage, enhancing data accessibility for downstream analytics.

•Optimized PySpark workflows in Databricks, reducing processing time by 40% (from 8 hours to 5 hours) through advanced cluster tuning, efficient data partitioning, and caching strategies.

•Designed Snowflake-based data pipelines, leveraging Snowflake Streams and Tasks for near-real-time data integration, reducing data latency by 35% and ensuring seamless ingestion of 10+ TB of healthcare data.

•Crafted high-performance SQL queries in Snowflake and Azure Synapse Analytics, applying materialized views, partition pruning, and query result caching to improve data retrieval efficiency and reporting.

•Integrated Databricks and Snowflake to automate bulk data loading, utilizing Snowflake’s COPY command and staging features, minimizing ETL overhead by 30%.

•Implemented Snowflake’s Time Travel and Zero-Copy Cloning, streamlining historical data analysis, debugging, and scenario-based reporting, reducing data recovery time by 50%.

•Built event-driven streaming pipelines integrating Kafka and Databricks Structured Streaming, delivering clinical trial monitoring with <5-second latency, enabling real-time alerts and decision-making for critical healthcare events.

•Developed predictive machine learning models in Databricks using TensorFlow and Scikit-learn, achieving 92% accuracy for patient adherence predictions and providing actionable insights for personalized treatment strategies (a simplified modeling sketch follows this section).

•Designed interactive Power BI dashboards, connected with Snowflake and Databricks, to deliver real-time insights into drug efficacy and supply chain performance, increasing adoption among stakeholders by 70%.

•Automated CI/CD workflows using Azure DevOps and Jenkins, implementing dynamic validation scripts and robust version control to accelerate ETL pipeline deployments by 80%.

•Implemented end-to-end data validation frameworks across ETL pipelines, ensuring data integrity and reducing error rates by 30% prior to loading into Snowflake and other downstream systems.

•Enhanced data security by implementing Snowflake’s data masking policies, encryption at rest, and role-based access controls to maintain HIPAA compliance and protect sensitive patient information.
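
Illustrative sketch (referenced in the predictive-modeling bullet above): a simplified Scikit-learn pipeline for a binary adherence-style classifier. The features, labels, and data here are synthetic placeholders; no patient data or model details from the actual project are reproduced.

```python
# Simplified Scikit-learn sketch: scale features, fit a classifier, report holdout accuracy.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Synthetic feature matrix (e.g., age, prior refill rate, visit count) - placeholder data.
X = rng.normal(size=(1000, 3))
# Synthetic binary label: 1 = adherent, 0 = non-adherent.
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pipeline keeps preprocessing and the model together for consistent training/inference.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
```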

EDUCATION

Master of Science in Computer Science - University of Missouri-Kansas City, Kansas City, MO, USA Jan 2022 – May 2023

Bachelor of Technology in Computer Science – BML Munjal University, India. Aug 2016 – May 2020


