Data Engineer Machine Learning

Location:

Dallas, TX

Salary:

60000

Posted:

September 10, 2025

Resume:

KALYAN A

Data Engineer

+* (***) ***- **** ******.*****@*****.*** https://www.linkedin.com/in/kalyansaiakuthotaa

PROFESSIONAL SUMMARY

Data Engineer with around 4 years of experience designing and implementing end-to-end data solutions across the healthcare and retail industries. Expertise in big data processing, machine learning, and cloud computing, driving business intelligence, automation, and predictive analytics.

Hands-on experience with big data technologies including Apache Spark, Hadoop, Databricks, Azure Databricks, and AWS EMR for high-performance data processing.

Led the development of automated data pipelines and data lakes on Azure Data Lake Storage for efficient data storage and retrieval.

Expertise in data warehousing solutions such as Snowflake, Amazon Redshift, BigQuery, and Azure SQL, ensuring efficient storage and retrieval.

Integrated Azure Synapse Analytics for large-scale data warehousing and analytics, improving query performance by 25%.

Developed real-time data workflows using Apache Kafka, AWS Kinesis, Azure Event Hubs, and Azure Stream Analytics, reducing manual intervention by 40%.

Utilized Azure Logic Apps and Azure Data Factory to streamline ETL processes, enhancing operational efficiency.

Proficient in Python, Pandas, PySpark, and SQL, leveraging data wrangling and transformation for scalable ML pipelines.

Experience in deploying ML models in cloud environments using Azure Machine Learning and AWS SageMaker.

Skilled in ML model development for predictive analytics, fraud detection, and recommendation systems, improving business operations.

Strong expertise in fraud detection models using graph-based network analysis in Azure Databricks and AWS SageMaker, reducing fraudulent claims processing by 30%.

Implemented data validation frameworks using Great Expectations, ensuring data integrity and compliance in both AWS and Azure environments.

Strong background in CI/CD automation, implementing Jenkins, Terraform, GitHub Actions, and Azure DevOps for continuous deployment of ML and data pipelines.

Proficient in data visualization using Power BI, Tableau, and Looker, enabling data-driven decision-making.

Experienced in NLP and deep learning techniques, extracting insights from unstructured healthcare data and provider notes.

EDUCATION

University of Texas at Dallas 2022 - 2024

Master’s in Business Analytics.

Amrita Vishwa Vidyapeetham 2018 - 2022

Bachelor’s in Computer Science Engineering

SKILLS

Programming: Python (Pandas, NumPy, SQLAlchemy), C#

Data Warehousing: Snowflake, Redshift, BigQuery, MySQL, Azure SQL

Data Visualization: Power BI, Tableau, Matplotlib, Seaborn

Cloud Platforms: AWS (RDS, S3, Lambda), Azure, GCP

Version Control: Git, GitHub

Containerization: Docker

Real-Time Systems: Apache Kafka, AWS Kinesis, AWS Lambda

Big Data: Spark, Hadoop, Databricks

WORK EXPERIENCE

Data Engineer August 2023 - Present

United Healthcare USA

Built and maintained a modern data platform to support real-time analytics, claims processing, and fraud detection for clinical and operations teams. This involved ingesting large volumes of healthcare data, applying business transformations, and delivering insights through dashboards and automated reports. The solution utilized cloud-native tools and followed best practices for data modeling, validation, and deployment automation.

Developed automated ETL pipelines using Azure Data Factory to ingest and transform data from claims, EHR systems, and provider feeds.

Modeled dimensional data marts using star schema and snowflake schema patterns in Snowflake and Azure SQL, enabling performance-optimized reporting.

Gathered and documented business and user requirements through cross-functional sessions with analysts and clinical stakeholders.

Created dynamic and interactive dashboards in Power BI for monitoring patient risk scores, claim rejection rates, and operational KPIs.

Integrated with external systems using REST APIs and parsed responses using Python, enabling real-time claim and eligibility validations.

Implemented data quality checks using Great Expectations, ensuring schema compliance, null handling, and referential integrity before warehouse load.

Scheduled daily and event-driven data workflows via ADF triggers, ensuring data readiness for reporting by early morning business hours.

Wrote optimized SQL queries and used partitioning techniques to reduce report load time and query latency in Power BI and Snowflake.

Monitored data pipeline health and failures using Azure Monitor, reducing incident resolution time through proactive alerting and logging.

Documented pipeline logic, job configurations, business rules, and data mappings for technical onboarding and audit readiness.

Collaborated with DevOps teams to automate deployment pipelines using GitHub Actions, accelerating delivery and reducing manual errors.

Conducted peer code reviews and implemented version control best practices in Git to maintain clean, production-ready codebases.

Participated in Agile ceremonies, including daily standups and sprint planning, to prioritize and align deliverables.

Conducted root cause analysis of data anomalies and implemented remediation scripts in SQL and Python to correct upstream issues.

Worked closely with business intelligence teams to define metrics and KPIs for reporting dashboards and regulatory submissions.

Built reusable Python modules for common transformation logic and shared them across multiple ADF pipelines to promote modularity.

Deployed pipeline changes across dev, test, and prod environments while adhering to SDLC and change management processes.

Provided post-deployment support for production pipelines, resolving load failures, schema mismatches, and data latency issues.

Data Analyst May 2021 - June 2022

Tech Mahindra India

Developed a cloud-based data analytics platform using AWS to optimize sales forecasting, inventory management, and customer segmentation at Apollo Pharmacy. Leveraged tools like Amazon SageMaker, AWS Glue, and Amazon QuickSight to improve decision-making, enhance demand prediction accuracy by 30%, and streamline operational processes.

Built and optimized data pipelines using AWS Glue to automate the extraction, transformation, and loading (ETL) of sales and customer transaction data from multiple sources, improving processing efficiency by 40%.

Designed and deployed a centralized data lake using Amazon S3 to store both structured and unstructured data, ensuring scalability and easy access for analytical purposes.

Automated real-time data transformations using AWS Lambda, reducing manual data processing effort by 50% and enabling faster decision-making.

Developed machine learning models for demand forecasting using Amazon SageMaker, improving prediction accuracy by 30% and enhancing inventory management.

Implemented customer segmentation models using Amazon SageMaker to categorize customers based on purchasing behavior and risk profiles, improving targeted marketing efforts.

Integrated data storage and analytics by using Amazon Redshift for fast querying of large datasets, reducing query time by 25%.

Created interactive dashboards in Amazon QuickSight to visualize real-time sales, inventory, and customer behavior, improving operational efficiency by 12%.

Automated reporting workflows by integrating Amazon CloudWatch to monitor performance and alert teams on data discrepancies.

Scheduled daily reporting jobs using AWS Step Functions, streamlining the reporting process and reducing manual intervention by 40%.

Designed and optimized SQL-based forecasting models in Amazon Redshift, improving retail sales prediction accuracy by 30%.

Developed and deployed anomaly detection models using Amazon SageMaker to identify unusual sales patterns and mitigate potential fraud.

Integrated Amazon RDS (SQL Server) with Amazon QuickSight to provide real-time insights into sales performance, inventory levels, and customer trends.

Implemented real-time transaction monitoring using Amazon EventBridge to trigger alerts for high-value transactions, enhancing fraud prevention.

Optimized inventory management by leveraging Amazon Forecast to predict future demand and minimize stockouts and overstocking issues.

Improved data quality and governance by implementing data validation techniques and monitoring data consistency using AWS Glue.

Built a customer lifetime value (CLV) model using Amazon SageMaker, helping segment customers and increase marketing ROI by 15%.

Collaborated with business stakeholders to define key performance indicators (KPIs) and metrics, ensuring alignment between analytics outputs and business goals.

Ensured data security and compliance by implementing encryption, access controls, and monitoring practices with Amazon S3 and AWS IAM.

PROJECTS

Real-Time Monitoring System with Machine Learning for Compliance in Retail Environments

Designed a real-time monitoring system using Python and OpenCV, incorporating the VGGNet and Haar Cascade models for face mask detection and social distance compliance.

Deployed the application in a Docker container, ensuring consistent performance across multiple environments.

Integrated the system with AWS Lambda for scalable processing and Amazon S3 for storing captured footage, demonstrating proficiency in serverless architecture.

Adapted the system for retail use, enabling real-time monitoring of customer compliance and reducing operational risks by 20%.
