Data Engineer Machine Learning

Location:

Frisco, TX, 75034

Posted:

September 10, 2025


Resume:

ANUSH REDDY

Sr. Data Engineer 215-***-**** ***********@*****.*** LinkedIn

PROFESSIONAL SUMMARY

• Senior Cloud Data & Machine Learning Engineer with 6+ years of experience designing and deploying scalable, cloud-native data and AI/ML solutions across finance, healthcare, and technology sectors, leading cross-functional teams of 10-15 members and managing annual cloud infrastructure budgets.

• Architected robust data platforms on AWS and Azure, supporting high-volume transactions with high availability, IAM-based security, and regulatory compliance, achieving 89.9% uptime and reducing infrastructure costs by 35% through strategic optimization initiatives.

• Built end-to-end ML workflows using Amazon SageMaker Pipelines, Model Registry, SageMaker Hosting, and CI/CD automation with CodePipeline and CodeBuild, improving model deployment speed by 80% and establishing MLOps standards across enterprise teams.

• Delivered real-time data pipelines for fraud detection, AML, and credit scoring using Apache Kafka, Amazon Kinesis, and Apache Spark Streaming on EMR, ensuring low-latency processing and reducing fraud detection response time by 90% from hours to seconds.

• Developed metadata-driven ETL frameworks using Apache Airflow, AWS Glue, Spark (PySpark), and Azure Data Factory, with Glue Schema Registry and Step Functions for orchestration, collaborating with business stakeholders to ensure alignment with enterprise data governance policies.

• Achieved significant cost savings through serverless architecture, Spot Instances, and advanced Spark optimization strategies, mentoring 25+ engineers on cost-effective cloud practices and establishing a center of excellence for cloud cost management.

• Implemented monitoring using SageMaker Model Monitor, Amazon CloudWatch, and MLflow, enabling drift detection and automated model refresh workflows, partnering directly with C-suite executives to provide real-time business intelligence and predictive analytics insights.

• Built scalable data warehouse solutions using Snowflake, Azure Synapse, Redshift Spectrum, and Delta Lake to support enterprise BI and analytics, leading architecture review boards and driving technical decisions for data modernization projects.

• Ensured data privacy and compliance with GDPR and PCI-DSS through KMS encryption, Lake Formation, and fine-grained IAM roles, collaborating with legal and compliance teams to maintain audit readiness across all data platforms.

• Modernized legacy systems (e.g., Microsoft Dynamics NAV, on-prem Hadoop) by migrating to AWS serverless infrastructure using EC2, Lambda, and S3, serving as technical project lead and delivering migration projects 20% ahead of schedule.

• Used Terraform and AWS CloudFormation to standardize infrastructure as code, ensuring consistency, establishing DevOps best practices across development teams, and reducing deployment failures by 95% through automated testing and validation.

• Collaborated with teams across fraud analytics, regulatory compliance, and data science, aligning platform capabilities with critical business needs, facilitating weekly stakeholder meetings and translating technical concepts for executive leadership and business users.

• Mentored junior engineers in Spark tuning, MLOps best practices, and cloud architecture patterns using tools like Databricks, Jupyter, and GitHub Actions, with 92% of direct reports achieving promotion within 18 months under guidance.

• Led digital transformation initiatives spanning multiple business units, driving consensus among 8+ departments on data strategy and architecture decisions, while managing vendor relationships and negotiating enterprise contracts annually.

• Established data engineering excellence programs including code review standards, performance benchmarking, and knowledge sharing sessions, conducting quarterly architecture reviews and presenting technical roadmaps to executive committees and board members.

TECHNICAL SKILLS

Cloud Technologies: AWS (S3, Glue, EMR, Lambda, Kinesis, Athena, Redshift, SageMaker, Lake Formation, IAM, CloudWatch, Step Functions, KMS, MSK, API Gateway, CodePipeline), Azure (Data Factory, Databricks, Data Lake Storage Gen2, Synapse Analytics, Event Hubs, Azure SQL Database, Functions, DevOps)

Big Data Technologies: Apache Spark, PySpark, Apache Kafka, Hadoop, HDFS, Hive, Sqoop, Spark Streaming, Airflow

Cloud Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse Analytics, Delta Lake

Databases: SQL, T-SQL, PostgreSQL, MongoDB, Oracle

Machine Learning & MLOps: Amazon SageMaker, MLflow, XGBoost, scikit-learn, Feature Store

ETL/Data Engineering: Apache Airflow, AWS Glue, Azure Data Factory, SSIS

Programming Languages: Python, SQL, Scala, Shell Scripting

DevOps & Infrastructure as Code: Terraform, AWS CloudFormation, Git, Jenkins, CI/CD Pipelines

Business Intelligence: Tableau, Power BI

Operating Systems: Linux, Windows

WORK EXPERIENCE

Client: Wells Fargo, Charlotte, NC, USA

Role: AWS Data Engineer & Machine Learning Engineer

Duration: Feb 2023 – Present

Summary: Delivered secure, scalable data and ML pipelines on AWS to support real-time fraud detection, AML compliance, and credit scoring. Enabled sub-minute risk alerts and streamlined model deployment using SageMaker, improving business response to evolving fraud patterns and enhancing regulatory compliance.

Key Responsibilities & Achievements:

• Built secure data lakes using Amazon S3, Lake Formation, and Glue, enabling real-time fraud detection across all business units while reducing false positive alerts and improving customer experience.

• Automated MLOps workflows using SageMaker Pipelines, Model Registry, and CodePipeline, reducing model deployment time from weeks to hours while improving prediction accuracy through standardized ML workflows.

• Implemented real-time data pipelines with Kinesis, Kafka (MSK), and Spark Streaming on EMR, delivering sub-minute risk alerts that enable proactive fraud prevention and faster decision-making.

• Migrated legacy ETL to serverless frameworks using AWS Glue and Step Functions, reducing infrastructure maintenance overhead while improving processing speed through reusable PySpark templates.

• Deployed auto-scaled ML inference endpoints using SageMaker Hosting and API Gateway, supporting traffic spikes during peak hours while maintaining uptime for critical credit scoring decisions.

• Implemented data quality controls using Great Expectations and AWS Deequ, achieving enterprise-grade data accuracy and ensuring GDPR/PCI-DSS compliance to avoid regulatory issues.

• Optimized costs using Athena, Redshift Spectrum, and Spot Instances, substantially reducing compute costs while maintaining performance for compute-intensive Spark workloads.

• Led infrastructure automation using Terraform and CloudFormation, improving deployment consistency, and mentored 5 junior engineers in MLOps best practices.

• Developed end-to-end CI/CD pipelines using AWS CodePipeline, CodeBuild, and Git, automating deployment of data and ML workloads, reducing release cycles and increasing system reliability.

• Developed feature engineering pipelines using SageMaker Feature Store and Lambda, creating reusable features that improved model performance across multiple fraud detection use cases.

• Implemented model monitoring and drift detection using SageMaker Model Monitor, enabling automated model retraining and maintaining high model accuracy over time.

Environment / Tools & Technologies: AWS (S3, Lake Formation, Glue, SageMaker, EMR, Kinesis, Lambda, CodePipeline, API Gateway, Athena, Redshift Spectrum, CloudWatch, IAM, KMS, MSK) · Apache Spark · PySpark · Kafka · SageMaker Pipelines · Feature Store · Model Registry · MLflow · Delta Lake · Great Expectations · Terraform · CloudFormation · Python · SQL · Scala

Client: HP, Austin, TX, USA

Role: Data Engineer

Duration: June 2020 – Aug 2022

Summary: Designed hybrid cloud ETL and streaming pipelines across AWS and Snowflake, enabling real-time analytics for finance and operations. Supported leadership dashboards with low-latency data from supply chain and transactional systems, helping improve inventory decisions and operational efficiency.

Key Responsibilities & Achievements:

• Designed ETL pipelines using Apache Airflow and Spark on EMR, automating data flow from S3 to Snowflake, reducing manual processing effort, and improving data freshness for executive dashboards.

• Built real-time streaming pipelines with Kafka (MSK) and Spark Streaming, enabling real-time supply chain analytics that helped reduce inventory holding costs through better demand forecasting.

• Optimized Snowflake performance with clustering keys and parallel loading, improving query response times and reducing data warehouse operational costs through efficient data modeling.

• Built scalable data lake architectures on S3 with Lake Formation, centralizing data from 15+ disparate sources, improving data discoverability, and enabling self-service analytics.

• Automated infrastructure monitoring with CloudFormation and CloudWatch, reducing pipeline downtime through proactive monitoring and automated recovery mechanisms.

• Developed CI/CD workflows using Jenkins and CodePipeline, accelerating deployment cycles from days to hours and improving development team productivity and release velocity.

• Migrated on-premises systems to AWS EC2 with secure IAM and VPC, eliminating legacy infrastructure maintenance costs while improving system reliability and security posture.

• Integrated Tableau dashboards with Snowflake for executive reporting, providing real-time operational visibility and enabling data-driven decisions that improved operational efficiency.

• Implemented data validation frameworks using Python and custom scripts, ensuring data quality and consistency across all ETL processes and reducing downstream analytical errors.

• Built data enrichment tools using BeautifulSoup and Lambda functions, enhancing raw data with external sources to provide richer context for business analytics.

• Configured HBase for real-time data storage and retrieval, supporting low-latency queries for operational dashboards.

• Established data governance policies using Glue Catalog, improving data lineage tracking and compliance reporting for audit.

Environment / Tools & Technologies: AWS (EC2, S3, EMR, CloudWatch, IAM, Lambda, Step Functions, CodePipeline, MSK, Glue, Lake Formation, KMS, VPC) · Snowflake · Apache Spark · Kafka · Airflow · Hadoop · Hive · HDFS · Sqoop · HBase · Python · SQL · Scala · Shell · PowerShell · Docker · Jenkins · Git · Tableau

Client: Blue Cross Blue Shield (BCBS), Chicago, IL, USA

Role: Azure Data Engineer

Duration: Nov 2018 – Jun 2020

Summary: Built compliant, end-to-end Azure data pipelines for healthcare finance, enabling real-time fraud detection, claims analysis, and regulatory reporting. Improved data freshness and reporting accuracy using Azure Synapse, Databricks, and Stream Analytics, while ensuring GDPR and PCI-DSS compliance.

Key Responsibilities & Achievements:

• Orchestrated large-scale data ingestion using Azure Data Lake Gen2 and Data Factory, processing healthcare data daily to enable real-time fraud detection and compliance monitoring for claims processing.

• Automated complex ETL workflows using ADF, Logic Apps, and Azure Functions, reducing data processing time and improving data accuracy for financial reporting and regulatory submissions.

• Built real-time pipelines using Event Hubs and Databricks (PySpark), enabling executive dashboards, reducing claim processing time from days to hours, and improving customer satisfaction.

• Implemented schema evolution and data quality controls via ADF Schema Drift, achieving high data consistency across all pipelines and ensuring accurate regulatory reporting and compliance.

• Developed CI/CD frameworks using Azure DevOps and ARM templates, improving deployment velocity and infrastructure consistency and reducing environment-related deployment issues.

• Configured Azure Databricks Spark Streaming for real-time event processing, handling high-volume financial events accurately to enable real-time compliance monitoring and fraud detection.

• Ensured secure, compliant pipelines with Azure Key Vault and Active Directory, meeting GDPR and PCI-DSS compliance requirements, avoiding regulatory penalties, and maintaining customer trust.

• Delivered reporting solutions via Power BI and Azure Synapse, providing real-time risk intelligence to compliance teams and improving regulatory response times and decision-making.

• Designed data modeling solutions using star and snowflake schemas, optimizing query performance for business intelligence workloads and improving analytical reporting capabilities.

• Implemented automated data quality checks and validation rules, ensuring data integrity and consistency across healthcare data sources and preventing downstream analytical errors.

• Built scalable ingestion pipelines for structured and semi-structured data, supporting various file formats and real-time streaming requirements to enable comprehensive data integration.

• Implemented Azure Monitor to track pipeline performance and data quality, enabling proactive issue resolution.

Environment / Tools & Technologies: Azure Data Factory, Azure Databricks, Azure Data Lake Storage Gen2, Azure Blob Storage, Azure SQL Database, Azure Synapse Analytics, Azure Monitor, Azure Event Hubs, Azure Functions, Azure Logic Apps, Azure Cosmos DB, PowerShell, Azure CLI, Apache Spark, PySpark, Kafka, MongoDB, Snowflake, Hive, SQL, T-SQL, SSIS, Power BI, Azure DevOps, Scala, Azure Key Vault, Azure Active Directory

EDUCATION

Master's in Computer Science, Texas A&M University, Kingsville, Texas

GPA: 4.0

CERTIFICATIONS

• Azure Data Fundamentals (DP-900)

• Azure Data Engineer Associate (DP-203)

• AWS Certified Data Engineer – Associate
