Data Engineer Machine Learning

Location:
Dallas, TX
Posted:
October 15, 2025

Resume:

Yamini Malladi Data Engineer

******@***********.*** +1-940-***-**** Dallas, TX LinkedIn GitHub

PROFESSIONAL SUMMARY

Data Engineer with 3+ years of experience in designing, developing and optimizing scalable ETL pipelines and cloud-based data solutions. Proficient in Python, SQL and cloud platforms such as Azure, AWS and GCP, with hands-on expertise in tools like Data Factory, Databricks and Apache Spark. Skilled in building real-time data architectures, implementing machine learning models, and delivering actionable insights through interactive business dashboards. Well-versed in collaborating with cross-functional teams to deliver secure, compliant and high-performance data architectures in fast-paced environments.

TECHNICAL SKILLS

Programming Languages: Python, SQL, Scala, Bash

Big Data & Data Engineering: PySpark, Hadoop, Apache Spark, Kafka, Airflow, Delta Lake, dbt, ETL/ELT Pipelines

Cloud Platforms & Data Warehousing: AWS (S3, Redshift, EMR, Lambda, RDS, EC2), Azure (Data Factory, Databricks, Blob Storage, Text Analytics), GCP (BigQuery), Snowflake

Databases: PostgreSQL, MySQL, Oracle, MongoDB, CosmosDB

DevOps & Tools: Jenkins, Git, GitHub, GitLab, CI/CD, Docker, Kubernetes, Terraform, REST APIs, Agile Methodologies

Analytics & Machine Learning: Power BI, Tableau, Pandas, NumPy, TensorFlow, MLflow, NLP, LLMs

PROFESSIONAL EXPERIENCE

Data Engineer, Meditech May 2025 – Present Remote, USA

• Designed real-time Kafka ingestion pipelines to stream unstructured clinical documents into Azure Data Lake, enabling scalable NLP workflows.

• Engineered dbt models on Databricks to refine raw Bronze data into Silver FHIR-aligned tables within a Delta Lake medallion architecture, enforcing schema consistency, data lineage and governance via version-controlled SQL transformations.

• Integrated Azure Text Analytics for Healthcare within Databricks pipelines to automatically extract clinical entities and output FHIR resource bundles for downstream interoperability.

Data Engineer, State Street Feb 2024 – April 2025 Remote, USA

• Built an Azure lakehouse using Data Factory, Databricks and Delta Lake, reducing ETL time by 50% and improving Azure SQL query performance by 25%.

• Developed an end-to-end, real-time anomaly detection system with Azure Event Hubs, Stream Analytics and ML models, detecting anomalies in under 5 seconds across 100K+ daily events.

• Architected secure and compliant storage using Azure SQL, Cosmos DB and Blob Storage, implementing validation and RBAC to resolve 95% of legacy data integrity issues.

• Partnered with ML engineers to deploy risk-forecasting pipelines using Azure ML Studio, Python, TensorFlow and MLflow on gold-layer data, boosting credit deterioration prediction accuracy by 30%.

• Created interactive Power BI dashboards to visualize NAV trends, holdings, risk and anomalies, accelerating insight delivery to 50+ stakeholders by 60%.

• Automated CI/CD workflows using Azure DevOps, Terraform, MLflow and Azure Monitor, reducing retraining time by 40% and enhancing pipeline reliability by 35%.

Data Engineer, Genpact Sept 2020 – July 2022 Hyderabad, India

• Automated ingestion of 5 TB of daily transaction log files into AWS S3 using Lambda and EventBridge, enabling near-real-time detection and processing of new data.

• Optimized PySpark jobs on EMR to clean, deduplicate, correct data types and enrich transaction logs by joining with customer profiles stored in PostgreSQL, reducing downstream data errors by 70%.

• Orchestrated ETL workflows with Airflow DAGs, incorporating retry logic, monitoring and alerting, reducing failed runs by 85%.

• Implemented a Redshift data warehouse using star schema, optimizing join persistence and query performance through bulk-loading of cleaned transaction data from S3 staging.

• Collaborated with ML engineers to compute RFM metrics, track model versions through MLflow and automate model retraining, achieving a 25% improvement in model accuracy over baseline.

• Built SQL and data validation tests to ensure accuracy of analytics data and deployed Tableau dashboards to visualize churn risk, customer behavior trends and regional purchase summaries, delivering insights to 20+ business stakeholders.

EDUCATION

Master of Science in Computer & Information Science, University of North Texas Aug 2022 – May 2024 Denton, Texas

Bachelor of Technology in Computer Science, GITAM July 2018 – Apr 2022 Hyderabad, India

CERTIFICATION

AWS Certified Solutions Architect - Associate