Yamini Malladi Data Engineer
******@***********.*** | +1-940-***-**** | Dallas, TX | LinkedIn | GitHub
PROFESSIONAL SUMMARY
Data Engineer with 3+ years of experience designing, developing and optimizing scalable ETL pipelines and cloud-based data solutions. Proficient in Python, SQL and cloud platforms such as Azure, AWS and GCP, with hands-on expertise in tools such as Data Factory, Databricks and Apache Spark. Skilled in building real-time streaming pipelines, implementing machine learning models and delivering actionable insights through interactive business dashboards. Experienced in collaborating with cross-functional teams to deliver secure, compliant and high-performance data architectures in fast-paced environments.
TECHNICAL SKILLS
Programming Languages: Python, SQL, Scala, Bash
Big Data & Data Engineering: PySpark, Hadoop, Apache Spark, Kafka, Airflow, Delta Lake, dbt, ETL/ELT Pipelines
Cloud Platforms & Data Warehousing: AWS (S3, Redshift, EMR, Lambda, RDS, EC2), Azure (Data Factory, Databricks, Blob Storage, Text Analytics), GCP (BigQuery), Snowflake
Databases: PostgreSQL, MySQL, Oracle, MongoDB, Cosmos DB
DevOps & Tools: Jenkins, Git, GitHub, GitLab, CI/CD, Docker, Kubernetes, Terraform, REST APIs, Agile Methodologies
Analytics & Machine Learning: Power BI, Tableau, Pandas, NumPy, TensorFlow, MLflow, NLP, LLMs
PROFESSIONAL EXPERIENCE
Data Engineer, Meditech May 2025 – Present Remote, USA
• Designed real-time Kafka ingestion pipelines to stream unstructured clinical documents into Azure Data Lake, enabling scalable NLP workflows.
• Engineered dbt models on Databricks to refine raw Bronze data into Silver FHIR-aligned tables within a Delta Lake medallion architecture, enforcing schema consistency, data lineage and governance via version-controlled SQL transformations.
• Integrated Azure Text Analytics for Healthcare within Databricks pipelines to automatically extract clinical entities and output FHIR resource bundles for downstream interoperability.
Data Engineer, State Street Feb 2024 – Apr 2025 Remote, USA
• Built an Azure lakehouse using Data Factory, Databricks and Delta Lake, reducing ETL time by 50% and improving Azure SQL query performance by 25%.
• Developed an end-to-end, real-time anomaly detection system with Azure Event Hubs, Stream Analytics and ML models, flagging anomalies in under 5 seconds across 100K+ daily events.
• Architected secure and compliant storage using Azure SQL, Cosmos DB and Blob Storage, implementing validation and RBAC to resolve 95% of legacy data integrity issues.
• Partnered with ML engineers to deploy risk-forecasting pipelines using Azure ML Studio, Python, TensorFlow and MLflow on gold-layer data, boosting credit deterioration prediction accuracy by 30%.
• Created interactive Power BI dashboards visualizing NAV trends, holdings, risk and anomalies for 50+ stakeholders, accelerating delivery of insights by 60%.
• Automated CI/CD workflows using Azure DevOps, Terraform, MLflow and Azure Monitor, reducing retraining time by 40% and enhancing pipeline reliability by 35%.
Data Engineer, Genpact Sep 2020 – Jul 2022 Hyderabad, India
• Automated ingestion of 5 TB of daily transaction log files into AWS S3 using Lambda and EventBridge, enabling near-real-time detection and processing of new data.
• Optimized PySpark jobs on EMR to clean, deduplicate, correct data types and enrich transaction logs by joining with customer profiles stored in PostgreSQL, reducing downstream data errors by 70%.
• Orchestrated ETL workflows with Airflow DAGs incorporating retry logic, monitoring and alerting, reducing failed runs by 85%.
• Implemented a Redshift data warehouse using a star schema, bulk-loading cleaned transaction data from S3 staging and optimizing join and query performance.
• Collaborated with ML engineers to compute RFM (recency, frequency, monetary) metrics, track model versions in MLflow and automate model retraining, achieving a 25% improvement in model accuracy over baseline.
• Built SQL-based data validation tests to ensure the accuracy of analytics data, and deployed Tableau dashboards visualizing churn risk, customer behavior trends and regional purchase summaries, delivering insights to 20+ business stakeholders.
EDUCATION
Master of Science in Computer & Information Science, University of North Texas Aug 2022 – May 2024 Denton, TX
Bachelor of Technology in Computer Science, GITAM Jul 2018 – Apr 2022 Hyderabad, India
CERTIFICATION
AWS Certified Solutions Architect - Associate