Mrudula Vijayanarasimha
Phone no: +1-585-***-****
Email: *******.*****@*****.***
LinkedIn: MrudulaV GitHub: MrudulaV
Senior Cloud Data Engineer / MLOps Engineer
Professional Summary
7+ years of IT experience in Data Engineering and Machine Learning across industries including eCommerce, Manufacturing, and Finance. Leads a small team of MLOps engineers building end-to-end orchestration from data collection, ingestion, and transformation through to prediction use cases.
Technical Skills:
Programming Languages: Python, Scala, SQL, Java, .NET
Cloud Services: AWS, GCP, API
Databases: MySQL, DBT, SQL Server, Oracle, Teradata, Snowflake
NoSQL Databases: MongoDB, DynamoDB, Cassandra DB, HBase
DevOps: Docker, Kubernetes, CI/CD, Airflow, Jenkins, GIT, Hadoop, Spark, Kafka
Data Engineering: ETLs, DBT, Teradata, Iceberg, Airflow, Modeling, Performance Tuning, Warehousing
AI/ML: GenAI, NLP techniques, Deep Learning methodologies, Prompt Engineering, Building LLMs
Certifications:
AWS Certified Data Analytics/ML – Specialty
Google Cloud Professional Data Engineer
GCP Machine Learning Engineer
Microsoft Certified: Azure AI Engineer Associate
Education:
Master's in Computer Science, Rochester Institute of Technology, Rochester, NY (Aug 2017 – May 2020)
Bachelor's in Computer Science, Visvesvaraya Technological University, Bangalore, India (June 2013 – May 2017)
Professional Experience:
Client: BMW, Greenville, SC
GenAI Engineer / MLOps Engineer Nov 2021 to Present
Responsibilities:
• Developed and maintained Terraform and Go modules to manage a serverless, cloud-native data platform (Cloud Data Hub) using Python, Snowflake, and SQL, optimizing data retrieval between pipelines by 30%.
• Led a GenAI project that integrated data from various distributed frameworks to develop a chatbot-style interactive platform, providing users with real-time answers to data-related queries and using predictive analytics to forecast future trends and figures.
• Implemented a predictive maintenance solution for a vehicle fleet by orchestrating data workflows with Python, Teradata, Scala, and Airflow to predict vehicle failures, optimize part replacements, and minimize downtime, using containers (Docker, Kubernetes, Jenkins) in the cloud.
• Orchestrated AWS services, including EC2, S3, SageMaker, Redshift, Lambda, CloudWatch, and Step Functions, to develop and lead an MLOps architecture that streamlines end-to-end ETL processes, resulting in a 15% reduction in processing time and an efficient flow for delivering results to S3, enhancing overall data accessibility.
• Engineered and implemented a machine learning model using PyTorch, XGBoost, Pandas, NumPy, and scikit-learn, leveraging 503 car use cases and incorporating object detection and image classification to identify defects in automobiles, enhancing quality control processes and delivering monthly cost savings of $50M across global plants.
• Proficient in Python, AI/ML libraries (PyTorch, TensorFlow, LangChain), AWS AI services (Bedrock, Knowledge Base), and serverless (Lambda, API Gateway, DynamoDB); solid experience with Generative AI, LLMs (GPT, Claude, Llama), Retrieval-Augmented Generation (RAG), and vector databases (Pinecone, OpenSearch).
• Designed and implemented several data pipelines and orchestrations using Terraform and GitHub Actions as IaC to support domains such as Sales, Supply Chain, and Quality, along with their logistical operations.
• Enhanced data validation for pipeline operations by implementing a Simple Data Platform framework that conducts checks via APIs after job completion, resulting in a notable 25% improvement in accuracy and reliability for business users.
• Experienced in designing, developing, fine-tuning, and deploying scalable AI solutions using foundation models and cloud-based architectures; actively collaborates with cross-functional teams, follows MLOps best practices, ensures regulatory compliance, and remains updated on emerging trends in generative AI.
• Applied Apache Spark and Kafka to extract insights from big data, achieving a 30% increase in actionable insights and a 25% reduction in data latency.
• Led a high-alert OPS issues board, guiding a team of data engineers in collaboration with Data Governance, business owners, and fellow engineers; successfully addressed data issues, achieving a 100% resolution rate.
Environment: AWS services, Python, Spark, Terraform, APIs, Tableau, ML libraries, Scala, SQL, Hadoop, Apache Kafka, ETL, Hive, Teradata, GIT, Apache Airflow.
Client: OrangeBees, Greenville, SC
Data Engineer / Data Scientist May 2020 to Oct 2021
Description: The project aims to enhance customer engagement and deliver personalized data experiences to clients by developing a customer engagement and personalization platform that leverages data engineering techniques to analyze customer data, identify preferences, and tailor banking services accordingly.
Responsibilities:
• Utilized Dataflow and Cloud Dataproc on GCP to compose MapReduce jobs in Java, Pig, and Python, facilitating efficient data processing and analysis.
• Implemented Hadoop MapReduce and HDFS concepts within GCP, developing Java-based MapReduce jobs to clean and process data effectively.
• Involved in setting up CI/CD pipelines with Cloud Run, automating the deployment process for efficient and rapid updates of applications.
• Designed and implemented scalable Azure Data Solutions, including Azure Data Factory, Azure Databricks, and Azure Synapse Analytics to support high-volume data processing and analytics workloads.
• Implemented advanced DevSecOps practices in Azure environments, integrating security into CI/CD pipelines and automated compliance checks.
• Analyzed and optimized Azure costs using Azure Cost Management and Azure Advisor, achieving a 30% reduction in monthly cloud expenses. Optimized large-scale data processing workflows using Azure Data Lake Analytics and Azure HDInsight, improving data processing times by 40%.
• Automated cloud infrastructure provisioning using Azure Resource Manager (ARM) templates and Terraform, significantly reducing deployment times and human error.
• Stored both raw and processed data in Google Cloud Storage (GCS), organizing a structured data lake to ensure accessibility and usability across various use cases.
• Proficient in implementing error handling mechanisms and optimizing Python scripts for improved performance and efficiency in data processing workflows within GCP.
• Leveraged monitoring and logging tools within Cloud Run, ensuring visibility into application performance, error tracking, and resource utilization.
• Executed Hive queries within GCP, enabling market analysts to identify emerging trends by comparing new data with Enterprise Data Warehouse (EDW) reference tables and historical statistics.
• Automated data loading into GCP's Hadoop Distributed File System (HDFS) using Oozie, with Pig for preprocessing data, allowing for quick reviews and competitive advantages.
• Utilized Google Cloud Dataprep and Dataproc for transforming and transferring large data volumes within and across GCP services such as Cloud Storage and Google Cloud Bigtable.
Environment: Python, Java, Teradata, Scala, Flink, AWS services, SQL, Spark, SQL Server, Kafka, Terraform, PowerShell, Apache Airflow, Hadoop, YARN, Hive.
Client: Wegmans Food Markets, Rochester, NY
Data Management Sep 2017 to Jan 2020
• Developed serverless functions on GCP using Python, utilizing Cloud Functions to execute small units of code triggered by various events for streamlined data processing.
• Introduced a holiday consumption model to improve shelf-stocking accuracy for every holiday season, saving $20M in revenue.
• Collaborated with cross-functional teams to develop and implement reporting solutions using Genesys WFM and Power BI, facilitating streamlined communication and data-driven decision-making processes.
• Implemented multiprocessing for the Database Extraction Engine (ETL/ELT).
• Improved application performance by 30% by converting conventional result-set processing to bulk fetching and bulk binding techniques.
Client: Riversand Technologies Ltd., Bengaluru, India
Software Development Intern Jan 2015 to Jan 2017
• Produced code for the development team by building a web-based Employee Leave Management portal using Java, C#, and .NET.
• Analyzed the origin and global spread of the Zika virus using data visualization techniques to clarify which containment measures work in which regions and to support efforts to contain the virus.
• Successfully loaded data into SQL Server and analyzed the results; cut projected time for data analysis by one week by developing reusable ETL components.
• Implemented ETL processes and queries based on BW and S/4HANA for a SAP Analytics Cloud dashboard, leading to a 90% increase in operational excellence and streamlined payroll processing.