Data Engineer Machine Learning

Location:

Ahmedabad, Gujarat, India

Posted:

September 10, 2025


Resume:

Vyshnavi Danda

**************@*****.*** 928-***-**** El Cajon, CA linkedin.com/in/vyshnavi-danda-35b828237/

PROFESSIONAL SUMMARY

Results-driven Data Engineer with 5 years of experience designing, building, and optimizing large-scale data pipelines, Artificial Intelligence, and analytics solutions across diverse industries. Proven expertise in ETL development, data modeling, machine learning, cloud platforms (AWS, Azure, GCP), distributed systems, and big data frameworks. Skilled in Python, SQL, Apache Spark, Airflow, Kafka, Snowflake, and CI/CD pipelines, delivering scalable, secure, and business-impactful data solutions. Adept at collaborating with cross-functional teams to translate business needs into high-quality, production-grade data products.

TECHNICAL SKILLS

Programming Languages: Python (Pandas, NumPy, SQLAlchemy), SQL, C++, JavaScript

Databases & Data Warehousing: SQL Server, PostgreSQL, MySQL, Oracle, MongoDB, Snowflake, Amazon Redshift, Azure Synapse Analytics, Schema Design, Indexing, Partitioning, Query Optimization

Cloud Platforms & Infrastructure: AWS (S3, Redshift, Glue, Lambda, EMR, EC2, Route53, Elastic Beanstalk, Redshift Spectrum, Kinesis), Azure (Data Factory, Databricks, Blob Storage, Synapse Analytics, Azure Data Lake Storage)

Big Data Technologies: Spark (PySpark, Spark SQL), Kafka, Airflow, Hive, NiFi, Hadoop (HDFS, YARN, MapReduce)

ETL / ELT & Data Pipelines: AWS Glue, Talend, Informatica

Data Modeling & Architecture: Dimensional Modeling, Star Schema, OLAP

DevOps & Infrastructure: CI/CD (Jenkins, GitHub Actions), Version Control (Git, GitHub, GitLab), Infrastructure as Code (Terraform)

Artificial Intelligence: Supervised & Unsupervised Learning, Reinforcement Learning, Deep Learning Architectures (Neural Networks, Transfer Learning), Computer Vision, Natural Language Processing (Semantic Analysis, Text Segmentation), MLOps (MLflow), LLMs (GPT-4, Claude), Time Series Analysis

Data Visualization & Analytics: Tableau, Power BI, Amazon QuickSight, Business Intelligence (BI)

Methodologies: SDLC, Agile, Scrum, Kanban, Waterfall

PROFESSIONAL EXPERIENCE

Mastercard, USA: Data Engineer (AI/ML) Feb 2025 – Present

Designed and implemented serverless data pipelines using AWS Lambda, API Gateway, and DynamoDB, enabling real-time ingestion and processing of structured and unstructured data from S3 events.

Built ETL jobs using AWS Glue and PySpark to transform S3 data in ORC, Parquet, and text formats and load it into Redshift, enhancing analytical insights and reducing manual intervention.

Developed AWS Step Functions workflows integrated with Lambda, SQS, and DynamoDB to manage stateful data processing and improve automation and scalability across pipelines.

Created CloudFormation templates to automate provisioning of AWS resources including S3, RDS, EC2, IAM, SNS, and SQS, ensuring reproducibility and adherence to infrastructure-as-code practices.

Utilized AWS Athena and Hive to perform interactive queries on S3-based datasets and developed external table partitions for optimized big data analysis and reporting.

Implemented AWS CloudWatch to monitor Lambda executions, visualize service latency, and proactively detect performance bottlenecks and errors in real time.

Edward Jones, USA: AI Data Engineer Intern Jul 2024 – Dec 2024

Collaborated with cross-functional teams to develop predictive models using Random Forest, SVM, Neural Networks, and Logistic Regression to address customer churn, retention, and personalized recommendations on Microsoft Azure.

Achieved customer retention uplift by implementing churn prediction models using Scikit-learn for targeted marketing strategies.

Conducted geospatial analysis and visualization using Pandas, Matplotlib, and Tableau to uncover location-based customer behavior trends.

Applied Natural Language Processing (NLP) techniques using Python and NLTK to classify and extract sentiment from Foursquare online reviews for customer experience analysis.

Designed and evaluated content-based and collaborative filtering recommendation systems using Apache Spark with Scala on Databricks and AWS to enhance movie recommendation accuracy and scalability.

Transol Systems, INDIA: Cloud Data Engineer Apr 2019 – Jul 2023

Designed and deployed scalable data ingestion pipelines with Apache Spark, Hadoop, and Hive, processing 10TB+ of structured and unstructured data daily, which improved pipeline efficiency and scalability for enterprise-wide analytics.

Migrated on-prem Oracle warehouses to Azure Synapse Analytics, building automated ETL workflows with Azure Data Factory that cut infrastructure costs by 25% while boosting query performance and reliability.

Developed real-time and incremental ETL processes in Azure Data Factory, processing streaming data with under 2 minutes of latency and enabling near-real-time dashboards for faster operational insights.

Performed data modeling and schema design in Redshift to support high-performance querying and reporting, including the design of logical and physical data models for structured sources.

Improved data governance and compliance by creating Python-based automation for metadata management and lineage tracking, integrating with Azure Purview to speed up audits and issue resolution.

Optimized BI and reporting performance by building OLAP cubes, refining dimensional data models, and tuning Spark jobs, reducing report load times by 30% and improving user adoption of Power BI/Tableau dashboards.

Strengthened data security and team productivity by implementing RBAC and encryption policies for sensitive datasets, mentoring junior engineers on SQL optimization and PySpark, and reducing onboarding time for new hires.

EDUCATION

Master of Science in Computer Science: Northern Arizona University, AZ Dec 2024
