Data Engineer Machine Learning

Location: Ahmedabad, Gujarat, India
Posted: September 10, 2025


Resume:

Meena Gandham
Data Engineer

Tampa, Florida | +1-813-***-**** | **********@*****.*** | LinkedIn

SUMMARY

Data Engineer with over 2 years of experience in building scalable, cloud-native data pipelines and automating ETL workflows. Proficient in Python, SQL, and PySpark for data transformation and pipeline automation. Experienced with Apache Airflow, AWS Glue, and cloud platforms like AWS and Azure for orchestration and deployment. Skilled in integrating data workflows with machine learning systems, including data preparation, model training, and inference processes. Focused on optimizing data processing, reducing operational costs, and ensuring high system reliability. Strong understanding of data security protocols, regulatory compliance frameworks, and collaborative software development practices within cross-functional teams.

TECHNICAL SKILLS

Programming & Scripting: Python, SQL, Java, Scala, Bash, Shell Scripting

Web Frameworks: Flask, Django, RESTful APIs

Data Warehousing & Storage: Amazon S3, Snowflake, Amazon Redshift, Azure Data Lake Storage, Azure Synapse, Google BigQuery

ETL/ELT & Workflow Orchestration: Apache Airflow, Apache NiFi, AWS Glue, dbt, Talend

Databases: SQL – PostgreSQL, MySQL, SQL Server; NoSQL – MongoDB, Cassandra, DynamoDB

Big Data Ecosystem: Apache Spark, Hadoop, Kafka, Hive, Databricks, Flink

Cloud Platforms: AWS (S3, Redshift, EC2, EMR, Glue, Lambda, SageMaker), Azure (Data Lake Storage, Data Factory, Synapse)

Data Modeling & Architecture: Dimensional Modeling, Star Schema, Snowflake Schema, Normalization

AI/ML & Statistical Modeling: Scikit-learn, TensorFlow, Keras, XGBoost, MLlib, PyTorch, Pandas, NumPy, Matplotlib, Seaborn, Feature Engineering, Model Deployment, Hyperparameter Tuning

Data Governance & Security: Data Quality Assurance, Data Lineage, Role-Based Access Control (RBAC), Compliance (SOX, FINRA, PII Protection, Financial Regulatory Reporting)

DevOps & Version Control: Git, Docker, Kubernetes, CI/CD Pipelines, Jenkins, Terraform

BI & Visualization Tools: Excel, Tableau, Looker, Power BI

WORK EXPERIENCE

Data Engineer, Morgan Stanley, Florida | June 2024 – Present

• Assisted in developing scalable batch and streaming data pipelines for trading analytics using Apache Kafka, PySpark, and Airflow. Processed 1.2 TB of structured and semi-structured financial data daily, applying distributed data processing and orchestration best practices (see the Kafka/PySpark sketch after this list).

• Supported ingestion from upstream trading and risk systems into an AWS S3-based data lake by scripting ingestion with boto3, validating schema consistency, and tagging metadata, which improved data discoverability and reduced duplication by 35% (see the boto3 sketch after this list).

• Analyzed PySpark job performance on AWS EMR with Spark UI; optimized inefficient transformations by reordering filters and simplifying joins, achieving a 42% reduction in ETL runtime for regulatory reporting datasets.

• Built PySpark-based pipelines to sync features between AWS Glue and SageMaker Feature Store, reducing data preprocessing redundancy by 40% and improving model consistency.

• Automated full ML workflows with Airflow and SageMaker, handling data prep, training, tuning, and inference with robust error handling and conditional logic in DAGs (see the Airflow DAG sketch after this list).

• Integrated ML-based anomaly detection on ETL outputs using scikit-learn Isolation Forest, flagging outliers in financial transactions and streamlining exception review (see the Isolation Forest sketch after this list).

• Developed early-stage feature engineering scripts to support AI-driven risk scoring models, collaborating with data scientists to deliver curated, high-quality datasets from the data lake.

• Verified RBAC policies in Snowflake, validated column-level masking rules, and ensured compliance with PII protection and internal data governance protocols.
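
For illustration, a minimal sketch of the Kafka-to-data-lake streaming pattern from the first bullet, assuming a hypothetical broker, topic, trade schema, and S3 paths (none taken from the actual pipelines):

```python
# Illustrative sketch only; topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("trade-stream").getOrCreate()

trade_schema = StructType([
    StructField("trade_id", StringType()),
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
])

# Read raw trade events from Kafka and parse the JSON payload.
trades = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "trades")                     # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), trade_schema).alias("t"))
    .select("t.*")
)

# Land parsed records in the data lake as Parquet, checkpointing progress.
query = (
    trades.writeStream
    .format("parquet")
    .option("path", "s3a://datalake/trades/")           # hypothetical bucket
    .option("checkpointLocation", "s3a://datalake/_chk/trades/")
    .start()
)
```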
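
A hedged sketch of the boto3 ingestion-and-tagging step; the bucket name, tag keys, and helper function are illustrative assumptions:

```python
import boto3

s3 = boto3.client("s3")

def ingest_file(local_path: str, key: str) -> None:
    """Upload a file to the data lake and tag it for discoverability."""
    s3.upload_file(local_path, "trading-data-lake", key)  # hypothetical bucket
    s3.put_object_tagging(
        Bucket="trading-data-lake",
        Key=key,
        Tagging={"TagSet": [
            {"Key": "source_system", "Value": "risk-feed"},   # hypothetical tags
            {"Key": "ingest_date", "Value": "2024-06-01"},
        ]},
    )
```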
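
A minimal sketch of the Airflow orchestration pattern with retries and a conditional branch, assuming Airflow 2.4+; the DAG id, task names, and quality gate are hypothetical:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import BranchPythonOperator, PythonOperator

def prepare_data(**_):
    ...  # pull and clean training data

def check_quality(**_):
    # Conditional logic: route to training only if the input passes validation.
    passed = True  # placeholder for a real data-quality check
    return "train_model" if passed else "alert_team"

def train_model(**_):
    ...  # e.g. launch a SageMaker training job

def alert_team(**_):
    ...  # notify on bad input instead of failing silently

with DAG(
    dag_id="ml_pipeline",                 # hypothetical DAG id
    start_date=datetime(2024, 6, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2},          # basic error handling via retries
) as dag:
    prep = PythonOperator(task_id="prepare_data", python_callable=prepare_data)
    gate = BranchPythonOperator(task_id="check_quality", python_callable=check_quality)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    alert = PythonOperator(task_id="alert_team", python_callable=alert_team)
    prep >> gate >> [train, alert]
```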
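
A minimal sketch of the Isolation Forest exception-flagging step, assuming numeric transaction features with illustrative column names:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_parquet("etl_output.parquet")           # hypothetical ETL output
features = df[["amount", "latency_ms", "fee"]]       # hypothetical features

# Fit on the batch and flag outliers: fit_predict() returns -1 for anomalies.
model = IsolationForest(contamination=0.01, random_state=42)
df["anomaly"] = model.fit_predict(features)
exceptions = df[df["anomaly"] == -1]                 # rows routed to review
```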

Data Analyst, Cybage Software, India | Feb 2022 – July 2023

• Created interactive dashboards using Power BI for internal teams and external clients, reducing manual reporting time by approximately 15 hours/week and enabling faster decision-making.

• Maintained and optimized 10+ ETL pipelines using Python and SQL to automate data ingestion, transformation, and loading for client-facing reports, ensuring timely and accurate data delivery.

• Scheduled daily workflows with Apache Airflow, improving task reliability by over 30% and reducing operational overhead through automated dependency management.

• Migrated 5 critical workflows from on-premise servers to AWS Glue and Amazon Redshift, cutting processing time by 40% and enhancing scalability for growing client data volumes.

• Optimized complex SQL queries to process raw structured & semi-structured data, enabling efficient data analysis, real-time analytics, and business intelligence reporting.

• Resolved 20+ recurring data quality issues through root-cause analysis and data profiling, significantly improving the accuracy of analytical outputs.

• Collaborated with the data science team to prepare features and clean training datasets for ML models used in client segmentation and churn prediction.

• Developed Python-based scripts for automated model performance tracking using metrics like F1-score and ROC-AUC, aiding business stakeholders in understanding predictive model reliability (see the metrics sketch after this list).
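
A hedged sketch of the model-tracking script referenced above; the predictions file and its column names are assumptions for illustration:

```python
import pandas as pd
from sklearn.metrics import f1_score, roc_auc_score

preds = pd.read_csv("daily_predictions.csv")         # hypothetical export

# Binary labels for F1; calibrated probabilities for ROC-AUC.
f1 = f1_score(preds["label"], preds["predicted"])
auc = roc_auc_score(preds["label"], preds["score"])

print(f"F1: {f1:.3f}  ROC-AUC: {auc:.3f}")
```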

PROJECTS

Real-Time Data Pipelines with Apache Kafka and Apache Flink

• Engineered scalable real-time data pipelines using Apache Kafka for event streaming and Apache Flink for stream processing, enabling low-latency data ingestion and analytics.

• Optimized data transformations within Flink, handling both bounded and unbounded data streams, to support real-time analytics and decision-making.

• Integrated with downstream systems such as Elasticsearch and Kibana for real-time data visualization and monitoring of e-commerce transaction and sales analytics, including metrics like daily sales, top-selling categories, and error rates.

Automated Data Flow for Customer Transaction Processing

• Orchestrated streaming data flows using Apache NiFi on AWS EC2 to simulate and route customer transaction data in a cloud-native environment.

• Generated high-volume synthetic data with Python's Faker library and routed it through NiFi processors to downstream storage (see the Faker sketch after these bullets).

• Automated ingestion and implemented Slowly Changing Dimensions (Type 1 and Type 2) using Snowflake Snowpipe, streams, and tasks for historical tracking (see the SCD sketch after these bullets).
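
A minimal sketch of the synthetic-transaction generator, assuming an illustrative record schema and a newline-delimited JSON hand-off that a NiFi processor could poll:

```python
import json
import random
from faker import Faker

fake = Faker()

def make_transaction() -> dict:
    # Hypothetical schema; not the project's actual field list.
    return {
        "transaction_id": fake.uuid4(),
        "customer": fake.name(),
        "amount": round(random.uniform(5, 500), 2),
        "timestamp": fake.iso8601(),
    }

# Emit newline-delimited JSON for a downstream NiFi pickup directory.
with open("transactions.jsonl", "w") as out:
    for _ in range(10_000):
        out.write(json.dumps(make_transaction()) + "\n")
```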
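
A sketch of the SCD Type 2 close-out step, assuming a Snowflake stream on the staging table; all object and column names are hypothetical, and the companion INSERT that adds the new version row for changed keys is omitted for brevity:

```python
import snowflake.connector

# Close out current dimension rows whose attributes changed, driven by the
# stream's captured inserts (METADATA$ACTION = 'INSERT').
SCD2_CLOSE_SQL = """
MERGE INTO dim_customer tgt
USING (
    SELECT customer_id, address
    FROM customer_stream
    WHERE METADATA$ACTION = 'INSERT'
) src
ON tgt.customer_id = src.customer_id AND tgt.is_current
WHEN MATCHED AND tgt.address <> src.address THEN UPDATE
    SET tgt.is_current = FALSE, tgt.valid_to = CURRENT_TIMESTAMP()
"""

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",  # placeholders
    warehouse="ETL_WH", database="ANALYTICS", schema="DIM",
)
conn.cursor().execute(SCD2_CLOSE_SQL)
```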

EDUCATION

Master of Science in Computer Science, University of Central Florida, Orlando, Florida. Dec 2024
Bachelor of Technology in Computer Science, Vellore Institute of Technology, Tamil Nadu, India. July 2023

CERTIFICATIONS

AWS Certified Data Engineer Associate – AWS (Link) Dec 2024
IBM Data Engineering – IBM (Link) Aug 2024
