
Machine Learning Data Engineer

Location:
Allen, TX
Posted:
October 23, 2025

Resume:

Anusha Namala

Data Engineer

Contact: +1-660-***-****

Email: **************@*****.***

PROFESSIONAL SUMMARY:

• 4+ years of experience in software engineering, specializing in project management, application development, software design, and SQL.

• Proficient in machine learning, implementing predictive models, and deploying AI solutions for data analysis and automation tasks.

• Skilled in designing and implementing RPA solutions for automating repetitive tasks and improving operational efficiency.

• Expertise in working with Apache Airflow to create, debug, schedule, and monitor ETL workflows, ensuring smooth and efficient data pipelines.

• Advanced skills in AWS services, including EC2, S3, Redshift, Glue, DynamoDB, Lambda, Step Functions, Athena, and Kinesis, to deliver scalable cloud solutions.

• Strong experience in designing RDL (Report Definition Language) and TDL (Transformation Definition Language) for creating effective reports and data transformations.

• Proven ability to manage complex data retention mapping, ensuring compliance with organizational policies and optimizing storage processes.

• Experience in developing complex data models and schemas in Snowflake, improving data storage and retrieval for efficient analytics.

• Proficient in ELT (Extract, Load, Transform) techniques for preparing canonical data and ensuring reliable and standardized data pipelines.

• Hands-on experience in designing and implementing machine learning workflows and models for automating data processing.

• Expertise in troubleshooting and debugging data pipelines, stored procedures, and machine learning models to ensure accurate and efficient data processing.

• Proficient in using tools such as AWS Glue, Kafka, Spark-Streaming, and DynamoDB for managing data pipelines and enhancing data accessibility.

• Strong knowledge of dimensional modeling, data lifecycle design, and data governance.

• Skilled in leveraging AWS Step Functions for orchestrating scalable, distributed workflows and improving ETL and data processing pipelines (a brief illustrative sketch follows this summary).
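
A minimal Python sketch of the kind of Step Functions orchestration described above, using boto3; the state machine ARN, execution name, and input payload are hypothetical placeholders rather than production values:

import json
import boto3

# Hypothetical state machine that runs an ETL flow (extract -> transform -> load).
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:etl-pipeline"

sfn = boto3.client("stepfunctions")

# Start one execution per source file; the input payload is illustrative only.
execution = sfn.start_execution(
    stateMachineArn=STATE_MACHINE_ARN,
    name="etl-run-2024-01-01",
    input=json.dumps({"source_key": "raw/orders/2024-01-01.csv"}),
)

# Check how the execution is progressing.
status = sfn.describe_execution(executionArn=execution["executionArn"])["status"]
print(status)  # RUNNING, SUCCEEDED, or FAILED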

EDUCATION:

• University of Central Missouri, MO, USA

Master of Science, Computer Science

Aug 2021 - Dec 2022

CGPA: 3.30

SKILLS:

• Programming Languages: Python, Java, C++, C#, Scala, PySpark

• Databases: MySQL, MongoDB, PostgreSQL, Oracle, DynamoDB

• Frameworks: Maven, Spring, React

• Cloud: Amazon Web Services (S3, EC2, Lambda, Redshift, EMR, Athena, Glue, DynamoDB, Step Functions)

• Machine Learning & AI: TensorFlow, Keras, SageMaker, PyTorch, XGBoost

• Big Data Tools: Hadoop, Spark, Kafka

• Workflow Tools: Apache Airflow, Azkaban, Luigi

• Data Management: ELT, Canonical Data Pipelines, Data Lifecycle Design, RDL, TDL

• Reporting Tools: Power BI, Tableau, SSRS

• Stream Processing: Spark-Streaming, Kinesis Firehose

• IDEs: IntelliJ, Eclipse, Visual Studio, Jupyter Notebooks

PROFESSIONAL EXPERIENCE:

Stellus Rx

Data Engineer

October 2024 - Present

• Built and managed automated ingestion pipelines for diverse file formats (.csv, .xlsx, .xml, .txt, .x12) from secure SFTP servers into Snowflake, PostgreSQL, and MySQL databases (see the sketch at the end of this section).

• Handled ingestion of sensitive healthcare data domains, including Adhere, Statins, and Medical Economics, ensuring accuracy, compliance, and secure transformation.

• Implemented RPA workflows using Robocorp and Apache Airflow to automate file discovery, parsing, and validation; transfer to AWS S3; and loading and transformation in Snowflake/PostgreSQL.

• Developed and maintained manifest-driven ingestion frameworks, streamlining onboarding of new data sources with schema mapping to canonical models.

• Utilized Cerberus for secure SFTP connection handling with built-in retry logic, error alerts, and session logging.

• Integrated OpenAI APIs to automatically document data pipelines and schema changes, generate summaries for transformation logic and ingestion status, detect anomalies in Adhere/Statins datasets using NLP techniques, and power natural-language search tools within internal data portals.

• Created Qlik dashboards for real-time monitoring and visualization of ingestion metrics, data quality KPIs, and pipeline status, using PostgreSQL and Snowflake as data sources.

• Built Tableau dashboards for executive reporting on Medical Economics data, integrating insights from S3, PostgreSQL, and Snowflake.

• Developed self-service data management portals using Retool, enabling users to monitor SFTP file ingestions, view schema history and logs, and use OpenAI-powered assistants to query ingestion processes in natural language.

• Designed and executed automated file removal protocols post-ingestion from SFTP, ensuring compliance with internal data retention and security policies.

• Orchestrated complex ETL workflows using AWS Glue, Step Functions, DynamoDB, and CloudWatch, ensuring scalable, resilient data pipelines with full observability.

• Created and maintained GraphQL APIs to expose metadata, ingestion logs, and schema information for cross-team collaboration and tooling.

• Collaborated with DevOps to implement CI/CD workflows using GitHub and Bitbucket for automated testing, deployment, and change tracking of data pipelines.

Tools & Environment: Snowflake, PostgreSQL, MySQL, Apache Airflow, Robocorp (RPA), AWS (S3, Glue, Step Functions, DynamoDB, CloudWatch), OpenAI GPT APIs, Qlik, Tableau, Retool, Cerberus, GitHub, Bitbucket, GraphQL, Python, Bash, SFTP; file formats: .csv, .xlsx, .xml, .txt, .x12 (Adhere, Statins, Medical Economics data).
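
A simplified Python sketch of the SFTP-to-S3 leg of the ingestion flow described above; the host, credentials, folder, and bucket names are placeholders, and the real pipelines add retry logic, validation, manifest lookups, and logging:

import boto3
import paramiko

# Placeholder connection details; real values come from a secrets store.
SFTP_HOST, SFTP_USER, SFTP_PASS = "sftp.example.com", "svc_ingest", "secret"
BUCKET = "example-raw-zone"

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(SFTP_HOST, username=SFTP_USER, password=SFTP_PASS)
sftp = ssh.open_sftp()
s3 = boto3.client("s3")

# Discover newly dropped files in the inbound folder and stage them in S3 by domain.
for name in sftp.listdir("/inbound/adhere"):
    if name.endswith((".csv", ".xlsx", ".xml", ".txt", ".x12")):
        local_path = f"/tmp/{name}"
        sftp.get(f"/inbound/adhere/{name}", local_path)
        s3.upload_file(local_path, BUCKET, f"adhere/{name}")

sftp.close()
ssh.close()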

Aspire Health

Data Warehouse Engineer

Sept 2023 - Sept 2024

• Configured and deployed Apache Airflow for managing workflows between S3 buckets and the Snowflake data warehouse, including the creation of directed acyclic graphs (DAGs); a minimal DAG sketch follows this section.

• Automated script execution and workflow using Apache Airflow and shell scripting, ensuring seamless daily operations in production.

• Conducted data cleaning, feature scaling, and feature engineering using Python libraries such as pandas and NumPy.

• Developed Python scripts to load CSV files into S3 buckets and automate file retention mapping.

• Prepared canonical data using ELT techniques for visualization and analytic modeling.

• Developed data pipelines to standardize incoming data, prepare it for analysis, and migrate between databases.

• Conducted rigorous debugging and testing of stored procedures, ensuring data integrity in ETL processes.

• Utilized AWS Glue jobs to archive data from Redshift tables to S3, adhering to data retention policies, and managed data layers in S3 and databases like Postgres.

• Integrated data from multiple source systems into Snowflake cloud data warehouse and AWS S3 buckets, including handling nested JSON formatted data.

• Loaded data into Snowflake tables from internal stages using SnowSQL and designed the data warehouse using Star/Snowflake schema concepts.

• Transferred Fact/Dimension and aggregate data from S3 to Redshift for historical analysis using tools like Tableau and QuickSight.

• Evaluated and improved approaches to data lifecycle design and Elasticsearch strategies.

• Leveraged machine learning models and generative AI techniques for predictive analytics and operational insights.

• Monitored database performance using SQL Server Profiler to identify and optimize slow-running queries and server activity.

• Utilized AWS DynamoDB for high availability, performance, and scalability in managing NoSQL data.

• Configured AWS CloudWatch to monitor and log production workflows, ensuring high reliability and prompt issue resolution.

• Designed serverless workflows using AWS Step Functions to improve automation and data processing across AWS services.

• Developed and implemented RDL for generating reports, as well as TDL for data transformation.

• Managed data retention mapping policies for data archiving and compliance.

Tools & Environment: Apache Airflow, AWS Glue, Snowflake, Pandas, NumPy, Python, RDL, TDL, SnowSQL, PostgreSQL, AWS S3, Athena, QuickSight, Lambda, Step Functions, Tableau, DynamoDB, Git, SQL Server.
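
A condensed Airflow DAG sketch for the daily S3-to-Snowflake loading described above; the connection details, stage, table, and file format are illustrative placeholders, and a production DAG would pull credentials from Airflow connections rather than hard-coding them:

from datetime import datetime

import snowflake.connector
from airflow import DAG
from airflow.operators.python import PythonOperator


def load_to_snowflake():
    # Placeholder credentials; real DAGs read these from Airflow connections or a vault.
    conn = snowflake.connector.connect(
        account="example_account", user="etl_user", password="secret",
        warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
    )
    try:
        # COPY from an internal stage into the target table (illustrative names).
        conn.cursor().execute(
            "COPY INTO RAW.ORDERS FROM @ORDERS_STAGE "
            "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
        )
    finally:
        conn.close()


with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)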

Warner Bros

Data Engineer

Aug 2022 - Aug 2023

• Responsible for provisioning essential AWS Cloud services and configuring them to ensure scalability, flexibility, and cost optimization.

• Developed VPCs and subnets, including both private and public configurations, along with NAT gateways across a multi-region, multi-zone infrastructure landscape to efficiently manage global operations.

• Managed AWS infrastructure using orchestration tools such as CloudFormation (CFT), Terraform, and Jenkins Pipeline.

• Automated the deployment of EC2 instances, S3 buckets, EFS, EBS volumes, IAM roles, snapshots, and Jenkins servers using Terraform scripts.

• Established cloud data stores within S3 storage, employing logical layers tailored for Raw, Curated, and Transformed data management.

• Implemented data ingestion modules utilizing AWS Glue to load data across various layers in S3, facilitating reporting through tools like Athena and Power BI.

• Developed Glue Jobs for technical data cleansing tasks such as deduplication, NULL value imputation, and redundant column removal (see the sketch at the end of this section).

• Leveraged Kafka for message queuing and stream processing of large datasets.

• Implemented Canonical data pipelines to enable large-scale machine learning and analytics.

• Generated reports using SSRS, incorporating calculated fields, parameters, and hierarchies.

• Transitioned reports from SSRS to Power BI and developed multiple reports in Power BI, including drill-down tables, financial tables, cross-tabs, and charts.

• Published Power BI Desktop reports from Report view to the Power BI service.

• Created complex SQL queries using stored procedures, CTEs, and temporary tables to support Power BI reports.

• Utilized AWS CloudWatch for monitoring application logs and optimizing performance.

• Utilized AWS Step Functions to coordinate microservices and optimize application workflows.

• Integrated data retention mapping into storage policies and lifecycle management.

Tools & Environment: AWS Glue, Athena, S3, EC2, Kafka, Terraform, Power BI, Jenkins, Python, SQL.
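
A plain-PySpark stand-in for the kind of Glue cleansing job described above; paths, column names, and default values are illustrative, and an actual Glue script would also set up the awsglue job context:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("curated-cleansing").getOrCreate()

# Read the raw layer (illustrative S3 path).
df = spark.read.parquet("s3://example-raw-zone/orders/")

# Deduplicate on the business key, impute NULLs, and drop a redundant column.
cleaned = (
    df.dropDuplicates(["order_id"])
      .fillna({"region": "UNKNOWN", "quantity": 0})
      .drop("legacy_flag")
)

# Write to the curated layer for downstream Athena / Power BI reporting.
cleaned.write.mode("overwrite").parquet("s3://example-curated-zone/orders/")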

Wipro

Data Analyst

Dec 2019 - June 2021

• Developed ETL pipelines to integrate and clean data using Glue jobs and Spark.

• Managed file movements between HDFS and AWS S3 using automated workflows.

• Implemented RPA to streamline repetitive tasks and improve efficiency in data processing.

• Developed analytical reports using Tableau and conducted advanced data blending.

• Utilized AWS Lambda services to automate daily EBS snapshot creation in the cloud (see the sketch at the end of this section).

• Successfully orchestrated the migration of an on-premises application to AWS, employing services like EC2 and S3 for small datasets.

• Maintained Hadoop clusters on AWS EMR.

• Designed dashboards and prototypes for diverse reports using Tableau Desktop.

• Leveraged AWS DynamoDB to handle unstructured and semi-structured data for improved reporting.

• Integrated file retention policies into data processing pipelines to ensure regulatory compliance.

Tools & Environment: ETL, AWS Lambda, S3, Spark, Kafka, EMR, Tableau, Python, SQL.
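
A minimal sketch of the snapshot-automation Lambda mentioned above, assuming volumes are selected by a hypothetical Backup=true tag and the function runs on a daily CloudWatch/EventBridge schedule:

import boto3

ec2 = boto3.client("ec2")


def lambda_handler(event, context):
    # Find volumes opted in to backups via an illustrative tag.
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
    )["Volumes"]

    snapshot_ids = []
    for volume in volumes:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description="Daily automated snapshot",
        )
        snapshot_ids.append(snapshot["SnapshotId"])

    return {"created": snapshot_ids}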

CERTIFICATIONS:

• AWS Certified Cloud Practitioner.


