
Data Engineer Machine Learning

Location:
United States
Salary:
75000
Posted:
June 25, 2025


Resume:

Naga Tulluri

Data Engineer

Charlotte, NC ******.********@*****.*** +1-251-***-****

Summary

Data Engineer with 5+ years of experience designing, building, and optimizing large-scale data pipelines and cloud-based data solutions. Proven expertise in developing end-to-end ETL workflows with Azure Data Factory, Databricks, Apache Spark, and Airflow across the healthcare, finance, and enterprise domains. Adept at leveraging cloud platforms such as AWS, Azure, and GCP to drive performance, scalability, and cost-efficiency. Skilled in data modeling, real-time streaming (Kafka, Flink), and advanced analytics, with a strong command of SQL, Python, PySpark, and BI tools such as Power BI and Tableau. Experienced in implementing robust data governance, security, and compliance frameworks (HIPAA, GDPR). A proactive team player with a strong foundation in machine learning, dashboard development, and DevOps practices.

Skills

• Methodologies: Agile, Scrum, Waterfall, Kanban

• Programming Languages: Python, SQL, PySpark, C, Scala

• Big Data Technologies: Apache Spark, Hadoop, Kafka, AWS Kinesis, AWS EC2, AWS S3

• BI Tools: Tableau, Power BI

• Data Warehousing: Amazon Redshift, Google BigQuery, Azure Data Factory, Azure Databricks, Azure Synapse, Snowflake

• Database Management: MySQL, PostgreSQL, HBase, Cosmos DB, Snowflake, MongoDB

• ETL Tools: Apache Airflow, Talend, Informatica

• Cloud Platforms: AWS, AWS Glue, AWS Redshift, AWS Lambda, Google Cloud, AI infrastructure, dbt

• Streaming Analytics: Apache Kafka, Apache Flink

• Containerization and Orchestration: Docker, Kubernetes

• Data Skills: Visualization, Data Modeling, Data Normalization, Data Warehousing, Data Mining, Data Analysis, Statistics

• Machine Learning: Scikit-Learn, TensorFlow, Keras

• Operating Systems: Windows, Linux, Android

• Frameworks: Pandas, NumPy, Dask

Experience

Northern Trust, USA Oct 2024 - Present

Data Engineer

● Designed and deployed scalable ETL pipelines using Azure Data Factory, Databricks, and Azure Synapse, reducing data processing time by 30%.

● Integrated REST APIs, SQL databases, and Blob Storage to enable seamless data ingestion and transformation workflows.

● Enabled hybrid cloud data movement by integrating SSIS with Azure Data Factory and optimizing real-time streaming using Apache Kafka and Azure Event Hubs, reducing latency by 25%.

● Orchestrated dbt transformations using Airflow, Prefect, and Dagster; automated version-controlled deployments with dbt Cloud, the dbt CLI, and Terraform.
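
A minimal sketch of this orchestration pattern, assuming an Airflow 2.4+ deployment with the dbt CLI available on the worker; the DAG id, schedule, and project path below are hypothetical placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily_build",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Build the models, then run the test suite against the fresh tables
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/analytics",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/analytics",
    )
    dbt_run >> dbt_test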

● Built modular transformation logic in PySpark and Python (Pandas, Seaborn) for schema mapping, anomaly detection, and missing data imputation.
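
A minimal pandas sketch of that cleansing logic; the source-to-target column map, median imputation, and 3-sigma anomaly flag are hypothetical stand-ins for the production rules:

import pandas as pd

COLUMN_MAP = {"txn_amt": "amount", "txn_ts": "timestamp"}  # source -> target schema

def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.rename(columns=COLUMN_MAP)                         # schema mapping
    df["amount"] = df["amount"].fillna(df["amount"].median())  # impute missing values
    z = (df["amount"] - df["amount"].mean()) / df["amount"].std()
    df["is_anomaly"] = z.abs() > 3                             # flag 3-sigma outliers
    return df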

● Implemented the Medallion Architecture (Bronze, Silver, Gold) in Azure Databricks to support structured, scalable data pipelines.
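
A minimal PySpark sketch of that layering on Delta Lake; the mount paths and column names are hypothetical placeholders:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw files as-is so loads can be replayed
spark.read.json("/mnt/raw/trades/") \
     .write.format("delta").mode("append").save("/mnt/bronze/trades")

# Silver: deduplicate and conform types
bronze = spark.read.format("delta").load("/mnt/bronze/trades")
silver = (bronze.dropDuplicates(["trade_id"])
                .withColumn("trade_ts", F.to_timestamp("trade_ts")))
silver.write.format("delta").mode("overwrite").save("/mnt/silver/trades")

# Gold: business-level aggregates for reporting
gold = silver.groupBy("desk").agg(F.sum("notional").alias("total_notional"))
gold.write.format("delta").mode("overwrite").save("/mnt/gold/desk_totals")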

● Engineered scalable analytics pipelines in BigQuery, leveraging partitioning, clustering, and federated queries for high-performance reporting.
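
A minimal sketch of such a table definition through the BigQuery Python client; the dataset, table, and column names are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()
client.query("""
    CREATE TABLE IF NOT EXISTS analytics.events (
        event_ts TIMESTAMP,
        user_id STRING,
        event_type STRING
    )
    PARTITION BY DATE(event_ts)      -- prune scans to the queried dates
    CLUSTER BY user_id, event_type   -- co-locate rows for selective filters
""").result()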

● Developed and optimized Kafka consumer groups and topics for high-throughput ingestion; applied PySpark transformations for real-time trade data cleansing and enrichment.
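
A minimal Spark Structured Streaming sketch of that consumer path; the broker, topic, message schema, and output paths are hypothetical:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()
schema = (StructType()
          .add("trade_id", StringType())
          .add("symbol", StringType())
          .add("price", DoubleType()))

trades = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "trades")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
          .select("t.*")
          .filter(F.col("trade_id").isNotNull() & (F.col("price") > 0)))  # cleanse

(trades.writeStream.format("delta")
       .option("checkpointLocation", "/mnt/chk/trades")  # recovery bookkeeping
       .start("/mnt/silver/trades"))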

● Automated and monitored ETL workflows across Azure Data Factory and Databricks, integrating Azure Monitor and Log Analytics for resilience and failure tracking.

● Created dynamic Power BI dashboards using DAX and Power Query, improving decision-making efficiency by 25% across business units.

● Conducted robust data validation using SQL and PySpark, including outlier detection, histogram analysis, and schema enforcement to maintain data integrity.
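
A minimal PySpark sketch of one such validation pass, combining schema enforcement on read with an IQR outlier count; the staging path, columns, and thresholds are hypothetical:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()
expected = StructType().add("claim_id", StringType()).add("amount", DoubleType())

# FAILFAST aborts the read on any row that violates the expected schema
df = spark.read.schema(expected).option("mode", "FAILFAST").csv("/mnt/staging/claims")

q1, q3 = df.approxQuantile("amount", [0.25, 0.75], 0.01)
iqr = q3 - q1
outliers = df.filter((F.col("amount") < q1 - 1.5 * iqr) |
                     (F.col("amount") > q3 + 1.5 * iqr))
print(f"outlier rows: {outliers.count()}")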

● Designed dimensional data models (Star Schema, Snowflake Schema) and optimized performance using Delta Lake features such as Z-Ordering, Bloom Filters, and Data Skipping.
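
A minimal sketch of those layout optimizations as issued from a Databricks notebook (BLOOMFILTER indexes are a Databricks-specific Delta feature); the table and column names are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Co-locate related rows so data skipping can prune files on trade_id filters
spark.sql("OPTIMIZE silver.trades ZORDER BY (trade_id)")

# Bloom filter index accelerates selective lookups on a high-cardinality key
spark.sql("""
    CREATE BLOOMFILTER INDEX ON TABLE silver.trades
    FOR COLUMNS (trade_id OPTIONS (fpp = 0.1, numItems = 50000000))
""")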

● Optimized storage and retrieval of large datasets using GCP Cloud Storage, implementing lifecycle policies and cold storage to reduce cost.

CitiusTech, India Aug 2021 - Dec 2023

Data Engineer

● Built event-driven ingestion pipelines using Event Streams, Real-Time Analytics, and Azure Event Hubs, integrating data from IoT devices, EHR systems, and wearables, reducing ingestion latency by 20%.

● Automated data validation and reconciliation workflows to ensure compliance with GDPR and HIPAA, significantly enhancing data accuracy and audit readiness.

● Engineered and optimized dimensional data models (Star Schema, Snowflake Schema, Galaxy Schema) for efficient querying across clinical, operational, and financial datasets.

● Collaborated with providers and data scientists to deploy predictive analytics models for risk scoring, readmission prediction, and treatment optimization.

● Migrated legacy healthcare data systems to Azure and AWS, increasing scalability and reducing infrastructure costs by 15%.

● Built ETL pipelines to ingest data into Cassandra from Azure Data Lake, AWS S3, and Hadoop using PySpark and DataStax Bulk Loader.
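
A minimal sketch of the PySpark leg of that load, assuming the DataStax Spark Cassandra Connector is on the classpath; the connector version, host, keyspace, table, and source path are hypothetical:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .getOrCreate())

records = spark.read.parquet("s3a://clinical-landing/records/")  # or ADLS / HDFS
(records.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace="health", table="patient_records")
        .mode("append")
        .save())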

● Designed and developed scalable ETL pipelines in Microsoft Fabric using Dataflows, Data Pipelines, and Notebooks to process large-scale patient records, claims, and clinical datasets, ensuring HIPAA compliance and robust data security.

Mphasis, India Jun 2019 - Jul 2021

Data Engineer

● Designed and optimized SQL queries and Delta tables using Z-Ordering, partitioning, and caching, resulting in a 40% improvement in query performance.

● Built star and snowflake schema data models in Microsoft Fabric (Warehouse & Lakehouse) to support advanced BI reporting and analytics.

● Developed and maintained ETL workflows using SSIS, Talend, and Google Dataflow, reducing processing time by 35% and ensuring consistency across SQL, APIs, and flat file sources.

● Automated data transformation and cleansing with Python (Pandas, NumPy) and Power Query, reducing manual workload by 25% and improving data accuracy.

● Applied Infrastructure-as-Code (IaC) using Terraform for GCP resource provisioning, improving environment consistency and rollback capabilities.

● Documented dbt models and lineage using dbt Docs, improving team collaboration and transparency in transformation logic.

● Designed normalized and denormalized relational schemas and implemented stored procedures, user-defined functions (UDFs), and triggers, improving scalability and maintainability by 22%.

● Improved Cassandra backup and recovery strategies using nodetool snapshots and incremental backups, ensuring high data availability.

● Built scalable BigQuery schemas for sales data and developed batch and streaming pipelines using Google Dataflow, integrating ML forecasting models into Looker dashboards for real-time insights.
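
A minimal Apache Beam sketch of the streaming leg, assuming a Pub/Sub source and execution on the DataflowRunner; the topic, table, and field schema are hypothetical:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadSales" >> beam.io.ReadFromPubSub(topic="projects/demo/topics/sales")
     | "Parse" >> beam.Map(json.loads)
     | "WriteBQ" >> beam.io.WriteToBigQuery(
           "demo:sales.orders",
           schema="order_id:STRING,amount:FLOAT,ts:TIMESTAMP",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))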

● Conducted statistical analysis and time-series forecasting in R, enhancing demand prediction accuracy by 10% and optimizing inventory planning.

● Collaborated with cross-functional teams to define KPIs and build real-time dashboards, improving executive decision-making efficiency by 10%.

● Monitored GCP workloads using Cloud Monitoring (Stackdriver), setting up custom dashboards and alerts for proactive system management.

Certifications

• AWS Cloud Technical Essentials

• PwC Switzerland - Power BI Job Simulation

• Tableau Desktop Specialist

• Accenture North America - Data Analytics and Visualization Job Simulation

• Data Analysis

• Data Analytics Essentials

• Power BI Bootcamp

• SQL (Advanced)

Education

Master's in Management Information Systems Jan 2024 - Dec 2024
Auburn University at Montgomery, AL, USA

Bachelor's in Electronics and Communication Engineering Jun 2017 - May 2021
Vignan’s Foundation for Science, Technology & Research, India


