Data Engineer with 3+ Years of Experience

Location: Marietta, GA 30062

Salary: 70000

Posted: April 30, 2026


Resume:

Bhavana Reddy Tadimarri - Data Engineer

GA, USA | +1-470-***-**** | *****************@*****.*** | LinkedIn

SUMMARY

Data Engineer with 3+ years of experience designing scalable data pipelines, real-time streaming systems, and distributed data platforms across the financial and healthcare domains. Skilled in building ETL workflows with Python, Spark, SQL, and modern cloud platforms (AWS, Azure, GCP). Experienced in developing ML-ready data pipelines, optimizing data architectures, and enabling analytics and machine learning workloads for data-driven decision-making.

SKILLS

• Languages & Scripting: Python (Pandas, NumPy), SQL (CTEs, Window Functions), Scala, Java, Bash

• Big Data & Streaming: Apache Spark (PySpark – Batch & Structured Streaming), Kafka, Hive, Hadoop (basic)

• ETL & Data Engineering: ETL/ELT Pipeline Design, AWS Glue, Azure Data Factory, dbt, Informatica, CDC (Change Data Capture), Incremental Processing

• Data Architecture & Storage: Data Lake, Lakehouse Architecture, Medallion Architecture, Delta Lake, Apache Iceberg, AWS S3, Azure Data Lake Storage (ADLS), Google Cloud Storage

• Data Warehousing & Modeling: Snowflake, AWS Redshift, Azure Synapse, Dimensional Modeling (Star Schema, Fact & Dimension Tables), ER Diagrams, Performance Tuning

• Cloud Platforms: AWS (S3, Redshift, Glue, Lambda, Kinesis, CloudWatch, IAM), Azure (Data Factory, Synapse, Databricks, ADLS), GCP (BigQuery, Dataflow, Pub/Sub, Cloud Storage)

• Workflow Orchestration: Apache Airflow, DAG Design, Cron Jobs

• DevOps & CI/CD: Git, GitHub, GitLab, Jenkins, Docker, Kubernetes, Terraform

• BI & Visualization: Power BI, Tableau, Looker, AWS QuickSight

• Development Tools: Linux/Unix, Jupyter Notebook, VS Code, Agile/Scrum

• AI / Machine Learning Data Engineering: ML Data Pipelines, Feature Engineering, MLOps, MLflow, Vector Databases (FAISS / Pinecone), Retrieval-Augmented Generation (RAG)

PROFESSIONAL EXPERIENCE

Capital One Financial, GA, USA

Data Engineer Jan 2025 – Present

• Designed a unified data ingestion framework using Apache Spark and AWS Glue to process 10M+ financial transactions daily, improving data freshness by 45%.

• Automated ETL orchestration with Airflow, enabling end-to-end visibility and reducing manual intervention by 70%.

• Engineered secure data pipelines integrating external API feeds, enabling near-real-time fraud analytics for risk management teams.

• Re-architected the SQL data warehouse with partitioned tables and optimized queries, cutting reporting latency from 9 hours to under 2 hours.

• Developed and deployed CI/CD workflows for data pipelines using Docker and Terraform, ensuring reliable infrastructure updates.

• Developed ML-ready data pipelines with PySpark and Airflow to automate feature engineering and deliver high-quality datasets for downstream predictive analytics models.

CitiusTech, India

Data Engineer Aug 2022 – Aug 2023

• Built scalable ETL pipelines in Azure Data Factory to ingest data from 25+ healthcare sources into centralized data lake storage.

• Integrated Spark-based processing to transform HIPAA-compliant patient records, improving downstream ML data readiness by 35%.

• Automated schema validation and data quality checks using PySpark and Delta Lake, ensuring 99.9% data accuracy in production pipelines.

• Collaborated with analysts and clinicians to develop real-time dashboards in Power BI that provided instant insights into patient risk trends.

• Designed incremental load mechanisms to reduce data duplication, saving 20% on cloud storage and compute costs.

• Prepared ML-ready datasets by performing data cleansing, normalization, and feature engineering for downstream predictive analytics models.

CitiusTech, India

Associate Data Engineer Feb 2021 – Jul 2022

• Developed Python-based ETL scripts to extract, cleanse, and load healthcare data from relational databases and flat files into Azure SQL.

• Optimized Spark batch jobs processing 50+ GB of unstructured data, reducing pipeline runtime by 40%.

• Implemented version-controlled data workflows with Git and Docker, enabling reproducible and modular development environments.

• Created stored procedures and materialized views to support analytics and reporting teams with curated datasets.

• Implemented data validation and transformation logic in Python and SQL to ensure accuracy and consistency of healthcare datasets.

• Ensured data governance compliance by implementing audit trails, lineage tracking, and secure access policies across the data stack.

EDUCATION

Master of Science in Computer Science May 2025

Kennesaw State University, GA, USA

Bachelor of Technology in Computer Science Engineering May 2022

Amrita Vishwa Vidyapeetham, India

CERTIFICATES

• Google Data Analytics Professional Certificate – Coursera

PROJECTS

Real-Time Transportation Analytics with Kafka and Spark

• Designed a real-time streaming pipeline using Kafka and Spark Structured Streaming to process GPS data and detect traffic congestion, improving forecasting accuracy by 22%.

• Containerized the pipeline with Docker and stored processed metrics in PostgreSQL, enabling reproducible deployments and scalable querying for predictive transportation analytics applications.

Financial Risk Asset Management & Investment Analytics (Piper Jaffray, Goldman Sachs)

• Built Python- and SQL-based data pipelines to ingest, clean, and transform 15+ years (2003–2018) of historical financial market data for risk and return analysis across equities and fixed-income assets.

• Implemented time-series transformations and statistical computations to derive key risk metrics (average returns, volatility, variance), enabling classification of high-risk and low-risk investment opportunities.

• Created analytics-ready datasets and BI dashboards to visualize portfolio performance, risk exposure, and long-term return trends for client advisory teams.
