
Data Engineer (Real-Time)

Location:
Charlotte, NC
Salary:
$85,000
Posted:
September 10, 2025


Resume:

ANUPAMA A

******.*******@*****.*** +1-201-***-****

LinkedIn https://www.linkedin.com/in/anupama--reddy

PROFESSIONAL SUMMARY

Results-driven Data Engineer with 4+ years of experience designing, developing, and optimizing scalable data solutions across financial services, healthcare, and regulatory domains. Proven expertise in ETL/ELT pipelines, big data frameworks, real-time streaming, and cloud-native platforms (AWS, Azure, GCP). Strong background in fraud analytics, regulatory reporting, and predictive modeling, with hands-on skills in:

- Technologies: Python, SQL, DBT, IBM DataStage, Apache Kafka, Spark, Snowflake

- Platforms: AWS, Azure, GCP, Hadoop, Docker, Kubernetes

- Use Cases: Fraud detection, real-time analytics, data lake architecture, compliance (GDPR, CCPA)

PROFESSIONAL EXPERIENCE

Bank of America — Data Engineer

Charlotte, NC | Sep 2024 – Present

Built scalable ETL pipelines using IBM DataStage and DBT, integrating multi-source financial crime data.

Developed microservices in Golang to process fraud patterns and push results to internal APIs and dashboards.

Designed Snowflake data models with partitioning & clustering; boosted audit query performance by 30%.
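
A minimal sketch of the kind of clustering setup this describes, using snowflake-connector-python; the account, table, and column names are hypothetical placeholders, not taken from this role:

```python
# Illustrative only: connection parameters, table, and columns are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="...",
    warehouse="AUDIT_WH",
    database="FRAUD_DB",
    schema="AUDIT",
)
cur = conn.cursor()

# Cluster the audit fact table on the columns audit queries filter by most,
# so Snowflake prunes micro-partitions instead of scanning the whole table.
cur.execute("""
    ALTER TABLE AUDIT.TRANSACTIONS
    CLUSTER BY (TO_DATE(EVENT_TS), ACCOUNT_ID)
""")

# Inspect how well the table is clustered on those keys.
cur.execute(
    "SELECT SYSTEM$CLUSTERING_INFORMATION("
    "'AUDIT.TRANSACTIONS', '(TO_DATE(EVENT_TS), ACCOUNT_ID)')"
)
print(cur.fetchone()[0])

cur.close()
conn.close()
```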

Integrated Kafka, Spark Streaming, and Snowflake to enable real-time fraud alerting (seconds instead of minutes).
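
A minimal sketch of such a streaming path, reading transactions from Kafka with PySpark Structured Streaming and landing alerts in Snowflake per micro-batch; the topic, schema, filter rule, and connection options are illustrative assumptions:

```python
# Illustrative sketch: topic, schema, and connection options are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("fraud-alerting").getOrCreate()

schema = (StructType()
          .add("txn_id", StringType())
          .add("account_id", StringType())
          .add("amount", DoubleType())
          .add("event_ts", TimestampType()))

# Read raw transaction events from Kafka as they arrive.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "transactions")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Naive threshold rule standing in for the real scoring logic.
alerts = events.filter(col("amount") > 10_000)

def write_to_snowflake(batch_df, batch_id):
    # Micro-batch write via the Spark-Snowflake connector.
    (batch_df.write
     .format("net.snowflake.spark.snowflake")
     .options(sfURL="account.snowflakecomputing.com",  # hypothetical options
              sfUser="etl_user", sfPassword="...",
              sfDatabase="FRAUD_DB", sfSchema="ALERTS",
              sfWarehouse="STREAM_WH")
     .option("dbtable", "FRAUD_ALERTS")
     .mode("append")
     .save())

(alerts.writeStream
 .foreachBatch(write_to_snowflake)
 .option("checkpointLocation", "/chk/fraud-alerts")
 .start()
 .awaitTermination())
```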

Designed and implemented Delta Lake architecture within the lakehouse model, enabling ACID transactions, schema evolution, and improved audit logging.
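
A short sketch of the two Delta Lake behaviors named here, schema evolution on append and an atomic MERGE upsert; the paths and join key are hypothetical:

```python
# Illustrative sketch; paths and keys are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("audit-lakehouse")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

updates = spark.read.parquet("/raw/audit_events/2024-09-10")

# Schema evolution: new columns in the incoming batch are added to the table
# instead of failing the write.
(updates.write.format("delta")
 .mode("append")
 .option("mergeSchema", "true")
 .save("/lake/audit_events_staging"))

# ACID upsert into the curated table: one atomic MERGE per batch.
target = DeltaTable.forPath(spark, "/lake/audit_events")
(target.alias("t")
 .merge(updates.alias("u"), "t.event_id = u.event_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```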

Used JavaScript (with Plotly.js and custom UI components) to build a fraud analytics dashboard for internal audit teams.

Delivered fraud models with XGBoost and LightGBM (85%+ detection accuracy); deployed via CI/CD with GitHub Actions and Terraform.
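
A minimal sketch of training such a gradient-boosted fraud classifier with XGBoost; the feature names and input file are hypothetical, and the class-imbalance handling shown is one common choice rather than the exact setup used here:

```python
# Illustrative sketch; feature names and data source are hypothetical.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

df = pd.read_parquet("transactions_labeled.parquet")
features = ["amount", "merchant_risk", "velocity_1h", "geo_mismatch"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["is_fraud"], test_size=0.2, stratify=df["is_fraud"])

# scale_pos_weight offsets the heavy class imbalance typical of fraud data.
model = xgb.XGBClassifier(
    n_estimators=400,
    max_depth=6,
    learning_rate=0.05,
    scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),
)
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]
print("PR-AUC:", average_precision_score(y_test, scores))
```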

Participated in Agile/SCRUM ceremonies, including sprint planning, retrospectives, and daily stand-ups, contributing to iterative delivery and backlog grooming.

Replaced legacy Jenkins deployment with GitLab CI/CD, reducing deployment errors and improving rollback safety.

Visualized compliance KPIs via Tableau, Power BI, and Plotly for 10+ internal teams.

Integrated metadata tracking and secure access to PII data via Snowflake's governance tools and AWS IAM policies.
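
One concrete form such controls can take is a Snowflake column masking policy; in this hypothetical sketch, only an audit role sees the raw PII value:

```python
# Illustrative only: role, policy, table, and column names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="admin", password="...")
cur = conn.cursor()

# Column-level masking: non-audit roles see a redacted value.
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS FRAUD_DB.GOVERNANCE.MASK_SSN
    AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('AUDIT_ADMIN') THEN val
           ELSE '***MASKED***' END
""")
cur.execute("""
    ALTER TABLE FRAUD_DB.CUSTOMERS.PROFILE
      MODIFY COLUMN ssn SET MASKING POLICY FRAUD_DB.GOVERNANCE.MASK_SSN
""")

cur.close()
conn.close()
```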

Acted as technical mentor, onboarding two junior engineers and conducting regular peer code reviews and design walkthroughs.

UnitedHealth Group (UHG) — Data Engineer

Phoenix, AZ | Aug 2022 – Aug 2024

Orchestrated large-scale ETL pipelines for healthcare data (1+ TB/day) using DataStage, Informatica, and Talend.

Used DynamoDB for fast-access lookups of session and reference data tied to patient IDs and event tracking.
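
A minimal sketch of that kind of point lookup with boto3; the table, region, and key names are hypothetical:

```python
# Illustrative sketch; table, region, and key names are hypothetical.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
sessions = dynamodb.Table("patient_sessions")

def lookup_session(patient_id: str, event_id: str) -> dict | None:
    """Point read keyed on patient and event; returns None if no match."""
    resp = sessions.get_item(Key={"patient_id": patient_id, "event_id": event_id})
    return resp.get("Item")

item = lookup_session("P-10234", "EV-88071")
```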

Built secure API endpoints in C# (.NET Core) for querying patient risk scores and treatment gaps across care systems.

Developed Spark-based jobs using Scala in Databricks Notebooks to process complex patient claims, enabling predictive risk modeling at scale.

Developed fraud detection models using AWS SageMaker, achieving 95%+ anomaly detection precision.

Modeled data using Star Schema, Snowflake Schema, and Data Vault for OLAP/OLTP workloads.

Delivered compliance dashboards in Power BI, pulling data from both Snowflake and NoSQL backends for unified insights.

Automated ETL validation with Jenkins, Python, and GitHub Actions (reduced QA time by 30%).
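
A sketch of the kind of post-load checks a Jenkins or GitHub Actions stage might run; the column names and rules are illustrative assumptions:

```python
# Illustrative post-load validation; column names and rules are hypothetical.
import sys
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the load passed."""
    failures = []
    if len(df) == 0:
        failures.append("load produced zero rows")
    if df["claim_id"].isna().any():
        failures.append("null claim_id values found")
    if df["claim_id"].duplicated().any():
        failures.append("duplicate claim_id values found")
    if (df["amount"] < 0).any():
        failures.append("negative claim amounts found")
    return failures

if __name__ == "__main__":
    df = pd.read_parquet(sys.argv[1])   # path passed in by the CI job
    problems = validate(df)
    for p in problems:
        print("FAIL:", p)
    sys.exit(1 if problems else 0)      # non-zero exit fails the pipeline stage
```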

Automated infrastructure validation and deployment using GitHub Actions, ensuring consistent release management.

Leveraged AWS Redshift and Snowflake for risk analytics with 20% lower latency in fraud data queries.

Infosys — Data Engineer

Hyderabad, India | Apr 2021 – Apr 2022

Built ingestion pipelines for 5M+ daily financial transactions using Kafka, Hive, and Snowflake.
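
A minimal sketch of a transaction producer using kafka-python; the broker address, topic, and record shape are hypothetical:

```python
# Illustrative producer; broker, topic, and record shape are hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",       # don't ack until all replicas have the record
    linger_ms=20,     # small batching window for throughput
)

txn = {"txn_id": "T-991", "account_id": "A-42", "amount": 129.50}
# Key by account so one account's events land in the same partition,
# preserving per-account ordering for downstream consumers.
producer.send("transactions", key=txn["account_id"].encode(), value=txn)
producer.flush()
```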

Collaborated with software engineers and DevOps teams to optimize CI/CD deployments of data pipelines, ensuring alignment with Azure-native PaaS services.

Migrated 20+ legacy systems using Talend, Apache NiFi, and custom Python scripts.

Developed ML models using Scikit-learn, achieving 90%+ fraud prediction accuracy.

Constructed a hybrid data lake on AWS S3 and Azure Synapse for scalable access.

Built end-to-end Azure Data Factory (ADF) pipelines to ingest and transform healthcare data into a centralized Delta Lake on Azure Data Lake Storage Gen2, supporting both batch and near-real-time use cases.

Automated infrastructure with Terraform and Ansible; enforced GDPR-compliant data policies.

KEY PROJECT

Architected a fraud detection platform with IBM DataStage, Kafka, Spark Streaming, and Snowflake.

Built ML pipelines in PySpark with XGBoost, enabling real-time anomaly detection (30% fewer false positives).

Delivered fraud dashboards in Tableau, Looker, and Power BI for compliance visibility.

Ensured regulatory compliance (GDPR, CCPA) through metadata management, lineage tracking, and secure PII handling.

Deployed on Docker + Kubernetes across AWS, Azure, and GCP for high availability and scalability.

EDUCATION

Cleveland State University, Cleveland, OH

Master of Science in Information Systems – Data Science | May 2022 – Dec 2023

St. Martin's Engineering College

Bachelor of Technology in Computer Science | Jun 2017 – Jul 2021

TECHNICAL SKILLS

- Languages: Python, SQL, Bash, Java, PySpark

- Databases: Snowflake, SQL Server, MySQL, PostgreSQL, MongoDB, Cassandra, Redshift

- ETL & Orchestration: DBT, DataStage, Talend, Informatica, Airflow, NiFi, Glue, ADF

- Big Data & Streaming: Apache Spark, Hadoop, Kafka, Flume, Spark Streaming

- Cloud: AWS (S3, Lambda, EC2), Azure (Synapse, Data Lake), GCP

- Visualization: Tableau, Power BI, Looker, Plotly

- ML/AI: XGBoost, LightGBM, Scikit-learn, TensorFlow, A/B Testing

- DevOps & Infra: Docker, Kubernetes, Jenkins, Terraform, GitHub Actions, Ansible

- Compliance: GDPR, CCPA, Metadata Management, PII Handling


