
Data Engineer Fraud Detection

Location:
Jersey City, NJ
Posted:
October 15, 2025


Rahul Kanth P.

Jersey City, NJ *****.*@**********.*** 929-***-**** LinkedIn

SUMMARY

Data Engineer with 5 years of experience building batch and streaming pipelines in Python, PySpark, and SQL, delivering analytics-ready data across healthcare, finance, and compliance domains. Developed real-time processing layers with Kafka, Kinesis, and Airflow, powering fraud detection and trade surveillance at scale. Helped uncover $12.5M in claim anomalies by integrating business rules directly into AWS-based data flows. Skilled in designing query-optimized datasets for BI teams using Redshift, Athena, and Snowflake.

EXPERIENCE

BlackRock Data Engineer - Marketing Analytics Pipelines Aug 2024 – Present
Project: Investment Data Platform Modernization (Aladdin Ecosystem)

• Redesigned batch ETL workflows that processed daily holdings and transaction data from over 40 custodians; rewrote legacy logic in PySpark and SQL, bringing pipeline run-time down from 5 hours to under 90 minutes, enabling same-day reconciliation.

• Worked closely with portfolio analytics and risk teams to standardize ingestion of ESG data feeds from 6 third-party vendors; built validation layers and alerting logic that cut data quality issues by 70% and reduced downstream model drift incidents.
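A validation layer of the kind described above can be sketched in plain Python; the field names, schema, and score range here are purely illustrative (the production checks ran inside the ingestion pipeline, not as a standalone script):

```python
REQUIRED_FIELDS = {"isin", "vendor", "esg_score"}  # hypothetical vendor-feed schema


def validate_record(record):
    """Return a list of data-quality issues for one vendor ESG record."""
    issues = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        issues.append(f"missing fields: {sorted(missing)}")
    score = record.get("esg_score")
    if score is not None and not 0 <= score <= 100:  # assumed valid range
        issues.append(f"esg_score out of range: {score}")
    return issues


def validate_feed(records):
    """Split a feed into clean rows and (row, issues) pairs for alerting."""
    clean, rejected = [], []
    for r in records:
        issues = validate_record(r)
        if issues:
            rejected.append((r, issues))  # routed to alerting downstream
        else:
            clean.append(r)
    return clean, rejected
```

Rejected rows carry their issue list, so an alerting step can report why a record failed rather than just that it did.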

• Built a modular ingestion framework using Airflow and AWS Lambda to onboard new fixed-income benchmarks in under 2 weeks instead of 6; helped BlackRock meet client reporting deadlines for 3 sovereign wealth fund mandates.

• Optimized S3-to-Redshift sync by partitioning historical AUM datasets based on region and instrument type; reduced monthly storage costs by $6K and query latencies by over 40% for the internal fund strategy team.
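The partitioning scheme above follows the Hive-style `key=value` prefix layout, which lets engines prune partitions that a query's filters exclude; a minimal sketch of the path convention (bucket name and field names are hypothetical):

```python
def partition_prefix(record, base="s3://example-bucket/aum"):
    """Build a Hive-style partition prefix from region and instrument type,
    so queries filtering on these columns scan only matching S3 prefixes."""
    return f"{base}/region={record['region']}/instrument_type={record['instrument_type']}"


row = {"region": "emea", "instrument_type": "equity", "aum": 1_250_000}
print(partition_prefix(row))
# s3://example-bucket/aum/region=emea/instrument_type=equity
```

Because objects land under these prefixes, a query filtering on `region` and `instrument_type` never touches the other partitions, which is where both the cost and latency savings come from.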

• Teamed up with data governance and legal teams to embed field-level lineage and masking into core pipeline layers, ensuring GDPR compliance for 15+ million client records without slowing daily analytics jobs.

Wipro Project Engineer (Data Engineer) Sep 2020 - Nov 2022
Project: Pharmacy Claims Analytics & Fraud Detection Platform

• Built scalable ETL pipelines in PySpark on AWS Glue to ingest over 9M monthly pharmacy claims from PBM, Medicare Part D, and commercial sources; normalized schema and applied payer-specific logic for downstream analysis.

• Designed fraud detection logic using rule-based models and temporal pattern mining to flag anomalies in provider billing, refill frequency, and duplicate NDC usage; surfaced $12.5M in potential overpayments to the SIU team.
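A refill-frequency rule of the kind described above can be sketched in plain Python; the field names (`member_id`, `ndc`, `fill_date`) and the 14-day window are illustrative stand-ins for the production PySpark logic:

```python
from datetime import date, timedelta


def flag_duplicate_refills(claims, window_days=14):
    """Flag claims where the same member refills the same NDC within
    `window_days` of a prior fill -- a naive duplicate-refill heuristic."""
    last_fill = {}  # (member_id, ndc) -> date of most recent fill
    flagged = []
    for claim in sorted(claims, key=lambda c: c["fill_date"]):
        key = (claim["member_id"], claim["ndc"])
        prev = last_fill.get(key)
        if prev is not None and claim["fill_date"] - prev <= timedelta(days=window_days):
            flagged.append(claim["claim_id"])
        last_fill[key] = claim["fill_date"]
    return flagged


claims = [
    {"claim_id": "C1", "member_id": "M1", "ndc": "0002-1433", "fill_date": date(2022, 1, 1)},
    {"claim_id": "C2", "member_id": "M1", "ndc": "0002-1433", "fill_date": date(2022, 1, 5)},
    {"claim_id": "C3", "member_id": "M1", "ndc": "0002-1433", "fill_date": date(2022, 3, 1)},
]
print(flag_duplicate_refills(claims))  # ['C2'] -- refilled 4 days after C1
```

In a distributed setting the same pattern is typically expressed with a window function partitioned by member and NDC, ordered by fill date.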

• Created secure, partitioned data lakes on Amazon S3 with AWS Lake Formation, applying field-level masking for PHI and NPI attributes across 9 payer-facing dashboards and audit trails to meet HIPAA compliance.

• Modeled dimension tables in Amazon Redshift, enabling fast queries for KPIs like claim denial rates, adjudication lag, and reimbursement turnaround across 4 claim processing engines.

• Automated Apache Airflow DAGs to refresh fraud scoring flags hourly, track task dependencies, and send failure alerts via Slack and SNS; reduced lag in SIU investigations by 42%.

• Migrated legacy on-prem SQL Server ETL workflows to EMR-based Spark jobs, using broadcast joins and Parquet compression to cut processing costs by $7.8K/month and reduce job runtime by 60%.

Vivma Software Inc Data Engineer Jan 2019 - Aug 2020
Project: Real-Time Trade Surveillance & Compliance Analytics

• Configured Kafka topics and set up AWS Kinesis data streams to capture trade events from upstream systems; enabled real-time ingestion of equities and options data for compliance monitoring.

• Created Python rule logic to detect patterns like spoofing and wash trades, embedding them into modular detection scripts; improved alert precision and reduced noise for the compliance queue.
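Detection logic of this kind can be sketched in plain Python; this toy rule flags a naive wash-trade pattern (one account taking both sides of the same symbol within a short window). The field names and 60-second threshold are illustrative, not the production rules:

```python
from datetime import datetime, timedelta


def flag_wash_trades(trades, window=timedelta(seconds=60)):
    """Flag trade pairs where one account buys and sells the same symbol
    within `window` -- a simplified wash-trade heuristic."""
    alerts = []
    ordered = sorted(trades, key=lambda t: t["ts"])
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b["ts"] - a["ts"] > window:
                break  # later trades are even further out; stop scanning
            if (a["account"] == b["account"]
                    and a["symbol"] == b["symbol"]
                    and a["side"] != b["side"]):
                alerts.append((a["trade_id"], b["trade_id"]))
    return alerts


trades = [
    {"trade_id": "T1", "account": "A1", "symbol": "XYZ", "side": "BUY",
     "ts": datetime(2020, 3, 2, 10, 0, 0)},
    {"trade_id": "T2", "account": "A1", "symbol": "XYZ", "side": "SELL",
     "ts": datetime(2020, 3, 2, 10, 0, 30)},
    {"trade_id": "T3", "account": "A2", "symbol": "XYZ", "side": "SELL",
     "ts": datetime(2020, 3, 2, 10, 0, 45)},
]
print(flag_wash_trades(trades))  # [('T1', 'T2')]
```

Keeping each rule as a small pure function like this is what makes the detection scripts modular: rules can be unit-tested in isolation and tuned independently to trade off alert precision against noise.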

• Designed a partitioned S3 data lake integrated with Athena, allowing teams to query historical trade snapshots on demand; helped shrink audit review cycles from hours to under 30 minutes.

• Authored Airflow DAGs to stitch together trade events, flagged alerts, and user metadata, loading into Redshift for daily dashboards used by 2 compliance regions.

TECHNICAL SKILLS

Data Engineering & ETL: Apache Spark, Apache Airflow, PySpark, ETL Design, Data Modeling, Schema Design, Data Integration, Performance Tuning, Data Governance

Cloud & Big Data Platforms: AWS (Glue, Redshift, EMR, S3, Lambda, CloudWatch), Azure (ADF, Synapse, Delta Lake), Snowflake, BigQuery, Hudi, Iceberg, Data Lakehouse Architecture

Database & Warehousing: SQL Server, MySQL, NoSQL, SSIS, Partitioning, Indexing, Query Optimization, Metadata Management

Automation & DevOps: Airflow DAGs, Jenkins, Azure DevOps, Docker, Kubernetes, Kafka, Prometheus, Monitoring & Alerting

Programming & Scripting: Python, SQL, Shell Scripting (Bash, PowerShell), Object-Oriented Programming, JSON, DAX

EDUCATION

Yeshiva University New York, NY

Master of Science in Data Analytics and Visualization Dec 2024
