
Data Engineer Business Intelligence

Location:
Ahmedabad, Gujarat, India
Posted:
September 10, 2025


Resume:

HEMANTH KUMAR

Email: ************@*****.***

Phone: +1 (331) 356-4720

SUMMARY

Results-driven Data Engineer with 4+ years of experience designing and building scalable data pipelines across cloud and on-prem environments. Skilled in developing ELT workflows using Apache Spark, dbt, and Airflow, with strong hands-on expertise in AWS, Azure, and Snowflake. Proven track record of delivering reliable, analytics-ready datasets for business intelligence, risk analytics, and operational reporting. Adept at ensuring data quality, optimizing performance, and enabling self-service insights through robust data lakehouse architectures. Comfortable working in Agile teams and collaborating with cross-functional stakeholders to solve real-world data problems.

TECHNICAL SKILLS

• Programming & Scripting: Python, SQL, PySpark, Bash

• Cloud Platforms: AWS (S3, Glue, Redshift, Lambda), Azure (Data Lake, Synapse, ADF)

• Big Data Tools & ETL: Apache Spark, dbt, Airflow, SSIS, Delta Lake, Hive, Kafka, Hadoop

• Data Warehousing & Modeling: Snowflake, Dimensional Modeling (Star/Snowflake), ER Diagrams

• DevOps & CI/CD: Git, GitHub Actions, Jenkins, Terraform, Docker

• Data Architecture, Storage & Quality: Data Lakes, Medallion Architecture, Great Expectations, dbt tests, PyTest

• BI & Visualization: Power BI, Tableau, Looker

• Monitoring & Observability: Prometheus, Grafana, AWS CloudWatch

• Data Governance & Security: Unity Catalog, Azure Purview, Data Catalog

• Documentation & Collaboration: Confluence, Jira, Markdown, Agile/Scrum

EXPERIENCE

Wawanesa Group Jan 2023 – Current

Data Engineer

• Inherited a legacy set of siloed claims and policy databases, which caused reporting delays and data duplication. I helped design a centralized Lakehouse architecture on AWS using Delta Lake, organizing data to support clean lineage and performance analytics.

• Worked closely with product and underwriting teams to understand data pain points—claim delays, inconsistent policy versions, and missing customer contact details. Using PySpark and dbt, I built transformation models that cleaned and normalized the data for real-time reporting.

• Created custom dbt tests and Great Expectations suites to validate claim amount thresholds, null policy IDs, and expired coverage dates, catching dozens of data quality issues that previously went unnoticed by business users (an illustrative data-quality sketch follows this section's bullets).

• Developed and maintained daily ingestion jobs using Apache Airflow, which pulled new claims data from Kafka topics and policy data from Snowflake, then staged it in S3. I set up retry logic, backfills, and Slack-based failure alerts to improve reliability (a minimal DAG sketch follows this section's bullets).

• Partnered with the fraud analytics team to prepare Gold-level feature tables for suspicious claims detection, ensuring fields like first-notice-of-loss (FNOL) timestamps, payout frequency, and duplicate addresses were properly engineered (a PySpark feature-table sketch follows this section's bullets).

• Integrated Power BI dashboards for claims handling time, fraud flag rate, and average claim payout per region, helping business leaders take weekly actions based on live metrics instead of relying on quarterly CSV dumps.

• Used Terraform and GitHub Actions to automate provisioning of Airflow DAGs, IAM roles, and S3 bucket policies, cutting manual deployment time by nearly 70% and reducing misconfiguration incidents during sprint deployments.
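
Illustrative data-quality sketch (referenced in the dbt/Great Expectations bullet above): a stand-in that expresses the same kinds of rules as plain pandas checks rather than the actual dbt tests or Great Expectations suites. Column names, the amount threshold, and the sample data are assumptions for illustration only.

# Stand-in for the dbt / Great Expectations rules described above,
# expressed as plain pandas checks; column names and thresholds are
# illustrative assumptions, not the production schema.
import pandas as pd


def validate_claims(claims: pd.DataFrame) -> dict:
    """Return a mapping of rule name -> number of offending rows."""
    issues = {
        # every claim must reference a policy
        "null_policy_id": claims["policy_id"].isna().sum(),
        # claim amounts outside a sanity threshold
        "claim_amount_out_of_range": (
            ~claims["claim_amount"].between(0, 1_000_000)
        ).sum(),
        # coverage must still be active on the loss date
        "expired_coverage": (
            pd.to_datetime(claims["coverage_end_date"])
            < pd.to_datetime(claims["loss_date"])
        ).sum(),
    }
    return {rule: int(count) for rule, count in issues.items()}


if __name__ == "__main__":
    sample = pd.DataFrame({
        "policy_id": ["P-1001", None],
        "claim_amount": [1200.0, 2_500_000.0],
        "coverage_end_date": ["2024-12-31", "2023-01-31"],
        "loss_date": ["2024-06-01", "2023-06-01"],
    })
    print(validate_claims(sample))  # each rule flags one of the two rows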
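
Minimal DAG sketch (referenced in the Airflow bullet above), assuming Airflow 2.4+. The DAG id, task bodies, schedule, and Slack webhook URL are placeholders, not the production pipeline.

from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_slack(context):
    """Post a short failure message to a Slack webhook (placeholder URL;
    in practice this would come from a secret or Airflow Variable)."""
    task = context["task_instance"]
    requests.post(
        "https://hooks.slack.com/services/PLACEHOLDER",
        json={"text": f"Airflow failure: {task.dag_id}.{task.task_id} on {context['ds']}"},
    )


def pull_claims_from_kafka(**context):
    # Placeholder: consume the run date's claim events and land them
    # in a raw S3 prefix.
    pass


def pull_policies_from_snowflake(**context):
    # Placeholder: extract the matching policy slice and stage it
    # next to the claims data in S3.
    pass


default_args = {
    "owner": "data-engineering",
    "retries": 3,                          # retry transient failures
    "retry_delay": timedelta(minutes=10),
    "on_failure_callback": notify_slack,   # Slack alert on final failure
}

with DAG(
    dag_id="daily_claims_ingestion",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=True,                          # allows historical backfills
    default_args=default_args,
) as dag:
    claims = PythonOperator(
        task_id="pull_claims_from_kafka",
        python_callable=pull_claims_from_kafka,
    )
    policies = PythonOperator(
        task_id="pull_policies_from_snowflake",
        python_callable=pull_policies_from_snowflake,
    )
    claims >> policies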
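
PySpark feature-table sketch (referenced in the fraud analytics bullet above). The Delta paths, column names, and choice of aggregates are illustrative assumptions, and it presumes a Spark session with the Delta Lake extensions configured.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gold_fraud_features").getOrCreate()

claims = spark.read.format("delta").load("s3://example-lake/silver/claims")

# Per-claimant features: payout frequency and time-to-FNOL behaviour.
claimant_features = (
    claims
    .withColumn(
        "hours_to_fnol",
        (F.col("fnol_ts").cast("long") - F.col("loss_ts").cast("long")) / 3600,
    )
    .groupBy("claimant_id")
    .agg(
        F.count("claim_id").alias("claim_count"),
        F.sum("payout_amount").alias("total_payout"),
        F.avg("hours_to_fnol").alias("avg_hours_to_fnol"),
    )
)

# Address reuse: the same mailing address shared by many claimants is a
# common duplicate-identity signal.
address_reuse = (
    claims.groupBy("mailing_address")
    .agg(F.countDistinct("claimant_id").alias("claimants_per_address"))
)

claimant_features.write.format("delta").mode("overwrite").save(
    "s3://example-lake/gold/claimant_fraud_features"
)
address_reuse.write.format("delta").mode("overwrite").save(
    "s3://example-lake/gold/address_reuse"
)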

Coforge Jan 2020 – Jul 2021

Junior Data Engineer / ETL Developer

• Worked with a team that was migrating loan origination workflows from Excel-based reports to a proper Data Warehouse. I designed ETL jobs using SSIS and T-SQL to extract borrower applications, credit scores, and payment schedules into SQL Server.

• Found inconsistencies in credit data pulled from multiple sources: some records lacked SSNs, others had mismatched addresses. I wrote Python validation scripts and used fuzzy matching to clean and align customer identity records across systems (an illustrative matching sketch follows this section's bullets).

• Built parameterized pipelines in Azure Data Factory to ingest data from the loan application system (CRM), core banking platform, and external credit bureau API. I created linked services, datasets, and mappings using JSON config files for portability.

• Created loan performance snapshots (monthly aggregates by loan type, region, and risk tier) to support predictive modeling for credit defaults and delinquency tracking (a snapshot aggregation sketch follows this section's bullets). These datasets were consumed by the internal data science team.

• Participated in building dbt models to standardize transformation logic for underwriting scorecards and risk segmentation, while also implementing tests to detect schema drift, negative interest rates, and duplicate loan IDs.

• Documented key transformation logic and business rules in Confluence, then walked the analytics team through each pipeline step—making onboarding smoother and speeding up report development for quarterly audits.

• Automated nightly report refreshes for Power BI dashboards showing default risk by loan officer, outstanding balances by credit grade, and monthly collections, reducing dependency on the Excel team and saving ~6 hours per week.
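
Illustrative matching sketch (referenced in the credit-data bullet above), using only the standard library; the real scripts may have used a dedicated fuzzy-matching package, and the field names and 0.85 threshold are assumptions.

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_same_customer(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Treat two source records as the same customer when SSNs match
    exactly, or when name and address are both close enough."""
    if rec_a.get("ssn") and rec_a.get("ssn") == rec_b.get("ssn"):
        return True
    return (
        similarity(rec_a["name"], rec_b["name"]) >= threshold
        and similarity(rec_a["address"], rec_b["address"]) >= threshold
    )


if __name__ == "__main__":
    crm_record = {"name": "Jon A. Smith", "address": "12 Oak St, Austin TX", "ssn": None}
    bureau_record = {"name": "John A Smith", "address": "12 Oak Street, Austin, TX",
                     "ssn": "123-45-6789"}
    print(is_same_customer(crm_record, bureau_record))  # True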
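
Snapshot aggregation sketch (referenced in the loan performance bullet above), written in pandas for brevity; column names and the 30-days-past-due delinquency rule are assumptions rather than the production definitions.

import pandas as pd


def monthly_snapshot(loans: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Aggregate loan counts, balances, and delinquency by loan type,
    region, and risk tier for a snapshot month (as_of = 'YYYY-MM')."""
    month_end = pd.Period(as_of, freq="M").to_timestamp(how="end")
    # only loans originated on or before the snapshot month are included
    active = loans[pd.to_datetime(loans["origination_date"]) <= month_end]
    active = active.assign(is_delinquent=active["days_past_due"] >= 30)
    return (
        active.groupby(["loan_type", "region", "risk_tier"])
        .agg(
            loan_count=("loan_id", "count"),
            outstanding_balance=("outstanding_balance", "sum"),
            delinquency_rate=("is_delinquent", "mean"),
        )
        .reset_index()
        .assign(snapshot_month=as_of)
    )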

PROJECTS

Fraud Detection System for Financial Transactions

Tools & Technologies: AWS (S3, Lambda, Redshift, CloudWatch), Kafka, Apache Flink, Python, Snowflake, Tableau, GitHub Actions

Description:

• Built a real-time fraud detection platform for a fintech company, capable of processing and analyzing transaction data streams for anomalies and triggering automated alerts.

Key Contributions:

• Integrated Kafka with Apache Flink to process real-time transactions from payment gateways, enabling fraud checks in under 5 seconds.

• Stored event logs and reference data in AWS S3 and Snowflake, enabling long-term pattern analysis.

• Developed a Python-based rule engine to flag suspicious transactions based on configurable business logic and ML-driven risk scores (a simplified rule-engine sketch follows the contributions list).

• Scheduled and managed batch processes and alerts using AWS Lambda and CloudWatch Events (a minimal Lambda handler sketch follows the contributions list).

• Maintained reproducible deployments using GitHub Actions and Infrastructure-as-Code templates.

• Created fraud trend analysis dashboards using Tableau, giving risk teams daily visibility into emerging threats.
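
Simplified rule-engine sketch (referenced in the contributions above); rule names, thresholds, and the risk-score field are illustrative assumptions rather than the production logic.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    predicate: Callable[[dict], bool]  # returns True when suspicious


# Rules are plain data, so thresholds can live in configuration.
RULES: List[Rule] = [
    Rule("large_amount", lambda t: t["amount"] >= 10_000),
    Rule("cross_border_card", lambda t: t["country"] != t["card_country"]),
    Rule("high_ml_risk_score", lambda t: t.get("ml_risk_score", 0.0) >= 0.9),
]


def evaluate(transaction: dict) -> List[str]:
    """Return the names of every rule the transaction trips."""
    return [rule.name for rule in RULES if rule.predicate(transaction)]


if __name__ == "__main__":
    txn = {"amount": 12_500, "country": "US", "card_country": "GB",
           "ml_risk_score": 0.35}
    flags = evaluate(txn)
    if flags:
        print(f"ALERT: {flags}")  # ['large_amount', 'cross_border_card']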
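
Minimal Lambda handler sketch (referenced in the contributions above). The CloudWatch Events/EventBridge schedule that invokes it is configured separately; the SNS topic environment variable and the batch step are placeholders.

import os

import boto3

sns = boto3.client("sns")
ALERT_TOPIC_ARN = os.environ.get("ALERT_TOPIC_ARN", "")


def run_batch_step() -> int:
    """Placeholder for the scheduled batch work (e.g. refreshing the
    day's transaction summaries); returns rows processed."""
    return 0


def handler(event, context):
    try:
        rows = run_batch_step()
        return {"status": "ok", "rows_processed": rows}
    except Exception as exc:
        # Publish an alert so failures surface without log digging.
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject="Scheduled batch failure",
            Message=f"Batch step failed: {exc}",
        )
        raise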

Impact:

• Achieved over 92% accuracy in fraud detection, reduced manual investigations by 60%, and saved approximately $1.5M in prevented fraudulent transactions within the first year.

EDUCATION

• Master’s in Engineering Data Science, University of Houston (Main Campus), 2021 – 2023

• Bachelor’s in Electronics and Communication Engineering, GITAM (Deemed to be University), 2017 – 2021


