
Data Engineer Financial Services

Location:
Hyderabad, Telangana, India
Salary:
80000
Posted:
September 10, 2025


Resume:

Sai Akhil

+1-336-***-**** ******************@*****.***

Summary:

Data Engineer with 4+ years of experience building scalable data solutions across AWS, Azure, Snowflake, and Databricks. Expert in PySpark, SQL, and Delta Lake for ETL/ELT development, SAP data integration, and implementing Medallion Architecture. Skilled in optimizing performance with adaptive query execution and partitioning, and streamlining CI/CD workflows using GitLab.

Skills

Programming & Scripting: Python, Spark, SQL (Spark SQL, T-SQL, PostgreSQL)

Data Warehousing & Modeling: Databricks, Snowflake, Delta Lake, Medallion Architecture, ETL/ELT, DBT, Schema Design

Databases & ETL Tools: NoSQL, MySQL, MongoDB, SSIS, Data Mining, Data Pipeline Design

Cloud & OS: AWS, Azure, GCP, Linux, Windows

Visualization & Reporting: Power BI, Tableau, Matplotlib, Excel

DevOps & Collaboration: Git, Jenkins, Bitbucket, GitLab CI/CD, Docker, Kubernetes

Experience

Data Engineer at Volvo Financial Services | Domain: Financial Services | Greensboro, NC | August 2024 - Present

Led end-to-end migration of SAP ECC and SuccessFactors data into Databricks using PySpark, SQL, DBT, and custom Python pipelines, enabling automated ingestion from General Ledger, Accounts Payable, and Payroll modules, and aligning with financial reporting and audit needs.

Built a Delta Lake-based warehousing layer on AWS S3, reducing ad hoc reporting turnaround time for payroll and finance operations by 30% and enabling structured access across Bronze, Silver, and Gold layers.
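As a minimal sketch of the Bronze-to-Silver promotion described above, assuming Delta tables under hypothetical s3://finance-lake/ prefixes and a Databricks/PySpark runtime with Delta Lake available (table paths and key columns are placeholders, not the production layout):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-provided as `spark` on Databricks

# Bronze: raw SAP General Ledger extracts landed as-is on S3 (hypothetical path)
bronze = spark.read.format("delta").load("s3://finance-lake/bronze/sap_gl")

# Silver: cleaned, typed, de-duplicated records for finance reporting
silver = (
    bronze
    .dropDuplicates(["document_id", "line_item"])            # hypothetical business keys
    .withColumn("posting_date", F.to_date("posting_date"))
    .filter(F.col("company_code").isNotNull())
)

(silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("posting_date")
    .save("s3://finance-lake/silver/sap_gl"))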

Developed a modular API ingestion framework in Python, integrating with AWS Lambda and Databricks Workflows, reducing data latency by 20%, and powering near real-time finance dashboards used for cost control and forecasting.
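To illustrate the ingestion pattern above (not the actual framework), a hedged sketch of an AWS Lambda handler that stages an API payload on S3 and triggers a downstream Databricks Workflow through the Jobs API run-now endpoint; the bucket, job ID, and environment-variable names are placeholders:

import json, os
import boto3
import urllib.request

s3 = boto3.client("s3")

def handler(event, context):
    # Stage the raw API payload on S3 (hypothetical bucket/key layout)
    key = f"raw/finance_api/{context.aws_request_id}.json"
    s3.put_object(Bucket=os.environ["RAW_BUCKET"], Key=key,
                  Body=json.dumps(event).encode())

    # Kick off the downstream Databricks Workflow (Jobs API 2.1 run-now)
    req = urllib.request.Request(
        url=f"{os.environ['DATABRICKS_HOST']}/api/2.1/jobs/run-now",
        data=json.dumps({"job_id": int(os.environ["JOB_ID"]),
                         "notebook_params": {"input_key": key}}).encode(),
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return {"statusCode": 200, "body": resp.read().decode()}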

Architected and enforced Medallion Architecture standards within Databricks to formalize data governance, transformation lineage, and auditability, boosting data processing efficiency by 33% and enabling better SOX compliance.

Tuned Spark workloads via adaptive query execution, dynamic partition pruning, and in-memory caching, leading to 45% faster transformations and reducing compute spend on Databricks clusters by 20%.
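The tuning levers mentioned above map to standard Spark 3.x settings; a minimal sketch, with illustrative values rather than the production configuration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-provided as `spark` on Databricks

# Adaptive query execution and dynamic partition pruning
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")

# Cache a hot dimension table that several finance transformations reuse
gl_accounts = spark.table("silver.gl_accounts")  # hypothetical table
gl_accounts.cache()
gl_accounts.count()  # materialize the cache before downstream joins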

Implemented Unity Catalog for centralized metadata and access management, streamlining RBAC, encryption, and audit logging setup, accelerating security onboarding and simplifying regulatory audit processes by 20%.
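Unity Catalog access control of the kind described is expressed as SQL GRANT statements; a short sketch issued from a Python notebook, with catalog, schema, and group names as placeholders:

# `spark` is the active session in a Unity Catalog-enabled Databricks workspace
# Grant the finance analysts group read access to the curated (Gold) schema
spark.sql("GRANT USE CATALOG ON CATALOG finance TO `finance-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA finance.gold TO `finance-analysts`")
spark.sql("GRANT SELECT ON SCHEMA finance.gold TO `finance-analysts`")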

Data Engineer at ValueMomentum | Domain: Health Care | Hyderabad, India | January 2021 - December 2022

Built scalable ETL pipelines using Python, SQL, and Apache Airflow, automating ingestion and transformation of EHR and payer data across hybrid environments, reducing manual workflows by 35% while ensuring PHI/PII compliance.
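A minimal Airflow sketch of the daily ingest-and-transform flow described above; the DAG ID, task bodies, and schedule are hypothetical placeholders:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_ehr(**context):
    # pull the day's EHR and payer extracts into staging (placeholder logic)
    ...

def transform_and_load(**context):
    # clean, de-identify PHI/PII fields, and load to the warehouse (placeholder logic)
    ...

with DAG(
    dag_id="ehr_payer_daily",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_ehr", python_callable=extract_ehr)
    load = PythonOperator(task_id="transform_and_load", python_callable=transform_and_load)
    extract >> load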

Engineered data integration pipelines with AWS Glue, Lambda, and S3 to process HL7, FHIR, and claims data formats, achieving 100% HIPAA compliance and enhancing analytical throughput for clinical reporting teams.

Designed and deployed Kafka-based real-time streaming systems to capture telemetry from EMR systems and IoT medical devices, reducing event-to-insight latency and enabling proactive alerts for critical patient metrics.
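One way to express the streaming capture described above is Spark Structured Streaming over Kafka; the broker address, topic name, schema fields, and paths below are placeholders:

from pyspark.sql import SparkSession, functions as F, types as T

# Requires the spark-sql-kafka connector on the cluster; `spark` is pre-provided on Databricks
spark = SparkSession.builder.getOrCreate()

telemetry_schema = T.StructType([
    T.StructField("patient_id", T.StringType()),
    T.StructField("metric", T.StringType()),
    T.StructField("value", T.DoubleType()),
    T.StructField("event_ts", T.TimestampType()),
])

events = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")    # placeholder brokers
    .option("subscribe", "emr-telemetry")                 # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), telemetry_schema).alias("e"))
    .select("e.*"))

(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/emr-telemetry")   # placeholder checkpoint path
    .outputMode("append")
    .start("/lake/silver/emr_telemetry"))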

Improved query efficiency on Snowflake and SQL Server through advanced techniques like partitioning, materialized views, and clustering, cutting dashboard load times by 50% and enhancing user experience for healthcare analysts.
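The Snowflake-side optimizations mentioned (clustering keys and materialized views) can be applied with statements like the following, issued here through the Snowflake Python connector; account, table, and column names are placeholders:

import snowflake.connector

conn = snowflake.connector.connect(account="acct", user="user", password="***")  # placeholders
cur = conn.cursor()

# Cluster the large claims table on the columns dashboards filter by most
cur.execute("ALTER TABLE analytics.claims CLUSTER BY (service_date, payer_id)")

# Precompute a heavy aggregation behind the slowest dashboard
cur.execute("""
    CREATE OR REPLACE MATERIALIZED VIEW analytics.claims_daily_mv AS
    SELECT service_date, payer_id, COUNT(*) AS claim_count, SUM(billed_amount) AS billed
    FROM analytics.claims
    GROUP BY service_date, payer_id
""")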

Data Engineer at FactSet | Domain: Financial Services | Hyderabad, India | June 2019 - December 2020

Architected and deployed large-scale data pipelines using Azure Data Factory, Informatica IICS, and Databricks, processing over 3TB of enterprise data daily, improving warehouse refresh cycles by 45% and achieving 99.95% uptime.

Engineered robust ETL/ELT frameworks integrating Oracle, SQL Server, DB2, REST APIs, and JSON data using PySpark and ADF, reducing end-to-end processing time by 40% and improving data quality by 35% through automated validation logic and exception handling.
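A small sketch of the kind of automated validation referenced above: required-column null checks that route records into clean and quarantine outputs; the source path, column names, and targets are illustrative:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-provided as `spark` on Databricks

raw = spark.read.format("parquet").load("/landing/positions")   # placeholder source

required = ["account_id", "security_id", "as_of_date"]           # hypothetical required fields
valid_mask = None
for c in required:
    cond = F.col(c).isNotNull()
    valid_mask = cond if valid_mask is None else (valid_mask & cond)

clean = raw.filter(valid_mask)
rejected = raw.filter(~valid_mask)

clean.write.mode("append").format("delta").save("/curated/positions")
rejected.write.mode("append").format("delta").save("/quarantine/positions")  # picked up by the exception-handling job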

Optimized enterprise data lake and warehousing infrastructure with Azure Synapse Analytics, Data Lake Gen2, and Cosmos DB, delivering 45% faster query performance and cutting storage costs by 22% through intelligent partitioning and adaptive execution plans.

Collaborated cross-functionally with business SMEs and data science teams to build ML model inputs and Power BI dashboards, resulting in 18% gain in operational efficiency and 40% faster decision-making across finance & operations.

Projects

Uber Data Analytics Pipeline

Optimized data pipelines using Mage AI, Python, and SQL to process over 10 million ride records from Uber datasets, implementing data validation and schema enforcement to boost pipeline reliability and reduce processing errors by 30%.

Architected scalable, cost-effective storage and analytics solutions with Google Cloud Storage and BigQuery, improving query performance by 40% and reducing data access latency, enabling faster insights for trend analysis.

Unstructured JSON to Structured Relational Data Model

Analyzed unstructured JSON receipt data and designed a new structured relational data model, improving data accessibility and reducing processing errors by 25%.

Developed data transformation pipelines using Python’s JSON module and Pandas to convert unstructured JSON into clean datasets, identifying key brand and user engagement metrics and resolving 15% of data quality issues.
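A brief sketch of the flattening step described above using pandas.json_normalize, assuming one JSON object per line; the receipts.json file and field names are hypothetical placeholders:

import json
import pandas as pd

with open("receipts.json") as f:                       # placeholder input file
    records = [json.loads(line) for line in f]

# Flatten nested receipt items into one row per purchased item
items = [r for r in records if "rewardsReceiptItemList" in r]   # hypothetical nested array
receipts = pd.json_normalize(
    items,
    record_path="rewardsReceiptItemList",
    meta=["_id", "userId", "purchaseDate"],            # hypothetical receipt-level fields
    errors="ignore",
)

# Simple data-quality check: flag items missing a brand code
missing_brand = receipts["brandCode"].isna().mean()
print(f"{missing_brand:.1%} of items missing brandCode")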

Education

University of North Texas | Master of Science in Data Science | Jan 2023 - May 2024 | GPA: 3.8/4.0


