
Data Engineer Lead

Location:
Irving, TX
Posted:
April 02, 2025


Resume:

VINOD KUMAR SINGH - Lead Data Engineer

Dallas, TX, USA | +1-609-***-**** | ****************@*****.*** | LinkedIn | GitHub

Professional Summary:

Results-driven Lead Data Engineer with 12+ years of experience delivering scalable data solutions using Python, SQL, PySpark, and Pandas across cloud and big data ecosystems. Proven expertise in designing and optimizing ETL/ELT pipelines with tools such as SnapLogic, dbt, Databricks, Snowflake, AWS Glue, and Azure Data Factory, handling datasets exceeding 10TB. Skilled in building data lakes and warehouses on platforms such as Snowflake, Redshift, and RDS, with hands-on experience in data modeling, performance tuning, and automation. Adept at managing the full data lifecycle across AWS and Azure, ensuring high availability, data quality, and security. Strong collaborator with a track record of delivering end-to-end pipeline solutions in Agile environments.

Certifications:

AWS Certified Data Engineer – Valid: Dec 2024 to Dec 2027

Snowflake SnowPro Core Certified – Valid: Jan 2025 to Jan 2027

Education:

Bachelor of Computer Science Engineering

KIIT Bhubaneswar, Odisha, India — 2012

Technical Skills

Languages: Python, SQL, T-SQL, PL/SQL, Java, Bash

Frameworks/Libs: Pandas, PySpark, Snowpark, dbt

Cloud & Big Data: AWS (S3, Glue, Lambda, Redshift, RDS), Azure (ADF, Synapse, Blob), Hadoop, Spark, Hive, Kafka

ETL & Orchestration: Apache Airflow, AWS Glue, Azure Data Factory, dbt, SnapLogic, SSIS, Informatica

Databases: Snowflake, Redshift, PostgreSQL, MySQL, Oracle, SQL Server, Databricks

Reporting: Power BI, Tableau

Developer Tools: VS Code, IntelliJ, PyCharm, SQL Workbench, DBeaver, GitHub, Bitbucket

Modeling Tools: Erwin, MySQL Workbench, SSMS

CI/CD & Infra: Jenkins, Terraform, Docker, Kubernetes

Professional Experience:

Anblicks (Citi Bank), Dallas — Senior Data Engineer (Big Data, PySpark, Snowflake)

09/2024 – Present

• Designed and maintained scalable ETL pipelines to migrate data from Oracle to Hive, processing data across 250+ tables with reconciliation and validation checks that ensured 100% data accuracy.

• Implemented a robust data quality framework to monitor pipeline health in real time, enabling automated anomaly detection, reducing data issues, and minimizing manual intervention.

• Developed PySpark solutions in Hadoop to perform large-scale data transformations and promotional analytics, boosting card sales by 12%.

• Automated data pipeline deployments via Bitbucket, Jenkins, and ServiceNow; scheduled jobs using AutoSys for high reliability.

• Built centralized Snowflake repositories for enterprise reporting, integrating multiple upstream sources.

• Created reusable Python-based utilities for pipeline configuration management, reducing workflow development time by 90% across the team.

• Enhanced Unix-based big data workflows for optimized job execution and operational stability.

Mphasis (JPMC), Dallas — Python Data Engineer

11/2023 – 08/2024

• Designed and maintained scalable ETL pipelines using SnapLogic and PySpark to migrate data from Oracle to Hive, processing 250+ tables with reconciliation and validation checks to ensure 100% data accuracy.

• Processed and transformed large datasets using Databricks, converting Oracle PL/SQL logic into scalable PySpark and Python pipelines, reducing job execution time from 17 minutes to 5 minutes.

• Integrated a data quality framework into data pipelines, leveraging Jenkins and Bitbucket to enable automated validations and reduce downstream data issues.

• Designed YAML-based workflows and configured Airflow DAGs for orchestrated job execution, data validation checkpoints, and real-time monitoring.

• Developed Snowflake views and handled data handoff from Databricks for downstream consumption.

• Integrated SonarQube, Blue Ocean, and Git hooks for code quality, version control, and pipeline reliability.

• Conducted UAT and pre-prod validations using the congo framework to ensure accuracy of migrated data workloads.

PharmaACE, Pune — Lead Data Engineer

04/2016 – 10/2023

• Led end-to-end data engineering for global healthcare clients (e.g., Ferring, Bayer, Amgen, GSK, Regeneron), delivering robust data pipelines, ELT frameworks, and cloud integrations.

• Developed Python-based pipelines for ingesting and transforming data into staging, fact, dimension, and ARD layers, boosting data team productivity by 40% and speeding up business reporting.

• Built reusable transformation logic using PySpark and dbt, reducing processing time from 16 hours to 5.5 hours across 25 disease areas, while ensuring consistency and scalability across projects.

• Managed multi-cloud data lakes and warehouses using Snowflake, Redshift, Databricks, dbt, and SnapLogic, processing datasets over 10TB.

• Spearheaded a data product for claims data ingestion from IQVIA, with a React frontend and Python backend, automated inside the Redshift DWH for real-time reporting.

• Enabled GenAI model (Genie) integration by preparing structured datasets to solve complex business problems in pharmaceutical marketing and R&D.

• Implemented AWS IAM, KMS, and logging standards to maintain data security, compliance, and auditing.

• Monitored data workflows and pipelines using CloudWatch, Airflow, and Jenkins, and implemented alert-based handling.

IDBI Bank (IDBI Intech Ltd), Mumbai — Oracle Database Developer

03/2015 – 03/2016

• Developed complex PL/SQL procedures on Oracle 11g, improving data transformation efficiency by 50% compared to SSIS-based workflows.

• Optimized SSIS ETL pipelines, reducing runtime by 20% and improving data reliability by 40%.

• Automated ETL job schedules and monitoring to reduce manual oversight.

• Supported Informatica workflows, identifying and resolving failures, reducing downtime by 50%.

• Built interactive dashboards in Power BI for MIS and analytics, integrating predictive capabilities.

TCS (Bank of America), Mumbai — MS SQL Developer

09/2012 – 03/2015

• Developed SSRS dashboards and reports using bar, scatter, map, and pie visualizations; applied complex filters and calculations for strategic reporting.

• Optimized SQL Server databases for .NET-based enterprise apps; implemented secure authentication using OAuth/.NET Identity.

• Created ETL processes with SSIS, improving reliability and scaling with growing data volumes.

• Facilitated code migration and deployment between Dev, QA, and Prod environments, ensuring minimal downtime.

• Acted as key liaison for production support, identifying and resolving data issues in real time.


