Data Engineer SQL Server

Location:
Irving, TX
Posted:
September 10, 2025

Resume:

KISHVINKUMAR GODA

SR CLOUD DATA ENGINEER

+1-972-***-**** *****.****@*****.*** LinkedIn: linkedin.com/in/kumar-g-1689b62b0

PROFESSIONAL SUMMARY

Over 6 years of experience in data engineering, specializing in designing and implementing scalable data pipelines on Azure, AWS, and GCP.

Proficient in big data technologies such as Hadoop, Hive, Spark, and Kafka for processing and analyzing large datasets.

Strong expertise in cloud platforms such as Azure, AWS, and GCP for deploying and managing data infrastructure.

Proficient in programming languages such as Python, Java, and Scala for data manipulation and processing.

Experienced in writing and optimizing complex SQL queries and stored procedures for data extraction, transformation, and loading.

Designed and optimized scalable ETL pipelines in Databricks using PySpark, implementing Delta Lake for efficient data storage and version control.
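
The following is a minimal, illustrative PySpark sketch of the Delta Lake write pattern described above, assuming a Databricks environment where the delta format is available; the landing path and table names are hypothetical placeholders, not actual project objects.

# Illustrative Databricks-style PySpark job writing curated data to a Delta table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

raw = spark.read.format("json").load("/mnt/raw/orders/")    # hypothetical landing path

curated = (raw
           .dropDuplicates(["order_id"])                    # basic dedup before load
           .withColumn("ingest_date", F.current_date()))

(curated.write
 .format("delta")
 .mode("append")
 .partitionBy("ingest_date")
 .saveAsTable("curated.orders"))                            # hypothetical target table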

Designed, optimized, and maintained complex stored procedures and performed query tuning and performance optimization in Microsoft SQL Server 2012+, improving query performance by 35%.

Expertise in advanced database features such as partitioning, indexing, and materialized views in Oracle, MS SQL Server, PostgreSQL, and MySQL to improve query performance and data management.

Implemented data processing workflows in Python using libraries such as Pandas, NumPy, and SQLAlchemy for analytics and reporting.
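
A small hedged example of the kind of Pandas/SQLAlchemy reporting pull mentioned above; the connection string, table, and column names are placeholders rather than real systems.

# Illustrative analytics pull: SQLAlchemy for connectivity, Pandas/NumPy for transforms.
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@host:5432/analytics")  # placeholder DSN

df = pd.read_sql("SELECT customer_id, amount, txn_date FROM transactions",
                 engine, parse_dates=["txn_date"])
df["amount_log"] = np.log1p(df["amount"])                   # example NumPy transform

monthly = (df.groupby(df["txn_date"].dt.to_period("M"))["amount"]
             .agg(["count", "sum", "mean"]))
monthly.to_csv("monthly_summary.csv")                       # simple reporting output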

Experience with data lake architectures using Apache-ecosystem technologies and Amazon S3 for centralized data storage.

Integrated DBT transformations with data warehouses (Snowflake, Redshift, BigQuery) for scalable ETL workflows.

Troubleshot and resolved cloud application issues using CloudWatch, Snowflake logs, and Python-based monitoring scripts.

Developed and scheduled Databricks notebooks for batch and real-time processing, ensuring data quality through validation and transformation frameworks.

Expertise in designing custom data models and schemas within ERP systems to accommodate unique business requirements and data structures.

Proficient in data security best practices, including encryption, access control, PII protection, and compliance with regulations such as GDPR and HIPAA.

Designed and implemented Informatica MDM Hub architecture (Landing, Staging, Base, ORS, and IDD) for enterprise-wide master data management across customer and product domains.

Built and maintained data warehouses in Snowflake, leveraging features like Streams, Tasks, and Time Travel for data ingestion and CDC processing.
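
The sketch below shows the general Streams/Tasks pattern for incremental (CDC-style) loads using the Snowflake Python connector; the account details, warehouse, and table names are illustrative assumptions, not actual production objects.

# Illustrative Snowflake Streams + Tasks setup for incremental loads.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",    # placeholder credentials
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
)
cur = conn.cursor()

# Stream captures inserts/updates/deletes on the source table.
cur.execute("CREATE OR REPLACE STREAM ORDERS_STREAM ON TABLE RAW.ORDERS")

# Task periodically merges captured changes into the curated table.
cur.execute("""
    CREATE OR REPLACE TASK LOAD_ORDERS
      WAREHOUSE = ETL_WH
      SCHEDULE = '5 MINUTE'
    AS
      MERGE INTO CURATED.ORDERS t
      USING ORDERS_STREAM s ON t.ORDER_ID = s.ORDER_ID
      WHEN MATCHED THEN UPDATE SET t.STATUS = s.STATUS
      WHEN NOT MATCHED THEN INSERT (ORDER_ID, STATUS) VALUES (s.ORDER_ID, s.STATUS)
""")
cur.execute("ALTER TASK LOAD_ORDERS RESUME")
conn.close()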

Experience with cloud-native data services such as Azure Data Factory, AWS Glue, and GCP Dataflow for ETL processing.

Worked with the Atlassian toolset (JIRA, Confluence, Rally, GitHub) for project tracking, documentation, CI/CD, and version control.

Developed and maintained ETL pipelines in Informatica & IICS, handling large-scale data ingestion, transformation, and loading processes across multiple domains.

Migrated legacy ETL processes to IBM DataStage (from Talend/SSIS), reducing batch run times and improving maintainability.

Experience with data anonymization and masking techniques to protect sensitive information.

TECHNICAL SKILLS

Hadoop/Spark Ecosystem:

Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Sqoop, Oozie, Spark, Airflow, MongoDB, Cassandra, HBase, Storm, and ERP.

Programming Languages:

Java, Python, Hibernate, JDBC, JSON, HTML, CSS.

Cloud Technologies:

AWS, GCP, Azure, S3, EMR, Redshift, Lambda, Athena, Composer, BigQuery, and ERP.

Script Languages:

Python, Shell scripting (Bash).

Databases:

Oracle, MySQL, SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB, NoSQL.

Version controls and Tools:

Git, Maven, SBT, CBT.

Web/Application server:

Apache Tomcat, WebLogic, WebSphere

EDUCATION

Master’s in Information Technology, Central Michigan University, Michigan, USA.

Bachelor’s in Electronics and Communication Engineering, Jawaharlal Nehru Technological University, India.

WORK EXPERIENCE

Sr Data Engineer / ETL Developer Truist Bank Charlotte, NC Sep 2023 – Present

Project Description:

As a Sr. Data Engineer at Truist Bank, I contributed to designing, developing, and enhancing scalable backend and frontend systems, and was involved in a data migration project from on-premises systems and the data lake to Snowflake. My role involved integrating various technologies to create robust applications, optimizing performance, and streamlining deployment pipelines to ensure smooth and reliable software delivery. I collaborated with cross-functional teams to build secure, high-performance data solutions that supported the bank’s data governance and analytics initiatives.

Developed and enhanced ETL pipelines iteratively using Agile methodologies to align with evolving business intelligence and regulatory compliance needs.

Developed ETL pipelines using Talend and IBM DataStage, pulling data from Hive EDL Lake and various Systems of Record (SORs) to populate Snowflake’s Silver and Gold layers.

Migrated enterprise data systems from on-prem databases (Oracle, MSSQL, Netezza) to Snowflake, following the Medallion Architecture (Bronze, Silver, Gold).

Designed and implemented cloud-based data pipelines and applications on AWS using Snowflake, DBT, and Python to meet evolving business and technical requirements.

Designed and deployed SCD Type 1 & Type 2 transformations using Talend Joblets and reusable mappings.
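
For context, the generic SCD Type 2 pattern behind that work is sketched below in PySpark (the actual implementation used Talend Joblets); the dimension, staging table, and tracked column are illustrative.

# Generic SCD Type 2 sketch: expire changed rows, append new current versions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
dim = spark.table("dim_customer")            # existing dimension with is_current flag
incoming = spark.table("stg_customer")       # latest source snapshot

changed = (incoming.alias("s")
           .join(dim.filter("is_current = true").alias("d"), "customer_id")
           .where(F.col("s.address") != F.col("d.address"))
           .select("s.*"))

# Close out the old versions of the changed keys ...
expired = (dim.join(changed.select("customer_id"), "customer_id")
              .withColumn("is_current", F.lit(False))
              .withColumn("end_date", F.current_date()))

# ... and prepare the new versions with open-ended validity for the append.
new_rows = (changed
            .withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))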

Utilized IBM InfoSphere Data Replication (IIDR) for real-time CDC implementation, configured Snowflake Streams and Tasks for incremental data loads.

Explored ChatGPT’s potential to automate documentation and enhance knowledge-sharing within the data team; evaluated outcomes and proposed integration strategy to leadership based on PoC results.

Performed data profiling and cleansing using Informatica Data Quality (IDQ), creating reusable rules and transformations for standardization and deduplication.

Conducted a Proof of Concept (PoC) using OpenAI’s ChatGPT API to test feasibility of integrating AI-based query generation into existing data pipelines, improving internal tool efficiency and user experience.

Optimized Java-based microservices to handle high-volume data transactions, reducing latency by 25%.

Created stored procedures in Snowflake for logic-driven transformations and used CA Workstation for scheduling production pipelines.

Involved in the migration of data pipelines and reporting infrastructure from Azure Synapse and Data Factory to Microsoft Fabric, improving analytics performance and reducing dashboard latency by 25%.

Collaborated cross-functionally with business and engineering teams to integrate Microsoft 365 Suite tools (Excel, Power BI, SharePoint) with Fabric workspaces, enhancing self-service analytics and cloud-native collaboration.

Integrated Informatica Data Director (IDD) to enable data stewards with role-based UI for managing exceptions, manual merges, and data governance approvals.

Worked extensively in AWS environments, leveraging RDS and Redshift for scalable storage, querying, and data integration.

Built and maintained AI/ML-ready data sets to support predictive analytics, leveraging Python and Snowpark for data preparation.

Used Python and SQL for scripting, data wrangling, and custom validation tasks; tuned performance and wrote SQL queries and PL/SQL procedures to perform database operations.

Used DataStage Director for monitoring, debugging, and troubleshooting jobs in real time to minimize ETL failures.

Collaborated across data and business teams to align Salesforce CRM data with enterprise data models in Azure and Fabric, ensuring clean data flow and accurate analytics delivery.

Developed predictive models for claim fraud detection using Python (scikit-learn).

Built ML pipelines on Databricks MLflow for end-to-end lifecycle management.
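
A hedged sketch of what such a scikit-learn model with MLflow tracking can look like; the feature file, columns, and hyperparameters are hypothetical and not the actual production models.

# Illustrative fraud classifier with MLflow experiment tracking.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

claims = pd.read_parquet("claims_features.parquet")         # placeholder feature set
X, y = claims.drop(columns=["is_fraud"]), claims["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(run_name="fraud_rf"):
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("test_auc", auc)
    mlflow.sklearn.log_model(model, "model")                # logs model artifact to the run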

Designed data models using Erwin and governed data definitions, naming conventions, and lineage documentation.

Integrated Python scripts with AWS S3, Redshift, Snowflake, and Hive to automate data extraction, transformation, and loading.
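
A minimal sketch of that style of automation, assuming boto3 for S3 and psycopg2 for Redshift; the bucket, cluster endpoint, IAM role, and table names are placeholders.

# Illustrative S3 upload followed by a Redshift COPY.
import boto3
import psycopg2

s3 = boto3.client("s3")
s3.upload_file("daily_extract.parquet", "my-data-bucket", "staging/daily_extract.parquet")

conn = psycopg2.connect(
    host="my-cluster.redshift.amazonaws.com", port=5439,    # placeholder endpoint
    dbname="analytics", user="etl_user", password="***",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY staging.daily_extract
        FROM 's3://my-data-bucket/staging/daily_extract.parquet'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
        FORMAT AS PARQUET
    """)
conn.close()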

Designed and developed modular SQL-based transformations in DBT to build reliable data pipelines supporting ETL workflows.

Implemented source-to-target mappings and data models (staging, intermediate, and mart layers) using DBT for efficient transformation and reporting.

Automated data quality checks, tests, and schema validations in DBT to ensure accuracy, consistency, and trustworthiness of transformed datasets.

Delivered curated Snowflake Views and data marts for internal clients and reporting teams.

Built dashboards in Power BI for KPIs and operational reporting around complaints, fraud, disputes, and customer profiles.

Performed data modeling using ER Studio to standardize enterprise data assets, ensuring compliance with governance standards.

Ingested and transformed JSON data from upstream systems, ensuring accurate mapping, validation, and integration into downstream applications.

Participated in full development lifecycle from data ingestion to enrichment, including impact analysis and performance tuning.

Conducted performance tuning and cost optimization of cloud resources, improving query speeds and reducing compute spend by 20%.

Scheduled and monitored data pipelines using Apache Airflow for workflow orchestration.
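
A minimal Airflow DAG sketch of the kind of orchestration described above; the DAG id, schedule, and task callables are illustrative placeholders.

# Illustrative Airflow DAG: extract then load, scheduled daily.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull from source")        # placeholder extract step

def load():
    print("load to warehouse")       # placeholder load step

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_load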

Environment: Python, Hadoop, Hive, GCP, BigQuery, Pig, AML, Sqoop, ADLS, Talend, Agile, Scala, Azure Databricks, Kafka, Flume, HBase, PySpark, ADF, Hortonworks, Oracle 10g/11g/12c, Teradata, Cassandra, Impala, Pub/Sub, Data Lake, ERP, Spark, MapReduce, Ambari, Cloudera, Tableau, Snappy, Zookeeper, NoSQL, Shell Scripting, Ubuntu, Solr.

Data Engineer Western Union Denver, CO Jun 2022 – Sep 2023

Project Description:

At Western Union, I played a key role in developing and modernizing the organization's cloud-based data infrastructure to support real-time transaction monitoring, regulatory compliance, and business analytics. Developed an end-to-end real-time streaming pipeline to monitor and analyze Western Union’s money transfer events. The system uses Apache Kafka for event ingestion, Apache Airflow for orchestration, Apache Cassandra for real-time storage, and is fully containerized using Docker with a Confluent stack (Kafka, Zookeeper, Schema Registry, Control Center). The pipeline supports schema enforcement, near real-time alerting, fault tolerance, and lineage tracking.

Designed and implemented a real-time streaming pipeline using Apache Kafka, Databricks, and Python to ingest and process high-volume POS and fraud events with millisecond latency.

Developed Python-based Kafka producers and consumers to simulate transaction and fraud events, storing structured data in Cassandra and Databricks Delta tables for real-time analytics.
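
A hedged sketch of that producer/consumer pattern using the confluent-kafka Python client; the broker address, topic name, and payload fields are simulated placeholders.

# Illustrative Kafka producer and consumer for simulated transfer events.
import json
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})
event = {"txn_id": "T1001", "amount": 250.0, "status": "SENT"}   # simulated event
producer.produce("transfer_events", key=event["txn_id"], value=json.dumps(event))
producer.flush()

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-monitor",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["transfer_events"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(json.loads(msg.value()))       # hand off to downstream storage/alerting
consumer.close()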

Developed scalable ETL pipelines using PySpark and Python to process structured and unstructured data from multiple source systems into data lakes and data warehouses.

Optimized Spark jobs by implementing partitioning, bucketing, and broadcast joins, improving performance and reducing processing time by 40%.
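
The tuning patterns referenced above are sketched below in PySpark; table names, partition counts, and bucket counts are illustrative, not the production values.

# Illustrative Spark tuning: broadcast join, repartitioning, bucketed output.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
events = spark.table("pos_events")          # large fact table
merchants = spark.table("dim_merchant")     # small dimension table

# Broadcast the small dimension to avoid shuffling the large side.
joined = events.join(F.broadcast(merchants), "merchant_id")

# Repartition by the downstream grouping key before heavy aggregation.
daily = (joined.repartition(200, "event_date")
               .groupBy("event_date", "merchant_id")
               .agg(F.sum("amount").alias("total_amount")))

# Persist a bucketed copy so repeated joins on merchant_id skip the shuffle.
(daily.write.mode("overwrite")
      .bucketBy(64, "merchant_id")
      .sortBy("merchant_id")
      .saveAsTable("agg.daily_merchant_totals"))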

Built Databricks notebooks for ETL, data validation, and aggregation of POS and fraud event streams, enabling downstream dashboards and operational reporting.

Leveraged Kafka Schema Registry with Avro in Databricks to enforce schema consistency and safe evolution for dynamic POS and MWM event structures.

Orchestrated ETL and data quality workflows using Apache Airflow, including automated reconciliation of MWM work order completion events.

Implemented schema validation and data quality checks in PySpark to ensure accuracy and consistency across data pipelines.
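
A lightweight sketch of such checks in PySpark; the expected columns and rules are illustrative assumptions.

# Illustrative schema and data-quality checks on a staging table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("staging.pos_events")       # placeholder staging table

expected_cols = {"event_id", "merchant_id", "amount", "event_ts"}
missing = expected_cols - set(df.columns)
if missing:
    raise ValueError(f"Schema drift detected, missing columns: {missing}")

checks = df.agg(
    F.count(F.lit(1)).alias("row_count"),
    F.sum(F.col("event_id").isNull().cast("int")).alias("null_event_ids"),
    F.sum((F.col("amount") < 0).cast("int")).alias("negative_amounts"),
).first()

if (checks["null_event_ids"] or 0) > 0 or (checks["negative_amounts"] or 0) > 0:
    raise ValueError(f"Data quality check failed: {checks.asDict()}")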

Developed normalized OLTP schemas for transactional processing systems.

Designed and developed end-to-end ETL pipelines using SnapLogic for ingesting structured and semi-structured data from diverse sources (SQL Server, Oracle, REST APIs, S3, and Salesforce) into cloud data warehouses.

Built and optimized data pipelines using SnapLogic Snaps (File Reader, JSON Parser, Mapper, SQL Execute, Join, Aggregate, Router, and REST API Snaps) to ensure efficient data movement and transformation.

Implemented real-time monitoring and alerting dashboards in Databricks and Confluent Control Center to track event throughput, MWM API health, and SLA adherence.

Developed a Databricks Python pipeline to join transaction events with work order statuses, providing full event-to-resolution lineage for compliance reporting.

Conducted chaos testing and failure simulations for Kafka brokers, MWM APIs, and Databricks streaming jobs to ensure high availability and fault tolerance.

Migrated legacy ETL jobs from SQL/SSIS to PySpark-based pipelines, reducing job execution time and improving maintainability.

Automated data ingestion and transformation workflows by configuring SnapLogic Task Scheduler and integrated with external orchestrators (Airflow, Control-M).

Collaborated with field operations to define procedures for on-site fraud investigation and POS equipment replacement, integrating MWM and Databricks workflows for automated tracking and closure.

Environment: Azure, GCP, BigQuery, GCS Bucket, Python, AML, ADLS Gen2, Agile, G-Cloud Function, Pub/Sub, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil, Oracle MWM, Apache Kafka, Cassandra, Command Line Utilities, ADF, ERP, Redshift, Dataproc, Databricks, Cloud SQL, MySQL, Postgres, SQL Server, Scala, Spark, Hive, Spark SQL.

Data Engineer Sonata Software India Jun 2019 – Dec 2021

Project Description:

At Sonata Software, I contributed to designing and implementing scalable data integration and migration solutions. Worked as a Business Intelligence Product Solutions Analyst for one of the leading hospitality and car rental clients. The role involved delivering and maintaining enterprise BI solutions that support hotel and travel brands in achieving their data-driven strategic and operational goals. Responsible for managing customer support tickets via Salesforce, executing data mapping for client onboarding, and customizing report delivery aligned to KPIs such as occupancy rates, booking velocity, and channel performance.

Provided consultative support and hands-on delivery of Business Intelligence (BI) solutions for top-tier hospitality and car rental clients, aligning product implementations with client business goals.

Managed and resolved 100+ Salesforce service tickets monthly, ensuring timely and accurate communication between internal technical teams and enterprise travel clients.

Conducted data mapping and transformation activities for new client integrations, ensuring accurate alignment of source data with BI platform requirements.

Collaborated directly with client-side technical and business stakeholders to troubleshoot, update, and maintain Enterprise Edition subscriptions across multi-national travel brands.

Worked on root-cause analysis and resolution of product and client-specific bugs, contributing to the improvement of platform reliability and client satisfaction.

Monitored automated data quality alerts daily, identifying anomalies and implementing corrections to maintain 100% data integrity in BI reports and dashboards.

Used PySpark UDFs and Python functions for complex business logic transformations not supported natively in Spark SQL.

Created parameterized ETL jobs in PySpark allowing dynamic table ingestion across multiple environments.
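
The two bullets above are illustrated together in the hedged sketch below: a Python UDF for a custom business rule plus an argparse-parameterized job; the rule, tables, and arguments are hypothetical.

# Illustrative PySpark UDF and parameterized ingestion job.
import argparse
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

def loyalty_tier(total_spend):
    # Business rule not cleanly expressible with built-in Spark SQL functions.
    if total_spend is None:
        return "unknown"
    return "gold" if total_spend >= 10000 else "standard"

tier_udf = F.udf(loyalty_tier, StringType())

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--source-table", required=True)
    parser.add_argument("--target-table", required=True)
    args = parser.parse_args()

    spark = SparkSession.builder.getOrCreate()
    df = spark.table(args.source_table).withColumn("tier", tier_udf(F.col("total_spend")))
    df.write.mode("overwrite").saveAsTable(args.target_table)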

Performed in-depth data quality research and database analysis, using SQL queries to investigate discrepancies and support issue resolution.

Optimized Power BI and SSAS Tabular queries using DAX Studio by analyzing query plans, execution times, and server timings, leading to improved report performance and faster refresh cycles.

Leveraged DAX Studio for debugging and fine-tuning complex DAX measures, validating data models, and ensuring accuracy in KPIs and business-critical dashboards.

Applied domain expertise in travel, leveraging analytical thinking and client engagement to ensure data solutions supported strategic and operational decisions for global hotel chains.

Environment: Power BI, Oracle, Teradata SQL Assistant, CI/CD pipelines, PostgreSQL, C#, HDFS, MS SQL Server, PL/SQL, Agile Methodology, MySQL, Redshift, AWS, Jenkins.


