
Samir Brahmbhatt

LinkedIn : https://www.linkedin.com/in/samirbrahmbhatt/

Professional Summary:

Passionate Data Engineer / Data Integration developer with 12+ years of experience designing, developing, and implementing data solutions of varying complexity. Combines Data Integration and Data Engineering expertise, building ETL/ELT pipelines for cloud data platforms with a strong focus on AWS S3 data lakes and AWS Redshift data warehouses. Adept at modernizing traditional ETL processes and migrating existing systems to cloud-based architectures using ELT paradigms, applying DevOps and Data Engineering best practices to drive efficiency and business value. Team player who thrives in collaborative, agile environments, supported by strong interpersonal skills.

Core Competencies:

Data Engineering & ELT: Designed and built CI/CD data pipelines for AWS S3 Data Lake and AWS Redshift, transitioning from traditional ETL to modern ELT paradigms.

Cloud Technologies: AWS (S3, Glue, Lambda, Redshift, Redshift Spectrum, AppFlow), DBT, and containerization tools such as Docker and Kubernetes.

Data Integration Tools and Technologies: Extensive experience with Informatica On-Prem and IICS (CDI/CAI/Mass Ingestion).

Programming & Automation: Python (3.8/3.9/3.11), Unix Shell Scripting, Win-Batch Scripting, SQL, PL/SQL, T-SQL.

Database Technologies: Hands-on experience with Redshift, SQL Server, Exadata, Oracle, DB2, and Kimball Dimensional Modeling (Star/Snowflake Schema, SCD Type 1 & 2).

CI/CD Tools and Technologies: Currently using GitLab, Terraform, Docker, Kubernetes, Apache Airflow, TeamCity, Kibana, and Elasticsearch to streamline data workflows.

API Integration: REST, SOAP, WSDL, Swagger, Postman, ElasticSearch API

Technical Skills:

Cloud Platforms: AWS (S3, Glue, Lambda, Redshift, Spectrum, AppFlow)

ETL Tools: Informatica IICS (CDI/CAI), PowerCenter 10.x/9.x, SQL Server DTS, DBT

Streaming & Messaging: Kafka

Programming Languages: Python (3.8/3.9/3.11), Unix Shell Scripting, SQL (T-SQL, PL/SQL)

CI/CD: GitLab, Terraform, Docker, Kubernetes, Apache Airflow, TeamCity

Databases: Redshift, SQL Server, Exadata, Oracle, DB2

Professional Experience:

Data Engineer – Rewards Network – Chicago, IL (April 2015 – Present)

Work Insights & Responsibilities:

Currently building CI/CD data pipelines for the AWS S3 data lake and Redshift, modernizing legacy ETL workflows into an ELT architecture.

Designed, developed, tested, and deployed data solutions using Data Engineering best practices, migrating from an on-premises SQL Server-based system to an AWS S3 data lake and Redshift data warehouse.

Used AWS Glue to extract data from MS SQL Server and Snowflake and load it into the AWS S3 data lake. Leveraged Glue's PySpark engine, DynamicFrames, and auto-scaling to process high data volumes, writing Parquet/Gzip output to S3 so that downstream pipelines can efficiently load the data into the Redshift raw layer.
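
A minimal sketch of this Glue pattern, not the actual job: the catalog database, table, and bucket names below are placeholders, and real jobs would add connection options, bookmarks, and error handling.

```python
# Hypothetical AWS Glue job sketch: read a JDBC source table via the Glue
# Data Catalog and write it to S3 as Parquet for downstream Redshift loads.
import sys

from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# DynamicFrame from a cataloged SQL Server table (names are placeholders).
source_dyf = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",          # hypothetical catalog database
    table_name="dbo_orders",      # hypothetical catalog table
)

# Land the data in the raw zone of the data lake in Parquet format.
glue_context.write_dynamic_frame.from_options(
    frame=source_dyf,
    connection_type="s3",
    connection_options={"path": "s3://example-datalake-raw/orders/"},
    format="parquet",
)

job.commit()
```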

Used AWS AppFlow to extract data from Salesforce and load it into the AWS S3 data lake.

Used AWS Lambda to trigger AWS Glue and AppFlow jobs.
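
A hedged sketch of the Lambda trigger described above, assuming boto3 and placeholder job/flow names; the actual event wiring (S3 notification, EventBridge schedule, etc.) is not shown.

```python
# Hypothetical Lambda handler: start a Glue job run and an AppFlow flow
# when invoked. Job and flow names are placeholders.
import boto3

glue = boto3.client("glue")
appflow = boto3.client("appflow")

def lambda_handler(event, context):
    # Kick off the Glue extraction job.
    glue_run = glue.start_job_run(JobName="extract-orders-to-s3")

    # Kick off the AppFlow flow that pulls Salesforce data into S3.
    flow_run = appflow.start_flow(flowName="salesforce-accounts-to-s3")

    return {
        "glueJobRunId": glue_run["JobRunId"],
        "appflowExecutionId": flow_run["executionId"],
    }
```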

Designed, developed, tested, and deployed custom Python data pipelines that read source S3 files in Parquet format and load the data into the Redshift raw layer. The result is a robust, scalable, and efficient dynamic solution with the following features (a simplified sketch of the load step follows the list):

o Dynamically picks up newly added data sources (tables/Kafka topics) from the S3 bucket.

o Uses the Redshift COPY command to read the Parquet files in S3 and load them into the Redshift raw staging table.

o Dynamically handles source schema changes and drop/create/alter/truncate/update/insert operations on target tables in Redshift.

o Handles Type 2 logic, maintaining history as required by the Data Science team.

o Handles manual reprocessing/recovery of the load.

o Can process the initial historical load and subsequent daily incremental loads.
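
A simplified, hypothetical sketch of the COPY-based load step only; bucket, schema, IAM role, and cluster endpoint are placeholders, and the real pipeline adds schema-change handling, Type 2 history, and recovery logic.

```python
# Hypothetical loader sketch: discover newly landed Parquet prefixes in S3
# and COPY them into Redshift raw staging tables.
import boto3
import psycopg2

S3_BUCKET = "example-datalake-raw"
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"  # placeholder

def discover_sources(prefix="raw/"):
    """List top-level source prefixes (tables/Kafka topics) under the raw zone."""
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=S3_BUCKET, Prefix=prefix, Delimiter="/")
    return [p["Prefix"] for p in resp.get("CommonPrefixes", [])]

def copy_into_raw(conn, source_prefix):
    """COPY the Parquet files for one source into its Redshift raw staging table."""
    table = source_prefix.rstrip("/").split("/")[-1]
    sql = f"""
        COPY raw_stage.{table}
        FROM 's3://{S3_BUCKET}/{source_prefix}'
        IAM_ROLE '{IAM_ROLE}'
        FORMAT AS PARQUET;
    """
    with conn.cursor() as cur:
        cur.execute(sql)
    conn.commit()

if __name__ == "__main__":
    conn = psycopg2.connect(
        host="example-cluster.redshift.amazonaws.com",  # placeholder endpoint
        port=5439, dbname="dw", user="loader", password="***",
    )
    for prefix in discover_sources():
        copy_into_raw(conn, prefix)
    conn.close()
```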

Used Apache Airflow DAGs to schedule, orchestrate, and execute these jobs.
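
A minimal Airflow DAG sketch of this orchestration pattern, assuming the Amazon provider package is installed; the DAG ID, schedule, Glue job name, and task callable are illustrative, not the production DAG.

```python
# Hypothetical Airflow DAG: run the Redshift raw load after the Glue
# extraction finishes. Names and schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

def load_raw_layer():
    # Placeholder: would invoke the COPY-based loader sketched earlier.
    print("Loading Parquet files from S3 into the Redshift raw layer")

with DAG(
    dag_id="s3_to_redshift_raw",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = GlueJobOperator(
        task_id="extract_to_s3",
        job_name="extract-orders-to-s3",   # placeholder Glue job name
    )
    load = PythonOperator(
        task_id="load_redshift_raw",
        python_callable=load_raw_layer,
    )
    extract >> load
```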

Developed Informatica IICS CDI mappings to ingest real-time streaming data from Kafka, replacing legacy AS400 batch processing.

Developed Python-based API integration solutions to extract data from various APIs, including Google Analytics, TypeForm, GravityForm, Elasticsearch, AMEX, and VISA.
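
A hedged sketch of the generic REST extraction pattern behind these integrations; the endpoint, auth token, pagination fields, and S3 locations are placeholders, and each real API differs in its auth and paging scheme.

```python
# Hypothetical REST extraction sketch: page through an API and land the raw
# JSON responses in S3 for downstream processing.
import json

import boto3
import requests

API_URL = "https://api.example.com/v1/responses"   # placeholder endpoint
API_TOKEN = "***"                                   # placeholder credential

def extract_to_s3(bucket="example-datalake-raw", prefix="api/source_x/"):
    s3 = boto3.client("s3")
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    page = 1

    while True:
        resp = requests.get(
            API_URL,
            headers=headers,
            params={"page_size": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:
            break
        # One object per page keeps the landing zone simple to reprocess.
        s3.put_object(
            Bucket=bucket,
            Key=f"{prefix}page_{page:05d}.json",
            Body=json.dumps(items).encode("utf-8"),
        )
        page += 1

if __name__ == "__main__":
    extract_to_s3()
```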

Designed, developed, and deployed a highly visible and challenging real-time underwriting process. The impact was significant: the previous manual approval process took a couple of months, while the near real-time process approves applications in a few hours to a few days. This proved to be a game changer, making the Due Diligence team more efficient and strengthening the confidence of the Sales team on the ground.

Environment: GitLab, Terraform, Python 3.8.x/3.9.x/3.12.x, Docker, Kubernetes, Apache Airflow, AWS S3, AWS Glue, AWS Lambda, AWS AppFlow, AWS Spectrum, AWS Redshift, Informatica Cloud Services [IICS – CAI & CDI], Informatica 9.5.1/10.1.1, Kafka, external APIs, Informatica PowerExchange CDC, SQL Server 2012–2017, MicroStrategy 10/DOMO, AS400 (source), Windows batch scripting, Salesforce (CRM), REST/SOAP web services [WSDL/Swagger], Elasticsearch 6.8, Snowflake, DBT, Prometheus, Grafana, Atlassian [Jira/Confluence]

ETL Developer – Portfolio Recovery Associates – Chicago, IL (Oct 2012 – Mar 2015)

Work Insights:

Successfully migrated existing workflows from reading source data via Oracle Streams CDC to Informatica CDC for reading the change-capture set from the source system.

Successfully repointed existing workflows from the Oracle 11g database to the Exadata database.

Designed and developed a workflow to load an aggregated fact table.

Developed various mappings to load data from multiple sources into targets, using transformations such as Source Qualifier, Expression, Sorter, Lookup (connected/unconnected), Aggregator, Update Strategy, Joiner, Filter, Stored Procedure, and Router.

Used pre/post-session SQL, lookup SQL overrides, and target update overrides in mappings.

Used Informatica pass-through partitions to improve the throughput of snapshot workflows.

Developed Oracle packages and stored procedures for use in Informatica mappings.

Developed mapplets, UDFs, and reusable transformations in Informatica Mapping Designer to promote reusability.

Created workflows/worklets using various tasks like Email, Event-wait and Event-raise, Timer, Scheduler, Control, Decision, Session in the workflow manager.

Developed Oracle tables with range/list/hash partitioning for better data management and improved query performance.

Implemented efficient and effective performance tuning measures to improve the session throughput.

Used parameter files to define mapping parameters and connection information.

Coordinated with the QA team to resolve any defects during UAT.

Environment: Informatica 9.1, Oracle Exadata, Oracle Streams, Informatica CDC, SQL Server, Windows batch scripting

Nike Inc. – Beaverton, OR (Mar 2012 – Sep 2012)

ETL Developer (Consultant - Infosys Technologies)

Responsibilities:

Developed data warehouse solutions based on a star schema, using source-to-target mappings (STMs) and functional specs provided by the business unit.

Developed various mappings to load data from multiple sources into targets, using transformations such as Source Qualifier, Expression, Sorter, Lookup (connected/unconnected), Aggregator, Update Strategy, Joiner, Filter, and Router.

Developed various Type 1 and Type 2 SCDs.

Implemented a common ETL load-control table framework to track load status and better manage restartability in case of load failures.

Extensively used the debugger to debug mappings and determine whether data discrepancies came from the source or from incorrectly applied business rules.

Implemented efficient and effective performance tuning measures to improve the session throughput.

Used parameter files to define mapping parameters and connection information.

Identified bottlenecks at the source level and used SQL overrides to improve query performance.

Coordinated with the QA team to resolve any defects logged in Mercury Quality Center.

Environment: Informatica 8.6.1, Oracle 10g, UNIX, Autosys

Informatica ETL Analyst – Bank of Nova Scotia – Toronto, Canada (Mar 2010 – Feb 2012)

Environment: Informatica 8.6.1, DB2, AIX, Tivoli Workload Scheduler (Maestro)

Education:

Diploma in Electronics Engineering (MSPM/BTE – India)

Bachelor of Computer Applications (MCI/ASU – India)


