Data Engineer Warehouse

Location:
United States
Posted:
October 15, 2025

Resume:

AAKASH KUMAR PATEL

Senior Data Engineer

*******************@*****.*** +1-469-***-****

PROFESSIONAL SUMMARY:

Data Engineer with 9+ years of experience interpreting and analyzing data to drive successful business solutions.

Proficient in statistics, mathematics, and analytics, with excellent knowledge of business operations and analytics tools for effective data analysis.

Extensive experience in the strategic development of data warehouses and in performing data analysis and data mapping for operational data stores, enterprise data warehouses, data marts, and other databases.

Expertise in conceptual, logical, and physical design and development of data models, data normalization, and OLTP and OLAP databases and data warehouses.

Utilized SnowSQL for scripting, querying, and automating Snowflake database operations in data pipeline workflows.

Designed and implemented scalable data pipelines leveraging Snowflake for optimized ETL processing and data warehousing, ensuring high performance and reliability.

Integrated CI/CD pipelines with version control and orchestration tools to ensure seamless, continuous deployment of data pipelines and ETL processes.

Designed and implemented scalable data pipelines using AWS services such as AWS Glue, AWS Lambda, and Amazon S3 to automate ETL workflows and reduce processing time.

Developed and optimized complex data pipelines using Snowflake, leveraging its native SQL capabilities for large-scale data transformations and performance tuning.

Designed and implemented Big Data pipelines using Apache Spark and Hadoop to process over 5TB of structured and unstructured data daily, significantly improving data processing speed and reliability.

Developed and automated data ingestion pipelines using Bash scripts to streamline ETL processes, reducing manual intervention and improving data processing efficiency.

Designed and maintained scalable PostgreSQL databases, ensuring high availability and performance for large-scale data analytics.

Designed and implemented complex data pipelines using SQL to transform and aggregate multi-source data for analytics and reporting.

Designed, deployed, and managed scalable data pipelines on AWS EC2 instances to efficiently process large datasets, ensuring high availability and fault tolerance.

Strong working experience with various Python libraries, such as NumPy and SciPy for mathematical computation and Pandas for data preprocessing and wrangling.

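For illustration, a minimal Python sketch of the kind of Pandas/NumPy preprocessing described above; the file name and column names are hypothetical, not taken from any specific project.

    import numpy as np
    import pandas as pd

    # Hypothetical raw extract; column names are illustrative only.
    df = pd.read_csv("transactions_raw.csv")

    df = df.drop_duplicates(subset="transaction_id")
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df["amount"] = df["amount"].fillna(df["amount"].median())
    df["log_amount"] = np.log1p(df["amount"])  # tame right-skewed values
    df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")

    df.to_parquet("transactions_clean.parquet", index=False)
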
Developed and maintained scalable data pipelines using Apache NiFi to automate data ingestion, transformation, and routing from multiple sources, improving data processing efficiency.

Extensive experience in creating and maintaining source to target data mapping documents.

Developed and maintained scalable data pipelines using Airflow to orchestrate complex workflows, ensuring timely and reliable data ingestion and transformation across multiple systems.

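As a sketch of the orchestration pattern above, a minimal Airflow DAG in Python; the DAG id, task names, and schedule are assumptions for illustration (Airflow 2.4+ syntax).

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from source systems")   # placeholder step

    def transform():
        print("clean and reshape the data")      # placeholder step

    def load():
        print("write results to the warehouse")  # placeholder step

    with DAG(
        dag_id="daily_ingest",                   # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                       # Airflow 2.4+ keyword
        catchup=False,
    ) as dag:
        extract_t = PythonOperator(task_id="extract", python_callable=extract)
        transform_t = PythonOperator(task_id="transform", python_callable=transform)
        load_t = PythonOperator(task_id="load", python_callable=load)
        extract_t >> transform_t >> load_t
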
Designed and implemented scalable data pipelines leveraging AWS Services S3 for efficient storage and retrieval of large datasets in a cloud environment.

Developed and maintained real-time data pipelines using Apache Kafka, enabling seamless streaming of large volumes of data between multiple systems with minimal latency.

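A minimal producer sketch in Python using the kafka-python client; the broker address, topic name, and payload are placeholders, not details from the resume.

    import json

    from kafka import KafkaProducer  # kafka-python client

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",                      # placeholder broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    event = {"order_id": 123, "status": "created"}               # illustrative payload
    producer.send("orders", value=event)                         # hypothetical topic
    producer.flush()                                             # block until delivered
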
Experienced in all facets of the Software Development Life Cycle using Waterfall and Agile/Scrum methodologies.

Implemented serverless Lambda functions to trigger real-time data transformations and ingestion into data lakes, ensuring scalable and cost-effective workflows.

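A minimal sketch of an S3-triggered Lambda handler in Python with boto3; the bucket names, key layout, and record fields are assumptions for illustration.

    import json
    import urllib.parse

    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        # Invoked by an S3 ObjectCreated notification.
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        raw = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
        # Keep only the fields downstream consumers need (illustrative transform).
        cleaned = [{"id": r["id"], "amount": float(r["amount"])} for r in raw]

        s3.put_object(
            Bucket="curated-data-lake",          # hypothetical target bucket
            Key=f"curated/{key}",
            Body=json.dumps(cleaned).encode("utf-8"),
        )
        return {"status": "ok", "records": len(cleaned)}
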
Created parameterized and modular shell scripts to orchestrate data flows between Hadoop, Hive, and relational databases.

TECHNICAL SKILLS

Programming Languages

Python, SQL

Frameworks

Django, Pyramid, MVT, Flask, Angular 15/13/12/11/10/9/8/7/6/5/4/2, AngularJS

IDE

MyEclipse, PyCharm

Databases

Oracle, MySQL, PostgreSQL, MongoDB

Web Technologies

Angular, HTML5, CSS3, XML, AJAX, JSON

Methodologies

Agile, Scrum and Waterfall

Version control

SVN, Git, GitHub

Cloud Platform

Amazon Web Services

Bug Tracking tools

JIRA

Operating Systems

Windows, Linux

PROFESSIONAL EXPERIENCE

First National Financial Services, Dallas, TX Jun 2022 – Present

Senior Data Engineer

Worked closely with stakeholders and subject matter experts to elicit and gather business data requirements.

Used Pandas, NumPy, Seaborn, SciPy, and Matplotlib in Python to develop machine learning solutions, applying algorithms such as linear regression and multivariate regression for data analysis.

Developed custom parsers using PyParsing to extract structured information from complex log files, XML, and non-standard text formats.

Utilized Python to design and automate end-to-end data pipelines, ensuring seamless integration between data ingestion, transformation, and storage systems.

Developed and maintained complex SQL queries, stored procedures, and user-defined functions within Snowflake, improving data accessibility for analytics teams.

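A minimal sketch of running a Snowflake query from Python with the snowflake-connector-python package; the credentials, warehouse, and table names are placeholders.

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",        # placeholder credentials
        user="etl_user",
        password="***",
        warehouse="ANALYTICS_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )

    query = """
        SELECT customer_id, SUM(amount) AS total_amount
        FROM transactions            -- hypothetical table
        GROUP BY customer_id
    """

    cur = conn.cursor()
    try:
        cur.execute(query)
        for customer_id, total_amount in cur.fetchall():
            print(customer_id, total_amount)
    finally:
        cur.close()
        conn.close()
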
Utilized Bash scripting to monitor, troubleshoot, and optimize large-scale data workflows, ensuring timely data availability and system reliability.

Designed and implemented data warehousing solutions on Snowflake, ensuring scalable architecture and efficient storage usage through features like micro-partitioning and zero-copy cloning.

Optimized performance of ETL workflows running on EC2 by configuring instance types and storage options, reducing data processing time.

Leveraged AWS Redshift and Amazon RDS for efficient data storage, query optimization, and real-time analytics, enabling faster business insights.

Implemented data transformation and aggregation jobs in Hive to support ETL workflows, ensuring data accuracy and consistency across multiple data sources.

Integrated multiple data sources into Snowflake using Snowpipe and bulk loading techniques to facilitate near real-time data ingestion and transformation.

Developed and maintained Big Data ingestion workflows using Kafka and Flume, ensuring real-time data availability for analytics and reporting purposes.

Spearheaded the integration of Business Intelligence (BI) tools (e.g., Power BI, Tableau) with enterprise data warehouse solutions to deliver actionable insights to stakeholders.

Designed and implemented CI/CD pipelines to automate data workflow deployments, reducing manual errors and accelerating delivery cycles.

Developed automated data quality checks using SQL, ensuring data integrity across ETL processes.

Implemented Kafka Connectors to integrate various data sources and sinks, automating data synchronization and reducing manual intervention.

Automated monitoring and alerting for ETL processes with Airflow DAGs, improving pipeline stability and reducing data latency by proactively addressing workflow failures.

Integrated Snowflake with various data sources (e.g., AWS S3, Kafka, and REST APIs) for seamless ingestion and real-time analytics, improving data availability across departments.

Collaborated with data analysts and scientists to write advanced SQL queries for exploratory data analysis and model input preparation.

Implemented real-time data flow monitoring and error handling in Apache NiFi, ensuring data quality and reducing pipeline failures by proactively troubleshooting issues.

Built machine learning workflows using Python frameworks like Scikit-learn, TensorFlow, and PyTorch to support model training and deployment.

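As an illustration of such a workflow, a minimal scikit-learn pipeline in Python trained on synthetic data; the feature set and model choice are assumptions, not project specifics.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic data standing in for the real feature set.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))
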
Secure Life Insurance, Fort Worth, TX Jan 2021 – May 2022

Sr Data Engineer

Used Pandas, NumPy, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.

Preprocessed data by cleaning it and selecting appropriate fields with feature selection methods such as univariate selection using the scikit-learn library and feature importance using an Extra Trees classifier.

Designed and implemented enterprise-scale data warehouse architectures using Snowflake, Redshift, and BigQuery, supporting petabyte-scale data workloads and analytics.

Utilized GROUP BY and PARTITION BY clauses for advanced data aggregation, window functions, and report generation tasks.

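A small illustrative example of combining GROUP BY with a PARTITION BY window, embedded as a SQL string in Python; the table and column names are hypothetical.

    # Monthly revenue per region plus a running total within each region.
    MONTHLY_REVENUE_SQL = """
        SELECT
            region,
            order_month,
            SUM(amount) AS monthly_revenue,
            SUM(SUM(amount)) OVER (
                PARTITION BY region
                ORDER BY order_month
            ) AS running_revenue
        FROM orders                  -- hypothetical table
        GROUP BY region, order_month
    """
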
Leveraged Python for data validation, anomaly detection, and quality assurance in AI datasets to maintain high model performance.

Optimized Snowflake data models by employing clustering keys, partitioning, and micro-partition pruning, significantly reducing query execution times.

Utilized Scala's functional programming paradigms to create modular and reusable code components, improving maintainability.

Automated error handling, logging, and notifications in Informatica ETL workflows to improve operational efficiency and reduce downtime.

Optimized Big Data storage solutions on AWS S3 and HDFS, reducing storage costs by 30% through efficient partitioning and data lifecycle management.

Managed role-based access controls and security policies within Snowflake to safeguard sensitive data and comply with organizational governance standards.

Developed and optimized complex data pipelines using Hive to efficiently process and analyze large-scale datasets on Hadoop clusters, improving query performance.

Collaborated with data scientists to prepare ML-ready datasets using PySpark, ensuring data consistency and lineage through all processing stages.

Implemented backup, restore, and disaster recovery strategies for MongoDB and Cassandra, ensuring high availability and data resilience.

Automated batch job scheduling and error handling within Talend ETL workflows, enhancing operational efficiency.

Created and maintained Snowflake roles, users, and access controls to enforce enterprise-grade security and data governance policies.

Implemented robust unit testing and CI/CD pipelines for Python codebases using tools like PyTest, GitHub Actions, and Docker.

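A minimal PyTest sketch of the kind of unit test used in such pipelines; the function under test and its behavior are hypothetical.

    import pytest

    def normalize_amount(value: str) -> float:
        # Illustrative helper: parse a currency string like "$1,200.50" into a float.
        cleaned = value.replace("$", "").replace(",", "").strip()
        if not cleaned:
            raise ValueError("empty amount")
        return float(cleaned)

    def test_normalize_strips_currency_symbols():
        assert normalize_amount("$1,200.50") == 1200.50

    def test_normalize_rejects_empty_input():
        with pytest.raises(ValueError):
            normalize_amount("  ")
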
Collaborated with Data Scientists and Analysts to create Big Data models that powered advanced analytics and machine learning use cases across marketing and operations teams.

Automated ELT workflows using Snowflake tasks, streams, and stored procedures, significantly reducing manual intervention and improving data freshness.

Developed and maintained data pipelines using AWS Lambda to automate ETL processes, improving data processing efficiency.

Mentored junior engineers on SQL best practices, performance tuning, and modular query writing for scalability.

Fidelity Banking Services, Hyderabad, India May 2019 – Dec 2020

Data Engineer

Applied supervised machine learning algorithms such as logistic regression, decision tree, and random forest for predictive modeling across various types of problems.

Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.

Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.

Set up storage and data analysis tools in the AWS cloud computing infrastructure.

Developed tools using Python, shell scripting, and XML to automate routine manual tasks.

Collaborated with analytics and BI teams to deliver consistent data models on Snowflake, accelerating report generation and decision-making processes.

Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.

Implemented performance tuning and debugging of Scala applications to reduce job runtimes and optimize resource utilization.

Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).

Built Big Data monitoring and alerting systems using Airflow and Prometheus, ensuring high availability and quick incident response for critical data pipelines.

Collaborated with data scientists and analysts to architect Hadoop-based solutions that supported advanced analytics and machine learning workloads.

Collaborated with cross-functional teams to migrate legacy ETL workloads to Snowflake, enhancing scalability and reducing overall operational costs.

Implemented data quality checks and validation rules within Informatica ETL workflows to ensure accuracy and completeness of critical business data.

Utilized Perl for parsing large log files and generating real-time analytics dashboards, enhancing monitoring and alerting capabilities.

Integrated DBT runs into CI/CD pipelines, enabling automated deployment and monitoring of data workflows in production environments.

Configured secure data sharing in Snowflake to enable real-time, governed data access for external partners and internal teams.

Created interactive dashboards and data visualizations in Python using Plotly and Dash for presenting AI insights to stakeholders.

Maintained a C++ based data quality validation tool that ensured consistency across multiple ETL stages.

Developed NLP models for topic extraction and sentiment analysis.

Led the migration of legacy ETL processes to modern Big Data frameworks, enhancing scalability, fault tolerance, and maintainability across the data infrastructure.

Monitored and optimized Talend job performance, identifying bottlenecks and tuning jobs to handle multi-million record datasets with minimal latency.

Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).

Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.

Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.

Updated Python scripts to match training data with our database stored in AWS CloudSearch so that each document could be assigned a response label for further classification.

Managed datasets using Pandas DataFrames and MySQL; queried the MySQL relational database (RDBMS) from Python using the Python MySQL connector (MySQLdb) package to retrieve information.

Performed data cleaning and feature selection using the MLlib package in Spark and worked with deep learning frameworks such as TensorFlow.

Reliant Insurance Solutions, Pune, India Apr 2017 – Apr 2019

Data Engineer

Analyzed data using Python, PySpark, and Spark SQL to perform real-time stream analytics.

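A minimal PySpark Structured Streaming sketch of windowed stream analytics; it uses the built-in rate source in place of a real event feed, so the source and window size are assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("stream_demo").getOrCreate()

    # Built-in "rate" source stands in for a real feed such as Kafka.
    events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    # Count events per 10-second window in near real time.
    counts = events.groupBy(F.window(F.col("timestamp"), "10 seconds")).count()

    query = (
        counts.writeStream
        .outputMode("complete")
        .format("console")
        .start()
    )
    query.awaitTermination()
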
Used the Pandas, NumPy, Matplotlib, and scikit-learn packages in Python for developing various machine learning algorithms.

Wrote Python code and actively participated in automating processes.

Developed and designed a Python-based API (RESTful web service) to interact with the company’s website.

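A minimal Flask sketch of a RESTful endpoint of the sort described above; the routes, resource name, and in-memory store are purely illustrative.

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    plans = {}  # in-memory store standing in for the real backend database

    @app.route("/plans/<int:plan_id>", methods=["GET"])
    def get_plan(plan_id):
        plan = plans.get(plan_id)
        if plan is None:
            return jsonify({"error": "not found"}), 404
        return jsonify(plan)

    @app.route("/plans", methods=["POST"])
    def create_plan():
        payload = request.get_json()
        plan_id = len(plans) + 1
        plans[plan_id] = {"id": plan_id, **payload}
        return jsonify(plans[plan_id]), 201

    if __name__ == "__main__":
        app.run(debug=True)
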
Created business logic in Python to implement planning and tracking functions.

Created a Git repository and added the project to GitHub.

Maintained version control and deployment automation for Talend projects using Git and CI/CD tools, ensuring smooth production rollouts.

Utilized Agile processes and JIRA issue management to track sprint cycles.

Normalized data distributions by applying techniques such as missing value treatment, outlier treatment, and hypothesis testing.

Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.

Implemented Agile Methodology for building an internal application.

Worked closely with other Data Scientists to understand data requirements for experiments and to build domain knowledge.

Axis Data Systems, Ahmedabad, India Mar 2016 – Mar 2017

Data Analyst

Worked with users to identify the most appropriate source of record and to profile the data required for sales and service.

Documented the complete process flow, describing program development, logic, testing, implementation, application integration, and coding.

Worked with internal architects, assisting in the development of current- and target-state data architectures.

Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.

Involved in defining source-to-target data mappings, business rules, and business and data definitions.

Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing teams.

Involved in configuration management, creating and maintaining an up-to-date record of all components of the development effort, including code and schema designs.

Developed the financial reporting requirements by analyzing the existing Business Objects reports.

Responsible for maintaining the Enterprise Metadata Library with any changes or updates.

Prepared data quality and traceability documents for each source interface.

EDUCATION

Diploma in Computer Science – Parul University, Vadodara, Gujarat – 2016
