Data Engineer

Location:
Portland, OR
Posted:
April 29, 2026

Resume:

Jay M. Trivedi

***.***********@*****.***

+1-302-***-****

Summary:

Data engineer with over 8 years of hands-on experience in managing data platforms on AWS and Azure and ETL.

Worked with various companies to develop and maintain their data architectures, ensuring the reliability and

efficiency of their data systems.

Developed and deployed SSIS packages for seamless data extraction, transformation, and loading (ETL) processes.

Utilized SSAS to design and optimize multidimensional cubes and wrote MDX queries for complex data analysis,

enhancing business intelligence capabilities.

Over the years, I have developed a strong understanding of cloud-based data management, data integration, ETL

processing, data warehousing, and data governance.

Experience working with both AWS and Azure has allowed me to build a diverse skill set, including proficiency in

services such as AWS S3, EC2, EMR, Redshift, Azure Blob Storage, Azure Data Factory, Azure Databricks, and

Azure Synapse Analytics.

Experience with Big Data technologies such as Hadoop, PySpark, Hive, and HDFS.

Proficient in frameworks such as Django, Flask, and FastAPI, with extensive frontend experience in React,

Bootstrap, and Vue.js, adept at building responsive and scalable web applications. Designed and implemented a

bronze, silver, and gold layer testing framework for a large-scale Azure Data Lake.

Created data warehousing solutions that allow organizations to store large amounts of data while ensuring that

data quality is maintained.

Experienced in writing sophisticated Scala and Python scripts for automation and data manipulation, improving

operational workflows.

Strong proficiency in Agile methodologies, Test-Driven Development (TDD), and utilizing tools like JIRA,

Confluence, and Kafka for efficient project management and collaboration.

Advised clients on leveraging messaging MQ and Kafka Connect to build data integration solutions.

Furthermore, I have created data analytics platforms that provide business insights that help organizations make

data-driven decisions, and I am passionate about working with data and leveraging new technologies to create

solutions that meet the needs of businesses.

Exceptional analytical skills, attention to detail, and the ability to work under pressure have allowed me to excel in

my career as a data engineer.

Technical Skills:

Big Data Platform: HDFS, MapReduce, Hive, Spark, HBase, Pig, Sqoop, Airflow, Kafka, Snowflake, ADLS (Data

Lakes), Blob Storage, Logic Apps, ADF, Databricks, Azure Event Hubs, S3, EMR, Redshift, Pentaho.

Tools: SSIS, SSRS, SSAS, DTS, Informatica 10.4 / 10.5

Cloud: AWS S3, EC2, EMR, Redshift, ETL

Azure Blob Storage, ADL, Data Factory (ADF), Databricks, Delta Lake, PySpark, Azure Synapse

Scripting Languages: Shell Scripting, Unix Script

Programming Languages: Scala, Python, C#, SQL

RDBMS: MySQL 5.5, MSSQL 2014/2015/2019, Oracle 11G, Redshift, Synapse, Python, SQL, PL/SQL.

IDEs: IntelliJ, Microsoft Visual Studio, Jupyter

Virtual Machines: VMWare, Virtual Box

Operating Systems: Unix, Linux, Windows 7/8/10/11

Data Visualization: Power BI

Pipeline: CI/CD, Azure DevOps

Education:

1. Master of Computer Applications (IT), Chandra Mohan Jha University, India 2013

2. Bachelor of Commerce (Accounts), Ashwin Bhai A Patel Commerce College, India 2010

WORK EXPERIENCE:

Client: Walmart, New Jersey (Remote) June 2023 - Present

Data Engineer

Responsibilities:

Implemented big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive for ingesting

data from diverse sources and processing Data-at-Rest.

Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.

Used Hadoop technologies like Spark and Hive, including the PySpark library, to create Spark data frames

and convert them to regular pandas data frames for analysis.

Played a key role in migrating Hadoop cluster on Azure and defined different read/write strategies.

Developed Spark and Python code for a regular expression (regex) project in the Hadoop/Hive environment with

Linux/Windows for big data resources.

Worked with data investigation, discovery and mapping tools to scan every single data record from many

sources.

Imported millions of structured data from relational databases using Sqoop import to process using Spark and

stored the data into HDFS in ORC format.

Executed multiple Spark SQL queries after forming the Database to gather specific data corresponding to an

image.

Developed a prototype for big data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV,

JSON, distributed files.

Implemented data ingestion from various source systems using Sqoop and PySpark.

Knowledgeable in partitioning Kafka messages and setting replication factors in a Kafka cluster;

implemented reprocessing of failed messages in Kafka using offset IDs.

Extracted, transformed, and loaded data from source systems to Azure Data.

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and

processed the data in Azure Databricks.

Experienced in designing, developing, and maintaining end-to-end ETL workflows using Pentaho Data

Integration. Skilled in extracting, transforming, and loading large datasets from diverse data sources

(databases, APIs, flat files, cloud platforms) into data warehouses and analytical systems.

Implemented Copy activity, Custom Azure Data Factory Pipeline Activities.

Analysed existing systems and proposed improvements in processes and systems for usage of modern

scheduling tools like Airflow and migrating the legacy systems into an Enterprise data lake built on Azure Cloud.

Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store, HDInsight.

Implemented software enhancements to port legacy software systems to Spark and Hadoop ecosystems on Azure

Cloud.

Experience in building and architecting multiple Data pipelines, end to end ETL and ELT process for Data

ingestion and transformation in Azure.

Developed scalable data pipelines using Databricks and Apache Spark technologies.

Collaborated with data scientists to optimize machine learning models on the Databricks platform.

Implemented data lake solutions leveraging Delta Lake for efficient data storage.

Automated ETL processes to enhance data processing efficiency and reliability.

Led cross-functional teams in deploying data-driven solutions across departments.

Conducted workshops to train teams on advanced Databricks functionalities and best practices.

Integrated Databricks with cloud services like AWS, Azure, or Google Cloud.

Analyzed large datasets to extract actionable insights and drive business decisions.

Designed and implemented real-time data streaming applications using Databricks.

Mentored junior engineers in data engineering and Databricks platform usage.

Development of web applications using Python frameworks such as Django.

Extensive working knowledge in the entire lifecycle of the projects including Design, Development and

Deployment, Testing and Implementation and support.

Designed and Implemented Error-Free Data Warehouse-ETL and Hadoop Integration.

Enhancements to conventional data warehouses based on the STAR schema, data model updates, and Tableau

data analytics and reporting.

Created tables, functions, views, triggers, indexes, and stored procedures; wrote complex queries and joins; performed query

optimization.

Client: HCL Technologies, TX (Remote) Feb 2020 - May 2023

ETL Developer / Data Engineer

Responsibilities:

Created, manipulated, and supported SQL Server databases.

Involved in data modelling and the physical and logical design of databases.

Created Stored Procedures, Triggers, Indexes, User defined Functions, Constraints etc. on various database

objects to obtain the required results.

Imported and exported data between servers using tools like Data Transformation Services (DTS).

Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries.

Transferred data from various data sources/business systems including MS Excel, MS Access, Flat Files etc. to

SQL Server using SSIS/DTS using various features like data conversion etc. Also Created derived columns from

the present columns for the given requirements.

Supported the team in resolving SQL Server Reporting Services and T-SQL issues; proficient in creating

different types of reports such as Cross-Tab, Conditional, Drill-down, Top N, Summary, Form, OLAP and Sub

reports, and formatting them.

Created Spark applications with PySpark and utilized Python for data engineering in the Spark framework.

Created logging for ETL load at package level and task level to log number of records processed by each

package and each task in a package using SSIS.

Developed, monitored, and deployed SSIS packages.

Worked on all report types, such as tables, matrices, charts, and sub-reports.

Experienced in designing, implementing, and managing data solutions on the Azure platform.

Experienced in ingesting data from various sources such as databases, files, APIs, and streaming platforms into

Azure. Used Azure Data Factory to orchestrate the extraction, transformation, and loading (ETL) of data from

an on-premises SQL Server database to Azure Blob Storage.

Implemented data transformation and processing tasks to prepare the data for analysis and reporting.

Used Azure Databricks and Azure Synapse Analytics for large-scale data transformations with tools like

Apache Spark.

Implemented processing of streaming data from sources like IoT devices, social media feeds, or log files, and

configured and optimized streaming pipelines to extract insights from live data streams.

Data Governance and Security: set up data access controls, implemented encryption, monitored data quality, and

established data retention policies.

Proficient in creating transformation and job workflows, implementing data cleansing, aggregation, and

validation logic, and scheduling automation using Pentaho Scheduler. Hands-on experience with performance

tuning, error handling, and logging mechanisms to ensure high data quality and efficient processing.

Designed and implemented ETL pipelines using Pentaho Data Integration (PDI) to extract, transform, and load

large datasets from multiple sources into enterprise data warehouses.

Client: Abbott, IL (Remote) July 2018 - Jan 2020

MSBI Developer / Data Engineer

Responsibilities:

Designed SSIS packages to extract, transform, and load (ETL) the existing data into SQL Server from different

environments for the SSAS cubes.

Created logical and physical designs of the database and ER Diagrams for Relational and Dimensional databases

using Erwin.

Developed entire frontend and backend modules using Python on Django Web Framework.

Extracted data from Oracle relational databases and flat files.

Developed complex transformations and Mapplets using Informatica PowerCenter 8.6.1 to extract, transform, and

load data into Operational Data Store (ODS).

Led, created, and launched new automated testing tools and accelerators for SOA services and data driven

automation built within our practice.

Designed complex mappings using Source Qualifier, Joiners, Lookups (Connected and Unconnected) and

Expression, Filters, Router, Aggregator, Sorter, Update Strategy, Stored procedure and Normalizer

transformations.

Ensured the data consistency by cross-checking sampled data upon migration between the database

environments.

Experience of using Python modules like NumPy, Matplotlib, Pickle, Pandas, SciPy, wxPython, PyTables, PyQt,

etc., for

Built Spark applications using PySpark, and used the Python programming language for data engineering in the Spark

framework.

Created Sessions, Sequential and Concurrent sessions for proper execution of mappings in workflow manager.

Provided SSRS and SSIS support for internal IT projects requiring report developments.

Designing and implementing scalable and secure data processing pipelines using Azure Data Factory, Azure

Databricks, and other Azure services.

PySpark-based pipelines were created utilizing spark data frame operations to load data into Data Lake, using

EMR for job execution and AWS S3 for storage.

Managing and optimizing data storage using Azure Data Lake Storage, Azure SQL Data Warehouse, and Azure

Cosmos DB.

Developing data models and maintaining data architecture to support data analytics and business intelligence

reporting.

Ensuring data quality and consistency through data cleaning, transformation, and integration processes.

Monitoring and troubleshooting data-related issues within the Azure environment to maintain high availability

and performance.

Developed high-performance and scalable backend services using Python and FastAPI, improving response

times by 40%.

Implementing data security measures, including encryption, access controls, and auditing, to protect sensitive

information.

Utilized NumPy and Pandas for advanced data cleaning, transformation, and feature engineering tasks to

prepare data for analytics and machine learning workflows.

Developed and optimized fact tables and dimension tables based on business requirements, ensuring accurate

and efficient data modeling for reporting and analytics.

Implemented Star Schema and Snowflake Schema designs to support BI tools and performance-optimized

querying in the data warehouse.

Automating data pipelines and workflows to streamline data ingestion, processing, and distribution tasks.

Utilizing Azure's analytics services, such as Azure Synapse Analytics, to provide insights and support data-

driven decision-making.

Providing guidance and support for data governance, including metadata management, data lineage, and data

cataloging.

Client: One Main Financial, IN Oct 2017 - June 2018

Data Engineer/Big Data Engineer

Responsibilities:

Hands-on experience ingesting data from various sources such as databases, files, APIs, and streaming

platforms into Azure. Used Azure Data Factory to orchestrate the extraction, transformation, and loading (ETL)

of data from an on-premises SQL Server database to Azure Blob Storage.

Worked on data transformation and processing tasks to prepare the data for analysis and reporting. Used Azure

Databricks and Azure Synapse Analytics for large-scale data transformations with tools like Apache

Spark.

Experience in data storage and management using Azure SQL Database or Azure Cosmos DB for structured

data, Azure Data Lake Storage for big data and unstructured data, and Azure Blob Storage for file storage.

Implemented data warehousing solutions using Azure Synapse Analytics (formerly Azure SQL Data

Warehouse), which offers scalable and distributed analytics capabilities for handling large datasets.

Collaborated with data scientists and analysts to design and implement data models for analytical workloads.

Used Azure Analysis Services or Azure Databricks to build and deploy models that support

interactive data analysis.

Implemented processing of streaming data from sources like IoT devices, social media feeds, or log files, and

configured and optimized streaming pipelines to extract insights from live data streams.

Experience in data governance and security: set up data access controls, implemented encryption, monitored

data quality, and established data retention policies.

Automated data pipelines, monitored data processes, and managed deployments using tools like

Azure DevOps, Azure Monitor, and Azure Automation to achieve continuous integration and delivery.

Developed a prototype for big data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV,

JSON, and distributed files.

Developed and maintained web applications using Python and Django framework, following MVC architecture

principles.

Implemented data ingestion from various source systems using Sqoop and PySpark.

Knowledgeable in partitioning Kafka messages and setting replication factors in a Kafka cluster;

implemented reprocessing of failed messages in Kafka using offset IDs.

Reviewed Kafka cluster configurations and provided best practices to get peak performance.

Extracted, transformed, and loaded data from source systems to Azure Data.

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and

processed the data in Azure Databricks.

Implemented Copy activity, Custom Azure Data Factory Pipeline Activities.

Automated data quality checks and validation processes within Pentaho to maintain high data accuracy and

consistency.

Collaborated with data analysts and BI teams to deliver reliable datasets for dashboards, KPIs, and ad-hoc

analysis.

Developed web applications using Python and Flask, focusing on efficient ORM with SQL Server databases via

SQLAlchemy.

Analysed existing systems and proposed improvements in processes and systems for usage of modern

scheduling tools like Airflow and migrating the legacy systems into an Enterprise data lake built on Azure Cloud.

Experience in building and architecting multiple Data pipelines, end to end ETL and ELT process for Data

ingestion and transformation in Azure.

Designed and Implemented Error-Free Data Warehouse-ETL and Hadoop Integration.

Enhancements to conventional data warehouses based on the STAR schema, data model updates, and Tableau

data analytics and reporting.


