Jay M. Trivedi
***.***********@*****.***
Summary:
Data engineer with over 8 years of hands-on experience managing data platforms and ETL on AWS and Azure.
Worked with various companies to develop and maintain their data architectures, ensuring the reliability and
efficiency of their data systems.
Developed and deployed SSIS packages for seamless data extraction, transformation, and loading (ETL) processes.
Utilized SSAS to design and optimize multidimensional cubes and wrote MDX queries for complex data analysis,
enhancing business intelligence capabilities.
Over the years, I have developed a strong understanding of cloud-based data management, data integration, ETL
processing, data warehousing, and data governance.
Experience working with both AWS and Azure has allowed me to build a diverse skill set, including proficiency in
services such as AWS S3, EC2, EMR, Redshift, Azure Blob Storage, Azure Data Factory, Azure Databricks, and
Azure Synapse Analytics.
Experience with Big Data technologies such as Hadoop, PySpark, Hive, and HDFS.
Proficient in frameworks such as Django, Flask, and FastAPI, with extensive frontend experience in React,
Bootstrap, and Vue.js; adept at building responsive and scalable web applications.
Designed and implemented a bronze, silver, and gold layer testing framework for a large-scale Azure Data Lake.
Created data warehousing solutions that allow organizations to store large amounts of data while ensuring that
data quality is maintained.
Experienced in writing sophisticated Scala and Python scripts for automation and data manipulation, improving
operational workflows.
Strong proficiency in Agile methodologies and Test-Driven Development (TDD), using tools like JIRA and
Confluence for efficient project management and collaboration.
Advised clients on leveraging message queues (MQ) and Kafka Connect to build data integration solutions.
Created data analytics platforms that provide business insights and help organizations make data-driven
decisions. Passionate about working with data and leveraging new technologies to create solutions that meet
the needs of businesses.
Exceptional analytical skills, attention to detail, and the ability to work under pressure have allowed me to excel in
my career as a data engineer.
Technical Skills:
Big Data Platform: HDFS, MapReduce, Hive, Spark, HBase, Pig, Sqoop, Airflow, Kafka, Snowflake, ADLS (Data
Lakes), Blob Storage, Logic Apps, ADF, Databricks, Azure Event Hubs, S3, EMR, Redshift, Pentaho.
Tools: SSIS, SSRS, SSAS, DTS, Informatica 10.4 / 10.5
Cloud: AWS S3, EC2, EMR, Redshift, ETL
Azure Blob Storage, ADL, Data Factory (ADF), Databricks, Delta Lake, PySpark, Azure Synapse
Scripting Languages: Shell Scripting, Unix Script
Programming Languages: Scala, Python, C#, SQL
RDBMS: MySQL 5.5, MSSQL 2014/2015/2019, Oracle 11g, Redshift, Synapse, SQL, PL/SQL.
IDEs: IntelliJ, Microsoft Visual Studio, Jupyter
Virtual Machines: VMware, VirtualBox
Operating Systems: Unix, Linux, Windows 7/8/10/11
Data Visualization: Power BI
Pipeline: CI/CD, Azure DevOps
Education:
1. Master of Computer Applications (IT), Chandra Mohan Jha University, India 2013
2. Bachelor of Commerce (Accounts), Ashwin Bhai A Patel Commerce College, India 2010
WORK EXPERIENCE:
Client: Walmart, New Jersey (Remote) June 2023 - Present
Data Engineer
Responsibilities:
Implemented big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive for ingesting
data from diverse sources and processing data at rest.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Used Hadoop technologies like Spark and Hive, including the PySpark library, to create Spark DataFrames
and convert them to pandas DataFrames for analysis.
Played a key role in migrating the Hadoop cluster to Azure and defined different read/write strategies.
Developed a Spark and Python regular expression (regex) project in the Hadoop/Hive environment with
Linux/Windows for big data resources.
Worked with data investigation, discovery and mapping tools to scan every single data record from many
sources.
Imported millions of structured records from relational databases using Sqoop for processing in Spark and
stored the data in HDFS in ORC format.
Executed multiple Spark SQL queries after building the database to gather specific data corresponding to an
image.
Developed a prototype for big data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV,
JSON, and distributed files.
Implemented data ingestion from various source systems using Sqoop and PySpark.
Knowledgeable in partitioning Kafka messages and setting replication factors in a Kafka cluster; implemented
reprocessing of failed messages in Kafka using offsets.
Extracted, transformed, and loaded data from source systems into Azure data services.
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and
processed the data in Azure Databricks.
Experienced in designing, developing, and maintaining end-to-end ETL workflows using Pentaho Data
Integration. Skilled in extracting, transforming, and loading large datasets from diverse data sources
(databases, APIs, flat files, cloud platforms) into data warehouses and analytical systems.
Implemented Copy activities and custom Azure Data Factory pipeline activities.
Analyzed existing systems and proposed process and system improvements, including the use of modern
scheduling tools like Airflow and the migration of legacy systems into an enterprise data lake built on Azure Cloud.
Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
Implemented software enhancements to port legacy software systems to Spark and Hadoop ecosystems on Azure
Cloud.
Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data
ingestion and transformation in Azure.
Developed scalable data pipelines using Databricks and Apache Spark technologies.
Collaborated with data scientists to optimize machine learning models on the Databricks platform.
Implemented data lake solutions leveraging Delta Lake for efficient data storage.
Automated ETL processes to enhance data processing efficiency and reliability.
Led cross-functional teams in deploying data-driven solutions across departments.
Conducted workshops to train teams on advanced Databricks functionalities and best practices.
Integrated Databricks with cloud services like AWS, Azure, or Google Cloud.
Analyzed large datasets to extract actionable insights and drive business decisions.
Designed and implemented real-time data streaming applications using Databricks.
Mentored junior engineers in data engineering and Databricks platform usage.
Developed web applications using Python frameworks such as Django.
Extensive working knowledge of the entire project lifecycle, including design, development,
deployment, testing, implementation, and support.
Designed and implemented error-free data warehouse ETL and Hadoop integration.
Enhanced conventional data warehouses based on the star schema, updated data models, and delivered Tableau
data analytics and reporting.
Created tables, functions, views, triggers, indexes, and stored procedures; wrote complex queries and joins; and
performed query optimization.
Client: HCL Technologies, TX (Remote) Feb 2020 - May 2023
ETL Developer / Data Engineer
Responsibilities:
Created, manipulated, and supported SQL Server databases.
Involved in data modeling and the physical and logical design of databases.
Created stored procedures, triggers, indexes, user-defined functions, constraints, etc. on various database
objects to obtain the required results.
Imported and exported data between servers using tools like Data Transformation Services (DTS).
Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries.
Transferred data from various data sources/business systems, including MS Excel, MS Access, and flat files, to
SQL Server using SSIS/DTS features such as data conversion. Also created derived columns from existing
columns per the given requirements.
Supported the team in resolving SQL Server Reporting Services and T-SQL issues. Proficient in creating and
formatting different types of reports, such as Cross-Tab, Conditional, Drill-down, Top N, Summary, Form, OLAP,
and subreports.
Created Spark applications with PySpark and utilized Python for data engineering in the Spark framework.
Created logging for ETL load at package level and task level to log number of records processed by each
package and each task in a package using SSIS.
Developed, monitored, and deployed SSIS packages.
Worked on all report types, such as tables, matrices, charts, and subreports.
Experienced in designing, implementing, and managing data solutions on the Azure platform.
Experienced in ingesting data from various sources, such as databases, files, APIs, and streaming platforms, into
Azure. Used Azure Data Factory to orchestrate the extraction, transformation, and loading (ETL) of data from
an on-premises SQL Server database to Azure Blob Storage.
Implemented data transformation and processing tasks to prepare the data for analysis and reporting.
Used Azure Databricks and Azure Synapse Analytics for large-scale data transformations with tools like
Apache Spark.
Implemented processing of streaming data from sources like IoT devices, social media feeds, and log files, and
configured and optimized streaming pipelines to extract insights from live data streams.
Data governance and security: set up data access controls, implemented encryption, monitored data quality, and
established data retention policies.
Proficient in creating transformation and job workflows, implementing data cleansing, aggregation, and
validation logic, and scheduling automation using Pentaho Scheduler. Hands-on experience with performance
tuning, error handling, and logging mechanisms to ensure high data quality and efficient processing.
Designed and implemented ETL pipelines using Pentaho Data Integration (PDI) to extract, transform, and load
large datasets from multiple sources into enterprise data warehouses.
Client: Abbott, IL (Remote) July 2018 - Jan 2020
MSBI Developer / Data Engineer
Responsibilities:
Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different
environments for the SSAS cubes.
Created logical and physical designs of the database and ER Diagrams for Relational and Dimensional databases
using Erwin.
Developed entire frontend and backend modules using Python on Django Web Framework.
Extracted data from Oracle relational databases and flat files.
Developed complex transformations and mapplets using Informatica PowerCenter 8.6.1 to extract, transform, and
load data into an Operational Data Store (ODS).
Led the creation and launch of new automated testing tools and accelerators for SOA services and data-driven
automation built within our practice.
Designed complex mappings using Source Qualifier, Joiners, Lookups (Connected and Unconnected) and
Expression, Filters, Router, Aggregator, Sorter, Update Strategy, Stored procedure and Normalizer
transformations.
Ensured the data consistency by cross-checking sampled data upon migration between the database
environments.
Experience using Python modules such as NumPy, Matplotlib, Pickle, Pandas, SciPy, wxPython, PyTables, and
PyQt.
Built Spark applications using PySpark and used Python for data engineering in the Spark
framework.
Created sessions, including sequential and concurrent sessions, for proper execution of mappings in Workflow Manager.
Provided SSRS and SSIS support for internal IT projects requiring report developments.
Designed and implemented scalable and secure data processing pipelines using Azure Data Factory, Azure
Databricks, and other Azure services.
Created PySpark-based pipelines using Spark DataFrame operations to load data into the data lake, with
EMR for job execution and AWS S3 for storage.
Managed and optimized data storage using Azure Data Lake Storage, Azure SQL Data Warehouse, and Azure
Cosmos DB.
Developed data models and maintained data architecture to support data analytics and business intelligence
reporting.
Ensured data quality and consistency through data cleaning, transformation, and integration processes.
Monitored and troubleshot data-related issues within the Azure environment to maintain high availability
and performance.
Developed high-performance and scalable backend services using Python and FastAPI, improving response
times by 40%.
Implemented data security measures, including encryption, access controls, and auditing, to protect sensitive
information.
Utilized NumPy and Pandas for advanced data cleaning, transformation, and feature engineering tasks to
prepare data for analytics and machine learning workflows.
Developed and optimized fact tables and dimension tables based on business requirements, ensuring accurate
and efficient data modeling for reporting and analytics.
Implemented Star Schema and Snowflake Schema designs to support BI tools and performance-optimized
querying in the data warehouse.
Automated data pipelines and workflows to streamline data ingestion, processing, and distribution tasks.
Utilized Azure's analytics services, such as Azure Synapse Analytics, to provide insights and support
data-driven decision-making.
Provided guidance and support for data governance, including metadata management, data lineage, and data
cataloging.
Client: One Main Financial, IN Oct 2017 - June 2018
Data Engineer/Big Data Engineer
Responsibilities:
Hands-on experience ingesting data from various sources, such as databases, files, APIs, and streaming
platforms, into Azure. Used Azure Data Factory to orchestrate the extraction, transformation, and loading (ETL)
of data from an on-premises SQL Server database to Azure Blob Storage.
Worked on data transformation and processing tasks to prepare the data for analysis and reporting. Used Azure
Databricks and Azure Synapse Analytics for large-scale data transformations with tools like Apache
Spark.
Experience with data storage and management using Azure SQL Database or Azure Cosmos DB for structured
data, Azure Data Lake Storage for big data and unstructured data, and Azure Blob Storage for file storage.
Implemented data warehousing solutions using Azure Synapse Analytics (formerly Azure SQL Data
Warehouse), which offers scalable and distributed analytics capabilities for handling large datasets.
Collaborated with data scientists and analysts to design and implement data models for analytical workloads,
using Azure Analysis Services or Azure Databricks to build and deploy models that support
interactive data analysis.
Implemented processing of streaming data from sources like IoT devices, social media feeds, and log files, and
configured and optimized streaming pipelines to extract insights from live data streams.
Data governance and security: set up data access controls, implemented encryption, monitored
data quality, and established data retention policies.
Automated data pipelines, monitored data processes, and managed deployments using tools like
Azure DevOps, Azure Monitor, and Azure Automation to achieve continuous integration and delivery.
Developed a prototype for big data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV,
JSON, and distributed files.
Developed and maintained web applications using Python and Django framework, following MVC architecture
principles.
Implemented data ingestion from various source systems using Sqoop and PySpark.
Knowledgeable in partitioning Kafka messages and setting replication factors in a Kafka cluster; implemented
reprocessing of failed messages in Kafka using offsets.
Reviewed Kafka cluster configurations and provided best practices to get peak performance.
Extracted, transformed, and loaded data from source systems into Azure data services.
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and
processed the data in Azure Databricks.
Implemented Copy activities and custom Azure Data Factory pipeline activities.
Automated data quality checks and validation processes within Pentaho to maintain high data accuracy and
consistency.
Collaborated with data analysts and BI teams to deliver reliable datasets for dashboards, KPIs, and ad-hoc
analysis.
Developed web applications using Python and Flask, focusing on efficient ORM with SQL Server databases via
SQLAlchemy.
Analyzed existing systems and proposed process and system improvements, including the use of modern
scheduling tools like Airflow and the migration of legacy systems into an enterprise data lake built on Azure Cloud.
Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data
ingestion and transformation in Azure.
Designed and implemented error-free data warehouse ETL and Hadoop integration.
Enhanced conventional data warehouses based on the star schema, updated data models, and delivered Tableau
data analytics and reporting.