Data Engineer Analyst

Location:
North Carolina
Posted:
March 18, 2025

Resume:

Lakshay Aggarwal

Email: *************@*****.***

PH: 614-***-****

Sr Data Engineer (Azure Certified 3x)

Professional Summary

Over 9 years of professional experience in Data Systems Development and Business Systems, designing and developing solutions as a Data Engineer and Data Analyst.

9+ years of extensive experience in Data Engineering, ETL/ELT pipelines, Data Warehousing, Business Intelligence (BI), and cloud-based data solutions across various industries.

Strong expertise in Databricks platform, including pipeline design, development, and maintenance, cluster administration, workspace management, and performance optimization.

Skilled in configuring and optimizing Databricks clusters and resources, leveraging Databricks monitoring and logging tools, and troubleshooting infrastructure, security, and integration issues efficiently.

Well-versed in Databricks security best practices, including authentication, authorization, encryption, and data governance for building secure and compliant data solutions.

Proficient in Power BI for end-to-end BI solution development, interactive dashboards and reports, row-level security (RLS), and application security layer modeling to ensure user-specific data access.

Experienced in data integration and transformation from multiple sources for comprehensive analytical and reporting solutions.

Expertise in Azure Data Engineering stack and AWS services, including Data Lakes, Redshift, Glue, EMR, and S3 for modern cloud-based data platforms.

Adept at System Administration with capabilities to set up and manage cost-effective infrastructure, ensuring scalability and operational efficiency.

Hands-on knowledge in System Performance Monitoring & Optimization, focusing on resource utilization, pipeline efficiency, and cost control.

Strong focus on Data Security, Governance, and Compliance, implementing best practices for data privacy, lineage, and quality management.

Proficient in Python, SQL, PySpark, and Scala for building robust data solutions and automation frameworks.

Experienced in Business Analysis & Requirement Gathering, translating complex business requirements into scalable technical specifications and data models.

Familiar with retail domain datasets and workflows, with adaptability to other domains such as finance, healthcare, and e-commerce.

Proven collaboration with cross-functional teams (data scientists, analysts, business stakeholders) to deliver insight-driven data products and solutions.

Strong leadership in mentoring, code reviews, architectural decisions, and promoting best engineering practices within teams.

Extensive experience in IT data analytics projects, with hands-on experience migrating on-premise ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer.
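
As an illustrative sketch only (the project, dataset, bucket, and table names below are hypothetical placeholders, not from these projects), a GCS-to-BigQuery load of this kind can be scripted with the BigQuery Python client:

    # Hypothetical sketch: load a CSV extract from Cloud Storage into BigQuery.
    # Bucket, dataset, and table names are placeholders, not actual project assets.
    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,                 # skip the header row
        autodetect=True,                     # let BigQuery infer the schema
        write_disposition="WRITE_TRUNCATE",  # replace the table contents on each run
    )

    load_job = client.load_table_from_uri(
        "gs://example-bucket/exports/sales_*.csv",  # placeholder GCS path
        "example-project.analytics.sales",          # placeholder table ID
        job_config=job_config,
    )
    load_job.result()  # block until the load job finishes
    print(client.get_table("example-project.analytics.sales").num_rows, "rows loaded")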

Proven ability to design, develop, and deploy cloud-native data architectures, leveraging Azure ecosystem, Kubernetes, and big data frameworks.

Skilled in orchestrating large-scale data processing workflows using Databricks, Spark, Python, and SQL to enable real-time analytics.

Experienced in all phases of the SDLC; participated in daily scrum meetings with cross-functional teams.

Excellent experience in developing and designing data integration and migration solutions in MS Azure.

Experience in big data processing, cloud-based data pipelines, and AI-driven solutions, including 2 years of hands-on experience in Generative AI.

Expertise in building scalable data pipelines using Spark, Kafka, and Airflow, integrating LLMs for AI-powered automation.
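
For illustration, a minimal PySpark Structured Streaming sketch of such a pipeline is shown below; the broker address, topic, and storage paths are assumed placeholders, and the spark-sql-kafka connector must be available on the cluster:

    # Minimal Structured Streaming sketch: read events from Kafka and land them in Delta.
    # Broker, topic, and paths are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "orders")                      # placeholder topic
        .load()
        .select(col("key").cast("string"), col("value").cast("string"), "timestamp")
    )

    query = (
        events.writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/orders")  # placeholder path
        .start("/tmp/delta/orders_raw")                           # placeholder path
    )
    query.awaitTermination()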

Technical Skills

Big Data Technologies: Hive, Apache Spark, HBase, Oozie, MongoDB, Kafka, Databricks, Glue

Programming Languages: Java, Python, PySpark, SQL, DAX

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 9i/11g, Postgres

Data Modeling Tools: Erwin, Azure Data Catalog, Databricks Delta

BI Tools: Power BI, Tableau, Azure Analysis Services

Cloud Platforms: Microsoft Azure, GCP, AWS, Azure Databricks

Operating Systems: Windows, Linux/Unix

Version Control: Git, Bitbucket

Others: Agile Methodologies, CI/CD, Kafka Streaming, Data Governance, Security Best Practices

Professional Experience

Sr Data Engineer

Bank of America, Charlotte, NC Jan 2024 – Present

Responsibilities:

Built production-ready batch data pipelines using Azure Data Factory, Azure Data Lake, and Azure Synapse Analytics with PySpark, SQL, and Python.

Optimized Azure Databricks workflows using techniques like Z-Order clustering, partitioning, and adaptive query execution.
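
A minimal sketch of these techniques, assuming a hypothetical Delta table of transactions partitioned by date and Z-Ordered by account ID (paths and column names are illustrative, not from the actual pipelines):

    # Illustrative only: paths, partition column, and Z-Order column are assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Enable adaptive query execution so Spark can re-plan joins and shuffles at runtime.
    spark.conf.set("spark.sql.adaptive.enabled", "true")

    # Write a partitioned Delta table; partitioning prunes files at read time.
    df = spark.read.parquet("/mnt/raw/transactions")  # placeholder source path
    (df.write.format("delta")
       .partitionBy("txn_date")
       .mode("overwrite")
       .save("/mnt/curated/transactions"))

    # Z-Order clustering co-locates rows with similar account_id values,
    # reducing the number of files scanned by selective queries.
    spark.sql("OPTIMIZE delta.`/mnt/curated/transactions` ZORDER BY (account_id)")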

Developed and optimized APIs on Azure using Azure API Management and Functions to ensure secure, scalable, and high-performance integrations between applications and data sources.

Implemented Spark DataFrames and the Spark SQL API for faster, more efficient data processing.

Worked extensively on Azure Data Factory to create batch data pipelines, enabling efficient data integration, transformation, and movement across multiple data sources.

Extracted and loaded data into the MS Azure Data Lake environment using Sqoop; the data was then accessed by business users.

Developed advanced PySpark solutions in Databricks, enabling large-scale distributed data processing for analytics and reporting.

Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull and load data into HDFS.

Developed and optimized scalable PySpark pipelines, reducing data processing time by 40% for real-time analytics.

Built high-performance Spark SQL transformations, improving query execution efficiency by 30%.

Designed and deployed cloud-native big data solutions using Azure Data Factory, Databricks, and Delta Lake.

Developed Python-based data processing frameworks, streamlining ETL operations across distributed systems.

Created and managed large-scale distributed data processing workflows using PySpark, Airflow, and Kafka.
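
A hedged sketch of that orchestration pattern appears below; the DAG id, schedule, and spark-submit commands are invented for illustration and do not reflect the production workflows:

    # Hypothetical Airflow DAG sketch: names, schedule, and scripts are assumptions.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_events_pipeline",     # placeholder DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        ingest = BashOperator(
            task_id="ingest_from_kafka",
            bash_command="spark-submit /opt/jobs/ingest_kafka.py",  # placeholder script
        )

        transform = BashOperator(
            task_id="transform_with_pyspark",
            bash_command="spark-submit /opt/jobs/transform.py",     # placeholder script
        )

        ingest >> transform  # run the transformation only after ingestion succeeds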

Engineered automated deployment pipelines for production-ready Spark applications using Kubernetes and Docker.

Developed and deployed RESTful APIs for seamless data integration and high-performance querying.

Optimized SQL queries and Spark jobs, reducing compute costs by 25% and enhancing performance.

Integrated Databricks with Azure Data Factory to orchestrate end-to-end migration workflows, increasing efficiency.

Utilized adaptive query execution and caching in Databricks, replacing PL/SQL-based optimizations for faster analytics.

Enabled real-time data processing in Databricks, replacing batch-based PL/SQL jobs, reducing reporting latency.

Re-engineered ETL logic using Spark SQL and DataFrames, reducing manual intervention by 60% in data transformations.
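
As an illustration of the pattern only (table paths and column names are invented), a procedural, row-by-row PL/SQL-style aggregation can typically be restated as a single set-based DataFrame job:

    # Sketch only: replaces a cursor-style aggregation with a set-based DataFrame job.
    # Table paths and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    txns = spark.read.format("delta").load("/mnt/curated/transactions")  # placeholder

    daily_totals = (
        txns.filter(F.col("status") == "POSTED")
            .groupBy("account_id", "txn_date")
            .agg(
                F.sum("amount").alias("daily_amount"),
                F.count("*").alias("txn_count"),
            )
    )

    (daily_totals.write.format("delta")
        .mode("overwrite")
        .save("/mnt/marts/daily_account_totals"))  # placeholder target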

Conducted performance benchmarking between PL/SQL and Databricks, leading to a 5x increase in query speed.

Migrated business-critical reporting datasets from Oracle PL/SQL to Databricks, ensuring 100% report accuracy.

Developed monitoring dashboards using Databricks and Power BI, ensuring smooth post-migration data integrity checks.

Provided hands-on training and documentation for teams transitioning from PL/SQL to Databricks, enhancing adoption.

Achieved full migration within the planned timeline, eliminating technical debt and modernizing legacy ETL workflows.

Worked in Azure environment for development and deployment of Custom Hadoop Applications.

Created dimensional model based on star schemas and designed them using Erwin.

Involved in analyzing raw files from Azure Data Lake Storage (ADLS) using Azure Synapse Serverless SQL and Azure Data Factory, without loading the data into a database.

Worked with ETL tools to migrate data from various OLAP and OLTP databases to the data mart.

Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Deployed and managed data pipelines and ETL processes on Kubernetes using Helm charts and customized deployments, ensuring high availability, scalability, and fault tolerance.

Collaborated with cross-functional teams to design and implement data-driven architectures on Azure, improving data reliability and accessibility.

Enhanced data pipeline performance by utilizing Databricks optimization techniques and Delta caching, reducing execution time by 50%.

Developed and deployed Kubernetes operators to automate the management and lifecycle of complex data engineering applications, streamlining operations and reducing manual intervention.

Built Azure Data Warehouse table datasets for Power BI reports.

Designed, developed, and optimized end-to-end Databricks pipelines to process large-scale banking datasets, supporting regulatory and compliance reporting at Bank of America.

Managed and administered Databricks clusters, workspaces, and notebooks ensuring high availability, security, and scalability in a sensitive banking environment.

Configured and fine-tuned Databricks clusters to handle complex banking data workflows efficiently, achieving a 30% improvement in processing time for financial data pipelines.

Monitored Databricks environment performance and resource utilization using native logging tools, proactively identifying and resolving potential bottlenecks impacting critical banking operations.

Implemented security best practices within Databricks for Bank of America, including role-based access control (RBAC), encryption at rest and in transit, and multi-factor authentication to safeguard confidential customer data.

Troubleshot and resolved Databricks infrastructure issues including cluster failures, security vulnerabilities, and integration challenges with banking core systems, reducing incident resolution time by 25%.

Led the integration of Databricks with various banking data sources (Data Lake, SQL Server, and third-party vendors) ensuring seamless data flow and regulatory compliance.

Governed sensitive financial data through Databricks data governance frameworks, ensuring compliance with banking regulations such as GDPR, SOX, and internal audit guidelines.

Sr Data Engineer

Walgreens, Chicago, IL Nov 2022 – Dec 2023

Responsibilities:

Designed, developed, and maintained robust Databricks pipelines for efficient data processing and analytics to support Walgreens' business intelligence and decision-making processes.

Managed and administered Databricks environments (clusters, workspaces, notebooks) to ensure seamless integration and performance, enabling Walgreens to leverage big data insights in real-time.

Optimized Databricks clusters and resources for cost-effective and high-performance data processing, significantly improving Walgreens' operational efficiency.

Configured Databricks clusters to handle massive datasets with minimum latency, ensuring smooth scalability for Walgreens' expanding data needs.

Utilized Databricks monitoring and logging tools to track system performance and identify potential issues, providing continuous improvements for Walgreens' data operations.

Troubleshot Databricks infrastructure issues related to data processing, security, and integration, ensuring uninterrupted data availability for Walgreens' retail and analytics functions.

Implemented security best practices in Databricks for authentication, authorization, and encryption, safeguarding Walgreens' sensitive customer and operational data.

Developed and enforced data governance frameworks within Databricks to comply with regulatory standards, ensuring Walgreens' data assets are secure and well-managed.

Designed and delivered BI solutions using Power BI, transforming raw data into actionable insights to support Walgreens' strategic decision-making in areas like inventory, sales, and customer analytics.

Created dynamic Power BI reports and dashboards for stakeholders across Walgreens' retail and corporate teams, helping drive business intelligence and performance tracking.

Implemented row-level security in Power BI, ensuring that different departments within Walgreens can access only the data relevant to their roles while maintaining security and compliance.

Developed Power BI Analysis Services models to provide efficient querying and reporting solutions, helping Walgreens' teams extract insights from complex datasets.

Optimized Power BI application security models, enforcing a strong security layer to protect sensitive data in the reports and dashboards created for Walgreens.

Integrated data from multiple sources into Power BI, providing Walgreens with a consolidated view of business performance and customer behavior across its retail ecosystem.

Worked closely with Walgreens' data engineering team to ensure that the data models, ETL pipelines, and BI reports align with business objectives and performance goals.

Collaborated with business users at Walgreens to understand their requirements, translating them into technical specifications and developing tailored BI solutions.

Optimized the ETL process using Databricks and Power BI, improving the speed and reliability of data processing and reporting for Walgreens' retail operations.

Automated data pipelines using Databricks and integrated them with Power BI for continuous and real-time reporting, ensuring Walgreens stays ahead in business intelligence.

Conducted end-to-end testing of data solutions in Databricks and Power BI, ensuring accuracy, reliability, and performance for Walgreens' analytics systems.

Mentored junior data engineers at Walgreens, sharing knowledge of best practices in Databricks, Power BI, and security to foster a collaborative and innovative data engineering environment.

Data Engineer

AMGEN INC. Thousand Oaks, California Jun 2020 – Oct 2022

Responsibilities:

Worked as a member of the Hive team and was involved in designing High Availability (HA) for Hive Server, which is a single point of failure in the Hive data warehouse solution used for querying and analyzing large Big Data sets. Participated in the design review of the HA feature on Hive.

Worked as an HBase team member; HBase is a column-oriented database built on top of HDFS. Collaborated with Apache HBase committers and mentored team members.

Involved in requirement analysis, design, execution, and automation of unit test cases for HDFS, Hive, MapReduce, and HBase in JUnit.

Worked on PySpark data sources and DataFrames, Spark SQL, and Streaming using Scala.

Proficient in Hadoop cluster setup, monitoring, and administration.

Resolved all the customer queries related to installation, configuration, administration, etc.

Experienced with non-functional testing tools such as heap dump analyzers, thread dump analyzers, GC log analyzers, and profilers.

Designed and developed automation frameworks and suites using Java, JUnit, and Ant.

Worked on improving the Performance for many Huawei Hadoop versions.

Good knowledge of Linux commands and scripting.

Contributed patches to the Apache open-source HBase component for major bugs.

Participated in product functional reviews, test specifications, document reviews.

Executed MapReduce jobs and built data lakes.

Involved in converting Hive/SQL queries into PySpark transformations using Spark RDDs and Python.

Used PySpark SQL to load JSON data, created schema RDDs, loaded them into Hive tables, and handled structured data using Spark SQL.
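
Schema RDDs correspond to DataFrames in current Spark; a small illustrative sketch of that flow, with a hypothetical JSON path, schema, and Hive table name, might look like:

    # Illustrative only: file path, schema fields, and Hive table name are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    schema = StructType([
        StructField("order_id", StringType(), True),
        StructField("customer_id", StringType(), True),
        StructField("amount", DoubleType(), True),
    ])

    orders = spark.read.schema(schema).json("/data/raw/orders.json")  # placeholder path

    # Register a temporary view so the data can be queried with Spark SQL,
    # then persist the structured result as a Hive table (database assumed to exist).
    orders.createOrReplaceTempView("orders_stg")
    spark.sql("SELECT order_id, customer_id, amount FROM orders_stg WHERE amount > 0") \
         .write.mode("overwrite").saveAsTable("analytics.orders")     # placeholder table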

Developed PySpark programs using Python and performed transformations and actions on RDDs.

Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.

Used PySpark and Spark SQL to read Parquet data and create tables in Hive using the Python API.

Data Engineer

Whole Foods Market Inc. Austin, TX Mar 2018 - May 2020

Responsibilities:

Analyzed functional specifications based on project requirements.

Ingested data from various data sources into Hadoop HDFS/Hive tables using Sqoop, Flume, and Kafka.

Extended Hive core functionality by writing custom UDFs using Java.

Developed Hive queries to meet user requirements.

Worked on multiple POCs implementing a Data Lake for multiple data sources, including Team Center, SAP, Workday, and machine logs.

Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

Worked on MS SQL Server PDW migration for the MSBI warehouse.

Planned, scheduled, and implemented Oracle to MS SQL Server migrations for AMAT in-house applications and tools.

Worked on the Solr search engine to index incident report data and developed dashboards in the Banana reporting tool.

Integrated Tableau with the Hadoop data source to build dashboards providing insights into the organization's sales.

Worked on Spark to build BI reports in Tableau; Tableau was integrated with Spark using Spark SQL.

Developed Spark jobs using Scala and Python on top of Yarn/MRv2 for interactive and Batch Analysis.

Created multi-node Hadoop and Spark clusters in AWS instances to generate terabytes of data and stored it in AWS HDFS.

Developed workflows in Live Compare to analyze SAP data and reporting.

Worked on Java development, support, and tooling support for in-house applications.

Participated in daily scrum meetings and iterative development.

Data Engineer

Zensar Technologies Bangalore, India Aug 2015 - Dec 2017

Responsibilities:

Created high- and low-level design documents for the various modules. Reviewed designs to ensure adherence to standards, templates, and corporate guidelines. Validated design specifications against proof-of-concept results and technical considerations.

Worked on implementing pipelines and analytical workloads using big data technologies such as Hadoop, Spark, Hive and HDFS.

Experienced in designing and deploying Hadoop clusters and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark, and Impala.

Performed analysis on existing source systems, understood the Informatica/Teradata-based applications, and provided the services required for development and maintenance of those applications.

Worked with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Functions, Cloud DNS, Cloud Storage, and Cloud Deployment Manager, applying SaaS, PaaS, and IaaS cloud computing concepts in GCP implementations.

Coordinated with the application support team and helped them understand the business and the components required for the integration, extraction, transformation, and loading of data.

Analyzed and developed data integration templates to extract, cleanse, transform, integrate, and load data into data marts for user consumption. Reviewed code against standards and checklists.

Created deployment documents for the developed code and provided support during the code migration phase.

Created the initial unit test plan to demonstrate that the developed software, scripts, and databases conformed to the design document.

Provided support during the integration testing and User Acceptance phases of the project, as well as hypercare support post-deployment.


