A. SUMMARY OF PROFESSIONAL EXPERIENCE
Data Engineer & Analyst with around 5 years of IT experience, specializing in data migration, integration, and orchestration. Skilled in business requirements gathering, data warehousing, and data modeling, with strong expertise in evaluating data sources and translating requirements into actionable technical specifications and application designs.
Strong hands-on expertise in AWS Glue, Azure Data Factory, Databricks, Talend, Data Lake, Hive, AWS S3, ESP, and CA Workstation, with proficiency in SQL, Python, Java, and Power BI, and solid experience as a Data Warehouse Developer.
Experienced in working with legacy systems such as mainframes, with strong expertise in copybooks, proficiency in the PowerExchange tool, and the ability to interpret and leverage data mappers effectively.
Skilled in designing and automating end-to-end data pipelines using Talend workflow orchestrator.
Proficient in managing Mainframe binary files and integrating them into the silver layer of Snowflake using Talend.
Developed various types of data mappers for processing semi-structured files using Talend.
In-depth experience working with master and transactional datasets, including creating deltas using hash functions for efficient data processing.
Experience in analyzing, designing, and developing ELT strategies and processes, as well as writing ELT specifications for enterprise solutions.
Strong experience in automating end-to-end data pipelines using AWS Glue workflows and job triggers for seamless orchestration.
Skilled in troubleshooting and fine-tuning long-running AWS Glue jobs, optimizing Spark configurations for performance.
Expertise in Python, PySpark, and SQL on the AWS Cloud platform, with a good understanding of modern data warehouses such as Snowflake.
Implemented SCD Type 1 and SCD Type 2 loads using hash-based change detection and created a dimension framework (a representative sketch appears at the end of this summary).
Developed complex SQL queries and optimized stored procedures in Snowflake to build scalable ELT pipelines, ensuring efficient transformations, joins, and aggregations across large datasets.
Strong experience working with Hive for advanced data analysis and query optimization.
Hands-on experience building data engineering applications with Python and Pandas for data extraction, transformation, cleansing, and analysis of large datasets.
Practical experience with AWS DataSync, AWS IAM, and AWS S3 for secure storage, data migration, and logging.
Proficient knowledge and hands-on experience in writing UNIX jobs for ESP scheduling.
Strong understanding of Databricks as a distributed compute platform on cloud storage, with experience integrating HDFS and relational databases to enable seamless data processing across modern Lakehouse and legacy systems.
Experience developing Spark applications using Spark SQL in Databricks to extract, transform, and aggregate data from multiple file formats and APIs, analyzing the data to uncover insights into data patterns.
Hands-on experience with unified data analytics in Databricks, the Databricks workspace user interface, and managing Databricks notebooks.
Integrated Airflow with cloud and big data platforms (AWS S3, Azure Databricks, Snowflake) to create scalable, end-to-end data workflows supporting analytics and reporting.
Performed ETL on data from different source systems into Azure data storage services using Azure Data Factory, ingesting into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL) for downstream processing.
Created and orchestrated ADF ETL pipelines with triggers, parameters, and dynamic content for reusable, scalable, schedule-based data integration across cloud and on-premises systems.
Experience with Azure tools such as Azure Data Lake, Azure Databricks, Azure Data Factory, Azure SQL Database, and Azure DevOps.
Experienced in requirements analysis, application development, application migration, and maintenance using the Software Development Life Cycle (SDLC) and Python technologies.
Working knowledge of and hands-on experience with Agile and Waterfall methodologies.
Defined user stories and drove the Agile board in JIRA during project execution, participating in sprint demos and retrospectives.
Completed POCs in adopted technologies such as AWS Glue jobs, AWS Redshift, AWS Aurora, AWS Athena, CloudWatch, Power BI, and GitLab.
Good interpersonal and communication skills, strong problem-solving ability, ease in exploring and adopting new technologies, and a strong team member.
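Representative sketch of the hash-based delta/SCD Type 2 change-detection pattern referenced above. This is illustrative only, not client code; table names, column names, and paths are assumptions, and the current dimension rows are assumed to already carry a stored row_hash column.

# Illustrative only: not client code. Table names, column names, and paths are assumptions;
# the current dimension rows are assumed to already carry a stored row_hash column.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_hash_sketch").getOrCreate()

incoming = spark.read.parquet("s3://example-bucket/staging/customer/")
current = spark.table("dw.dim_customer").filter("is_current = true")

# Attributes tracked for change detection.
business_cols = ["name", "address", "segment"]

# One hash over the tracked attributes lets changed rows be found with a single comparison.
hash_expr = F.sha2(F.concat_ws("||", *[F.col(c).cast("string") for c in business_cols]), 256)
incoming = incoming.withColumn("row_hash", hash_expr)

# Delta = brand-new keys plus keys whose hash differs from the current dimension row.
src = incoming.alias("src")
tgt = current.alias("tgt")
changed = (src.join(tgt, F.col("src.customer_id") == F.col("tgt.customer_id"), "left")
           .where(F.col("tgt.customer_id").isNull() |
                  (F.col("src.row_hash") != F.col("tgt.row_hash")))
           .select("src.*"))

# Expiring old versions and inserting new ones (the SCD Type 2 housekeeping) would follow,
# typically as a MERGE in the warehouse; only the delta detection is shown here.
changed.write.mode("overwrite").parquet("s3://example-bucket/delta/customer/")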
Technical Skills
Languages: SQL, Python, Core Java
Databases: MySQL, SQL Server, PostgreSQL
Cloud Platforms: Azure, AWS
Data Warehouse: Snowflake, Databricks, Data Warehouse concepts in Azure Synapse & AWS Redshift, Data Lakehouse
ETL Tools: AWS Glue, Databricks, Airflow, Azure Data Factory, Microsoft Fabric, Talend
Reporting Tool: Power BI
Version Control: Git
Certifications:
Snowflake SnowPro Core (SNOW00319699)
Microsoft DP-600 prep: Fabric Analytics Engineer Associate (Udemy)
B. DETAILS OF PROFESSIONAL HISTORY
Project #1:
Client: TRUIST (Jan 2024 – Present)
Data Engineer
Technologies: AWS Glue, Snowflake, Informatica, PowerExchange, MySQL, SQL Server, Python, AWS S3, AWS DataSync, and Power BI.
Responsibilities:
Designed and developed a reusable AWS Glue ELT framework to extract and ingest large datasets from diverse sources—including relational databases, mainframes, and flat files—into Snowflake.
Implemented both full snapshot and incremental data loading strategies from Glue into Snowflake, improving data accuracy by 98% and enabling real-time reporting for 50+ business users (a representative sketch of the incremental pattern follows this responsibilities list).
Executed the data extraction process using Glue jobs with raw file ingestion, followed by standardization and tokenization to ensure high data quality and compliance with governance standards.
Designed and implemented Talend Data Mappers to transform complex JSON documents into structured tabular formats, enabling seamless ingestion into the Snowflake Silver layer.
Developed Talend Data Mapper jobs to parse XML files using XSD schemas, converting them into CSV format for accurate validation and ingestion.
Hands-on experience in tuning Talend jobs for optimized performance and minimal runtime.
Performed advanced data transformations using Talend and SQL to integrate multi-source data into centralized repositories supporting enterprise analytics and BI reporting.
Processed large volumes of structured data using Talend and SQL, significantly improving the analytical capabilities of business teams.
Hands-on experience in tuning AWS Glue jobs and optimizing Spark configurations for high performance and minimal runtime.
Ensured data quality and consistency by implementing Glue ETL validation rules, reconciliation checks, and robust exception handling throughout the pipeline.
Configured and scheduled Glue Workflows and Triggers and monitored execution using CloudWatch Logs to ensure timely and reliable data delivery.
Performed advanced data transformations using Glue (PySpark) and SQL to integrate multi-source data into centralized repositories supporting enterprise analytics and BI reporting.
Conducted data aggregation and validation in Snowflake’s gold layer using Python, delivering clean and actionable insights.
Developed structured data interfaces via Glue Catalog for downstream analytics, reporting, and business intelligence platforms.
Created large, consolidated datasets by combining individual data sources using SQL joins (inner, left, right, full) and implemented sorting and merging techniques.
Created and maintained database objects (tables, views, indexes, constraints, stored procedures) using SQL to support scalable data pipelines.
Tuned and optimized SQL queries, views, and stored procedures to enhance processing speed and meet dynamic reporting requirements.
Authored detailed technical documentation covering Glue pipeline architecture, data mappings, transformations, scheduling logic, and recovery strategies.
Prepared comprehensive ETL design documentation outlining database structures, change data capture methods, error handling, restart logic, and refresh cycles.
Processed large volumes of structured and semi-structured data using Glue (PySpark) and SQL, significantly improving analytical capabilities for business teams.
Translated complex business requirements into logical and physical data models, enhancing data accessibility and reducing OLAP query processing times by 40%.
Provided architectural leadership in designing Big Data Analytics solutions using AWS Glue, driving Proof of Concept (POC) and Proof of Technology (POT) initiatives to implement scalable big data platforms.
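Representative sketch of the incremental Glue-to-Snowflake load pattern described above. This is illustrative only, not client code; connection options, table names, and the bookmark column are assumptions.

# Illustrative only: not client code. Connection options, table names, and the
# bookmark column are assumptions made for the example.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME", "last_load_ts"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Incremental strategy: read only rows changed since the previous run; a full-snapshot
# run would simply drop the filter.
source = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://example-host:1433;databaseName=sales")
          .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
          .option("dbtable", "dbo.orders")
          .option("user", "svc_glue")
          .option("password", "***")
          .load()
          .filter(F.col("updated_at") > args["last_load_ts"]))

# Land the raw extract in S3 before the standardization/tokenization steps.
source.write.mode("append").parquet("s3://example-raw-bucket/orders/")

# Push the standardized data to Snowflake through the Spark-Snowflake connector.
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfDatabase": "EDW",
    "sfSchema": "SILVER",
    "sfWarehouse": "LOAD_WH",
    "sfUser": "svc_glue",
    "sfPassword": "***",
}
(source.write.format("net.snowflake.spark.snowflake")
 .options(**sf_options)
 .option("dbtable", "ORDERS_STG")
 .mode("append")
 .save())

job.commit()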
Project #2:
Client: United States Cold Storage (Aug 2021 – July 2022)
Data Engineer
Technologies: Azure Databricks, Apache Airflow, Azure Blob Storage, Azure Data Lake, Delta tables, APIs, and flat files.
Responsibilities:
Hands-on experience with unified data analytics in Databricks, the Databricks workspace user interface, and managing Databricks notebooks.
Responsible for estimating cluster sizes and for monitoring and troubleshooting Spark clusters in Databricks.
Diagnosed and resolved pipeline failures (e.g., OOM) by performing root cause analysis, optimizing resource allocation, and implementing performance tuning strategies to ensure reliable data processing and system stability.
Expertise in establishing connectivity from Databricks to Azure Blob Storage and Data Lake Gen2.
Analyzed data where it lives by mounting Azure Data Lake and Blob to Databricks.
Designed and implemented DAGs in Apache Airflow to orchestrate complex ETL pipelines, ensuring seamless data ingestion, transformation, and loading across multiple systems (Databricks, Snowflake, SQL).
Automated pipeline scheduling and dependency management using Airflow, reducing manual intervention and improving data pipeline reliability and timeliness.
Monitored and optimized Airflow workflows by leveraging task retries, logging, and alerting, resulting in improved system resilience and faster incident resolution.
Performed ETL operations in Azure Databricks by connecting to different relational database source systems using JDBC connectors.
Processed data in Azure Databricks after ingesting it into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL).
Developed Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.
Created Databricks notebooks which extract and transform data as per business logic and store it back to Azure Data Lake in Delta format.
Created Databricks Notebooks to streamline data and curate data for various business use cases.
Created several Databricks Spark jobs with PySpark to perform several table-to-table operations.
Developed PySpark and Python regular-expression (regex) logic for data cleansing in Databricks.
Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries.
Used Databricks widgets to pass runtime parameters into notebooks (see the sketch at the end of this responsibilities list).
Used Spark SQL and PySpark scripts in the Databricks environment to validate monthly account-level customer data; also created Spark clusters and configured high-concurrency clusters to speed up the preparation of high-quality data.
Optimized performance by selecting appropriate Databricks cluster configurations for batch jobs.
Working experience on Azure Databricks, organizing data into notebooks and making it easy to visualize data using dashboards.
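Representative Databricks-notebook-style sketch of the widget-parameterized ETL pattern described above. This is illustrative only, not client code; widget names, JDBC details, and storage paths are assumptions, and spark and dbutils are provided by the Databricks notebook runtime.

# Illustrative only: not client code. Widget names, JDBC details, and storage paths are
# assumptions; `spark` and `dbutils` are provided by the Databricks notebook runtime.
from pyspark.sql import functions as F

# Runtime parameters passed in by the orchestrator (e.g., an Airflow-triggered job run).
dbutils.widgets.text("load_date", "2022-01-01")
dbutils.widgets.text("source_table", "dbo.shipments")
load_date = dbutils.widgets.get("load_date")
source_table = dbutils.widgets.get("source_table")

# Extract from a relational source over JDBC.
raw_df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://example-host:1433;databaseName=wms")
          .option("dbtable", source_table)
          .option("user", "svc_dbx")
          .option("password", "***")
          .load())

# Curate per business logic, e.g., regex-standardize a code column and keep one load date.
curated_df = (raw_df
              .filter(F.col("event_date") == load_date)
              .withColumn("facility_code",
                          F.regexp_replace(F.col("facility_code"), r"[^A-Z0-9]", "")))

# Persist to Azure Data Lake in Delta format for downstream consumers.
(curated_df.write.format("delta")
 .mode("overwrite")
 .option("replaceWhere", f"event_date = '{load_date}'")
 .save("abfss://curated@exampleadls.dfs.core.windows.net/shipments/"))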
Project #3:
Cognizant Technology Solutions (Aug 2019 – July 2021)
Azure Data Engineer
Technologies: SQL, Python, Azure Data Lake, Azure Data Factory, Azure Active Directory, Key Vault, SQL Server, Netezza, Snowflake, and Azure DevOps.
Responsibilities:
Maintained high-quality reference data in source systems by performing data cleaning, transformation, and integrity checks in a relational database environment in close collaboration with business stakeholders.
Hands-on experience with Azure services including Data Factory, Data Lake, Storage Accounts, and Azure SQL Database.
Designed and developed ADF pipelines for both full and incremental data loads from a variety of source systems (a representative sketch of the incremental pattern follows this responsibilities list).
Extracted, transformed, cleansed, and validated data from heterogeneous source systems before loading into target environments.
Created and configured Linked Services in Azure Data Factory to connect with various source and target systems.
Designed reusable and parameterized Datasets and Linked Services using pipeline parameters and variables to enable flexible and scalable data integration.
Worked extensively in Snowflake to load and manage batch data pipelines from diverse sources.
Applied Snowflake’s data warehousing capabilities to organize and structure logic for enhanced data clarity and downstream analytics.
Built robust data flows to move data from Azure Data Lake to Snowflake using ADF, ensuring secure, efficient, and scheduled transfers.
Developed, tested, and documented end-to-end data warehouse processes and tasks for consistency, reliability, and team collaboration.
Actively contributed to the creation of technical and functional documentation supporting pipeline design, data flows, and architecture.
Authored comprehensive documentation detailing database schema, table definitions, relationships, and business rules—providing a reference for team members and onboarding new developers.
Gained exposure to GCP and its infrastructure.
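Representative sketch of the watermark-driven full/incremental load pattern the ADF pipelines implemented, expressed in Python purely for illustration. This is not client code; connection strings, table names, and the watermark column are assumptions, and the STAGE.ORDERS target table is assumed to already exist.

# Illustrative only: not client code. Connection strings, table names, and the watermark
# column are assumptions; the STAGE.ORDERS target table is assumed to already exist.
import pandas as pd
import pyodbc
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Source system (the role an ADF self-hosted integration runtime plays in the real pipeline).
src = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=example-host;DATABASE=erp;"
    "UID=svc_etl;PWD=***")

# Target: Snowflake, which also holds the high-watermark for the incremental run.
sf = snowflake.connector.connect(
    account="example_account", user="svc_etl", password="***",
    warehouse="LOAD_WH", database="EDW", schema="STAGE")

cur = sf.cursor()
cur.execute("SELECT COALESCE(MAX(UPDATED_AT), '1900-01-01'::TIMESTAMP) FROM STAGE.ORDERS")
watermark = cur.fetchone()[0]

# Incremental extract: only rows changed since the last successful load; a full load
# would skip the watermark filter.
df = pd.read_sql("SELECT * FROM dbo.orders WHERE updated_at > ?", src, params=[watermark])

# Land the delta in the Snowflake staging schema.
if not df.empty:
    write_pandas(sf, df, table_name="ORDERS", schema="STAGE")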
C. EDUCATION
Master's in Computer Science, University of Alabama at Birmingham, December 2023.
B.Tech in Electronics and Communication Engineering, KL University, Vijayawada, April 2020.