
Senior Data Engineer

Location:
Pittsburgh, PA
Posted:
February 26, 2025


Resume:

Full Name: CHARAN DULIPALA

Email: ***************@*****.***

LinkedIn: linkedin.com/in/dulipala-charan-13006a256

Phone: 904-***-****

Senior Data Engineer

Professional Summary

Data Engineer with 9+ years of experience building data-intensive applications and pipelines in Python, with good experience on Amazon Web Services (AWS) and Azure.

Good experience in Data Extraction, Transformation, and Loading (ETL) using tools such as SQL Server Integration Services (SSIS) and Data Transformation Services (DTS).

Hands-on technical experience in Python, Java, DB2, SQL, and R programming.

Good experience with Python libraries such as Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn for analyzing data, performing data cleaning and visualization, and building models.

Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, and Kafka.

Experience in evaluating technology stacks for cloud-based analytics solutions by researching and identifying appropriate strategies, tools, and methodologies for end-to-end analytics, and in assisting with the development of a technology roadmap for Data Ingestion, Data Lakes, Data Processing, and Data Visualization.

Experience with Amazon Web Services: Amazon EC2, Amazon S3, Amazon RDS, Elastic Load Balancing, Amazon SQS, AWS Identity and Access Management (IAM), Amazon SNS, Amazon CloudWatch, Amazon EBS, Amazon CloudFront, VPC, DynamoDB, Lambda, and Redshift.

Good experience in developing AWS centralized logging solutions for security teams to consolidate AWS logs and analyze them to detect incidents, using Elasticsearch and EC2 server logs.

Hands-on experience on MS Azure Cloud suite: Azure SQL Database, Azure Data Lake (ADL), Azure Data Factory (ADF), Azure SQL Data Warehouse, Azure Service Bus, Azure Key Vault, Azure Analysis Service (AAS), Azure Blob Storage, Azure Search, Azure App Service, and Azure Data Platform Services.

Good experience in Data Modeling (dimensional and relational) concepts such as Star Schema modeling, Snowflake Schema modeling, and Fact and Dimension tables.

Experience in developing web applications and implementing Model View Controller (MVC) architecture using the server-side frameworks Django and Flask.

Working knowledge of Kubernetes to deploy, scale, load-balance, and manage Docker containers.

Experience in Database Design and development with Business Intelligence using SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), OLAP Cubes, Star Schema and Snowflake Schema.

Experience working on Data Ingestion to Azure Services and processing the data in Azure Databricks.

Expertise in creating and enhancing CI/CD pipelines to ensure Business Analysts can build, test, and deploy quickly.

Extensive knowledge of Exploratory Data Analysis, Big Data analytics using Spark, and predictive analysis using Linear and Logistic Regression models, with a good understanding of supervised and unsupervised algorithms.

Experienced Data Engineer with expertise in QuickSight, Power BI, and Tableau for creating impactful data visualizations and dashboards to drive insights and decision-making.

Analyzed the requirements and developed Use Cases, UML Diagrams, Class Diagrams, Sequence and State Machine Diagrams.

Strong understanding of the software development life cycle (SDLC) and Agile methodologies, with familiarity with Git and Jira for version control and project management.

Technical Skills:

BigData/Hadoop Technologies: MapReduce, Spark, Spark SQL, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, YARN, Oozie, Zookeeper, Hue, Ambari Server

Languages: Python, R, SQL, Java, Scala, JavaScript

NoSQL Databases: Cassandra, HBase, MongoDB

Web Design Tools: HTML, CSS, JavaScript, JSP, jQuery, XML

Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans

Public Cloud: EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, Redshift

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Build Tools: Jenkins, Toad, SQL*Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos

Databases: Microsoft SQL Server, MySQL, Oracle 11g/12c, DB2, Teradata, Netezza

Operating Systems: Windows (all versions), UNIX, Linux, macOS, Sun Solaris

Professional Experience

SENIOR DATA ENGINEER

Mastercard, Arlington, VA March 2024 to Present

Responsibilities:

Building and maintaining production-level data engineering pipelines to optimize ETL/ELT tasks for payment transactions, fraud detection, and compliance reporting. Sources include Azure SQL, Azure Data Factory, Blob Storage, Azure SQL Data Warehouse, and Azure Data Lake Analytics.

Utilized Python to create DAGs in Apache Airflow, managing and scheduling data workflows for real-time payment processing and settlement systems.
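
For illustration, a minimal sketch of the kind of Airflow DAG this describes, written in Python; the dag_id, schedule, and task names are hypothetical rather than taken from the actual project:

# Minimal illustrative Airflow DAG; dag_id and task names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_transactions(**context):
    # Placeholder: pull settlement records from the source system.
    print("extracting transactions")


def load_to_warehouse(**context):
    # Placeholder: write transformed records to the warehouse.
    print("loading to warehouse")


with DAG(
    dag_id="payment_settlement_pipeline",  # hypothetical name
    start_date=datetime(2024, 3, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_transactions", python_callable=extract_transactions)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
    extract >> load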

Extracted and loaded transaction data using SHIR (Self-Hosted Integration Runtime) in Azure Data Factory, connecting to on-premises payment processing systems and legacy databases.

Extensive expertise in designing and implementing data integration and migration solutions for high-volume payment transactions using Microsoft Azure and SDLC methodologies, including Agile.

Deep understanding of the Azure Data Engineering stack, enabling secure and compliant data processing in accordance with PCI DSS and regulatory standards.

Developed a framework for Azure Blob Storage to automate the generation of new snapshots and deletion of old snapshots, implementing lifecycle policies for backing up payment transaction logs and audit trails from Delta Lakes.

Designed and implemented enterprise data warehouses for payment data analytics using SQL Server and Azure Synapse Analytics, ensuring high availability and reliability.

Led the ETL development strategy, aligning with business requirements for fraud risk analysis, chargeback reporting, and transaction monitoring.

Created complex stored procedures, functions, views, CTEs, and indexes, optimizing slow-running queries for real-time payment authorization and settlement processing.

Expert in OLTP and OLAP data models, ensuring efficient transaction processing and analytical insights for merchant performance analysis and compliance auditing.

Leveraged Apache Spark (Spark SQL, Spark DataFrames) for processing large-scale payment data sets, enhancing data transformation and enrichment processes.

Implemented Lakehouse architecture using Azure Databricks and Delta Lake, improving payment data accessibility, governance, and scalability.

Developed ETL pipelines using Azure Databricks and Apache Spark, integrating payment gateway data, merchant transaction logs, and fraud detection models.
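
A simplified PySpark sketch of such a pipeline; storage paths, column names, and thresholds are hypothetical, and it assumes a Databricks/Delta Lake runtime where the Delta format is available:

# Illustrative PySpark ETL step; paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("payments_etl").getOrCreate()

# Read raw gateway events landed in the data lake (hypothetical path).
raw = spark.read.json("abfss://raw@storageaccount.dfs.core.windows.net/gateway_events/")

# Basic cleansing and enrichment.
enriched = (
    raw.filter(F.col("amount") > 0)
       .withColumn("txn_date", F.to_date("event_timestamp"))
       .withColumn("is_high_value", F.col("amount") > 10000)
)

# Write to a curated Delta table (assumes Delta Lake is configured on the cluster).
enriched.write.format("delta").mode("append").partitionBy("txn_date").save(
    "abfss://curated@storageaccount.dfs.core.windows.net/transactions/"
)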

Designed schemas and data models for payment data warehouses to improve reconciliation, settlement, and chargeback resolution accuracy.

Automated data quality checks to detect anomalies in transaction data, reducing fraud and ensuring regulatory compliance.

Collaborated with stakeholders in risk, fraud, and compliance teams to develop tailored data solutions for AML (Anti-Money Laundering) and fraud prevention.

Led the implementation of a new payment data warehousing solution using Azure Databricks, enhancing performance, efficiency, and scalability.

Designed scalable data storage solutions in Azure, including Azure Data Lake Storage and Azure Synapse Analytics, ensuring compliance with financial regulations.

Ingested payment transaction data from on-premise SQL databases to Azure Synapse Analytics and Azure SQL DB, supporting real-time analytics and reporting.

Developed control flow structures for multi-dataset pipelines, optimizing batch processing for high-volume payment transaction datasets.

Created stored procedures for full and historical payment data loads, ensuring data completeness for regulatory audits.

Successfully developed POCs (Proof of Concepts) in a development database to benchmark ETL loads for payment processing efficiency.

Worked cross-functionally with engineering, compliance, and fraud teams to ensure timely and cost-effective delivery of projects, improving customer satisfaction.

Utilized DAX functions (Aggregate, Logical, Counting, and Time Intelligence) for advanced financial and transaction modeling in Power BI.

Implemented Power BI filtering strategies, including Page Level, Report Level, and Visual Level Filters, for merchant performance reporting.

Developed Power BI dashboards for transaction trends, chargeback analysis, fraud detection, and merchant performance monitoring using matrix tables, cards, and pie charts.

Utilized Import, Live Connection, and DirectQuery data connectivity modes in Power BI, optimizing real-time data access for payments analytics.

Created Power BI Premium workspaces and scheduled dataset refreshes, ensuring up-to-date insights into payment trends and fraud detection.

Managed relationships in Power BI to streamline transaction linkages across payment entities.

Developed time-series analysis with DAX in Power BI, improving fraud detection and revenue forecasting models.

Implemented Row-Level Security (RLS) in Power BI, restricting access based on user roles, ensuring compliance with PCI DSS.

Published Power BI reports for finance, risk, and compliance teams, enabling real-time decision-making for payment risk mitigation.

Environment: Microsoft Azure, Data Lakes, Azure Data Lake, Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure SQL, Azure Functions, Azure Stream Analytics, Azure Event Hubs, Azure DevOps, Azure Logic Apps, Data Warehousing, DataFrames, ETL Tools, PySpark, Spark SQL, Spark DataFrames, Azure HDInsight, Agile Methodologies, Power BI.

AWS Data Engineer

Centene Corporation, St. Louis, MO October 2022 to February 2024

Responsibilities:

Developed Data Ingestion Modules using AWS Glue to extract and load data into AWS S3 from APIs, ensuring seamless data integration for healthcare analytics.

Created custom ETL scripts using PySpark to handle complex data transformations on Amazon EMR clusters, optimizing healthcare claims processing and patient data analysis.

Implemented a schema-on-read approach using Amazon Athena to query healthcare records stored in S3, enabling efficient clinical and operational data analysis.
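
As a sketch, a schema-on-read Athena query over S3 data could be submitted from Python with boto3 roughly as follows; the database, table, and bucket names are hypothetical:

# Illustrative Athena query submission; names are hypothetical.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT claim_id, claim_status, billed_amount FROM claims WHERE claim_year = '2023'",
    QueryExecutionContext={"Database": "healthcare_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/queries/"},
)
print("Query execution id:", response["QueryExecutionId"])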

Utilized AWS Redshift for large-scale healthcare data analytics, optimizing query performance and scaling clusters based on workload demands.

Automated healthcare data pipelines using Apache Airflow and Python, improving data processing for electronic health records (EHRs), claims, and compliance reporting.

Optimized big data processing performance on AWS EMR, distributing workloads across nodes to enhance population health analytics and predictive modeling.

Configured and monitored real-time data pipelines using AWS CloudWatch, ensuring timely data processing for regulatory compliance, such as HIPAA and HL7 standards.

Developed AWS Lambda functions integrated with EventBridge for real-time healthcare event processing, including patient monitoring and alert systems.
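
A bare-bones sketch of such a Lambda handler reacting to an EventBridge event; the fields under the event detail are hypothetical:

# Illustrative Lambda handler triggered by an EventBridge rule; detail fields are hypothetical.
import json


def lambda_handler(event, context):
    # EventBridge delivers the custom payload under the "detail" key.
    detail = event.get("detail", {})
    patient_id = detail.get("patient_id")
    vital_sign = detail.get("vital_sign")
    value = detail.get("value")

    if vital_sign == "heart_rate" and value is not None and value > 120:
        # Placeholder for the alerting path (e.g., publish to SNS).
        print(f"ALERT: elevated heart rate for patient {patient_id}: {value}")

    return {"statusCode": 200, "body": json.dumps({"processed": True})}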

Designed interactive healthcare dashboards using AWS QuickSight, providing visual insights into patient outcomes, hospital performance, and medical claims.

Implemented CI/CD pipelines using GitHub and Jenkins, automating healthcare data application deployments while maintaining compliance with industry security protocols.

Secured Protected Health Information (PHI) using AWS Secrets Manager, ensuring compliance with HIPAA, GDPR, and HITRUST regulations.

Applied Agile methodology in healthcare IT projects, facilitating sprint planning, daily stand-ups, sprint reviews, and retrospectives for data-driven clinical decision-making solutions.

Environment: AWS Glue, AWS S3, Amazon EMR Clusters, Athena, Airflow, AWS CloudWatch, AWS Lambda Functions, Amazon QuickSight, EventBridge, AWS Secrets Manager, GitHub, Jenkins.

Data Engineer

State of MD, Maryland June 2020 to September 2022

Responsibilities:

Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancing, and Auto Scaling groups; optimized EBS volumes and EC2 instances.

Wrote Terraform templates for AWS Infrastructure as Code to build staging and production environments, and set up build automation with Jenkins.

Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
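
A condensed PySpark Structured Streaming sketch of that pattern; the bucket, schema, window sizes, and output paths are hypothetical:

# Illustrative near-real-time ingestion from S3 with Spark Structured Streaming; paths are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("learner_stream").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("score", DoubleType()),
    StructField("event_time", TimestampType()),
])

events = spark.readStream.schema(schema).json("s3a://example-bucket/learner-events/")

aggregated = (
    events.withWatermark("event_time", "10 minutes")
          .groupBy(F.window("event_time", "5 minutes"), "event_type")
          .agg(F.avg("score").alias("avg_score"))
)

query = (
    aggregated.writeStream.outputMode("append")
              .format("parquet")
              .option("path", "hdfs:///data/learner_model/")
              .option("checkpointLocation", "hdfs:///checkpoints/learner_model/")
              .start()
)
query.awaitTermination()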

Designed and Developed Real-Time Stream Processing Application using Spark, Kafka, Scala, and Hive to perform Streaming ETL and apply Machine Learning.

Developed automated regression scripts in Python for validation of the ETL process across multiple databases, including AWS Redshift, Oracle, MongoDB, T-SQL, and SQL Server.

Utilized S3 bucket and Glacier for storage and backup on AWS.

Used AWS Identity and Access Management (IAM) to create groups and permissions for users to work collaboratively.

Implemented and set up a continuous build and deployment delivery process using Subversion and Tomcat.

Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and building major regulatory and financial reports using advanced SQL queries in Snowflake.

Implemented a one-time data migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.
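
A simplified sketch of one migration step in Python with the Snowflake connector; the connection parameters, query, and table names are hypothetical:

# Illustrative SQL Server -> Snowflake load; credentials, names, and query are hypothetical.
import pandas as pd
import pyodbc
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Extract a batch from SQL Server (hypothetical table and filter).
sqlserver = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=example-host;DATABASE=statedb;Trusted_Connection=yes"
)
df = pd.read_sql("SELECT * FROM dbo.enrollments WHERE load_date = '2021-06-01'", sqlserver)

# Load into Snowflake (assumes the staging table already exists).
conn = snowflake.connector.connect(
    user="etl_user", password="***", account="example_account",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)
write_pandas(conn, df, table_name="ENROLLMENTS_STG")
conn.close()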

Subscribed to Kafka topics with a Kafka consumer client and processed events in real time using Spark. Designed a Kafka producer client using Confluent Kafka and produced events into Kafka topics.
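
An illustrative sketch of the producer and consumer sides using the Confluent Kafka Python client; the broker address, topic, and payload are hypothetical:

# Illustrative Confluent Kafka producer/consumer; brokers and topic are hypothetical.
import json
from confluent_kafka import Producer, Consumer

# Producer side: publish events to a topic.
producer = Producer({"bootstrap.servers": "broker1:9092"})
producer.produce("case-events", value=json.dumps({"case_id": "123", "status": "OPEN"}))
producer.flush()

# Consumer side: subscribe and poll events for downstream Spark processing.
consumer = Consumer({
    "bootstrap.servers": "broker1:9092",
    "group.id": "case-events-processor",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["case-events"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(json.loads(msg.value()))
consumer.close()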

Environment: Python, PySpark, AWS, ETL, Airflow, SQL, UNIX, NoSQL, Sqoop, Pig,

ETL / Data Engineer

Lululemon, Seattle, WA February 2018 to May 2020

Responsibilities:

Reviewed functional and non-functional requirements, identifying gaps before designing DataStage ETL jobs. Participated in design discussions to ensure alignment with business objectives.

Wrote complex SnowSQL queries for business analysis and reporting in the Snowflake cloud data warehouse. Implemented zero-copy cloning for efficient data replication.

Developed ETL pipelines using IBM InfoSphere DataStage, DBT (Data Build Tool), and Snowpipe for continuous data ingestion from AWS S3 into Snowflake.

Converted SAS code logic into ETL processes using DataStage. Led cloud data migration initiatives, moving on-premises data to Snowflake and optimizing transformations using DBT.

Followed Agile methodologies, participating in sprint planning, stand-ups, and retrospectives. Managed version control and collaborative development using GitLab.

Reviewed and analyzed system specifications, assessing the impact of changes on ETL applications and processes. Managed environment variables and configurations in DataStage.

Automated ETL workflows with Unix shell scripting. Scheduled DataStage and DBT jobs using Control-M for workload automation.

Developed Python scripts to validate and compare data between Datamarts and spreadsheets, ensuring data accuracy and integrity.
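
A trimmed-down sketch of that kind of reconciliation check with pandas; the file names, column names, and the assumption that the datamart side is exported to CSV are hypothetical:

# Illustrative datamart-vs-spreadsheet reconciliation; names are hypothetical.
import pandas as pd

# Assumes the datamart result set has been exported to CSV for comparison.
datamart = pd.read_csv("sales_mart_extract.csv", usecols=["sku", "store_id", "units_sold"])
spreadsheet = pd.read_excel("weekly_sales.xlsx", usecols=["sku", "store_id", "units_sold"])

merged = datamart.merge(
    spreadsheet, on=["sku", "store_id"], how="outer",
    suffixes=("_mart", "_sheet"), indicator=True,
)

mismatches = merged[
    (merged["_merge"] != "both")
    | (merged["units_sold_mart"] != merged["units_sold_sheet"])
]
print(f"{len(mismatches)} rows need review")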

Environment: DBT, AWS S3, EC2, Snowflake, JIRA, IBM InfoSphere DataStage 11.7, DB2, Control-M, Python scripting, UNIX scripting, GitLab, IBM UCD.

Data Analyst

FedEx, Hyderabad, India July 2015 to November 2017

Responsibilities:

Involved in requirements analysis and the design of an object-oriented domain model.

Worked on data analysis, data profiling, source-to-target mapping, and the data specification document for the conversion process.

Worked with system architects to create functional code-set crosswalks from source to target systems.

Created Excel charts and pivot tables for ad-hoc data pulls.

Collected and aggregated large amounts of log data and staged the data in HDFS for further analysis.

Created columnstore indexes on dimension and fact tables in the OLTP database to enhance read operations.

Involved in detailed documentation and wrote functional specifications for the module.

Wrote ETL transformation rules to assist the SQL developer.

Periodically interacted with the business and configuration teams to gather requirements, address design issues, and propose data-driven solutions.

Performed component integration testing to verify that logic had been applied correctly from one system to another.

Built reports and report models using SSRS to enable end-user Report Builder usage.

Implemented SQL functions to receive user information from front-end C# GUIs and store it in the database.

Worked on report writing using SQL Server Reporting Services (SSRS), creating various report types such as table, matrix, and chart reports, as well as web reporting by customizing URL access.

Environment: Hadoop, SQL, SSRS, SSIS, OLTP, PL/SQL, Oracle, Log4j, ANT, ClearCase, Windows.


