Data Integration Engineer

Location: King of Prussia, PA 19406

Posted: March 17, 2025


HARSHAVARDHAN REDDY MEDAM

Philadelphia, PA

Email: ***************@*****.***

Phone: +1-484-***-****

PROFESSIONAL SUMMARY

Results-driven ETL Data Engineer with over 11 years of experience in data integration, Snowflake database management, SQL performance tuning, and Tableau reporting. Expertise in designing, implementing, and optimizing data pipelines to ensure seamless data flow and insightful reporting. Proven ability to collaborate with cross-functional teams to deliver scalable and high-performing data solutions.

ETL Development: Experience in designing, developing, and deploying ETL workflows using Talend Open Studio and Talend Data Integration.

Data Extraction & Transformation: Extracting data from various sources (databases, APIs, files) and transforming it for business needs.

Data Loading: Loading processed data into data warehouses like Snowflake, Redshift, or SQL Server.

Job Orchestration: Designing Talend jobs for batch and real-time processing, handling dependencies, and scheduling execution.

Performance Optimization: Optimizing Talend jobs by using parallel processing, indexing, and efficient lookup strategies.

Error Handling & Logging: Implementing exception handling, logging mechanisms, and monitoring Talend jobs for reliability.

Cloud & Big Data Integration: Experience in integrating Talend with Azure, AWS, Hadoop, and Spark.

REST & SOAP API Integration: Using Talend components to connect and extract data from APIs.

Data Governance & Security: Implementing data quality rules, validation, and compliance using Talend Data Quality.

CI/CD with Talend: Automating deployments and version control using Git, Jenkins, or Azure DevOps.

Proficient in end-to-end data pipeline orchestration, cloud migration, and optimizing data integration processes.

Skilled in leveraging Azure services such as Data Lake, Synapse Analytics, and Key Vault to deliver scalable, secure, and high-performance solutions.

Adept at transforming complex business requirements into actionable insights, with a strong focus on automation, performance tuning, and data governance; experienced in building and implementing scalable data-intake pipelines leveraging Big Data, Python, Spark, and on-premises-to-Azure cloud solutions.

Proven ability to use Azure Databricks for distributed data processing, transformation, validation, and cleaning while maintaining data quality and integrity.

Developed end-to-end data processes using Azure Logic Apps, Azure Functions, Azure Data Explorer, and serverless technologies.

Proficient in ingesting real-time streaming data using Azure Event Hub.

Skilled in managing and orchestrating complicated data integration and transformation operations using Azure Synapse Pipelines.

Good knowledge of database creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase, and SQL Server databases.

Proficient in AWS Cloud Platform which includes services like EC2, S3, VPC, ELB, DynamoDB, CloudFront, CloudWatch, Route 53, Security Groups, Redshift, CloudFormation, and more.

Migrated an existing on-premises application to AWS and used AWS services like EC2 and S3 for small data sets processing and storage.

Proficient in developing robust Data Architectures that drive business intelligence solutions and enable seamless data flow across various systems.

Experienced in maintaining the Hadoop cluster on AWS EMR.

Used Azure DevOps to expedite software delivery by streamlining the procedures for developing and deploying software, including effective collaboration, version control, and automated CI/CD pipelines.

Extensive experience with Terraform across multiple cloud providers, including AWS, Azure, and Google Cloud.

Strong expertise in implementing and maintaining modern Data Architectures that support large-scale data pipelines, ensuring high availability, reliability, and consistency.

Developed ETL jobs using Spark and Scala to migrate data from Oracle to new MySQL tables.

Skilled in scripting languages like Python, PySpark, and Scala, allowing for the easy integration of unique functionality into data pipelines.

Experienced in creating and managing Azure DevOps tools for continuous integration and deployment (CI/CD) pipelines.

Hands-on experience in developing large-scale data pipelines using Spark and Hive.

Utilized Apache Sqoop for importing and exporting data to and from HDFS and Hive.

Proficient in setting up workflows using Apache Oozie workflow engine for managing and scheduling Hadoop jobs.

Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other AWS services.

Developed Spark scripts using Scala shell commands as per requirements.

Worked on real-time streaming with Kafka as a data pipeline using Spark Streaming module.

Experienced with Apache Kafka and Azure Event Hubs for messaging and streaming applications using Scala.

Familiarity with data serialization formats, including Parquet, ORC, AVRO, JSON, and CSV.

Optimized Spark jobs and workflows by tuning Spark configurations, partitioning, and memory allocation settings (a representative tuning sketch follows this summary).

Developed, enhanced, and maintained Snowflake database applications, including creating logical and physical data models and applying necessary updates and enhancements.

Extensive experience in developing, maintaining, and implementing EDW, Data Marts, ODS, and Data warehouses with Star schema and Snowflake schema.

Hands-on experience with GitHub for code version control.

Well-versed in the Software Development Life Cycle and experienced in Agile Methodology.

Skilled in optimizing query performance in Hive using bucketing and partitioning techniques.
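
A minimal sketch of the kind of Spark job tuning described above; the configuration values, storage paths, and column names are illustrative placeholders, not actual project settings.

from pyspark.sql import SparkSession

# Example tuning values only; real settings depend on cluster size and data volume.
spark = (
    SparkSession.builder
    .appName("etl-tuning-example")
    .config("spark.sql.shuffle.partitions", "200")   # avoid too many tiny shuffle tasks
    .config("spark.executor.memory", "8g")           # executor sizing is workload-dependent
    .config("spark.executor.cores", "4")
    .config("spark.sql.adaptive.enabled", "true")    # let AQE coalesce shuffle partitions
    .getOrCreate()
)

# Placeholder paths and columns.
df = spark.read.parquet("abfss://raw@examplestorage.dfs.core.windows.net/events/")
# Repartition on a frequently joined/aggregated key to balance work across executors.
df = df.repartition(64, "customer_id")
df.write.mode("overwrite").partitionBy("event_date").parquet(
    "abfss://curated@examplestorage.dfs.core.windows.net/events/"
)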

Technical Skills:

Core Skills

Snowflake, Azure Data Lake Storage, Azure Data Factory, Azure Databricks, Azure Functions, Azure SQL Database and Managed Instances, Azure Synapse Analytics.

Advanced Skills

Snowflake architecture, data modeling, data pipeline management, data modeling using the Snowflake Data Vault methodology, Snowpipe for automated data ingestion.

Advanced Spark Architecture, Spark SQL and Optimization, Advanced Transformations, Custom UDFs (User-Defined Functions), Azure Databricks Workspace, Cluster Management and Optimization, Data Ingestion and Integration, Delta Lake, Job Scheduling and Automation, Databricks REST APIs.

Cloud Environment

Amazon Web Services (AWS), Microsoft Azure, Oracle Cloud, Google Cloud Platform (GCP).

Azure Services

Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure Data Explorer, Azure DevOps.

AWS

EC2, EMR, S3, Redshift, Lambda, Kinesis, Glue, Data Pipeline, API Gateway

Big Data Technologies

Oracle GoldenGate, Apache Spark, Hive, MapReduce, Tez, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, Terraform, ZooKeeper

Hadoop Distribution

Cloudera, Hortonworks.

Languages

Java, SQL, PL/SQL, Python, HiveQL, Scala.

Web Technologies

HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP

Operating Systems

Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.

Data Visualization

Tableau, Power BI, Looker.

Version Control

GIT, GitHub, Bitbucket.

IDE & Build Tools, Design

Eclipse, IntelliJ, Visual Studio, Informatica, IICS, CDQ, CAI

Databases

Snowflake, Oracle, SQL Server, PostgreSQL, MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Cosmos DB, BigQuery.

WORK EXPERIENCE

Project Name

UDH (University Data Hub)

Client

DeVry University Inc.

Role

Senior Azure Data Engineer

Domain

Education

Technologies

Azure Data Factory, Azure Databricks, Azure SQL Database, Azure Functions, Logic Apps, Azure Data Lake Storage Gen 2, SSMS, MS Power BI, MS Visual Studio Code, Terraform, Kafka, Jenkins, ADF Pipelines, GitHub, Tableau.

Database

Azure SQL DB

Job Scheduler

AppWorx

Methodology

Agile

Team Size

7

UDH, Dallas, TX Jan 2023 - Present

Role: Senior Azure Data Engineer

Description: UDH is DeVry University's central, cloud-hosted data repository, containing data from source systems located both in the cloud and on-premises. UDH data is used by different applications and visual tools to generate reports, with the ability to visualize daily, weekly, and monthly data on demand.

Responsibilities:

Developed and maintained end-to-end ETL data pipeline operations, handling large datasets in Azure Data Factory.

Implemented optimized queries and indexing techniques to enhance data fetching efficiency. Wrote SQL queries (DDL, DML) and implemented indexes, triggers, views, stored procedures, functions, and packages.

Built and managed ETL workflows and pipelines to extract data from multiple sources, perform transformations, and load it into the desired destinations.

Worked on Azure Data Factory connectors to connect to various data sources and destinations, including on-premises databases, cloud-based storage, software-as-a-service (SaaS) applications, and more. This allows organizations to integrate data from diverse systems and platforms.

Applied data transformation capabilities to clean, enrich, and shape data during the ETL process, using activities such as mapping, filtering, aggregating, and joining.

Developed and maintained cloud-based Data Architectures to support real-time analytics, integrating various data sources to ensure comprehensive and timely insights for decision-making.

Designed Azure ETL processes to handle large volumes of data and support scalable data processing.

Integrated on-premises (MySQL, Cassandra) and cloud data (Blob Storage, Azure SQL DB) using Azure Data Factory and applied transformations for loading into Snowflake.

Deployed Data Factory to create data pipelines for orchestrating data into SQL databases.

Modeled data in Snowflake using data warehousing techniques, performed data cleansing, managed Slowly Changing Dimensions, assigned surrogate keys, and implemented change data capture.

Utilized Azure Data Factory, Data Lake, Azure Data Explorer and Azure Synapse to solve business problems with an analytical approach.

Developed ELT/ETL pipelines for data movement between Snowflake and Python using Snowflake SnowSQL (a representative ELT sketch follows this list of responsibilities).

Implemented ETL transformations and validation using Spark-SQL/Spark Data Frames with Azure Databricks and Azure Data Factory.

Collaborated with Azure Logic Apps administrators to monitor and troubleshoot process automation and data processing pipeline issues.

Optimized code for Azure Functions to extract, transform, and load data from diverse sources like databases, APIs, and file systems.

Designed, built, and maintained data integration programs within a Hadoop and RDBMS environment.

Integrated Terraform with continuous integration/continuous deployment (CI/CD) pipelines for seamless automation.

Established a CI/CD framework for data pipelines using the Jenkins tool.

Collaborated with DevOps engineers to develop automated CI/CD and test-driven development pipelines using Azure as per client requirements.

Leveraged scripting languages such as Python and Scala for hands-on programming experience.

Executed Hive scripts through Hive on Spark and Spark SQL.

Ensured data integrity and pipeline stability by collaborating on ETL tasks.

Utilized Kafka and Spark Streaming to process streaming data in specific use cases.

Developed a data pipeline using Kafka, Spark, and Hive for data ingestion, transformation, and analysis (a representative streaming sketch follows this list of responsibilities).

Led the software engineering lifecycle, including requirements analysis, application design, and code development & testing.

Created efficient Spark Core and Spark SQL scripts using Scala for accelerated data processing.

Utilized JIRA to report on projects and created sub-tasks for development, QA, and partner validation.

Proficient in Agile ceremonies, including daily stand-ups and internationally coordinated PI Planning.
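
A minimal sketch of the Snowflake ELT pattern referenced above, written with the snowflake-connector-python package as one common way to run SnowSQL-style statements from Python; the credentials, stage, and table names are hypothetical placeholders, not the project's actual objects.

import snowflake.connector

# All connection details, stages, and table names below are placeholders.
conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="example_password",
    warehouse="EXAMPLE_WH",
    database="EXAMPLE_DB",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # Load files already landed in an external stage into a staging table.
    cur.execute(
        "COPY INTO STAGING.ORDERS_RAW FROM @ORDERS_STAGE "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
    # Transform inside Snowflake (ELT): upsert the staged rows into the curated table.
    cur.execute("""
        MERGE INTO CURATED.ORDERS t
        USING STAGING.ORDERS_RAW s
          ON t.ORDER_ID = s.ORDER_ID
        WHEN MATCHED THEN UPDATE SET t.STATUS = s.STATUS
        WHEN NOT MATCHED THEN INSERT (ORDER_ID, STATUS) VALUES (s.ORDER_ID, s.STATUS)
    """)
finally:
    conn.close()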
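
A minimal sketch of the Kafka-to-Spark-to-Hive streaming pattern referenced above; the broker address, topic, message schema, and table names are assumed placeholders, and the spark-sql-kafka connector package is assumed to be available on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-example").enableHiveSupport().getOrCreate()

# Placeholder schema for the JSON messages on the topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "events_topic")                 # placeholder topic
    .load()
)

# Kafka delivers bytes; parse the JSON value into typed columns.
events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

# Append each micro-batch to a Hive-managed table.
query = (
    events.writeStream
    .foreachBatch(lambda batch_df, _: batch_df.write.mode("append").saveAsTable("analytics.events"))
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()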

Key Achievements:

Enhanced reporting accuracy by integrating real-time data feeds into Tableau dashboards.

Optimized Snowflake storage costs by implementing data retention and archiving strategies.

Trained junior team members on Snowflake and Tableau best practices, improving team productivity.

Project Name

The Ban-Mak-Cad bank-CFCPF

Client

The Bank of Nova Scotia

Role

Azure Data Engineer

Domain

Banking

Technologies

Azure Data Factory, Azure Databricks, Azure SQL Database, Azure Functions, Logic Apps, Azure Data Lake Storage Gen 2, SSMS, MS Power BI, MS Visual Studio Code, IntelliJ, GitHub, Tableau.

Database

Azure SQL DB

Methodology

Agile

Team Size

9

The Ban-Mak-Cad bank-CFCPF Hyderabad, India July 2019 – March 2022

Role: Azure Data Engineer

Description: The Bank of Nova Scotia is a leading bank in Canada and a leading financial services provider in the Americas. We are here for every future. We help our customers, their families, and their communities achieve success through a broad range of advice, products, and services, including personal and commercial banking, wealth management and private banking, corporate and investment banking, and capital markets.

Responsibilities:

As part of the Agile process, participated in daily stand-ups and scrum calls.

Implemented code and scripts for data acquisition and transformation from different sources.

Moved all metadata files generated from various source systems to ADLS for further processing.

Imported data from different sources such as ADLS and Blob Storage for computation using Spark.

Implemented Spark using Scala in Databricks, utilizing DataFrames and the Spark SQL API for faster data processing.

Used CSV, Avro, JSON, Parquet, and ORC data formats to store data in ADLS.

Created data-driven workflows for data movement and transformation using Data Factory.

Extracted large files from the Data Lake using Azure Storage Explorer.

Design & implement migration strategies for traditional systems on Azure. Worked on Azure SQL

Database, Azure Data Lake(ADLS), Azure Data Factory(ADF), Azure SQL DW, Azure Service Bus, Azure Key Vault, Blob storage, Azure App service.

Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files (a representative ingestion sketch follows this list of responsibilities).

Collaborated with cross-functional teams to define and deploy Data Architectures that adhere to best practices for data security, governance, and compliance in complex enterprise environments.

Imported data from sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response.

Transformed and analyzed the data using PySpark and Hive based on ETL mappings.

Developed PySpark programs, created DataFrames, and worked on transformations.

Worked on data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.

Involved in all the acquisition, transformation, and model phases of the project.

Created DAX expressions as per the requirements in the sprints.

Connected to the analysis server and created reports using Tableau.

Generated reports with the required data for end customers using the Power BI reporting tool.

Monitored Spark jobs in the cluster environment and debugged failed jobs.

Collaborated with Azure Logic Apps administrators to monitor and troubleshoot process automation and data processing pipeline issues.

Optimized code for Azure Functions to extract, transform, and load data from diverse sources like databases, APIs, and file systems.

Designed, built, and maintained data integration programs within a Hadoop and RDBMS environment.

Interacted closely with business users, providing end-to-end support.
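
A minimal sketch of the PySpark CSV ingestion and transformation work referenced above; the ADLS storage account, container, and column names are placeholders, and storage credentials are assumed to be configured on the cluster.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, trim

spark = SparkSession.builder.appName("adls-csv-load-example").getOrCreate()

# Read a folder of CSV files landed in ADLS Gen2 (placeholder account and container).
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://landing@examplestorage.dfs.core.windows.net/transactions/2021/")
)

# Basic cleansing and typing before writing to the curated zone as Parquet.
cleaned = (
    df.withColumn("txn_date", to_date(col("txn_date"), "yyyy-MM-dd"))
      .withColumn("customer_id", trim(col("customer_id")))
      .dropDuplicates(["txn_id"])
      .filter(col("amount").isNotNull())
)

cleaned.write.mode("overwrite").parquet(
    "abfss://curated@examplestorage.dfs.core.windows.net/transactions/"
)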

Project Name

Woolies Data Mart

Client

Woolworths

Role

Data Engineer

Domain

Manufacturing

Technologies

SQL Server, Azure Data Factory, Azure Databricks, Talend, Tableau.

Database

Snowflake Database

Methodology

Agile

Team Size

14

Description: Woolworths is a project that acts as an interface between supplier and manufacturer, permitting the organization to collect, classify, and track usage and demand data. It also creates statistics-based forecasts of future requirements.

Woolies Data Mart / Woolworths, Bangalore, India Dec 2014 - Jun 2019

Role: Azure Data Engineer

Responsibilities:

Extracted data from CSV, Excel, and SQL Server sources to staging tables dynamically using ADF pipelines.

Implemented control flow activities (Copy, Get Metadata, If Condition, Lookup, Set Variable, Filter, and ForEach) in ADF pipelines for on-cloud ETL processing.

Created linked services for various source systems and target destinations. Primarily involved in data migration using SQL, Azure SQL, Azure Data Lake, and Azure Data Factory.

Proficient in creating data warehouses, designing related extraction and loading functions, testing designs, data modeling, and ensuring the smooth running of applications.

Responsible for extracting data from OLTP and OLAP systems to the Data Lake using Azure Data Factory and Databricks.

Developed pipelines that extract data from various sources and merge it into single-source datasets in the Data Lake using Databricks.

Created datasets for various source systems and target destinations.

Implemented an incremental load strategy for daily loads (a watermark-style sketch follows this list of responsibilities).

Parameterized the datasets and linked services using parameters and variables.

Connected to different data sources from Tableau.

Developed Dashboard Reports using Tableau.

Created new calculated columns and calculated measures using DAX expressions.

Experience in creating calculated measures and columns with DAX in MS Power BI Desktop.

Experience in creating different visualizations such as line, bar, histogram, scatter, waterfall, bullet, heat map, and tree map charts.

Experience in configuring gateways in the Power BI service.

Knowledge of creating workspaces and subscriptions.

Troubleshot data quality issues and performed root cause analysis to proactively resolve product and operational issues.

Developed data visualization dashboards on the Looker platform as per specified requirements to communicate KPIs and metrics to stakeholders.

Installed and configured Apache Airflow and created DAGs to run workflows.

Automated scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production (a minimal DAG sketch follows this list of responsibilities).

Created data models, designed the database, and performed query optimization, index management, and integrity checks based on business and engineering needs.

Able to handle multiple tasks in a fast-paced environment, with excellent verbal, written, and interpersonal communication skills.

Extracted large files from the Data Lake using Azure Storage Explorer.

Involved in all the acquisition, transformation, and model phases of the project.

Generated reports with the required data for end customers using the Tableau reporting tool.
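
A minimal sketch of a high-watermark incremental load pattern; the project implemented incremental loading in Azure Data Factory, so this PySpark version is only an illustration of the idea, with placeholder paths and columns.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.functions import max as spark_max

spark = SparkSession.builder.appName("incremental-load-example").getOrCreate()

source_path = "abfss://landing@examplestorage.dfs.core.windows.net/orders/"   # placeholder
target_path = "abfss://curated@examplestorage.dfs.core.windows.net/orders/"   # placeholder

# The watermark is the latest modification timestamp already present in the target.
try:
    watermark = spark.read.parquet(target_path).agg(spark_max("modified_at")).first()[0]
except Exception:
    watermark = None  # first run: target does not exist yet, so load everything

source = spark.read.parquet(source_path)
delta = source if watermark is None else source.filter(col("modified_at") > watermark)

# Append only the new or changed rows since the last daily run.
delta.write.mode("append").parquet(target_path)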
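
A minimal sketch of a daily Airflow DAG of the kind referenced above; the DAG id, schedule, and script paths are placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_mart_load_example",          # placeholder DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",             # run daily at 02:00
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python /opt/jobs/extract.py")
    load = BashOperator(task_id="load", bash_command="python /opt/jobs/load.py")

    extract >> load  # ensure extract completes before load runs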

Education:

Master of Science in Computer Science, Indiana Wesleyan University

December 2023, Indiana, USA

Bachelor of Technology in Computer Science, Holy Mary Institute of Technology & Science

May 2015, Hyderabad, India

Certifications:

DP-203: Data Engineering on Microsoft Azure.

Snowflake Certification.

Data Warehouse ETL Testing & Data Quality Management A-Z (Udemy)


