Azure Data Engineer

Location:
Raleigh, NC
Posted:
March 01, 2024

Resume:

KARTHIK BANTUPALLI Senior Data Engineer Phone: 414-***-**** Email: ad31fe@r.postjobfree.com

LinkedIn: linkedin.com/in/karthikbantupalli

PROFESSIONAL SUMMARY

** ***** ** ** ********** in various technologies, covering requirements gathering, data engineering, data modeling, analysis, ETL (Extraction, Transformation, and Loading) development, validation, deployment, monitoring, and visualization reporting.

Implemented Azure cloud components: Azure Data Factory, Snowflake on Azure, Azure Data Lake, Azure Blob Storage, Azure Databricks, Azure Synapse Analytics, Azure SQL DB/DW, Azure Cosmos DB, and Power BI.

Project roles spanned the full project life cycle of analysis, design, build, testing, deployment, data migration, and maintenance using SDLC with Agile and Waterfall methodologies. Strong experience with T-SQL (DDL, DML, TCL, DCL) in implementing and developing stored procedures, triggers, nested queries, joins, cursors, views, user-defined functions, indexes, user profiles, and relational database models, including creating and updating tables.

Proficient in creating intricate mappings, reusable transformations, sessions, and workflows with the Informatica ETL tool, enabling efficient data extraction from diverse sources and loading into designated targets.

Expertise in developing Spark applications through Spark-SQL on Databricks, focusing on extracting, transforming, and aggregating data from a range of file formats. Managed the scheduling of production jobs using tools such as Airflow and IBM Tivoli.

Experienced in handling a variety of file formats, including Avro, Parquet, Sequence, JSON, ORC, CSV, and plain text. This involves loading data, parsing, collecting, and executing transformations.

Created Databricks notebooks using Python (PySpark) and Spark SQL to transform data stored in Azure Data Lake Storage Gen2 from the Raw zone to the Stage and Curated zones.
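
A minimal PySpark sketch of this kind of Raw-to-Stage notebook logic, assuming an already-configured ADLS Gen2 connection; the storage account, container paths, and column handling below are illustrative placeholders, not the actual project code:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raw_to_stage").getOrCreate()

# Read raw CSV files landed in the Raw zone (storage account and path are placeholders)
raw_df = spark.read.option("header", "true").csv(
    "abfss://raw@examplestorageacct.dfs.core.windows.net/sales/2024/")

# Basic cleansing: standardize column names, drop duplicates, stamp the load date
stage_df = (raw_df
    .toDF(*[c.strip().lower().replace(" ", "_") for c in raw_df.columns])
    .dropDuplicates()
    .withColumn("load_date", F.current_date()))

# Write to the Stage zone as Parquet, partitioned by load date
(stage_df.write.mode("overwrite")
    .partitionBy("load_date")
    .parquet("abfss://stage@examplestorageacct.dfs.core.windows.net/sales/"))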

Experience in RDBMS concepts such as Tables, User Defined Data Types, Indexes, Indexed Views, Functions, Table Variables and Stored Procedures.

Have extensive experience in creating pipeline jobs, scheduling triggers, and Mapping data flows using Azure Data Factory (V2) and using Key Vaults to store credentials.
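
As an illustration of the Key Vault pattern mentioned above, a small sketch using the Azure SDK for Python; the vault URL and secret name are placeholders, and within Data Factory itself the same lookup is typically configured through a Key Vault linked service rather than code:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Authenticate with the environment's managed identity or developer credentials
credential = DefaultAzureCredential()

# Vault URL and secret name below are placeholders
client = SecretClient(vault_url="https://example-kv.vault.azure.net/",
                      credential=credential)
sql_password = client.get_secret("sql-db-password").value

# sql_password can now be passed to a JDBC/ODBC connection instead of
# hard-coding credentials in pipeline definitions or notebooks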

Developed and maintained data pipelines using AWS Glue, effectively handling data extraction, transformation, and loading processes (ETL) across various data sources, resulting in optimized data workflows and enhanced data availability.
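
A simplified sketch of an AWS Glue ETL job of the kind described above; the catalog database, table, column mappings, and S3 path are hypothetical:

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source table registered in the Glue Data Catalog (names are placeholders)
orders = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_orders")

# Rename and cast columns as part of the transform step
mapped = ApplyMapping.apply(frame=orders, mappings=[
    ("order_id", "string", "order_id", "string"),
    ("order_ts", "string", "order_date", "timestamp"),
    ("amount", "double", "amount", "double"),
])

# Load the result to S3 as Parquet (bucket path is a placeholder)
glue_context.write_dynamic_frame.from_options(
    frame=mapped, connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet")

job.commit()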

Managed and administered databases in Amazon RDS for Oracle, ensuring high availability, automated backups, and efficient performance tuning, leading to improved database reliability and reduced downtime.

Demonstrated proficiency in Oracle database management, executing complex SQL queries, performance optimization, and implementing security measures, contributing to robust and secure data management practices.

Utilized PostgreSQL for database design and implementation, showcasing skills in advanced SQL, database tuning, and maintenance, which facilitated efficient data storage and retrieval operations.

Orchestrated data integration tasks using AWS Glue, enabling seamless data consolidation from disparate sources into a centralized data repository, thus enhancing data analytics and reporting capabilities.

Employed best practices in database management and migration, including Schema design, indexing, and query optimization in both Oracle and PostgreSQL environments, leading to significant improvements in system performance and query execution time.

Collaborated in cross-functional teams to design and implement scalable and resilient database solutions on Amazon RDS for Oracle, aligning with business requirements and ensuring data integrity and consistency.

Automated routine database tasks using AWS Glue scripts, reducing manual intervention and increasing efficiency in data processing and management workflows.

Conducted regular database health checks and performance assessments in Oracle and PostgreSQL systems, identifying and resolving issues proactively to maintain optimal database health and performance.

Continuously updated technical knowledge and skills in AWS Glue, Amazon RDS for Oracle, and PostgreSQL, staying abreast of the latest features, best practices, and industry trends to drive innovation and efficiency in database management.

Worked on various ETL/ELT/EDL tools to develop workflows for extracting, transforming, and loading data into different database systems.

Experience connecting AWS resources such as S3 buckets, RDS, and Redshift, and creating pipelines to move data from AWS to Azure.

Led a team of 12 in implementing new requirements by providing ETL mapping documents and helped resolve technical issues.

Experience in migration of on-premises databases to Microsoft Azure environment (Blobs, Azure Data Warehouse, Azure SQL Server, PowerShell Azure components, SSIS Azure components).

Understanding of RDD operations in Apache Spark (transformations, actions, and persistence).

Experience in developing dashboards and parameterized reports using SSRS, Tableau, and Power BI.

Experience in Agile program management, translating user stories for sprints into techno-functional deliverables, technical and functional specifications, stakeholder management, process re-engineering, change request management, and project delivery.

Good hands-on experience in Spark Core, Spark SQL, Scala, and Spark Streaming; implemented a Snowflake architecture that stores and analyzes all data records in one place.

Utilized Snowflake Clone and Time Travel functionalities effectively.
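
For illustration, a minimal sketch of Snowflake zero-copy cloning and Time Travel issued through the Snowflake Python connector; connection parameters and table names are placeholders:

import snowflake.connector

# Connection parameters are placeholders; credentials would come from a vault in practice
conn = snowflake.connector.connect(
    account="example_account", user="example_user", password="***",
    warehouse="ANALYTICS_WH", database="SALES_DB", schema="PUBLIC")
cur = conn.cursor()

# Zero-copy clone of a table (no data is physically duplicated)
cur.execute("CREATE TABLE orders_clone CLONE orders")

# Time Travel: query the table as it existed one hour ago
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print(cur.fetchone())

cur.close()
conn.close()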

Significant expertise in the creation, maintenance, and implementation of EDWs, data marts, ODS, and data warehouses using both Star and Snowflake schemas.

Actively contribute to the development, enhancement, and maintenance of Snowflake database applications.

Practical experience with version control tools such as GitHub, Azure DevOps, Bitbucket, and GitLab, as well as ARM templates.

Highly skilled in Agile methodologies and proficient in utilizing JIRA/ADO for project management and reporting.

TECHNICAL SKILLS

Azure Services

Azure Data Factory, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure DevOps, Azure SQL Database, Azure Synapse Analytics, Azure Data Lake Storage.

Hadoop Distribution

Cloudera, Hortonworks.

Big Data Technologies

MapReduce, Hive, Tez, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Pig, Flume, Sqoop, Zookeeper.

ETL Tools

Azure Data Factory (V2), SSIS (SQL Server Integration Services), Informatica PowerCenter/Power Exchange, IICS (Informatica Intelligent Cloud Services), SSMS, AWS Glue, Databricks, Spark/Hive (for Big Data)

Languages

Java, SQL, Python (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), PySpark, Spark SQL, SAS, R (RStudio), PL/SQL, Linux shell scripting, Scala, C#.

Web Technologies

HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP.

Operating Systems

Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.

Data Visualization

Power BI, Tableau.

Database

MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle, Cosmos DB.

Build Automation tools

Ant, Maven.

Version Control

GIT, GitHub, Azure DevOps server

Methodology

Agile, Scrum.

IDE & Build Tools, Design

Eclipse, Visual Studio.

EDUCATION

Bachelor of Engineering in Electronics and Communication Engineering from JNTU Kakinada, India.

Certification:

Microsoft Azure Data Engineer Associate DP-203

WORK EXPERIENCE

Role: Data Engineer April 2022 – Present

Client: Elevance Health, Indianapolis, IN

Responsibilities:

Created and deployed scalable data ingestion pipelines using Azure Data Factory, enabling the collection of data from various sources like SQL databases, CSV files, and REST APIs.

Developed data processing workflows utilizing Azure Databricks, harnessing the power of Spark for distributed data processing and transforming tasks.

Ensured the quality and integrity of data through data validation, cleansing, and transformation operations carried out with Azure Data Factory and Databricks.

Created serverless computing solutions that can dynamically scale based on demand, utilizing Azure Functions.

Utilized Azure Data Factory, Azure Databricks, PySpark, Spark SQL, and U-SQL Azure Data Lake Analytics to extract, transform, and load data from source systems into Azure Data Storage services.

Utilized Azure Logic Apps to automate workflows, integrate systems, data, and services across various platforms and environments.

Efficiently ran extensive parallel and high-performance computing applications through the utilization of Azure Batch.

Built streaming ETL pipelines using Spark Streaming to extract data from multiple sources, perform real-time transformations, and load it into a data warehouse like Azure Synapse Analytics.
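
A condensed sketch of such a streaming ETL pipeline using Spark Structured Streaming, assuming a Kafka source and a generic JDBC sink; broker, topic, schema, and connection details are placeholders, and the production pipeline may have used the dedicated Azure Synapse connector instead:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("streaming_etl").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read a Kafka topic as a stream (requires the spark-sql-kafka connector on the cluster;
# broker and topic names are placeholders)
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "claims-events")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_warehouse(batch_df, batch_id):
    # Generic JDBC append per micro-batch; connection details are placeholders
    (batch_df.write.format("jdbc")
        .option("url", "jdbc:sqlserver://example-synapse.sql.azuresynapse.net:1433;database=dw")
        .option("dbtable", "stg.claims_events")
        .option("user", "etl_user").option("password", "***")
        .mode("append").save())

# Checkpoint location is a placeholder mount path
(events.writeStream.foreachBatch(write_to_warehouse)
    .option("checkpointLocation", "/mnt/checkpoints/claims_events")
    .start())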

Developed a cloud-based data warehouse solution on Azure using Snowflake, capitalizing on its scalability and high-performance capabilities.

Designed and implemented efficient Snowflake schemas, tables, and views to optimize data storage and retrieval for analytics and reporting objectives.

Collaborated closely with data analysts and business stakeholders, understanding their requirements and implementing suitable data models and structures within Snowflake.

Created and enhanced Spark jobs to execute data transformations, aggregations, and machine learning tasks on large-scale datasets.

Utilized Azure Databricks or HDInsight to scale out the Spark Streaming cluster as required.

Leveraged Azure Synapse Analytics to seamlessly integrate big data processing and analytics capabilities, facilitating effortless data exploration and generation of insights.

Automated data pipelines and workflows through the configuration of event-based triggers and scheduling mechanisms.

Implemented solutions for data lineage and metadata management to track and monitor data flow and transformations effectively.

Identified and resolved issues causing performance bottlenecks in the data processing and storage layers, resulting in improved query execution and reduced data latency.

Implemented effective techniques like partitioning, indexing, and caching strategies in Snowflake and Azure services to optimize query performance and minimize processing time.
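
On the Spark/ADLS side, the partitioning and caching strategies referred to above can be sketched as follows; paths and column names are illustrative, and on the Snowflake side the analogous levers are clustering keys and caching, configured in SQL rather than Spark code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning_example").getOrCreate()

# Source path is a placeholder
claims = spark.read.parquet(
    "abfss://curated@examplestorageacct.dfs.core.windows.net/claims/")

# Cache a dataframe reused by several downstream aggregations
claims.cache()
claims.groupBy("service_month").count().show()
claims.groupBy("state").count().show()

# Persist partitioned by a common filter column so queries prune files
(claims.write.mode("overwrite")
    .partitionBy("service_month")
    .parquet("abfss://serving@examplestorageacct.dfs.core.windows.net/claims/"))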

Conducted thorough performance tuning and capacity planning exercises to ensure the scalability and efficiency of the data infrastructure.

Developed a Java-based Spark job that indexes data from external Hive tables in HDFS into Azure Functions.

Developed a robust CI/CD framework for data pipelines using the Jenkins tool.

Collaborated closely with DevOps engineers to fulfill the client's requirements by creating automated CI/CD and test-driven development pipelines using Azure.

Demonstrated proficiency in scripting languages such as Python and Scala through hands-on programming experience.

Utilized Hive scripts through Hive on Spark and SparkSQL for seamless data processing.
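
A brief sketch of running HiveQL through Spark SQL with Hive support enabled; the database, table, and column names are hypothetical:

from pyspark.sql import SparkSession

# Hive support lets Spark SQL read tables defined in the Hive metastore
spark = (SparkSession.builder
         .appName("hive_on_spark")
         .enableHiveSupport()
         .getOrCreate())

# Table and column names are placeholders
spark.sql("""
    SELECT member_id, SUM(paid_amount) AS total_paid
    FROM claims_db.claims
    WHERE service_year = 2023
    GROUP BY member_id
""").write.mode("overwrite").saveAsTable("claims_db.member_paid_summary")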

Actively collaborated on ETL tasks, prioritizing data integrity and consistently verifying the stability of the pipelines.

Proficiently utilized Kafka and Spark streaming technologies to handle and process streaming data in specific use cases.

Designed, implemented, and managed a data pipeline utilizing Kafka, Spark, and Hive for efficient data ingestion, transformation, and analysis.

Designed real-time data processing solutions using Kafka and Spark Streaming, enabling the ingestion, transformation, and analysis of high-volume streaming data.

Developed efficient Spark core and Spark SQL scripts using Scala to expedite data processing.

Utilized JIRA for project tracking and reporting, creating sub-tasks for development, QA, and partner validation purposes.

Extensive experience in practicing Agile methodologies, actively participating in a wide range of Agile ceremonies, including daily stand-ups and internationally coordinated PI Planning.

Environment: Informatica PowerCenter/PowerExchange, IICS (Informatica Intelligent Cloud Services), SSMS, AWS, Azure Databricks, Azure Data Factory, Logic Apps, Azure Event Hub, containerization, Spark Streaming, data pipelines, Terraform, Azure DevOps, Oracle, HDFS, MapReduce, Spark, Hive, SQL, Python, PySpark, Git, JIRA, Jenkins, Kafka, ADF pipelines, Power BI.

Role: Data Engineer/ ETL developer Oct 2019 – Mar 2022

Client: Infosys (Verizon), Raleigh, North Carolina

Responsibilities:

Spearheaded the deployment of Docker environments, significantly enhancing scalability and data processing efficiency.

Architected and implemented robust data pipelines using Azure EventHub, expertly managing high-volume, real-time data streams.

Leveraged Spark Streaming for accelerated real-time data analysis, facilitating rapid insight extraction from streaming sources.

Designed and executed end-to-end data pipelines, ensuring seamless integration and data flow across diverse systems.

Developed a cutting-edge Spark Streaming application, integrating Azure Functions and Azure Logic Apps for dynamic, event-driven data processing.

Performed sophisticated ETL operations within Azure Databricks, utilizing JDBC connectors for effective relational database integration.

Utilized Terraform for automated cloud infrastructure provisioning, optimizing deployment processes and operational efficiency.

Implemented Azure DevOps for continuous integration and deployment, significantly improving operational agility and system reliability.

Managed and optimized YAML pipelines in Azure DevOps, ensuring efficient workflows for build, test, and deployment processes.

Orchestrated comprehensive data acquisition and transformation processes, upholding data integrity and quality standards.

Collaborated with cross-functional teams to gather requirements, design data models, and deliver tailored solutions aligned with business goals.

Conducted in-depth data analysis and profiling, deriving actionable insights to support strategic decision-making and business objectives.

Implemented stringent data governance and security protocols to comply with industry standards and protect sensitive information.

Tuned and optimized data pipelines and queries, significantly enhancing system efficiency.

Provided technical guidance and mentorship, promoting knowledge sharing and collaborative development.

Fine-tuned Spark jobs within Databricks for enhanced performance and resource efficiency.

Expertise in Hive querying, table creation, and employing HiveQL for replicating MapReduce functionality.

Participated in the strategic migration of ETL processes from Oracle to Hive, demonstrating proficiency in data manipulation.

Utilized PySpark for sophisticated data analysis and processing and leveraged Spark Streaming for effective batch segmentation.

Implemented CI/CD pipelines for streamlined build and deployment in Hadoop ecosystems.

Proficient in using JIRA for project management and Git for version control, ensuring smooth workflow and code repository maintenance.

Environment: SQL Server, SSIS, SSMS, Azure Databricks, Data Factory, Logic Apps, Azure Event Hub, containerization, Spark Streaming, data pipelines, Terraform, Azure DevOps, Oracle, HDFS, MapReduce, YARN, Spark, Hive, SQL, Python, Scala, PySpark, Git, JIRA, Jenkins, Kafka, ADF pipelines, Power BI.

Role: ETL Developer Aug 2016 – Sep 2019

Client: Credit Suisse - New York, NY

Responsibilities:

On a regular basis, imported data from MySQL into HDFS using Sqoop for efficient loading.

Conducted aggregations on large data volumes using Apache Spark and Scala, storing the results in the Hive data warehouse for subsequent analysis.

Extensively worked with big data ecosystems such as Hadoop, Spark, Hortonworks, and Cloudera within Data Lakes.

Efficiently loaded and transformed structured, semi-structured, and unstructured datasets.

Developed customized Hive queries to analyze data and meet specific business needs.

Leveraged HBase integration with Hive to establish HBase tables in the Analytics Zone.

Utilized Kafka and Spark Streaming to process streaming data for specific use cases.

Created data pipelines using Flume and Sqoop to ingest customer behavioral data into HDFS for analysis.

Made use of various big data analytic tools like Hive and MapReduce to analyze Hadoop clusters.

Implemented a data pipeline with Kafka, Spark, and Hive to handle ingestion, transformation, and analysis of data.

Wrote Hive queries and used HiveQL to simulate MapReduce functionality for data analysis and processing.

Migrated data from RDBMS (Oracle) to Hadoop using Sqoop for efficient data processing.

Utilized Pig as the ETL tool to execute transformations, event joins, filtering, and pre-aggregations, enabling efficient data processing and integration.

Developed custom scripts and tools using Oracle's PL/SQL language to automate data validation, cleansing, and transformation processes.

Established Continuous Integration/Continuous Deployment (CI/CD) pipelines to build and deploy projects in the Hadoop environment.

Utilized JIRA for issue and project workflow management.

Utilized PySpark and Spark SQL to accelerate data testing and processing in Spark.

Developed ETL pipelines using Spark and Hive to execute specific business transformations.

Applied Spark Streaming to process streaming data in batches, improving batch processing effectiveness.

Leveraged Zookeeper to coordinate, synchronize, and serialize servers in clusters.

Leveraged the Oozie workflow engine for scheduling jobs in Hadoop.

Utilized PySpark in SparkSQL for data analysis and processing.

Utilized Git as the version control tool to maintain the code repository.

Environment: Sqoop, MySQL, HDFS, Apache Spark, Scala, Hive, Hadoop, Cloudera, Kafka, MapReduce, Zookeeper, Oozie, data pipelines, RDBMS, Python, PySpark, Ambari, JIRA.

Role: Data Engineer Nov 2013 – Jul 2016

Client: Mayo Clinic – Rochester, MN

Responsibilities:

Developed the ETL workflows using the Informatica tool to extract data from Oracle databases and flat files, and then efficiently load it into the target Oracle Database.

Creating jobs, configuring the SQL Mail Agent, setting up notifications, and scheduling automated processes for DTS/SSIS packages.

Effectively managing and updating Erwin models for logical and physical data modelling of the Consolidated Data Store (CDS), Actuarial Data Mart (ADM), and Reference DB, in accordance with user requirements.

Created and implemented SSIS and SSRS packages to extract, modify, and load data from various sources, including DB2, SQL, Oracle, flat files (CSV, delimited), APIs, XML, and JSON.

Utilizing TFS for source control and to track the deployment of environment-specific scripts.

Converting existing data models from Erwin to PDF format and publishing them on SharePoint to enable user access.

Creating, overseeing, and managing databases like the Consolidated Data Store, Reference Database, and Actuarial Data Mart.

Writing triggers, stored procedures, and functions using Transact-SQL (T-SQL) and maintaining physical database structures.

Deploying scripts in various environments based on Configuration Management and Playbook requirements.

Creating and maintaining files and filegroups, establishing table/index relationships, and optimizing query performance through tuning.

Tracking and resolving defects using Quality Center to ensure effective issue management.

Maintaining user accounts, roles, and permissions within the SQL Server environment.

Environment: SQL Server, SSRS, SSIS, T-SQL, Windows Server 2003, Performance Point Server 2007, Oracle 10g, Visual Studio.

Role: Data Warehouse Developer Jul 2012 – Sep 2013

Client: OG Software Solutions, Chennai, India

Responsibilities:

Proficient in crafting ETL data flows through SSIS, constructing mappings and workflows to retrieve data from SQL Server, and performing data migration and transformation from Access/Excel sheets using SSIS.

Skilled in dimensional data modeling to design data marts, adept at identifying facts and dimensions, and proficient in developing fact and dimension tables incorporating Slowly Changing Dimensions (SCD).

Well-versed in managing errors and events with tools such as Precedence Constraints, Break Points, Check Points, and Logging.

Skilled in constructing cubes and dimensions using various architectures and data sources for business intelligence, as well as proficient in writing MDX scripting.

In-depth understanding of the characteristics, structure, attributes, hierarchies, and star/snowflake schemas of data marts.

Proficient in developing SSAS cubes, including aggregation, key performance indicators (KPIs), measures, cube partitioning, data mining models, and deploying/processing SSAS objects.

Experienced in generating ad hoc reports and reports with complex formulas, as well as querying databases for business intelligence purposes.

Expertise in creating parameterized, chart/graph, linked, dashboard, scorecards, and drill-down/drill-through reports on SSAS cubes using SSRS, with a focus on cascading reports.

Possessing flexibility, enthusiasm, and a project-oriented mindset, while being a valuable team player with exceptional written, verbal communication, and leadership skills to devise innovative solutions for challenging client requirements.

Environment: SQL Server 2008/2012 Enterprise Edition, Visual Studio 2010, SSIS, SharePoint, MS Access, Team Foundation Server, Git.


