Data Engineer Azure

Location: Philadelphia, PA
Salary: 160,000
Posted: April 10, 2024

Resume:

Name: Shanmuka Siva Varma Chekuri

Azure Data Engineer / ETL developer

Contact: +1-856-***-****

Email: ad4xce@r.postjobfree.com

LinkedIn

OBJECTIVE

To contribute in a demanding workplace by applying my efficiency, intellect, and software engineering skills. Adept IT professional with 5+ years of experience in Data Warehousing and Big Data, including Big Data ecosystem technologies such as Spark, data visualization, reporting, and data quality solutions.

Professional Summary

Specialized in designing cloud-based solutions in Azure, creating Azure SQL databases, setting up Elastic pool jobs, and designing tabular models in Azure Analysis Services.

Expertise in building CI/CD pipelines on Azure using Azure DevOps for efficient code management and deployment.

Proficient in relational and dimensional data modeling, with hands-on experience in designing ETL processes for data migration into data warehouses.

Skilled in handling large databases, producing tables, reports, and graphs, and conducting complex data manipulations.

Strong background in real-time data analytics using Spark Streaming, Kafka, and Flume.
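
As a minimal sketch of the kind of real-time pipeline described above, the snippet below reads a hypothetical Kafka topic with PySpark Structured Streaming and writes the parsed events to the console. The broker address, topic name, and event schema are placeholders for illustration, and the spark-sql-kafka connector package must be available to Spark.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Placeholder schema for incoming JSON events (illustrative only).
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Read a Kafka topic as a streaming DataFrame; broker and topic are placeholders.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
events = (raw
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Write the parsed stream to the console purely for demonstration.
query = events.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```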

Proficient in data analytics, ad-hoc reporting, and visualization tools such as Tableau, ggplot2, Power BI, Dash, and Flask for creating impactful dashboards.

Experienced in the Software Development Lifecycle (SDLC) using SCRUM and Agile methodologies.

Demonstrated ability to prepare comprehensive project documentation, track project progress, and communicate effectively with stakeholders.

Proven track record in migrating SQL databases to Azure services such as Azure Data Lake, Data Lake Analytics, SQL Database, Databricks, and SQL Data Warehouse.

Extensive experience in implementing end-to-end Azure data solutions, including provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, Databricks, and Cosmos DB.

In-depth understanding of Spark architecture with Databricks, including expertise in setting up AWS and Microsoft Azure environments for business analytics and managing machine learning lifecycles.
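
The summary does not name a specific tool for the machine learning lifecycle work; on Databricks this is commonly handled with MLflow, so the sketch below shows a minimal, hypothetical MLflow tracking run with placeholder parameter and metric values.

```python
import mlflow

# Hypothetical experiment name; on Databricks this would usually be a workspace path.
mlflow.set_experiment("demo-experiment")

# Record one illustrative training run with a placeholder parameter and metric.
with mlflow.start_run():
    mlflow.log_param("max_depth", 5)
    mlflow.log_metric("rmse", 0.42)
```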

Strong background in Spark Streaming and Structured Streaming.

Certifications:

Databricks Certified: Lakehouse Fundamentals

Microsoft Certified: Azure Data Engineer Associate

TECHNICAL SKILLS

Hadoop Distributions

Apache Hadoop 2.x/1.x, Cloudera CDP, Hortonworks HDP, Amazon EMR (EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, DynamoDB, Redshift, ECS, QuickSight)

Programming Languages

Python, R, Scala, C++, Java, SQL, HiveQL, UNIX Shell Scripting

Cloud Technologies

AWS, Azure, Google Cloud Platform; cloud services (PaaS & IaaS) including Active Directory, Application Insights, Azure Monitor, Azure Search, Data Factory, Key Vault, SQL Azure, Azure DevOps, Azure Analysis Services, Azure Synapse Analytics (DW), Azure Data Lake.

Databases

Snowflake, MySQL, Oracle, MS SQL SERVER, PostgreSQL, DB2

NoSQL Databases

HBase, Cassandra, MongoDB, DynamoDB, and Cosmos DB

Version Control

Git, SVN, Bitbucket, Azure DevOps

ETL/BI

Informatica, SSIS, SSRS, SSAS, Tableau.

Operating System

Mac OS, Windows 7/8/10, Unix, Linux, Ubuntu

Methodologies

RAD, JAD, UML, System Development Life Cycle (SDLC), Jira, Confluence, Agile, Waterfall Model

Work Experience:

Client: Everest Global, Warren, NJ Mar 2023 – Present

Role: ETL Developer / Data Engineer

Responsibilities:

Designed SSIS Packages to transfer data from flat files to SQL Server using Business Intelligence Development Studio.
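
The packages themselves were built in SSIS rather than hand-written code; purely as an analogous sketch of the same flat-file-to-SQL-Server load pattern, the Python snippet below uses a hypothetical CSV file, staging table, and connection string.

```python
import csv

import pyodbc

# Hypothetical connection string and source file.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=Staging;Trusted_Connection=yes;"
)

with open("customers.csv", newline="") as f:
    reader = csv.DictReader(f)
    rows = [(r["customer_id"], r["name"], r["city"]) for r in reader]

conn = pyodbc.connect(CONN_STR)
cursor = conn.cursor()
cursor.fast_executemany = True  # batch the inserts for speed

# Bulk-insert the parsed rows into a hypothetical staging table.
cursor.executemany(
    "INSERT INTO dbo.StgCustomers (CustomerId, Name, City) VALUES (?, ?, ?)",
    rows,
)
conn.commit()
conn.close()
```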

Extensively used SSIS transformations and tasks such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task.

Used Execution Plans, SQL Profiler, and Database Engine Tuning Advisor to optimize queries and enhance database performance.

Used SSIS to develop ETL jobs for extracting, cleaning, transforming, and loading data into the data warehouse.

Created databases and schema objects including tables and indexes, applied constraints, connected various applications to the database, and wrote functions, stored procedures, and triggers.

Developed and maintained database solutions, including designing tables, views, stored procedures, and triggers to support business applications.

Collaborated with data architects to design and implement data models for optimal performance and scalability.

Participated in the full software development life cycle, including requirements analysis, design, coding, testing, and deployment.

Worked closely with QA teams to ensure the accuracy and integrity of data throughout the ETL process.

Spearheaded the development and optimization of ETL processes using SSIS, resulting in a 20% reduction in data processing time.

Collaborated with business analysts to gather requirements, design data models, and implement scalable solutions to meet business needs.

Implemented best practices for code deployment and version control using Azure DevOps, enhancing team collaboration and ensuring code integrity.

Led the integration of the Tidal job scheduling system, streamlining job execution and improving overall system reliability.

Utilized Jira for project management, issue tracking, and collaboration with cross-functional teams, ensuring timely delivery of projects.

Conducted performance tuning and optimization of SQL queries, stored procedures, and ETL packages, resulting in improved system efficiency.

Performed validation and verification of software at all testing phases, including Functional Testing, System Integration Testing, End-to-End Testing, Regression Testing, Sanity Testing, User Acceptance Testing, Smoke Testing, Disaster Recovery Testing, Production Acceptance Testing, and Pre-prod Testing.

Developed conceptual solutions and created proofs of concept to demonstrate their viability. Implemented Copy activities and custom Azure Data Factory pipeline activities. Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
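
As an illustration of the Copy activity pattern mentioned above, the sketch below assembles the JSON shape of a hypothetical ADF pipeline containing a single Copy activity from Blob storage to Azure SQL; the pipeline, activity, and dataset names are placeholders.

```python
import json

# Hypothetical pipeline definition with one Copy activity (Blob source -> Azure SQL sink).
pipeline = {
    "name": "CopyBlobToSqlPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyCustomerFiles",
                "type": "Copy",
                "inputs": [{"referenceName": "BlobCustomerDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SqlCustomerDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}

# Print the definition; in practice it would be deployed through ADF or an ARM template.
print(json.dumps(pipeline, indent=2))
```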

Logged defects in Jira and Azure DevOps. Involved in planning the cutover strategy and go-live schedule, including the scheduled release dates of Portfolio Central Datamart changes.

Instantiated, created, and maintained CI/CD (continuous integration & deployment) pipelines and applied automation to environments and applications.

Environment: Databricks, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM, Azure HDInsight, Blob storage, Apache Spark, Azure ADF V2, ADLS, Spark SQL, Python/Scala, SSMS, SSIS, Azure DevOps, Azure SQL DW (Synapse), Azure SQL DB.

Client: Steel and Metal Service Center, New Castle, DE Jan 2021 – Feb 2023

Role: SQL Developer / Data Engineer

Responsibilities:

Designed and configured Azure cloud relational servers and databases, analyzing current and future business requirements. Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).

Set up separate application and reporting data tiers across servers using geo-replication functionality. Implemented disaster recovery and failover servers in the cloud by replicating data across regions.

Implemented Azure data solutions, provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.

Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity, and created UNIX shell scripts for database connectivity and parallel query execution.
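
The parallel execution described above was implemented with UNIX shell scripts; purely as a sketch of the same idea in Python, the snippet below runs several hypothetical queries concurrently against SQL Server, one connection per worker.

```python
from concurrent.futures import ThreadPoolExecutor

import pyodbc

# Hypothetical connection string; each worker opens its own connection.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=Sales;Trusted_Connection=yes;"
)

# Placeholder queries to run in parallel.
QUERIES = [
    "SELECT COUNT(*) FROM dbo.Orders",
    "SELECT COUNT(*) FROM dbo.Customers",
    "SELECT COUNT(*) FROM dbo.Products",
]

def run_query(sql):
    # A dedicated connection per query lets the queries run concurrently.
    conn = pyodbc.connect(CONN_STR)
    try:
        return sql, conn.cursor().execute(sql).fetchone()[0]
    finally:
        conn.close()

# Execute the queries in parallel threads and print each row count.
with ThreadPoolExecutor(max_workers=3) as pool:
    for sql, count in pool.map(run_query, QUERIES):
        print(f"{count:>10}  {sql}")
```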

Implemented Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises to the cloud to serve the analytical needs of the company.

Created Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (DW).

Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.

Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats to uncover insights into customer usage patterns. Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster, with the ability to apply the Spark DataFrame API for data manipulation within a Spark session.
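
The applications described above were written in Scala; as a compact PySpark illustration of the same DataFrame API aggregation pattern, the sketch below uses hypothetical column names and storage paths.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation-sketch").getOrCreate()

# Read raw usage events from a hypothetical Parquet location.
usage = spark.read.parquet("/mnt/raw/usage_events")

# Aggregate events per customer per day to surface usage patterns.
daily_usage = (usage
               .withColumn("event_date", F.to_date("event_time"))
               .groupBy("customer_id", "event_date")
               .agg(F.count("*").alias("event_count"),
                    F.sum("bytes_transferred").alias("total_bytes")))

# Persist the aggregate for downstream reporting (path is a placeholder).
daily_usage.write.mode("overwrite").parquet("/mnt/curated/daily_usage")
```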

Worked on Spark architecture for performance tuning, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors and tasks, deployment modes, the execution hierarchy, fault tolerance, and collections.

Created a data pipeline package to move data from Blob Storage to a MySQL database and executed MySQL stored procedures using events to load data into tables. Created correlated and non-correlated sub-queries to resolve complex business queries involving multiple tables from different databases.
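
A minimal Python sketch of a Blob-to-MySQL load of this kind is shown below, assuming a hypothetical CSV layout, container, and staging table; the pipeline described above used a packaged data pipeline and MySQL events rather than this script.

```python
import csv
import io

import pymysql
from azure.storage.blob import BlobClient

# Hypothetical blob location; the connection string is a placeholder.
blob = BlobClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="exports",
    blob_name="orders.csv",
)

# Download the CSV blob into memory and parse its rows.
data = blob.download_blob().readall().decode("utf-8")
rows = [(r["order_id"], r["amount"]) for r in csv.DictReader(io.StringIO(data))]

# Load the parsed rows into a hypothetical MySQL staging table.
conn = pymysql.connect(host="mysql-host", user="etl", password="<password>", database="staging")
with conn.cursor() as cur:
    cur.executemany("INSERT INTO stg_orders (order_id, amount) VALUES (%s, %s)", rows)
conn.commit()
conn.close()
```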

Performed data quality analysis and applied business rules in all layers of the data extraction, transformation, and loading process.

Performed validation and verification of software at all testing phases, including Functional Testing, System Integration Testing, End-to-End Testing, Regression Testing, Sanity Testing, User Acceptance Testing, Smoke Testing, Disaster Recovery Testing, Production Acceptance Testing, and Pre-prod Testing.

Developed conceptual solutions and created proofs of concept to demonstrate their viability. Implemented Copy activities and custom Azure Data Factory pipeline activities. Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.

Logged defects in Jira and Azure DevOps. Involved in planning the cutover strategy and go-live schedule, including the scheduled release dates of Portfolio Central Datamart changes.

Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications. Worked on various automation tools such as Git, Terraform, and Ansible.

Environment: Databricks, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM, Azure HDInsight, Blob storage, Apache Spark, Azure ADF V2, ADLS, Spark SQL, Python/Scala, SSMS, SSIS, Visual Studio, Azure SQL DW (Synapse), Azure SQL DB.

Client: Innovation Systems, Bangalore, India Jan 2019 - Dec 2020

Role: Data Engineer

Responsibilities:

Spearheaded the successful implementation and ongoing management of data solutions on the Azure cloud platform, utilizing cutting-edge tools such as Azure Data Factory and Azure Databricks.

Collaborated closely with cross-functional teams, translating business requirements into scalable and efficient data pipelines that supported critical business processes.

Developed, optimized, and maintained robust Extract, Transform, Load (ETL) processes using Azure Data Factory. Ensured seamless integration of data from diverse sources into the Azure data warehouse.

Implemented data transformations and cleansing procedures, playing a pivotal role in maintaining high data quality standards and integrity.

Contributed significantly to the design and implementation of scalable data models for Azure Synapse Analytics. This involved meticulous attention to detail and alignment with overarching data architecture and strategic business objectives.

Worked collaboratively with data architects to ensure the seamless integration of data models within the broader data ecosystem.

Actively engaged with data scientists, business analysts, and other stakeholders to discern intricate data requirements and deliver tailored data engineering solutions.

Authored and maintained comprehensive documentation for data pipelines, ETL processes, and data models. Ensured knowledge transfer and adherence to industry best practices.

Executed targeted performance tuning initiatives on T-SQL queries, stored procedures, and data processes. Achieved notable improvements in system efficiency and significant reductions in data processing times.

Implemented and championed best practices in data storage, indexing strategies, and overall data retrieval, resulting in enhanced system performance.

Developed Spark code using Scala and Spark-SQL for faster testing and data processing.

Environment: Hadoop, HDFS, Hortonworks, Hive, Sqoop, Python, Unix, Shell Scripting, Spark SQL, MapReduce, PySpark, ETL, AWS, Oracle, MySQL.

Education Qualifications:

Rowan University, New Jersey – Bachelor’s in Computer Science


