Data Engineer Azure

Location:
Denton, TX
Posted:
February 10, 2024

Resume:

Manish Reddy

Mail: ad3jes@r.postjobfree.com

Phone: 940-***-****

Certifications: Azure Data Engineer (DP-203)

PROFESSIONAL SUMMARY:

Over 3 years of IT experience across a variety of industries, including hands-on experience in Big Data, Hadoop, and cloud computing.

Expertise with the tools in Hadoop Ecosystem including Spark, Hive, HDFS, MapReduce, Sqoop, Kafka, Yarn, Oozie, and HBase.

Excellent knowledge of distributed components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.

Experience automating data engineering pipelines using proper standards and best practices (appropriate partitioning, appropriate file formats, incremental loads by maintaining previous state, etc.).

Experience in designing and developing production ready data processing applications in Spark using Scala/Python.

Experience working with Snowflake Multi cluster and virtual warehouses in Snowflake.

Strong experience building efficient Spark applications for various kinds of data transformations, such as data cleansing, de-normalization, various kinds of joins, and data aggregation.

Experience fine-tuning Spark applications using techniques such as broadcasting, increasing shuffle parallelism, caching/persisting DataFrames, and sizing executors appropriately to use the available cluster resources effectively.
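
A minimal PySpark sketch of these tuning techniques; the paths, table names, and configuration values below are illustrative only:

# Common Spark tuning techniques: broadcast joins, shuffle parallelism, caching,
# and executor sizing. All paths, columns, and values are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.shuffle.partitions", "200")   # raise/lower to match data volume
    .config("spark.executor.memory", "4g")           # size executors for the cluster
    .config("spark.executor.cores", "4")
    .getOrCreate()
)

orders = spark.read.parquet("/data/orders")          # large fact table (hypothetical path)
countries = spark.read.parquet("/data/countries")    # small dimension table

# Broadcast the small side to avoid shuffling the large table.
enriched = orders.join(F.broadcast(countries), on="country_code", how="left")

# Cache a DataFrame that is reused by several downstream aggregations.
enriched.cache()
daily = enriched.groupBy("order_date").agg(F.sum("amount").alias("total_amount"))
daily.write.mode("overwrite").parquet("/data/daily_totals")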

Strong experience writing Python applications using libraries such as Pandas, NumPy, SciPy, and Matplotlib.

Good knowledge of productionizing machine learning pipelines (featurization, learning, scoring, evaluation), primarily using Spark ML libraries.
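
A minimal Spark ML pipeline sketch covering those stages; the columns, label, and input path are hypothetical:

# Featurization, learning, scoring, and evaluation in a single Spark ML pipeline.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("ml-pipeline-sketch").getOrCreate()
df = spark.read.parquet("/data/training")            # hypothetical training set

indexer = StringIndexer(inputCol="category", outputCol="category_idx")
assembler = VectorAssembler(inputCols=["category_idx", "amount", "age"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[indexer, assembler, lr])
train, test = df.randomSplit([0.8, 0.2], seed=42)

model = pipeline.fit(train)                          # learning
scored = model.transform(test)                       # scoring
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(scored)  # evaluation
print(f"AUC: {auc:.3f}")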

Good exposure with Agile software development process.

Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.

Strong experience with cloud-based big data platforms such as AWS EMR and Azure Databricks.

Good understanding of NoSQL databases and hands-on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.

Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML, Parquet, and Avro.

Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.

Experience in migrating the data using Sqoop from HDFS to Relational Database System and vice-versa.

Extensive experience importing and exporting data using ingestion and streaming platforms like Flume and Kafka.

Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers, and strong experience in writing complex queries for Oracle.

Software development involving cloud computing platforms like Azure, Amazon Web Services (AWS), and Google Cloud (GCP).

Experienced in working with Amazon Web Services (AWS) using S3, EMR, Redshift, Athena, Glue metastore, etc.

Experienced in working with Microsoft Azure using Azure Data Lake Gen2, Azure Data Factory, Azure Synapse Analytics, Azure Stream Analytics, Azure Databricks, Azure Blob Storage, Azure Purview, Azure Data Flow, etc.

Experience in GCP services including BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Dataproc, Cloud Storage, Composer, and Stackdriver.

Strong experience in Object-Oriented Design, Analysis, Development, Testing, and Maintenance.

Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.

Worked in large and small teams for systems requirement, design & development.

Key participant in all phases of the software development life cycle: analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server environments. Experienced with object-oriented design and with IDEs such as Eclipse and IntelliJ, and repositories SVN and Git.

Experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.

Experience using build tools such as SBT and Maven.

TECHNICAL SKILLS:

Operating Systems

Linux, Windows XP/7/8/10, Mac.

Programming languages

Python, R, SQL, T-SQL.

Big Data Technologies

Hadoop (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, HBase), Spark, Spark Streaming, Kafka, ETL.

Scripting/Web Languages

HTML5, CSS3, XML, SQL, Shell/Unix, Perl, Python.

Databases

Cassandra, HBase, MongoDB, Oracle, MS SQL Server, MySQL, Teradata.

Utilities/Tools

Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SOAP UI, ANT, Maven, Alteryx, Visio, Jenkins, Jira, IntelliJ.

Cloud Services

AWS (EC2, S3, EMR, RDS, Lambda, CloudWatch, Auto Scaling, Redshift, CloudFormation, Glue, etc.); Azure (Databricks, Data Factory, Data Lake Storage, Azure Storage, Synapse Analytics, Stream Analytics, Logic Apps, HDInsight, Service Bus, Azure SQL Data Warehouse, Azure SQL, Azure Functions, Data Flow); SSIS, SSAS, SSRS.

Data Visualization Tools

Tableau, PowerBI, SSRS, Cloud Health.

Software Life Cycle

SDLC, Waterfall and Agile models.

WORK EXPERIENCE:

Client: British Airways - USA May 2023 – Dec 2023

Role: Data Engineer

Responsibilities:

Developed ETL workflows using Azure Data Factory, Azure Synapse Analytics, and Azure Logic Apps to efficiently load big data sets into the data warehouse.

Leveraged SQL scripting for data modeling, enabling streamlined data querying and reporting capabilities, which contributed to improved insights into customer data.

Actively participated in collaborative efforts to design and construct data pipelines, employing Apache Airflow for orchestration.

Played a key role in deploying real-time dashboards using Power BI, giving senior management instant access to vital metrics on passenger demand and engagement and facilitating data-driven decision-making.

Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.

Wrote complex MapReduce programs that work with different file formats such as Text, Sequence, XML, Parquet, and Avro.

This experience built a deep understanding of data engineering principles, especially within the Azure ecosystem, honed skills in ETL, SQL scripting, and dashboard development, and reinforced the importance of real-time data processing and reporting for business intelligence.

Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that ingests data from Kafka in near real time and persists it to Cassandra.
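
A minimal sketch of such a pipeline using Spark Structured Streaming; it assumes the DataStax Spark Cassandra connector is on the classpath, and the topic, keyspace, and schema are hypothetical:

# Kafka -> Spark Structured Streaming -> Cassandra, written per micro-batch.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-cassandra-sketch").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
    .option("subscribe", "learner-events")               # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

def write_to_cassandra(batch_df, batch_id):
    # foreachBatch lets each micro-batch be written with the batch connector API.
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="learning", table="learner_events")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()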

Expert with Azure Automation, PowerShell, Azure Resource Manager templates, and Terraform.

Client: Innovative Information Solutions - India June 2019 – June 2021

Role: Software Developer

Responsibilities:

Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and pre-processing on Hortonworks.

Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that ingests data from Kafka in near real time and persists it to Cassandra.

Involved in designing and deploying multi-tier applications using Azure services (Virtual Machines, Azure DNS, Azure Storage, Azure SQL Database, Azure Cosmos DB, Azure Service Bus, Azure Queue Storage, Azure Active Directory), with a focus on high availability, fault tolerance, and auto-scaling using Azure Resource Manager templates.

Collected data from Azure Storage in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.

Loaded data from different sources (databases and files) into Hive using the Talend tool.

Responsible for developing Spark Cassandra connector jobs to load data from flat files into Cassandra for analysis.

Developed stored procedures in Snowflake and used them in Talend for loading dimensions and facts.

Utilized Azure Data Factory for data management and pipeline processes in the Azure HDInsight cluster.

Used Azure Databricks for querying the data in the publish layers, enabling other teams or business users to access it for faster processing.

Proficient with container systems like Docker and container orchestration like Azure Kubernetes Service (AKS), worked with Terraform for infrastructure as code.

Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.

Built real-time data pipelines with Azure Event Hubs and Spark Streaming.

Developed Python scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into OLTP systems through Azure Data Factory.
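
A sketch of the Spark aggregation step using both the DataFrame and SQL APIs; the JDBC write is shown only as one illustrative way to land results in an OLTP store (the production load ran through Azure Data Factory), and all names, paths, and the connection string are hypothetical:

# Aggregate with the DataFrame API and Spark SQL, then write results over JDBC.
# Requires the appropriate JDBC driver on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregation-sketch").getOrCreate()
orders = spark.read.parquet("/data/orders")

# DataFrame API aggregation
by_customer = orders.groupBy("customer_id").agg(F.sum("amount").alias("lifetime_value"))

# Equivalent Spark SQL aggregation
orders.createOrReplaceTempView("orders")
by_day = spark.sql("SELECT order_date, COUNT(*) AS order_count FROM orders GROUP BY order_date")

# Write results to a relational target over JDBC (placeholder credentials).
(by_customer.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=sales")
    .option("dbtable", "dbo.customer_ltv")
    .option("user", "etl_user")
    .option("password", "***")
    .mode("overwrite")
    .save())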

Developed Hive queries to pre-process the data required for running the business process.

Evaluated Snowflake design considerations for any change in the application.

Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs to orchestrate the Airflow workflows.
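
A minimal Airflow DAG sketch for a daily S3-to-Snowflake load; it assumes the apache-airflow-providers-snowflake package is installed, and the connection ID, stage, and table names are hypothetical:

# Daily DAG that runs a COPY INTO statement loading staged S3 files into Snowflake.
from datetime import datetime
from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_orders = SnowflakeOperator(
        task_id="copy_orders_from_s3",
        snowflake_conn_id="snowflake_default",   # hypothetical connection ID
        sql="""
            COPY INTO analytics.orders
            FROM @analytics.s3_stage/orders/
            FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);
        """,
    )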

Pushed data as delimited files into HDFS using Talend Big Data Studio.

Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.

Developed automated regression scripts in Python to validate ETL processes across multiple databases such as AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL).

Provided cluster coordination services through Azure Service Fabric.

Instantiated, created, and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications. Worked with automation tools such as Git, Terraform, and Ansible.

Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.

Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.

Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.

Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
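
A minimal producer/consumer sketch using the kafka-python client as one illustrative library; the broker address and topic are hypothetical:

# Simple JSON producer and consumer against a single topic.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 1, "amount": 42.5})
producer.flush()

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="broker:9092",
    group_id="orders-etl",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # downstream processing would go here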

Expert with Azure Automation PowerShell, Azure Resource Manager Templates.

Created Hive tables as per requirements, either internal or external, defined with appropriate static or dynamic partitions and bucketing for efficiency.
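
A sketch of the corresponding DDL, issued here through Spark SQL for illustration; the database, location, and column layout are hypothetical:

# Partitioned, bucketed external Hive table plus a dynamic-partition insert.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-ddl-sketch").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events (
        event_id STRING,
        user_id  STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS PARQUET
    LOCATION '/warehouse/analytics/events'
""")

# Dynamic partition insert from a staging table.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.events PARTITION (event_date)
    SELECT event_id, user_id, amount, event_date FROM staging.events_raw
""")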

Developed Hive queries for analysts by loading and transforming large sets of structured and semi-structured data using Hive.

Environment: HDP, Hadoop, AWS, EC2, S3 Bucket, Redshift, Cassandra, Hive, HDFS, Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hortonworks, MapReduce, Airflow, Sqoop, Java, Oracle 12c, SQL Server, T-SQL, MongoDB, HBase, Python, and Agile methodologies.

Client: Talentio - India July 2019 – June 2021

Role: Software Programming Instructor

Responsibilities:

Delivered dynamic seminars and coding lectures on competitive C programming, engaging with over 30 universities and facilitating learning for numerous students.

Conducted hands-on workshops and lectures on SQL and Python programming, fostering skill development in programming languages at over 10 educational institutions.

Successfully conducted over 100 seminars, demonstrating a passion for knowledge-sharing and promoting excellence in computer programming.

Facilitated interactive learning experiences, contributing to the growth of programming communities within diverse academic environments.

Environment: SQL, Python, DBMS, C and C++ Programming Language

EDUCATION:

-University of North Texas, USA

Master of Science in Artificial Intelligence

Aug 2021 – May 2023

-CMR College of Engineering & Technology, India.

Bachelor of Technology in Electronics and Communication Engineering

Aug 2017 – Apr 2021


