Data Engineer Azure

Location:
Piscataway, NJ
Posted:
August 24, 2023

Resume:

Phanindra S

Sr. Data Engineer

+1-848-***-****

*************@*****.***

PROFESSIONAL SUMMARY:

Highly skilled and results-driven Sr. Data Engineer with 8+ years of experience designing, developing, and implementing robust data solutions.

Skilled in programming languages, including Python and Scala, with expertise in the Big Data and Hadoop ecosystem, including Hive, Spark, PySpark, Kafka, Flume, Oozie, Zookeeper, Apache NiFi, Sqoop, HDFS, and HBase.

Experienced in working with Hadoop distributions such as Cloudera and Hortonworks Data Platform (HDP), and with MapReduce.

Expertise working with Azure services, including ADF, Blob Storage, Event Grid, Azure Synapse, and AzCopy.

Good working knowledge of Azure Stack.

Extensive knowledge of data warehousing solutions, including RedShift, BigQuery, and Snowflake, enabling efficient storage and retrieval of large-scale datasets.

Strong experience in writing scripts using the Python API, PySpark API, and Spark API for analysing data.

Extensively used Python libraries, including PySpark, Pytest, PyMongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.

Fluent programming experience with Scala, Java, Python, SQL, T-SQL, and R.

Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, and Kafka.

Well-versed in cloud platforms like AWS and Azure, leveraging their services for scalable and cost-effective data processing.

Competent in scripting with Python and Bash, automating processes, and optimizing data workflows.

Strong database expertise, encompassing MySQL, Oracle, Cassandra, and MongoDB, ensuring efficient data storage and retrieval.

Proficient in CI/CD practices using tools like Jenkins, Maven, Docker, and Kubernetes, streamlining development and deployment processes.

Experienced in version control systems, including Git, GitLab, and GitHub, ensuring collaboration and code management.

Adept in data visualization tools like Tableau, Power BI, and Looker, transforming complex data into actionable insights.

Well-versed in implementing monitoring solutions, including ELK, Splunk, Nagios, Prometheus, and Dynatrace, ensuring data pipeline reliability and performance optimization.

Proficient in working with various operating systems, including Linux, Windows, and macOS, adapting to diverse environments.

Excellent problem-solving and analytical skills, coupled with strong communication and leadership abilities.

Experience in using Kafka and Kafka brokers to initiate the Spark context and process live streaming data.

Developed custom Kafka producers and consumers for publishing and subscribing to Kafka topics.

Good working experience with Spark (Spark Streaming, Spark SQL) using Scala and Kafka.

TECHNICAL SKILLS:

Programming Languages

Python, Scala

Big Data/ Hadoop Ecosystem

Hive, Spark, PySpark, Kafka, Flume, Oozie, Zookeeper, Apache NiFi, Sqoop, HDFS, HBase

Hadoop Distributions

Cloudera, Hortonworks Data Platform (HDP), MapReduce

Data Warehouses

RedShift, BigQuery, Snowflake

Cloud Platforms

AWS, Azure

Scripting

Python, Bash

Databases

MySQL, Oracle, Cassandra, MongoDB

CI/CD

Jenkins, Maven, Docker, Kubernetes

Version control

Git, GitLab, GitHub

Data visualization

Tableau, Power BI, Looker

Monitoring

ELK, Splunk, Nagios, Prometheus, Dynatrace

Operating systems

Linux, Windows, macOS

PROFESSIONAL EXPERIENCE:

CSAA - Remote

Data Engineer Dec 2022 - Present

Responsibilities:

Involved in data warehouse implementations using Azure SQL Data Warehouse, SQL Database, Azure Data Lake Storage (ADLS), and Azure Data Factory v2.

Created specifications for ETL processes, finalized requirements, and prepared specification documents.

Migrated data from on-premises SQL Database to Azure Synapse Analytics using Azure Data Factory and designed an optimized database architecture.

Created Azure Data Factory pipelines for copying data from Azure Blob Storage to SQL Server.

Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks.

Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.

Architected & implemented medium to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).

Created Databricks notebooks using SQL and Python, and automated the notebooks using jobs.

Worked on DirectQuery in Power BI to compare legacy data with current data, and generated reports and dashboards.

Created reusable ADF pipelines to call REST APIs and consume Kafka events.

Used Azure Logic Apps for scheduling ADF pipelines.

Developed and configured build and release (CI/CD) processes using Azure DevOps, and managed application code using Azure Git with the required security standards for .NET and Azure applications.

Migrated the ETL logic that was running in SSIS to Azure Data Factory pipelines without any change in business logic.

Implemented CI/CD pipelines using Jenkins and built and deployed the applications.

Ensured deliverables (daily, weekly, and monthly MIS reports) were prepared to satisfy the project requirements, cost, and schedule.

Used the Splunk query language and monitored database connection health using Splunk DB Connect health dashboards.

Created and managed Splunk DB Connect identities, database connections, database inputs, outputs, lookups, and access controls.

Environment: SQL Database, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Synapse Workspace, ADF pipelines, Azure Databricks, Azure SQL Data Warehouse, Azure Data Lake Analytics, Power BI, Jenkins, Kafka, REST API, Git, and Splunk.

Walmart - Bentonville, AR

Python/Data Engineer June 2021 – Dec 2022

Responsibilities:

Developed RESTful microservices using Django and deployed them on AWS servers using EBS and EC2.

Worked on Flask and Snowflake queries.

Used Jenkins as the continuous integration and delivery platform over Git.

Automated most of the daily tasks using Python scripting.

Involved in the CI/CD pipeline management for managing the weekly releases.

Worked on Jira for managing tasks and improving individual performance.

Made recommendations to the team on appropriate testing techniques and shared testing tasks.

Defined different Django API profiling techniques for faster rendering of information.

Later migrated applications from Django to Flask and from NoSQL (DynamoDB) to SQL (Snowflake).

Used Jenkins, AWS, and Bitbucket environments.

Worked on automating, configuring, and deploying instances on AWS, Azure environments, and data centres; also familiar with EC2, CloudWatch, CloudFormation, and managing security groups on AWS.

Troubleshot production issues pertaining to AWS cloud resources and application infrastructure.

Developed shell scripts for cleansing and validating data files using utilities like AWK, sed, grep, and other UNIX commands.

Used AWS cloud technologies such as S3 for data storage, EMR for data processing, EC2 for virtual Linux instances, CloudWatch for log analysis, and Lambda functions for triggering various jobs in AWS cloud services.

Developed Shell scripts implementing PL/SQL queries for data migration & batch processing.

Wrote Python scripts to parse JSON documents and load the data in the PostgreSQL database.

Used Apache Airflow programmatically to schedule and monitor workflows and orchestrate complex data pipelines.

Built data pipelines in Python/PySpark and Python DAGs in Apache Airflow.

Worked with JSON-based REST Web services.

Moved the data warehouse from on-premises Teradata/PostgreSQL to the AWS cloud. Implemented a proof of concept deploying this product on Amazon Web Services (AWS). Worked with AWS to implement client-side encryption, as DynamoDB did not support at-rest encryption at the time.

Designed, implemented, and maintained solutions using Docker, Jenkins, Git, and Puppet for microservices and continuous deployment.

Worked on improving existing machine learning algorithms to extract data points accurately.

Responsible for setting up Python REST API framework using Flask.

Engaged in Design, Development, Deployment, Testing, and Implementation.

Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.

Created a PySpark frame to bring data from DB2 to Amazon S3.

Provided guidance to the development team working on PySpark as an ETL platform.

Optimized the PySpark jobs to run on a Kubernetes cluster for faster data processing.

Environment: Python 3.6, Django Framework 1.3, Flask Framework, AngularJS, DynamoDB, MySQL, HTML5/CSS, Snowflake, Amazon Web Services (AWS), S3, EC2, PyCharm, Microsoft Visual Studio Code, Linux, Shell Scripting.

FM Global - Norwood, MA

Data Engineer Sep 2020 - May 2021

Responsibilities:

Worked in all the phases of SDLC and agile development methodologies including SCRUM, involved in daily SCRUM meetings to keep track of the ongoing project status and issues using JIRA.

Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Glue pipelines, Glue crawlers, Auto Scaling groups, and optimized volumes and EC2 instances, and created monitors, alarms, and notifications for EC2 hosts using CloudWatch.

Used Spark SQL to process a huge amount of structured data.

Worked on Data Analysis, processed Data, and performed Analytics on the data from various systems.

Performed transformations and actions on the datasets using Spark Core for log transaction aggregation and analytics.

Worked on Spark DataFrames and Spark SQL for various data analytics processing.

Imported data from sources such as AWS S3 into Spark DataFrames and performed transformations and actions.

Designed and deployed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark, and Impala.

Designed a snowflake schema for the data warehouse and ODS architecture using data modeling tools like Erwin.

Developed data models and data migration strategies utilizing concepts of the snowflake schema.

Implemented ETL pipelines using Pyspark in the AWS cloud.

Analysed and developed Data Integration templates to extract, cleanse, transform, integrate, and load to data marts for user consumption. Reviewed the code against standards and checklists.

Created a deployment document for the developed code and provided support during the code migration phase.

Created an initial unit test plan to demonstrate that the software, scripts, and databases developed conform to the design document.

Worked on SQL tools like TOAD and SQL Developer to run SQL queries and validate the data.

Created API integrations, pulling data from service applications' APIs.

Used non-relational databases for storing and retrieving the data.

Used Git repositories for source code management and version control.

Experience with log parsing, complex Splunk searches (including external table lookups), and Splunk data flow, components, features, and product capabilities.

Used the Splunk query language and monitored database connection health using Splunk DB Connect health dashboards.

Expert in installing and using Splunk apps for Unix and Linux (Splunk *nix).

Used timechart attributes such as span, bins, tag, and event types. Created and configured management reports and dashboards.

Environment: Python, AWS, EC2, S3, CloudWatch, Spark SQL, Kafka, Scala, Snowflake, PySpark, TOAD, GIT, REST, JSON, GitHub, SQL, Oracle, LINUX, JIRA, Splunk

Josh Software - Pune, India

Data Engineer Dec 2018 - Aug 2020

Responsibilities:

Assisted in developing and maintaining ETL pipelines using Python and Azure Data Factory (ADF) to extract, transform, and load data from various sources into target systems.

Worked with the Apache Hadoop ecosystem, including Hive, HDFS, and HBase, to process and analyze large-scale datasets, ensuring efficient data storage and retrieval.

Utilized Sqoop for seamless data transfer between Hadoop and relational databases, enabling efficient data integration and synchronization.

Developed and maintained RESTful APIs to enable data access and integration between different systems.

Assisted in data processing and analytics tasks using Spark and PySpark, leveraging the power of distributed computing for efficient data transformations.

Utilized SparkSQL to query and analyze structured data within Spark, enabling efficient data retrieval and manipulation.

Worked with Snowflake, a cloud-based data warehousing platform, for scalable and performant data storage and analytics.

Assisted in writing Pig scripts for data processing tasks, enabling efficient data manipulation and transformation on Hadoop.

Implemented real-time data streaming solutions using Kafka, enabling the processing and analysis of high-velocity data streams.

Worked with HDFS (Hadoop Distributed File System) for storing and managing large datasets, ensuring fault tolerance and high availability.

Utilized Git for version control and collaboration, ensuring proper code management, tracking changes, and enabling teamwork.

Assisted in building and maintaining CI/CD pipelines using Jenkins, automating the build, testing, and deployment processes.

Developed interactive data visualizations and reports using Power BI, enabling stakeholders to gain insights and make data-driven decisions.

Utilized ELK (Elasticsearch, Logstash, Kibana) stack for log analysis and monitoring, ensuring efficient data exploration and visualization.

Assisted in orchestrating data workflows and scheduling jobs using Oozie, ensuring reliable and timely data processing and automation.

Collaborated with cross-functional teams, utilizing SQL and JIRA for effective communication, task management, and tracking progress within the development cycle.

Environment: Python, Azure, ADF, ETL, Apache Hadoop, Hive, HDFS, HBase, Sqoop, REST, Spark, PySpark, SparkSQL, Snowflake, Pig, Kafka, Git, Jenkins, Power BI, ELK, Oozie, SQL, JIRA.

CMC - Hyderabad, India

Python Developer July 2015 - Dec 2018

Responsibilities:

Designed the back end and part of the front end of the application using Python 3 on the Django web framework.

Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.

Implemented machine learning, optimization, and visualization methods, and applied statistical methods such as regression models, decision trees, Naïve Bayes, ensemble classifiers, hierarchical clustering, and semi-supervised learning to different datasets using Python.

Worked extensively on Django's MVT Framework.

Worked extensively on asynchronous requests to the server.

Used different Python packages for implementation, such as python-dateutil, requests, json, sqlparse, and datetime.

Developed shell scripts for cleansing and validating data files using utilities like AWK, sed, grep, and other UNIX commands.

Developed Shell scripts implementing PL/SQL queries for data migration & batch processing.

Wrote Python scripts to parse JSON documents and load the data in the PostgreSQL database.

Worked with JSON-based REST Web services.

Worked on improving existing machine learning algorithms to extract data points accurately.

Responsible for setting up the Python REST API framework using Flask.

Used HTML, CSS, AJAX, and Bootstrap 4 to design and develop the user interface of the website.

Used Git version control to follow a proper developer workflow.

Skilled in using Collections in Python for manipulating and looping through objects.

Engaged in Design, Development, Deployment, Testing, and Implementation.

Applied different testing methodologies such as unit testing, integration testing, and web application testing.

Environment: Python, Django, Machine Learning, MySQL, NoSQL, PL/SQL, Git, SVN, Unit Test, Linux, REST APIs, Apache Cassandra, MongoDB, Shell Scripting, Windows, PyTest, Jira