
Data Engineer Associate Software

Location:
Phoenix, AZ
Posted:
May 15, 2023

Resume:

Sandhya Kumari Kada

Email: *************@*****.***

Phone: +1-623******

Professional Summary

● Data Engineer with 7+ years of experience in Big Data and cloud engineering with large structured and unstructured data sets, data acquisition, and data validation, using programming languages such as Python, PySpark, and Scala, and Big Data technologies including Hadoop, Hive, Spark, HBase, Cassandra, Kafka, and Elasticsearch.

● Experience using Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, and Databricks.

● Experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, SNS, Auto Scaling, EMR, and other services of the AWS family, as well as Azure SQL, Azure Synapse, Elasticsearch, Cosmos DB, Databricks, ADLS, and Blob Storage.

● Hands-on experience configuring, installing, and using major components of the Hadoop ecosystem such as Hadoop, HDFS, Job Tracker, Task Tracker, Name Node, and Data Node in Cloudera (CDP & CDH).

● Extensive knowledge of developing Spark Streaming jobs with RDDs (Resilient Distributed Datasets) using Scala, PySpark, Spark SQL, and spark-shell, along with Spark optimization techniques.

● Used Sqoop to import and export data between HDFS and RDBMS.

● Experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (illustrated in the sketch after this summary).

● Experience with different file formats such as Avro, Parquet, JSON, and XML.

● Experience building ETL scripts in languages and tools such as PL/SQL, Informatica, Hive, and PySpark, with expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie.

● Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.

● Experienced in implementing data quality check metadata in NiFi pipelines.

● Created Splunk dashboards and alerts using POST/GET methods for data consistency, latency, and threshold reports.

● Experienced in designing and developing operational data stores, data warehouses, and data marts.

● Hands-on experience with UNIX and shell scripting for automation.

● Query optimization, execution-plan analysis, and performance tuning of SQL queries.

● Hands-on experience with Snowflake in ETL on AWS, extracting data using S3, SNS, and streaming.

● Extensive working experience in all phases of the Software Development Lifecycle, following Agile and Waterfall methodologies.
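
A minimal PySpark sketch of the partitioned, bucketed external Hive table pattern mentioned above; the table name, columns, and HDFS path are hypothetical, and it assumes a Hive-enabled SparkSession.

```python
from pyspark.sql import SparkSession

# Hive-enabled session (assumes a configured Hive metastore).
spark = (SparkSession.builder
         .appName("hive-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over an existing HDFS location, partitioned by load date
# and bucketed by customer_id; all names and paths are illustrative only.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS PARQUET
    LOCATION '/data/warehouse/sales_ext'
""")

# Queries filtering on load_date benefit from partition pruning
# (assuming the partitions have been registered, e.g. via MSCK REPAIR TABLE).
spark.sql("SELECT SUM(amount) FROM sales_ext WHERE load_date = '2023-05-01'").show()
```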

Certifications

Microsoft Certified Azure Fundamentals AZ-900

Microsoft Certified Azure Data Engineer DP-203

Technical Skills

AWS Services: S3, EC2, EMR, Redshift, Elasticsearch, RDS, Lambda, Kinesis, SNS, SQS, AMI, IAM, CloudFormation

Hadoop Components / Big Data: HDFS, Cloudera Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Cassandra, Impala, Zookeeper, Flume, Kafka, Yarn, Cloudera Manager, PySpark, Airflow, Snowflake

Spark Components: Apache Spark, DataFrames, Spark SQL, YARN, Pair RDDs

Web Technologies / Other Components: XML, Log4j, HTML, CSS

Server-Side Scripting: UNIX Shell Scripting

Databases: Oracle, Microsoft SQL Server, MySQL

Programming Languages: Scala, Impala, Python

Web Servers: Apache Tomcat, WebLogic

NoSQL Databases: HBase, MongoDB, Cassandra

Currently Exploring: Apache NiFi, Splunk

Cloud Services: AWS, Azure

ETL Tools: Talend Open Studio & Talend Enterprise Platform

Reporting and ETL Tools: Tableau, Power BI, SSIS

Data Streaming: Batch processing & real-time streaming using Kafka

Work Experience

Client: Expert Technology Services April 2023 - Present

Senior Data Engineer

Roles & Responsibilities:

Gathered business requirements by working closely with business users, project leaders, and developers; analyzed the requirements, designed data models, reviewed existing systems, and proposed process improvements.

Developed Scala code in Scala IDE using sbt, reading from Cassandra as the source, transforming the data in Scala, and saving it to Elasticsearch.

Worked with Power BI Desktop, using Power Query Editor to clean, model, analyze, and visualize data, and generated dashboards for the weekly statistics report (WSR) delivered in PowerPoint.

Environment: Scala IDE, Cassandra, Elasticsearch, Spark, sbt, Power BI, Power Query Editor, PowerPoint
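
A minimal sketch of the Cassandra-to-Elasticsearch flow described above, written in PySpark for consistency with the other examples here; the connection hosts, keyspace, table, and index names are hypothetical, and it assumes the spark-cassandra-connector and elasticsearch-hadoop (elasticsearch-spark) packages are available to the Spark session.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes the Cassandra and Elasticsearch Spark connectors are on the classpath
# (e.g. supplied via --packages when submitting the job).
spark = (SparkSession.builder
         .appName("cassandra-to-es-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")  # hypothetical host
         .config("es.nodes", "es-host")                                # hypothetical host
         .getOrCreate())

# Read the source table from Cassandra (hypothetical keyspace/table).
orders = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="sales", table="orders")
          .load())

# Simple transformation before indexing.
daily = (orders
         .groupBy("order_date")
         .agg(F.sum("amount").alias("total_amount")))

# Write the result to an Elasticsearch index (hypothetical index name).
(daily.write
 .format("org.elasticsearch.spark.sql")
 .mode("append")
 .save("daily_sales"))
```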

Client: MasterCard International August 2022 - April 2023

Senior Data Engineer

Roles & Responsibilities:

Gathered business requirements by working closely with business users, project leaders, and developers; analyzed the requirements, designed data models, reviewed existing systems, and proposed process improvements.

Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

Developed data quality checks on Hadoop and Oracle database tables using NiFi and Splunk alerts (see the sketch at the end of this section).

Developed reporting alerts using Spark and Splunk.

Developed NiFi pipelines for script triggers and scheduling.

Tested in the staging environment and deployed code to production once development was complete.

Developed Python scripts for data quality checks in a Linux environment.

Monitored day-to-day activities in the ALM dashboard via Rally.

Worked on NiFi pipeline enhancements.

Created threshold, latency, and consistency alerts in Splunk.

Developed Python scripts for data inserts and for data consistency and latency report alert emails.

Environment: Spark, PySpark, Airflow, ALM (Rally), Bash scripting, NiFi, Splunk, Toad, DBeaver, Python.
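
A minimal sketch of the kind of row-count consistency check described in this section, assuming hypothetical staging and warehouse table names; the production pipeline used NiFi for orchestration and Splunk for alerting, which are represented here only by a log-style message.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dq-consistency-check")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical source (Oracle extract landed in staging) and target (Hive table).
source_count = spark.table("staging.transactions_extract").count()
target_count = spark.table("dwh.transactions").count()

# Threshold check: flag the load if more than 1% of rows are missing.
threshold = 0.01
missing_ratio = abs(source_count - target_count) / max(source_count, 1)

if missing_ratio > threshold:
    # In the real pipeline this message would be indexed by Splunk and
    # turned into a threshold/consistency alert.
    print(f"DQ_ALERT consistency source={source_count} target={target_count} "
          f"missing_ratio={missing_ratio:.4f}")
else:
    print(f"DQ_OK source={source_count} target={target_count}")
```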

Client: Humana Healthcare June 2020- August 2022

Data Engineer

Technical Platform: HUE (CDP), Sqoop 2.0, Hadoop 1.6, Spark 2.11 on Jupyter Notebook

Roles & Responsibilities:

Worked on developing code using HQL for file ingestion, performing unit testing, and deploying code to PROD.

The PROD cluster is used through both the CLI and the HUE web browser.

Used crontab and Oozie for job scheduling.

For file ingestion from Oracle and SQL sources, implemented Hive code in Unix scripts.

Worked with real-time ingestion using Spark and Flume applications.

Flexible with Spark SQL and HiveQL.

Wrote Python scripts to parse XML documents and load the data into the database.

Experience working on StreamSets applications for processing real-time data from APIs to HDFS through Kafka consumption and production.

Created an Application Interface Document for the downstream team to create a new interface for data transfer.

Ingested data in mini-batches and performed DataFrame transformations on those mini-batches using Spark Streaming to support streaming analytics.
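
A minimal PySpark Structured Streaming sketch of this mini-batch pattern, with a hypothetical Kafka topic, broker, and HDFS landing path; the actual project combined StreamSets, Kafka, and Spark Streaming, so this is only an illustrative equivalent.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes the spark-sql-kafka package is available to the session.
spark = SparkSession.builder.appName("kafka-minibatch-sketch").getOrCreate()

# Read micro-batches from a hypothetical Kafka topic.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
          .option("subscribe", "member_events")               # hypothetical topic
          .load())

# Basic DataFrame transformation on each micro-batch.
parsed = (events
          .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
          .withColumn("ingest_date", F.to_date("timestamp")))

# Land the stream on HDFS as Parquet, partitioned by ingest date.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/landing/member_events")           # hypothetical path
         .option("checkpointLocation", "/checkpoints/member_events")
         .partitionBy("ingest_date")
         .start())

query.awaitTermination()
```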

Involved in Hive/SQL queries and performed Spark transformations using Spark RDDs and Python (PySpark).

Developed automated job flows that ran daily (and on demand) through Oozie, which internally runs MapReduce jobs on the Tez engine.

Worked on the Cloudera migration from CDH to CDP; during this migration, Pig and Flume were replaced with StreamSets and Spark applications.

Hands-on with Sqoop for connecting to MySQL/Oracle databases and importing RDBMS data into Hadoop.

Documentation, defect-tracking management, debugging, and performance tuning.

Environment: HUE, Scala, Hadoop (HDFS, MapReduce, Yarn), Cloudera CDP, Spark, PySpark, Oozie, Hive, Sqoop, HBase, Spark Streaming, StreamSets, Kafka, Unix Shell Scripting, Crontab.

Client: Experian December 2019 - May 2020

AWS Data Engineer

Roles & Responsibilities:

Worked in an Agile environment with weekly Sprints and daily Scrum meetings.

Collaborated with Architects and Subject Matter Experts to review business requirements and build source-to-target data mapping documents and pipeline design documents.

Converted Hive/SQL queries into Apache Spark RDD transformations using Python and Scala.

Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load the data into a Redis database on AWS (see the sketch at the end of this section).

Used AWS Data Pipeline for data extraction, transformation, and loading from heterogeneous data sources.

Designed and developed Spark workflows using Scala on EMR for data pulls from AWS S3 buckets, and used EC2 for creating policies.

Implemented Spark RDD transformations to map business analysis requirements and applied actions on top of those transformations.

Environment: AWS EMR, EC2, S3, Redis, Scala, Spark.
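
A minimal PySpark sketch of reading the CSV, JSON, and Parquet inputs from S3 described in this section; the bucket and prefixes are hypothetical, and it assumes S3 access is already configured for the cluster (for example through EMR instance roles).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-multiformat-read").getOrCreate()

# Hypothetical bucket and prefixes; on EMR the s3:// scheme resolves through EMRFS.
bucket = "s3://example-data-bucket"

csv_df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(f"{bucket}/raw/csv/"))

json_df = spark.read.json(f"{bucket}/raw/json/")

parquet_df = spark.read.parquet(f"{bucket}/curated/parquet/")

# Simple sanity checks before any downstream loading step.
for name, df in [("csv", csv_df), ("json", json_df), ("parquet", parquet_df)]:
    print(name, df.count(), len(df.columns))
```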

Client: Vodafone Feb 2019- December 2019

Azure Data Engineer

Roles & Responsibilities:

Gathered business requirements by working closely with business users, project leaders, and developers; analyzed the requirements, designed data models, reviewed existing systems, and proposed process improvements.

Extensively worked with the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, Azure SQL, SQL DB, DWH, and Data Storage Explorer).

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).

Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Provided support for Azure Databricks and built data pipelines using PySpark.

Created pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL DW, and the write-back tool, and back again.

Created an Application Interface Document for the downstream team to create a new interface to transfer and receive files through Azure Data Share.

Designed and configured Azure Cloud relational servers and databases, analyzing current and future business requirements.

Worked on PowerShell scripts to automate the creation of Azure resources: resource groups, web applications, Azure Storage blobs and tables, and firewall rules.

Designed and deployed data pipelines using Data Lake, Databricks, and Airflow.

Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.

Created and provisioned different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries for the clusters.

Created several Databricks Spark jobs with PySpark to perform table-to-table operations (a sketch follows below).
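
A minimal sketch of one such table-to-table Databricks job, assuming hypothetical source and target table names registered in the workspace metastore.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession is already provided as `spark`;
# getOrCreate() reuses it and also lets the same script run elsewhere.
spark = SparkSession.builder.appName("table-to-table-sketch").getOrCreate()

# Hypothetical source table registered in the metastore.
usage = spark.table("raw.customer_usage")

# Aggregate per customer and month before publishing to the curated layer.
monthly = (usage
           .withColumn("usage_month", F.date_format("event_time", "yyyy-MM"))
           .groupBy("customer_id", "usage_month")
           .agg(F.sum("data_mb").alias("total_data_mb"),
                F.count("*").alias("event_count")))

# Write the result as a managed table (hypothetical target name).
(monthly.write
 .mode("overwrite")
 .saveAsTable("curated.customer_usage_monthly"))
```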

Created data pipelines for different events from Azure Blob Storage into Hive external tables, using Hive optimization techniques such as partitioning, bucketing, and map joins.

Involved in developing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; also worked with Cosmos DB (SQL API and Mongo API).

Developed automated job flows that ran daily (and on demand) through Oozie, which runs MapReduce jobs internally.

Documentation, defect-tracking management, debugging, and performance tuning.

Worked with Azure Synapse Spark pools to process large volumes of data using PySpark.

Environment: Azure Synapse, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, MongoDB, Cassandra, Teradata, Azure AD, Git, Blob Storage, Data Factory, Scala, Hadoop (HDFS, MapReduce, Yarn), Spark, PySpark, Hive, Oozie, Ubuntu, WinSCP

Client: Bharat Sanchar Nigam Limited (BSNL) Jan 2018 - Jan 2019

Software Engineer

Roles & Responsibilities:

Worked on Google Device and Hotstar development, building applications to interface between BSNL data and the UI, with caching into a PL/SQL database.

Worked on broadband services and created effective features for processing the data efficiently.

Worked on Oracle Database 11.2.0.1, extracting features and data, building queries for development, and writing Java code for integration.

Scheduled jobs using crontab.

Developed updated features for broadband.

Worked on the CRM application for customer unique-number registration.

Worked on Unix shell scripting.

Environment: PL/SQL, stored procedures, triggers, Unix shell scripting, crontab.

Client: Goodman Fielder November 2017 - Jan 2018

ETL Data Engineer

Roles & Responsibilities:

● Used SSIS to perform the ETL process (data extraction, transformation, loading, and validation).

● Analyzed project-related problems and created innovative solutions involving technology, analytic methodologies, and advanced solution components.

● Used Excel and Tableau to analyze the health care website data and build informative reports.

● Updated the website based on analysis; this led to an increase in customers' overall experience by 8%.

● Hands-on experience in Linux administration (RHEL7), including installing, upgrading, and configuring Linux-based operating systems. Set up users and groups, ACL file permissions, assigned users, and groups quota.

● Worked with developers to design and deploy scalable, supportable infrastructure.

● Created stored procedures using PL/SQL and tuned the databases and backend process.

● Determined data rules and conducted Logical and Physical design reviews with business analysts, developers, and DBAs.

● Took personal responsibility for meeting deadlines and delivering high quality work. Strived to continually improve existing methodologies, processes, and deliverable templates.

● Used Snowflake with Spark Streaming; created a Snowpipe with AWS S3 as the source and target, and AWS SNS for sending notifications once data is loaded and the required ETL transformations take place (see the sketch after the environment list below).

Environment: SSIS packages, Visual Studio, PL/SQL, Snowflake, AWS S3, AWS SNS, Linux, SQL, Spark Streaming, Excel, Tableau
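
A minimal sketch of the Snowpipe-style setup from the last bullet, using the snowflake-connector-python package to issue the DDL; the account, credentials, stage, bucket, and table names are all hypothetical placeholders, and the SNS/auto-ingest wiring on the S3 side is not shown.

```python
import snowflake.connector

# Hypothetical connection parameters.
conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# External stage over the S3 landing location (hypothetical bucket and keys).
cur.execute("""
    CREATE STAGE IF NOT EXISTS raw_s3_stage
    URL = 's3://example-landing-bucket/orders/'
    CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***')
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# Snowpipe with auto-ingest: S3 event notifications (via SNS/SQS) trigger the load.
cur.execute("""
    CREATE PIPE IF NOT EXISTS orders_pipe
    AUTO_INGEST = TRUE
    AS COPY INTO RAW.ORDERS FROM @raw_s3_stage
""")

cur.close()
conn.close()
```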

Client: TCS May 2017 - November 2017

Associate Software Engineer Trainee

Roles & Responsibilities:

Trained on SQL, Core Java, and JavaScript.

Worked on a POC developing a banking page and its services.

Worked on Java MVC inserts, JavaScript, CSS, and HTML.

Worked on a POC developing Java code based on OOP concepts.

Certified in Core Java on the ION Digital platform.

Environment: SQL, Core Java, JavaScript, Java MVC, HTML, CSS, ION Digital

Project: Robotics Line Follower December 2016 - April 2017

Internship

Roles:

Worked on line-follower robot programming.

Developed C++ code to make the robot move along a mapped line.

The line is mapped with tape; a sensor on the robot detects the tape as the target and follows it.

In the future this could be helpful in restaurants, households, and offices to reduce manual intervention.

The C++ code implements the robot's positions (down, up, left, and right) in separate methods.

This program is loaded onto an Arduino microcontroller placed inside the robot.

Education

B.Tech GITAM University 2013-2017


