Sandhya Kumari Kada
Email: *************@*****.***
Phone: +1-623******
Professional Summary
● Data Engineer with 7+ years of experience in Big Data and cloud engineering with large sets of structured and unstructured data, data acquisition, and data validation, using programming languages such as Python, PySpark, and Scala, along with Big Data technologies including Hadoop, Hive, Spark, HBase, Cassandra, Kafka, and Elasticsearch.
● Experience using Azure cloud services: Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight, and Databricks.
● Experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, SNS, Auto Scaling, EMR, and other AWS services, as well as Azure SQL, Azure Synapse, Elasticsearch, Cosmos DB, Databricks, ADLS, and Blob Storage.
● Hands-on experience configuring, installing, and using major components of the Hadoop ecosystem such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode on Cloudera (CDP & CDH).
● Extensive knowledge of developing Spark Streaming jobs and RDDs (Resilient Distributed Datasets) using Scala, PySpark, Spark SQL, and spark-shell, including Spark optimization techniques.
● Used Sqoop to import and export data between HDFS and RDBMS.
● Experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch at the end of this summary).
● Experience with different file formats such as Avro, Parquet, JSON, and XML.
● Experience building ETL scripts in different languages and tools such as PL/SQL, Informatica, Hive, and PySpark, with expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie.
● Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
● Experienced in implementing data quality checks and metadata handling in NiFi pipelines.
● Created Splunk dashboards and alerts using POST/GET methods for data consistency, latency, and threshold reports.
● Experienced in designing and developing operational data stores, data warehouses, and data marts.
● Hands-on experience with UNIX and shell scripting to automate tasks.
● Experienced in query optimization, execution plans, and performance tuning of SQL queries.
● Hands-on experience with Snowflake in AWS-based ETL, extracting data using S3, SNS, and Snowpipe streaming.
● Extensive working experience in all phases of the Software Development Lifecycle following Agile and Waterfall methodologies.
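Illustrative sketch (not project code): a minimal PySpark/HiveQL example of the partitioned and bucketed managed and external Hive tables mentioned in the summary; the database, table, column, and path names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-table-design").enableHiveSupport().getOrCreate()

# Managed table, partitioned by load date and bucketed on customer_id (hypothetical schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders_managed (
        order_id BIGINT,
        customer_id BIGINT,
        amount DECIMAL(10,2)
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# External table over files already landed in HDFS; dropping it leaves the data in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_raw (
        order_id BIGINT,
        customer_id BIGINT,
        amount DECIMAL(10,2)
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/raw/orders'
""")
```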
Certifications
Microsoft Certified Azure Fundamentals AZ-900
Microsoft Certified Azure Data Engineer DP-203
Technical Skills
AWS Services: S3, EC2, EMR, Redshift, Elasticsearch, RDS, Lambda, Kinesis, SNS, SQS, AMI, IAM, CloudFormation
Hadoop Components / Big Data: HDFS, Cloudera Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Cassandra, Impala, Zookeeper, Flume, Kafka, YARN, Cloudera Manager, PySpark, Airflow, Snowflake
Spark Components: Apache Spark, DataFrames, Spark SQL, Spark on YARN, Pair RDDs
Web Technologies / Other Components: XML, Log4j, HTML, CSS
Server-Side Scripting: UNIX Shell Scripting
Databases: Oracle, Microsoft SQL Server, MySQL
Programming Languages: Scala, Impala, Python
Web Servers: Apache Tomcat, WebLogic
NoSQL Databases: HBase, MongoDB, Cassandra (currently exploring Apache NiFi, Splunk)
Cloud Services: AWS, Azure
ETL Tools: Talend Open Studio & Talend Enterprise Platform
Reporting and ETL Tools: Tableau, Power BI, SSIS
Data Streaming: Batch processing & real-time streaming using Kafka
Work Experience
Client: Expert Technology Services April 2023-present Senior Data Engineer
Roles & Responsibilities:
Gathered business requirements by working closely with business users, project leaders, and developers; analyzed requirements, designed data models, analyzed existing systems, and proposed process improvements.
Developed Scala code in Scala IDE using sbt, reading from Cassandra as the source, transforming the data in Scala, and writing the results to Elasticsearch (see the sketch below).
Worked with Power BI Desktop, using Power Query Editor to clean, model, analyze, and visualize the data, and generated dashboards for the weekly statistics report (WSR) presented in PowerPoint.
Environment: Scala IDE, Cassandra, Elasticsearch, Spark, sbt, Power BI, Power Query Editor, PowerPoint
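Illustrative sketch (not project code): a minimal PySpark analogue of the Cassandra-to-Elasticsearch flow described above (the original implementation was in Scala with sbt). It assumes the spark-cassandra-connector and elasticsearch-hadoop packages are available, and the host, keyspace, table, and index names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes spark-cassandra-connector and elasticsearch-hadoop are supplied via --packages.
spark = (
    SparkSession.builder
    .appName("cassandra-to-elasticsearch")
    .config("spark.cassandra.connection.host", "cassandra-host")  # hypothetical host
    .getOrCreate()
)

# Read the source table from Cassandra (keyspace/table names are placeholders).
source_df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="analytics", table="usage_events")
    .load()
)

# Example transformation: keep recent events and add a load timestamp.
transformed_df = (
    source_df
    .filter(F.col("event_date") >= "2023-01-01")
    .withColumn("load_ts", F.current_timestamp())
)

# Write the result to an Elasticsearch index (node and index name are placeholders).
(
    transformed_df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "elasticsearch-host")
    .option("es.resource", "usage_events_index")
    .mode("append")
    .save()
)
```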
Client: MasterCard International August 2022-April 2023 Senior Data Engineer
Roles & Responsibilities:
Gathered business requirements by working closely with business users, project leaders, and developers; analyzed requirements, designed data models, analyzed existing systems, and proposed process improvements.
Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Developed data quality checks on Hadoop and Oracle database tables using NiFi and Splunk alerts (see the sketch at the end of this role).
Developed reporting alerts using Spark and Splunk.
Developed NiFi pipelines for script triggers and scheduling.
Tested in the staging environment and deployed code to production once development was complete.
Developed Python scripts for data quality checks in a Linux environment.
Monitored day-to-day activities in the ALM dashboard via Rally.
Worked on NiFi pipeline enhancements.
Created threshold, latency, and consistency alerts in Splunk.
Developed Python scripts for data inserts and for data consistency and latency report alerts via email.
Environment: Spark, PySpark, Airflow, ALM (Rally), Bash scripting, NiFi, Splunk, Toad, DBeaver, Python.
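Illustrative sketch (not project code): a minimal PySpark data quality check of the kind described above, assuming a Hive table readable through Spark. The table name, key column, and thresholds are hypothetical, and the Splunk/NiFi alerting hook is represented only by log output.

```python
import logging
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dq_check")

spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()

TABLE = "transactions.daily_txn"   # hypothetical Hive table
MIN_ROWS = 1_000_000               # hypothetical volume threshold
MAX_NULL_PCT = 0.01                # hypothetical null-rate threshold on the key column

df = spark.table(TABLE)

total = df.count()
null_keys = df.filter(F.col("txn_id").isNull()).count()
null_pct = null_keys / total if total else 1.0

# Emit results; in the real pipeline these would feed Splunk/NiFi alerting.
if total < MIN_ROWS:
    log.error("DQ FAIL: %s row count %d below threshold %d", TABLE, total, MIN_ROWS)
if null_pct > MAX_NULL_PCT:
    log.error("DQ FAIL: %s null rate %.4f on txn_id exceeds %.4f", TABLE, null_pct, MAX_NULL_PCT)
if total >= MIN_ROWS and null_pct <= MAX_NULL_PCT:
    log.info("DQ PASS: %s rows=%d null_rate=%.4f", TABLE, total, null_pct)
```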
Client: Humana Healthcare June 2020- August 2022
Data Engineer
Technical Platform: Hue CDP, Sqoop 2.0, Hadoop 1.6, Spark 2.11 on Jupyter Notebook
Roles & Responsibilities:
Developed code using HQL for file ingestion, performed unit testing, and deployed code to PROD.
The PROD cluster is accessed through the CLI and the Hue web browser.
Scheduled jobs using crontab and Oozie.
For file ingestion from Oracle and SQL sources, implemented Hive code in UNIX scripts.
Worked with real-time ingestion applications such as Spark and Flume.
Proficient in Spark SQL and HiveQL.
Wrote Python scripts to parse XML documents and load the data into the database.
Worked on StreamSets applications to process real-time data from APIs to HDFS via Kafka production and consumption (see the sketch at the end of this role).
Created an Application Interface Document for downstream teams to build a new interface for file transfer.
Ingested data in mini-batches and performed DataFrame transformations on those mini-batches using Spark Streaming to run streaming analytics.
Converted Hive/SQL query logic into Spark transformations using Spark RDDs and Python (PySpark).
Developed automated job flows run through Oozie daily and on demand, which execute MapReduce jobs internally on the Tez engine.
Worked on the Cloudera migration from CDH to CDP; during this migration, Pig and Flume were replaced with StreamSets and Spark applications.
Hands-on with Sqoop for connecting to MySQL/Oracle databases and importing RDBMS data into Hadoop.
Documentation, defect tracking management, debugging, and performance tuning.
Environment: Hue, Scala, Hadoop (HDFS, MapReduce, YARN), Cloudera CDP, Spark, PySpark, Oozie, Hive, Sqoop, HBase, Spark Streaming, StreamSets, Kafka, UNIX shell scripting, crontab.
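Illustrative sketch (not project code): a minimal Spark Structured Streaming example of the Kafka-to-HDFS ingestion pattern mentioned above (the role also used StreamSets alongside Spark Streaming). The broker, topic, and HDFS paths are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Read micro-batches from a Kafka topic (broker and topic are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "member_events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka values arrive as bytes; cast to string and tag with ingest time.
parsed = (
    events.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    .withColumn("ingest_ts", F.current_timestamp())
)

# Land the mini-batches on HDFS as Parquet, with checkpointing for fault tolerance.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/raw/member_events")
    .option("checkpointLocation", "hdfs:///checkpoints/member_events")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```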
Client: Experian December 2019-May 2020
AWS Data Engineer
Roles & Responsibilities:
Worked in an Agile environment with weekly Sprints and daily Scrum meetings.
Collaborated with architects and subject matter experts to review business requirements and build source-to-target data mapping documents and pipeline design documents.
Converted Hive/SQL queries into RDD transformations in Apache Spark using Python and Scala.
Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load the data into an AWS Redis database (see the sketch at the end of this role).
Used AWS Data Pipeline for data extraction, transformation, and loading from heterogeneous data sources.
Designed and developed Spark workflows using Scala in EMR for data pull from AWS S3 bucket and EC2 for creating policies.
Implemented Spark RDD transformations to Map business analysis and applied actions on top of transformations.
Environment: AWS EMR, EC2, S3, Redis, Scala, Spark.
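Illustrative sketch (not project code): a minimal Python example of reading a CSV file from S3 and loading records into Redis, along the lines described above. The bucket, key, connection details, and column names are hypothetical, and it assumes boto3 and redis-py (3.5+ for the mapping argument) are installed.

```python
import csv
import io

import boto3
import redis

# Hypothetical locations and connection details.
BUCKET = "experian-landing"
KEY = "inbound/accounts.csv"
REDIS_HOST = "redis.internal"

s3 = boto3.client("s3")
cache = redis.Redis(host=REDIS_HOST, port=6379, decode_responses=True)

# Download the CSV object from S3 and parse it in memory.
obj = s3.get_object(Bucket=BUCKET, Key=KEY)
body = obj["Body"].read().decode("utf-8")
reader = csv.DictReader(io.StringIO(body))

# Load each row into a Redis hash keyed by a hypothetical account id column.
for row in reader:
    cache.hset(f"account:{row['account_id']}", mapping=row)
```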
Client: Vodafone Feb 2019- December 2019
Azure Data Engineer
Roles & Responsibilities:
Gathered business requirements by working closely with business users, project leaders, and developers; analyzed requirements, designed data models, analyzed existing systems, and proposed process improvements.
Extensively worked with the Azure cloud platform (HDInsight, Data Lake, Databricks, Blob Storage, Data Factory, Synapse, SQL, SQL DB, DWH) and Data Storage Explorer.
Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
Provided support for Azure Databricks and built data pipelines using PySpark.
Created pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load data between sources and targets such as Azure SQL, Blob Storage, and Azure SQL DW, including write-back.
Created Application Interface Document for the downstream to create new interface to transfer and receive the files through Azure Data Share.
Designed and configured Azure Cloud relational servers and databases, analyzing current and future business requirements.
Worked on PowerShell scripts to automate creation of Azure resources: resource groups, web applications, Azure Storage blobs and tables, and firewall rules.
Designed and deployed data pipelines using Data Lake, Databricks, and Airflow.
Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Created and provisioned different Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries for the clusters.
Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
Created data pipelines for different events from Azure Blob Storage into Hive external tables, using Hive optimization techniques such as partitioning, bucketing, and map joins (see the sketch at the end of this role).
Involved in developing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL; also worked with Cosmos DB (SQL API and Mongo API).
Developed automated job flows run through Oozie daily and on demand, which execute MapReduce jobs internally.
Documentation, defect tracking management, debugging, and performance tuning.
Worked with Azure Synapse Spark pools to process large volumes of data using PySpark.
Environment: Azure Synapse, Databricks, Data Lake, Cosmos DB, MySQL, Azure SQL, MongoDB, Cassandra, Teradata, Azure AD, Git, Blob Storage, Data Factory, Scala, Hadoop (HDFS, MapReduce, YARN), Spark, PySpark, Hive, Oozie, Ubuntu, WinSCP
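Illustrative sketch (not project code): a minimal PySpark example in the spirit of the Blob-Storage-to-partitioned-Hive-table pipeline described above. It assumes a cluster already configured for Blob Storage access (e.g., Databricks or HDInsight); the storage account, container, column, and table names are hypothetical, and the original used Hive external tables rather than a saveAsTable-managed table.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("blob-to-hive").enableHiveSupport().getOrCreate()

# Hypothetical Blob Storage location (wasbs) holding raw JSON events.
raw_path = "wasbs://events@storageacct.blob.core.windows.net/raw/"

events = spark.read.json(raw_path)

# Derive a partition column and append into a partitioned Parquet table.
(
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .write
    .mode("append")
    .partitionBy("event_date")
    .format("parquet")
    .saveAsTable("analytics.usage_events")
)
```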
Client: Bharat Sanchar Nigam Limited (BSNL) Jan 2018- Jan 2019 Software Engineer
Roles & Responsibilities:
Worked on Google Device and Hotstar application development, building the interface between BSNL data and the UI and caching application data in a PL/SQL database.
Worked on broadband services and created features that made data handling more efficient.
Worked on Oracle Database 11.2.0.1, extracting features and data, building queries for development, and writing Java code for integration.
Scheduled jobs using crontab.
Developed updated broadband features.
Worked on a CRM application for customer unique number registration.
Worked on UNIX shell scripting.
Environment: PL/SQL, stored procedures, triggers, UNIX shell scripting, crontab.
Client: Goodman Fielder November 2017- Jan 2018
ETL Data Engineer
Roles & Responsibilities:
● Used SSIS to perform ETL process (data extraction, transformation, and loading, validation).
● Analyzed project-related problems and created innovative solutions involving technology, analytic methodologies, and advanced solution components.
● Used Excel and Tableau to analyze the health care website data and build informative reports.
● Updated the website based on this analysis, which improved the overall customer experience by 8%.
● Hands-on experience in Linux administration (RHEL7), including installing, upgrading, and configuring Linux-based operating systems. Set up users and groups, ACL file permissions, assigned users, and groups quota.
● Worked with developers to design and deploy scalable, supportable infrastructure.
● Created stored procedures using PL/SQL and tuned the databases and backend process.
● Determined data rules and conducted Logical and Physical design reviews with business analysts, developers, and DBAs.
● Took personal responsibility for meeting deadlines and delivering high quality work. Strived to continually improve existing methodologies, processes, and deliverable templates.
● Used Snowflake with Spark Streaming; created a Snowpipe using AWS S3 as the source and target, with AWS SNS sending notifications once data is loaded and the required ETL transformations take place (see the sketch below).
Environment: SSIS packages, Visual Studio, PL/SQL, Snowflake, AWS S3, AWS SNS, Linux, SQL, Spark Streaming, Excel, Tableau
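Illustrative sketch (not project code): a minimal example using the snowflake-connector-python package to create an S3 external stage and a Snowpipe with auto-ingest and S3 event notifications via SNS, along the lines described above. The account, credentials, bucket, storage integration, topic ARN, and table names are all hypothetical placeholders.

```python
import snowflake.connector

# Hypothetical connection details; in practice these come from a secrets manager.
conn = snowflake.connector.connect(
    account="xy12345",
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# External stage over the S3 landing bucket (bucket and integration are placeholders).
cur.execute("""
    CREATE STAGE IF NOT EXISTS raw.s3_landing
      URL = 's3://gf-landing-bucket/sales/'
      STORAGE_INTEGRATION = s3_int
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# Snowpipe that auto-ingests new files, with S3 event notifications routed through SNS.
cur.execute("""
    CREATE PIPE IF NOT EXISTS raw.sales_pipe
      AUTO_INGEST = TRUE
      AWS_SNS_TOPIC = 'arn:aws:sns:us-east-1:123456789012:s3-landing-events'
      AS COPY INTO raw.sales FROM @raw.s3_landing
""")

cur.close()
conn.close()
```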
Client: TCS May 2017- November 2017
Associate Software Engineer Trainee
Roles & Responsibilities:
Trained on SQL, Core Java, and JavaScript.
Worked on a POC developing a banking page and its services.
Worked on Java MVC inserts, JavaScript, CSS & HTML.
Worked on a POC developing Java code based on OOP concepts.
Earned certifications in Core Java on the ION Digital platform.
Environment: SQL, Core Java, JavaScript, Java MVC, HTML, CSS, ION Digital
Project: Robotics Line Follower December 2016- April 2017 Internship
Roles:
Worked on line-follower robot programming.
Developed C++ code to move the robot along a mapped line.
Mapping is done with tape; the robot's sensor marks the tape as the target and follows it.
In the future, this can help reduce manual intervention in restaurants, households, and offices.
The C++ code implements the robot's movements (down, up, left, and right) in separate methods.
The program is loaded onto an Arduino microcontroller placed inside the robot.
Education
B.Tech GITAM University 2013-2017