
Jr. Big Data Engineer

Location:
Marlborough, MA
Posted:
April 21, 2022


Resume:

Sri Harsha Panguluri

adqug7@r.postjobfree.com 512-***-****

Professional Summary:

●3+ years of experience as a Data Engineer in Big Data using the Hadoop and Spark frameworks, covering analysis, design, development, documentation, deployment, and integration with SQL and Big Data technologies.

●Well versed in configuring and administering Hadoop clusters using Cloudera and Hortonworks.

●Experience in creating separate virtual data warehouses with different size classes in AWS Snowflake.

●Experience with data transformations using SnowSQL and Python in Snowflake.

●Hands-on experience in bulk loading and unloading data into Snowflake tables using the COPY command (a Snowflake loading sketch follows this summary).

●Experience working with AWS S3 and the Snowflake cloud data warehouse.

●Expertise in building Azure-native enterprise applications and migrating applications from on-premises to Azure.

●Implementation experience with data lakes and business intelligence tools in Azure.

●Experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.

●Currently working on Spark applications extensively, using Scala as the main programming language.

●Processing streaming data with the Spark Streaming API in Scala.

●Used Spark DataFrames, Spark SQL, and Spark's RDD API for various data transformations and dataset building.

●Developed RESTful web services to retrieve, transform, and aggregate data from different endpoints into Hadoop (HBase, Solr).

●Created Jenkins pipelines using Groovy scripts for CI/CD.

●Exposure to data lake implementations; developed data pipelines and applied business logic using Apache Spark.

●Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.

●Implemented Spark scripts using Scala and Spark SQL to read Hive tables into Spark for faster data processing.

●Hands-on experience with real-time workloads on NoSQL databases such as MongoDB, HBase, and Cassandra.

●Experience in creating MongoDB clusters, with hands-on experience in complex MongoDB aggregations and mappings.

●Experience using Flume to load log files into HDFS and Oozie for data scrubbing and processing.

●Experience in performance tuning of Hive queries and MapReduce programs for scalability and faster execution.

●Experienced in handling real-time analytics using HBase on top of HDFS data.

●Experience with transformations, grouping, aggregations, and joins using the Kafka Streams API.

●Hands-on experience deploying Kafka Connect in standalone and distributed modes and creating Docker containers.

●Created topics and wrote Kafka producers and consumers in Python as required (see the Kafka sketch after this summary); developed Kafka source/sink connectors to stream new data into topics and from topics into the required target databases as ETL tasks; also used the Akka toolkit with Scala for some builds.

●Experienced in collecting metrics for Hadoop clusters using Ambari and Cloudera Manager.

●Knowledge of Storm architecture and experience with data modeling tools such as Erwin.

●Experienced in using scheduling tools to automate batch jobs.

●Hands-on experience with Apache Solr/Lucene.

●Expertise with SQL Server: SQL queries, stored procedures, and functions.

●Hands-on experience in application development using Hadoop, RDBMSs, and Linux shell scripting.

●Strong experience in extending Hive and Pig core functionality by writing custom UDFs.

●Experience in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across large volumes of structured and unstructured data.

●Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and building data visualizations using Python and R.

●Ability to work in a team or individually across many cutting-edge technologies, with excellent management skills, business understanding, and strong communication skills.
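
A minimal sketch of the Snowflake bulk load referenced above, assuming the snowflake-connector-python package; the account, credentials, stage, file format, and table names are placeholders, not details from this resume.

    # Bulk-load staged CSV files into a Snowflake table with COPY INTO.
    # All connection values and object names below are hypothetical.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",   # hypothetical account identifier
        user="etl_user",
        password="...",
        warehouse="LOAD_WH",    # a virtual warehouse sized for loading
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # Load every CSV staged under @raw_stage/orders/ into ORDERS.
        cur.execute("""
            COPY INTO ORDERS
            FROM @raw_stage/orders/
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
            ON_ERROR = 'CONTINUE'
        """)
    finally:
        conn.close()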
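
Likewise, a minimal sketch of the Python Kafka producer and consumer described above, using the kafka-python package; the broker address, topic name, and consumer group id are placeholders.

    # Produce JSON messages to a topic, then consume them for downstream ETL.
    import json
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",   # hypothetical broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("events", {"user_id": 42, "action": "click"})
    producer.flush()

    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        group_id="etl-loader",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)   # a database write would replace this in an ETL task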

Experience

Data Engineer 01/2021 - Present

Quantiphi, Marlborough, MA, USA

•Performed data ingestion: loaded on-prem data into GCS buckets and then into the BigQuery data warehouse, created authorized views, and automated the pipeline using CI/CD and Terraform (a load sketch follows this section).

•Built an analytics pipeline to load Dialogflow logs into BigQuery and built visualization dashboards using Looker.

•Provided best practices and recommendations to improve the performance of Spark jobs run on Dataproc clusters.

•Worked on an open-source data-validation tool and modified its code to support a Hive connector.

•Worked with the client to migrate their on-premises architecture to GCP.

•Designed target architecture on GCP to optimize the ETL pipeline.

•Migrated on-prem Spark jobs to Cloud Dataproc to transform the data.
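
A minimal sketch of the GCS-to-BigQuery ingestion described in the first bullet above, using the google-cloud-bigquery client; the bucket path, project, dataset, and table names are hypothetical.

    # Load CSV files from a GCS bucket into a BigQuery table.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,   # infer the schema from the files
    )
    load_job = client.load_table_from_uri(
        "gs://onprem-export/sales/2021-01-*.csv",   # hypothetical bucket/path
        "my-project.warehouse.sales",               # hypothetical table
        job_config=job_config,
    )
    load_job.result()   # block until the load job completes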

Data Engineer Intern 09/2020 - 12/2020

Quantiphi, Marlborough, MA, USA

•Designed and developed a streaming Dataflow ETL and orchestrated the pipeline using Airflow (a DAG sketch follows this section).

•Designed end-to-end optimal SQL-driven ETL pipelines for a Data Warehouse in Google Cloud BigQuery and orchestrated them to run automatically using Airflow.

•Implemented a POC ETL pipeline using Cloud Data Fusion: extracted data from private Cloud SQL instances, loaded it into BigQuery, and scheduled the workflow using Airflow.

•Generated aggregated data using Hive ETL and Spark ETL (on complex, large data sets).

•Created multiple jobs to export data to the MySQL DWH using Sqoop, as this is the source for all reporting.

•Using the pre-processed sources above, created a customer 360 dataset with Spark ETL.

•Generated a dataset covering all products to gauge their performance using Spark ETL.

•Cleaned and loaded the metadata file for a particular location using a Python script.

•Created multiple Hive scripts to segregate the data.
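
A minimal sketch of the Airflow orchestration described above: one DAG running a SQL-driven BigQuery ETL step on a daily schedule. The DAG id, query, and table names are placeholders.

    # Daily DAG that materializes an aggregate table in BigQuery.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import (
        BigQueryInsertJobOperator,
    )

    with DAG(
        dag_id="bq_daily_etl",   # hypothetical DAG id
        start_date=datetime(2020, 9, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        aggregate = BigQueryInsertJobOperator(
            task_id="aggregate_orders",
            configuration={
                "query": {
                    "query": (
                        "SELECT customer_id, SUM(amount) AS total "
                        "FROM warehouse.orders GROUP BY customer_id"
                    ),
                    "destinationTable": {
                        "projectId": "my-project",
                        "datasetId": "warehouse",
                        "tableId": "customer_totals",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                }
            },
        )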

Senior Tech Associate 07/2018 - 07/2019

Bank of America, Hyderabad, India

•Involved in creating Hive tables and loading and analyzing data using Hive queries.

•Built reusable Hive UDF libraries for business

•Worked with Spark core, Spark streaming and Spark SQL modules of Spark

•Developed shell scripts to initiate jobs with required features and environment

•Developed a batch ETL pipeline loading data into Teradata target tables using Sqoop and HDFS.

•Optimized existing Hive queries using many Hive Optimization techniques.

•Performed data cleaning and preparation on the raw data set, creating pre-processed data using different Hive scripts.

•Normalized the pre-processed data sets from different sources using Hive transformation scripts and created custom UDFs (e.g., a UDF that accepts multiple date formats and emits a single canonical date format).

•Tuned the Hive ETLs to bring down execution time by creating day-wise partitions for all tables, since the batch job runs daily.

•Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.

•Analyzed the SQL scripts and designed solutions implemented in Scala.

•Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables, and handled structured data using Spark SQL (see the sketch after this section).

•Implemented Spark Scripts using Scala, Spark SQL to access hive tables into Spark for faster processing of data
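
A minimal sketch of the JSON-to-Hive loading and day-wise partitioning described above. The original work was done in Scala; this sketch uses PySpark to keep all examples in one language, and the path, columns, and table name are placeholders.

    # Read raw JSON, derive a single-format date column, and write a
    # day-partitioned Hive table so daily batch jobs scan only their slice.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("json-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    raw = spark.read.json("/data/raw/transactions/")   # hypothetical HDFS path
    cleaned = (raw
               .withColumn("event_date", F.to_date("event_ts"))
               .dropDuplicates(["txn_id"]))

    (cleaned.write
            .mode("append")
            .partitionBy("event_date")
            .saveAsTable("analytics.transactions"))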

Education

University of Cincinnati 12/2020

Master of Science in Computer Science Cincinnati, OH, USA

Osmania University 06/2018

Bachelor of Engineering in Electronics and Communication Engineering Hyderabad, Telangana, India

Skills

Programming Languages: Python, Scala, JavaScript, HTML

Frameworks: Django, Flask, React.js

Tools: Docker, GCP, Airflow, Tableau, Looker, Spark, AWS, Hadoop, Terraform, GitLab

Databases: MySQL, MongoDB, PostgreSQL, Cloud Bigtable, BigQuery

Projects

Yelp Search Engine Node.js, React.js, MongoDB, AWS EC2

•Collected data and images from Yelp and Google; analyzed them and built a full-stack search engine.

•Hosted the application on an EC2 server.

Book Genre Classification NLP, CNN

•Implemented a complete end-to-end solution that captures real-time book cover images, extracts the book title, and predicts its genre.

•The title is extracted from the image through OCR, the data is pre-processed with NLP, and nine deep learning models were then trained using RNNs and CNNs with GloVe embeddings.

Web Application Python, Flask, AWS, Docker

•Developed a web application using Flask (a minimal sketch follows).

•Hosted the application on Amazon EC2.
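
A minimal sketch of a Flask application of the kind described here; the route and response are placeholders.

    # Smallest useful Flask app; binding to 0.0.0.0 makes it reachable
    # when hosted on an EC2 instance.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/health")
    def health():
        return jsonify(status="ok")

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)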

Certifications

● Google Cloud Professional Data Engineer Certification

● Looker Certified LookML Developer


