AWS Cloud Software Development

Location: Allen, TX
Posted: October 12, 2023


Vamshi Krishna

Hadoop/Big Data Developer

Mail: ad0b9q@r.postjobfree.com Mobile: 469-***-****

PROFESSIONAL SUMMARY

Around 3 years of experience in software development as a Hadoop developer, working with Big Data, Hadoop, and Spark technologies.

Experience in developing applications that perform large scale distributed data processing using big data ecosystem tools like HDFS, YARN, Sqoop, Flume, Kafka, MapReduce, Pig, Hive, Spark, Spark SQL, Spark Streaming, HBase, Cassandra, MongoDB, Mahout, Oozie, and AWS.

Good working experience with various Hadoop distributions such as Hortonworks, Cloudera, and EMR.

Good understanding of data ingestion tools such as Kafka, Sqoop, and Flume.

Experienced in performing in-memory, real-time data processing using Apache Spark.

Good experience in developing multiple Kafka Producers and Consumers as per business requirements.
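
A minimal sketch of what such a producer and consumer can look like in Scala using the standard kafka-clients API; the broker address, topic name, and consumer group below are placeholders rather than details from any specific project.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaRoundTrip {
  private val brokers = "localhost:9092" // placeholder broker list

  def produce(): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one record to a hypothetical topic; key is a business id, value is the payload.
      producer.send(new ProducerRecord("order-events", "order-123", """{"amount": 42.0}"""))
      producer.flush()
    } finally producer.close()
  }

  def consume(): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("group.id", "order-consumers") // placeholder consumer group
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("order-events"))
    // Poll once and print whatever arrived; a production consumer would loop and manage offsets.
    consumer.poll(Duration.ofSeconds(5)).forEach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }

  def main(args: Array[String]): Unit = { produce(); consume() }
}
```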

Extensively worked on Spark components like Spark SQL, MLlib, GraphX, and Spark Streaming.

Configured Spark Streaming to receive real-time data from Kafka, store the streamed data in HDFS, and process it using Spark and Scala.
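
One hedged way to wire this up, shown here with Spark Structured Streaming rather than the older DStream API; the broker, topic, and HDFS paths are illustrative placeholders.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

    // Read the raw Kafka stream; key and value arrive as binary columns.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
      .option("subscribe", "clickstream")                // placeholder topic
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Continuously append the decoded records to HDFS as Parquet.
    stream.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/clickstream/raw")                  // placeholder path
      .option("checkpointLocation", "hdfs:///checkpoints/clickstream") // placeholder path
      .start()
      .awaitTermination()
  }
}
```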

Experience in spinning up different Azure resources using ARM templates.

Experience in setting up Azure big data environments using Azure HDInsight.

Experience in AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.

Experience with an in-depth understanding of the strategy and practical implementation of AWS cloud technologies including IAM, EC2, EMR, SNS, RDS, Redshift, Athena, DynamoDB, Lambda, CloudWatch, Auto Scaling, S3, and Route 53.

Developed quality code adhering to Scala coding standards and best practices.

Experienced in Hadoop ecosystem components such as Hadoop MapReduce, Cloudera, Hortonworks, HBase, Oozie, Flume, Kafka, Hive, Scala, Spark SQL, DataFrames, Sqoop, MySQL, Unix commands, Cassandra, MongoDB, Tableau, and related big data tools.

Hands-on experience developing and debugging YARN (MRv2) jobs to process large datasets.

Experience supporting IBM mainframe applications: MVS, COBOL, JCL, PROCs, VSAM, File-AID, SQL, and DB2.

Hands-on experience with the Hadoop stack (HDFS, MapReduce, YARN, Sqoop, Flume, Hive/Beeline, Impala, Tez, Pig, ZooKeeper, Oozie, Sentry, Kerberos, Centrify DC, Falcon, Hue, Kafka, Storm).

Experienced in working with Hadoop/Big-Data storage and analytical frameworks over Azure cloud.

Extensive work experience in creating UDFs, UDAFs in Pig and Hive.

Involved in deploying applications on Azure and in setting up big data clusters using Azure HDInsight.

Good experience in using Impala for data analysis.

Experience on NoSQL databases such as HBase, Cassandra, MongoDB, and DynamoDB.

Implemented CRUD operations using CQL on top of Cassandra file system.
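
A short sketch of CQL-based CRUD from Scala via the DataStax Java driver; the keyspace, table, and columns are hypothetical and assumed to already exist.

```scala
import com.datastax.oss.driver.api.core.CqlSession

object CassandraCrud {
  def main(args: Array[String]): Unit = {
    // Assumes a local Cassandra node and an existing table such as:
    //   CREATE TABLE txn_logs.events (user_id text PRIMARY KEY, action text);
    val session = CqlSession.builder().withKeyspace("txn_logs").build()
    try {
      session.execute("INSERT INTO events (user_id, action) VALUES ('u1', 'login')")        // create
      val rows = session.execute("SELECT user_id, action FROM events WHERE user_id = 'u1'") // read
      rows.forEach(r => println(s"${r.getString("user_id")} -> ${r.getString("action")}"))
      session.execute("UPDATE events SET action = 'logout' WHERE user_id = 'u1'")           // update
      session.execute("DELETE FROM events WHERE user_id = 'u1'")                            // delete
    } finally session.close()
  }
}
```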

Managed and reviewed HDFS data backups and restores on the production cluster.

Experience creating data models for clients' transactional logs and analyzing data in Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language (CQL).

Expert knowledge on MongoDB data modeling, tuning, disaster recovery and backup.

Hands-on experience with ad-hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.

Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs).
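
To illustrate the UDF part, a minimal Hive UDF written in Scala in the classic org.apache.hadoop.hive.ql.exec.UDF style; the class and function names are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

/** Upper-cases a string column; null-safe so NULL rows pass through unchanged. */
class ToUpperUDF extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null else new Text(input.toString.toUpperCase)
}
```

Once packaged into a jar, a function like this would typically be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION to_upper AS 'ToUpperUDF'.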

Expertise in relational databases like MySQL, SQL Server, DB2, and Oracle.

Strong understanding of Solr for developing search engines over unstructured data in HDFS.

Experience in cloud platforms like AWS, Azure.

Worked closely with Azure to migrate entire data centers to the cloud using Cosmos DB and ARM templates.

Extensively worked on AWS services such as EC2, S3, EMR, CloudFormation, CloudWatch, and Lambda.

Experience on ELK stack and Solr to develop search engine on unstructured data in HDFS.

Implemented ETL operations using Big Data platform.

Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.

Experience working with Core Java, J2EE, JDBC, ODBC, JSP, Java Eclipse, EJB and Servlets.

Strong experience on Data Warehousing ETL concepts using Informatica, and Talend.

Skills

Big Data : Hadoop, HDFS, MapReduce, Pig, Hive, Spark, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, YARN, Hue.

Hadoop Distributions : Cloudera (CDH4, CDH5), Hortonworks, EMR.

Programming Languages : C, Java, Python, Scala.

Database : NoSQL, HBase, Cassandra, MongoDB, MySQL, Oracle, DB2, PL/SQL, Microsoft SQL Server.

Cloud Services : AWS, Azure.

Frameworks : Spring, Hibernate, Struts.

Scripting Languages : JSP, Servlets, JavaScript, XML, HTML.

Java Technologies : Servlets, JavaBeans, JSP, JDBC, EJB.

Application Servers : Apache Tomcat, WebSphere, WebLogic, JBoss.

ETL Tools : Informatica, Talend.

Work Experience

Hadoop/Big Data Developer

Thomson Reuters, Hyderabad, India

Oct 2019 – Dec 2021

Role & Responsibilities:

Designed and created Azure Data Factory (ADF) pipelines extensively for ingesting data from different relational and non-relational source systems to meet business functional requirements.

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).

Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.

Worked extensively on Hadoop Components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN, Spark and Map Reduce programming.

Converted the existing relational database model to the Hadoop ecosystem.

Explored Spark for improving the performance and optimizing the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
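
A hedged sketch of that flow using the spark-streaming-kafka-0-10 DStream integration, converting each micro-batch RDD to a DataFrame and appending it to HDFS as Parquet; the broker, topic, and output path are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object FeedToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("feed-to-parquet").getOrCreate()
    import spark.implicits._
    val ssc = new StreamingContext(spark.sparkContext, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092", // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "feed-ingest")  // placeholder

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("trade-feed"), kafkaParams))

    // Each micro-batch arrives as an RDD of records; convert it to a DataFrame and append as Parquet.
    stream.foreachRDD { rdd =>
      rdd.map(r => (r.key, r.value))
        .toDF("key", "value")
        .write.mode("append").parquet("hdfs:///data/trade-feed") // placeholder path
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```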

Used Spark and Spark SQL to read the Parquet data and create Hive tables using the Scala API.
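
A short sketch of that step, assuming a Hive-enabled SparkSession; the path, database, and table names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object ParquetToHive {
  def main(args: Array[String]): Unit = {
    // Hive support lets saveAsTable register the table in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("parquet-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Read the Parquet output of the streaming job (placeholder path).
    val trades = spark.read.parquet("hdfs:///data/trade-feed")

    // Persist it as a managed Hive table and query it back through Spark SQL.
    trades.write.mode("overwrite").saveAsTable("analytics.trade_feed")
    spark.sql("SELECT COUNT(*) FROM analytics.trade_feed").show()
  }
}
```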

Created a pipeline for processing structured and unstructured streaming data using Spark Streaming and stored the filtered data in S3 as Parquet files.

Worked with Linux systems and RDBMS database on a regular basis to ingest data using Sqoop.

Developed schedulers that communicated with cloud-based services (AWS) to retrieve data.

Strong experience working with Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.

Collected data from AWS S3 buckets in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
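
One possible shape for that pipeline, sketched with Structured Streaming's file source over an S3 prefix; the schema, bucket, window, and output paths are all illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object S3NearRealTime {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("s3-near-real-time").getOrCreate()

    // Hypothetical schema of JSON events landing in S3.
    val schema = new StructType()
      .add("user_id", StringType)
      .add("amount", DoubleType)
      .add("event_time", TimestampType)

    // Treat new files arriving under the S3 prefix as a stream (placeholder bucket/prefix).
    val events = spark.readStream.schema(schema).json("s3a://my-bucket/events/")

    // Basic transformation plus a windowed aggregation; the watermark bounds late data.
    val perUser = events
      .filter(col("amount") > 0)
      .withWatermark("event_time", "10 minutes")
      .groupBy(window(col("event_time"), "5 minutes"), col("user_id"))
      .agg(sum("amount").as("total_amount"))

    // Persist the aggregated model to HDFS as Parquet (placeholder paths).
    perUser.writeStream
      .format("parquet")
      .outputMode("append")
      .option("path", "hdfs:///data/user-spend")
      .option("checkpointLocation", "hdfs:///checkpoints/user-spend")
      .start()
      .awaitTermination()
  }
}
```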

Imported data from different sources such as AWS S3 and LFS into Spark RDDs.

Experienced in working with Amazon Web Services (AWS) EC2 and S3 alongside Spark RDDs.

Managed and reviewed Hadoop and HBase log files.

Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.

Designed and implemented HIVE queries and functions for evaluation, filtering, loading, and storing of data.

Analyzed table data and implemented compression techniques such as Teradata multi-value compression.

Involved in the ETL process from design and development through testing and migration to production environments.

Involved in writing the ETL test scripts and guided the testing team in executing the test scripts.

Wrote Hadoop MapReduce jobs to run on Amazon EMR clusters and created workflows for running the jobs.

Modeled complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks, Azure Blob Storage, Azure SQL Database, and Cosmos DB.

Worked with Elastic MapReduce (EMR) on Amazon Web Services (AWS).

Good understanding of Teradata MPP architecture concepts such as partitioning and primary indexes.

Created partitions and buckets based on state for further processing using bucket-based Hive joins.
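
An illustrative way to lay out such a table from Spark (partitioned by state, bucketed by a join key); the table name, columns, and bucket count are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object BucketedSales {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bucketed-sales")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staged input.
    val sales = spark.read.parquet("hdfs:///staging/sales")

    // Partition by state and bucket by customer_id so downstream joins on customer_id
    // within a state partition can use bucketed (map-side) joins; 32 buckets is illustrative.
    sales.write
      .partitionBy("state")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .mode("overwrite")
      .saveAsTable("analytics.sales_bucketed")
  }
}
```

The equivalent Hive-side DDL would declare PARTITIONED BY (state STRING) together with CLUSTERED BY (customer_id) INTO 32 BUCKETS.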

Involved in transforming data from Mainframe tables to HDFS, and HBase tables using Sqoop.

Created Hive tables and worked with them using HiveQL.

Built scripts for data modeling and mining to enable easier access to Azure Logs and App Insights.

Created and truncated HBase tables in Hue and took backups of submitter IDs.

Developed data pipeline using Kafka to store data into HDFS.

Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.

Developed ETL processes using Hive and HBase.

Worked as an ETL Architect/ETL Technical Lead and provided the ETL framework Solution for the Delta process, Hierarchy Build and XML generation.

Loaded CDRs from relational databases using Sqoop, and from other sources into the Hadoop cluster using Flume.

Experience processing large volumes of data and executing processes in parallel using Talend functionality.

Installed and configured Apache Hadoop, Hive and Pig environment.

Environment: Azure Data Factory (ADF v2), Azure Databricks (PySpark), Azure Data Lake, Spark (Python/Scala), Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Python, Shell Scripting, SQL, Talend, HBase, Elasticsearch, Linux (Ubuntu), Kafka.

Hadoop Developer

Bank of Montreal, Hyderabad, India

May 2018 – Oct 2019

Role & Responsibilities:

Worked on extracting and enriching HBase data between multiple tables using joins in Spark.
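
A hedged sketch of that enrichment using TableInputFormat to load two HBase tables as key/value RDDs and joining them on row key; table, column family, and qualifier names are placeholders.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object HBaseJoin {
  /** Loads one HBase table as (rowKey -> value of cf:qualifier). */
  private def load(spark: SparkSession, table: String, cf: String, qual: String) = {
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, table)
    spark.sparkContext
      .newAPIHadoopRDD(conf, classOf[TableInputFormat],
        classOf[ImmutableBytesWritable], classOf[Result])
      .map { case (key, result) =>
        val v = result.getValue(Bytes.toBytes(cf), Bytes.toBytes(qual))
        (Bytes.toString(key.get()), if (v == null) "" else Bytes.toString(v))
      }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-join").getOrCreate()

    // Placeholder tables that share the same row key (e.g. a customer id).
    val profiles = load(spark, "customer_profiles", "info", "name")
    val balances = load(spark, "customer_balances", "acct", "balance")

    // Enrich profiles with balances by joining on row key, then write out the result.
    profiles.join(balances)
      .map { case (id, (name, balance)) => s"$id,$name,$balance" }
      .saveAsTextFile("hdfs:///out/customer_enriched") // placeholder output path
  }
}
```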

Worked on writing APIs to load the processed data to HBase tables.

Replaced the existing MapReduce programs into Spark application using Scala.

Built on-premises data pipelines using Kafka and Spark Streaming, fed by an API streaming gateway REST service.

Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.

Developed intranet portal for managing Amazon EC2 servers using Tornado and MongoDB.

Built SSIS packages to create ETL processes and load data into a SQL Server database for some of the SSRS reporting requirements.

Created new database objects such as procedures, functions, packages, triggers, indexes, and views using T-SQL in development and production environments for SQL Server 2008/2012.

Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns.

Extensively used the Spark stack to develop preprocessing jobs that use the RDD, Dataset, and DataFrame APIs to transform data for upstream consumption.
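
A small illustration of mixing the RDD, Dataset, and DataFrame APIs in one preprocessing job; the record layout, delimiter, and paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Hypothetical upstream record layout.
case class Txn(accountId: String, amount: Double, status: String)

object PreprocessTxns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("preprocess-txns").getOrCreate()
    import spark.implicits._

    // RDD stage: parse raw pipe-delimited lines (placeholder path) and drop malformed rows.
    val parsed = spark.sparkContext.textFile("hdfs:///raw/txns").flatMap { line =>
      line.split('|') match {
        case Array(acct, amt, status) => Try(Txn(acct, amt.toDouble, status)).toOption
        case _                        => None
      }
    }

    // Dataset stage: typed filter on the case class.
    val settled = spark.createDataset(parsed).filter(_.status == "SETTLED")

    // DataFrame stage: untyped aggregation written out for upstream consumption.
    settled.groupBy($"accountId")
      .sum("amount")
      .withColumnRenamed("sum(amount)", "total_amount")
      .write.mode("overwrite").parquet("hdfs:///curated/txn_totals") // placeholder path
  }
}
```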

Involved in writing optimized Pig Scripts along with developing and testing Pig Latin Scripts.

Used Python pandas and NumPy modules for data analysis, data scraping, and parsing.

Deployed applications using Jenkins, integrating Git version control with it.

Extracted data from NoSQL databases like HBase through Sqoop and placed it in HDFS for processing.

Installed Oozie workflow engine to run multiple SNS and Pig jobs.

Worked with data delivery teams to set up new Hadoop users and Linux users, set up Kerberos principals, and test HDFS and Hive.

Installed Hadoop ecosystem components such as Pig, Hive, HBase, and Sqoop in a cluster.

Participated in production support on a regular basis to support the Analytics platform.

Used Rally for task/bug tracking.

Used GIT for version control.

Good knowledge of the Kafka Streams API for data transformation.

Implemented a logging framework, the ELK stack (Elasticsearch, Logstash, and Kibana), on AWS.

Set up Spark on EMR to process large volumes of data stored in AWS S3.

Developed Oozie workflow for scheduling & orchestrating the ETL process.

Used Talend tool to create workflows for processing data from multiple source systems.

Created sample flows in Talend and StreamSets with custom-coded jars and analyzed the performance of StreamSets and Kafka Streams.

Environment: MapR, Hadoop, HBase, HDFS, AWS, Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Python scripting, Java.


