Big Data Hadoop Developer

Location:
Pittsburgh, PA
Posted:
February 07, 2020

Resume:

SABATINI GADDAM

361-***-****

**************@*****.***

PROFESSIONAL SUMMARY:

●5+ years of experience as a solutions-oriented IT software developer, with experience in analysis, design, installation, configuration, development and integration using Hadoop and its ecosystem (HDFS, MapReduce, Hive, Impala, ZooKeeper, Oozie, HBase, Sqoop, Pig, Kafka, Python, Spark, Flume).

●Expertise in data ingestion, modelling, querying, processing, storage and analysis, and in implementing enterprise-level Big Data and data integration systems.

●Experienced in Hadoop architecture and its daemons (NameNode, DataNode, ResourceManager, NodeManager, JobTracker and TaskTracker), as well as single-node and multi-node cluster configuration.

●Expertise in writing HiveQL queries to store processed data in Hive tables for analysis; created managed and external tables in Hive against a shared metastore to optimize performance (a brief sketch follows this summary).

●Proficient in using Apache Sqoop to import data from relational databases into HDFS and export results back to MySQL.

●Extensive knowledge in writing Pig scripts to analyze data using Pig Latin.

●Experienced in performing transformations and actions on RDDs and DataFrames, creating case classes for the required input data and carrying out data transformations using Spark Core and Spark SQL.

●Optimized the performance of streaming batch jobs through Spark Streaming.

●Experienced in using Flume and Kafka to load the log data from multiple sources into HDFS.

●Hands-on experience with NoSQL databases including HBase, Cassandra and MongoDB.

●Experienced with workflow scheduling and cluster coordination tools such as Oozie and ZooKeeper.

●Worked on data modelling using programming languages such as Python.

●Experienced in performance optimization and troubleshooting across the HBase shell/API, Pig, Hive and MapReduce.

●In-depth knowledge of the Oracle Hyperion EPM suite for data analysis and reporting.

●Experience working with the Amazon AWS cloud, including EC2, S3, RDS, EBS, Elastic Beanstalk and CloudWatch.

●Experience in working with Agile Methodologies including SCRUM and Test-Driven Development.

●Extensive knowledge in relational databases like Oracle, MySQL and SQL Server.

●Experience in maintaining the cluster and troubleshooting the operating systems (Windows, Linux and UNIX).

●Worked with Big Data distributions such as Cloudera (Cloudera Manager) and Hortonworks (Ambari).

●Expertise in troubleshooting, knowledge transfer, documentation and client training.
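
To make the Hive bullet above concrete, the following is a minimal PySpark sketch of creating and querying an external, partitioned Hive table over Sqoop-landed HDFS data. The database, table, column names and HDFS path are illustrative assumptions, not details taken from this resume.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-external-table-sketch")
         .enableHiveSupport()   # attach to the shared Hive metastore
         .getOrCreate())

# External table: Hive owns only the metadata; the data stays at the HDFS path.
# "sales.orders", its columns and the location below are hypothetical.
spark.sql("CREATE DATABASE IF NOT EXISTS sales")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
        order_id   BIGINT,
        product_id STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION '/data/landing/orders'
""")

# Pick up newly landed partitions, then query only the partitions needed.
spark.sql("MSCK REPAIR TABLE sales.orders")
spark.sql("""
    SELECT order_date, product_id, SUM(amount) AS total_amount
    FROM sales.orders
    GROUP BY order_date, product_id
""").show()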

EDUCATION:

Bachelor of Engineering in Electronics and Communications, Osmania University, Hyderabad, India, 2012.

TECHNICAL SKILLS:

Hadoop Ecosystem:

HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Spark, PySpark, Kafka, Flume, Python, ZooKeeper, Oozie

Hadoop Management:

Cloudera Manager, Hortonworks Ambari

NoSQL:

HBase, MongoDB, Cassandra

Programming Languages:

C, C++, Java, Python

IDEs:

PyCharm, Anaconda, Eclipse

Operating Systems:

Linux, Unix, Windows

RDBMS:

Oracle, MS SQL Server, MySQL

Other Technologies:

Oracle Hyperion EPM Suite

PROFESSIONAL EXPERIENCE:

Nike, Memphis, TN. April 2017 - Present

Hadoop Developer

Responsibilities:

●Worked with systems engineering team to plan and deploy new Hadoop environments and expand the existing environments.

●Involved in importing data from various data sources like Oracle and MySQL using Sqoop for loading the data into HDFS.

●Developed HiveQL queries, map-side joins and Hive UDFs where relevant.

●Configured Hive metastore with MySQL, which stores the metadata for Hive tables.

●Analysed web log data using HiveQL to extract the number of orders and the most purchased products (a DataFrame sketch of this analysis follows the Environment line below).

●Created managed and external tables in Hive against the shared metastore and used partitioning and bucketing in Hive to optimize performance.

●Used Flume and Kafka to collect and aggregate web log data from sources such as web servers, mobile and network devices, and pushed it to HDFS.

●Accessed data stored in Hive through Spark to perform data validations and wrote the results to HBase.

●Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (a minimal Kafka-to-Spark Streaming sketch follows this list).

●Ran Spark scripts through Java and Python shell commands as per requirements.

●Developed Scala scripts and UDFs using both DataFrames/Spark SQL/Datasets and RDDs/MapReduce in Spark for data aggregation.

●Optimized the existing Hadoop algorithms using SparkContext, Spark SQL and pair RDDs.

●Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing.

●Developed Python, PySpark, Spark scripts to filter/cleanse/map/aggregate data.

●Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports by Business Intelligence tools.

●Involved in project increment sessions and sprint planning (Agile methodology) for each implementation task.

●Installed the Oozie workflow engine to run multiple MapReduce programs that are triggered independently by time and data availability.

●Tested and reported defects in an Agile Methodology perspective.

●Coordinated with application teams on Hadoop updates, patches and version upgrades as required.

●Involved in cluster maintenance and monitoring.

●Participated in functional reviews, test specifications and documentation review.
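
A minimal sketch of the Kafka-to-Spark Streaming flow described above, using the DStream-based KafkaUtils API that was current for Spark 1.x/2.x (it needs the spark-streaming-kafka-0-8 package and was removed in Spark 3). The topic name, broker address and output path are assumptions for illustration; persisting to Cassandra, as in the bullet, would instead go through foreachRDD and the Spark-Cassandra connector.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="weblog-stream-sketch")
ssc = StreamingContext(sc, 10)   # 10-second micro-batches

# Direct stream from Kafka; topic and broker are hypothetical.
stream = KafkaUtils.createDirectStream(
    ssc, ["weblogs"], {"metadata.broker.list": "broker1:9092"})

# Records arrive as (key, value) pairs; keep the value and apply a simple filter.
events = (stream.map(lambda kv: kv[1])
                .filter(lambda line: "order" in line))

# Persist each micro-batch to HDFS under a time-stamped prefix.
events.saveAsTextFiles("/data/streams/weblogs/batch")

ssc.start()
ssc.awaitTermination()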

Environment: Hadoop, HDFS, MapReduce, YARN, Sqoop, Hive, Pig, Flume, Spark, Kafka, Oozie, ZooKeeper, HBase, Oracle, MySQL, Cloudera Distribution.
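
A short DataFrame version of the web-log analysis mentioned above (number of orders and most purchased products). The Hive table and column names are hypothetical placeholders, not details from this project.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("weblog-analysis-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical Hive table of parsed web-log events.
logs = spark.table("weblogs.page_events")

orders = logs.filter(F.col("event_type") == "order")

# Number of orders per day.
orders.groupBy("event_date") \
      .agg(F.count("*").alias("order_count")) \
      .show()

# Most purchased products.
orders.groupBy("product_id") \
      .agg(F.count("*").alias("purchases")) \
      .orderBy(F.desc("purchases")) \
      .limit(10) \
      .show()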

Think Analytics LLC, Irving, TX. April 2016 - March 2017

Hadoop Developer

Responsibilities:

●Participated in requirement gathering and documenting the business requirements by conducting workshops/meetings with various business users.

●Handled importing large data sets from various relational databases like Oracle and MySQL into HDFS using Sqoop, and exported the analysed data back for visualization and report generation by the BI team.

●Involved in creating Hive tables and wrote multiple Hive queries to load the Hive tables for analysing data coming from distinct sources.

●Worked with the Spark Core, Spark SQL and Spark Streaming modules of Spark.

●Loaded streaming data using Kafka and processed using Spark.

●Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.

●Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

●Optimized existing Hadoop algorithms using SparkContext, Spark SQL and pair RDDs.

●Involved in developing Python, PySpark and Spark scripts to filter/map/aggregate data (see the sketch after this list).

●Collected log data from web servers and integrated it into HDFS using Flume.

●Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig (a Hadoop Streaming sketch follows the Environment line below).

●Maintained data import scripts built on Hive and MapReduce jobs.

●Developed and maintained several batch jobs that run automatically depending on business requirements.

●Performed unit testing and deployment for internal use, and monitored the performance of the solution.
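
A minimal PySpark filter/cleanse/aggregate sketch in the spirit of the scripts described above. The staging and output table names and columns are assumptions; per the bullets, the aggregated table would then be exported to the relational database with Sqoop for the BI team.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("cleanse-aggregate-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical Sqoop-landed staging table.
raw = spark.table("staging.transactions")

# Drop malformed rows and normalise a text field.
clean = (raw.filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
            .withColumn("category", F.upper(F.trim(F.col("category")))))

# Aggregate for reporting and land the result in a Hive table.
summary = (clean.groupBy("category", "txn_date")
                .agg(F.sum("amount").alias("total_amount"),
                     F.countDistinct("customer_id").alias("customers")))

summary.write.mode("overwrite").saveAsTable("analytics.daily_category_summary")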

Environment: Apache Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Eclipse, Oozie, ZooKeeper, Flume, Oracle, MySQL and Cloudera Distribution.
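
The MapReduce bullet above mentions Java, Hive and Pig; the sketch below swaps in Hadoop Streaming with Python (the other main language in this resume) to show the same map/reduce shape. The two snippets are separate files, the tab-separated input format and field positions are assumptions, and they would be submitted with the standard hadoop-streaming jar.

# mapper.py: emit (product_id, 1) for each order line read from stdin.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 3:                 # assumed: product id in the third field
        print("%s\t1" % fields[2])

# reducer.py (separate file): sum the counts per product id; Hadoop Streaming
# delivers the mapper output sorted by key.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key and current_key is not None:
        print("%s\t%d" % (current_key, count))
        count = 0
    current_key = key
    count += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))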

TXU Energy, Dallas, TX. April 2014 - March 2016

Hyperion Consultant

Responsibilities:

●Involved in the entire life cycle, from requirement gathering and design through development, testing, migration and administration of Hyperion Planning and Essbase reporting applications.

●Responsible for the development of security for all 26 Planning, Essbase and HFM applications.

●Involved in developing the data flow process from various data sources into the Hyperion system.

●Developed and maintained numerous business rules using Calculation manager for the Planning applications, MDX scripts for the Reporting applications and Rule file for the HFM application.

●Automated the metadata update process from SQL Server to all Hyperion Planning and Essbase applications using Oracle Data Integrator scheduler (ODI).

●Migrated the Planning and Essbase reporting applications, with their artefacts and data, from one environment to another using Lifecycle Management (LCM) and the import/export utilities.

●Automated the application, database, data load and security backup processes.

●Developed dynamic process for dimension and data loads using rule files for all the applications.

●Developed and maintained numerous Reports using Hyperion Financial Reporting Studio.

●Assigned filters for Planning and Essbase Applications through EAS and by using Shared Services.

●Automated data loads, execution of calculation scripts and the database backup process using MaxL (a scheduling sketch follows this list).

●Developed test cases for the business users to ease the testing of data in the applications.

●Migrated all the applications, including business rules, calculation scripts, data forms, rule files, supporting detail and security, from the development environment to the production environment using LCM and the import/export utilities.

●Tuned the existing applications for better performance by identifying issues on a priority basis and modifying server-, application- and database-level settings.

●Worked on backup and recovery of the Hyperion application databases and their artefacts.

●Updated the initial design document and maintenance documents for future reference.

●Involved in knowledge transfer and training of the business users on the new environment.

●Worked with Hyperion / Oracle support team to resolve Oracle EPM product related issues.
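
The MaxL automation above would typically be driven by scheduled batch scripts; the snippet below is only a hedged illustration of wrapping such a run in Python, the scripting language listed elsewhere in this resume, and is not taken from this project. essmsh is the real Essbase MaxL shell, but the script path, log directory and error handling are hypothetical.

import datetime
import subprocess

# Hypothetical paths; the MaxL script itself would hold the login, data load
# and calculation statements referred to in the bullets above.
MAXL_SCRIPT = "/hyperion/scripts/nightly_load.msh"
LOG_DIR = "/hyperion/logs"

def run_maxl(script_path):
    """Run a MaxL script through the Essbase MaxL shell (essmsh) and log its output."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = "%s/maxl_%s.log" % (LOG_DIR, stamp)
    with open(log_path, "w") as log:
        result = subprocess.run(["essmsh", script_path],
                                stdout=log, stderr=subprocess.STDOUT)
    if result.returncode != 0:
        raise RuntimeError("MaxL run failed; see %s" % log_path)
    return log_path

if __name__ == "__main__":
    print("MaxL log written to", run_maxl(MAXL_SCRIPT))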

Environment: Oracle EPM 11.1.1.3 (EPMA Environment), Planning, Essbase, Shared Services, Oracle Hyperion SmartView, Oracle Data Integrator (ODI), Financial Reporting studio, SQL Server 2005, Windows 2008.


