
Hadoop Developer

Portland, Oregon, 97204, United States
January 31, 2019



Meghana P

Ph: 860-***-****

Professional Summary

Around 5 years of IT experience in the development, implementation and testing of Business Intelligence and Data Warehousing solutions with Big Data technologies.

Excellent knowledge of Big Data infrastructure: distributed file systems (HDFS), parallel processing (MapReduce framework) and the complete Hadoop ecosystem (Hive, Hue, Pig, HBase, ZooKeeper, Sqoop, Kafka, Spark, Flume and Oozie).

In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts, with experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.

In-depth knowledge of real-time ETL/Spark analytics using Spark SQL with visualization.

Extensive experience with big data query tools like Pig Latin and HiveQL.

Experience in extracting the data from RDBMS into HDFS using Sqoop.

Experience in collecting the logs from log collector into HDFS using Flume.

Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.

Experience in analyzing data in HDFS through MapReduce, Hive and Pig.

Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.

Experience installing, configuring, supporting and managing Cloudera’s Hadoop platform with CDH4 and CDH5 clusters, HDP 2.2 with Kafka-Storm and the EC2 platform, and IBM’s BigInsights Hadoop ecosystem.

Knowledge of Hadoop administration activities such as installation, configuration and management of clusters using Cloudera Manager, Hortonworks and Apache Ambari.

Hands-on experience performing ETL with Talend and excellent understanding of creating dashboard reports with Tableau.

Experience in Scala, multithreaded processing, SQL and PL/SQL.

Hands-on experience loading unstructured data (log files, XML data) into HDFS using Flume.

Good knowledge of Apache Spark, Kafka, Splunk and BI tools such as Pentaho and Talend.

Experience in tuning performance using partitioning, bucketing and indexing in Hive.

Experience in importing and exporting data with Sqoop between HDFS and relational or non-relational database systems, in both directions.

Detailed knowledge of and experience in designing, developing and testing software solutions using Java and J2EE technologies, including developing and maintaining web applications on the Tomcat web server.

Experience with front-end technologies such as HTML5, CSS3, JavaScript and jQuery for building the UI of a complete end-to-end system.

Comfortable in Unix/Linux and Windows environments, working with operating systems such as CentOS, Red Hat and Ubuntu.


Education

Bachelor of Technology in Electronics from JNTU, Hyderabad, India.

Masters in Electrical Engineering from SDSU, San Diego, CA, USA.

Technical Skills

Hadoop/Big Data

HDFS, MapReduce (M-R), Hue, Hive, Pig, HBase, Impala, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark with Scala

Operating Systems/Environment

Windows, Ubuntu, Linux, iOS, Cloudera CDH, EC2, S3, IBM BigInsights

Java & J2EE Technologies

Core Java, Servlets, JSP, JDBC, Java Beans

Modeling Tools

UML on Rational Rose, Rational ClearCase, Enterprise Architect, Microsoft Visio


IDEs and Tools

Eclipse, NetBeans, JUnit testing tool, Log4j for logging


Databases

Oracle, DB2, MS SQL Server, MySQL, MS Access, Teradata, NoSQL (HBase, MongoDB, Cassandra)

Web Servers

WebLogic, WebSphere, Apache Tomcat 7

Build Tools

Maven, Scala Build Tool (SBT), Ant

Operating systems and Virtual Machines

Linux (Red Hat, Ubuntu, CentOS), Oracle VirtualBox, VMware Player, Workstation 11

ETL Tools

Talend for Big Data, Informatica


Nike Inc – OR Jan 2017 – Present

Role: Senior Software/Big Data Engineer


Worked on building an ingestion framework to ingest data from different sources such as Oracle, SQL Server, delimited flat files, XML, Parquet and JSON into Hadoop and to build tables in Hive.

Worked on building big data analytics solutions to provide near-real-time and batch data as per business requirements.

Worked on building a Spark framework to ingest data into Hive external tables and run complex computational and non-equi-join SQL queries in Spark.

Involved in developing a data quality tool for checking all data ingested into Hive tables.

Collaborated with BI teams to ensure data quality and availability with live visualization.

Designed, developed and maintained Oozie workflows that integrate shell, Java, Sqoop, Hive and Spark actions in workflow nodes to run data pipelines.

Designed and supported multi-tenancy on the data platform to allow other teams to run their applications.

Used Impala for low-latency queries and visualization.

Created Hive queries to process large sets of structured, semi-structured and unstructured data and store the results in managed and external tables.

Created HBase tables to load large sets of structured data.

Managed and reviewed Hadoop log files.

Wrote code to encrypt and decrypt data for PII groups.

Performed real-time event processing of data from multiple servers in the organization using Apache Kafka and Flume.

Processed JSON files and ingested them into Hive tables.

Used Python to parse XML files and create flat files from them.

Used HBase to support front-end applications that retrieve data using row keys.

Used Control-M as the enterprise scheduler for all jobs.

Used Bitbucket extensively as the code repository.

Environment: Cloudera, Hue, Java, Python, SQL, Shell Scripting, Control-M, Oozie, Spark, Sqoop, Bitbucket, Hive.


Toyota Insurance Management Solutions - TX Sep 2015 – Dec 2016

Role: Senior Software/Big Data Engineer


Analyzed business requirements and prepared detailed specifications that follow the project guidelines required for project development.

Coordinated with other team members to write and generate test scripts and test cases for numerous user stories.

Communicated regularly with business and IT leadership.

Analyzed customers' driving behavior as part of the Usage-Based Insurance (UBI) program.

Developed an algorithm to score drivers based on their driving behavior.

Developed PySpark/Spark SQL scripts to analyze various customer behaviors.

Responsible for data extraction and data ingestion from different data sources into HDFS Data Lake Store by creating ETL pipelines using Sqoop, Oozie, Spark and Hive.

Worked extensively with PySpark/Spark SQL for data cleansing and for generating DataFrames and RDDs.

Worked on Hortonworks distribution for processing Big Data across a Hadoop Cluster of virtual servers.

Used Sqoop to export data to relational databases.

Used Bitbucket to collaborate with other team members.

Involved in creating Hive tables, loading data in formats such as Avro, JSON, CSV, TXT and Parquet, and writing Hive queries to analyze data using HQL.

Developed Spark programs for batch processing.

Developed Spark code in Python (PySpark), Scala and Spark SQL for faster testing and processing of data.

Scheduled various Spark jobs to run daily and weekly.

Monitored various cluster activities using Apache Ambari.

Created data visualizations using Microsoft Power BI and Tableau.

Modelled Hive partitions extensively for faster data processing.

Implemented various UDFs in Python as required.

Involved in data movement between two clouds.

Involved in Agile methodologies, daily scrum meetings and sprint planning.

Environment: Hortonworks, MapReduce, HDFS, HQL, Python, Spark, Hive, PySpark, Spark SQL, Bitbucket, Ambari, Jupyter, JIRA, Sqoop, ZooKeeper, Scala, Shell Scripting, SQL.

Client: People's United Bank - Bridgeport, CT Mar 2015 – Sep 2015

Role: Hadoop Developer


Installed and configured the Apache Hadoop, Hive and Pig environment on the prototype server.

Configured a MySQL database to store Hive metadata.

Responsible for loading unstructured data into the Hadoop Distributed File System (HDFS).

Created a POC to store server log data in MongoDB to identify system alert metrics.

Created reports and dashboards of server alert data.

Created MapReduce jobs using Pig Latin and Hive queries.

Remodelled the architecture of one reporting stream around Big Data Edition and Hadoop.

Used Sqoop to import data from relational databases such as Teradata, Oracle and MySQL into HDFS.

Provided cluster coordination services through ZooKeeper.

Automated all jobs that pull data from the FTP server and load it into Hive tables, using Oozie workflows.

Created reports and dashboards using structured and unstructured data.

Maintained documentation for corporate Data Dictionary with attributes, table names and constraints.

Worked extensively with SQL scripts to validate data before and after loading.

Created unit test plans, test cases and reports on the test cases for testing the data loads.

Worked on integration testing to verify load order and time window.

Performed unit testing to validate that data is processed correctly and deposited correctly into targets, providing a qualitative check of the overall data flow.

Provided post-production support and served as SME for the project.

Involved in system and user acceptance testing.

Involved in a POC using R for data analysis.

Environment: Hadoop, Cloudera, Pig, Hive, Java, Sqoop, HBase, NoSQL, Informatica PowerCenter 8.6, Oracle 10g, PL/SQL, SQL Server, SQL Developer, Toad, Windows NT, Stored Procedures.

Western Union - San Francisco, CA April, 2013 – June, 2014

Role: Hadoop Consultant


Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.

Installed and configured MapReduce, Hive and HDFS; implemented a CDH4 (Cloudera) Hadoop cluster on CentOS/Linux. Assisted with performance tuning and monitoring.

Assessed existing EDW (enterprise data warehouse) technologies and methods to ensure that the EDW/BI architecture meets the needs of the business and enterprise and allows for business growth.

Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW

Captured data from existing databases that provide MySQL interfaces using Sqoop.

Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes in both directions.

Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, XML, JMS, JBoss and Web Services.

Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics

Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.

Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems

Managed and reviewed Hadoop log files

Tested raw data and executed performance scripts

Shared responsibility for administration of Hadoop, Hive and Pig

Exposure to Machine Learning using R and Mahout.

Developed Hive queries for the analysts, used the ETL tool Talend for processing, and built visualizations of transactional data.

Helped business processes by developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS

Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios

Supported code/design analysis, strategy development and project planning

Developed and tested multiple MapReduce jobs in Java, along with any further Java code required for data cleaning, filtering and preprocessing.

Assisted with data capacity planning and node forecasting

Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.

Administered Pig, Hive and Cassandra, installing updates, patches and upgrades.

Handled structured and unstructured data and applied ETL processes.

Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Java (JDK 1.7), Hadoop distributions of Hortonworks, Cloudera and MapR, IBM DataStage 8.1 (Designer, Director, Administrator), MySQL, Windows, Linux
