
Hadoop/Spark Developer

Charlotte, North Carolina, United States
August 28, 2017



Hadoop/Spark Developer E-mail Phone No: (302)***-****


Around 8 years of experience in the Information Technology industry, including 5+ years as a Hadoop/Spark Developer using Big Data technologies such as the Hadoop and Spark ecosystems, and 2+ years with Java/J2EE technologies and SQL.

Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, YARN, Sqoop, Flume, HBase, Impala, Oozie, ZooKeeper, Kafka, and Spark.

In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode, DataNode, and MRv1 and MRv2 concepts.

In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming (real-time processing), and Spark MLlib.

Hands-on experience in the analysis, design, coding, and testing phases of the Software Development Life Cycle (SDLC).

Hands-on experience with Amazon Web Services (AWS), including Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.

Worked extensively with AWS cloud services such as EC2, S3, and EBS.

Migrated an existing on-premises application to AWS, using EC2 and S3 for processing and storing small data sets. Experienced in maintaining Hadoop clusters on AWS EMR.

Hands-on experience in Big Data application phases such as data ingestion, data analytics, and data visualization.

Experience with Hadoop distributions such as Cloudera, Hortonworks, and Amazon AWS.

Experience transferring data from RDBMS to HDFS and Hive tables using Sqoop.

Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.

Experience using Flume to load log data from multiple sources directly into HDFS.

Well versed in workflow scheduling, monitoring, and coordination tools such as Oozie, Hue, and ZooKeeper.

Good knowledge of Impala, Mahout, Spark SQL, Storm, Avro, Kafka, Hue, and AWS, as well as development tools such as Eclipse, NetBeans, and Maven.

Installed and configured MapReduce, Hive, and HDFS; implemented CDH5 and HDP clusters on CentOS; and assisted with performance tuning, monitoring, and troubleshooting.

Experience with data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.

Experience processing streaming data on clusters using Kafka and Spark Streaming.

Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.

Experience with column-oriented NoSQL databases such as HBase and Cassandra and their integration with Hadoop clusters.

Involved in cluster coordination services using ZooKeeper.

Solid experience in Core Java and J2EE technologies such as JDBC, Servlets, and JSP.

Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, multithreading, and serialization/deserialization.

Experience designing user interfaces using HTML, CSS, JavaScript, and JSP.

Excellent interpersonal skills, including teamwork, communication, and presenting to business users and management teams.

Technical Skills

Languages : Java, Python, Scala, SQL, HiveQL, Pig Latin

Hadoop Ecosystem : HDFS, Hive, MapReduce, HBase, YARN, Sqoop, Flume, Oozie, ZooKeeper, Impala, Avro

Relational Databases : Oracle, DB2, SQL Server, MySQL

NoSQL Databases : HBase, MongoDB, Cassandra

Scripting Languages : JavaScript, AJAX, CSS, Python, Perl, Unix Shell Script

Programming Languages : C, C++, C#, Java, J2EE, JDBC, Python, Scala, Shell Scripting, PL/SQL, Android, Unix

Java Technologies : Java, J2EE, JDBC, Servlets, JSP, JSTL, JavaBeans, XML Parsers, EJB, Hibernate, Struts

Web Technologies : Servlets, HTML, JavaScript

Web/Application Servers : WebLogic, WebSphere, Apache Tomcat, JBoss

Web Services : SOAP, RESTful APIs, WSDL

Operating Systems : Windows XP/Vista/7/8, Linux, Unix, Ubuntu

Professional Experience

Client : Wells Fargo, Charlotte, NC ( April 2016 - Present )

Role : Hadoop/Spark Developer

Roles & Responsibilities:

Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.

Followed Agile and Scrum practices during project development.

Used the Spark API to import data from DB2 into HDFS and created Hive tables.

Used the Spark API on Hortonworks Hadoop YARN to perform analytics on data in Hive.

Implemented Spark jobs in Scala and Spark SQL for faster testing and processing of data.

Imported large data sets from DB2 into Hive tables using Sqoop.

Used Impala for querying HDFS data to achieve better performance.

Implemented Apache Pig scripts to load data from and store data into Hive.

Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.

Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.

Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.

Developed Spark scripts using Scala shell commands as required.

Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.

Responsible for implementing an ETL process through Kafka-Spark-HBase integration to meet the requirements of a customer-facing API.

Used Spark SQL to load JSON data, create SchemaRDDs, and load them into Hive tables; handled structured data using Spark SQL.

Worked on batch and real-time data processing with Spark Streaming using a Lambda architecture.

Developed Spark code in Scala and Spark SQL for faster testing and processing of data, loading data into Spark RDDs and performing in-memory computation to generate output responses with lower memory usage.

Developed and maintained Oozie workflow scheduling jobs for importing data from RDBMS into Hive.

Used the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing instead of Java MapReduce.

Responsible for extracting and integrating data from different sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, Pig, and Hive.

Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.

Worked with the team to stream live data from DB2 into HBase tables using Spark Streaming and Apache Kafka.

Loaded data into Spark RDDs and performed in-memory computation to generate output responses.

Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large volumes of data.

Wrote Pig scripts to clean up ingested data and created partitions for daily data.

Developed Spark programs in Scala, applying functional programming principles to process complex structured and unstructured data sets.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.

Analyzed the SQL scripts and designed the solution for implementation in PySpark.

Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.

Used Oozie workflows to coordinate Pig and Hive scripts.

Environment: HDFS, MapReduce, Hive, Sqoop, HBase, Oozie, Flume, Impala, Kafka, ZooKeeper, Spark SQL, Spark DataFrames, PySpark, Scala, Amazon AWS S3, Python, Java, JSON, SQL scripting, Linux shell scripting, Avro, Parquet, Hortonworks.

Client : Health Dialog, Bedford, NH ( Apr 2014 - Feb 2016 )

Role : Sr. Hadoop Developer

Roles & Responsibilities:

In-depth knowledge of Hadoop architecture and components such as HDFS, ApplicationMaster, NodeManager, ResourceManager, NameNode, DataNode, and MapReduce concepts.

Imported required tables from RDBMS into HDFS using Sqoop, and used Storm and Kafka to stream real-time data into HBase.

Good experience with the NoSQL database HBase, including creating HBase tables to load large sets of semi-structured data from various sources.

Wrote Hive and Pig scripts as ETL tools to perform transformations, event joins, traffic filtering, and some pre-aggregations before storing data in HDFS.

Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.

Developed Spark code using Scala and Spark SQL for faster testing and processing of data.

Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.

Involved in moving log files generated from various sources into HDFS through Flume for further processing.

Developed Java code to generate, compare, and merge Avro schema files.

Developed complex MapReduce jobs in Java, together with Hive and Pig scripts, to perform various ETL, cleaning, and scrubbing tasks.

Prepared validation report queries that were executed after every ETL run, and shared the results with business users across project phases.

Used Hive to analyze partitioned and bucketed data and compute various reporting metrics, applying Hive join optimization techniques and best practices for writing HiveQL scripts.

Imported and exported data into HDFS and Hive using Sqoop, and wrote Hive queries to extract the processed data.

Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports according to user needs.

Teamed up with architects to design Spark equivalents of the existing MapReduce models, and migrated those models to Spark using Scala.

Implemented Spark in Scala, using the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing instead of Java MapReduce.

Used Spark SQL to load JSON data, create SchemaRDDs, and load them into Hive tables; handled structured data using Spark SQL.

Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.

Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka into HDFS.

Used the Oozie workflow engine to manage interdependent Hadoop jobs and automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.

Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Pig, Sqoop, Spark and Zookeeper.

Expert knowledge of MongoDB data modeling, tuning, disaster recovery, and backup.

Environment: Apache Hadoop, HDFS, MapReduce, HBase, Hive, Yarn, Pig, Sqoop, Flume, Zookeeper, Kafka, Impala, SparkSQL, Spark Core, Spark Streaming, NoSQL, MySQL, Cloudera, Java, JDBC, Spring, ETL, WebLogic, Web Analytics, Avro, Cassandra, Oracle, Shell Scripting, Ubuntu.

Client : GNS Healthcare, Cambridge, MA ( Apr 2012 - Mar 2014 )

Role : Hadoop Developer

Roles & Responsibilities:

Installed and configured Hadoop ecosystem components such as JobTracker, TaskTracker, NameNode, and Secondary NameNode.

Designed and developed multiple MapReduce Jobs in Java for complex analysis.

Imported and exported data between HDFS and relational database systems using Sqoop.

Responsible for importing data into HDFS from different RDBMS servers using Sqoop, and exporting aggregated data back to the RDBMS servers with Sqoop for other ETL operations.

Imported required tables from RDBMS into HDFS using Sqoop, and used Storm and Kafka to stream real-time data into HBase.

Involved in moving log files generated from various sources into HDFS through Flume for further processing.

Developed MapReduce programs to extract and transform data sets, and exported the results back to RDBMS using Sqoop.

Implemented the workflows using Apache Oozie framework to automate tasks.

Analyzed data using HiveQL, Pig Latin, and custom MapReduce programs in Java.

Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.

Moved data from Oracle and MS SQL Server into HDFS using Sqoop, and imported flat files in various formats into HDFS.

Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegEx, JSON, and Avro.

Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.

Used Spark SQL to load JSON data, create SchemaRDDs, and load them into Hive tables; handled structured data using Spark SQL.

Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, as well as sequence files for log files.

Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.

Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries and Pig scripts.

Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.

Used ZooKeeper to provide coordination services for the cluster.

Developed Oozie workflows to automate loading data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce, Pig, and Hive jobs.

Environment: Hadoop, Cloudera Manager, Linux, Red Hat, CentOS, Ubuntu, Scala, HDFS, MapReduce, Hive, HBase, Oozie, Pig, Sqoop, Flume, ZooKeeper, Kafka, Python, Java, JSON, Oracle, SQL, Avro

Client : Mindteck India Ltd, Bangalore, India ( Jul 2010 - Mar 2012 )

Role : Java / SQL Developer

Roles & Responsibilities:

Involved in implementing the design across key phases of the Software Development Life Cycle (SDLC), including development, testing, implementation, and maintenance support.

Developed applications following the Waterfall methodology.

Designed a web application using a Web API with AngularJS and populated data using a Java entity framework.

Developed GUIs using HTML, CSS, JSP, and AngularJS framework components.

Wrote JavaScript, HTML, CSS, Servlets, and JSP to build the application's GUI.

Used the Struts framework to design actions, action forms, and related configuration for every use case.

Used SOAP for the data exchange between the backend and user interface.

Used application servers such as Apache Tomcat, WebSphere, and WebLogic, depending on project requirements.

Used WebSphere Application Server to deploy the application.

Used SQL queries to perform backend testing on the database.

Created a database access layer using JDBC and SQL stored procedures.

Implemented Java-based JDBC connectivity to meet client requirements.

Developed code using design patterns such as Singleton, Front Controller, Adapter, DAO, MVC, Template, Builder, and Factory.

Developed stored procedures and triggers in PL/SQL, and wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures, and triggers.

Used Java and MySQL day to day to debug and fix issues with client processes.

Used the JIRA tracking tool to manage and track issues reported by QA, prioritizing and acting on them based on severity.

Wrote SQL statements, stored procedures, and functions called from Java.

Extensively used Core Java features such as multithreading, exceptions, and collections.

Hands-on experience using JBoss for EJB and JTA, as well as for caching and clustering.

Generated server-side SQL scripts for data manipulation, validation, and materialized views.

Environment : Java, JSP, HTML, CSS, RAD, JDBC, AJAX, JavaScript, Struts, Servlets, Apache Tomcat, WebLogic, WebSphere, SOAP, JBoss, PL/SQL, Eclipse, EJB, XML, Windows XP, Linux, ANT.

Client : NTT DATA, Hyderabad, India ( Feb 2009 - Jun 2010 )

Role : SQL Developer

Roles & Responsibilities:

Created and managed schema objects such as tables, views, and indexes, along with referential integrity constraints, based on user requirements.

Created database objects such as stored procedures, functions, packages, triggers, indexes, and views using T-SQL.

Performed data conversions from flat files into a normalized database structure

Created database maintenance plans for SQL Server performance, covering database integrity checks, statistics updates, and re-indexing.

Created and maintained dynamic websites using HTML, CSS, jQuery, and JavaScript.

Created ETL packages with different data sources (SQL Server, flat files, Excel source files, XML files, etc.) and loaded the data into destination tables by performing various transformations using SSIS/DTS packages.

Created several SSIS packages for performing ETL operations to transform data from a cube using MDX

Worked mostly on installation, configuration, development, maintenance, administration, and upgrades.

Participated in maintaining and modifying tables and constraints for Premium Database using MS SQL Server

Migrated data from different sources to a SQL Server database using SSIS.

Performed unit testing and tuned SQL statements using indexes and stored procedures.

Created several SSIS packages for performing ETL operations to transform data from OLTP to OLAP systems.

Built SSIS packages to load data into the OLAP environment, and monitored the ETL package jobs.

Developed custom reports such as subreports, matrix reports, charts, and drill-down reports using SQL Server Reporting Services (SSRS) to review scorecards and business trends based on data from different locations.

Created various kinds of reports, including drill-down, drill-through, parameterized, and ad hoc reports.

Created checkpoints and configuration files in SSIS packages; experienced in implementing slowly changing dimensions in SSIS packages.

Responsible for backing up and restoring system and other databases as required, and for scheduling those backups.

Developed and deployed SSIS packages that imported data daily from the OLTP system and staging area into the data warehouse and data marts.

Environment: MS SQL Server 2005, SQL Server Integration Services (SSIS), SSRS, Data Transformation Services (DTS), T-SQL, Visual Studio 2008, Windows 2007 Enterprise, MS Office 2007.
