
Data Developer

Location:
Herndon, VA, 20170
Posted:
February 05, 2018

Resume:

Big Data/Hadoop/Spark developer

Summary:

Over * years of experience in software development, including 4+ years as a Big Data Engineer working extensively with components of the Hadoop ecosystem.

Experienced in building highly scalable Big Data solutions on Hadoop using multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase, Cassandra, Couchbase and MongoDB).

Strong experience in the Hadoop and Big Data ecosystem including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, Zookeeper, Kafka, Storm, Sqoop, Flume, Oozie and Impala.

Experience with Spark Core, Spark Streaming, HiveContext, Spark SQL and MLlib for analyzing streaming data.

Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and in extending the default functionality by writing User Defined Functions (UDFs) for data-specific processing.

Experience with the Oozie Workflow Engine, running workflow jobs with actions that execute Hadoop MapReduce and Pig jobs.

Experience in designing and developing enterprise applications using Java/J2EE technologies alongside Hadoop MapReduce, Hive, Pig, Sqoop, Flume, Oozie and Spark; J2EE tools and technologies such as JDBC, Spring, Struts, MVC, RAD, Hibernate, XML, JBoss and Apache Tomcat; and IDEs including Eclipse 3.0, MyEclipse and RAD.

Strong functional experience on the Cloudera Data Platform using VMware Player in a CentOS 6 Linux environment, as well as Hadoop distributions like Cloudera, MapR and Hortonworks.

Adept at writing complex MapReduce programs that work with different file formats like Text, Sequence, XML, JSON and Avro.

Expertise in database design, creation and management of schemas, and in writing stored procedures, functions, and DDL and DML SQL queries, including complex queries for Oracle. Broad knowledge of the Hadoop environment with a strong understanding of components like HDFS, MapReduce and YARN.

Strong experience in installing, configuring and using components of the Apache Hadoop ecosystem within CDH.

Experience in using Impala for faster processing of huge volumes of data stored in the Hadoop cluster, and in integrating it with BI tools like Tableau and Pentaho.

Comprehensive knowledge of SQL and expertise in writing complex HQL queries to prepare data for various analyses on a daily basis.

Experienced in developing programs using SQL, Python and shell scripts to schedule processes that run on a regular basis.

Expertise in loading data from Relational Database Management Systems into HDFS using Sqoop.

Experience in loading streaming data into HDFS and in performing streaming analytics using platforms like Flume and the Apache Kafka messaging system.

Good understanding of the architecture and components of Spark, and efficient in working with Spark Core, Spark SQL and Spark Streaming.

Experience in integrating GridGain In-Memory data fabric with Apache Spark to achieve superior performance and functionality.

Experience working with cloud infrastructure like Amazon Web Services (AWS), including Amazon EC2, Simple Storage Service (S3) and Amazon EMR.

Experienced in working with the Hadoop storage and analytics framework over the AWS cloud, and in launching EMR clusters, EC2 instances, S3 buckets, Amazon Data Pipeline and Simple Workflow Service instances.

Strong expertise in using Apache Solr, adding Solr to existing clusters, indexing documents into the Solr cluster and issuing complex queries against the indexed documents.

Experience in tuning, optimizing Hadoop/Spark and ETL workflows for High Availability.

Expertise in job/workflow scheduling and coordination tools like Oozie and Zookeeper.

Experience working with HBase, with in-depth knowledge of NoSQL databases like MongoDB and Cassandra and their integration with Hadoop clusters.

Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (Text, Avro, Sequence, XML, JSON and Parquet).

Solid understanding of OLAP concepts and challenges, especially with large data sets and mapping, analysis and documentation of OLAP reports.

Strong experience with Oracle Database and programming languages like Structured Query Language SQL, PL/SQL and in developing Packages, Stored Procedures, Functions, Triggers, and Cursors.

Expertise in utilizing PuTTY/Secure Shell client tools for interacting with UNIX operating systems and VPN for remote server connectivity.

Strong knowledge in working with UNIX/LINUX environment and writing shell scripts.

Excellent analytical, problem-solving, communication and interpersonal skills, with the ability to interact with individuals at all levels; experienced in working with business users to understand their requirements and provide solutions that match them.

TECHNICAL SKILLS:

Languages: Java, C, SQL, Scala, Python, XML, XHTML, HTML, AJAX, CSS, PL/SQL, Pig Latin, HiveQL, Java Script, Shell Scripting

Hadoop Technologies: HDF (NiFi), HDFS, MapReduce, YARN, Spark, Hive, HBase, Sqoop, Flume, SyncSort, Oozie, Zookeeper, Kafka, SSIS, Impala, Pig, Apache Solr, Ambari, AWS (S3, EC2, EMR, Lambda)

Java & J2EE Technologies: Hibernate, Spring framework, JSP, Servlets, Java Beans, JDBC, EJB 3.0, Java Sockets, jQuery, JSF, JMS, Struts 2.1, MVC, Junit.

Databases: MySQL, Cassandra, MongoDB, DynamoDB, Redis, Oracle 9i, MS SQL Server

Other tools: Maven, Jenkins, Ant, Log4j, Tableau, Qlik, Eclipse, Intellij, Putty, WinSCP, DataLake, Talend, GitHub

Protocols: HTTP, HTTPS, FTP, TCP/IP, SOAP, REST.

Programming & Scripting Languages: Java, C, SQL, Python, Impala, Scala, C++, ESQL, PHP

Application Servers: IBM Web Sphere, JBoss, WebLogic

Web Servers: Apache Tomcat

IDEs: Eclipse, Net Beans

Operating System: Unix, Windows, Ubuntu, Cent OS

Education: Bachelor of Technology, JNTU, Hyderabad, India.

Work Experience:

Cigna Insurance – Boston, MA

Hadoop/Spark developer September 2016 – Present

Description: Cigna Insurance partners with patients, physicians, healthcare professionals and payors to provide integrated care management. The Information Technology Department designs, develops and maintains the software programs required to keep Cigna Insurance on the leading edge of medical technology. It has both clinic-facing and non-clinic-facing subgroups that provide clinician teammates the tools to deliver quality patient care. Village Health IT utilizes designers, developers, system architects, project managers, application architects, nurses, trainers and many more professionals to incorporate the technology needs of our clinician, business and physician customers.

Responsibilities:

Responsible for the design and development of Spark SQL scripts based on functional specifications.

Worked on the large-scale Hadoop Yarn cluster for distributed data processing and analysis using Spark, Hive, and HBase.

Involved in creating a data lake by extracting customer data from various data sources into HDFS, including data from Excel files, databases and server logs.

Used Apache Solr to index the documents and used free-form queries to search the indexed documents.

Developed Spark applications using Scala and Python, and implemented Apache Spark for data processing from various streaming sources.

Developed Spark applications using Scala and Python to perform analytics on data stored in HDFS.

Served as the team's Spark expert and performance optimizer.

Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.

Wrote MapReduce (Hadoop) programs to convert text files into Avro format and load them into Hive tables.

Worked with Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.

Developed a preprocessing job using Spark DataFrames to transform JSON documents into flat files.
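
A minimal Scala sketch of that kind of DataFrame preprocessing job, assuming hypothetical input/output paths and a nested "customer" field (the real schema and locations were project specific):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object JsonToFlatFile {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("JsonToFlatFile").getOrCreate()

        // Read multi-line JSON documents into a DataFrame (path is illustrative)
        val raw = spark.read.option("multiLine", "true").json("hdfs:///data/incoming/docs")

        // Flatten a nested struct into top-level columns (field names are assumptions)
        val flat = raw.select(
          col("id"),
          col("customer.name").as("customer_name"),
          col("customer.address.city").as("city"),
          col("amount"))

        // Write the flattened records out as a delimited flat file
        flat.write.option("header", "true").csv("hdfs:///data/processed/docs_flat")

        spark.stop()
      }
    }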

Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.

Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
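
An illustrative sketch of such a Spark Streaming job using the Kafka direct stream API; the broker address, topic, consumer group and batch interval below are placeholders, not the project's actual values:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object KafkaStreamJob {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaStreamJob"), Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",              // placeholder broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "stream-demo",                        // placeholder consumer group
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

        // Count records per micro-batch and print a summary to the driver log
        stream.map(_.value).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }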

Involved in loading data from REST endpoints into Kafka producers and transferring the data to Kafka brokers.

Used Kafka features such as topic distribution, partitioning and its replicated commit-log service to maintain messaging feeds.

Imported and exported data between HDFS, Hive and Pig using Sqoop.

Designed and implemented real-time applications for faster analytics by integrating Apache Spark with the Apache Ignite/GridGain in-memory platform.

Performed faster querying by integrating Spark SQL with GridGain, which provides SQL with indexing.

Worked with the GridGain in-memory file system to share the state of Spark jobs and applications when working with files instead of RDDs.

Migrated an existing on-premises application to AWS, used AWS services like EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.

Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
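
A minimal sketch of reading from S3 into an RDD and applying transformations and an action; the bucket, prefix and record layout are assumptions made for illustration:

    import org.apache.spark.{SparkConf, SparkContext}

    object S3RddJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("S3RddJob"))

        // Read raw text records from an S3 bucket (location is illustrative)
        val lines = sc.textFile("s3a://example-bucket/raw/2016/*.csv")

        // Transformations: parse each line and keep records with a positive amount
        val amounts = lines
          .map(_.split(","))
          .filter(_.length > 2)
          .map(fields => (fields(0), fields(2).toDouble))
          .filter { case (_, amount) => amount > 0.0 }

        // Actions: aggregate per key and bring a small sample back to the driver
        amounts.reduceByKey(_ + _).take(10).foreach(println)

        sc.stop()
      }
    }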

Worked with Oozie workflow engine to run multiple Hive jobs.

Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (Text, Avro, Sequence, XML, JSON and Parquet).

Generated various kinds of reports using Pentaho and Tableau based on client specifications.

Used Jira for bug tracking, and Git and Bitbucket to check in and check out code changes.

Worked with Network, Database, Application, QA and BI teams to ensure data quality and availability.

Worked with SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: Hadoop, MapReduce, HDFS, Yarn, Hive, Sqoop, HBase, Apache Solr, Oozie, Spark, Scala, Python, AWS, Flume, Kafka, Tableau, Linux, Shell Scripting.

Hadoop/Spark developer January 2015 – August 2016

Client: Disney – Orlando, FL.

Description: Disney World is officially known as the Walt Disney World Resort. It provides a distinct vision of entertainment with its own diversity, and as a large business it holds the record as the most-visited vacation destination in the world. The platform for maintaining the resort technically is built on the Hadoop ecosystem, with HDFS/HBase as the primary data storage.

Responsibilities:

Interacted with source team to understand the functional and non-functional requirements.

Worked on real-time and near-real-time analytics on big data platforms like Hadoop and Spark using Python.

Worked on migrating MapReduce programs into Spark transformations using Spark with Scala.
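
As a simplified illustration of that kind of migration (a generic word count stands in for the actual jobs, which were project specific), a MapReduce mapper/reducer pair collapses into a few Spark transformations in Scala:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountSpark {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WordCountSpark"))

        sc.textFile("hdfs:///data/input")            // input, as the mapper would see it
          .flatMap(_.split("\\s+"))                  // map phase: emit one token per word
          .map(word => (word, 1))                    // key-value pairs, like mapper output
          .reduceByKey(_ + _)                        // shuffle + reduce phase in one call
          .saveAsTextFile("hdfs:///data/wordcount")  // illustrative output path

        sc.stop()
      }
    }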

Developed Spark scripts using the Scala shell as per requirements.

Developed Spark jobs using Scala on top of Yarn for interactive and Batch Analysis.

Developed and implemented API services using Scala in Spark.

Developed a data pipeline using Kafka to store data into HDFS.

Extensively implemented POCs on migrating to Spark Streaming to process live data.

Ingested data from RDBMS sources, performed data transformations and then exported the transformed data to MongoDB as per the business requirements.

Migrated HiveQL queries on structured data to Spark SQL to improve performance.
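
A minimal sketch of that migration path: the same HiveQL text that previously ran on Hive is executed by Spark SQL against the Hive metastore. The table and column names below are placeholders:

    import org.apache.spark.sql.SparkSession

    object HiveToSparkSql {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport lets Spark SQL see the existing Hive metastore tables
        val spark = SparkSession.builder()
          .appName("HiveToSparkSql")
          .enableHiveSupport()
          .getOrCreate()

        // Existing HiveQL, now executed by the Spark SQL engine
        val result = spark.sql(
          """SELECT park_area, COUNT(*) AS visits
            |FROM attendance_events
            |WHERE event_date >= '2016-01-01'
            |GROUP BY park_area""".stripMargin)

        result.show()
        spark.stop()
      }
    }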

Experienced with SparkContext, Spark SQL, DataFrames and pair RDDs.

Involved in improving the performance and optimization of the existing algorithms using Spark.

Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.

Used SyncSort DMX-h, which provides a single interface for accessing and integrating all enterprise data sources, to support the project's data integration objectives.

Handled large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations.
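
One of the techniques mentioned above, a broadcast join, sketched with made-up table names: broadcasting the small dimension table lets the large fact table be joined without a shuffle.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object BroadcastJoinExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("BroadcastJoinExample")
          .enableHiveSupport()
          .getOrCreate()

        val facts = spark.table("ticket_sales")    // large fact table (name is illustrative)
        val dims  = spark.table("park_locations")  // small dimension table (name is illustrative)

        // Hint Spark to broadcast the small side so the large side avoids a shuffle
        val joined = facts.join(broadcast(dims), Seq("park_id"))

        joined.groupBy("park_name").count().show()
        spark.stop()
      }
    }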

Well experienced in handling data skew in Spark SQL.

Implemented Spark MLlib and used the K-means algorithm to cluster the data available in Hive tables.
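
An illustrative sketch of K-means clustering with Spark MLlib over a Hive table; the table name, feature columns and k value are assumptions, not the project's actual ones:

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object HiveKMeans {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("HiveKMeans").enableHiveSupport().getOrCreate()

        // Pull the records to be clustered from an existing Hive table (illustrative name/columns)
        val guests = spark.table("guest_metrics").na.drop()

        // Assemble numeric columns into the single feature vector MLlib expects
        val features = new VectorAssembler()
          .setInputCols(Array("visits_per_year", "avg_spend", "party_size"))
          .setOutputCol("features")
          .transform(guests)

        // Fit K-means with an assumed k = 5 clusters
        val model = new KMeans().setK(5).setSeed(42L).fit(features)
        model.clusterCenters.foreach(println)

        spark.stop()
      }
    }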

Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize searches.

Developed a knowledge object using Python and R to calculate the life expectancy of pancreatic cancer patients. The object was implemented in an open-source healthcare object library and is expected to be deployed on the Michigan Urology Surgery Improvement Collaborative (MUSIC) website.

Expertise in working with Apache Solr for indexing and load balanced querying to search for specific data in larger datasets.

Worked on ad-hoc queries, indexing, replication, load balancing and aggregation in MongoDB.

Processed web server logs by developing multi-hop Flume agents using the Avro sink, loaded the data into MongoDB for further analysis, and extracted files back from MongoDB through Flume for processing.

Expertise in MongoDB NoSQL data modeling, tuning and disaster-recovery backups; used MongoDB for distributed storage and processing via CRUD operations.

Used Oozie operational services for batch processing and scheduling workflows dynamically.

Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.

Developed Unix shell scripts to load large number of files into HDFS from local file system.

Experience in monitoring the Hadoop cluster using Cloudera Manager, interacting with Cloudera support, logging issues in the Cloudera portal and fixing them as per the recommendations.

Worked on the installation of CDH upgrades and patches.

Experienced in using agile approaches including Test-Driven Development, Extreme Programming, and Agile Scrum.

Environment: Hadoop, Hive, HDFS, Spark, Spark SQL, Kafka, SyncSort, Flume, Solr, Java, Scala, Python, Sqoop, Oozie, Shell Scripting, Cloudera, JUnit.

Hadoop developer/Java developer June 2013 – December 2014

Mercy Health Systems – Janesville, WI

Description: Mercyhealth is a non-profit health care provider and hospital system based in Janesville, Wisconsin. At Mercy I designed and developed big data solutions involving terabytes of data. The solution consists of collecting large amounts of log data from distributed sources, followed by transformation and standardization, analysis, statistics, aggregation and reporting. Built an on-demand elastic Hadoop cluster infrastructure to cater to the needs of various Big Data projects, and automated various Big Data workflows to process the data and extract analytics from it using Sqoop, MapReduce and Hive.

Responsibilities:

Involved in a live production Cloudera Hadoop cluster consisting of 20 nodes.

Worked with Hadoop, Hive, Sqoop, Kafka, Pig, NoSQL and RDBMS databases for data integration and analytics.

Developed MapReduce jobs using Java for cleansing, modeling and analyzing the data.

Created Hive tables and loaded data from relational databases like MySQL and SQL Server using Sqoop.

Developed and worked on Sqoop jobs for incremental load of data to populate the Hive tables.

Implemented static and dynamic partitioning and created buckets in Hive for optimizing the analysis on Hive tables.

Created generic Hive UDFs and incorporated them into Hive scripts for data analysis as per the business requirements.

Using Sqoop, exported the analyzed data from Hive to RDBMS databases for generating the data reports.

Worked on loading the unstructured data into HDFS and transformed and analyzed the data using Pig scripts.

Integrated Apache Kafka for data ingestion into HDFS and NoSQL databases.

Used Zookeeper for managing cluster services like configuration, Synchronization, naming registry.

Worked with sequence files and compressed various file formats.

Involved in complete implementation of ETL logic.

Involved in monitoring the performance and workload of the cluster using Cloudera Manager.

Followed Agile methodology and involved in daily scrum meetings.

Environment: Hadoop, Hive, Sqoop, Pig, Kafka, RDBMS, NoSQL databases, Zookeeper, Linux, CDH.

Java Developer September 2012 – May 2013

Comerica Bank - Auburn Hills, MI

Description: Comerica Incorporated is a financial services company headquartered in Dallas, Texas. It has retail banking operations in Texas, Michigan, Arizona, California and Florida, with select business operations in several other U.S. states. Participated in the development of a paperless, web-based document tracking and management system for a department specializing in home mortgage loans. Responsible for both front-end and back-end development. JavaScript/jQuery and JavaServer Pages were used on the front end. Implemented back-end rules and logic using Java/J2EE and Spring tools. SCRUM (Agile development) was the methodology used for the development of this product.

Responsibilities:

Developed presentation screens by using JSP, HTML and JavaScript.

Implemented Model View Controller (MVC) architecture and developed Form classes, Action Classes for the entire application using Struts Framework.

Performed client-side validations using JavaScript and server-side validations using the built-in Struts Validation Framework.

Implemented the data persistence functionality of the application using Hibernate to persist Java objects to the relational database.

Used Hibernate Annotations to reduce time at the configuration level and accessed Annotated bean from Hibernate DAO layer.

Used HQL statements and procedures to fetch the data from the database.

Transformed, Navigated and Formatted XML documents using XSL, XSLT.

Used JMS for asynchronous exchange of messages by applications on different platforms.

Developed the view components using Struts Logic tags and Struts tag libraries.

Involved in designing and implementation of Session Facade, Business Delegate, Service Locator patterns to delegate request to appropriate resources.

Involved in developing SQL queries, stored procedures, and functions.

Created database objects like tables and views using Oracle tools like Toad and SQL*Plus.

Involved in writing stored procedures using PL/SQL.

Worked in a Linux environment scheduling jobs for inbound data on a monthly basis.

Used JUnit Testing Framework for performing Unit testing.

Deployed application in WebSphere Application Server and developed using Rational Application Developer.

Environment: Struts, Hibernate 3.0, HTML, JSP, RAD, JMS, CVS, JavaScript, XSL, XSLT, HQL, Servlets 2.5, WebSphere Application Server 6.1, Toad, PL/SQL.

Java Developer June 2011 – August 2012

CYIENT – Hyderabad, Telangana, India

Description: The Java stream at CYIENT starts with the basics of programming, including variables, statements, keywords, loops and conditional statements. It then covers advanced object-oriented concepts and the built-in data structures in Java, followed by DBMS concepts, SQL and Java-SQL integration. The next phase is learning the fundamentals of web front-end technologies like HTML, CSS and JavaScript, after which participants create simple web applications that involve both a front end and a back end written in Java.

Responsibilities:

Gathered and analyzed the requirements and categorized them into user requirement specifications and functional requirement specifications leading to design and development of use cases.

Involved in the full SDLC for developing entire applications using Agile methodology.

Designed and implemented the user interface using HTML, CSS, JavaScript and SQL Server.

Involved in the UI development, including layout and front-end coding per the requirements of the client by using JavaScript and Ext JS.

Interacted with team for analysis, design and development of database using relational database concepts.

Involved in building complex SQL queries and ETL scripts for data extraction and analysis to meet the application requirements.

Developed SQL Server stored procedures, tuned SQL queries (using indexes and execution plans), developed UDFs and views, and created triggers to maintain referential integrity.

Used JDBC to invoke Stored Procedures and used JDBC for database connectivity to SQL server.

Created RESTful web services using JSON to communicate with external systems.

Implemented client-side and server-side data validations using JavaScript and jQuery.

Designed and developed various data gathering forms using HTML, CSS, JavaScript, JSP and Servlets.

Developed server-side modules using JSP, Servlets and MVC framework.

Experience in implementing J2EE standards and MVC2 architecture using the Struts Framework.

Made extensive use of the Java Naming and Directory Interface (JNDI) for looking up enterprise beans.

Created Java Beans accessed from JSPs to transfer data across tiers.

Wrote Struts Action classes and Action Forms, and implemented business logic in Session Beans.

Experience in working through the bug queue, analyzing and fixing bugs, and escalating bugs.

Monitored the error logs using Log4j and fixed the problems.

Performed unit testing and integration testing.

Environment: JDK, JSP, J2EE, Struts, EJB, Servlets, JMS, JNDI, JDBC, SQL Server, SQL, T-SQL, WebLogic, Eclipse Workshop IDE, jQuery and JSON.


