
Data Developer

Milwaukee, WI
June 14, 2018






Over 7 years of professional experience as a software developer in the design, development, deployment, and support of large-scale distributed systems.

Experience in Big Data technologies such as the Hadoop framework, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, and Oozie.

Excellent understanding of Hadoop architecture and its components, such as the Hadoop Distributed File System (HDFS), JobTracker, TaskTracker, NameNode, and DataNode (Hadoop 1.x), and YARN concepts such as ResourceManager and NodeManager (Hadoop 2.x).

Experience with distributed systems, large-scale non-relational data stores, RDBMS, NoSQL and MapReduce systems, data modeling, database performance, and multi-terabyte data warehouses.

Experience in loading data from Enterprise data lake (EDL) to HDFS using Sqoop scripts.

Experience in designing, developing and testing ETL processes using Informatica.

Experience in developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using Talend.

Good knowledge of Talend integration with AWS.

Experience in deploying an industrial-scale data lake on a cloud platform.

Exposure to designing and creating HDFS data lakes by mapping relationships between data sources from various systems.

Experience developing and securing data pipeline applications in large-scale environments using Apache Flink.

Created frameworks in Scala and Java for processing data pipelines through Spark.

Experience developing Python code to gather data from HBase and designing solutions implemented with PySpark.

Extended Hive and Pig core functionality with custom user-defined functions (UDFs).

Extensive experience in the design, installation, configuration, and management of Apache Hadoop clusters, the Hadoop ecosystem, and Spark with Scala as well as Python.

Extensive experience in Data Ingestion, Transformation, Analytics using Apache Spark framework, and Hadoop ecosystem components.

Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.

Experience using Kafka clusters for data integration and secured cloud service platforms such as AWS, and performing data summarization, querying, and analysis of large datasets stored on HDFS and Amazon S3 using Hive Query Language (HiveQL).

Experience utilizing Docker and Kubernetes in a public cloud environment.

Experience with AWS services such as Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.

Experienced in working with reporting tools such as Power BI and Tableau.

Experience in streaming data using Apache Storm from source to Hadoop.

Experience with building stream-processing systems, using solutions such as Storm or Spark-Streaming or HDF.

Experience developing Oozie workflows for scheduling and orchestrating ETL processes.

Good knowledge of Spark architecture and real-time streaming using Spark.

Strong knowledge of implementing data processing on Spark Core using Spark SQL and Spark Streaming.

Hands-on experience with Spark SQL: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS.

Expertise in integrating the data from multiple data sources using Kafka.

Experience working with various integrated development environments (IDEs) and continuous integration/continuous delivery (CI/CD) tools.

Experience working with Apache Solr for indexing and querying.

Strong experience in Hadoop development and in testing big data solutions using the Cloudera and Hortonworks distributions, Amazon Web Services (AWS), and Azure.

Experience in active development and onsite coordination for web-based, client/server, and distributed architectures using Java and J2EE, including web services, Spring, Struts, Hibernate, and JSP/Servlets.

Good working knowledge of servers such as Tomcat and WebLogic 8.0.

Strong knowledge of software quality processes and the software development life cycle (SDLC).

Worked extensively with Java development tools, including Eclipse Galileo 3.5, Eclipse Helios 3.6, Eclipse Mars 4.5, and WSAD 5.1.2.


Big Data Ecosystem : Hadoop 0.22.0, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Azkaban

Java/J2EE : Java 6, Ajax, Log4j, JSP 2.1, Servlets 2.3, JDBC 2.0, XML, JavaBeans

Methodologies : Agile, UML, Design Patterns

Frameworks : Struts, Hibernate, Spring

Database : Oracle 10g, PL/SQL, MySQL

Application Server : Apache Tomcat 5.x/6.0, JBoss 4.0

Web Tools : HTML, JavaScript, XML, XSL, XSLT, XPath, DOM

IDE/ Testing Tools : NetBeans, Eclipse

Scripts : Bash, ANT, SQL, HiveQL, Shell Scripting

Testing API : JUnit

Navistar Inc - Naperville, IL

Spark/Scala Developer

Feb 2016 - Present


Navistar International Corporation is a leading North American truck manufacturer with great products, strong market positions, and best-in-class distribution. The project examines customer information to improve the customer experience and to provide better, feasible alternatives in line with the ongoing marketing strategy. As part of the team, I work with customer reviews, suggestions, and other inputs, which are grouped together at regular intervals.


Responsible for building scalable distributed data solutions using Hadoop.

Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.

Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.

Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.

Developed Spark scripts using Scala shell commands as required.

Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.

Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.

Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism, and memory settings.

Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.

Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.

Handled large datasets during ingestion using partitioning, Spark's in-memory capabilities, broadcast variables, efficient joins, and other transformations.

Responsible for managing data coming from different sources.

Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, with both traditional and non-traditional source systems and with RDBMS and NoSQL data stores for data access and analysis.

Developed microservices based on RESTful web services in Scala that handle high concurrency and high traffic volume.

Hands-on experience with Talend - ETL (Extract-Transform-Load) tools.

Developed Talend jobs using context variables and scheduled them to run automatically.

Designed and developed Hadoop ETL solutions to move data to the data lake using big data tools such as Sqoop, Hive, Spark, HDFS, and Talend.

Worked on a POC comparing the processing time of Impala with that of Apache Hive for batch applications, with a view to implementing Impala in the project.

Worked extensively with Sqoop for importing metadata from Oracle.

Involved in creating Hive tables and in loading and analyzing data using Hive queries.

Developed Hive queries to process the data and generate the data cubes for visualizing.

Implemented schema extraction for Parquet and Avro file formats in Hive.

Implemented partitioning, dynamic partitions, and buckets in Hive.

Created, managed, and modified logical and physical data models using a variety of data modeling philosophies and techniques.

Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).

Supported the ongoing migration of the data center to the Amazon cloud.

Worked extensively with S3 buckets in AWS.

Used reporting tools such as Tableau to connect to Hive and generate daily data reports.

Performed data manipulation, data shaping and data cleansing.

Environment: Hadoop YARN, Spark 1.6, Spark Streaming, Spark SQL, Scala, Python, PySpark, Kafka, Hive, Sqoop, Elasticsearch, Impala, Cassandra, Data Modeling, Solr, Tableau Desktop, Power BI Server, Talend, Oozie, Jenkins, Cloudera, AWS S3, Oracle 12c, Linux.

Wells Fargo – Charlotte, NC

Hadoop Developer

Sept 2014 - Feb 2016


This project tracks and analyzes logs from various Retail Banking sources. The data is used to generate reports for business requirements such as identifying trends in customers, module services, peak usage times, usage medium, and usage platforms.


Implemented the Hortonworks distribution.

Analyzed large data sets by running Hive queries and Pig scripts.

Involved in creating Hive tables and in loading and analyzing data using Hive queries.

Developed simple to complex MapReduce jobs using Hive and Pig.

Involved in running Hadoop jobs to process millions of records of text data.

Worked with application teams to install the operating system and Hadoop updates, patches, and version upgrades as required.

Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Worked on various Linux environments such as CentOS, Ubuntu, and Red Hat.

Experience installing, upgrading, and configuring Red Hat Linux using interactive installation.

Involved in loading data from the Linux file system to HDFS.

Responsible for managing data from multiple sources.

Experience working with Informatica ETL for data integration.

Experienced in running Hadoop streaming jobs to process terabytes of XML-formatted data.

Loaded and transformed large sets of structured, semi-structured, and unstructured data.

Created ETL jobs to load data into MongoDB and transferred the MongoDB data into the data warehouse.

Extracted data from MongoDB, HBase, and Sqoop imports and placed it in HDFS for processing.

Assisted in exporting analyzed data to relational databases using Sqoop.

Comfortable coordinating with offshore team for development and support activities.

Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

Wrote custom user-defined functions (UDFs).

Responsible for building scalable distributed data solutions using Hadoop cluster environment with Hortonworks distribution.

Environment: Hortonworks, Hadoop, HDFS, Pig, Hive, MapReduce, MongoDB, Sqoop, Linux, Swift, Big Data.

GNS Healthcare - Cambridge, MA

Hadoop Developer

June 2013 – Aug 2014


The health record team of the GNS Health initiative gathers patient/person information across all data sources and creates a Person record that downstream systems use to run analytics against that data.


Oversaw design work to develop technical solutions from analysis documents.

Exported data from DB2 to HDFS using Sqoop.

Extracted data from Teradata into HDFS using Sqoop and analyzed it using Hive queries.

Developed MapReduce jobs using Java API.

Good knowledge of developing MapReduce programs using Apache Crunch.

Implemented high-performance ETL pipelines in Java and Hadoop.

Installed and configured Pig and wrote Pig Latin scripts.

Wrote MapReduce jobs using Pig Latin.

Developed workflows using Oozie for running MapReduce jobs and Hive queries.

Worked on cluster coordination services through ZooKeeper.

Worked on loading log data directly into HDFS using Flume.

Experienced in running Hadoop streaming jobs to process terabytes of XML-formatted data.

Responsible for managing data coming from different sources.

Assisted in exporting analyzed data to relational databases using Sqoop.

Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.

Experience in defining, designing, and developing Java applications, especially using Hadoop MapReduce, by leveraging frameworks such as Cascading and Hive.

Developed monitoring and performance metrics for Hadoop clusters.

Documented designs and procedures for building and managing Hadoop clusters.

Strong experience troubleshooting the operating system, resolving cluster issues, and setting up large clusters.

Imported and exported data between HDFS/Hive and relational databases, including Teradata, using Sqoop.

Worked with the Cassandra database to analyze how data is stored.

Successfully loaded files into Hive and HDFS from MongoDB and Solr.

Automated deployment, management, and self-serve troubleshooting of applications.

Defined and evolved the existing architecture to scale with growth in data volume, users, and usage.

Designed and developed a Java API (Commerce API) that provides connectivity to Cassandra through Java services.

Installed and configured Hive and wrote Hive UDFs.

Experience managing development time, bug tracking, project releases, development speed, release forecasts, and scheduling.

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, MySQL, Ubuntu, ZooKeeper, Java (JDK 1.6).

Verizon - Irving, TX

Hadoop Developer

June 2012 - June 2013


Verizon operates multiple wireline and wireless networks serving consumer and business customers. I worked as a Hadoop Developer on the Data Insights team, where I analyzed large data sets and helped the organization gain a competitive advantage by identifying customer trends that supported ad targeting and network optimization. For ad targeting, we collected data from all customers, analyzed it using traits such as demographics and purchase history, and targeted advertising based on the results. For network optimization, we collected data from all network towers and analyzed it to find the peak-hour usage of each tower and strategic locations for new towers to increase customer satisfaction.


Worked on the proof-of-concept for Apache Hadoop framework initiation.

Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

Imported and exported data into HDFS and Hive using Sqoop.

Responsible for managing data coming from different sources.

Monitored running MapReduce programs on the cluster.

Responsible for loading data from UNIX file systems to HDFS.

Installed and configured Hive and wrote Hive UDFs.

Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.

Implemented workflows using the Apache Oozie framework to automate tasks.

Knowledge of agile methodology for delivering software solutions.

Collaborated with the other team members, taking shared responsibility for the overall efforts the team had committed to.

Developed scripts that automated end-to-end data management and synchronization between all clusters.

Executed performance tests of custom-built applications.

Worked with the Hadoop testing team to deliver high-quality software on time.

Enabled end users to perform user acceptance testing.

Executed regression tests that compare the outputs of new application code with existing code running in production.

Environment: Apache Hadoop, Java (JDK 1.6), DataStax, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, UNIX, Sqoop, Hive, Oozie.

GEICO – Chevy Chase, MD

Java Developer

Jan 2012 - June 2012


This project involved developing a web-based application used by insurance agents to sell policies and endorsements, process policy claims, and maintain policyholder reports. Customers can log in to access complete account information and order a policy, a Motor Vehicle Report (MVR), a claim, endorsements for existing policies, cancellations, and so on.


Involved in setting up the environments for the applications.

Used Spring framework for dependency injection, transaction management and AOP.

Designed and coded business classes using Spring IoC and AOP features. Developed JUnit tests for the modules.

Used jQuery for UI validations.

Worked on the common UI layout for defining the header, footer, and menu using JSF Facelets.

Used JSON for handling requests and responses of the RESTful service.

Experience in developing web services with XML based protocols such as SOAP and WSDL.

Created new and modified existing SQL and PL/SQL queries as per the requirements.

Wrote Ant build scripts to compile Java classes, create JARs, run unit tests, and package them into EAR files.

Responsible for developing the UI pages using HTML, CSS, JavaScript and Bootstrap.

Experience using various jQuery UI controls and their corresponding event handlers.

Used Hibernate framework for back end development and Spring dependency injection for middle layer development.

Wrote database objects such as triggers and stored procedures in SQL.

Environment: Java, J2EE, Servlets, JSP, JDBC, Spring, Hibernate, Web services, WSDL, SOAP, REST, Axis, Jersey, SOA, HTML, DHTML, CSS, XML, AJAX, jQuery, ANT, MAVEN, TOAD, SQL, PL/SQL, Oracle, Design Patterns, UNIX, Tomcat, Windows 7.

Symbiosis - Hyderabad, Telangana

Java Developer

Sep 2010 – Dec 2011


Symbiosis client service professionals work across more than 20 industry sectors such as data analytics, cyber security, technology strategy, finance and accounting, tax, strategic risk, business risk, strategy and operations, and human capital.


Developed high-level design documents, Use case documents, detailed design documents and Unit Test Plan documents and created Use Cases, Class Diagrams and Sequence Diagrams using UML.

Extensive involvement in database design, development, and coding of stored procedures, DDL and DML statements, functions, and triggers.

Utilized Hibernate for object/relational mapping to provide transparent persistence to the SQL server.

Developed a portlet-style user experience using Ajax and jQuery.

Used Spring IoC to create the beans injected at run time.

Modified the existing JSP pages using JSTL.

Used Spring Tool Suite (STS) as the IDE for development.

Used jQuery for client-side JavaScript methods.

Developed Pig UDFs to pre-process the data for analysis.

Built a custom cross-platform architecture using Java, Spring Core/MVC, and Hibernate in the Eclipse IDE.

Involved in writing PL/SQL for the stored procedures.

Involved in developing the web interface using the Struts MVC framework.

Developed the user interface using JSP and tags, CSS, HTML, and JavaScript.

Configured the database connection using properties files.

Used a session filter to implement timeouts for idle users.

Used stored procedures to interact with the database.

Environment: Linux, MySQL, MySQL Workbench, Eclipse, J2EE, Struts 1.0, JavaScript, Swing, CSS, HTML, XML, XSLT, DTD, JUnit, EJB 2.1, Tomcat, WebLogic 7.0/8.1.


Bachelor’s in Electronics and Communication Engineering.
