
Data Java

Morrisville, North Carolina, United States
March 01, 2018



Veera Mani

Hadoop/Spark Developer (832) 821-5280

Professional Summary:

Over 8 years of experience in all phases of software application development, including requirement analysis, design, development and maintenance of Hadoop/Big Data applications and web applications using Java/J2EE technologies.


●3+ years of hands-on experience with Big Data ecosystems, including Hadoop (1.0 and YARN), MapReduce, Spark, Pig, Hive, Sqoop, Flume, Oozie and ZooKeeper, in industries such as finance and health care.

●Experience writing Hive queries to process and analyze large volumes of data.

●Experience importing and exporting data between relational database systems and HDFS using Sqoop.

●Developed Oozie workflows that integrate all tasks relating to a project and scheduled the jobs as per requirements.

●Automated the jobs that pull data from upstream servers and load it into Hive tables using Oozie workflows.

●Implemented several optimization mechanisms like Combiners, Distributed Cache, Data Compression, and Custom Partitioner to speed up the jobs.

●Used HBase alongside Hive where real-time, low-latency queries were required.

●Experienced in writing Hadoop jobs to analyze data using Hive Query Language (HQL), Pig Latin (a data flow language) and custom MapReduce programs in Java.

●Good understanding of NoSQL databases such as MongoDB, Cassandra and HBase.

●Good experience with hosted Git repository services; extensive knowledge of GitHub and Bitbucket.

●Experience using source code version control tools such as Git, Subversion (SVN) and TFS.

●Worked on Implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics.

●Involved in creating, partitioning and bucketing tables and writing UDFs in Hive.

●Worked with serialization and storage formats such as Avro and Parquet to improve performance.

●Experienced in Agile methodologies (Scrum stories and sprints) in a Python-based environment, along with data analytics, data wrangling and Excel data extracts.


●Hands on experience with Spark Core, Spark SQL, Spark Streaming using Scala and Python.

●Used Spark SQL to perform transformations and actions on data residing in Hive.

●Used Kafka & Spark Streaming for real-time processing.

●Expertise in designing and developing Spark programs in Scala to filter and transform ingested data using the RDD, Dataset and DataFrame APIs.

●Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).

●Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.

●Experience tuning Spark jobs and evaluating configuration options to improve their performance.


●Ability to provision AWS resources such as VPCs, EC2, EBS, S3 and EMR using CloudFormation templates.

●Hands-on experience integrating Amazon Redshift with Spark.

●Good experience with Amazon Web Services resources such as Redshift, VPC, SNS and SQS.

●Extensive experience working with a broad range of Amazon Web Services (AWS) cloud services and their features, such as Auto Scaling, AWS Storage, ELB, EBS, VPC, Security Groups, Access Control Lists (ACLs) and S3.

BI/ETL Tools

●Hands-on experience generating reports with BI tools such as QlikView and Tableau.

●Hands on experience in Business Intelligence and Data-Warehousing.

●Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations and load the data into the target database.

●Extensive experience working with databases such as Oracle, IBM DB2, SQL Server, MySQL and NoSQL stores, and writing stored procedures, functions, joins and triggers for different data models.

●Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise Data-warehouses.


●Excellent Object-Oriented Programming (OOP) skills with C++ and Java and in-depth understanding of data structures and algorithms.

●Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns and Collections.

●Documented the events, workflows, code changes, bugs fixes related to enhancing new features and correcting code defects.

Technical Skills

Big Data Ecosystem

HDFS and MapReduce, Pig, Hive, Impala, YARN, Oozie, ZooKeeper, Apache Spark, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume.

NoSQL Databases

HBase, MongoDB, Cassandra.

Java Technologies

Java, J2EE, JDK 1.4/1.5/1.6/1.7/1.8, JDBC, Hibernate, XML Parsers, JSP 1.2/2, Servlets, EJB, JMS, Struts, Spring Framework, JavaBeans, AJAX, JNDI.


Cloud Technologies

Amazon EMR, EC2, EBS, S3, Lambda, Redshift

Spark components

Spark core, Spark SQL, Spark Streaming


Databases

Netezza, Cassandra, SQL Server, MySQL, PostgreSQL, Oracle and DB2.

Programming Languages

C, C++, Java, J2EE, JDBC, JUnit, Log4j, C#, Python, Scala, Swift, Android, PL/SQL, HQL, Unix, Shell Scripting.

Scripting Languages

Python (NumPy, pandas, Matplotlib), Perl, Shell, Scheme, Tcl, Unix Shell Scripts, Windows PowerShell

Web Technologies

HTML, JavaScript, jQuery, Ajax, Bootstrap, AngularJS, Node.js.

Development Methodologies

Waterfall, UML, Design Pattern (Core Java and J2EE), Agile Methodologies (Scrum).

IDE Development Tools

Eclipse, NetBeans, Visual Studio, Xcode, Android Studio, IntelliJ, JetBrains.

Reporting tools

Tableau, QlikView

Management Tech

SVN, Git, Jira, Maven.

Virtualization Technologies

VMware ESXi, Windows Hyper-V, PowerVM, VirtualBox, Citrix Xen, KVM

Web Servers

WebLogic, WebSphere, Apache Tomcat, JBoss.

Frameworks

MVC, Struts, Hibernate, Spring Framework, Spring Boot.

Credit Suisse, Princeton, NJ Sep 2015 - Present

Spark/Hadoop Developer

The Retail Enterprise Credit Risk application processes the bank’s retail data, such as credit card, auto, student and home loans, for risk domains including Enterprise Capital Management (ECM). The data comes from several systems of record, mainly Teradata. It undergoes several cleansing and value-added processing steps, after which views are created on it in the Hadoop warehouse. Downstream consumers such as ECM use this data for analysis and reporting.


●Involved in installation, configuration, maintenance, monitoring, performance tuning and troubleshooting of Hadoop clusters across development, test and production environments.

●Deployed a scalable Hadoop cluster on AWS using S3 as the underlying file system.

●Worked with Amazon Web Services (AWS), using EC2 for hosting and Elastic MapReduce (EMR) for data processing, with S3 as the storage mechanism.

●Involved in developing and maintaining applications written for Amazon Simple Storage Service (S3).

●Wrote data with Spark to both non-partitioned and partitioned Parquet tables, adding data dynamically to the partitioned tables.

●Involved in converting HDFS files in multiple data formats into RDDs and performing data checks using RDD operations.

●Used different Spark modules, including Spark Core, Spark SQL, Spark Streaming, Datasets and DataFrames.

●Performed operations in Spark such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, union and combineByKey.

●Used Spark SQL to access Hive tables from Spark for faster data processing.

●Used Spark SQL with Python to create DataFrames and applied transformations such as defining schemas manually, casting columns and joining DataFrames before storing them.

●Worked on Spark Streaming with Apache Kafka for real-time data processing and implemented an Oozie job for daily imports.

●Created Kafka producers and consumers for Spark Streaming.

●Used Spark DataFrame operations to perform the required data validations and to run analytics on the Hive data.

●Wrote user-defined functions (UDFs) in Scala to provide special functionality in Spark.

●Good knowledge of machine learning concepts using the Mahout and MALLET packages.

●Worked with a three-layer data storage design: raw, intermediate and publish layers.

●Used the Parquet file format for published tables and created views on those tables.

●Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.

●Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment.

●Created Impala views on top of Hive tables for faster access when analyzing data.

●Prepared JIL scripts to schedule workflows with AutoSys and automated jobs with Oozie.

●Worked with Agile methodologies (Scrum stories and sprints) in a Python-based environment, along with data analytics, data wrangling and Excel data extracts.

●Worked with Apache SOLR to implement indexing.

●Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.

●Worked with various HDFS file formats like Avro, Sequence File and various compression formats like Snappy, bzip2.

●Working knowledge of AWS technologies such as SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline and EMR.

●Worked on the disaster recovery plan for the Hadoop cluster by implementing cluster data backups to Amazon S3 buckets.

●Assisted in cluster maintenance, monitoring and troubleshooting, and managed and reviewed data backups and log files.

Environment: HDFS, Spark, Spark SQL, Hive, Pig, MapReduce, Hue, Sqoop, PuTTY, Apache Kafka, Apache Mesos, AWS, Java, Netezza, Cassandra, Oozie, Maven, Scala, SQL, Linux, Toad, YARN, Mainframes, Agile Methodology and Tableau.

Sears Corporation, Chicago, IL Feb 2014 – Aug 2015

Hadoop Developer

Designed and developed big data solutions involving terabytes of data. The solution collects large amounts of log data from distributed sources and performs transformation, standardization, analysis, statistics, aggregation and reporting. Built an on-demand, elastic Hadoop cluster infrastructure to meet the needs of various Big Data projects, and automated Big Data workflows that process the data and extract analytics from it using MapReduce, Pig and Hive.


●Involved in the complete Big Data flow of the application, from ingesting data from upstream systems into HDFS to processing and analyzing it in HDFS.

●Developed Sqoop jobs to import data in Avro format from an Oracle database and created Hive tables on top of it.

●Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet Hive tables from the Avro Hive tables.

●Used optimization techniques, including partitioning and bucketing in Hive, to query the data more efficiently.

●Ran Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL.

●Involved in performance tuning of Hive from design, storage and query perspectives.

●Involved in developing Impala scripts for ad hoc queries.

●Created Sentry policy files to give business users access, through Impala, to the required databases and tables in the Dev, UAT and Prod environments.

●Loaded some of the data into Cassandra for fast retrieval of data.

●Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment.

●Developed a Flume ETL job that handles data from an HTTP source with HDFS as the sink.

●Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.

●Developed custom Writable classes for Hadoop serialization and deserialization of time series tuples.

●Developed Spark scripts to import large files from Amazon S3 buckets.

●Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.

●Integrated Hive and Tableau Desktop reports and published to Tableau Server.

●Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with Oozie coordinators.

●Used Jira for bug tracking and Bitbucket to check in and check out code changes.

●Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

●Used Apache HUE interface to monitor and manage the HDFS storage.

●Utilized Git for version control, JIRA for project tracking and Jenkins for continuous integration/delivery (CI/CD).

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Jenkins, Git, JIRA, UNIX Shell Scripting, Cloudera.

Northern Trust Bank, IL Feb 2013 – Jan 2014

Hadoop Developer

The purpose of the project was to analyze data coming from different sources into the Hadoop data center unit. Created programs that process large volumes of data through a number of prepay concepts, which analyze the data, produce suspect claims and help generate datasets for visualization. The suspect claims are verified again, saving the company millions of dollars every year.


●Primary responsibilities included building scalable distributed data solutions using the Hadoop ecosystem.

●Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.

●Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.

●Experienced in managing and reviewing Hadoop log files.

●Developed MapReduce programs in Java for parsing the raw data and populating staging tables.

●Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.

●Loaded and transformed large sets of structured, semi-structured and unstructured data.

●Extensively worked on combiners, custom partitioning and the distributed cache to improve the performance of MapReduce jobs.

●Implemented different analytical algorithms as MapReduce programs applied on top of HDFS data.

●Implemented Hive GenericUDFs to implement business logic around custom data types.

●Used Pig to perform data transformations, event joins, filtering and some pre-aggregation before storing the data in HDFS.

●Conducted data extraction, including analysis, review and modeling based on requirements, using higher-level tools such as Hive and Pig.

●Implemented Partitions, Buckets in Hive for optimization.

●Involved in creating Hive tables, loading structured data and writing Hive queries that run internally as MapReduce jobs.

●Created HBase tables to store data in various formats coming from different portfolios.

●Troubleshot MapReduce jobs by reviewing log files.

●Developed an end-to-end search solution using the Apache Nutch web crawler and the Apache Solr search platform.

Environment: Hadoop, Cloudera Manager, Linux, Red Hat, CentOS, Ubuntu Operating System, Teradata, MapReduce, HBase, SQL, Sqoop, HDFS, Kafka, UML, Apache Solr, Hive, Oozie, Cassandra, Maven, Pig, UNIX, Python and Git.

Quad One, India Oct 2010 – Dec 2012

Java Developer

This project computerized warehouse transactions, maintaining all operations and transactions pertaining to online inventory. We were also involved in the design and implementation of the online inventory system, which maintained the firm's entire stock. The application mainly manages information about purchases, stocks and stores, and updates the transactions.


●Actively involved in the analysis, definition, design, implementation and deployment phases of the project's full Software Development Life Cycle (SDLC).

●Extensively used the Java Collections Framework (lists, sets, maps and queues).

●Designed various applications using multithreading, mostly to perform time-consuming tasks in the background.

●Proficient in developing web applications with HTML5, CSS3, XHTML, DHTML, JavaScript, XML, Bootstrap, AJAX, AngularJS and JSON.

●Completely involved in back-end development (Business Layer) of the application using Java/J2EE technologies.

●Worked on all modules of the application, including front-end presentation logic developed with Spring MVC, JSP, JSTL and Servlets, and the data access layer built with the Hibernate framework.

●Used joins, triggers, stored procedures and functions to interact with the backend database through SQL.

●Developed middle-tier components for a distributed transaction management system using Java. Good understanding of XML technologies (XML, XSL, XSD), including web services and SOAP.

●Responsible for periodic generation of reports.

●Performed unit testing of the application using JUnit.

●Developed Ant scripts for compilation and deployment.

●Reviewed changes on a weekly basis to ensure the quality of deliverables.

●Used the Eclipse IDE to deploy the application on a Tomcat server.

●Used SVN as centralized version control system and log4j for logging.

●Documented the events, workflows, code changes, bugs fixes related to enhancing new features and correcting code defects.

Environment: Java, JSP, HTML, CSS, Ubuntu Operating System, JavaScript, AJAX, Servlets, Struts, Hibernate, EJB (Session Beans), Log4J, WebSphere, UML, JNDI, Oracle, Windows XP, LINUX, ANT, Eclipse.

New India Assurance, Mumbai, India Jun 2009 – Oct 2010

Role: SQL Developer

The Integrated Insurance Management System (IIMS) handles all the key insurance functions, including generating quotations, handling policies and claims, agency management and customer relationship management (CRM). It gives clients online access to their auto insurance policies. The online system uses IIMS and allows customers to apply for auto insurance, request modifications to their policies, view their policy information and get online support.


●Participated in designing the logical model with ERwin based on the business requirements.

●Performed system study and requirements analysis, and prepared data flow diagrams, entity relationship diagrams and table structures in close interaction with the client.

●Facilitated consistent data entry into the database by developing stored procedures and triggers.

●Developed stored procedures to retrieve customer information as required to evaluate eligibility for loan requests, account status, etc.

●Wrote stored procedures to generate account statements for different types of accounts.

●Maintained data integrity by creating checks and constraints.

●Monitored performance and optimized SQL queries for maximum efficiency.

●Optimized query performance by modifying the existing indexes and rebuilding them with respect to I/O.

●Fine-tuned database objects and the server to ensure efficient data retrieval.

●Facilitated easy user interface implementation and enforced security on critical customer information by developing indexed generic views.

●Implemented various SSIS transforms in packages, including Slowly Changing Dimension, Aggregate, Fuzzy Lookup, Conditional Split, Row Count and Derived Column.

●Developed and deployed SSIS packages that imported data daily from the OLTP system and staging area into the data warehouse and data marts.

●Created and implemented Cubes, and designed attribute relationships for optimal performance of Hierarchies and Fact Dimensions.

●Involved in Backup, Restoring and Moving databases from heterogeneous sources to multiple environments.

●Worked on DTS packages and DTS Import/Export to transfer data from heterogeneous databases.

●Converted complex business logic into SQL Stored Procedures and user-defined functions to achieve functionality required by the UI team.

●Developed documentation that describes technical deliverables in enough detail for internal controls, so that maintenance responsibility could be passed on to the production support team.

Environment: SQL Server 2005 Enterprise, SQL Query Analyzer, ERwin, T-SQL, DTS, Excel, ER diagrams.
