Data Developer

Princeton, NJ
June 10, 2019


Mani Kurra

Big Data Developer

Mobile: 480-***-**** * 207


Mani has over 7 years of experience spread across Hadoop, Scala, Java, Python and ETL. He has extensive experience in Big Data technologies and in the development of standalone and web applications in multi-tiered environments using Java, Hadoop, AWS, Spark, Hive, Pig, Sqoop, J2EE technologies (Spring, Hibernate), Oracle, HTML, and JavaScript. He has 4 years of comprehensive experience as a Hadoop Developer. Mani has very good communication, interpersonal and analytical skills and works well as a team member or independently.

Passionate about working in Big Data and analytics environments.

Extending Pig and Hive core functionality by writing custom UDFs for data analysis.

Performed data transformation, file processing, and user-behavior identification by running Pig Latin scripts; expertise in creating Hive internal/external tables and views using a shared metastore.

Experience in writing scripts in HiveQL. Developed Hive queries that help visualize business requirements.

Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.

In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce concepts.

Extending Hive and Pig core functionality by writing custom UDFs.

Good understanding of Zookeeper and Kafka for monitoring and managing Hadoop jobs.

Experience in NoSQL technologies like HBase, Cassandra, and Neo4j for data extraction and storing huge volumes of data.

Worked with Spark to create structured data from the pool of unstructured data received.

Experience in developing software in Python using libraries such as NumPy, Matplotlib, PySpark, and pandas DataFrames, with pymongo for database connectivity.

Extensive experience with SQL, PL/SQL and database concepts.

Knowledge of NoSQL databases such as HBase, Dynamo DB and MongoDB.

Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.

Experience in developing solutions to analyze large data sets efficiently.

Good working knowledge of clustering, compression and continuous performance.

Experience in Extraction, Transformation, and Loading (ETL) of data from multiple sources like flat files, XML files, and databases. Used Informatica for ETL processing based on business requirements.

Expertise in installation, configuration, and administration of Tomcat and WebSphere. Understanding of cloud-based deployments into Amazon EC2 with Salt.

Hands on experience working on Talend Integration Suite and Talend Open Studio. Experience in designing Talend jobs using various Talend components.

Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.

Worked with cloud services like Amazon Web Services (AWS) and was involved in ETL, data integration, and migration.

Handled several techno-functional responsibilities including estimates, identifying functional and technical gaps, requirements gathering, designing solutions, development, developing documentation, and production support.

Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and being a self-motivated, focused team player.


JNTUK (Jawaharlal Nehru Technological University Kakinada)

Bachelor of Technology in Electronics and Communication Engineering, 2012


Languages: C, C++, Java, J2EE, Python

Big Data Ecosystem: Hadoop, HDFS, HBase, Pig, Hive, Sqoop, Zookeeper, Oozie, Spark, Kafka, Cloudera, Hortonworks, AWS

NoSQL Technologies: MongoDB, DynamoDB, Cassandra, Neo4j

Databases: Oracle 11g/10g/9i/8.x, MySQL, MS SQL Server 2000

Web Technologies: Core Java, J2EE, JSP, Servlets, EJB, JNDI, JDBC, XML, HTML, JavaScript, Web Services

Frameworks: Spring 3.2/3.0/2.5/2.0, Struts 2.0/1.0, Hibernate 4.0/3.0, Groovy, Camel

App Server: WebLogic 12c/11g/10.1/9.0, WebSphere 8.0/7.0

Web Server: Apache Tomcat 7.0/6.0/5.5

IDE: IntelliJ, PyCharm, Jupyter Notebook, Eclipse, EditPlus 2, Eclipse Kepler

Tools: Teradata, SQL Developer

Testing: JUnit

Operating System: Linux, UNIX and Windows 2000/NT/XP/Vista/7/8/10

Methodologies: Agile, Unified Modeling Language (UML), Design Patterns (Core Java and J2EE)

System Design & Dev: Requirement gathering and analysis, design, development, testing, delivery

Professional experience:

Data Engineer

Otsuka Pharmaceuticals, Princeton NJ

February 2019 to present

Worked on developing ETL applications.

Created mappings, sessions, and workflows to implement business logic per requirements.

Created a framework that copies data from any Linux-based file system to MongoDB.

Worked with tuples, dictionaries, and object-oriented features such as inheritance to implement algorithms.

Queried MongoDB from Python using pymongo to retrieve information.

Designed reports, dashboards, and data visualizations using Matplotlib.

Successfully interpreted data to draw conclusions for managerial action and strategy.

Used PySpark DataFrames to analyze larger datasets.

Modified queries, functions, cursors, triggers and stored procedures for MongoDB to improve performance, while processing data.

Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a main source of data for both customers and the internal customer service team.

Used pandas to organize data as time series and in tabular format for easy timestamp manipulation and retrieval.

Cleaned and processed third-party spending data into manageable deliverables in specific formats using Python libraries. Used the TDD (test-driven development) methodology.

An individual with excellent interpersonal and communication skills, strong business acumen, creative problem solving skills, technical competency, team-player spirit, and leadership skills.

Strong oral and written communication, initiation, interpersonal, learning and organizing skills matched with the ability to manage time and people effectively.

Environment: Python, MongoDB, SFTP, Linux, MapReduce, HDFS, Hive, PyCharm, Jupyter Notebook, UNIX Shell Scripting.

Hadoop Developer

Thomson Reuters, Rochester NY

July 2017 to January 2019

Worked on big data infrastructure build-out for batch, transactional, and real-time processing.

Developed systems on Hadoop and Amazon Web Services.

Brought the AWS-based system to production in 9 months.

AWS system built with Lambda, API Gateway, DynamoDB, and S3.

Automated big-data processing using EMR and Spark.

Developed deployment pipelines with CodePipeline, CodeBuild, and CloudFormation.

Developed components for the normalization REST service using Scala.

Developed Spark scripts by using Scala Shell commands as per the requirement.

Good working knowledge of the AWS IAM service, IAM policies, roles, users, groups, AWS access keys, and multi-factor authentication. Migrated applications to the AWS Cloud.

Experience with the AWS command line interface and PowerShell for automating administrative tasks. Defined AWS security groups that acted as virtual firewalls controlling the traffic reaching one or more AWS EC2 and Lambda instances.

Hands-on experience implementing, building, and deploying CI/CD pipelines, including tracking multiple deployments across multiple pipeline stages (Dev, Test/QA, staging, and production).

Worked on transferring data from on-premises systems to DynamoDB.

Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and DynamoDB.

Hands on Experience in Oozie Job Scheduling.

Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.

Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

Agile methodology was used for development using XP Practices (TDD, Continuous Integration).

Exposure to burn-up, burn-down charts, dashboards, velocity reporting of sprint and release progress.

Environment: MapReduce, HDFS, AWS, Hive, Hue, Oozie, Big Data, Core Java, Eclipse, HBase, Spark, Scala, Kafka, Cloudera Manager, Linux, Puppet, IDMS, UNIX Shell Scripting.

Hadoop Developer

Apple Inc, Sunnyvale CA

July 2016 to July 2017


Worked on big data infrastructure build-out for batch processing as well as real-time processing.

Developed, installed, and configured Hive, Hadoop, Hue, Oozie, Pig, Sqoop, Kafka, Elasticsearch, Java, J2EE, HDFS, XML, PHP, and Zookeeper on the Hadoop cluster.

Created Hive tables and loaded retail transactional data from Teradata using Sqoop.

Managed thousands of Hive databases totaling 250+ TB.

Developed enhancements to Hive architecture to improve performance and scalability.

Collaborated with development teams to define and apply best practices for using Hive.

Worked on Hadoop, Hive, Oozie, and MySQL customization for batch data platform setup.

Worked on the implementation of a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka- and Zookeeper-based log collection platform.

Implemented a data export application to fetch processed data from these platforms to consuming application databases in a scalable manner.

Involved in loading data from Linux file system to HDFS.

Experience in setting up Salt formulas for centralized configuration management.

Monitored the cluster using various tools to see how the nodes were performing.

Experience on Oozie workflow scheduling.

Developed Spark scripts by using Scala Shell commands as per the requirement.

Expertise in cluster tasks such as adding and removing nodes without any effect on running jobs and data.

Transferred data from Hive tables to HBase via stage tables using Pig and used Impala for interactive querying of HBase tables.

Implemented data auditing for accounting by capturing various logs such as HDFS audit logs and YARN audit logs.

Worked on a proof of concept to implement a Kafka-Storm based data pipeline.

Configured job scheduling in Linux using shell scripts.

Worked on the Spark Machine Learning Library (MLlib) for clustering and underlying optimization primitives.

Created custom Solr Query components to enable optimum search matching.

Utilized the Solr API to develop custom search jobs and GUI based search applications.

Also, implemented multiple output formats in the same program to match the use cases.

Developed Hadoop Streaming MapReduce jobs using Python.

Installed Apache Spark on YARN and managed master and worker nodes.

Performed benchmarking of the NoSQL databases Cassandra and HBase.

Created data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Cassandra.

Implemented test scripts to support test driven development and continuous integration.

Clear understanding of Cloudera Manager Enterprise edition.

Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes.

Worked on POC, implementation, and integration of Cloudera for multiple clients.

Good knowledge of creating ETL jobs to load Twitter JSON data into MongoDB and jobs to load data from MongoDB into the data warehouse.

Designed and implemented various ETL projects using Informatica and DataStage as data integration tools.

Exported the analyzed data to HBase using Sqoop to generate reports for the BI team.

Ongoing POC work uses Spark and Kafka for real-time processing.

Developed Python scripts to monitor the health of MongoDB databases and perform ad-hoc backups using mongodump and mongorestore.

Deployed the project in a Linux environment.

Automated the installation of Apache and its components using Salt.

Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.

Worked with NoSQL databases like Cassandra and MongoDB for POC purposes.

Implemented a POC with Hadoop, extracting data with Spark into HDFS.

Environment: MapReduce, HDFS, Hive, Pig, Hue, Oozie, Solr, Big Data, Core Java, Python, Eclipse, HBase, Flume, Spark, Scala, Kafka, Cloudera Manager, Impala, UNIX RHEL, Cassandra, Linux, Puppet, IDMS, UNIX Shell Scripting.

Hadoop Developer/Lead

Cummins, Columbus IN

October 2014 to June 2016

Created Hive tables and loaded retail transactional data from Teradata using Sqoop.

Responsible for operating system and Hadoop cluster monitoring using tools like Nagios, Ganglia, and Cloudera Manager.

Talend administrator with hands-on Big Data (Hadoop) experience with the Cloudera framework.

Proactively managed Oracle/SQL Server backups, performance tuning, and general maintenance with capacity planning of the Talend complex.

Troubleshooting, debugging & fixing Talend specific issues, while maintaining the health and performance of the ETL environment.

Documented the Installation, Deployment, administration and operational processes of Talend MDM Platform (production, Pre-Prod, test30, test 90 and development) environments for ETL project.

Developed and designed ETL Jobs using Talend Integration Suite (TIS) in Talend 5.2.2.

Created complex jobs in Talend 5.2.2 using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaFlex, tAggregateRow, tDie, tWarn, tLogCatcher, etc.

Used tStatsCatcher, tDie, tLogRow to create a generic joblet to store processing stats.

Created Talend jobs to populate the data into dimensions and fact tables.

Created Talend ETL job to receive attachment files from pop e-mail using tPop, tFileList, tFileInputMail and then loaded data from attachments into database and archived the files.

Used Talend joblets and various commonly used Talend transformation components like tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, and many more.

Created Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote some Java code to capture global map variables and use them in the job.

Created Talend jobs to copy the files from one server to another and utilized Talend FTP components.

Worked on MongoDB database concepts such as locking, transactions, indexes, replication, schema design.

Streamed data in real time using Spark with Kafka.

Experience in deploying, managing, and developing MongoDB clusters in Linux and Windows environments.

Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.

Developed Pig scripts to move the existing home loans legacy process to Hadoop, with the data fed back to the retail legacy mainframe systems.

Migrated an existing on-premises application to AWS.

Wrapped the Oozie Java API in Spring Boot services that can run in PCF independently as microservices.

Developed a Spark job in Java that indexes data into Elasticsearch from external Hive tables stored in HDFS.

Agile methodology was used for development using XP Practices (TDD, Continuous Integration).

Exposure to burn-up, burn-down charts, dashboards, velocity reporting of sprint and release progress.

Environment: Hadoop, Talend, MapReduce, Cloudera, Talend Hive, Pig, Kafka, ETL, Hortonworks, SQL, Java 7.0, Log4J, JUnit, MRUnit, SVN, JIRA.

Jr. Java Developer

Magna InfoTech, Delhi

August 2012 to September 2014


Developing new pages for personals.

Implementing MVC Design pattern for the Application.

Using Content Management tool (Dynapub) for publishing data.

Implementing AJAX to present data in a user-friendly and efficient manner.

Developing and Action Classes.

Used JMeter for load testing of the application and captured the response time of the application.

Created a simple user interface for the application's configuration system using MVC design patterns and the Swing framework.

Implementing Log4j for logging and debugging.

Implementing Form based approach for ease of programming team.

Involved in software development life cycle as a team lead.

Environment: Core Java, Java Swing, Struts, J2EE (JSP/Servlets), XML, AJAX, DB2, MySQL, Tomcat, JMeter.
