
Professional Summary

Overall *.* years of IT experience in data warehousing technologies, including 2.5 years of development experience in Hadoop technologies. Have worked extensively on MPP databases like Teradata and on many ETL platforms, including Informatica, DataStage, UNIX, and Mainframe.

Strong communication skills, with good knowledge of and practical experience in data warehousing concepts.

Willing to learn the latest emerging technologies in the data warehousing world.

Experience in Hadoop:

. Worked on Java MapReduce, Hive, Pig, Oozie, Sqoop, and Impala on Cloudera (CDH 4.1 and 4.3) clusters.

. Worked on tools like DMExpress, which helps in developing complex transformations through a GUI.

. Monitoring tools like Cloudera Manager.

. Good knowledge of Hadoop Streaming and of writing UDFs with Python.

. Good knowledge of Hadoop cluster setup and admin activities such as Fair Scheduler administration, Kerberos authentication, NameNode failover, HA, DataNode addition/removal, Balancer, cluster configuration, JVM configuration, disaster recovery, etc.

. Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters (a minimal sketch follows this list).

. Worked on the RainStor database built on a Hadoop platform.
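A minimal sketch of such a wrapper script is shown below, assuming Kerberos authentication and an Oozie-driven workflow as described later in this resume; the keytab, principal, Oozie URL, properties file, and property names are illustrative placeholders, not actual project values.

#!/bin/bash
# Wrapper sketch: authenticate, set job-level configuration, and kick off
# an Oozie-coordinated Hadoop job. All paths, principals, and property
# names are illustrative assumptions.
set -euo pipefail

KEYTAB=/etc/security/keytabs/etl_user.keytab    # assumed keytab location
PRINCIPAL=etl_user@EXAMPLE.COM                  # assumed principal
OOZIE_URL=http://oozie-host:11000/oozie         # assumed Oozie endpoint
PROPS=/opt/etl/conf/daily_load.properties       # assumed job.properties

# Renew the Kerberos ticket so Oozie/MapReduce/Hive actions can obtain
# delegation tokens.
kinit -kt "$KEYTAB" "$PRINCIPAL"

# Submit the workflow, overriding configuration parameters at run time
# (queueName and reducerCount are properties the workflow is assumed to read).
JOB_ID=$(oozie job -oozie "$OOZIE_URL" -config "$PROPS" \
    -D queueName=etl -D reducerCount=20 -run | awk -F': ' '{print $2}')
echo "Submitted Oozie workflow: $JOB_ID"

# Poll until the workflow leaves the RUNNING state, then print its final status.
while oozie job -oozie "$OOZIE_URL" -info "$JOB_ID" | grep -q "Status *: RUNNING"; do
    sleep 60
done
oozie job -oozie "$OOZIE_URL" -info "$JOB_ID"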

EDUCATION:

. Bachelor of Engineering, Chennai, India.

Technical Skills:

Hadoop Ecosystem Development: Java MapReduce, Hive, Pig, Oozie, Sqoop, Impala, Cloudera Manager, HUE, RainStor, DMExpress

Operating Systems: Linux (Red Hat, Ubuntu), UNIX (Solaris 8, 10), Windows XP, Windows Server 2000/2003/2008, Mainframe z/OS

Databases: Teradata, RainStor, SQL Server

ETL Technologies: Informatica, DataStage, UNIX, and Mainframe

Schedulers: Autosys, Oozie

Professional Experience:

Hadoop Developer, Bank of America, Nov 2011 to Present (2.5 years)

Projects worked on:

Converting existing Teradata data warehouse systems into Hadoop systems.

Roles & Responsibilities:

. Started with a POC on Cloudera Hadoop, converting one small, one medium, and one complex legacy system into Hadoop.

. After the successful POC, started converting the existing Teradata systems built on other ETL platforms into suitable Java MapReduce, Hive, and Pig Latin jobs. Having a background in all of these ETL technologies helped me analyze and convert the existing systems into Hadoop faster.

. Scheduled these jobs with the Oozie workflow engine; actions can be performed both sequentially and in parallel using Oozie.

. Built wrapper shell scripts to hold these Oozie workflows.

. Scheduled these jobs using Autosys so that they can meet external dependencies and requirements.

. Integrated data from various sources into Hadoop and moved data from Hadoop to other databases using Sqoop import and export (see the Sqoop sketch after this list).

. Used Cloudera Manager to pull metrics on various cluster features such as JVM usage and running map and reduce tasks.

. Created Pig Latin scripts to perform actions that would be done with stored procedures in a traditional database.

. Created many Java UDFs and UDAFs in Hive for functions that did not pre-exist in Hive, such as Teradata's RANK and CSUM (see the Hive sketch after this list).

. Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.

. Built various log-processing tools to collect information such as user logins and actions performed for various bank security audits, and also used Cloudera Navigator.

. Did a POC on RainStor, a database built on top of the Hadoop platform known for its archival strategies and retention policies on datasets. It is a replacement for tape archive and can be used as a query tool for users to query data residing on Hadoop platforms in the form of active archives.

. Involved in a POC on DMExpress, which can effectively be considered a replacement for writing complex Java MapReduce programs; all of the transformations are done in GUI form, with DMX plugged into the sort phase of the MapReduce program.

. Worked on Fair Scheduler pools to increase or decrease the number of mappers/reducers allocated to a particular running job based on user demand and delivery SLAs, constrained by the total number of available mapper and reducer slots on the cluster.

. Moved HDFS data and Hive metadata between the production and disaster recovery platforms with Cloudera's BDR product, which uses the distcp utility to move data (see the distcp/quota sketch after this list).

. Maintained and load-balanced, with automated scripting, the client (edge) nodes through which users interact with the Hadoop cluster and execute commands.

. Involved in several discussions about future products such as Informatica 9.5 with Hadoop connectors.

. Coded many log-file processing and field-wise data analysis tasks using UNIX AWK scripts.

. Coded scripts to monitor HDFS and edge node directories that have space quotas set.

. The latest project was to keep the Hadoop cluster as the SOR provisioning point, where the input files for various LOBs are stored in HDFS.

. Connected to the Hive backend PostgreSQL database, extracted critical Hive metadata, and populated system dictionary tables in Hive.

. Worked extensively on Kerberos security renewal and on passing the token from Oozie to MapReduce to Hive jobs.

. Created concurrent access for Hive tables with shared and exclusive locking, which can be enabled in Hive with the help of a ZooKeeper implementation in the cluster.

. Worked on transforming Hive tables into Impala tables, which work well for real-time queries with better performance by avoiding MapReduce and acting directly on HDFS files.
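A rough illustration of the Sqoop import/export usage mentioned above; the JDBC URLs, driver, credentials, table names, and directories are placeholder assumptions rather than actual project values.

#!/bin/bash
# Sketch: move data between an RDBMS and HDFS with Sqoop.
# Connection details and object names are illustrative only.

# Import a source table into HDFS with 4 parallel mappers, split on the key
# column (-P prompts for the password; a password file would be used in batch).
sqoop import \
  --connect "jdbc:teradata://tdprod.example.com/DATABASE=edw" \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table CUSTOMER_TXN \
  --split-by TXN_ID \
  --num-mappers 4 \
  --target-dir /data/raw/customer_txn

# Export aggregated results from HDFS back to a downstream database table.
sqoop export \
  --connect "jdbc:sqlserver://reporting.example.com:1433;databaseName=mart" \
  --username etl_user -P \
  --table DAILY_SUMMARY \
  --export-dir /data/publish/daily_summary \
  --input-fields-terminated-by '\001'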
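The Hive UDF and tuning bullets might translate into a hive CLI session like the one below, driven from a shell script; the jar path, class name, function name, and tables are assumptions made for illustration.

#!/bin/bash
# Sketch: register a custom rank UDF (a pre-windowing stand-in for
# Teradata's RANK) and apply common Hive tuning settings.
# Jar, class, function, and table names are illustrative assumptions.

hive -e "
ADD JAR /opt/etl/lib/custom-hive-udfs.jar;
CREATE TEMPORARY FUNCTION td_rank AS 'com.example.hive.udf.TeradataRank';

-- Tuning from the performance bullets: convert joins against small tables
-- to map-side joins and enforce bucketing on bucketed target tables.
SET hive.auto.convert.join=true;
SET hive.enforce.bucketing=true;

-- Classic pre-windowing rank pattern: the UDF keeps a counter that resets
-- whenever the distributed/sorted key changes.
SELECT acct_id, txn_amt, td_rank(acct_id) AS rnk
FROM (
  SELECT acct_id, txn_amt
  FROM   txn_detail
  DISTRIBUTE BY acct_id
  SORT BY acct_id, txn_amt DESC
) t;
"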
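The disaster-recovery replication and space-quota monitoring described above could be approximated with a script like this; the NameNode addresses, paths, and the 90% threshold are placeholders.

#!/bin/bash
# Sketch: copy a warehouse directory to the DR cluster with distcp and
# report HDFS directories approaching their configured space quota.
# Hosts, paths, and the threshold are illustrative assumptions.

# Update-style copy to the DR cluster (Cloudera BDR drives distcp similarly).
hadoop distcp -update -delete \
  hdfs://prod-nn:8020/user/hive/warehouse/edw.db \
  hdfs://dr-nn:8020/user/hive/warehouse/edw.db

# 'hadoop fs -count -q' prints, per directory:
# QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATH
hadoop fs -count -q /data/raw /data/publish | awk '
  $3 != "none" && $3 > 0 {
      used = $3 - $4
      pct  = 100 * used / $3
      if (pct > 90)
          printf "WARNING: %s at %.1f%% of space quota\n", $8, pct
  }'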

Teradata Data Warehouse Developer, Bank of America, Dec 2008 to Nov 2011 (3 years)

Projects/Technologies worked on:

Informatica ETL loading into Teradata.

Roles & Responsibilities:

. Developed Informatica mappings based on user requirements.

. Pulled data from various external sources like Mainframe and Oracle using PowerExchange.

. Loaded into target Teradata systems using Teradata-specific loaders like MultiLoad, FastLoad, and TPump.

. Implemented slowly changing dimension logic in the mappings to effectively handle change data capture, which is typical in data warehousing systems.

. Wrote several custom transformations, including a custom BTEQ transformation.

. Created workflows, worklets, and mapplets to hold these transformations.

. Developed a complex distribution layer with Teradata BTEQs written in shell scripts to effectively load aggregated reporting data for downstream systems that build MicroStrategy and Cognos reports (see the BTEQ sketch after this list).

. Performed performance analysis on both the Informatica and target Teradata systems by analyzing various bottlenecks and implementing indexes, collecting statistics, and tuning queries in Teradata.

. Involved in data model creation for Teradata tables. Gained expertise in Teradata data modeling techniques, which involve complex index analysis and matching user requirements in the most performance-efficient way.
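The shell-wrapped BTEQ distribution-layer work described above might look roughly like the sketch below; the logon file, database, table names, and SQL are illustrative assumptions, not the actual project code.

#!/bin/bash
# Sketch: run a Teradata BTEQ step from a shell wrapper to refresh an
# aggregated reporting table and keep optimizer statistics current.
# The logon file and object names are placeholders.

LOGON_FILE=/opt/etl/conf/td_logon.bteq   # assumed to contain ".LOGON tdpid/user,password;"

bteq <<EOF
.RUN FILE=${LOGON_FILE};

DELETE FROM rpt_db.daily_balance_agg WHERE load_dt = CURRENT_DATE;

INSERT INTO rpt_db.daily_balance_agg (acct_id, load_dt, total_balance)
SELECT acct_id, load_dt, SUM(balance)
FROM   edw_db.account_balance
WHERE  load_dt = CURRENT_DATE
GROUP  BY acct_id, load_dt;

COLLECT STATISTICS ON rpt_db.daily_balance_agg COLUMN (acct_id);

.IF ERRORCODE <> 0 THEN .QUIT 8;
.LOGOFF;
.QUIT 0;
EOF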

DataStage ETL loading into Teradata.

Roles & Responsibilities:

. Developed complex transformations with DataStage sequence and parallel stages.

. Worked on IBM's BDW model built with the persistent dataset concept for CDC in DataStage.

. Managed performance in the DataStage environment by effectively implementing partitioners and collectors depending on the requirement, without any data loss.

. Ran orchadmin dumps of persistent datasets into ASCII-formatted text files and used them for various data analysis.

. Used the Teradata parallel export and ODBC connector stages to move the transformed data into Teradata for downstream users.

. Started a DSRunJob using a shell script and scheduled the script from the Autosys scheduler (see the sketch after this list).

. Developed generic DataStage jobs for reusability across various columns, using the generic stage and runtime column propagation.
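A rough sketch of the kind of shell wrapper referenced above for starting a DataStage job from Autosys and dumping a persistent dataset for analysis; the engine path, project, job, parameter, and dataset names are assumptions.

#!/bin/bash
# Sketch: launch a DataStage job from the command line (suitable for an
# Autosys-scheduled wrapper) and dump a persistent dataset to ASCII text.
# Paths, project, job, and dataset names are illustrative only.

DSHOME=/opt/IBM/InformationServer/Server/DSEngine   # assumed engine home
. "$DSHOME/dsenv"                                   # load the DataStage environment

PROJECT=EDW_PROJECT
JOB=srcToTeradataLoad

# Run the job, wait for completion, and surface the job status as the exit code.
"$DSHOME/bin/dsjob" -run -jobstatus -param LOAD_DT=2014-06-01 "$PROJECT" "$JOB"
RC=$?
echo "dsjob finished with status code $RC"

# Dump the records of a persistent dataset to a flat ASCII file for ad hoc analysis.
orchadmin dump /data/ds/customer_cdc.ds > /tmp/customer_cdc.txt

exit "$RC"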

Mainframe ETL loading into Teradata.

Roles & Responsibilities:

. Converted the input files into Teradata-loadable format with mainframe COBOL programs.

. Developed JCL to invoke programs that load to Teradata using standard Teradata utilities like BTEQ, TPump, MultiLoad, FastLoad, and FastExport.

. FTP/NDM files to downstream systems.

. Used JCL sort utilities for file manipulation such as removing headers and trailers, concatenating data, reconciliation, etc.

. Packaged the components in Changeman libraries and checked out the changes to production LPARs.

Personal Details

Date of Birth - May 13, 1987

Sex - Male

Nationality - Indian

Current Location - Jacksonville, Florida

Visa Type - H1B

Visa End Date - Dec 31, 2015


