Professional Summary
Overall *.* years of IT experience in data warehousing technologies, including 2.5 years of development experience in Hadoop technologies. Have worked extensively on MPP databases like Teradata and on many ETL platforms including Informatica, DataStage, UNIX and Mainframes.
Strong communication skills with good knowledge and practical experience in data warehousing concepts.
Willing to learn the latest emerging technologies in the data warehousing world.
Experience in Hadoop:
. Worked on Java MapReduce, Hive, Pig, Oozie, Sqoop and Impala on Cloudera (CDH 4.1 and 4.3) clusters.
. Worked on tools like DMExpress, which help in developing complex transformations through a GUI.
. Monitoring tools like Cloudera Manager.
. Good knowledge of Hadoop Streaming and of writing UDFs with Python (see the streaming sketch after this list).
. Good knowledge of Hadoop cluster setup and admin activities such as Fair Scheduler administration, Kerberos authentication, NameNode failover and HA, DataNode addition and removal, the balancer, cluster configuration, JVM configuration, disaster recovery, etc.
. Developed several shell scripts that act as wrappers to start these Hadoop jobs and set their configuration parameters.
. Worked on the RainStor database, built on a Hadoop platform.
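As an illustration of the Hadoop Streaming bullet above, here is a minimal sketch of a streaming job driven by Python scripts; the jar path, HDFS paths and script names are placeholders, not details from an actual engagement.

    #!/bin/bash
    # Sketch: run a Python mapper/reducer pair with Hadoop Streaming.
    # The streaming jar location varies by distribution; this path is an assumption.
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
        -D mapred.reduce.tasks=4 \
        -input /data/raw/logs \
        -output /data/out/wordcount \
        -mapper wordcount_mapper.py \
        -reducer wordcount_reducer.py \
        -file wordcount_mapper.py \
        -file wordcount_reducer.py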
EDUCATION:
. Bachelor's in Engineering, Chennai, India.
Technical Skills:
Hadoop Ecosystem Development: Java MapReduce, Hive, Pig, Oozie, Sqoop, Impala, Cloudera Manager, HUE, RainStor, DMExpress
Operating Systems: Linux (Red Hat, Ubuntu), UNIX (Solaris 8, 10), Windows XP, Windows Server 2000/2003/2008, Mainframe z/OS
Databases: Teradata, RainStor, SQL Server
ETL Technologies: Informatica, DataStage, UNIX and Mainframe
Schedulers: Autosys, Oozie
Professional Experience:
Hadoop Developer at Bank of America, Nov 2011 to Present (2.5 years)
Projects worked on:
Converting existing Teradata data warehouse systems into Hadoop systems.
Roles & Responsibilities:
. Started with a POC on Cloudera Hadoop, converting one small, one medium and one complex legacy system into Hadoop.
. After the successful POC, started converting the existing Teradata systems built on other ETL platforms into suitable Java MapReduce, Hive and Pig Latin jobs. A background in all of these ETL technologies helped me analyze the existing systems and convert them into Hadoop faster.
. Schedule these jobs with a workflow engine like Oozie, in which actions can run both sequentially and in parallel.
. Built wrapper shell scripts to launch these Oozie workflows (see the wrapper sketch after this list).
. Schedule these jobs using Autosys so they can meet external dependencies and requirements.
. Integrate data from various sources into Hadoop, and move data from Hadoop to other databases, using Sqoop import and export (sketched after this list).
. Use Cloudera Manager to pull metrics on various cluster features such as JVM usage, running map and reduce tasks, etc.
. Created Pig Latin scripts to perform actions that would be done with stored procedures in a traditional database.
. Created many Java UDFs and UDAFs in Hive for functions that did not pre-exist in Hive, such as Teradata's RANK and CSUM functions.
. Perform various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and doing map-side joins (see the Hive tuning sketch after this list).
. Built various log processing tools to collect information such as user logins and actions performed for bank security audits, and also used Cloudera Navigator.
. Did a POC on RainStor, a database built on top of the Hadoop platform and known for its archival strategies and dataset retention policies. It is a replacement for tape archives and can be used as a query tool for data residing on Hadoop platforms in the form of active archives.
. Involved in a POC on DMExpress, which can effectively replace hand-written complex Java MapReduce programs by doing these transformations in GUI form, with DMX plugged into the sort phase of the MapReduce program.
. Work on Fair Scheduler pools to increase or decrease the number of mappers and reducers allocated to a particular running job based on user demand and delivery SLAs, constrained by the total number of mapper and reducer slots available on the cluster.
. Move HDFS data and Hive metadata between production and disaster recovery platforms with Cloudera's BDR product, which uses the distcp utility to move data.
. Maintain the client (edge) nodes through which users interact with the Hadoop cluster and execute commands, and load-balance across them with automated scripting.
. Involved in several discussions of future products such as Informatica 9.5 with Hadoop connectors.
. Coded many log file processing and field-wise data analysis tasks using UNIX AWK scripts.
. Coded scripts to monitor HDFS and edge node directories that have space quotas set (see the quota monitoring sketch after this list).
. Latest project was to make the Hadoop cluster the SOR (system of record) provisioning point, where the input files for various LOBs are stored in HDFS.
. Connected to Hive's backend PostgreSQL metastore, extracted critical Hive metadata and populated system dictionary tables in Hive.
. Worked extensively on Kerberos ticket renewal and on passing the delegation token from Oozie to MapReduce to Hive jobs.
. Enabled concurrent access to Hive tables with shared and exclusive locking, which Hive supports through a ZooKeeper implementation in the cluster.
. Worked on exposing Hive tables as Impala tables, which work well for real-time queries, with better performance from avoiding MapReduce and acting directly on HDFS files.
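The following sketches illustrate a few of the items above. First, the Oozie wrapper scripts: a minimal sketch that submits a workflow and polls until it finishes. The Oozie URL and properties file path are placeholders.

    #!/bin/bash
    # Sketch: submit an Oozie workflow and wait for it to leave RUNNING.
    # OOZIE_URL and the job.properties path are assumptions for illustration.
    OOZIE_URL="http://oozieserver:11000/oozie"
    JOB_ID=$(oozie job -oozie "$OOZIE_URL" -config /app/conf/job.properties -run \
        | awk -F': ' '/^job:/ {print $2}')
    echo "Submitted workflow $JOB_ID"
    while oozie job -oozie "$OOZIE_URL" -info "$JOB_ID" | grep -q "Status *: RUNNING"; do
        sleep 60
    done
    oozie job -oozie "$OOZIE_URL" -info "$JOB_ID"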
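Next, the Sqoop import/export bullet: a hedged sketch of moving a table into HDFS and pushing results back to the database. The JDBC URL, credentials and table names are invented.

    #!/bin/bash
    # Sketch: pull a Teradata table into HDFS, then export results back.
    # Connection string, user and table names are placeholders.
    sqoop import \
        --connect jdbc:teradata://tdhost/DATABASE=edw \
        --username etl_user -P \
        --table CUSTOMER \
        --target-dir /data/in/customer \
        --num-mappers 8

    sqoop export \
        --connect jdbc:teradata://tdhost/DATABASE=edw \
        --username etl_user -P \
        --table CUSTOMER_AGG \
        --export-dir /data/out/customer_agg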
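Then the Hive optimization bullet: a sketch of the session settings and DDL behind map-side joins, partitioning and bucketing. Table and column names are invented for illustration.

    #!/bin/bash
    # Sketch: standard Hive options and DDL for the optimizations above.
    hive -e "
        SET hive.auto.convert.join=true;   -- convert joins against small tables to map-side joins
        SET hive.enforce.bucketing=true;   -- honor the declared bucket count on insert
        CREATE TABLE txn (
            acct_id BIGINT,
            amount  DOUBLE
        )
        PARTITIONED BY (load_dt STRING)
        CLUSTERED BY (acct_id) INTO 32 BUCKETS;
    "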
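Finally, the space quota monitoring bullet: a sketch that warns when HDFS directories approach their quota. The directory list and 90% threshold are assumptions.

    #!/bin/bash
    # Sketch: warn when a directory uses more than 90% of its HDFS space quota.
    # 'hadoop fs -count -q' columns: QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA ...
    for dir in /data/in /data/out; do
        hadoop fs -count -q "$dir" | awk -v d="$dir" '
            $3 != "none" && ($3 - $4) / $3 * 100 > 90 {
                printf "WARNING: %s is over 90%% of its space quota\n", d
            }'
    done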
Teradata Data Warehouse Developer at Bank of America, Dec 2008 to Nov 2011 (3 years)
Projects/Technologies worked on:
Informatica ETL loading into Teradata.
Roles & Responsibilities:
. Develop Informatica mappings based on user requirements.
. Pull data from various external sources such as Mainframe and Oracle using PowerExchange.
. Load into target Teradata systems using Teradata-specific loaders like MultiLoad, FastLoad and TPump.
. Implement slowly changing dimension logic in the mappings to effectively handle change data capture, which is typical in data warehousing systems.
. Write several custom transformations, including a custom BTEQ transformation.
. Create workflows, worklets and mapplets to hold these transformations.
. Develop a complex distribution layer with Teradata BTEQs driven by shell scripts to effectively load aggregated reporting data for downstream systems that build MicroStrategy and Cognos reports (see the BTEQ wrapper sketch after this list).
. Do performance analysis on both the Informatica and target Teradata systems by analyzing various bottlenecks and implementing indexes, collecting statistics and tuning queries in Teradata.
. Involved in data model creation for Teradata tables. Developed expertise in Teradata data modeling techniques, which involve complex index analysis and matching user requirements in the most performance-efficient way.
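A minimal sketch of the shell-scripted BTEQ pattern referenced above; the logon, database and table names are placeholders, and TD_PASSWORD is assumed to come from the environment.

    #!/bin/bash
    # Sketch: run an aggregation BTEQ from shell and fail the wrapper on error.
    bteq <<EOF
    .LOGON tdprod/etl_user,${TD_PASSWORD}
    INSERT INTO edw.daily_sales_agg
    SELECT store_id, sale_dt, SUM(amount)
    FROM edw.sales
    GROUP BY store_id, sale_dt;
    .IF ERRORCODE <> 0 THEN .QUIT 8
    .QUIT 0
    EOF
    [ $? -eq 0 ] || { echo "BTEQ load failed" >&2; exit 1; }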
Datastage ETL loading into Teradata.
Roles & Responsibilities:
. Develop complex transformations with DataStage sequence and parallel stages.
. Worked on IBM's BDW model built with the persistent dataset concept for CDC in DataStage.
. Manage performance in the DataStage environment by effectively implementing partitioners and collectors depending on the requirement, without any data loss.
. Run orchadmin dumps of persistent datasets into ASCII-formatted text files and use them for various data analysis.
. Used the Teradata parallel export and ODBC connector stages to move transformed data into Teradata for downstream users.
. Start a DataStage job run (dsjob) from a shell script and schedule the script from the Autosys scheduler (see the sketch after this list).
. Develop generic DataStage jobs for reusability across varying column sets using the Generic stage and runtime column propagation.
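A sketch of the dsjob wrapper mentioned above, of the kind Autosys would invoke; the project and job names are placeholders.

    #!/bin/bash
    # Sketch: run a DataStage job from shell so Autosys can schedule it.
    PROJECT=DW_PROJ
    JOB=LoadCustomerDim
    # With -jobstatus, dsjob blocks until the job finishes and returns its status
    # as the exit code (1 = finished OK, 2 = finished with warnings).
    dsjob -run -jobstatus "$PROJECT" "$JOB"
    status=$?
    if [ $status -eq 1 ] || [ $status -eq 2 ]; then
        exit 0
    fi
    echo "DataStage job $JOB failed with status $status" >&2
    exit 1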
Mainframe ETL loading into Teradata.
Roles & Responsibilities:
. Convert the input files into Teradata-loadable format with mainframe COBOL programs.
. Develop JCL to invoke programs that load to Teradata using standard Teradata utilities like BTEQ, TPump, MultiLoad, FastLoad and FastExport.
. FTP/NDM files to downstream systems.
. Use JCL sort utilities for file manipulation such as removing headers and trailers, concatenating data, reconciliation, etc.
. Package the components in ChangeMan libraries and check out the changes to production LPARs.
Personal Details
Date of Birth - May 13, 1987
Sex - Male
Nationality - Indian
Current Location - Jacksonville Florida
Visa Type - H1B
Visa End Date - Dec 31, 2015