
Hadoop AWS

Location:
Bethesda, MD
Posted:
July 22, 2017

Resume:

Naresh Kumar Chinthamani

ac1fvo@r.postjobfree.com +1-614-***-****

https://www.linkedin.com/in/naresh-kumar-ch-6273118/

PROFESSIONAL SUMMARY:

12+ years of experience in the IT industry across the complete software product development life cycle.

Around 4 years of experience using the Spark and Hadoop frameworks.

Extensive experience in building and maintaining big data, business intelligence and data warehouse products, services and applications.

Solid understanding of Scala programming and development.

Expert-level experience in MapReduce internals and performance tuning.

Experience in Hive query tuning to improve throughput.

Implemented proof of concept work in HDFS and MapReduce for reducing ETL batch job time on a distributed cluster.

Expertise working with Cloudera distribution and production implementation.

Used the Spark UI for performance monitoring, memory tuning, GC tuning and log inspection.

Shell scripting experience for handling batch jobs.

Expertise in designing, developing & deploying applications built using MS SQL Server & MSBI technologies.

Created complex SSRS reports for business insight and deployed them to the Reporting Services server.

Created JasperReports using iReport and worked with domains, ad-hoc views and dashboards on the JasperReports Server.

Worked on multi-dimensional data modeling (OLAP) for Snowflake & Star schemas.

Handled Spark installation on the Hortonworks distribution.

Migrated the source code of complex applications from SVN to Bitbucket Server, set up Bamboo auto-build plans, and configured SourceTree for the team's development environment.

Hands on experience on AWS platform with EC2, S3 & EMR.

Knowledge of Amazon EC2 Spot integration & Amazon S3 integration.

Experienced in Banking, Financial & E-learning domains.

A solid foundation in functional and object-oriented programming with data structures.

Delivered products and services using both waterfall and Scrum/ Agile development methodologies.

Can-do attitude; never gives up on new challenges or the latest technologies until gaining expertise in them.

TECHNICAL SKILLSET:

Big data & Hadoop ecosystem

Spark, Hive, Sqoop, MapReduce, HDFS, Pig, Oozie, Flume, HBase, Storm, Kafka, Spark-streaming, Hue

Programming Languages

Scala, T-SQL, PL/SQL, C, C#, core Java, Python, R

Data Warehousing & BI tools

Databases: MS SQL Server (2005, 2008 R2, 2012), Oracle, DB2

Reporting: SSRS, JasperReports (iReports), QlikView, IBM Cognos BI, Tableau, Power Pivot, Power BI

ETL: MSBI (SSIS, SSAS), DataStage, SAP BODI/BODS

Database Servers

SQL server 2008 R2, SQL server 2005, SQL server 2012, Oracle, DB2, Teradata

NoSQL databases

Impala, HBase, Cassandra, MongoDB

Frameworks

Apache Hadoop, AWS, Microsoft Visual Studio, BIDS, Cognos Framework Manager

Version control & build tools

Bitbucket, Bamboo, SourceTree, Tortoise SVN, MS Visual SourceSafe

Ticketing tools

JIRA, HP Quality Center

Operating Systems

Windows servers, UNIX, Linux, CentOS

App Servers

Apache Tomcat 6.0, IIS

EDUCATION:

Bachelor of Technology in Computer Science & Engineering (2001-05) from Jawaharlal Nehru Technological University, Hyderabad, India.

PROFESSIONAL EXPERIENCE:

ANZ Bank - Investment Banking - New York, NY Dec 2015 to Present

Sr. Data Engineer in Hadoop, Spark

Responsible for:

My responsibilities include Management Information System (MIS) enhancements and sustenance of the data lakes & pipelines to create better insights from the data.

Ownership of the design and development of data pipeline jobs from different source systems.

Design and implementation of data ingestion techniques for real-time data coming from various sources.

Used Spark SQL & Scala APIs for querying & transforming data in Hive using DataFrames.
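
For illustration, a minimal Scala sketch of this kind of Spark SQL / DataFrame transformation over a Hive table; the database, table and column names are hypothetical, and it assumes a Spark 2.x SparkSession with Hive support:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveTransformSketch {
  def main(args: Array[String]): Unit = {
    // SparkSession with Hive support so spark.sql() resolves Hive metastore tables
    val spark = SparkSession.builder()
      .appName("hive-dataframe-transform")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical database, table and column names, used only for illustration
    val txns = spark.sql("SELECT account_id, txn_amount, txn_date FROM mis.transactions")

    // DataFrame transformation: daily totals per account
    val dailyTotals = txns
      .groupBy(col("account_id"), col("txn_date"))
      .agg(sum("txn_amount").alias("daily_amount"))

    dailyTotals.write.mode("overwrite").saveAsTable("mis.daily_account_totals")

    spark.stop()
  }
}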

Working closely with the customer and providing solutions for all issues.

Improved performance of Rule Execution and Rule transformations by 10X.

Applied clustering and segmentation methods for product offerings.

Explored and evaluated performance metrics with MPP databases and Hive.

Worked with the data lake and pivotal teams on various ORC issues.

Proactively worked with offshore team on Knowledge transfer and reviews.

Extensively worked on Hive UDFs and fine-tuning.
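
As an illustration of the UDF work, a minimal Scala sketch that registers a Spark SQL UDF and applies it to a Hive table; the masking rule and the table/column names are hypothetical (production Hive UDFs would typically be Java classes registered in Hive itself):

import org.apache.spark.sql.SparkSession

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-udf-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Register a simple Scala function as a SQL UDF (hypothetical masking rule)
    spark.udf.register("mask_account", (acct: String) =>
      if (acct == null || acct.length < 4) acct
      else "*" * (acct.length - 4) + acct.takeRight(4))

    // Apply the UDF from Spark SQL against a Hive table (names are illustrative)
    spark.sql("SELECT mask_account(account_id) AS masked_id, txn_amount FROM mis.transactions")
      .show(10)

    spark.stop()
  }
}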

Knowledge of Amazon EC2 Spot integration & Amazon S3 integration.

Optimized EMRFS so Hadoop jobs can read from and write to AWS S3 directly, in parallel and with good performance.
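
A minimal Scala sketch of the parallel S3 read/write pattern on EMR; the bucket, prefixes and column names are placeholders, and it assumes s3:// paths are served through EMRFS on the cluster:

import org.apache.spark.sql.SparkSession

object S3ReadWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("emrfs-s3-sketch")
      .getOrCreate()

    // On EMR, s3:// paths are handled by EMRFS; the bucket and prefixes are placeholders
    val events = spark.read.parquet("s3://example-bucket/raw/events/")

    // Hypothetical cleanup step on a hypothetical column
    val cleaned = events.filter("event_type IS NOT NULL")

    // Each task writes its own part-file, so the write runs in parallel across executors
    cleaned.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-bucket/curated/events/")

    spark.stop()
  }
}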

Developed Spark Streaming jobs consuming static and streaming data from sources like SQL Server, EDW & OLTP data stores.

Configured Kafka with Spark Streaming for this processing.
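
A minimal Scala sketch of wiring Kafka into Spark Streaming with the spark-streaming-kafka-0-10 direct stream API; the broker addresses, group id and topic name are placeholders, and the per-batch count stands in for the real transformation logic:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-spark-streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Kafka consumer settings; brokers, group id and topic are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "mis-consumer-group",
      "auto.offset.reset" -> "latest"
    )
    val topics = Array("transactions")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    // Simple per-batch count as a stand-in for the real transformation logic
    stream.map(record => record.value)
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}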

Optimized the performance of ingestion and consumption.

Worked on the complete lifecycle, i.e. modeling, ingestion, transformations, aggregation and the data access layer.

Designed and developed a concurrent framework to simulate parallel-connection load testing.

Experienced in designing and developing highly scalable and fault-tolerant systems serving 20 million records per day.

Implemented Apache Impala as the MPP engine for data storage and access.

Created regulatory reports and analysis; defined the data streams.

Also part of the development of another parallel system, EPS, built on data science tools, designing the technical and data flow processes.

Conducting scrum standup meetings for the Agile development process.

Working in the Agile team to develop and deliver working software on a timely basis.

Administering tasks:

Monitoring & Resource allocation and configuration for Spark applications.

Scheduling jobs and grouping them into pools based on priority.

Administering the cluster and tuning the memory based on the RDD usage.

Deployment of Spark Streaming applications with an optimized number of executors, write-ahead logs & checkpoint configurations.
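
A minimal Scala sketch of that kind of deployment configuration; the executor sizes, checkpoint path and socket source are placeholders, shown only to illustrate write-ahead logs and checkpoint recovery:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingDeploySketch {
  def main(args: Array[String]): Unit = {
    // Executor sizing and write-ahead-log settings; the exact values are placeholders
    val conf = new SparkConf()
      .setAppName("streaming-deploy-sketch")
      .set("spark.executor.instances", "6")
      .set("spark.executor.memory", "4g")
      .set("spark.executor.cores", "2")
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
      .set("spark.streaming.backpressure.enabled", "true")

    val checkpointDir = "hdfs:///checkpoints/streaming-app" // placeholder path

    // Builds a fresh context; a trivial socket stream stands in for the real sources
    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(conf, Seconds(30))
      ssc.checkpoint(checkpointDir)
      ssc.socketTextStream("localhost", 9999).count().print()
      ssc
    }

    // getOrCreate recovers the context from the checkpoint after a driver restart
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}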

Worked on the Kerberos authentication & delegation token mechanism to implement Spark security.

Key Achievements:

The critical gaps in the data pipeline system were fixed.

Enabled the BAs to address challenges on the data sets within the latency constraints.

Technologies used:

Spark, Hive, Scala, HDFS, YARN, Sqoop, Flume, Kafka, Impala, shell scripting, Bitbucket, Java, AWS S3 & EMR (EMRFS).

Skillsoft, Gainesville, FL Aug 2014 to Nov 2015

Lead Data Engineer – Hadoop

Responsible for:

●Took up the initiative to use Hadoop MapReduce to reduce ETL time for large customers and built a prototype for ETL on Hadoop.

●Took ownership of and led the major design and development of the solution.

●Designed and implemented data ingestion pipeline jobs using Sqoop into Hive tables.

●Used Flume to collect logs from web servers and store the data in HDFS.

●Collected LMS data into Hadoop cluster using SQOOP.

●Wrote ETLs using Hive and processed the data as per business logic.

●Completed similar ETL processing in 1/5th of the time of ETL done in traditional technologies.

●Worked on creating Oozie workflows for scheduling jobs that generate reports on daily, weekly and monthly cycles.

●Familiarity with Hadoop cluster setup & configurations

●Led the handling of all ORC issues and provided a solution using the RCFile format.

●Responsible for reviewing logs and troubleshooting issues in MapReduce jobs.

●Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing (a minimal sketch follows this list).

●Creation of stories on Jira for the Agile development tasks.

●Working closely with the Agile team and the product owner to develop high-quality products on an incremental basis.
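
A minimal Scala sketch of the Spark SQL access to Hive tables referenced above, written as an ETL-style load into a partitioned Hive table; the LMS database, table and column names are hypothetical:

import org.apache.spark.sql.SparkSession

object HiveLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-etl-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Source table and column names are placeholders for the LMS data pulled in via Sqoop
    val completions = spark.sql(
      """SELECT learner_id, course_id, completion_date
        |FROM lms_staging.course_completions
        |WHERE completion_date IS NOT NULL""".stripMargin)

    // Write into a partitioned Hive table so downstream reports can prune by date
    completions.write
      .mode("overwrite")
      .partitionBy("completion_date")
      .saveAsTable("lms_mart.course_completions")

    spark.stop()
  }
}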

Key Achievements:

Successfully completed the objective of transforming the ETL into Hadoop.

Environment:

Hadoop framework, Hortonworks, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Oozie, core Java, Ubuntu (Linux distro).

SumTotal (a Skillsoft company), Gainesville, FL Nov 2012 to Aug 2014

Business Intelligence Lead

Responsible for:

●E-learning and HCM products feature enhancements & sustenance.

●Implemented SQL Server transactional replication between the HCM system and the reporting server.

●Using the SQL Change Tracking (CT) mechanism, designed functions and views to capture the delta changes between the previous load and the current data.

●Developed SSIS packages to pull the full data load and incremental loads from the CT tables/views.

●Created iReports for JasperReports to give insight into LMS course adherence and defaults.

●Responsible for the creation of Jasper domains, ad-hoc views & dashboards.

●Triaging the SEG cases and assigning them to the team.

●Getting into customer calls with support team for P0/P1 escalated cases.

●Working on critical cases myself in addition to leading the team.

●Handling some of the most challenging performance tuning issues reported by customers, on both the reporting and ETL sides.

●Coordinating with the US team to hand off priorities.

Key Achievements:

Handled a hosted and on-premise customer base of 3,000+, unblocking them from issues across 6 releases.

Consistently maintained the backlog below 25, against an average weekly inflow of 25-30 cases.

Brought down the incoming cases by doing root cause analysis and fixing issues at their source.

Took care of APAC & EMEA customers' P0 SLAs with on-call support.

Environment:

SQL server 2008 R2, SSIS, SSRS, Jaspersoft iReports, SQL Replication, Power BI, Microsoft Azure, C# .net, SVN

Farm Credit Canada, Credit financing, Hyderabad, India Oct 2010 to Nov 2012

Business Intelligence Lead Developer

Responsible for:

●Enterprise Data Warehouse (EDW) design

●ETL jobs development using the SAP BODI

●Reviewing the business logic in comparison with ETL designs.

●Deployment of the intermediate tables on the staging environment.

●ETL test strategizing to verify that the transformation logic holds good.

●Test script preparation using SQL Server for validating the data between target and source.

●Logging tickets in QC for data issues that are not in line with requirements and fail in the test environment.

●Re-validating the SQL scripts once the ETL job is fixed and pushing the package to the pre-production environment.

Key Achievements:

Streamlined the ETL process with automated validation scripts to fetch the sync results on a daily basis.

Environment:

SQL server 2008 R2, SSIS, SSRS, SAP BODI, Oracle, HP QC

Danske Bank, Denmark March 2008 to Aug 2010

BI Developer

Responsible for:

●Exploration Warehouse development for retail banking.

●Archival status reports using SSRS

●SSIS packages to load from mainframe datasets, external flat files

●Database maintenance and auditing.

●Stored procedure creation for loading the data through a DB2 linked server

●SSAS cube design & development

●Archiving the database at threshold limit.

●Data modeling for Classification (CL) BDW business concept.

●DataStage job creation from source to copy tables for CL

Environment:

SQL server 2005, SSIS, SSRS & DataStage, DB2

Nationwide Building Society, Mortgages and Banking, Swindon, UK Jan 2006 – Feb 2008

Database developer

Responsible for:

●RDS Monitor development and report generation.

●Retail cube development and maintenance.

●SSIS package logic handling.

●QC maintenance

●Analyzing the Change Requests (CRs) and preparing the IA doc.

●Writing scripts to resolve CRs.

●SSAS cube processing and query generation for business requirements.

Environment: SQL Server 2005, C# .NET


