
Big Data Architect

Location:
Cincinnati, OH
Posted:
April 03, 2020


Resume:

Santosh

adcl7u@r.postjobfree.com

Contact No: +1-513-***-****

PROFESSIONAL SUMMARY

Around 10 years of experience in software development and analysis. Practical experience building industry-specific Java applications and implementing Big Data technologies such as Apache Hadoop and the NoSQL database DataStax Cassandra. Configured, tuned and set up tools such as Apache Kafka, PIG, SQOOP, Hive, HBase and OOZIE in various Hadoop distributions for industry-specific needs. Implemented Java APIs and created custom Java programs to make full use of Hadoop and its related tools.

●Big Data Orchestration and Implementation: Collaborated with lines of business to understand their needs and build the Big Data environment for the organization. Implemented the tools and technologies for creating clusters and developing data archives and data warehouses. Orchestrated the deployment of the environment by working with enterprise-wide teams

●Hadoop Development and Administration: Experienced in installing, configuring and administering Hadoop clusters across major Hadoop distributions and in optimizing performance by tuning the cluster for better results. Worked with multiple distributions of Hadoop, including the enterprise version of IBM BigInsights, Cloudera (CDH4, CDH5, CDH6), Hortonworks HDP (v2.1, v2.2) and open source Apache Hadoop

●Real-Time Messaging (Kafka): Architected and implemented both on-premise and cloud distributions of the real-time messaging system Kafka. Hands-on experience installing Kafka distributions from Apache, Confluent and Amazon Web Services

●Data Ingestion: Analyzed traditional storage systems to design and implement the data ingestion procedures required for Hadoop clusters, using migration tools such as SQOOP and Talend.

●Data Modeling: Performed data modeling and analysis on HDFS and NoSQL databases. Provided cluster tuning to derive optimal results

●Scripting and Reporting: Created scripts for performing data analysis with PIG, HIVE and IMPALA. Generated reports, extracts and statistics on the distributed data in the Hadoop cluster. Created Java APIs for retrieval and analysis on NoSQL databases such as HBase, Cassandra and MongoDB.

●Custom Coding: Built custom MapReduce code in Java tailored to specific requirements. Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in PIG and Hive (see the UDF sketch after this list). Used HCatalog for simple query execution. Wrote code and built JAR files for functionality not available out of the box in PIG and Hive. Used Maven to automate building and packaging the JAR files for custom tasks.

●Modern Hadoop Technologies: Developed SPARK applications using Scala to ease future Hadoop transitions. Implemented text search analysis using Lucene and Elasticsearch. Gained hands-on experience implementing Kafka as a persistent messaging system.

●Java Experience: Created applications in core Java and built client-server applications requiring database access and constant connectivity using JDBC, JSP, Spring and Hibernate. Implemented web services for network-related applications in Java.

●Interface Design: Created front end user interface using HTML, CSS and JavaScript along with validation techniques. Implemented Ajax toolkit for validation with GUI. Worked with image editing tools such as Photoshop and Adobe Lightroom.
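
Illustrative sketch (not project code): a minimal Hive UDF of the kind described in the Custom Coding bullet above. The package, class and the normalization logic are hypothetical; the sketch assumes Hive's standard UDF base class and packaging into a JAR (for example with Maven).

```java
package com.example.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example: normalize free-form identifiers before joining datasets.
// Packaged as a JAR and registered in Hive via:
//   ADD JAR custom-udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_id AS 'com.example.udf.NormalizeId';
public final class NormalizeId extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // pass NULLs through unchanged
        }
        String cleaned = input.toString()
                .trim()
                .toUpperCase()
                .replaceAll("[^A-Z0-9]", "");  // strip punctuation and spaces
        return new Text(cleaned);
    }
}
```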

TECHNICAL SUMMARY

Big Data Technologies: IBM BigInsights, Cloudera (CDH4/CDH5/CDH6), Hortonworks HDP (v2.1/v2.2), Apache Hadoop, DataStax Cassandra

Tools in Big Data: HDFS, MapReduce, Hadoop2/YARN, PIG, HIVE, HBASE, SQOOP, FLUME, OOZIE, HUE, Lucene, ElasticSearch, Kafka, IMPALA, SPARK, ZooKeeper, Mahout, Azkaban

Languages: Java, Python, PHP, Scala (Spark), PIG-Latin, HQL, SQL, PL-SQL, C++, C#, C# .Net, ASP .Net (3.5 and 4), Ajax toolkit (3.5 and 4)

Web Technologies: HTML, XML, CSS, JavaScript, JSP, JDBC, Maven, AJAX

Reporting Tools/ETL Tools: Tableau, Power View for Microsoft Excel, Informatica

Operating Systems: Windows, Linux, Unix, Red Hat Enterprise Linux (RHEL), Ubuntu, CentOS

Frameworks: Spring, Hibernate, JUnit

Databases: Oracle 9i, NoSQL databases (Cassandra and HBase)

IDE/Tools: Eclipse, NetBeans

EDUCATION

Master's in Information Technology from Southern New Hampshire University, 2015

Bachelor's in Computer Science and Technology from JNTU Hyderabad, 2009

CERTIFICATIONS

Acquired certifications from MIT, Datastax, IBM Big Data University, and MapR Academy for the following:

●MIT - Tackling the challenges of Big Data

●Datastax - Cassandra Architecture, Configuring Cassandra, CQL, Cassandra read and write

●MapR - Operations Cluster Management, Hadoop Essentials, Developing Hadoop Applications

●IBM Big Data University - Multiple IBM Big Data University badges

PROFESSIONAL EXPERIENCE

Fifth Third Bank - Cincinnati, OH April 2015 – Present

Project 1 – Big Data Architect July 2019 – Present

Summary:

Real-time streaming of data to multiple systems enterprise-wide ensures that all systems have updated information. To enable real-time messaging, the big data system Kafka needed to be stood up as an enterprise-wide platform on which systems can publish and subscribe to messages.

Environment:

Kafka – Confluent and Amazon Managed Kafka Service

Configuration:

●Cluster configuration – 3 Managed Cloud Clusters

Responsibilities:

●Designed end to end architecture for standing up Kafka clusters

●Participated in vendor talks to understand the product details

●Stood up clusters on both Confluent and Amazon Managed Kafka Service (MSK) platforms

●Worked with network teams to create VPC-peered connections from the clusters to cloud applications

●Worked with Cloud Engineering team to create Transit Gateway and Service Rail connection

●Created Topics for multiple teams to publish and subscribe to messages

●Set up Access Control Lists (ACLs) on the topics for production and consumption (illustrated in the sketch after this list)

●Worked with the development team on Terraform scripts to automate cluster creation on MSK

●Created a git repository for sharing the common code base
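
A minimal sketch of the topic and ACL provisioning described above, using Kafka's Java AdminClient. The broker addresses, topic name, partition/replication counts and principal are hypothetical placeholders; managed platforms such as Confluent Cloud or MSK additionally require their own TLS/SASL security settings.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class TopicProvisioner {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap servers; real clusters also need security configuration.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Create a topic for a consuming team (name, partitions and replication are placeholders).
            NewTopic topic = new NewTopic("payments.transactions", 6, (short) 3);
            admin.createTopics(List.of(topic)).all().get();

            // Allow a hypothetical application principal to consume from the topic.
            ResourcePattern pattern =
                new ResourcePattern(ResourceType.TOPIC, "payments.transactions", PatternType.LITERAL);
            AccessControlEntry entry =
                new AccessControlEntry("User:payments-consumer", "*",
                                       AclOperation.READ, AclPermissionType.ALLOW);
            admin.createAcls(List.of(new AclBinding(pattern, entry))).all().get();
        }
    }
}
```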

Project 2 – Big Data Architect – Developer September 2018 – June 2019

Summary:

Payment Archive Phase 2 aimed to create a better platform to provide all-in-one payment data. The initial Payment Archive was built on the traditional DB2 database, which caused scalability issues. The new platform is based on Hadoop ecosystem tools to create a scalable platform with real-time ingestion and messaging.

Environment:

Kafka, Spark, HBase, ElasticSearch

Configuration:

●Cluster configuration - 5 Edge, 9 Management and 28 Data Nodes running Hortonworks HDP on RHEL 7

●10 Gbps network from the switch to the cluster and 40 Gbps between switches

Responsibilities:

●Architected end to end ingestion and processing of the data from the source system to consumption layer

●Worked with development team to design multiple data streams for real-time, intra-day and batch processing through Kafka Queues

●Integrated Kafka connectors from the mainframe to push data into Kafka topics

●Worked with network teams to create the firewall rules required to connect data sources and Kafka queues

●Collaborated with enterprise-wide teams to gather approvals for production release

●Completed the governance process on time to remove potential roadblocks

●Connected Kafka queues to Spark Streaming to cleanse the data and load it onto HDFS (see the sketch below)

●Stored the processing results in columnar format in HBase for intraday batch file comparison

●Indexed the data for Elasticsearch consumption via Java scripts from the consumption layer/systems
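
A minimal sketch of the Kafka-to-Spark-to-HDFS flow referenced above, using Spark Structured Streaming's Java API. The brokers, topic, paths and the trivial "cleansing" step are hypothetical stand-ins; the project's actual HBase and Elasticsearch steps are not reproduced here.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class PaymentsStream {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("payments-archive-stream")
                .getOrCreate();

        // Subscribe to a hypothetical payments topic on placeholder brokers.
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "payments.transactions")
                .load();

        // Trivial cleansing step: decode the payload and drop empty records.
        Dataset<Row> cleansed = raw
                .selectExpr("CAST(key AS STRING) AS payment_key",
                            "CAST(value AS STRING) AS payload",
                            "timestamp")
                .filter("payload IS NOT NULL AND length(trim(payload)) > 0");

        // Land the cleansed stream on HDFS as Parquet, with checkpointing for recovery.
        StreamingQuery query = cleansed.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/payments/cleansed")
                .option("checkpointLocation", "hdfs:///checkpoints/payments")
                .start();

        query.awaitTermination();
    }
}
```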

Project 3 – Big Data Developer/Architect August 2017 – September 2018

Summary:

Enterprise Data Warehouse (EDW) Phase 2 aimed to create an enterprise data lake on Hadoop and ingest the data from source systems into the HDFS file system through various ingestion and validation mechanisms to replace the older EDW. The work involved collaborating with different stakeholders on long-term and short-term requirements for data consumption, and creating a new EDW that provides enterprise-wide search capabilities for consuming systems and agents

Environment:

IDAA DB2, Spark, HDFS, Hive, Talend, Custom and Proprietary ETL tools and parsers

Configuration:

●Cluster configuration - 5 Edge, 9 Management and 28 Data Nodes running Hortonworks HDP on RHEL 7

●10 Gbps network from the switch to the cluster and 40 Gbps between switches

Responsibilities:

●Understood the nature of the data and the data sources from the data organization

●Worked with the data organization to understand the nature of the data in the primary data source

●Architected the zones and layers per the datasets and query complexity for the end-to-end solution design

●Provided ingestion mechanisms through Spark and Talend for ETL

●Modeled the data to fit the Financial Services Logical Data Model (FSLDM) to provide a layer for ingestion into IDAA DB2

●Processed the data from the common structure into the integrated layer composed of HDFS and IDAA DB2

●Scoped the file types required to keep IDAA tables and Hadoop data files in sync

●Transformed the data from the integrated layer into the access layer hosted on DB2

●Provided mechanisms for delta capture of incremental changes to the data from source systems, with reload and retry support (see the sketch below)
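
A minimal sketch of the kind of delta-capture merge mentioned in the last bullet, using Spark SQL's Java API. The paths, key column and timestamp column are hypothetical, and the project's actual Talend/FSLDM transformations are not reproduced here.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.row_number;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

public class DeltaMerge {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("edw-delta-merge").getOrCreate();

        // Hypothetical locations for the integrated layer and the incoming delta extract.
        Dataset<Row> current = spark.read().parquet("hdfs:///edw/integrated/customer");
        Dataset<Row> delta   = spark.read().parquet("hdfs:///edw/landing/customer_delta");

        // Keep only the latest version of each record key across current data and the delta.
        WindowSpec latestFirst = Window.partitionBy(col("record_key"))
                                       .orderBy(col("load_ts").desc());
        Dataset<Row> merged = current.unionByName(delta)
                .withColumn("rn", row_number().over(latestFirst))
                .filter(col("rn").equalTo(1))
                .drop("rn");

        // Write the merged result to a new location; a failed run can simply be reloaded and retried.
        merged.write().mode(SaveMode.Overwrite).parquet("hdfs:///edw/integrated/customer_new");
    }
}
```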

Project 4 – Big Data Developer June 2016 – August 2017

Summary:

Strategic expansion of the business and its physical locations is important, and driving those decisions with technology and data gives better insight. Setting up new branches and locations required churning through large amounts of geospatial data to be ingested, transformed and reported on. Because the processing could create large volumes of temporary data requiring an expandable cluster, the requirement was to create a cloud cluster that could scale up to the workload with the least amount of delay.

Environment:

Spark, Python, Amazon Web Services – EC2 cluster

Configuration:

●Amazon Web Services EC2 cluster with expandable EBS storage

●CloudWatch integration for cluster monitoring

Responsibilities:

●Formatted and loaded geospatial data from the vendor into AWS EC2 cloud storage using direct data load to reduce cost

●Created an EC2 instance with larger EBS storage for temporary files

●Worked with network teams to provide firewall access to connect from on-premise to the AWS cluster

●Created and tuned the Spark scripts to handle heavy I/O operations on large geospatial data

●Sized the EBS and other storage/script requirements to keep the programs from freezing due to space constraints

●Accessed the data in the cloud and ran Spark programs to calculate the ideal spot to launch an office/branch in a given vicinity

●Created Spark scripts to search in grids, enumerate the candidate locations and eliminate the ones that are currently occupied or unavailable (see the sketch below)

●Integrated CloudWatch to monitor cluster performance and set up alerts to get notified when disk and memory usage approached capacity
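
A rough sketch of the grid-search idea described above. The project used PySpark on EC2; for consistency with the other examples this illustration uses Spark's Java API instead, and the columns, grid scheme, scoring and storage locations are hypothetical.

```java
import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.count;
import static org.apache.spark.sql.functions.floor;
import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BranchGridSearch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("branch-grid-search").getOrCreate();

        // Hypothetical vendor extract with latitude, longitude and a demand score per point.
        Dataset<Row> points = spark.read().parquet("s3a://geo-bucket/vendor/points");

        // Bucket points into ~0.01-degree grid cells and score each cell.
        Dataset<Row> cells = points
                .withColumn("cell_x", floor(col("longitude").multiply(100)))
                .withColumn("cell_y", floor(col("latitude").multiply(100)))
                .groupBy("cell_x", "cell_y")
                .agg(count(lit(1)).alias("point_count"),
                     avg(col("demand_score")).alias("avg_demand"));

        // Hypothetical list of cells already occupied by existing branches.
        Dataset<Row> occupied = spark.read().parquet("s3a://geo-bucket/branches/occupied_cells");

        // Eliminate occupied cells and rank the remainder by demand.
        Dataset<Row> available = cells.join(occupied,
                cells.col("cell_x").equalTo(occupied.col("cell_x"))
                     .and(cells.col("cell_y").equalTo(occupied.col("cell_y"))),
                "left_anti");

        available.orderBy(col("avg_demand").desc()).limit(20).show(false);
    }
}
```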

Project 5 – Big Data Developer April 2015 – May 2016

Summary:

To understand their customer base and their financial needs better, the financial institution wanted to build a financial needs assessment model based on spending patterns and historical data. This required integrating all the customer data, including recent purchase transactions, through different data channels such as Sqoop and batch ETL, and collecting it in a common file system, HDFS. Spending and behavioral patterns were derived by loading the data into Hive. Tableau was the visualization tool used to provide a central view of the customer profile.

Environment:

Cloudera Hadoop, Sqoop, Hive, Hue, Tableau, Red Hat Enterprise Linux (RHEL)

Configuration:

●15 Node Cluster running RHEL 7 with Cloudera CDH distribution.

●HiveQL for processing the data in HDFS to provide visualization to Tableau

Responsibilities:

●Created SQOOP ingestion process to extract the data from RDBMS

●Configured Batch ETL jobs to process the data via data pipeline

●Ingested data into HDFS and created an HCatalog schema to work seamlessly with queries against the data lake

●Wrote Hive queries and HiveQL scripts to process the data and provide consolidated data sets

●Used partitioning and bucketing of the data to improve query response times (see the sketch below)

●Developed Java programs to refine the data

●Created landing and processed zones in HDFS; transformation rules were implemented in Talend

●Data querying was done from Tableau using JDBC/ODBC connections with the Hive connector to provide visualization reports
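
To illustrate the partitioning and bucketing mentioned above, a minimal sketch that runs HiveQL over the standard Hive JDBC driver. The HiveServer2 host, database, table and column names are hypothetical; Kerberized clusters would need additional URL parameters.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CustomerSpendReport {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder HiveServer2 endpoint and credentials.
        String url = "jdbc:hive2://hiveserver.example.com:10000/analytics";

        try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Partition by transaction month and bucket by customer id to speed up typical queries.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS txn_curated (" +
                "  customer_id STRING, amount DECIMAL(12,2), merchant STRING)" +
                " PARTITIONED BY (txn_month STRING)" +
                " CLUSTERED BY (customer_id) INTO 32 BUCKETS" +
                " STORED AS ORC");

            // Aggregate over a single partition; partition pruning keeps the scan small.
            ResultSet rs = stmt.executeQuery(
                "SELECT customer_id, SUM(amount) AS total_spend" +
                " FROM txn_curated WHERE txn_month = '2016-01'" +
                " GROUP BY customer_id ORDER BY total_spend DESC LIMIT 10");

            while (rs.next()) {
                System.out.println(rs.getString("customer_id") + " -> " + rs.getBigDecimal("total_spend"));
            }
        }
    }
}
```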

Hi-Caliber-It Solutions – India March 2011 – September 2013

Project 1 – Big Data / Java Developer August 2012 - September 2013

Summary:

Created a Java program to compute Value at Risk (VaR) for any chosen firm or individual using a Monte Carlo simulation model. Created MapReduce code to handle large-scale data processing. Sqoop jobs and PIG and Hive scripts were created for data ingestion from relational databases to compare against historical data.

Environment:

Apache Hadoop, Cloudera Manager CDH4, CentOS, Java, MapReduce, Eclipse Indigo, PIG, Hive, Sqoop, Oozie and SQL, JUnit.

Responsibilities:

●Installed and configured Apache Hadoop cluster for data storage

●Installed and configured PIG, Hive, Sqoop and Oozie workflow for multiple jobs

●Created simple and complex MapReduce jobs to pre-process data before ingestion into the Hadoop cluster (see the sketch after this list)

●Created MapReduce jobs using PIG scripts and Hive

●HBase was used initially while benchmarking

●Created Java application that analyzes data using Monte Carlo simulations.

●Optimized PIG and Hive UDFs to work efficiently with the data sampling

●Migrated ETL processes from Oracle to Hive for performing aggregate functions

●Monitored the Hadoop cluster with Cloudera Manager

●Implemented Oozie Workflow to run multiple PIG scripts and Hive queries
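
A compact example of the kind of MapReduce pre-processing described above. The CSV input layout and field positions are hypothetical; the job simply sums a numeric exposure value per instrument key ahead of a downstream simulation step.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExposureSum {

    // Hypothetical CSV input: instrument_id,trade_date,exposure
    public static class ExposureMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length == 3) {
                context.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[2])));
            }
        }
    }

    // Sum exposures per instrument; also usable as a combiner since summation is associative.
    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double total = 0.0;
            for (DoubleWritable v : values) {
                total += v.get();
            }
            context.write(key, new DoubleWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "exposure-sum");
        job.setJarByClass(ExposureSum.class);
        job.setMapperClass(ExposureMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```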

Project 2 – Java Developer / Java Web Application Developer March 2011 - August 2012

Summary:

The project, Hire-A-Contractor, is a web-based application that allows homeowners to easily find contractors in their area for their specific needs. The application is intended to provide easy sign-up and sign-in options for contractors and homeowners, and allows homeowners to post their projects for contractors to pick up and quote.

Environment:

Java, J2EE, NetBeans, Oracle 2010, Crystal Reports, Windows XP

Responsibilities:

●Programmed in Java to create interfaces and file uploads for quote images

●Created JavaScript to validate customer inputs on the web application

●Implemented Captcha coding to deter scammers

●Implemented client- and server-side validations to check input data using custom scripts

●Optimized Java code to provide low-latency processing

●Created JDBC/ODBC connections to Oracle for input and retrieval of customer data (see the sketch below)

●Created time-out code for projects that are bid- and time-based

●Created code to push daily Crystal Reports
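
A minimal sketch of the JDBC access pattern noted in the responsibilities above. The connection URL, credentials, table and columns are hypothetical; connection pooling and the ODBC path are omitted.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class QuoteDao {
    public static void main(String[] args) throws Exception {
        // Placeholder Oracle thin-driver URL and credentials.
        String url = "jdbc:oracle:thin:@dbhost.example.com:1521:ORCL";

        try (Connection conn = DriverManager.getConnection(url, "app_user", "app_password");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT quote_id, contractor_name, amount FROM project_quotes WHERE project_id = ?")) {

            ps.setLong(1, 42L);  // hypothetical project id
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%d %s %.2f%n",
                            rs.getLong("quote_id"),
                            rs.getString("contractor_name"),
                            rs.getDouble("amount"));
                }
            }
        }
    }
}
```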

Cerebrum Software Solutions, India April 2010 - Feb 2011

Senior Developer

Summary:

The Tax Efficient Advisor Institute's online initiative aims to provide webinars about tax-saving efficiency, offered online at the user's convenience. The project provides webinars for registered users, accepts registrations for new webinars with a one-minute walkthrough as registration progresses, and implements a payment gateway for paying the fees for courses registered by the users

Environment:

C#.net, ASP.Net (v.3.5), Visual studio 2008 professional, SQL Server 2008, Windows XP

Responsibilities:

●Created Active Server Pages (ASP) for different portions of the website

●Created Master Pages to keep the schema and styling of the website alike

●Created database schema for querying the required data for concurrent operations on the database

●Used the Ajax toolkit for validation of the customer input

●Created JDBC/ODBC connections to the SQL database to retrieve user and payment information

●Integrated payment gateway API to make payment towards the courses registered by the users

●Programmed a built-in emailing system to notify users of their new and upcoming courses

Wipro Technologies/ Hewlett Packard – India September 2009 – March 2010

Business Process Associate

Project Summary:

Provided technical support for customers of Hewlett Packard notebooks based in North America who faced software and hardware issues. These issues needed software and hardware support from experts who could track down the issue and resolve it. The responsibilities included providing the necessary solution for notebook-related issues, following troubleshooting steps to track and resolve an issue, escalating to higher levels as needed, and transferring to other departments if the issue did not concern the area of support

Environment:

Troubleshooting Tool - Astrix, Logging Tool - Avira, Windows XP

Responsibilities:

●Provided necessary support over the phone for customers

●Probed customers about the how and what of their activity to understand the cause of the issue

●Remoted into customers' devices to provide solutions more easily

●Helped customers understand the cause of the issue and provided tips to avoid running into other issues


