
Lead Big Data/ Hadoop Architect

Location:
Dallas, TX
Salary:
130
Posted:
August 14, 2020


Resume:

KRISHNA CHAITHANYA

Lead Big Data/ Hadoop Architect

Email: adfbya@r.postjobfree.com

Contact: 469-***-****

Summary:

● A lead architect with substantial experience in prototyping, designing, and developing scalable applications using distributed computing technologies such as Hadoop and Apache Spark.

● Able to gather requirements and lead a team of developers through the implementation, testing, debugging, and validation phases.

● Identify use cases for building different processing streams such as batch, streaming, and fast read/write queries; assist with cluster design and sizing, choice of technologies, and using analytics to drive better decision-making throughout the organization.

● 16+ years of Information Technology experience with specialized knowledge in Big Data, Object Oriented Analysis and Design (OOAD), and Functional Programming.

● Over 12 years of Financial Domain experience working with Market Data Providers, Stock Exchanges, and Investment Banks.

● Experience implementing Market Risk/Credit Risk, BCBS239, MIFID2, and FRTB regulatory requirements.

● Good experience writing ontologies for the semantic web and graph databases.

● Developed Apache Spark applications in the Hadoop ecosystem for real-time streaming data using Scala and Kafka (a brief illustrative sketch follows this summary).

● Used Scala libraries such as Cats and Akka.

● Designed/developed a real-time Oracle RDBMS-to-Hadoop ingestor that supports real-time updates, using Oracle GoldenGate and Apache Spark/Scala.

● Designed and developed a Trade Reconciliation project using Hadoop/Java, Scala, Pig, Hive, and HBase.

● Extensive experience ingesting data into Hive and Impala and designing queries and schema for better performance of Hive/Impala queries.

● In-depth understanding of Hadoop architecture and its various components such as JobTracker, TaskTracker, NameNode, DataNode, and ResourceManager.

● In-depth understanding of MapReduce and AWS Cloud concepts and their critical role in the analysis of huge and complex datasets.

● Hands on experience in ingesting data into Data Warehouse using various data loading techniques.

● Experienced in processing large datasets of different forms including structured, semi-structured and unstructured data.

● Expertise in usage of Hadoop and its ecosystem commands.

● Expertise in designing tables in Hive and MySQL, and in using Sqoop to import and export data between databases and HDFS.

● In-depth understanding of Spark architecture including Spark Core, Spark SQL, and DataFrames.

● Skilled in streaming data using Apache Spark and migrating data from Oracle to Hadoop HDFS using Sqoop.

● Expertise in using Spark-SQL with various data sources like JSON and Parquet.

● Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.

● Imported data from different sources like HDFS/HBase into Spark RDDs.

● Proficient in processing data using Apache Pig by registering User Defined Functions (UDFs) written in Java.

● Experienced in the integration of various data sources including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.

● Successfully delivered proofs of concept for Hadoop projects and designed the entire Hadoop ecosystem.

● Administered Hadoop using Apache Ambari. Hands on experience with Cloudera Hadoop.

● Used Sqoop/Flume agents for data ingestion into HDFS from databases and log files.

● Day-to-day interaction with business and end clients to gather accurate requirements; wrote various design documents and functional specifications.

● Worked with Hadoop 2.0(YARN) and Spark framework.

● Designed and developed a Market Data Repository using Hadoop/Java on a multi-node cluster.

● Extensive experience in data warehousing using Hadoop and NoSQL databases like MongoDB and HBase.

● Set up and administered a multi-node Hadoop cluster from scratch.

● Fine-tuned various memory & performance issues in Hadoop Jobs.

● Designed and developed Trade Latency Reporter project using Hadoop.

● Cross functional experience using Java/C++ in Finance (Real time Market Data/Investment Banking/Capital Market/derivatives trading algos/exchange connectivity) and Telecom Domain.

● Extensive experience in Developing and Configuring distributed computing systems.

● Extensive experience in using Collections and Generic programming in Java.

● Proven expertise in OOAD (Java on UNIX) of high performance, high volume, and distributed application with UML and various design patterns, skilled at progressing from problem statement to well documented design.

● Developed Java applications using Singleton, Template, Observer, Visitor and factory design patterns.

● Developed Java applications for IBM WebSphere MQ Connectivity in multi-threaded environments.

● Experience in Database Design and Query tuning in Oracle, Sybase and MySQL.

● Significant exposure to Performance improvement and Scalability on application and database side.
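Illustrative sketch (hypothetical, not taken from any project below): a minimal Scala example of the Kafka-to-Spark-to-HDFS streaming pattern referenced in the summary above. The topic name, schema, broker address, and paths are placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-ingestor") // hypothetical application name
      .getOrCreate()
    import spark.implicits._

    // Assumed message layout: JSON trade records on a placeholder "trades" topic
    val schema = new StructType()
      .add("tradeId", StringType)
      .add("symbol", StringType)
      .add("price", DoubleType)
      .add("eventTime", TimestampType)

    // Read the raw Kafka stream (broker address is a placeholder)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "trades")
      .load()

    // Parse the JSON payload into typed columns
    val parsed = raw
      .select(from_json($"value".cast("string"), schema).as("t"))
      .select("t.*")

    // Write micro-batches to HDFS as Parquet, partitioned by trade date
    parsed.withColumn("tradeDate", to_date($"eventTime"))
      .writeStream
      .format("parquet")
      .option("path", "/data/trades")              // placeholder HDFS path
      .option("checkpointLocation", "/chk/trades") // placeholder checkpoint directory
      .partitionBy("tradeDate")
      .start()
      .awaitTermination()
  }
}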

Technical Skill Set:

Programming Languages: Scala, Java, C, C++, PL/SQL

Big Data Technologies: Hadoop, Hive, HBase, Pig, MongoDB, Spark, Flume, Sqoop, Impala, Kafka

Scripting Languages / DevOps Tools: Python, UNIX Shell, gawk, VMS Scripting, KSH, HTML, Ansible, Docker

Relational Databases: MySQL, Oracle, Sybase, SQL Server

Operating Systems: Red Hat Linux, OpenVMS, HP-UX, MS Windows, Sun Solaris, SCO

OOAD: UML, Design Patterns, Rational Rose

Configuration Management: ClearCase, SCCS, CVS, Perforce, Git

IDEs: IntelliJ, Eclipse, Visual Studio

Web: RESTful, JavaScript

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

NoSQL Databases: HBase, MongoDB, Cassandra

Certifications:

● Cloudera Certified Developer for Apache Hadoop (CCDH) License: 100-010-307

● Java 7 Oracle Certified Professional

● Brain Bench Certifications for C++

Work Experience:

Bank of America Merrill Lynch (New York City, NY) Aug 2019 – Present

Role: Lead Big Data / Hadoop Architect

Description: Cesium is an enterprise reference data system that contains data about parties, accounts, and many other entities. Helix is a big data platform used across the bank as a data mart.

Cesium Reference Data Platform

Involved in following Scala/Spark and Big Data Projects:

● Highly scalable Spark/Scala ingestor to stream real-time Oracle data with updates to Hadoop/Hive/Impala.

● Integrated Oracle Golden Gate with Kafka/Spark/HDFS for real time ingestion of data.

● Worked on Spray/Play/Akka web apps used across the bank for reference data.

● Worked on Bank Reference Data technology with Graph databases including RDF Semantic web.

● Very good working knowledge of writing complex SPARQL queries.

● Very good knowledge of various systems in the reference data domain.

Responsibilities:

● In this role I am the architect/developer for all of the projects involved.

● Designed a customised, highly scalable read/write solution on the Hadoop File System.

● Worked on Graph databases on Big Data Stack.

● Performed Spark Streaming and micro-batch processing using Scala as the programming language.

● Used Hive scripts in Spark for data cleaning and transformation.

● Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.

● Exported the analysed data to relational databases using Sqoop for visualization and to generate reports for the BI team.

● Created a data pipeline process for structuring, processing, and transforming data using Kafka and Scala.

● Created Kafka/Spark Streaming data pipelines for consuming data from external sources and performing transformations in Scala.

● Contributed towards developing a Data Pipeline to load data from different sources like Web, RDBMS, NoSQL to Apache Kafka or Spark cluster.

● Evaluated and improved the architecture to process data quickly and optimize query time.

● Extensively used Pig for data cleansing and created partitioned tables in Hive (see the sketch following this responsibilities list).

● Used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting.

● Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.

● Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.

● Created custom Python/shell scripts to import data via Sqoop from Oracle databases.

● Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and EMR job logs using Genie and Kibana.

● Designed efficient load-and-transform Spark code using Python and Spark SQL, which can be forward-engineered by our code-generation developers.

● Fine-tuned and refactored existing Spark jobs to a more functional, less imperative style.

● Trained various teams in the USA, the UK, and offshore locations in using the Big Data stack effectively.

● Acted as a consultant for various teams in their project architecture review meetings.
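Sketch referenced in the responsibilities above (hypothetical table and column names, assuming Spark with Hive support): cleaning data with Spark SQL and loading it into a partitioned Hive table.

import org.apache.spark.sql.SparkSession

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read and write Hive metastore tables
    val spark = SparkSession.builder()
      .appName("partitioned-hive-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging table populated by an upstream Sqoop import
    val cleaned = spark.sql(
      """SELECT party_id,
        |       trim(upper(party_name)) AS party_name,
        |       to_date(load_ts)        AS load_date
        |FROM   staging.parties_raw
        |WHERE  party_id IS NOT NULL""".stripMargin)

    // Create (or replace) a Parquet-backed Hive table partitioned by load_date
    cleaned.write
      .mode("overwrite")
      .partitionBy("load_date")
      .format("parquet")
      .saveAsTable("refdata.parties_clean") // hypothetical target table
  }
}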

Barclays Investment Bank, New York City, NY Nov 2017 – July 2019

Role: Big Data / Hadoop Architect

Description:

The Radial initiative at Barclays brings all data across the entire bank into a Hadoop data warehouse. Various projects across different teams globally are built on this platform.

Market Risk Hub (Radial Projects)

Involved in following Big Data Projects

● Generic Data Quality & Controls Project

● Spark Data Ingestion and Scala orchestration projects.

Generic Data Quality & Controls Project: This project is event-based and performs data quality and control checks for any number of configured rules. The rules are defined in JSON config files and stored in HBase. Continuously running Spark jobs perform these controls and write the results back to HDFS/Hive. The project uses HBase, Spark, Kafka, HDFS, Scala, the Cats library, REST web services, AngularJS, and the Akka HTTP framework.
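A simplified sketch of how one such configured control could be applied by a Spark job (the rule shape, tables, and predicates below are invented for illustration; the actual project stores rules as JSON documents in HBase):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical shape of one configured control
case class ControlRule(ruleId: String, table: String, predicate: String)

object DataQualityControls {
  // Apply one rule: rows violating the predicate are flagged as breaks
  def runRule(spark: SparkSession, rule: ControlRule): DataFrame =
    spark.table(rule.table)
      .filter(not(expr(rule.predicate)))
      .withColumn("rule_id", lit(rule.ruleId))
      .withColumn("checked_at", current_timestamp())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dq-controls")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder rules; the real system loads these from HBase as JSON
    val rules = Seq(
      ControlRule("R001", "risk.positions", "notional IS NOT NULL"),
      ControlRule("R002", "risk.positions", "trade_date <= current_date()"))

    // Run every rule and append the breaks to a results table in Hive
    rules.map(runRule(spark, _))
      .reduce(_ unionByName _)
      .write.mode("append")
      .saveAsTable("dq.control_breaks") // hypothetical results table
  }
}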

Spark Data Ingestion/Orchestration Project: This ingests data into Hadoop as Avro/Parquet files from Oracle, SQL Server, large flat files, or any other source, either streaming or in batches. Because it writes to HDFS in Avro/Parquet with the schema embedded, the data is accessible to consumers using various technologies, and the project supports schema evolution.

Responsibilities:

● Designed and Developed Spark Engine for performing controls for data in HDFS.

● Developed RESTful web services using Scala/Akka HTTP.

● Wrote code to read/write data between HBase and case classes using Scala (see the sketch following this list).

● Fine-tuned Spark jobs for better performance.

● Debugged Spark/Scala applications for production issues.

● Involved in iteration planning to gather user stories, assign them to the team, and get deliverables ready for two-week iterations.

● Created a prototype to demonstrate an end-to-end solution to client implementation partners (i.e. ingest client-specific data, cleanse and standardize it to industry standards, process it into concept-specific data, load fact tables, and create KPIs and reports).

● Implemented Crunch pipelines (MapReduce jobs) to process standard data in the Hadoop cluster.

● Involved in production support and performance enhancement by analysing the Crunch pipeline plans, workflows, joins, configuration parameters etc.

● Enhanced the data-ingestion platform to move towards a data hub, i.e. a centralized location (HDFS/HBase) that holds the raw data to prevent silos and data duplication, with near-real-time delivery using Kafka and Storm and raw-data archiving for batch processing using Crunch.

● Loaded processed data to Solr nodes and used filters and aggregations to perform OLAP operations, i.e. slicing, dicing, and roll-ups.

● Wrote Unit and integration tests for the project covering all code.

● Worked extensively with Scala Futures, Promises, and Akka actors.
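Sketch referenced in the responsibilities above: mapping an HBase row into a Scala case class using the standard HBase client API. The table, column family, and qualifiers are hypothetical placeholders.

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical domain type held in HBase
case class RuleConfig(ruleId: String, predicate: String, enabled: Boolean)

object HBaseCaseClassReader {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val table = connection.getTable(TableName.valueOf("dq:rules")) // placeholder table

      // Fetch one row by key and map the "cf" column family into the case class
      val result = table.get(new Get(Bytes.toBytes("R001")))
      val rule = RuleConfig(
        ruleId    = Bytes.toString(result.getRow),
        predicate = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("predicate"))),
        enabled   = Bytes.toBoolean(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("enabled"))))

      println(rule)
      table.close()
    } finally connection.close()
  }
}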

Bank of America Merrill Lynch, New York City, NY Nov 2016 - Oct 2017

Senior Scala/Spark/ Java Developer

Description: Cesium is an enterprise reference data system that contains data about parties, accounts, and many other entities. Helix is a big data platform used across the bank as a data mart.

Cesium Reference Data Platform

Involved in following Scala/Spark and Big Data Projects:

● Highly scalable Spark/Scala ingestor to stream real-time Oracle data with updates to Hadoop/Hive/Impala.

● Integrated Oracle Golden Gate with Kafka/Spark/HDFS for real time ingestion of data.

● Worked on Spray/Play/Akka web apps used across the bank for reference data.

● Worked on Bank Reference Data technology with Graph databases including RDF Semantic web.

Responsibilities:

● In this role I was the architect/developer for all of the projects involved.

● Designed a customised, highly scalable read/write solution on the Hadoop File System.

● Worked on Graph databases on Big Data Stack.

● Discussed the solution design with client business teams and changed requirements as needed.

● Worked on importing and exporting data from DB2 into AWS and Hive using Sqoop for analysis, visualization, and report generation.

● Developed Pig Latin scripts to perform Map Reduce jobs.

● Developed product profiles using Pig and commodity UDFs.

● Developed Hive scripts in HiveQL to de-normalize and aggregate the data.

● Created HBase tables and column families to store the user event data.

● Wrote automated HBase test cases for data quality checks using HBase command-line tools.

● Used Hive and Impala to query the data in HBase.

● Developed and implemented core API services using Java/Scala/Python and Spark.

● Converted CSV files into Parquet format, loaded the Parquet files into data frames, and queried them using Spark SQL (see the sketch following this list).

● Migrated data from Amazon AWS to databases such as MySQL and Vertica using Spark data frames.

● Built a continuous ETL pipeline using Kafka, Spark Streaming, and HDFS.

● Performed ETL on data in different formats such as JSON, Parquet, and databases, then ran ad-hoc queries using Spark SQL.

● Performed complex data transformations in Spark using Scala.

● Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.

● Connected Tableau and SQuirreL SQL clients to Spark SQL (Spark Thrift Server) via data sources and ran queries.

● Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.

● Managed and scheduled Jobs on a Hadoop cluster.

● Fine-tuned and refactored existing Spark jobs to a more functional, less imperative style.

● Trained various teams in the USA, the UK, and offshore locations in using the Big Data stack effectively.

● Acted as a consultant for various teams in their project architecture review meetings.
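Sketch referenced in the responsibilities above: converting CSV to Parquet and running ad-hoc Spark SQL over the result. The paths, columns, and query are placeholders, not project code.

import org.apache.spark.sql.SparkSession

object CsvToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-to-parquet").getOrCreate()

    // Read a CSV extract (path and layout are placeholders)
    val csv = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/landing/accounts/*.csv")

    // Persist it as Parquet for efficient downstream querying
    csv.write.mode("overwrite").parquet("/curated/accounts")

    // Load the Parquet back and run an ad-hoc SQL query over it
    spark.read.parquet("/curated/accounts").createOrReplaceTempView("accounts")
    spark.sql(
      """SELECT country, count(*) AS account_count
        |FROM accounts
        |GROUP BY country
        |ORDER BY account_count DESC""".stripMargin)
      .show(20)
  }
}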

Barclays Investment Bank, New York City, NY Feb 2015 – Nov 2016

Role: Senior Hadoop Developer

Description: The Mercury initiative at Barclays brings all data across the entire bank into a Hadoop data warehouse. Various projects across different teams globally are built on this platform.

Market Risk Hub (Mercury Projects)

Involved in following Big Data Projects

● BCBS239 Regulatory Reconciliation Project (June 2015 – Present)

● Generic Spark Data Ingestion Project (Feb 2015 – May 2015)

The BCBS239 Regulatory Reconciliation Project reconciles all risk data from the front office with risk data in the back office. Because this data is heavily transformed for reporting purposes, the project performs many transformations on front-office/back-office data in order to reconcile it. This is a mandatory regulatory requirement. Various user-configured rules are applied to the data before reconciling. The project uses HBase, Spark, Kafka, HDFS, Scala, REST web services, AngularJS, and the Spray framework.

Generic Spark Data Ingestion Project: This ingests data into Hadoop as Avro/Parquet files from Oracle, SQL Server, large flat files, or any other source, either streaming or in batches. Because it writes to HDFS in Avro/Parquet with the schema embedded, the data is accessible to consumers using various technologies, and the project supports schema evolution.
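A minimal sketch of why writing Parquet with an embedded schema supports the schema evolution described above (paths and columns are hypothetical; this illustrates the technique, not the project's code):

import org.apache.spark.sql.SparkSession

object SchemaEvolutionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("schema-evolution-demo").getOrCreate()
    import spark.implicits._

    // First extract: two columns
    Seq(("T1", 100.0)).toDF("trade_id", "notional")
      .write.mode("append").parquet("/ingest/trades")

    // Later extract: a new column appears; each Parquet file carries its own
    // schema, so the new shape can simply be appended alongside the old files
    Seq(("T2", 250.0, "USD")).toDF("trade_id", "notional", "currency")
      .write.mode("append").parquet("/ingest/trades")

    // Consumers merge the file schemas at read time; older rows show the
    // new column as null
    spark.read.option("mergeSchema", "true").parquet("/ingest/trades").show()
  }
}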

Responsibilities:

● Designed and developed a Spark engine for reading data in HDFS using DataFrames/RDDs for the Reconciliation project.

● Developed RESTful web services using Scala/Spray.

● Wrote code to read/write data between HBase and case classes using Scala.

● Fine-tuned Spark jobs for better performance.

● Strong HDFS and MapR-FS file system knowledge and experience handling data in various file formats such as ORC and Parquet.

● Specialized in Big Data/Hadoop applications, retail POS systems, and payment gateway processing and solutions.

● Experience in working with NoSQL databases such as HBase, MapR DB, and integrating other components like Pig and Hive.

● Strong skills in RDBMS concepts and some products such as SQL Server 2008, Oracle 11i, MySQL, DB2, Flat-files, VSAM and DMS2200.

● Hands-on experience in executing the projects in Agile methodology using Rally application.

● Excellent knowledge in Hadoop distributions such as Hortonworks, Cloudera and MapR.

● Used RESTful Web services for transferring data between applications.

● Developed POJO classes and used annotations to map with database tables.

● Responsible for gathering specifications, analyzing and designing the system, developing the modules with the below technologies, implementing business logic, and preparing unit test cases.

● Debugged Spark/Scala applications for production issues.

● Wrote Unit and integration tests for the project covering all code.

● Worked extensively with Scala Futures, Promises, and Akka actors (a small illustrative sketch follows).
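Illustrative sketch of composing Scala Futures as mentioned above; the downstream service calls are stand-ins, not actual project code.

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureComposition {
  // Stand-ins for asynchronous calls to downstream services
  def fetchFrontOfficeRisk(tradeId: String): Future[BigDecimal] = Future(BigDecimal(100))
  def fetchBackOfficeRisk(tradeId: String): Future[BigDecimal]  = Future(BigDecimal(98))

  def main(args: Array[String]): Unit = {
    // Start both lookups so they run concurrently, then combine the results
    val frontOffice = fetchFrontOfficeRisk("T1")
    val backOffice  = fetchBackOfficeRisk("T1")

    val breakAmount: Future[BigDecimal] =
      for {
        fo <- frontOffice
        bo <- backOffice
      } yield (fo - bo).abs

    // Block only at the edge of the program (e.g. in a test or a main method)
    println(Await.result(breakAmount, 5.seconds))
  }
}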

Euronext Technologies May 2010 - Jan 2015

Senior Systems Analyst / Java Developer

Liffe Connect Trading Platform

Involved in following Big Data Projects: June 2012 – Jan 2015

• Trade Reconciliation Project.

• Market Data Repository.

• Trade Latency Reporter.

• Used JDBC for database connectivity with MySQL Server.

• Experienced in analysing data using HiveQL and Pig Latin and custom MapReduce programs in Java.

• Worked hands on with ETL process.

• Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.

• Extracted the data from Teradata into HDFS using Sqoop.

• Analysed the data by performing Hive queries and running Pig scripts to understand user behaviour such as shopping patterns.

• Exported the patterns analysed back into Teradata using Sqoop.

• Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.

• Installed the Oozie workflow engine to run multiple Hive jobs.

• Developed Hive queries to process the data and generate data cubes for visualization.

• Diverse experience in utilizing Java and Python tools in business, web, and client-server environments, including the Java platform, JSP, Servlets, JavaBeans, JSTL, JSP custom tags, EL, JSF, and JDBC.

Involved in following Java/C++ Projects: May 2010 – Jun 2012

● Trade Capture and Reporting (Core Java, JDBC, JMS, Multi-Threading)

● Low Latency Trading Development (Core Java, C++, Containers, Multi-Threading)

● Actively involved in various phases of the Software Development Life Cycle; the project was developed using Agile (Scrum, sprint) methodologies.

● Customized a RESTful web service using a RESTful API, sending XML-format data packets between the front end and the middle-tier controller.

● Designed functionality for integrating API services, deployment activities, and testing.

● Extended and updated the REST API and created a client API library.

● Performance optimization and debugging

● Contributed user interface design mock-ups and designs

● Used MAVEN 3 for building the application and deployed on WebSphere 6.1 Application Server.

● Used Maven to build the J2EE application.

● Provided assistance to back-end developers in troubleshooting and coding.

● Called RESTful web services for POST and GET methods.

● Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Model

Tools/Environment: Java, Collections, Hadoop, HBase, MongoDB, Hive, Sqoop, Oracle, PL/SQL, Purify, UML, Python, ClearCase

Thomson Reuters Feb 2008 – Apr 2010

Role: Senior Developer Java/C++/Database

Worked in below projects:

● Entitlement Replica Subsystem

● Apache Proxy Server

● Exchange Working Hours

● Involved in analysis and design of the application.

● Involved in preparing the detailed design document for the project.

● Developed the application using J2EE architecture.

● Involved in developing JSP forms.

● Designed and developed web pages using HTML and JSP.

● Designed various applets using JBuilder.

● Designed and developed Servlets to communicate between presentation and business layer.

● Used EJB as a middleware in developing a three-tier distributed application.

Tools/Environment: Java, Collections, Solaris, Sybase, Oracle, Core Java, J2EE, JSP, Servlets, XML, XSLT, UML, Shell scripts

Hewlett Packard Jul 2006 – Jan 2008

Role: Senior Developer Java/C++

• Developed session beans and entity beans for business and data processing.

• Used JMS in the project for sending and receiving the messages on the queue.

• Developed the Servlets for processing the data on the server.

• Transferred the processed data to the database through entity beans.

• Used JDBC for database connectivity with MySQL Server.

BASEstar Open 3.2 product suite

Tools/Environment: C, C++, and Java on OpenVMS, HP-UX, Tru64 UNIX and Windows; STL, Oracle

Nokia India Pvt Ltd Feb 2004 - Jul 2006

Worked in below projects:

● TELNET Gateway for HIA (Host Interface for Administration) Server

● Secure Messaging Gateway (SMG) for NOKIA SMSC

● NOKIA MSC CDR DECODER

Education:

Bachelor’s Degree in engineering, India, 2004

Major: Electronics and communications engineering.


