
Cobol Data

Location:
Hyderabad, Telangana, India
Posted:
April 28, 2021


Resume:

Venkat

Phone No: 804-***-**** E-mail ID: adl00n@r.postjobfree.com

Professional Summary:

Hadoop Developer with 6 years of IT experience, enthusiastic about working in an environment that hones my skills and knowledge.

5 years in Big Data technologies, with strong hands-on experience in HDFS, Spark, Scala, MapReduce, Hive, Sqoop, Oozie, Pig, Kafka, and Flume.

Good understanding of distributed systems, HDFS architecture, the internal workings of MapReduce, and various Hadoop processing frameworks.

Strong experience with Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming, creating RDDs, DataFrames, and Datasets using Scala.
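A minimal Scala sketch of the RDD, DataFrame, and Dataset APIs referenced above; the app name, case class, sample values, and table name are illustrative assumptions, not taken from any project listed here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type used only for this sketch.
case class Order(orderId: Long, store: String, amount: Double)

object SparkApisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-apis-sketch") // assumed app name
      .getOrCreate()
    import spark.implicits._

    // RDD: low-level distributed collection.
    val rdd = spark.sparkContext.parallelize(
      Seq(Order(1L, "T-100", 25.0), Order(2L, "T-200", 40.0)))

    // Dataset: typed view of the same data.
    val ds = rdd.toDS()

    // DataFrame: untyped, Row-based relational view queried with Spark SQL.
    val df = ds.toDF()
    df.createOrReplaceTempView("orders")
    spark.sql("SELECT store, SUM(amount) AS total FROM orders GROUP BY store").show()

    spark.stop()
  }
}
```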

Experienced in Big Data solutions and Hadoop ecosystem technologies; well versed in Big Data solution planning, design, and development.

Expertise in importing and exporting data from/to traditional RDBMS using Apache Sqoop.

Created and ran Sqoop jobs with incremental loads to populate Hive external tables.

Good exposure to performance tuning of Hive queries on Hive tables.

Hands-on experience with Python, Scala, and Hadoop in production environments.

Experience in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
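As one concrete illustration of the Kafka side of such ingestion, a minimal Scala producer sketch using the kafka-clients API; the broker address, topic name, and sample record are assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Each collected log line becomes one record on the (assumed) "weblogs" topic.
      producer.send(new ProducerRecord[String, String]("weblogs", "host-1", "GET /index.html 200"))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```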

In-depth knowledge of databases such as MS SQL Server and MySQL, with extensive experience writing SQL queries, stored procedures, triggers, cursors, functions, and packages.

Hands-on experience with NoSQL databases such as HBase, MongoDB, and Cassandra.

Provisioned Amazon EC2 instances and have knowledge of EC2 resources such as instances, dedicated hosts, volumes, key pairs, Elastic IPs, snapshots, load balancers, and security groups.

Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift; hands-on experience with the AWS CLI and SDK/API tools.

Experience with serverless AWS services such as Lambda, API Gateway, SNS, SQS, CloudWatch, and DynamoDB.

Good knowledge of VPC, ECS, Step Functions, Amazon SimpleDB, IAM, CloudFormation, Glue, and Athena.

Experienced with relational database management systems such as Teradata, PostgreSQL, DB2, Oracle, and MS SQL Server.

Experienced in scheduling and monitoring the production jobs using Oozie.

Delivered reports on various business KPIs using Power BI, Excel pivot tables, VLOOKUP, and slicers.

Experience in analytics, with expertise in predictive and prescriptive analytics and data mining techniques, with emphasis on machine learning algorithms (supervised and unsupervised learning) using R and Python.

Experience using statistical languages (R, Python) to clean, integrate, transform, reduce, analyze, and interpret large data sets. Experienced with Python modules such as NumPy, matplotlib, and pandas for data pre-processing, web scraping, visualization, and machine learning.

Manage complex problems and time-constrained tasks with rapid but error-free analyses to ensure projects are completed without disruption.

Technical Skills:

Big Data Technologies

Hadoop, HDFS, Map Reduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, Zookeeper, Oozie, Hue, Cloudera Manager, Amazon AWS.

Languages

C, C++, Scala, Core Java, Python, COBOL, JCL

Scripting Languages

Python, R

DBMS / RDBMS

Oracle, Microsoft SQL Server 2012/2008, MySQL, DB2 and NoSQL, Teradata SQL, MongoDB, Cassandra, HBase

Java/J2EE & Web Technologies

J2EE, JMS, JSF, JDBC, Servlets, HTML, CSS, XML, XHTML.

IDE and Build Tools

Eclipse, Jupyter Notebook, R-Studio, IntelliJ IDEA, SBT and Maven.

Operating systems

Windows, Linux and Unix

Version Control

Git, SVN

Hadoop Distributions

Cloudera, Hortonworks.

Web Servers

WebLogic, WebSphere, Apache Tomcat

Professional Experience:

Client: Cerner Corporation, Kansas City, MO March 2020 – Present

Role: Big Data Engineer

Responsibilities:

Involved in designing, developing, and deploying multiple applications on the AWS stack (including EC2, EMR, Lambda, DynamoDB, SNS, SQS, API Gateway, VPC, and CloudWatch), focusing on managing the flow of data to and from machine learning models.

Implemented a serverless architecture to ingest data using API Gateway, AWS Lambda, and SNS, with OAuth2 authentication to secure the calls.

Enhanced and implemented a serverless architecture using Lambda, DynamoDB, SNS, SQS, and CloudWatch to journal data into S3 for use in training machine learning models.

Created CloudFormation templates in YAML for different environments (dev/stage/prod) to automate infrastructure deployment at the click of a button.

Worked with Elastic MapReduce and set up a Hadoop environment on AWS EC2 instances to run Apache Flink jobs.

Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to transform data from Avro to Parquet.
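A minimal Scala sketch of the Avro-to-Parquet conversion pattern, assuming the spark-avro package is on the classpath; the app name and S3 paths are placeholders, not actual project values.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-sketch") // assumed app name
      .getOrCreate()

    // Requires the spark-avro module; input/output locations below are placeholders.
    val df = spark.read.format("avro").load("s3://example-bucket/input/avro/")

    df.write
      .mode("overwrite")
      .parquet("s3://example-bucket/output/parquet/")

    spark.stop()
  }
}
```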

Leveraged AWS cloud services such as EC2, auto-scaling and VPC to build secure, highly scalable, and flexible systems that handled expected and unexpected load bursts.

Extensively worked on AVRO, Parquet, JSON and XML data formats.

Worked on synchronous and asynchronous data pipelines that interact with machine learning models to produce predictions, which are then communicated back to the client.

Worked with Topic / Subscription-based notification routing via SNS service.

Developed applications using the AWS Java SDK, with Git for version control and Jenkins to build the applications and transfer the artifacts to AWS S3, from where they are deployed in AWS using CloudFormation templates.

Used Maven to build, package, and deploy the application project, integrated it with Jenkins, and used the Log4j framework to log debug, info, and error data for the application.

Used Jira for project management; created epics, tasks, and subtasks to track progress.

Worked with the Scrum team to deliver agreed user stories on time every sprint.

Environment: AWS Java SDK, AWS CLI, Java 8/J2EE, Spark 2.4, Hadoop, Hive, Scala, YAML, Jenkins, Maven, AWS serverless (SNS, SQS, Lambda, API Gateway, CloudWatch, DynamoDB), EC2, S3, ECS, EMR, Athena, Glue, VPC.

Client: Recorded Books, Prince Frederick, MD Aug 2019 – Feb 2020

Role: Big Data Engineer

Responsibilities:

Developed a data pipeline using Sqoop, Spark, Hive, HBase, and Scala to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
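A short Scala sketch of the Hive-to-Spark conversion pattern mentioned above: the same aggregation expressed once as a Hive-style SQL query and once as DataFrame transformations. The table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch") // assumed app name
      .enableHiveSupport()             // assumes a configured Hive metastore
      .getOrCreate()

    // Original Hive-style query (hypothetical table and columns).
    val bySql = spark.sql(
      "SELECT customer_id, SUM(txn_amount) AS total FROM transactions GROUP BY customer_id")

    // Equivalent DataFrame transformations.
    val byApi = spark.table("transactions")
      .groupBy("customer_id")
      .agg(sum("txn_amount").as("total"))

    byApi.show(10)
    spark.stop()
  }
}
```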

Loaded data from different data sources (Oracle, Teradata, and DB2) into HDFS using Sqoop and Flume, and loaded it into partitioned Hive tables.

Developed HiveQL queries, mappings, tables, and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation, and execution.

Used Pig as an ETL tool for transformations and pre-aggregations before storing the analyzed data in HDFS.

Developed Spark code to save data in Avro and Parquet formats and build Hive tables on top of them.

Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using Spark, Hive, and Sqoop.

Exported result sets to RDBMS and various file formats using the Sqoop export tool for further processing and for use by downstream teams such as data science.

Automated workflows using shell scripts to pull data from various databases into Hadoop.

Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.

Developed real-time streaming applications to integrate data from REST APIs into the Hadoop environment using Kafka.
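One way to sketch this kind of Kafka-to-HDFS flow in Scala is with Spark Structured Streaming; the resume does not name the exact streaming API, so treat this as an assumed approach. It presumes the spark-sql-kafka connector is available, and the broker address, topic, and HDFS paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-sketch") // assumed app name
      .getOrCreate()

    // Read the (assumed) "events" topic as a stream; broker address is a placeholder.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Land the raw records in HDFS as Parquet; both paths are placeholders.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```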

Developed a data pipeline to load and analyze archive data and log files from AWS S3 and load the useful data into HBase tables for downstream use.
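For the HBase-loading step, a minimal Scala sketch using the standard HBase client API; the table name, column family, row-key scheme, and payload are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseLoadSketch {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      // Hypothetical table "archive_logs" with column family "d".
      val table = connection.getTable(TableName.valueOf("archive_logs"))
      val put = new Put(Bytes.toBytes("2019-08-01#row-0001")) // assumed row-key scheme
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"), Bytes.toBytes("log line payload"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```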

Scheduled jobs using Oozie workflows and coordinator jobs.

Environment: HDFS, Hadoop 2.x YARN, Teradata, NoSQL, PySpark, MapReduce, Pig, Hive, Sqoop, Spark 2.4, Scala, Oozie, HBase, Java, Python, shell and bash scripting.

Client: BrownGreer PLC, Richmond, VA May 2018 – Aug 2019

Role: Big Data Engineer

Responsibilities:

Involved in the development of the Hadoop system and in improving multi-node Hadoop cluster performance.

Worked on analyzing the Hadoop stack and different Big Data tools, including Pig, Hive, the HBase database, and Sqoop.

Developed a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.

Used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting.

Worked with different data sources like Avro data files, XML files, JSON files, SQL server and Oracle to load data into Hive tables.

Used Spark to create structured data from a large amount of unstructured data from various sources.

Performed transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Impala, and loaded the final data into HDFS.

Built an ingestion framework using Apache NiFi to ingest files, including financial data, from SFTP into HDFS.

Completed data extraction, aggregation, and analysis in HDFS using PySpark and stored the required data in Hive.

Involved in data acquisition, pre-processing, and exploration for a claims settlement project in Spark using PySpark.

Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.

Implemented a distributed messaging queue integrated with HBase using Apache Kafka and ZooKeeper.

Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.

Environment: Hadoop, Hive, HBase, Oozie, Pig, ZooKeeper, MapR, HDFS, MapReduce, Java, MS, Jenkins, Agile, Apache Solr, Apache Flume, Spark, Python, Apache Kafka, Talend.

Client: Tata Consultancy Services, India Nov 2015 – Dec 2017

End Client: Target, Minneapolis, MN, USA (Offshore Project)

Role: Hadoop Developer

Responsibilities:

Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.

Wrote MapReduce code to process and parse data from various sources and store the parsed data in Hive.

Involved in developing UML Use case diagrams, Class diagrams, and Sequence diagrams using Microsoft Visio.

Worked on moving all log files generated from various sources to HDFS for further processing.

Developed workflows using custom MapReduce, Pig, Hive and Sqoop.

Developed a predictive analytics product using Apache Spark, SQL/HiveQL, and Highcharts.

Wrote Spark programs to load, parse, refine, and store sensor data in Hadoop, and to process, analyze, and aggregate data for visualizations.

Developed data pipeline programs with Spark Scala APIs, performed data aggregations with Hive, and formatted JSON data for visualization.

Designed and developed Apache Storm topologies for inbound and outbound data in real-time ETL to find the latest trends and keywords.

Developed MapReduce programs for parsing information and loading it into HDFS.

Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.

Wrote a Hive UDF to sort struct fields and return a complex data type.
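The struct-sorting UDF itself is not reproduced here; as a minimal illustration of the Hive UDF pattern it refers to, below is a simple Scala UDF built on the classic org.apache.hadoop.hive.ql.exec.UDF API, with a hypothetical name and behavior.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that upper-cases a string column. Typical registration in Hive:
//   ADD JAR hdfs:///udfs/udfs.jar;                 -- placeholder jar path
//   CREATE TEMPORARY FUNCTION to_upper AS 'ToUpperUDF';
class ToUpperUDF extends UDF {
  // Hive resolves evaluate() by reflection; keep it null-safe like built-in string functions.
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.toUpperCase)
  }
}
```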

Responsible for loading data from UNIX file system to HDFS.

Developed ETL applications using Hive, Spark, and Sqoop, and automated them using Oozie.

Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.

Designed and developed a distributed processing system to process binary files in parallel and load the analysis metrics into a data warehousing platform for reporting.

Developed workflows in Control-M to automate the loading of data into HDFS and pre-processing with Pig.

Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.

Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Oozie, Sqoop, AWS, Spark, Unix, Tableau, Cosmos.

Project: GRE Merchandising.

End Client: Target, Minneapolis, MN, USA (Offshore Project)

Role: Software Developer.

Project Description: Maintained and enhanced a retail firm's merchandising system comprising components such as ordering, replenishment, warehousing, transportation, and pricing. The main objective of this project was to enhance the application as the environment changed and to fix recurring problems encountered in production. With merchandising being central to running the stores, this role required high attention to detail and working under pressure to deliver on time.

Responsibilities:

Worked as a mainframe engineer on the Target GRE Merchandising team, supporting and developing the pricing application that automates price changes in Target stores based on a set of variables.

Designed and deployed programs into the production environment to support the batch job flow using COBOL, JCL, and DB2.

Troubleshot abends in production mainframe jobs and automated the manual work involved in resolving them.

Helped create a dashboard to track updates and the inflow of tickets into ServiceNow, helping the client monitor application stability across teams.

Provided production support for existing applications by analyzing technical and business defects.

Participated in Agile processes including release planning and sprint planning; responsible for daily and weekly Scrum reports to the Scrum master and higher management.

Involved in studying the feasibility of moving mainframe jobs to the latest environments.

Environment: COBOL, JCL, CA7, IBM DB2, Control-M, ServiceNow, IBM DataStage, File-Aid, Abend-Aid, Oracle SQL.

Client: EPSoft Technologies, India Feb 2014 – Mar 2015

Role: SQL Developer

Responsibilities:

Participated in developing the logical design of the database, incorporating business logic and user requirements, and building entity-relationship (ER) diagrams.

Identified relationships between tables, enforced referential integrity using foreign key constraints. Used Views, Triggers and Stored Procedures for updating and cleaning the existing data and the new data.

Created Database Objects like Tables, Stored Procedures, Functions, Views, Constraints and user defined data types.

Created T-SQL objects and scripts used to maintain data integrity across all databases throughout the database servers.

Wrote stored procedures and functions as per business requirements.

Defined Check constraints, Business Rules, Indexes and Views and wrote SQL procedures with optimized complex queries to extract data.

Designed, developed, implemented, and supported ETL processes involving various data sources and destinations using SSIS.

Managed credit card and debit card sales data by writing queries as per branch requirements.

Participated in system specification meetings, analyzed user requirements, coordinated with coders and team members.

Created SSIS packages to extract data from OLTP to OLAP systems and scheduled Jobs to call the packages and Stored Procedures.

Environment: MS SQL Server 2008/2005, Excel (pivot tables), SSIS, SSAS, SSRS, T-SQL, VB Script, Visual studio 2005, MS Access.


