Close to three years of experience as a Big Data Engineer and Data Analyst.
Experience with Hadoop ecosystem components such as HDFS, YARN, Hive, Impala, Oozie, Sqoop, and Spark (PySpark), along with Ansible, shell scripting, JSON, YAML, and AWS.
Extensive experience with the automation tool Ansible.
Experience developing CloudFormation scripts and shell scripts.
Knowledge of Kafka and Spark.
Experience with PySpark.
Knowledge of AWS (EC2, S3, EMR, CloudFormation).
Experience with Python programming, SQL querying, shell scripting, YAML, and JSON.
Experience as both Hadoop Admin and Developer.
Big Data Components: Apache Hadoop, HDFS, YARN, Hive, HBase, Impala, Sqoop, Oozie, Kafka, ZooKeeper, Spark
Operating Systems: Linux (Red Hat, CentOS), Windows
Scripting: Shell scripting, JSON, Python scripting
Databases: SQL Server, MySQL, PostgreSQL, Oracle
Location: Cincinnati, USA
Client: Procter and Gamble
May 2018 – Present.
Big Data Platform Engineer.
Extensive experience with the AWS cloud platform, with hands-on experience in services such as EC2, S3, IAM, RDS, EMR, and CloudFormation.
Developed CloudFormation scripts to launch EC2 instances and deploy InfoSec security scripts on them.
Developed CloudFormation scripts to launch the Cloudera Director, Ansible, and Ansible Tower machines.
Developed shell scripts to install Cloudera Director and Ansible Tower.
Deployed Ansible and Ansible Tower on the big data platform.
Developed Ansible playbooks to launch EC2 instances.
Developed an Ansible playbook to launch auto-terminating EMR clusters that run Hive and Spark jobs.
Installed and configured Ansible Tower in the big data environment.
Resolved SNOW tickets for day-to-day Big Data Platform issues.
Handled day-to-day operations and managed the cluster with Cloudera Manager.
Knowledge of Cloudera 360.
Location: NJ, USA.
Handled Hadoop cluster setup end to end, including installation, configuration, and monitoring.
Configured high availability on CDH 5.9.0 sandbox and production clusters, helped configure other clusters, and tested all components.
Involved in cluster maintenance, commissioning and decommissioning of nodes.
Troubleshot issues, managed and reviewed data backups, and managed and reviewed Hadoop log files.
Developed Python code to consume the data in Spark.
Worked on AWS provisioning, with good knowledge of AWS services such as EC2, Elastic Load Balancing, Elastic Container Service, S3, Elastic Beanstalk, CloudFront, Elastic File System, RDS, DynamoDB, DMS, VPC, Direct Connect, Route 53, CloudWatch, CloudTrail, CloudFormation, IAM, EMR, and Elasticsearch.
Launched Amazon EC2 instances (Linux/Ubuntu) and configured them for specific applications.
Defined AWS security groups, which act as virtual firewalls controlling the traffic allowed to reach one or more EC2 instances.
Hands-on experience writing Python scripts for data extraction and transfer from various data sources.
Created RDDs from the required files in HDFS and processed them using various Spark transformations and actions.
Worked with the Spark DataFrame operations required to develop a data format file.
Experience with HBase, including creating new databases and tables and loading data.
Knowledge of Talend.
Experience with Python programming.
Loaded data from different sources into HDFS using Sqoop, then into partitioned Hive tables.
Divided streaming data into batches using Spark Streaming and passed them to the Spark engine for batch processing.
Knowledge of writing SQL queries and make-table queries to profile and analyze data in MS Access.
June – Oct 2012 Internship. Oct 2012 – April 2014.
Collaborated with different teams on cluster planning, hardware requirements, server configurations, and network equipment to implement a nine-node Cloudera Distribution of Hadoop (CDH) cluster.
Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode (Hadoop 1.x), YARN concepts such as ResourceManager and NodeManager (Hadoop 2.x), and the MapReduce programming paradigm.
Involved in implementation and ongoing administration of Hadoop infrastructure.
Installed, upgraded, and maintained Hadoop clusters on the Cloudera distribution.
Prepared Hadoop clusters for the development team working on POCs.
Good understanding of Kerberos and how it interacts with Hadoop and LDAP.
Experience in configuring High-Availability.
Analyzed the Hadoop stack and various big data analytic tools, including MapReduce, Pig, Hive, HBase, Sqoop, and Cloudera Manager.
Resolved user-submitted tickets by troubleshooting and fixing the documented errors.
Created Hive tables, then loaded and analyzed data using Hive queries.
Managed and reviewed Hadoop log files.
Monitored Hadoop cluster job performance and performed capacity planning.
Commissioned and decommissioned DataNodes, killed unresponsive TaskTrackers, and handled blacklisted TaskTrackers.
Copied data from one cluster to another using DistCp.
Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
Created namespaces and tables in HBase.
Assisted in exporting analyzed data to relational databases using Sqoop.
Implemented test scripts to support test-driven development and continuous integration.
Worked on Hive query optimization.
Monitored Hadoop cluster connectivity and security.
Experience with full-cycle supply chain management: production forecasting, analysis of production forecast data, production planning, and parts procurement (releasing POs and coordinating with vendors to reschedule parts).
Knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
Knowledge of creating real-time data streaming solutions using Apache Spark (Spark Streaming) and Kafka.
Python 3 knowledge.
Provided 24x7 on-call support on a rotation basis.
Mar 2016 Attended Cloudera Hadoop administration training.
Aug 2017 Attended Python training.
Nov 2017 Attended Spark training.
2017 Attended AWS training.
Bachelor of Technology in EEE from Anna University, Chennai, India.