Ph: 860-***-**** firstname.lastname@example.org
Around 5 years of IT experience in development, implementation and testing of Business Intelligence and Data Warehousing solutions with Big Data technologies.
Excellent knowledge of Big Data infrastructure: distributed file systems (HDFS), parallel processing (MapReduce framework) and the complete Hadoop ecosystem (Hive, Hue, Pig, HBase, ZooKeeper, Sqoop, Kafka, Spark, Flume and Oozie).
In-depth understanding of Hadoop architecture and its components (HDFS, JobTracker, TaskTracker, NameNode, DataNode) and MapReduce concepts; experienced in writing MapReduce programs with Apache Hadoop to analyze large data sets efficiently.
In-depth knowledge of real-time ETL/Spark analytics using Spark SQL, with visualization.
Extensive experience with big data query languages like Pig Latin and HiveQL.
Experience in extracting the data from RDBMS into HDFS using Sqoop.
Experience in collecting the logs from log collector into HDFS using Flume.
Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.
Experience in analyzing data in HDFS through MapReduce, Hive and Pig.
Experience in job workflow scheduling and monitoring with Oozie and cluster coordination with ZooKeeper.
Experience in installation, configuration, support and management of Cloudera’s Hadoop platform (CDH4 & CDH5 clusters), HDP 2.2 with Kafka and Storm on the EC2 platform, and IBM’s BigInsights Hadoop ecosystem.
Knowledge of Hadoop administration activities such as installation, configuration and management of clusters using Cloudera Manager, Hortonworks and Apache Ambari.
Hands-on experience performing ETL using Talend and excellent understanding of creating dashboard reports using Tableau.
Experience in Scala, multithreaded processing, SQL and PL/SQL.
Hands-on experience in loading unstructured data (log files, XML data) into HDFS using Flume.
Good knowledge on Apache Spark, Kafka, Splunk and BI tools such as Pentaho and Talend.
Experience in performance tuning using partitioning, bucketing and indexing in Hive.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems/ Non-Relational Database Systems and vice-versa.
Detailed knowledge and experience in designing, developing and testing software solutions using Java and J2EE technologies, including developing and maintaining web applications on the Apache Tomcat web server.
Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS, Red Hat and Ubuntu.
Bachelor of Technology in Electronics from JNTU, Hyderabad, India.
Master’s in Electrical Engineering from SDSU, San Diego, CA, USA.
HDFS, MapReduce, Hue, Hive, Pig, HBase, Impala, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark with Scala
Windows, Ubuntu, Linux, iOS, Cloudera CDH, EC2, S3, IBM BigInsights
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JavaBeans
UML on Rational Rose, Rational Clear Case, Enterprise Architect, Microsoft Visio
Eclipse, NetBeans, JUnit (testing), Log4j (logging)
Oracle, DB2, MS SQL Server, MySQL, MS Access, Teradata, NoSQL (HBase, MongoDB, Cassandra)
WebLogic, WebSphere, Apache Tomcat 7
Maven, sbt (Scala Build Tool), Ant
Operating systems and Virtual Machines: Linux (Red Hat, Ubuntu, CentOS), Oracle VirtualBox, VMware Player, VMware Workstation 11
Talend for Big Data, Informatica
Nike Inc – OR Jan 2017 – Present
Role: Senior Software/Big Data Engineer
Worked on building an ingestion framework to ingest data from different sources (Oracle, SQL Server, delimited flat files, XML, Parquet, JSON) into Hadoop and building tables in Hive.
Worked on building big data analytic solutions to provide near-real-time and batch data per business requirements.
Worked on building a Spark framework to ingest data into Hive external tables and run complex computational and non-equi-join SQL in Spark.
Involved in developing a data quality tool that checks all data ingested into Hive tables.
Collaborated with BI teams to ensure data quality and availability with live visualization
Designed, developed and maintained Oozie workflows integrating shell, Java, Sqoop, Hive and Spark actions to run data pipelines.
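A minimal Oozie workflow skeleton of the kind described above, chaining a Sqoop action into a Hive action (the workflow name, table, properties and script path are hypothetical stand-ins, not the production pipeline):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="daily-ingest-wf">
  <start to="sqoop-import"/>
  <!-- Sqoop action: pull the day's rows from the source RDBMS -->
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table ORDERS --target-dir ${stagingDir}</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <!-- Hive action: load the staged files into the target table -->
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_orders.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Ingest failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Each action transitions to the next node on success (`ok`) or to the kill node on failure (`error`), which is how Shell, Java, Sqoop, Hive and Spark actions compose into one pipeline.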
Designed and supported multi-tenancy on the data platform so that other teams can run their applications.
Used Impala for low-latency queries, visualization and faster ad hoc querying.
Created Hive queries to process large sets of structured, semi-structured and unstructured data, storing results in managed and external tables.
Created HBase tables to load large sets of structured data.
Managed and reviewed Hadoop log files.
Wrote code to encrypt/decrypt data for PII field groups.
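The resume doesn’t state which cipher was used for the PII work; purely as an illustration of a symmetric encrypt/decrypt pair, here is a toy XOR stream cipher built from SHA-256 using only the standard library. This is a sketch, not production cryptography — real PII protection should use a vetted library (e.g. AES via the `cryptography` package):

```python
import hashlib


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream from SHA-256 of key || nonce || counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # XOR with the keystream; applying the same operation again decrypts.
    ks = _keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))


decrypt = encrypt  # XOR stream ciphers are their own inverse
```

Because encryption and decryption are the same XOR, a round trip with the same key and nonce recovers the original bytes.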
Performed Real time event processing of data from multiple servers in the organization using Apache Kafka and Flume.
Processed JSON files and ingested them into Hive tables.
Used Python to parse XML files and create flat files from them.
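A minimal sketch of the XML-to-flat-file step, using only the standard library; the `record_tag` and `fields` names are hypothetical stand-ins for the real feed schema:

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_flat(xml_text, record_tag, fields):
    """Parse an XML document and emit one pipe-delimited line per record."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    writer.writerow(fields)  # header row
    for rec in root.iter(record_tag):
        # Missing child elements become empty fields rather than errors.
        writer.writerow([rec.findtext(f, default="") for f in fields])
    return buf.getvalue()
```

For example, `xml_to_flat("<orders><order><id>1</id><amount>9.99</amount></order></orders>", "order", ["id", "amount"])` yields a header line plus `1|9.99`.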
Used HBase to support front-end applications that retrieve data by row key.
Used Control-M as the enterprise scheduler to schedule all jobs.
Used Bitbucket extensively as the code repository.
Toyota Insurance Management Solutions - TX Sep 2015 – Dec 2016
Role: Senior Software/Big Data Engineer
Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
Coordinated with other team members to write and generate test scripts and test cases for numerous user stories.
Communicated regularly with business and IT leadership.
Analyzed customers’ driving behavior as part of the Usage-Based Insurance (UBI) program.
Developed an algorithm to score drivers based on their driving behavior.
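The production scoring algorithm and its inputs are proprietary; as a hedged illustration of the general shape, here is a hypothetical Python scorer that penalizes risk events per mile driven (all feature names and weights are invented for this sketch):

```python
def driver_score(miles, hard_brakes, speeding_events, night_miles):
    """Score a driver from 0 (risky) to 100 (safe).

    Weights and features are hypothetical, not the production model.
    """
    if miles <= 0:
        return 0.0  # no mileage, no evidence of safe driving
    # Normalize events per 100 miles so short and long periods are comparable.
    brake_rate = hard_brakes / miles * 100
    speed_rate = speeding_events / miles * 100
    night_share = night_miles / miles
    penalty = 8.0 * brake_rate + 12.0 * speed_rate + 20.0 * night_share
    return max(0.0, min(100.0, 100.0 - penalty))
```

A driver with no risk events scores 100; heavy braking, speeding or night driving pulls the score down, clamped to the 0–100 range.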
Developed PySpark/Spark SQL scripts to analyze various customer behaviors.
Responsible for data extraction and data ingestion from different data sources into HDFS Data Lake Store by creating ETL pipelines using Sqoop, Oozie, Spark and Hive.
Extensively worked with PySpark/Spark SQL for data cleansing and for generating DataFrames and RDDs.
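The per-record cleansing logic can be sketched in plain Python; in the job this kind of function ran per row inside a PySpark transformation. The field names and timestamp format below are hypothetical stand-ins for the customer feed:

```python
from datetime import datetime
from typing import Optional


def clean_record(rec: dict) -> Optional[dict]:
    """Cleanse one raw record: trim keys, normalize timestamps, drop bad rows."""
    cust_id = (rec.get("customer_id") or "").strip()
    if not cust_id:
        return None  # drop rows with no key
    try:
        # Normalize US-style timestamps to ISO-8601; drop unparsable rows.
        ts = datetime.strptime(rec.get("event_ts", ""), "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {
        "customer_id": cust_id,
        "event_ts": ts.isoformat(),
        "state": (rec.get("state") or "").strip().upper() or None,
    }
```

Keeping the logic in a pure function like this makes it unit-testable outside the cluster before wiring it into a DataFrame or RDD pipeline.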
Worked on Hortonworks distribution for processing Big Data across a Hadoop Cluster of virtual servers.
Used Sqoop to export data to relational databases.
Used Bitbucket to collaborate with other team members.
Involved in creating Hive tables, loading data in formats such as Avro, JSON, CSV, TXT and Parquet, and writing Hive queries (HQL) to analyze the data.
Developed Spark Programs for Batch Processing.
Developed Spark code in Python (PySpark), Scala and Spark SQL for faster testing and processing of data.
Scheduled various Spark jobs to run daily and weekly.
Monitored various cluster activities using Apache Ambari.
Created data visualizations using Microsoft Power BI and Tableau.
Modelled Hive partitions extensively for faster data processing.
Implemented various UDFs in Python as per requirements.
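A sketch of the kind of per-column logic that gets registered as a Python UDF; `mask_email` is a hypothetical example, and the commented lines show roughly how it would be registered with a `SparkSession` (which is why they stay as comments here):

```python
def mask_email(email):
    """Keep the domain, mask the local part: 'john@x.com' -> 'j***@x.com'."""
    if not email or "@" not in email:
        return None  # null out values that are not well-formed addresses
    local, domain = email.split("@", 1)
    return (local[:1] + "***@" + domain) if local else None


# Registered in the job roughly as (requires a live SparkSession `spark`):
#   from pyspark.sql.types import StringType
#   spark.udf.register("mask_email", mask_email, StringType())
#   df.selectExpr("mask_email(email) AS email_masked")
```

Writing the UDF body as a plain function keeps it testable without a cluster; the Spark registration is a thin wrapper around it.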
Involved in data movement between two clouds.
Involved in Agile methodologies, daily scrum meetings and sprint planning.
Environment: Hortonworks, MapReduce, HDFS, HQL, Python, Spark, Hive, PySpark, Spark SQL, Bitbucket, Ambari, Jupyter, JIRA, Sqoop, ZooKeeper, Scala, Shell Scripting, SQL.
Client: People's United Bank - Bridgeport, CT Mar 2015 – Sep 2015
Role: Hadoop Developer
Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server
Configured a MySQL database to store Hive metadata
Responsible for loading unstructured data into Hadoop File System (HDFS)
Created POC to store Server Log data in MongoDB to identify System Alert Metrics
Created Reports and Dashboards of Server Alert Data
Created MapReduce jobs using Pig Latin and Hive queries
Remodeled the Hadoop-based architecture for one reporting stream using Big Data Edition.
Imported data from relational databases such as Teradata, Oracle and MySQL into HDFS using Sqoop.
Provided cluster coordination services through ZooKeeper
Automated all the jobs for pulling data from FTP server to load data into Hive tables, using Oozie workflows
Created Reports and Dashboards using structured and unstructured data
Maintained documentation for corporate Data Dictionary with attributes, table names and constraints.
Extensively worked with SQL scripts to validate data pre- and post-load.
Created unit test plans, test cases and reports on various test cases for testing the data loads
Worked on integration testing to verify load order and time windows.
Performed unit testing to validate that data was processed correctly, providing a qualitative check that data flowed through the pipeline and landed correctly in the targets.
Responsible for post-production support and served as SME for the project.
Involved in the System and User Acceptance Testing.
Involved in POC working with R for data analysis.
Environment: Hadoop, Cloudera, Pig, Hive, Java, Sqoop, HBase, NoSQL, Informatica PowerCenter 8.6, Oracle 10g, PL/SQL, SQL Server, SQL Developer, Toad, Windows NT, Stored Procedures.
Western Union - San Francisco, CA April 2013 – June 2014
Role: Hadoop Consultant
Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software
Installed and configured MapReduce, Hive and HDFS; implemented a CDH4 (Cloudera) Hadoop cluster on CentOS/Linux; assisted with performance tuning and monitoring
Assessed existing EDW (enterprise data warehouse) technologies and methods to ensure the EDW/BI architecture meets the needs of the business and enterprise and allows for growth
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW
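The parsing jobs above were written in Java; the map/reduce shape they follow can be sketched in Python (Hadoop Streaming style) with a hypothetical pipe-delimited record layout — parse raw lines in the map phase, aggregate per key in the reduce phase:

```python
from itertools import groupby


def mapper(lines):
    """Map phase: parse raw pipe-delimited lines into (date, amount) pairs.

    The date|user|amount layout is a hypothetical stand-in for the raw feed.
    """
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 3:
            continue  # skip malformed rows
        date, _user, amount = parts[0], parts[1], parts[2]
        try:
            yield date, float(amount)
        except ValueError:
            continue  # skip rows with a non-numeric amount


def reducer(pairs):
    """Reduce phase: sum amounts per date.

    Hadoop delivers pairs sorted by key between map and reduce,
    which is what groupby relies on here.
    """
    for date, group in groupby(pairs, key=lambda kv: kv[0]):
        yield date, sum(v for _, v in group)
```

The reduce output is exactly the refined, per-partition-key aggregate that would be loaded into a partitioned staging table.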
Captured data from existing databases that provide MySQL interfaces using Sqoop
Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes
Developed and maintained complex outbound notification applications running on custom architectures, using diverse technologies including Core Java, J2EE, XML, JMS, JBoss and Web Services
Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics
Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and Pig to pre-process the data
Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems
Managed and reviewed Hadoop log files
Tested raw data and executed performance scripts
Shared responsibility for administration of Hadoop, Hive and Pig
Gained exposure to machine learning using R and Mahout.
Developed Hive queries for analysts, used the ETL tool Talend for processing, and built visualizations of the transactional data
Helped business processes by developing, installing and configuring Hadoop ecosystem components that moved data from individual servers to HDFS
Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios
Supported code/design analysis, strategy development and project planning
Developed multiple MapReduce jobs in Java for data cleaning, filtering and preprocessing, including testing.
Assisted with data capacity planning and node forecasting
Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Administered Pig, Hive and Cassandra, installing updates, patches and upgrades.
Handled structured and unstructured data and applied ETL processes.
Environment: Hadoop, MapReduce, HDFS, Hive, Cassandra, Java (JDK 1.7), Hortonworks, Cloudera and MapR Hadoop distributions, IBM DataStage 8.1 (Designer, Director, Administrator), MySQL, Windows, Linux