Data Engineer

Location: Charlotte, NC

Posted: February 25, 2019


Resume:

Hamsa Rajasekhara

980-***-**** - ********@****.*** - www.linkedin.com/in/hamsa-sr

PROFESSIONAL SUMMARY

• Highly skilled data engineer with 4 years of industry experience designing and developing big data analytics platforms, data migration and data integration. Seeking full-time opportunities as a data engineer working with Hadoop, the Spark ecosystem, and related big data and cloud technologies.

• Data enthusiast with a strong technical background, excellent analytical ability and effective communication skills; a team player who adapts quickly to new technologies. Experienced in serving a multitude of clients across different domains, including work with terabytes of data using a variety of cutting-edge technologies and tools.

EDUCATION

University of North Carolina, Charlotte, NC, Jan 2018 – May 2019
Master of Science – Information Technology, GPA: 3.80 (Expected)

University of North Carolina, Charlotte, NC, Jan 2018 – Dec 2018
Graduate Certificate in Advanced Databases and Knowledge Discovery, GPA: 3.80

Visvesvaraya Technological University, Belgaum, India, Aug 2009 – July 2013
Bachelor of Engineering – Information Science, GPA: 4.0

TECHNICAL SKILLS

Programming: Python (NumPy, Pandas, Scikit-learn), R, Pig, Hive, Unix Scripting, Java.

Frameworks: MapReduce, Spark.

Data Platforms: Cloudera, Hortonworks, MapR.

Cloud Technologies: AWS (S3, EC2, EMR), Google Cloud Platform.

Databases: DB2, MySQL, Hive, Impala, HBase, MapR DB, BigQuery, SparkSQL.

Tools: Eclipse, Pentaho DI, Oozie, IntelliJ, Apache Solr, Apache Sqoop, Apache Flume, Kafka, Kibana, Tableau, Control M, Git, Jira.

Web Technologies: HTML, CSS.

Operating Systems: Windows, z/OS, Unix.

CERTIFICATIONS

• IBM Certified Academic Associate – DB2

• TCS Internal Foundation Certification – Bigdata Hadoop

• MapR Certified – Apache Hadoop Essentials, Developing Hadoop Applications, Apache HBase Data Model and Architecture, Apache Spark Essentials

• DataCamp certified: "Introduction to R Course", "Python Basics" and "Basic Statistics Course"

PROFESSIONAL EXPERIENCE

Graduate Teaching Assistant, CCI – UNC Charlotte, Aug 2018 – Present

Data Engineer/Hadoop Developer – Tata Consultancy Services Ltd., Dec 2013 – Dec 2017

Substantial experience as a big data engineer with a deep understanding of the Hadoop Distributed File System and ecosystem, delivering solutions for complex business problems involving large-scale data warehousing, real-time analytics and reporting.

• Designed and implemented fast and efficient data acquisition using big data processing techniques and tools (HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase).

• Migrated data from hierarchical, relational and network-based databases, web sources and APIs to the Hadoop ecosystem (Hive tables and HBase).

• Identified the required data transformations and implemented them in Hive and Python scripts.

• Designed, reviewed, implemented and optimized data transformation processes in Hadoop using Hive and Spark SQL.

• Built report interfaces, data feeds and pipelines. Imported streaming data into HDFS using Flume sources.

• Fluent in developing complex SQL queries that include analytical functions and stored procedures. Expertise in writing regular expressions and in analytics on structured data with Hive queries, views, partitioning, bucketing and UDFs using HiveQL.

• Created Pig Latin scripts to carry out essential data operations and tasks.

• Hands-on experience creating Apache Spark RDD transformations on datasets in the Hadoop data lake.

• Formulated procedures for integrating R and Python scripts to ensure proper data access.

• Used Apache Oozie to combine multiple MapReduce, Hive, Pig and Sqoop jobs into a logical unit of work.

• Ingested real-time and near-real-time (NRT) streaming data into HDFS, distributing it to multiple data sources and converting data from one format to another on ingest.

• Created and executed branching workflows with actions including Hadoop jobs, Hive jobs, Pig jobs and custom actions.

• Orchestrated workflows to execute regularly at predefined times, including workflows with data dependencies.

• Created analytic indexes on data using Apache Solr.

• Working knowledge of cloud computing with Amazon Web Services (EC2, EMR, S3) for fast processing of big data.

• Used Agile methodology to work with IT and business teams on efficient system development.

PROJECTS

Scotia EDL: The Canadian Banking Technology – Enterprise Data Lake (CBT – EDL) project is a key initiative from Scotia's Canadian Banking Technology (CBT) group to ingest data from legacy source systems into the Enterprise Data Lake (EDL), technically standardize the data, and perform data transformation and enrichment for developing regulatory reports and enabling advanced analytics.

Product Data Intelligence - Intel Security: A business analytics approach to improve product sales and revenue for a software product by 3%. The aim was to migrate data from different sources to the Hadoop ecosystem for analytics. The project developed an application that provides a single consolidated view of firm-wide client profitability and involved re-engineering an existing batch application. Apache Hadoop was used because millions of records are processed to derive client profitability.

• Bifurcation of Profile and Subscription context data from a single data source into different data sources.

• Migration of telemetry data from multiple sources.

• Conversion of time and date format from PST to UTC using a Hive UDF.

• Analysis of data stored in HDFS to pull various results.

• Data modelling, querying and analysis to gain insights and predict future trends.

HSBC RBWM Data Factory: Data is ingested from 6 different servers (US, Mexico, Canada, France, UK, Hong Kong) into a Hadoop data lake hosted on AWS. Moved data from on premises to S3. Created tables in Redshift and loaded data using the COPY command. Performed ELT processing using Pentaho. Data coming from the servers is in 7BDAT format, which is then converted to CSV using SAS DI. Data is moved to the RBWM cluster, which contains two layers: RAL (report abstraction layer) and SAL (strategic abstraction layer). Merged all transformation tables into a final table for data analytics.
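The PST-to-UTC conversion listed under Product Data Intelligence was implemented as a Hive UDF. The sketch below only illustrates the same logic in Python and is not the original UDF; the function name `pacific_to_utc` and the `yyyy-MM-dd HH:mm:ss` timestamp layout are assumptions made for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library from Python 3.9

# Assumption: source timestamps are wall-clock times in US Pacific time.
PACIFIC = ZoneInfo("America/Los_Angeles")

def pacific_to_utc(value: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Convert a Pacific-time timestamp string to a UTC string in the same format."""
    # Attach the Pacific zone to the naive timestamp (handles PST and PDT).
    local = datetime.strptime(value, fmt).replace(tzinfo=PACIFIC)
    # Shift to UTC and render in the original layout.
    return local.astimezone(timezone.utc).strftime(fmt)

winter = pacific_to_utc("2019-01-15 08:00:00")  # PST is UTC-8
summer = pacific_to_utc("2019-07-15 08:00:00")  # PDT is UTC-7
print(winter, summer)
```

Using a named zone rather than a fixed eight-hour offset keeps daylight-saving dates correct, which matters when the source system labels every timestamp "PST" regardless of season.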

Enterprise Datalake - HP Inc.: Invoked a Sqoop action via Oozie to migrate data from a MySQL database to HDFS. Developed a shell script to fetch the current date and time and pass them to Hive tables. Used Apache Falcon's Process Entity to schedule and monitor the Oozie workflows. Implemented a cloud-based solution to provide a single solution across HP sites and business partner sites, enabling greater efficiency in deploying new capabilities.

ACADEMIC PROJECTS

Advanced Classification

The purpose was to build and test advanced classifiers and prescribe strategies. Using data from the 2010 Congressional elections, we aimed to build a classifier that would predict the election's outcome. The data set includes information about the campaign funds, social media campaigns (Twitter, Facebook and YouTube) and demographics (age, gender) of 941 candidates who ran in the general elections for seats in the 112th U.S. House of Representatives.

Market Incentives Dashboard - Continental

This project displays near-real-time accurate data on State Incentives Programs and Incentives Deals which companies make with state and local governments. This data is to be made available to Continental so that they can make business decisions based on visualizations and aggregations of the data in the dashboard.

This product is composed of multiple submodules that work together to display the dashboard and keep it updated. The web crawler deposits the data as CSV, which is then converted to JSON, indexed into Elasticsearch, visualized with Kibana, and displayed in a web app built with the Flask framework. Crawled the data from an external job website using Python. Successfully interpreted the data to identify key metrics and draw conclusions about job market trends.
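The CSV-to-JSON step described above can be sketched as follows. This is a minimal illustration and not the project's code: the function name `csv_to_bulk_lines`, the index name `incentives` and the sample columns are hypothetical. The output follows the newline-delimited action/document layout that the Elasticsearch bulk API accepts.

```python
import csv
import io
import json

def csv_to_bulk_lines(csv_text: str, index_name: str) -> list[str]:
    """Turn CSV rows into alternating action/document JSON lines for bulk indexing."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: tells Elasticsearch which index the next document goes to.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Document line: the CSV row as a JSON object (all values stay strings).
        lines.append(json.dumps(row))
    return lines

# Hypothetical sample of crawler output; real column names are not given in the source.
sample = "state,company,incentive_usd\nNC,ExampleCo,1000\nSC,OtherCo,2500\n"
bulk_lines = csv_to_bulk_lines(sample, "incentives")
# A bulk request body is the lines joined by newlines, with a trailing newline.
bulk_body = "\n".join(bulk_lines) + "\n"
print(bulk_body)
```

Keeping the conversion separate from the indexing call makes it easy to inspect the JSON before sending it, and numeric fields such as `incentive_usd` could be cast before serialization if Kibana aggregations need them as numbers.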

Google PageRank Implementation

Applied the PageRank algorithm to a set of pages to calculate the rank of each page. Created a link graph and implemented map and reduce functions to return the pages sorted by rank.

Product Data Maintenance

A web application to maintain products. Developed an application using JSP and Servlets for user product maintenance where products can be customized. Implemented using JSP, JPA, JSTL and MySQL. Authenticated using JDBC Realm.

Rossmann Sales Prediction

Predicting sales performance is one of the predominant challenges in every business. It is crucial for any firm to predict customer demand, which helps it offer customers the right product at the right time and place. For this project, we looked at the Rossmann Store Sales problem and were tasked with predicting the next six weeks of daily sales for 1,115 stores across Germany. By creating a strong predictive model for sales forecasts, we were able to help store managers create effective staff schedules that optimize productivity and motivation.

Analytics for National Institutes of Health

The goal was to perform text analytics, including creating word clouds, sentiment analysis and topic modelling. The data was collected from psychcentral.com, a website that offers an online forum for posting questions and answers related to mental health (see https://forums.psychcentral.com). Our objective was to use text analytics to discover useful information related to mental health.

Truxxit Website Database

Truxxit is an online truck service in which a truck comes to the exact location where you need it and takes your items where you want them to go. Implemented the backend for the website's data model using a MySQL database. Used triggers, stored procedures, views, indexes, conditional queries, nested queries, joins, normalization and transactions to make the database stable, ready to use and mapped to the UI.
