
Big Data, Java, Python, C, C++

Location: Maple Grove, MN
Posted: September 17, 2017

Contact this candidate


RASANJALEE DISSANAYAKA M

**** ******** **** ******** ** 55340

Phone: 678-***-****

Email: ac2chm@r.postjobfree.com

SUMMARY

15+ years of IT experience, including the design and development of applications using Java, Python, C, and C++.

Strong programming knowledge of Core Java - Objects & Classes, Inheritance and Interfaces, Exception Handling, Reading and Writing Files, Multithreading, Collections, Generics, Swing.

Background in all aspects of software engineering with strong skills in parallel data processing and big data computing.

Strong understanding of Hadoop fundamentals.

Big Data hands-on experience: HDFS, Python, MapReduce, Spark, Spark SQL, Spark Streaming, Hive, Impala, Sqoop, Flume, Pig, HBase, Kafka

Strong understanding of RDBMS concepts with good knowledge of writing SQL

Good knowledge of the NoSQL database HBase, with the ability to interact via the HBase shell and programmatically through the Java API.

Strong understanding of Hadoop File Formats – Parquet, Avro

Hands-on programming/scripting experience with UNIX shell, Python, and JavaScript.

Proficient with application build and continuous integration tools – Maven, SVN, Git.

Exceptional problem-solving, algorithm-analysis, and design skills.

Strong knowledge and experience in HTML, CSS, XML and related technologies (XQuery/XPath, XSLT)

Excellent written and verbal communication skills

Ability to meet deadlines in a fast-paced environment

PhD in Computer Science

SKILLS

Languages: Core Java, RESTful services using Java Jersey, Java Spring, Swing, JUnit, Python, C, C++, Scala, VB.NET, HTML, SQL, XML, XPATH, XQUERY, Assembly (Intel), JavaScript, PHP

BigData: HDFS, MapReduce, YARN, Sqoop, Impala, Hive, Data formats, Data partitioning, Pig, Flume, Spark, Spark Streaming, Spark SQL, HBase, Kafka

Computer Science: OOA/OOD (UML, design patterns), Parallel and distributed computing, Algorithms and complexity, Data communication and networking, Data structures

DBMS: MySQL, HBase, MS Access

Parallel System Programming: C with MPI and OpenMP

Operating Systems: Linux, Windows

Version Control: TortoiseSVN, Git

Applications: Maven, PuTTY, Cygwin, Microsoft Office, Microsoft Visio, Spring Tool Suite, Eclipse, IntelliJ IDEA, Wireshark

PROFESSIONAL TRAINING

Cloudera Developer Training for Spark and Hadoop

Project: Loudacre, a wireless carrier, migrating its existing infrastructure to Hadoop to handle massive, dynamic, and varied data sources.

Responsibilities:

o Used the HDFS command-line tool to manipulate files in HDFS
o Used the Hue File Browser to browse, view, and manage files
o Submitted Spark applications to the YARN cluster
o Monitored applications using both the Hue Job Browser and the YARN Web UI
o Configured Flume to ingest web log data from a local directory into HDFS
o Used Sqoop to import data from MySQL into HDFS, as well as directly into Hive and Impala
o Exported data in HDFS back to a MySQL database using Sqoop
o Defined Impala/Hive tables to model and view data in HDFS
o Imported data in Avro format and created Impala/Hive tables to access it
o Saved existing Impala tables in different data formats such as Parquet
o Created and loaded Impala/Hive tables with data, partitioned by fields
o Used the HBase shell to perform CRUD operations on HBase tables storing census data
o Used the HBase Java API to perform CRUD operations on HBase tables and filter data based on conditions
o Wrote MapReduce programs to perform complex operations on HBase tables
o Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS
o Developed Spark applications using Scala and Python
o Used Spark to perform ETL operations to parse data in XML format
o Used Spark pair RDDs to join different datasets, such as web server log files and user account data
o Set various Spark configuration options in a properties file and set log levels
o Used the Spark Application UI to view the execution stages of jobs
o Analyzed the performance effect of caching RDDs
o Implemented iterative algorithms in Spark

o Used Spark to create partitioned datasets

o Used Spark SQL to load data from MySQL, process it, and store it to HDFS

Environment: HDFS, MapReduce, YARN, Sqoop, Impala, Hive, HBase, Pig, Flume, Spark, Spark SQL, Scala, Python
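The MapReduce work listed above follows the classic map/shuffle/reduce pattern. A minimal pure-Python sketch of that pattern (a hypothetical word count over log lines, for illustration only, not the actual training code):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for each word, as a mapper would."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

log_lines = ["ERROR disk full", "INFO ok", "ERROR disk full"]
counts = reduce_phase(shuffle(map_phase(log_lines)))
print(counts["error"])  # 2
```

On a real cluster the shuffle is distributed across nodes, but the key-grouping semantics are the same.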

Project: Spark Streaming – a streaming application that tracks error messages in a log file stream from a website and applies clustering to a live stream of Twitter data.

Responsibilities:

o Developed streaming applications to track error messages received from a web log stream using DStream transformations

o Performed stateful transformations to maintain statistics from a netcat source
o Calculated the cumulative error rate of log messages received in a stream from a website using window operations
o Applied ML algorithms (streaming K-Means clustering) to Twitter streaming data to identify where tweets are coming from and which hashtags are trending

o Made the streaming application robust using checkpointing, so that it recovers from crashes starting at the last checkpointed state

Environment: HDFS, Python, Spark Streaming, MLlib
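The windowed error-rate calculation above can be sketched without a Spark cluster: a bounded deque stands in for a windowed DStream, and each incoming message updates the rate over the last N messages. This is an illustrative sketch of the windowing idea, not the Spark Streaming API itself:

```python
from collections import deque

def windowed_error_rate(messages, window_size):
    """Yield the fraction of ERROR messages over a sliding window
    of the last `window_size` messages."""
    window = deque(maxlen=window_size)  # old messages fall off automatically
    for msg in messages:
        window.append(msg)
        errors = sum(1 for m in window if m.startswith("ERROR"))
        yield errors / len(window)

stream = ["INFO a", "ERROR b", "ERROR c", "INFO d"]
rates = list(windowed_error_rate(stream, window_size=2))
print(rates)  # [0.0, 0.5, 1.0, 0.5]
```

In Spark Streaming the equivalent is a window operation such as `reduceByKeyAndWindow`, with the window defined in time rather than in message count.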

Behind and Beyond Big Data – Stanford University

o Topics covered: driving factors of Big Data, creating new knowledge, the Internet of Things, new markets, privacy of Big Data, security of Big Data

Tackling the Challenges of Big Data – Massachusetts Institute of Technology

o Topics covered: data collection (smartphones, sensors, the Web), data storage and processing (scalable relational databases, Hadoop, Spark, etc.), extracting structured data from unstructured data, analytics (machine learning, data compression, efficient algorithms), visualization, and a range of applications

RECENT PROFESSIONAL EXPERIENCE:

Employer: College of Saint Benedict and Saint John’s University 2014-2017

Role: Assistant Professor, Computer Science

Project#1: Movie Recommender System. Led a project in which the team built a big data application (implemented in Python) that recommends movies based on user ratings.

Environment: Python, Cloudera CDH 5.4

Project#2: Network Analysis. A project in which custom Python-based network server and client applications were implemented and analyzed for network traffic.

Responsibilities:

o Developed Python applications implementing TCP and UDP clients and the corresponding servers that serve browser requests

o Implemented a single segment IP network

o Set up and configured network routers and established IP addresses for the Linux workstations
o Configured the routers to use dynamic routing via the RIP protocol
o Analyzed Wireshark traces that include HTTP, DNS, IP, TCP, and UDP traffic

Environment: Python socket programming, Wireshark
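The client/server socket work above boils down to a blocking send/recv exchange. A self-contained sketch using the standard library, with `socket.socketpair()` standing in for a real TCP connection (a real server would `bind()`, `listen()`, and `accept()` instead; the request and reply strings here are hypothetical):

```python
import socket

# A connected pair of sockets stands in for the TCP client/server pair.
server_sock, client_sock = socket.socketpair()

client_sock.sendall(b"GET /index.html")    # client sends a request
request = server_sock.recv(1024)           # server reads it
server_sock.sendall(b"HTTP/1.0 200 OK")    # server replies
reply = client_sock.recv(1024)             # client reads the reply

print(reply.decode())  # HTTP/1.0 200 OK
server_sock.close()
client_sock.close()
```

The same send/recv pattern appears in the Wireshark traces as paired TCP segments, which is what makes the traffic analysis above tractable.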

Employer: North Georgia University 2014-2014

Role: Instructor, Computer Science

Courses taught: CSCI 1301 Python: A Multimedia Approach

Employer: Georgia State University 2007-2014

Role: Research Assistant, Distributed and Mobile Systems Laboratory (DiMos)

Designed, implemented, and tested distributed search algorithms.

Project#1: Text Mining. A Java project to perform text processing of the Reuters-21578 newswire dataset.

Responsibilities:

o Developed a custom Java library to perform text mining of the Reuters newswire dataset
o Developed a Java application to perform word-sense disambiguation of a given text corpus

Environment: Java, semantic Java libraries (edu.mit.jwi, edu.smu.tspell.wordnet), Apache Commons Java libraries (org.apache.commons.lang, org.apache.commons.logging, org.apache.commons.collections, org.apache.commons.io)

Project#2: NeuronBank. A web-based knowledge management system and an online reference source and informatics tool for exploring the vast knowledge of neurons and the circuits that they form.

Responsibilities:

o Involved in analysis of the existing application and contributed to refactoring/upgrade plans
o Developed the client-tier components using HTML and JavaScript
o Developed and maintained code to implement the web-based neuron search capability
o Used TortoiseSVN for version control

o Deployed the application on the web server and wrote shell scripts for checking web resource accessibility

Environment: Java, servlets, PHP, TortoiseSVN

Project#3: Network Search. A resource search tool for distributed overlay networks.

Responsibilities:

o Designed and implemented an unsupervised resource search mechanism for distributed networks
o Implemented the mechanism in Java using the PeerSim simulator
o Experimentally analyzed the performance of the search algorithm against state-of-the-art search algorithms

Environment: Java, PeerSim (implemented in Java)

Project#4: XML

Responsibilities:

o Developed a new Java application to validate an XML document against a schema document
o Designed and implemented XML schemas

o Implemented XPath and XQuery expressions and XSLT transforms to transform the content

Environment: EditX XML Editor, XML, XPath, XQuery, XML Schema, Java, Java libraries (org.w3c.dom, oracle.xml.parser.v2)
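The XPath work above can be illustrated with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset (this sketch uses a hypothetical document, not the project's Java/Oracle toolchain):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<catalog>
  <book id="b1"><title>Java</title><price>30</price></book>
  <book id="b2"><title>Python</title><price>25</price></book>
</catalog>
""")

# Path expression: select the titles of all books
titles = [t.text for t in doc.findall("./book/title")]
print(titles)  # ['Java', 'Python']

# Attribute predicate: select the book whose id is "b2"
book = doc.find("./book[@id='b2']")
print(book.find("price").text)  # 25
```

Full XPath 1.0, XQuery, and XSLT need a dedicated processor (e.g. the Java libraries listed above); ElementTree covers only simple paths and predicates like these.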

Courses taught: CSC 2010 Java – Lab, CSC 2310 Java, CSC 3210 Computer Organization and Programming (RISC Assembly)

EDUCATION

PhD in Computer Science, Georgia State University, Atlanta, GA, USA, 2014

MS in Computer Science, Georgia State University, Atlanta, GA, USA, 2012


