
Data Aws

San Carlos, CA
January 06, 2020



Jiazhen Tang

*** * ** ****** ****, Apt.***, San Mateo, CA 94401



Northeastern University, San Jose, CA

College of Computer and Information Science Jan. 2018 – Present

Related Courses: Intensive Fundamentals of Computer Science, Discrete and Data Structures, Object-Oriented Design, Introduction to Algorithms, Introduction to Database Management Systems, Algorithms, Computer Systems

San Jose State University, San Jose, CA May 2017


Languages: Java, C/C++, SQL

Software/Frameworks: Hadoop, Flink, Maven, Docker, MySQL, Eclipse, IntelliJ

PROJECT

Virtual File System Implementation Summer 2019

● Implemented the fsx600 file system (a simple derivative of the Unix FFS file system) in C

● Used the FUSE toolkit to implement the file system as a user-space process

● Used a data file, accessed through a block device interface, in place of a physical disk

Dynamic Memory Allocator Summer 2019

● Studied the design of the original C dynamic memory allocator and managed the heap segment

● Implemented a dynamic memory allocator using an implicit block list

● Used a first-fit algorithm and implemented a doubly linked list to improve performance

Website Summer 2019

● Collected wage data from USCIS and converted the CSV data to Parquet format using pandas

● Persisted all wage data in DynamoDB, using employer name as the partition key to speed up queries

● Used AWS Lambda functions to handle all HTTP requests and retrieve data from DynamoDB

Job Searching Database Design Fall 2018

● Developed a management system for the job seekers to find their desired jobs and for the recruiters to find the right candidates

● Created Java servlets to handle HTTP requests and responses

● Built a relational database to persist data

● Deployed the server to Amazon EC2 and tested it with JMeter

Real-time Flink Tweets Analytics/Processing Summer 2018

● Used the Flink Twitter connector to ingest tweets

● Calculated hashtag trends using Flink window functions

● Persisted tweets in Parquet format for offline processing

● Used Spark and pandas to read and analyze the Parquet data

● Dockerized Flink for easy local testing

● Deployed Flink using AWS ECS

Cloud Computing Course Project Spring 2018

● Learned to use AWS MapReduce and Cascading

● Wrote a MapReduce program to analyze the Enron email dataset

● Wrote a connected-components algorithm using MapReduce and Cascading

● Benchmarked serialization/deserialization speed and size across JSON, Avro, Protobuf, and MessagePack

● Benchmarked columnar (Parquet) and row-based (Avro) storage formats

● Benchmarked different compression codecs, for example Snappy, gzip/deflate, and Brotli

INTERESTS/ACTIVITIES

● Northeastern University ALIGN Scholar award 2018 — 2019

● San Jose State University College of Social Science Dean’s List 2016 — 2017
