Guohui Wang
PhD student, ECE Department, Rice University, Houston, Texas
Webpage: www.GuohuiWang.com Email: ***@****.***
Research Interests
Mobile computing, GPGPU computing, parallel processing, wireless communication.
Education
Rice University 2008 - now
PhD candidate in Electrical Engineering (GPA: 4.12)
Chinese Academy of Sciences, Beijing, China 2005 - 2008
M.S. in Computer Science
Peking University, Beijing, China 2001 - 2005
B.S. major in Electrical Engineering
B.S. minor in Economics
Work Experience
Qualcomm San Diego, CA
Intern (Received Qualstar Diamond Award) Summer 2012
GPGPU computing research on mobile GPU
Studied general-purpose computing on mobile GPUs using the OpenCL framework. Implemented
and optimized Qclbenchmark (Qualcomm OpenCL Benchmark) on the Snapdragon
MSM8960/APQ8064 mobile chips.
Optimized computer vision and image processing algorithms on the Adreno 320 GPU and built
Android demos to showcase the capability of GPGPU on mobile devices.
National Instruments R&D Austin, TX
Research Intern Summer 2011
1Gbps 8x8 MIMO LTE-Advanced transceiver prototype
Implemented an FPGA-based LTE-Advanced prototype that achieves a data rate close to 1 Gbps
and was demonstrated in the keynote presentation at the NIWeek 2011 conference. Developed
high-performance channel estimation and MIMO detection modules using LabVIEW FPGA.
Fixed-point simulations were written in C and MathScript. The whole design was synthesized with
Xilinx ISE. The channel estimation and MIMO detection modules consume 91.1% of the slices and
52.5% of the DSP48s on a Xilinx Virtex-5 XC5VSX95T FPGA when targeting 100 MHz.
Research Experience
Rice University Houston, TX
Research Assistant May 2010 - present
Research on multicore mobile platform
Studied mobile CPU-GPU co-design and workload partitioning for augmented reality
applications to reduce power consumption on mobile devices. OpenGL ES, C/C++ and
Java were used to develop benchmarks on the Android platform for an NVIDIA Tegra 2 device.
GPGPU parallel computing
(Related publications: ASILOMAR 2012, SASP 2011, ASILOMAR 2011, JSPS 2011)
Studied massively parallel GPGPU accelerators for high-performance DSP algorithms
such as error correction codes and MIMO detection. Focused on mapping algorithms onto
the GPGPU architecture and on performance optimization. The techniques used to improve
performance include parallelism optimization, memory optimization and adaptive
thread/thread-block configuration. For example, the GPGPU-based LDPC decoder achieves
over 100 Mbps throughput on an NVIDIA Fermi GPU.
Rice University Houston, TX
Research Assistant August 2008 - present
Algorithms and architecture for high performance communication systems
(Related publications: ISCAS 2013, ASILOMAR 2012, ASAP 2011, ISCAS 2011,
ASILOMAR 2009)
Studied and improved the algorithms and architectures for 4G wireless communication
systems, such as channel decoders and MIMO detectors. Designed and implemented
very-high-throughput, high-efficiency, low-complexity ASIC architectures using Verilog HDL.
Tools: MATLAB simulation, fixed-point simulation in C, Verilog HDL.
Proposed and implemented a flexible router architecture to eliminate memory conflicts
in parallel decoding systems and enable a high-throughput multi-standard interleaver.
Designed a flexible Turbo decoder supporting the HSPA+, LTE and WiMAX standards.
Designed a VLSI architecture for a high-throughput multi-layered LDPC decoder.
Designed and implemented an FPGA prototype of a 3GPP LTE uplink receiver.
Rice University Houston, TX
Research Assistant September 2010 - May 2011
Use High-Level Synthesis (HLS) tools to design DSP accelerators.
Implemented ASIC accelerators for several key modules in wireless communication
systems, such as QR decomposition, a CORDIC module and fully parallel matrix
multiplication. The Mentor Graphics Catapult C HLS tool and Design Compiler were used.
Chinese Academy of Sciences Beijing, China
Research Assistant September, 2005 - June, 2008
VLSI architecture for 2K HD cinema system
(Related publications: High Technology Letters 2008)
Designed and implemented a VLSI architecture for a high-throughput 2K high-definition
digital cinema (DCI-compliant) playback system. Developed a high-throughput buffer
system to handle concurrent multi-channel streams and to achieve real-time video-audio
synchronization. Implemented color space conversion, buffering systems and a package
control module using Verilog HDL. The system can decode 250 Mbps JPEG2000 data
and output dual-channel 1.8 Gbps digital video.
Developed MXF package parsing tools and JPEG2000 decoding software.
Publications
[Book chapter]
Y. Sun, G. Wang, B. Yin, J. R. Cavallaro and T. Ly, High-level Design Tools for Complex
DSP Applications, DSP for Embedded and Real-Time Systems: Expert Guide, Elsevier,
2012.
[Journal Papers]
G. Wang, Y. Sun, J. R. Cavallaro and Y. Guo, High-Throughput Low-Complexity
Interleaver Architecture Solving Memory Contention Problem for Parallel Turbo Decoder, in
preparation to submit to Journal of Signal Processing Systems.
Y. Sun, G. Wang and J. R. Cavallaro, A 1.2Gbps 3GPP LTE Turbo Decoder, in
preparation to submit to IEEE Transactions on VLSI Systems.
M. Wu, Y. Sun, G. Wang, and J. R. Cavallaro, Implementation of a High Throughput
3GPP Turbo Decoder on GPU, Journal of Signal Processing Systems (JSPS), 2011.
G. Wang, Z. Zhu, K. Zhang, Z. Wang, A Novel Design Of the High Speed Buffer and
Video/audio Synchronization in High Resolution Digital Cinema System, High Technology
Letters (In Chinese), Vol.9, 2008.
[Conference Papers]
G. Wang, Y. Xiong, J. Yun, and J. R. Cavallaro, Accelerating Computer Vision
Algorithms Using OpenCL Framework on the Mobile GPU - A Case Study, Submitted to IEEE
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013.
B. Rister, G. Wang, M. Wu and J. R. Cavallaro, A Fast and Efficient SIFT Detector using
the Mobile GPU, Submitted to IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP), 2013.
G. Wang, A. Vosoughi, H. Shen, J. R. Cavallaro, and Y. Guo, Parallel Interleaver
Architecture with New Scheduling Scheme for High Throughput Configurable Turbo Decoder, To
appear at IEEE International Symposium on Circuits and Systems (ISCAS), 2013.
G. Wang, H. Shen, B. Yin, Y. Sun and J. R. Cavallaro, High Performance Efficient Parallel
Nonbinary LDPC Decoding on GPU, 46th Asilomar Conference on Signals, Systems, and
Computers (ASILOMAR), 2012.
B. Yin, M. Wu, G. Wang, and J. R. Cavallaro, Low Complexity Opportunistic Decoder for
Network Coding, 46th Asilomar Conference on Signals, Systems, and Computers
(ASILOMAR), 2012.
G. Wang, M. Wu, Y. Sun and J. R. Cavallaro, GPGPU Accelerated Scalable Parallel
Decoding of LDPC Codes, 45th Asilomar Conference on Signals, Systems, and Computers
(ASILOMAR), 2011.
G. Wang, Y. Sun, J. R. Cavallaro and Y. Guo, High-Throughput Contention-Free
Concurrent Interleaver Architecture for Multi-Standard Turbo Decoder, IEEE International
Conference on Application-specific Systems, Architectures and Processors (ASAP), 2011.
G. Wang, M. Wu, Y. Sun, J. R. Cavallaro, A Massively Parallel Implementation of
QC-LDPC Decoder on GPU, IEEE Symposium on Application Specific Processors (SASP), 2011.
Y. Sun, G. Wang and J. R. Cavallaro, Multi-Layer Parallel Decoding Algorithm and VLSI
Architecture for Quasi-Cyclic LDPC Codes, IEEE International Symposium on Circuits and
Systems (ISCAS), 2011.
G. Wang, B. Yin, K. Amiri, Y. Sun, M. Wu and J. R. Cavallaro, FPGA Prototyping of
A High Data Rate LTE Uplink Baseband Receiver, 43rd Asilomar Conference on Signals,
Systems and Computers (ASILOMAR), 2009.
[Patents]
G. Wang, A. Vosoughi, H. Shen, J. R. Cavallaro, and Y. Guo, System and method for
parallel interleaver for high data rate turbo decoder. U.S. Patent Application. Filed in July,
2012.
G. Wang, Y. Sun, J. R. Cavallaro, and Y. Guo, System and Method for Contention-Free
Memory Access in an Interleaver. U.S. Patent Application. Filed in Nov, 2010.
A. Vosoughi, G. Wang, H. Shen, J. R. Cavallaro, and Y. Guo, Scalable interleaved address
generation for UMTS/HSPA+ turbo decoder. U.S. Patent Application. Filed in July, 2012.
G. Wang, Z. Wang, Z. Wei, Z. Zhu, The Method, System and Device to Implementing
Video/audio Synchronization, filed in August, 2007; China Patent No.200710120585.0.
G. Wang, Z. Wei, Z. Wang, A Fast and High Performance Method for Multimedia Video
Zooming, filed in November, 2007; China Patent No.200710178188.9.
Z. Wei, G. Wang, Z. Wang, A method of Watermark Generation and Detection for digital
cinema Copyright Protection, filed in April, 2008; China Patent ZL200810103472.4.
Z. Zhu, Z. Wang, X. Wang, Z. Wei, G. Wang, A copyright protection method and
system for audio and video contents in digital cinema, filed in June, 2008; China Patent
ZL200810114749.3.
Course Work
Computer Architecture, Parallel Computing, Operating Systems,
Stochastic Process, Numerical Analysis, Information Theory,
Advanced VLSI Design, VLSI System Test, Arch. of Wireless Comm.,
Comm. Theory & Systems, Error Correcting Codes, Communication Network,
Digital System Design, Digital Image Processing, Computer Vision
Teaching Experience
Course Lab Instructor, ECE Department, Rice University
Taught lab sessions, prepared and conducted weekly three-hour lectures reviewing the week's
course material and explaining lab project materials, and graded homework and projects.
ELEC 220: Fundamentals of Computer Engineering (Spring 2009, 2010, 2011, 2012)
Teaching Assistant, ECE Department, Rice University
ELEC 303: Random Signals (Fall 2009)
ELEC 522: Advanced VLSI Design (Fall 2010)
Professional Services
Paper reviewer
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2012
IEEE Communications Letters, 2012
Journal of Computer Science and Technology (JCST) 2012
Springer Frontiers of Computer Science Journal (FCSJ), 2012
IEEE Computer Architecture Letters (CAL), 2012
EURASIP Journal on Wireless Communications and Networking 2011
IEEE International Symposium on Circuits and Systems (ISCAS) 2011, 2012, 2013
European Signal Processing Conference (EUSIPCO) 2011
IEEE International Conference on Communications (ICC) 2011, 2013
Great Lakes Symposium on VLSI (GLSVLSI) 2011
IEEE Workshop on Signal Processing Systems (SiPS) 2012
International Symposium on Information Theory and its Applications (ISITA) 2010
IEEE International Conference on Application-specific Systems, Architectures and
Processors (ASAP) 2009, 2012
Activities
Committee member of Rice Center for Engineering Leadership 2009 - 2012
Graduate Student Mentor Program in ECE Depart., Rice Univ. 2009 - 2012
References
Available upon request.