Jakub Kurzak
Innovative Computing Laboratory
Electrical Engineering and Computer Science Department
University of Tennessee
Ste 203 Claxton
Knoxville, TN 37996-3450
OFFICE: 318 Claxton
PHONE: 865-***-****
FAX: 865-***-****
EMAIL: ******@****.***.***
WWW: http://web.eecs.utk.edu/~kurzak
EDUCATION
PhD
Computer Science Department ISBN: 978-0542512018
University of Houston
Houston, Texas
2005
MS
Electrical Engineering Department
Wrocław University of Technology
Wrocław, Poland
2000
EXPERIENCE
Research Director September 2010 - present
Research Scientist March 2009 - August 2010
Senior Research Associate January 2006 - February 2009
Innovative Computing Laboratory
Electrical Engineering and Computer Science Department
University of Tennessee
Knoxville, Tennessee
Research Assistant January 2001 - December 2005
Institute for Molecular Design
Department of Chemistry
University of Houston
Houston, Texas
EXPERTISE
All aspects of utilizing silicon to the fullest, from exploiting instruction-level parallelism with SIMD vectorization,
through multithreading on multi-core processors, to message-passing on large-scale distributed-memory systems.
Experience with hardware accelerators / co-processors: Cell B. E. (RIP), GPUs, MIC.
Extensive knowledge of numerical algorithms for scientific computing, linear algebra in particular.
BOOK EDITOR
[1] J. Kurzak, D. Bader, J. Dongarra (editors)
Scientific Computing with Multicore and Accelerators
Computational Science series, Chapman & Hall/CRC, 2010
ISBN: 978-1439825365
BOOK CHAPTERS
[9] J. Dongarra, J. Kurzak, P. Luszczek, S. Tomov
Dense Linear Algebra on Accelerated Multicore Hardware
In High-Performance Scientific Computing: Algorithms and Applications
Springer-Verlag, 2012
ISBN: 978-1447124368
[8] J. Kurzak, P. Luszczek, A. YarKhan, M. Faverge, J. Langou, H. Bouwmeester, J. Dongarra
Multithreading in the PLASMA Library
In Handbook of Multi and Many-Core Processing: Architecture, Algorithms, Programming, and Applications
Computer & Information Science Series, Chapman & Hall/CRC, 2012
ISBN: 978-1447124368
[7] P. Luszczek, J. Kurzak, J. Dongarra
Changes in Dense Linear Algebra Kernels: Decades-Long Perspective
In Solving the Schrödinger equation: has everything been tried?
Imperial College Press
ISBN: 978-1848167247
[6] W. Alvaro, J. Kurzak, J. Dongarra
Implementing Matrix Multiplication on the Cell B. E.
In Scientific Computing with Multicore and Accelerators
Computational Science series, Chapman & Hall/CRC, 2010
ISBN: 978-1439825365
[5] J. Kurzak, J. Dongarra
Implementing Matrix Factorizations on the Cell B. E.
In Scientific Computing with Multicore and Accelerators
Computational Science series, Chapman & Hall/CRC, 2010
ISBN: 978-1439825365
[4] J. Kurzak, H. Ltaief, J. Dongarra, R. Badia
Scheduling for Numerical Linear Algebra Library at Scale
In High Speed and Large Scale Scientific Computing
Advances in Parallel Computing series, IOS Press, 2010
ISBN: 978-1607500735
[3] A. Buttari, J. J. Dongarra, J. Kurzak, J. Langou
Parallel Dense Linear Algebra Software in the Multicore Era
In Cyberinfrastructure Technologies and Applications
Nova Science Publishers, Inc., 2009
ISBN: 978-1606920633
[2] A. Buttari, J. Dongarra, J. Kurzak, P. Luszczek, S. Tomov
Using Mixed Precision in Solving Linear Systems of Equations
In High Performance Computing and Grids in Action
Advances in Parallel Computing series, IOS Press, 2008
ISBN: 978-1586038397
[1] J. Demmel, B. Parlett, W. Kahan, M. Gu, D. Bindel, Y. Hida, E. J. Riedy, C. Voemel,
J. Kurzak, A. Buttari, J. Langou, S. Tomov, J. Dongarra, X. Li, O. Marques, J. Langou, P. Luszczek
Prospectus for a Dense Linear Algebra Software Library
In Handbook of Parallel Computing: Models, Algorithms and Applications
Computer and Information Science series, Chapman & Hall/CRC, 2008
ISBN: 978-1584886235
JOURNAL PUBLICATIONS
[19] J. Kurzak, P. Luszczek, J. Dongarra
LU Factorization with Partial Pivoting for a Multicore System with Accelerators
IEEE Transactions on Parallel and Distributed Systems (submitted)
http://www.computer.org/portal/web/tpds
[18] J. Kurzak, S. Tomov, J. Dongarra
Autotuning GEMM Kernels for the Fermi GPU
IEEE Transactions on Parallel and Distributed Systems (accepted)
http://www.computer.org/portal/web/tpds
[17] J. Kurzak, H. Ltaief, J. Dongarra, Rosa M. Badia
Scheduling Dense Linear Algebra Operations on Multicore Processors
Concurrency and Computation: Practice and Experience 22(1):15-44, 2010
DOI: 10.1002/cpe.1467
[16] H. Ltaief, J. Kurzak, J. Dongarra
Scheduling Two-Sided Transformations Using Tile Algorithms on Multicore Architectures
Scientific Programming 18(1):35-50, 2010
DOI: 10.3233/SPR-2010-0297
[15] H. Ltaief, J. Kurzak, J. Dongarra
Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures
IEEE Transactions on Parallel and Distributed Systems 21(4):417-423, 2010
DOI: 10.1109/TPDS.2009.79
[14] M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, S. Tomov
Accelerating Scientific Computations with Mixed Precision Algorithms
Computer Physics Communications, 40th Anniversary Issue 180(12):2526-2533, 2009
DOI: 10.1016/j.cpc.2008.11.005
[13] J. Kurzak, J. Dongarra
QR Factorization for the CELL Processor
Scientific Programming, Special Issue: High Performance Computing
with the Cell Broadband Engine 17(1-2):31-42, 2009
DOI: 10.3233/SPR-2009-0268
[12] J. Kurzak, W. Alvaro, J. Dongarra
Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture - CELL Processor
Parallel Computing: Systems & Applications, Special Issue: Revolutionary Technologies for Acceleration of
Emerging Petascale Applications 35(3):138-150, 2009
DOI: 10.1016/j.parco.2008.12.010
[11] A. Buttari, J. Langou, J. Kurzak, J. Dongarra
A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
Parallel Computing: Systems and Applications 35:38-53, 2009
DOI: 10.1016/j.parco.2008.10.002
[10] A. Buttari, J. Langou, J. Kurzak, J. Dongarra
Parallel Tiled QR Factorization for Multicore Architectures
Concurrency and Computation: Practice and Experience 20(13):1573-1590, 2008
DOI: 10.1002/cpe.1301
[9] A. Buttari, J. Dongarra, J. Kurzak, P. Luszczek, S. Tomov
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance
While Achieving 64-bit Accuracy
ACM Transactions on Mathematical Software 34(4), article 17, 22 pages, 2008
DOI: 10.1145/1377596.1377597
[8] A. Buttari, J. Dongarra, J. Langou, J. Langou, P. Luszczek, J. Kurzak
Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems
International Journal of High Performance Computing Applications 21(4):457-466, 2007
DOI: 10.1177/1094342007084026
[7] J. Kurzak, A. Buttari, J. Dongarra
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization
IEEE Transactions on Parallel and Distributed Systems 19(9):1175-1186, 2008
DOI: 10.1109/TPDS.2007.70813
[6] J. Kurzak, J. Dongarra
Implementation of Mixed Precision in Solving Systems of Linear Equations on the CELL Processor
Concurrency and Computation: Practice and Experience 19(10):1371-1385, 2007
DOI: 10.1002/cpe.1164
[5] J. Kurzak, B. M. Pettitt
Message-Passing Implementation of the Data Diffusion Communication Model
in Fast Multipole Methods: Large Scale Biomolecular Simulations
Journal of Algorithms & Computational Technology 2(4):557-579, 2008
DOI: 10.1260/174830108786231722
[4] J. Kurzak, D. Mirkovic, B. M. Pettitt, S. L. Johnsson
Automatic Generation of FFTs for Translations of Multipole Expansions in Spherical Harmonics
International Journal of High Performance Computing Applications 22(2):219-230, 2008
DOI: 10.1177/1094342008090915
[3] J. Kurzak, B. M. Pettitt
Fast Multipole Methods for Particle Dynamics
Molecular Simulation 32(10/11):775-790, 2006
DOI: 10.1080/08927020600991161
[2] J. Kurzak, B. M. Pettitt
Massively Parallel Implementation of a Fast Multipole Method for Distributed Memory Machines
Journal of Parallel and Distributed Computing 65(7):870-881, 2005
DOI: 10.1016/j.jpdc.2005.02.001
[1] J. Kurzak, B. M. Pettitt
Communications Overlapping in Fast Multipole Particle Dynamics Methods
Journal of Computational Physics 203(2):731-743, 2005
DOI: 10.1016/j.jcp.2004.09.012
CONFERENCE PUBLICATIONS
[12] J. Kurzak, P. Luszczek, J. Dongarra
Programming the LU Factorization for a Multicore System with Accelerators
VECPAR'12: International Meeting on High-Performance Computing for Computational Science, Kobe, Japan, 2012
Lecture Notes in Computer Science XXXX:xxx-xxx, Springer, 201x
http://nkl.cc.u-tokyo.ac.jp/VECPAR2012/ (accepted)
[11] G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak,
J. Langou, P. Lemarinier, H. Ltaief, P. Luszczek, A. YarKhan, J. Dongarra
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA
IPDPSW'11: International Parallel and Distributed Processing Symposium, Workshops and PhD Forum, Anchorage, AK, 2011
DOI: 10.1109/IPDPS.2011.299
[10] J. Kurzak, R. Nath, P. Du, J. Dongarra
An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs
PARA'10: State of the Art in Scientific and Parallel Computing, Reykjavík, Iceland, 2010
Lecture Notes in Computer Science 7134:248-257, Springer, 2012
DOI: 10.1007/978-3-642-28145-7
[9] E. Agullo, H. Bouwmeester, J. Dongarra, J. Kurzak, J. Langou, L. Rosenberg
Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures
VECPAR'10: High Performance Computing for Computational Science, Berkeley, California, 2010
Lecture Notes in Computer Science 6449:129-138, Springer, 2011
DOI: 10.1007/978-3-642-19328-6_14
[8] E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, S. Tomov
Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects
SciDAC'09: Scientific Discovery through Advanced Computing, San Diego, California, 2009
Journal of Physics: Conference Series 180:012037, IOP Publishing, 2009
DOI: 10.1088/1742-6596/180/1/012037
[7] W. Alvaro, J. Kurzak, J. Dongarra
Fast and Small Short Vector SIMD Matrix Multiplication Kernels
for the Synergistic Processing Element of the CELL Processor
ICCS'08: International Conference on Computational Science, Kraków, Poland, 2008
Lecture Notes in Computer Science 5101:935-944, Springer, 2008
DOI: 10.1007/978-3-540-69384-0_98
[6] A. Buttari, J. Langou, J. Kurzak, J. Dongarra
Parallel Tiled QR Factorization for Multicore Architectures
PPAM'07: International Conference on Parallel Processing and Applied Mathematics, Gdańsk, Poland, 2007
Lecture Notes in Computer Science 4967:639-648, Springer, 2007
DOI: 10.1007/978-3-540-68111-3_67
[5] A. Buttari, J. Dongarra, P. Husbands, J. Kurzak, K. Yelick
Multithreading for Synchronization Tolerance in Matrix Factorization
SciDAC'07: Scientific Discovery through Advanced Computing, Boston, Massachusetts, 2007
Journal of Physics: Conference Series 78:012028, IOP Publishing, 2007
DOI: 10.1088/1742-6596/78/1/012028
[4] J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Dongarra
Exploiting the Performance of 32 Bit Floating Point Arithmetic in Obtaining 64 Bit Accuracy
(Revisiting Iterative Refinement for Linear Systems)
SC'06: ACM/IEEE Conference on Supercomputing, Tampa, Florida, 2006
DOI: 10.1145/1188455.1188573
[3] J. Kurzak, J. Dongarra
Implementing Linear Algebra Routines on Multi-Core Processors with Pipelining and a Look Ahead
PARA'06: State of the Art in Scientific and Parallel Computing, Umeå, Sweden, 2006
Lecture Notes in Computer Science 4699:147-156, Springer, 2007
DOI: 10.1007/978-3-540-75755-9_18
[2] J. Demmel, J. Dongarra, B. Parlett, W. Kahan, M. Gu, D. Bindel, Y. Hida, X. Li, O. Marques,
E. J. Riedy, C. Voemel, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Langou, S. Tomov
Prospectus for the Next LAPACK and ScaLAPACK Libraries
PARA'06: State of the Art in Scientific and Parallel Computing, Umeå, Sweden, 2006
Lecture Notes in Computer Science 4699:11-23, Springer, 2007
DOI: 10.1007/978-3-540-75755-9_2
[1] A. Buttari, J. Dongarra, J. Kurzak, J. Langou, P. Luszczek, S. Tomov
Impact of Multicore on Math Software
PARA'06: State of the Art in Scientific and Parallel Computing, Umeå, Sweden, 2006
Lecture Notes in Computer Science 4699:1-10, Springer, 2007
DOI: 10.1007/978-3-540-75755-9_1
TECHNICAL REPORTS (not published elsewhere)
[5] J. Kurzak, P. Luszczek, S. Tomov, J. Dongarra
LAPACK Working Note 267:
Preliminary Results of Autotuning GEMM Kernels for the NVIDIA Kepler Architecture GeForce GTX 680
Technical Report UT-CS-12-XXX, Department of Computer Science, University of Tennessee, 2012
http://www.netlib.org/lapack/lawnspdf/lawn267.pdf
[4] J. Kurzak, J. Dongarra
LAPACK Working Note 220:
Fully Dynamic Scheduler for Numerical Computing on Multicore Processors
Technical Report UT-CS-09-643, Department of Computer Science, University of Tennessee, 2009
http://www.netlib.org/lapack/lawnspdf/lawn220.pdf
[3] H. Ltaief, J. Kurzak, J. Dongarra
LAPACK Working Note 208:
Parallel Block Hessenberg Reduction Using Algorithms-by-Tiles for Multi-core Architectures Revisited
Technical Report UT-CS-08-624, Department of Computer Science, University of Tennessee, 2009
http://www.netlib.org/lapack/lawnspdf/lawn208.pdf
[2] A. Buttari, J. Dongarra, J. Kurzak
LAPACK Working Note 185:
Limitations of the PlayStation 3 for High Performance Cluster Computing
Technical Report UT-CS-07-597, Department of Computer Science, University of Tennessee, 2007
http://www.netlib.org/lapack/lawnspdf/lawn186.pdf
[1] A. Buttari, P. Luszczek, J. Kurzak, J. Dongarra, G. Bosilca
SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3
Technical Report UT-CS-07-595, Department of Computer Science, University of Tennessee, 2007
www.netlib.org/utk/people/JackDongarra/PAPERS/scop3.pdf
POPULAR SCIENCE
J. Kurzak, A. Buttari, P. Luszczek, J. Dongarra
The PlayStation 3 for High Performance Scientific Computing
Computing in Science and Engineering 10(3):84-87, 2008
ISSN: 1521-9615
TUTORIALS
J. Dongarra, J. Kurzak
LINPACK on Future Manycore & GPU Based Systems
ISC 2010'11'12: International Supercomputing Conference, Hamburg, Germany, 2010'11'12
J. Dongarra, J. Demmel, M. Heroux, J. Kurzak
Linear Algebra Libraries for High-Performance Computing: Scientific Computing with Multicore and Accelerators
SC 2011: ACM/IEEE Conference on Supercomputing, Seattle, WA, 2011
D. Göddeke, J. Kurzak, J. P. Weiß
Scientific Computing on GPUs
PPAM 2011: Parallel Processing and Applied Mathematics, Toruń, Poland, 2011
J. Kurzak
Cell Broadband Engine Programming to the Metal
AFRL / Griffis Institute, Rome, NY, 2009
J. Kurzak, A. Buttari
Introduction to Programming High Performance Applications on the CELL Broadband Engine
HOTI 2007: 15th Annual IEEE Symposium on High-Performance Interconnects, Stanford, CA, 2007
REVIEWER
Transactions on Parallel and Distributed Systems (IEEE)
Journal of Parallel and Distributed Computing (Elsevier)
Parallel Computing: Systems and Applications (Elsevier)
Concurrency and Computation: Practice and Experience (Wiley)
International Journal of High Performance Computing Applications (SAGE)
Journal of Computer and System Sciences (Elsevier)
IBM Journal of Research and Development (IBM)
Transactions on Mathematical Software (ACM)
Parallel Processing Letters (World Scientific)
Journal of Computational Science (Elsevier)
Embedded Systems Letters (IEEE)
Computing in Science & Engineering (IEEE)
International Conference on Supercomputing (ICS)
International Parallel & Distributed Processing Symposium (IPDPS)
Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA)
International Conference on Parallel Processing and Applied Mathematics (PPAM)
International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC)
Springer
Taylor & Francis
U.S. Department of Energy, Office of Science
Natural Sciences and Engineering Research Council of Canada
PROGRAM COMMITTEE
CCGrid: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing ('10'11'12)
SAAHPC: Symposium on Application Accelerators in High Performance Computing ('11'12)
PPAM: International Conference on Parallel Processing and Applied Mathematics ('09)
PPAC: Workshop on Parallel Programming on Accelerator Clusters ('09'10'11)
Euro-Par: European Conference on Parallel Processing ('10)
GRANTS
J. Dongarra, J. Kurzak, P. Luszczek
PULSAR: Parallel Unified Linear Algebra with Systolic Arrays
National Science Foundation
J. Dongarra, J. Kurzak, J. Langou
PLASMA: Parallel Linear Algebra Software for Multiprocessor Architectures
National Science Foundation
COLLABORATORS
E. Agullo        J. Demmel     L. Johnsson          D. Mirkovic
W. Alvaro        J. Dongarra   W. Kahan             B. Parlett
M. Baboulin      M. Faverge    J. Langou (Julie)    M. Pettitt
R. Badia         M. Gu         J. Langou (Julien)   J. Riedy
D. Bindel        B. Hadri      P. Lemarinier        L. Rosenberg
G. Bosilca       A. Haidar     X. Li                S. Tomov
A. Bouteiller    T. Herault    H. Ltaief            C. Voemel
H. Bouwmeester   Y. Hida       P. Luszczek          A. YarKhan
A. Buttari       P. Husbands   O. Marques           K. Yelick
A. Danalis