Main Program (Room: Lalezar)
Wednesday, June 1, 2016
Keynote I: Performance = Bandwidth divided by Latency
Yale Patt, The University of Texas at Austin
Chair: Onur Mutlu
· Abstract: Supercomputing has always required the ultimate in performance, and if you look at the evidence (e.g., the Famous Top 500), you would think "Performance" equals "Bandwidth." Unfortunately, it matters how long it takes to get something done. Otherwise, for example, it would be a win to do a minimal-power 64-bit integer add by streaming the bits serially through a full adder and latch in 64 cycles. Clearly, latency matters. In this talk I would like to look at some of the opportunities for decreasing latency, and how the language designer, programmer, compiler writer, and microarchitect can all contribute to this aspect of Supercomputing performance.
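The bit-serial example above can be made concrete with a small sketch (an editorial illustration, not material from the talk): the C function below pushes one bit per step through a modeled full adder, so although it completes the same single add as a parallel adder, its result is ready only after 64 steps. The function name and cycle counter are assumptions made for the illustration.

#include <stdint.h>
#include <stdio.h>

/* Sketch only: a 64-bit add done bit-serially through one full adder.
 * Each loop iteration stands for one cycle, so the sum is available
 * only after 64 "cycles", even though the adder itself is tiny. */
static uint64_t add_bit_serial(uint64_t a, uint64_t b, int *cycles)
{
    uint64_t sum = 0;
    unsigned carry = 0;
    *cycles = 0;
    for (int i = 0; i < 64; i++) {
        unsigned ai = (unsigned)(a >> i) & 1u;
        unsigned bi = (unsigned)(b >> i) & 1u;
        unsigned s  = ai ^ bi ^ carry;            /* full-adder sum bit   */
        carry = (ai & bi) | (carry & (ai ^ bi));  /* full-adder carry-out */
        sum |= (uint64_t)s << i;
        ++*cycles;                                /* latency accumulates  */
    }
    return sum;
}

int main(void)
{
    int cycles = 0;
    uint64_t s = add_bit_serial(123456789ULL, 987654321ULL, &cycles);
    /* Same answer a single-cycle parallel adder would give, 64x later. */
    printf("sum = %llu after %d serial cycles\n", (unsigned long long)s, cycles);
    return 0;
}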
· Bio: Yale Patt is Professor of Electrical and Computer Engineering and the Ernest Cockrell, Jr. Centennial Chair in Engineering at The University of Texas at Austin. He enjoys equally teaching freshmen, teaching graduate students, and directing the research of six PhD students in high performance computer implementation. He has, for more than 50 years, combined an active research program with extensive consulting and a strong commitment to teaching. The focus of his research is generally five to ten years beyond what industry provides at that point in time. His rationale has always been that he does not do revenue shipments, preferring to produce knowledge that will be useful to future revenue shipments and, more importantly, graduates who will design those future products. More information is available on his website: www.ece.utexas.edu/~patt.
Lightning Talks
Chair: Onur Mutlu
This year, as part of the conference, we will hold a lightning session. It will take place after the first keynote on the first day of the conference, June 1. It is a single large session, so we expect almost all conference attendees to be present.
The purpose of this session is to:
For authors: We will follow up with more instructions about the lightning session later. We will ask you to submit your slides beforehand, in PDF, so that we can upload them to a single computer and run the session from it. Please plan a presentation of less than 60 seconds; the 60-second limit will be strictly enforced.
Session 1: Heterogeneous Systems
Chair: Mark Silberstein, Technion
· Polly-ACC: Transparent compilation to heterogeneous hardware
Tobias Grosser and Torsten Hoefler (ETH Zurich)
· Hybrid CPU-GPU scheduling and execution of tree traversals
Jianqiao Liu, Nikhil Hegde, and Milind Kulkarni (Purdue University)
· Exploiting Dynamic Reuse Probability to Manage Shared Last-level Caches in CPU-GPU Heterogeneous Processors
Siddharth Rai and Mainak Chaudhuri (Indian Institute of Technology Kanpur)
Lunch Talk by Yale Patt on Education (Room: Galata)
Chair: Onur Mutlu
Session 2: Power, Energy, Variation
Chair: Ulya Karpuzcu, Minnesota
· AEQUITAS: Coordinated Energy Management Across Parallel Applications
Haris Ribic and Yu David Liu (SUNY Binghamton)
· Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes
Dimitrios Chasapis, Marc Casas, and Miquel Moretó (Barcelona Supercomputing Center), Martin Schulz (Lawrence Livermore National Laboratory), and Eduard Ayguadé, Jesus Labarta, and Mateo Valero (Barcelona Supercomputing Center)
· Variation Among Processors Under Turbo Boost in HPC Systems
Bilge Acun, Phil Miller, and Laxmikant V. Kale (University of Illinois at Urbana-Champaign)
Session 3: NVMs & Persistent Memory
Chair: Heon Yeom, Seoul National University
· Mini-Ckpts: Surviving OS Failures in Persistent Memory
David Fiala and Frank Mueller (North Carolina State University), Kurt Ferreira (Sandia National Laboratories), and Christian Engelmann (Oak Ridge National Laboratory)
· High Performance Design for HDFS with Byte-Addressability of NVM and RDMA
Nusrat Sharmin Islam, Md. Wasi-ur-Rahman, Xiaoyi Lu, and Dhabaleswar K. (DK) Panda (The Ohio State University)
· Write-Aware Management for Fast NVM-based Memory Extensions
Amro Awad (North Carolina State University), Sergey Blagodurov (Advanced Micro Devices (AMD)), and Yan Solihin (North Carolina State University)
Session 4: Data Centers
Chair: Dhabaleswar Panda, Ohio State
· HOPE: Enabling Efficient Service Orchestration in Software-Defined Data Centers
Yang Hu (University of Florida), Chao Li (Shanghai Jiao Tong University), Longjun Liu (unaffiliated), and Tao Li (University of Florida)
· Towards an Adaptive Multi-Power-Source Datacenter
Longjun Liu and Hongbin Sun (Xi'an Jiaotong University), Chao Li (Shanghai Jiao Tong University), Jingmin Xin and Nanning Zheng (Xi'an Jiaotong University), and Tao Li (University of Florida)
· GreenGear: Leveraging and Managing Server Heterogeneity for Improving Energy Efficiency in Green Data Centers
Xu Zhou and Qiang Cao (Huazhong University of Science and Technology), Hong Jiang (The University of Texas at Arlington), Lei Tian (Tintri), and Changsheng Xie (Huazhong University of Science and Technology)
· Noise Aware Scheduling in Data Centers
Hameedah Sultan, Arpit Katiyar, and Smruti R. Sarangi (IIT Delhi)
· Fast Multiplication in Binary Fields on GPUs via Register Cache
Eli Ben-Sasson, Matan Hamilis, and Mark Silberstein (Technion), and Eran Tromer (Tel Aviv University)
Thursday, June 2, 2016
Keynote II: Innovative Applications and Technology Pivots - A Perfect Storm in Computing
Wen-mei Hwu, University of Illinois at Urbana-Champaign
Chair: Onur Mutlu
· Abstract: Since early 2000, we have been experiencing two very important developments in computing. One is that a tremendous amount of resources has been invested in innovative applications such as first-principle-based models, deep learning, and cognitive computing. The other is that the industry has taken a technological path where application performance and power efficiency can vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. Since then, most of the top supercomputers in the world have been heterogeneous parallel computing systems. New standards such as the Heterogeneous System Architecture (HSA) are emerging to facilitate software development. Much has been, and still needs to be, learned about algorithms, languages, compilers, and hardware architecture in this movement. What are the applications that continue to drive this technology development? How hard is it to program these systems today? How will we program these systems in the future? How will innovations in memory devices present further opportunities and challenges? What is the impact on the long-term software engineering cost of applications? In this talk, I will present some research opportunities and challenges that are brought about by this perfect storm.
· Bio: Wen-mei W. Hwu is a Professor and holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He is also CTO of MulticoreWare Inc., chief scientist of the UIUC Parallel Computing Institute, and director of the IMPACT research group (www.crhc.uiuc.edu/Impact). He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the NSF Blue Waters petascale supercomputer. For his contributions, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award, and the Distinguished Alumni Award in Computer Science from the University of California, Berkeley. He is a Fellow of IEEE and ACM. Dr. Hwu received his Ph.D. in Computer Science from the University of California, Berkeley.
Session 5A: GPUs and SIMD
Chair: Milind Kulkarni, Purdue
Room: Lalezar
· Coherence-Free Multiview: Enabling Reference-Discerning Data Placement on GPU
Guoyang Chen and Xipeng Shen (North Carolina State University)
· SFU-Driven Transparent Approximation Acceleration on GPUs
Ang Li (Eindhoven University of Technology), Shuaiwen Leon Song (Pacific Northwest National Lab), Mark Wijtvliet (Eindhoven University of Technology), Akash Kumar (Technische Universität Dresden), and Henk Corporaal (Eindhoven University of Technology)
· Reusing Data Reorganization for Efficient SIMD Parallelization of Dynamic Irregular Applications
Peng Jiang, Linchuan Chen, and Gagan Agrawal (The Ohio State University)
Session 5B: Communication and Coherence
Chair: Andreas Moshovos, Toronto
Room: Ayasofya
· SReplay: Deterministic Group Replay for One-Sided Communication
Xuehai Qian (University of Southern California), Koushik Sen (University of California, Berkeley), and Paul Hargrove and Costin Iancu (Lawrence Berkeley National Laboratory)
· Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication
Konstantina Mitropoulou, Vasileios Porpodas, Dennis Zhang, and Timothy Jones (University of Cambridge)
· Efficient Timestamp-Based Cache Coherence Protocol for Many-Core Architectures
Yuan Yao (Zhejiang University), Guanhua Wang (National University of Singapore), Zhiguo Ge (Huawei International Pte. Ltd.), Tulika Mitra (National University of Singapore), Wenzhi Chen (Zhejiang University), and Naxin Zhang (Huawei International Pte. Ltd.)
Session 6A: Tools and Libraries
Chair: Sanyam Mehta, Cray
Room: Lalezar
· BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing
Linnan Wang (UESTC and Georgia Institute of Technology), Wei Wu (The University of Tennessee, Knoxville), Zenglin Xu (UESTC), Jianxiong Xiao (Princeton), and Yi Yang (NEC Labs)
· Peruse and Profit: Estimating the Accelerability of Loops
Snehasish Kumar (Simon Fraser University), Vijayalakshmi Srinivasan (IBM Research), and Amirali Sharifian, Nick Sumner, and Arrvindh Shriraman (Simon Fraser University)
· Simulation and Analysis Engine for Scale-Out Workloads
Nadav Chachmon (Intel), Daniel Richins (The University of Texas at Austin), Robert Cohn and Magnus Christensson (Intel), Wenzhi Cui (The University of Texas at Austin), and Vijay Janapa Reddi (The University of Texas at Austin)
Session 6B: Potpourri
Chair: Hao Wang, Virginia Tech
Room: Ayasofya
· Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks
Patrick Judd and Jorge Albericio (University of Toronto), Tayler Hetherington and Tor Aamodt (University of British Columbia), and Natalie Enright Jerger and Andreas Moshovos (University of Toronto)
· Galaxyfly: A Novel Family of Flexible-Radix Low-Diameter Topologies for Large-Scale Interconnection Networks
Fei Lei, Dezun Dong, Xiangke Liao, Xing Su, and Cunlu Li (National University of Defense Technology)
· Replichard: Towards Tradeoff between Consistency and Performance for Metadata
Zhiying Li, Ruini Xue, and Lixiang Ao (University of Electronic Science and Technology of China)
Friday, June 3, 2016
Session 7: Memory (9:00am-10:00am)
Chair: Daniel Wong, UC Riverside
· TokenTLB: A Token-Based Page Classification Approach
Albert Esteve (Department of Computer Engineering, Universitat Politècnica de València), Alberto Ros (Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia), and Antonio Robles, Maria Engracia Gómez, and José Duato (Department of Computer Engineering, Universitat Politècnica de València)
· Exploiting Private Local Memories to Reduce the Opportunity Cost of Accelerator Integration
Emilio G. Cota, Paolo Mantovani, and Luca P. Carloni (Columbia University)
· GCaR: Garbage Collection aware Cache Management with Improved Performance for Flash-based SSDs
Suzhen Wu, Yanping Lin, and Bo Mao (Xiamen University), and Hong Jiang (The University of Texas at Arlington)
Session 8: Scheduling
Chair: Frank Mueller, NC State
· Fairness-oriented OS Scheduling Support for Multicore Systems
Changdae Kim and Jaehyuk Huh (KAIST)
· Scheduling Tasks with Mixed Timing Constraints in GPU-Powered Real-Time Systems
Yunlong Xu (Xi'an Jiaotong University), Rui Wang (Beihang University), Tao Li and Mingcong Song (University of Florida), Lan Gao and Zhongzhi Luan (Beihang University), and Depei Qian (Xi'an Jiaotong University, Beihang University)
· CuMAS: Data Transfer Aware Multi-Application Scheduling for Shared GPUs
Mehmet E. Belviranli, Farzad Khorasani, Laxmi N. Bhuyan, and Rajiv Gupta (UC Riverside)
Session 9: Parallelism Issues
Chair: Didem Unat, Koc University
· DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem
Saeed Maleki (University of Illinois at Urbana-Champaign), Donald Nguyen and Andrew Lenharth (The University of Texas at Austin), and María Garzarán and David Padua (University of Illinois at Urbana-Champaign)
· Parallel Transposition of Sparse Data Structures
Hao Wang (Virginia Tech), Weifeng Liu (University of Copenhagen), and Kaixi Hou and Wu-chun Feng (Virginia Tech)
· SARVAVID: A Domain Specific Language for Developing Scalable Computational Genomics Applications
Kanak Mahadik, Chris Wright, Jinyi Zhang, Milind Kulkarni, Saurabh Bagchi, and Somali Chaterji (Purdue University)
Session 10: Multiplication
Chair: Gagan Agrawal, Ohio State
· Balanced Hashing and Efficient GPU Sparse General Matrix-Matrix Multiplication
Pham Nguyen Quang Anh, Rui Fan, and Wen Yonggang (School of Computer Engineering, Nanyang Technological University)
· Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics
Daniele Buono, Fabrizio Petrini, Fabio Checconi, Xing Liu, Xinyu Que, Chris Long, and Tai-Ching Tuan (IBM Research)
Session 11: Prefetching
Chair: Tobias Grosser, ETH Zurich
· TurboTiling: Leveraging prefetching to boost performance of tiled codes
Sanyam Mehta, Rajat Garg, Nishad Trivedi, and Pen-Chung Yew (University of Minnesota)
· Graph Prefetching Using Data Structure Knowledge
Sam Ainsworth and Timothy M. Jones (University of Cambridge)
· Prefetching techniques for near-memory throughput processors
Reena Panda (The University of Texas at Austin), Yasuko Eckert, Nuwan Jayasena, Onur Kayiran, and Michael Boyer (AMD Research), and Lizy Kurian John (The University of Texas at Austin)
Session 12: GPU Architecture
Chair: Ozcan Ozturk, Bilkent University
· Origami: Folding Warps for Energy Efficient GPUs
Mohammad Abdel-Majeed (USC), Daniel Wong (UCR), Justin Kuang, and Murali Annavaram (USC)
· Barrier-Aware Warp Scheduling for Throughput Processors
Yuxi Liu (Peking University / Shenzhen Institute of Advanced Technology, CAS), Zhibin Yu (Shenzhen Institute of Advanced Technology, CAS), Lieven Eeckhout (Ghent University, Belgium), Yingwei Luo and Xiaolin Wang (Peking University), Zhenlin Wang (Michigan Tech University), Chengzhong Xu (Shenzhen Institute of Advanced Technology, CAS / Wayne State University), and Vijay Janapa Reddi (UT Austin)
· Tag-Split Cache for Efficient GPGPU Cache Utilization
Lingda Li and Ari B. Hayes (Rutgers University), Shuaiwen Song (Pacific Northwest National Lab), and Eddy Zheng Zhang (Rutgers University)