ICS 2016 Program

Workshop/Tutorial Program (Room: Ayasofya)

Tuesday, May 31, 2016

08:30AM-10:20AM

Tutorial - Simulation and Analysis Engine: A Framework for Full-System Simulation and Analysis of Large-Scale Workloads

10:20AM-10:40AM

Break

10:40AM-12:30PM

Tutorial - Simulation and Analysis Engine: A Framework for Full-System Simulation and Analysis of Large-Scale Workloads

12:30PM-1:30PM

Lunch

1:30PM-3:20PM

Tutorial - PGAS And Hybrid MPI+PGAS Programming Models On Modern HPC Clusters With Accelerators

3:20PM-3:40PM

Break

3:40PM-5:30PM

Tutorial - PGAS And Hybrid MPI+PGAS Programming Models On Modern HPC Clusters With Accelerators

Main Program (Room: Lalezar)

Wednesday, June 1, 2016
AM	8:30-9:00	Opening
	9:00-10:00	Keynote I: Performance = Bandwidth divided by Latency Yale Patt, The University of Texas at Austin Chair: Onur Mutlu
	10:00-10:30	Break
	10:30-11:30	Lightning Talks Chair: Onur Mutlu
	11:30-12:30	Session 1: Heterogeneous Systems Chair: Mark Silberstein, Technion
PM	12:30-1:30	Lunch (Room: Galata)
	1:30-2:30	Lunch Talk by Yale Patt on Education (Room: Galata) Chair: Onur Mutlu
	2:30-2:45	Short Break
	2:45-3:45	Session 2: Power, Energy, Variation Chair: Ulya Karpuzcu, Minnesota
	3:45-4:45	Session 3: NVMs & Persistent Memory Chair: Heon Yeom, Seoul National University
	4:45-5:15	Break
	5:15-6:55	Session 4: Data Centers Chair: Dhabaleswar Panda, Ohio State
Thursday, June 2, 2016
AM	9:00-10:00	Keynote II: Innovative Applications and Technology Pivots - A Perfect Storm in Computing Wen-mei Hwu, University of Illinois Urbana-Champaign Chair: Onur Mutlu
	10:00-10:30	Break
	10:30-11:30	Session 5A: GPUs and SIMD Chair: Milind Kulkarni, Purdue Room: Lalezar	Session 5B: Communication and Coherence Chair: Andreas Moshovos, Toronto Room: Ayasofya
	11:30-12:30	Session 6A: Tools and Libraries Chair: Sanyam Mehta, Cray Room: Lalezar	Session 6B: Potpourri Chair: Hao Wang, Virginia Tech Room: Ayasofya
	12:30-1:30	Lunch
PM		Excursion: Old City Tour, Dinner at Bosphorus
	1:00-6:00	Old City Tour · Blue Mosque · Hagia (St.) Sophia · Hippodrome · Grand Covered Bazaar · Obelisk of Theodosius · Serpentine Column · Basilica Cistern
	7:00-8:00	Cocktail at Bosphorus
	8:00-11:30	Dinner at Bosphorus
	11:30-12:00	Transportation back to hotel
Friday, June 3, 2016
AM	9:00-10:00	Session 7: Memory Chair: Daniel Wong, UC Riverside
	10:00-10:30	Break
	10:30-11:30	Session 8: Scheduling Chair: Frank Mueller, NC State
	11:30-12:30	Session 9: Parallelism Issues Chair: Didem Unat, Koc University
PM	12:30-2:00	Lunch
	2:00-2:40	Session 10: Multiplication Chair: Gagan Agrawal, Ohio State
	2:40-3:40	Session 11: Prefetching Chair: Tobias Grosser, ETH Zurich
	3:40-4:15	Break
	4:15-5:15	Session 12: GPU Architecture Chair: Ozcan Ozturk, Bilkent University
	5:15-5:30	Closing Remarks

Wednesday, June 1, 2016

Wednesday, 9:00am-10:00am

Keynote I: Performance = Bandwidth divided by Latency
Yale Patt, The University of Texas at Austin

Chair: Onur Mutlu

· Abstract: Supercomputing has always required the ultimate in performance, and if you look at the evidence (e.g., the famous TOP500 list), you would think "Performance" equals "Bandwidth." Unfortunately, it matters how long it takes to get something done. Otherwise, for example, it would be a win to do a minimal-power 64-bit integer add by streaming the bits serially through a full adder and latch in 64 cycles. Clearly, latency matters. In this talk I would like to look at some of the opportunities for decreasing latency, and at how the language designer, programmer, compiler writer, and microarchitect can all contribute to this aspect of supercomputing performance.

· Bio: Yale Patt is Professor of Electrical and Computer Engineering and the Ernest Cockrell, Jr. Centennial Chair in Engineering at The University of Texas at Austin. He enjoys equally teaching freshmen, teaching graduate students, and directing the research of six PhD students in high performance computer implementation. He has, for more than 50 years, combined an active research program with extensive consulting and a strong commitment to teaching. The focus of his research is generally five to ten years beyond what industry provides at that point in time. His rationale has always been that he does not do revenue shipments, preferring to produce knowledge that will be useful to future revenue shipments and, more importantly, graduates who will design those future products. More information is available on his website: www.ece.utexas.edu/~patt.

Wednesday, 10:30am-11:30am

Lightning Talks

Chair: Onur Mutlu

This year, the conference will include a lightning session. It will be held after the first keynote on the first day of the conference, June 1. It is a single large session, so we expect almost all conference attendees to be present.

The purpose of this session is to:

· enable each paper's authors to present the "key idea" of their work in less than 60 seconds to almost all attendees.
· enable the conference attendees to get a quick grasp of the key contributions of each paper (and perhaps decide whether or not to attend the associated conference talk).

For authors: We will follow up with more instructions on the lightning session later. We will ask you to submit your slides beforehand, in PDF, so that we can upload them to a single computer and run the session from that computer. Please plan on a presentation that is less than 60 seconds. The 60-second limit will be strictly enforced.

Wednesday, 11:30am-12:30pm

Session 1: Heterogeneous Systems

Chair: Mark Silberstein, Technion

· Polly-ACC: Transparent compilation to heterogeneous hardware

Tobias Grosser and Torsten Hoefler (ETH Zurich)

· Hybrid CPU-GPU scheduling and execution of tree traversals

Jianqiao Liu, Nikhil Hegde, and Milind Kulkarni (Purdue University)

· Exploiting Dynamic Reuse Probability to Manage Shared Last-level Caches in CPU-GPU Heterogeneous Processors

Siddharth Rai and Mainak Chaudhuri (Indian Institute of Technology Kanpur)

Wednesday, 1:30pm-2:30pm

Lunch Talk by Yale Patt on Education (Room: Galata)

Chair: Onur Mutlu

Wednesday, 2:45pm-3:45pm

Session 2: Power, Energy, Variation

Chair: Ulya Karpuzcu, Minnesota

· AEQUITAS: Coordinated Energy Management Across Parallel Applications

Haris Ribic and Yu David Liu (SUNY Binghamton)

· Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes

Dimitrios Chasapis, Marc Casas, and Miquel Moretó (Barcelona Supercomputing Center), Martin Schulz (Lawrence Livermore National Laboratory), and Eduard Ayguadé, Jesus Labarta, and Mateo Valero (Barcelona Supercomputing Center)

· Variation Among Processors Under Turbo Boost in HPC Systems

Bilge Acun, Phil Miller, and Laxmikant V. Kale (University of Illinois at Urbana-Champaign)

Wednesday, 3:45pm-4:45pm

Session 3: NVMs & Persistent Memory

Chair: Heon Yeom, Seoul National University

· Mini-Ckpts: Surviving OS Failures in Persistent Memory

David Fiala and Frank Mueller (North Carolina State University), Kurt Ferreira (Sandia National Laboratories), and Christian Engelmann (Oak Ridge National Laboratory)

· High Performance Design for HDFS with Byte-Addressability of NVM and RDMA

Nusrat Sharmin Islam, Md. Wasi-ur-Rahman, Xiaoyi Lu, and Dhabaleswar K. (DK) Panda (The Ohio State University)

· Write-Aware Management for Fast NVM-based Memory Extensions

Amro Awad (North Carolina State University), Sergey Blagodurov (Advanced Micro Devices (AMD)), and Yan Solihin (North Carolina State University)

Wednesday, 5:15pm-6:55pm

Session 4: Data Centers

Chair: Dhabaleswar Panda, Ohio State

· HOPE: Enabling Efficient Service Orchestration in Software-Defined Data Centers

Yang Hu (University of Florida), Chao Li (Shanghai Jiao Tong University), Longjun Liu (unaffiliated), and Tao Li (University of Florida)

· Towards an Adaptive Multi-Power-Source Datacenter

Longjun Liu and Hongbin Sun (Xi'an Jiaotong University), Chao Li (Shanghai Jiao Tong University), Jingmin Xin and Nanning Zheng (Xi'an Jiaotong University), and Tao Li (University of Florida)

· GreenGear: Leveraging and Managing Server Heterogeneity for Improving Energy Efficiency in Green Data Centers

Xu Zhou and Qiang Cao (Huazhong University of Science and Technology), Hong Jiang (The University of Texas at Arlington), Lei Tian (Tintri), and Changsheng Xie (Huazhong University of Science and Technology)

· Noise Aware Scheduling in Data Centers

Hameedah Sultan, Arpit Katiyar, and Smruti R. Sarangi (IIT Delhi)

· Fast Multiplication in Binary Fields on GPUs via Register Cache

Eli Ben Sasson, Matan Hamilis, and Mark Silberstein (Technion) and Eran Tromer (Tel Aviv University)

Thursday, June 2, 2016

Thursday, 9:00am-10:00am

Keynote II: Innovative Applications and Technology Pivots - A Perfect Storm in Computing
Wen-mei Hwu, University of Illinois Urbana-Champaign

Chair: Onur Mutlu

· Abstract: Since the early 2000s, we have been experiencing two very important developments in computing. One is that a tremendous amount of resources has been invested into innovative applications such as first-principle based models, deep learning, and cognitive computing. The other is that the industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. Since then, most of the top supercomputers in the world have been heterogeneous parallel computing systems. New standards such as the Heterogeneous System Architecture (HSA) are emerging to facilitate software development. Much has been learned, and much still needs to be learned, about algorithms, languages, compilers, and hardware architecture in this movement. What are the applications that continue to drive the technology development? How hard is it to program these systems today? How will we program these systems in the future? How will innovations in memory devices present further opportunities and challenges? What is the impact on the long-term software engineering cost of applications? In this talk, I will present some research opportunities and challenges that are brought about by this perfect storm.

· Bio: Wen-mei W. Hwu is a Professor and holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. He is also CTO of MulticoreWare Inc., chief scientist of the UIUC Parallel Computing Institute, and director of the IMPACT research group (www.crhc.uiuc.edu/Impact). He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the NSF Blue Waters Petascale supercomputer. For his contributions, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award, and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a fellow of IEEE and ACM. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley.

Thursday, 10:30am-11:30am

Session 5A: GPUs and SIMD

Chair: Milind Kulkarni, Purdue

Room: Lalezar

· Coherence-Free Multiview: Enabling Reference-Discerning Data Placement on GPU

Guoyang Chen and Xipeng Shen (North Carolina State University)

· SFU-Driven Transparent Approximation Acceleration on GPUs

Ang Li (Eindhoven University of Technology), Shuaiwen Leon Song (Pacific Northwest National Lab), Mark Wijtvliet (Eindhoven University of Technology), Akash Kumar (Technische Universität Dresden) and Henk Corporaal (Eindhoven University of Technology)

· Reusing Data Reorganization for Efficient SIMD Parallelization of Dynamic Irregular Applications

Peng Jiang, Linchuan Chen, and Gagan Agrawal (The Ohio State University)

Thursday, 10:30am-11:30am

Session 5B: Communication and Coherence

Chair: Andreas Moshovos, Toronto

Room: Ayasofya

· SReplay: Deterministic Group Replay for One-Sided Communication

Xuehai Qian (University of Southern California), Koushik Sen (University of California, Berkeley), and Paul Hargrove and Costin Iancu (Lawrence Berkeley National Laboratory)

· Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication

Konstantina Mitropoulou, Vasileios Porpodas, Dennis Zhang, and Timothy Jones (University of Cambridge)

· Efficient Timestamp-Based Cache Coherence Protocol for Many-Core Architectures

Yuan Yao (Zhejiang University), Guanhua Wang (National University of Singapore), Zhiguo Ge (Huawei International Pte. Ltd.), Tulika Mitra (National University of Singapore), Wenzhi Chen (Zhejiang University), and Naxin Zhang (Huawei International Pte. Ltd.)

Thursday, 11:30am-12:30pm

Session 6A: Tools and Libraries

Chair: Sanyam Mehta, Cray

Room: Lalezar

· BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing

Linnan Wang (UESTC and Georgia Institute of Technology), Wei Wu (The University of Tennessee, Knoxville), Zenglin Xu (UESTC), Jianxiong Xiao (Princeton), and Yi Yang (NEC Labs)

· Peruse and Profit: Estimating the Accelerability of Loops

Snehasish Kumar (Simon Fraser University), Vijayalakshmi Srinivasan (IBM Research), and Amirali Sharifian, Nick Sumner, and Arrvindh Shriraman (Simon Fraser University)

· Simulation and Analysis Engine for Scale-Out Workloads

Nadav Chachmon (Intel), Daniel Richins (The University of Texas at Austin), Robert Cohn and Magnus Christensson (Intel), Wenzhi Cui (The University of Texas at Austin), and Vijay Janapa Reddi (University of Texas at Austin)

Thursday, 11:30am-12:30pm

Session 6B: Potpourri

Chair: Hao Wang, Virginia Tech

Room: Ayasofya

· Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks

Patrick Judd and Jorge Albericio (University of Toronto), Tayler Hetherington and Tor Aamodt (University of British Columbia), and Natalie Enright Jerger and Andreas Moshovos (University of Toronto)

· Galaxyfly: A Novel Family of Flexible-Radix Low-Diameter Topologies for Large-Scale Interconnection Networks

Fei Lei, Dezun Dong, Xiangke Liao, Xing Su, and Cunlu Li (National University of Defense Technology)

· Replichard: Towards Tradeoff between Consistency and Performance for Metadata

Zhiying Li, Ruini Xue, and Lixiang Ao (University of Electronic Science and Technology of China)

Friday, June 3, 2016

Friday, 9:00am-10:00am

Session 7: Memory

Chair: Daniel Wong, UC Riverside

· TokenTLB: A Token-Based Page Classification Approach

Albert Esteve (Department of Computer Engineering, Universitat Politècnica de València), Alberto Ros (Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia), and Antonio Robles, Maria Engracia Gómez, and José Duato (Department of Computer Engineering, Universitat Politècnica de València)

· Exploiting Private Local Memories to Reduce the Opportunity Cost of Accelerator Integration

Emilio G. Cota, Paolo Mantovani, and Luca P. Carloni (Columbia University)

· GCaR: Garbage Collection aware Cache Management with Improved Performance for Flash-based SSDs

Suzhen Wu, Yanping Lin, and Bo Mao (Xiamen University) and Hong Jiang (University of Texas at Arlington)

Friday, 10:30am-11:30am

Session 8: Scheduling

Chair: Frank Mueller, NC State

· Fairness-oriented OS Scheduling Support for Multicore Systems

Changdae Kim and Jaehyuk Huh (KAIST)

· Scheduling Tasks with Mixed Timing Constraints in GPU-Powered Real-Time Systems

Yunlong Xu (Xi'an Jiaotong University), Rui Wang (Beihang University), Tao Li and Mingcong Song (University of Florida), Lan Gao and Zhongzhi Luan (Beihang University), and Depei Qian (Xi'an Jiaotong University, Beihang University)

· CuMAS: Data Transfer Aware Multi-Application Scheduling for Shared GPUs

Mehmet E Belviranli, Farzad Khorasani, Laxmi N Bhuyan, and Rajiv Gupta (UC Riverside)

Friday, 11:30am-12:30pm

Session 9: Parallelism Issues

Chair: Didem Unat, Koc University

· DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem

Saeed Maleki (University of Illinois at Urbana-Champaign), Donald Nguyen and Andrew Lenharth (The University of Texas at Austin), and María Garzarán and David Padua (University of Illinois at Urbana-Champaign)

· Parallel Transposition of Sparse Data Structures

Hao Wang (Virginia Tech), Weifeng Liu (University of Copenhagen), and Kaixi Hou and Wu-chun Feng (Virginia Tech)

· SARVAVID: A Domain Specific Language for Developing Scalable Computational Genomics Applications

Kanak Mahadik, Chris Wright, Jinyi Zhang, Milind Kulkarni, Saurabh Bagchi, and Somali Chaterji (Purdue University)

Friday, 2:00pm-2:40pm

Session 10: Multiplication

Chair: Gagan Agrawal, Ohio State

· Balanced Hashing and Efficient GPU Sparse General Matrix-Matrix Multiplication

Pham Nguyen Quang Anh, Rui Fan, and Wen Yonggang (School of Computer Engineering, Nanyang Technological University)

· Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics

Daniele Buono, Fabrizio Petrini, Fabio Checconi, Xing Liu, Xinyu Que, Chris Long, and Tai-Ching Tuan (IBM Research)

Friday, 2:40pm-3:40pm

Session 11: Prefetching

Chair: Tobias Grosser, ETH Zurich

· TurboTiling: Leveraging prefetching to boost performance of tiled codes

Sanyam Mehta, Rajat Garg, Nishad Trivedi, and Pen-Chung Yew (University of Minnesota)

· Graph Prefetching Using Data Structure Knowledge

Sam Ainsworth and Timothy M. Jones (University of Cambridge)

· Prefetching techniques for near-memory throughput processors

Reena Panda (University of Texas at Austin), Yasuko Eckert, Nuwan Jayasena, Onur Kayiran, and Michael Boyer (AMD Research), and Lizy Kurian John (University of Texas at Austin)

Friday, 4:15pm-5:15pm

Session 12: GPU Architecture

Chair: Ozcan Ozturk, Bilkent University

· Origami: Folding Warps for Energy Efficient GPUs

Mohammad Abdel-Majeed (USC), Daniel Wong (UCR), Justin Kuang, and Murali Annavaram (USC)

· Barrier-Aware Warp Scheduling for Throughput Processors

Yuxi Liu (Peking University / Shenzhen Institute of Advanced Technology, CAS), Zhibin Yu (Shenzhen Institute of Advanced Technology, CAS), Lieven Eeckhout (Ghent University, Belgium), Yingwei Luo and Xiaolin Wang (Peking University), Zhenlin Wang (Michigan Tech University), Chengzhong Xu (Shenzhen Institute of Advanced Technology, CAS / Wayne State University), and Vijay Janapa Reddi (UT Austin)

· Tag-Split Cache for Efficient GPGPU Cache Utilization

Lingda Li and Ari B. Hayes (Rutgers University), Shuaiwen Song (Pacific Northwest National Lab), and Eddy Zheng Zhang (Rutgers University)