Luba Tang, CEO and founder, Skymizer
Title: Connecting ONNX to Proprietary DLAs: An introduction to Open Neural Network Compiler
Date/Time: March 18 (Mon) 09:00-10:30, 10:45-12:15
This tutorial introduces the Open Neural Network Compiler (ONNC), a retargetable compilation framework that connects Open Neural Network eXchange (ONNX) models to proprietary deep learning accelerators (DLAs). ONNX enables interchangeability among neural network models designed in different frameworks and has become a pervasive format supported by many high-tech titans and a community of researchers. ONNC is the first open source compiler project designed from the ground up to support ONNX. The target audience of this tutorial includes researchers, graduate students, engineers, and anyone interested in porting a compiler backend and implementing compiler optimization algorithms for deep learning hardware.
The tutorial consists of four short talks. The first talk describes the top-level architecture and major design features of ONNC, how ONNC differentiates itself from other frameworks, and how users benefit from adopting ONNC as their compilation framework; those interested in a broad view of ONNC are welcome to join this 15-minute quick update on the latest ONNC progress. The second talk introduces the “Vanilla” backend in ONNC and demonstrates how fast porting to a new target DLA is carried out; if you need to port a compiler backend to a designated target device, do not miss this chance to see ONNC backend porting at a glance. The third talk focuses on how to explore architecture design tradeoffs and perform optimizations via the pass manager in ONNC; this topic is especially designed for those who want to use ONNC as a research framework to explore the DLA design space, whether for commercial products or research. The last talk provides an opportunity to get your hands dirty building ONNC, running benchmarks, and playing around with the source code with us; it is designed to get you started on ONNC programming. We welcome engineers as well as researchers to join the ONNC open source community and contribute to the project.
Luba Tang, CEO and founder, Skymizer. Luba Tang has 10+ years of compiler-related work experience. His research interests include electronic system level (ESL) design, compilers, and virtual machines. His most recent work focuses on optimization algorithms for AI compilers and on architecture design for blockchain virtual machines. He was the original writer of the Marvell iterative compiler, the software architect of the MCLinker project, the architect of the ONNC project, and the co-founder of the Lity Language Project.
Tobi Delbruck, Professor, Institute of Neuroinformatics, University of Zurich and ETH Zurich
Title: Neuromorphic Artificial Intelligence
Date/Time: March 18 (Mon) 09:00-10:30, 10:45-12:15
With hundreds of millions of dollars flowing this year into silicon developments for training and running artificial intelligence (AI) deep neural networks (DNNs) from Nvidia, Intel, Nervana, Qualcomm, Graphcore, ARM, and dozens of others, it is worthwhile asking what is left to be done. Won't silicon AI follow the same course as GPUs and become more and more tailored to efficiently compute industrial AI?
This tutorial addresses this question from the context of neuromorphic engineering (NE), which takes its inspiration from the brain's organizing principles. What have these principles of using sparsity, local memory, time, and physics brought to the table? I will trace the historical development of both NE and AI, show how ideas from neuroscience came into both (e.g. ReLU, local adaptation, max pooling), and survey recent developments in neuromorphic silicon from IBM, Intel, Zurich, and Manchester. I will also compare these with upcoming industrial AI digital accelerators, and then show how the principles of sparsity and local memory reuse can bring immediate benefit to both convolutional and recurrent DNNs implemented in synchronous logic without requiring a new memory hierarchy. Finally, I will relate these ideas to event sensors, which our group has specialized in developing. I plan to include a live demonstration of some of these ideas.
Tobi Delbruck (IEEE M’99–SM’06–F’13) received a Ph.D. degree from Caltech in 1993 as a student of Carver Mead. He was in the first group of students of the newly founded Computation and Neural Systems program started by John Hopfield. He is currently a professor of physics and electrical engineering at the Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland, where he has been since 1998. The Sensors Group that he co-organizes with PD Dr. Shih-Chii Liu focuses on neuromorphic event-based sensors, sensory processing, and efficient deep neural network hardware architectures. He co-organizes the Telluride Neuromorphic Cognition Engineering summer workshop and has organized the live demonstration sessions at ISCAS and NIPS. Delbruck is past Chair of the IEEE CAS Sensory Systems Technical Committee. He worked on electronic imaging at Arithmos, Synaptics, National Semiconductor, and Foveon, and has founded four spin-off companies, including inilabs.com, a community-oriented organization that has distributed R&D prototype neuromorphic sensors around the world. He has been awarded 9 IEEE awards.
Sungjoo Yoo, Professor, Seoul National University
Title: Memory-Centric Chip Architecture for Deep Learning
Date/Time: March 18 (Mon) 13:15-14:45
Memory is a critical component in designing chips for deep learning in terms of energy consumption and area cost.
In this tutorial, we will first explain the memory access behavior of state-of-the-art neural networks. Then, we will introduce recent works on neural network accelerators in which memory accesses are optimized by exploiting this behavior. Specifically, we will focus on data reuse by broadcast and sparsity, which reduce the frequency of memory accesses, and low precision, which reduces the bit width of each memory access.
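The traffic reductions described above can be sketched with a back-of-the-envelope model (our illustration, not material from the tutorial); the layer size, bit widths, and weight density below are hypothetical examples:

```python
# Rough sketch: how sparsity and low precision reduce off-chip
# weight traffic for one layer. All numbers are hypothetical.

def weight_traffic_bits(num_weights, bits_per_weight, density=1.0):
    """Bits fetched for weights; 'density' models skipping zero weights."""
    return num_weights * bits_per_weight * density

# Dense FP32 baseline: 1M weights at 32 bits each.
baseline = weight_traffic_bits(1_000_000, 32)

# Same layer with 8-bit quantization and 30% non-zero weights.
optimized = weight_traffic_bits(1_000_000, 8, density=0.3)

print(baseline / optimized)  # ~13.3x less weight traffic
```

Data reuse by broadcast acts on the other factor: one fetched value feeds many processing elements, so the *frequency* of fetches drops even when the stored bits are unchanged.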
Sungjoo Yoo received his Ph.D. from Seoul National University in 2000. From 2000 to 2004, he was a researcher in the system-level synthesis (SLS) group, TIMA Laboratory, Grenoble, France. From 2004 to 2008, he led, as principal engineer, the system-level design team at System LSI, Samsung Electronics. From 2008 to 2015, he was an associate professor at POSTECH. In 2015, he joined Seoul National University, where he is now a full professor. His current research interests are software/hardware co-design of deep neural networks and machine learning-based optimization of computer architecture.
Andreas G. Andreou
Professor, Electrical and Computer Engineering, Center for Language and Speech Processing and Whitaker Biomedical Engineering Institute,
Johns Hopkins University
Title: BRAINWAY and Nano-Abacus Architecture: Brain-Inspired Cognitive Computing Using Energy Efficient Physical Computational Structures, Algorithms and Architecture Co-Design
Date/Time: March 18 (Mon) 13:15-14:45, 15:00-16:30
Since the invention of the integrated circuit (the chip, in short) in the 1950s, the microelectronics industry has seen a remarkable evolution, from the centimeter-scale devices created by Jack Kilby and the millimeter-scale integrated circuits fabricated by Robert Noyce to today's 5nm feature-size MOS transistors. During this time, not only have exponential improvements been made in the scaling of size and the density of devices, but CAD and workstation technologies have advanced at a similar pace, enabling the design of complete, truly complex Systems on a Chip (SoC). The advances in the microelectronics industry have also enabled the proliferation of computational fields such as bioinformatics, systems biology, imaging, and multi-scale, multi-domain modeling. Semiconductor technology is contributing to the advancement of biotechnology, medicine, and health care delivery in ways never envisioned, from scientific-grade CMOS imagers to silicon photomultipliers and ion-sensing arrays. The stunning convergence of semiconductor technology and life science research is transforming the landscape of the pharmaceutical, biotechnology, and healthcare industries, signaling the arrival of personalized and molecular-level imaging, diagnosis, and treatment, thereby speeding up the pace of scientific discovery and changing the practice and delivery of patient care. Whether through tissue and organ imaging, labs-on-chip, or genome sequencing, biotechnology and modern medical diagnostics are generating a staggering amount of data stored in data centers. However, computing in data centers, the engines behind our insatiable desire for global communication, instant connectedness, and interaction, comes at an economic and environmental cost. Projected future needs in data centers are driven by data-intensive applications in Cognitive Computing Technology (CCT).
CCT, the foundation of the Third Wave of AI, aims at advancing intelligent software and hardware that can process, analyze, and distill knowledge from vast quantities of text, speech, images, and biological data, ultimately with as much nuance and depth of understanding as a human would. To meet the demand for future data-intensive CCT, from everyday mundane tasks such as searching via images to the utmost serious task of disease diagnosis in personalized medicine, we urgently need a new cloud computing paradigm and energy-efficient, i.e. green, technologies.
The BRAINWAY project in my lab is aimed at the design of an energy-efficient Cognitive Multi-Processor Unit (CogMPU) that combines Ultra-Low-Voltage (ULV) circuit techniques with a brain-inspired chip-multiprocessor network-on-chip (NoC) architecture. The design of the CogMPU architecture is based on the recently developed mathematical framework for architecture exploration and optimization [2], [3]. The computational principles and architectural ideas in the BRAINWAY project have been embodied in the nano-Abacus SoC, aimed at real-time processing, information extraction, and prediction from streaming Wide Area Motion Imagery (WAMI) data. Availability of full-motion, high-resolution data over large, city-size geographical areas (100 square kilometers) offers unprecedented capabilities for situational awareness. The dynamic nature of the imagery offers insights about actions and patterns of activities that static images do not. Civilian applications of WAMI data allow for the monitoring and intelligent control of traffic across large geographical areas and the inference of a hierarchy of events and activities, and ultimately of “life-patterns”. Additional applications include the coordination of activities in disaster areas and the monitoring of wildlife. In the nano-Abacus SoC, high performance and high throughput are achieved through approximate computing and fixed-point arithmetic in a variable-precision (6-bit to 18-bit) architecture. The architecture implements a variety of processing algorithms in what we consider today as Third Wave AI and Machine Intelligence, ranging from convolutional networks (ConvNets) to linear and non-linear morphological processing and probabilistic inference using exact and approximate Bayesian methods. The processing pipeline is implemented entirely using spike-based neuromorphic computational primitives.
The tutorial covers:
- System design considerations and architectures for compute-in-memory (CIM) and computational imagers.
- Algorithm-architecture co-design methodology and optimization for throughput, latency, and/or energy efficiency. We outline challenges and present a solution to the design of a multiprocessor system architecture using the mathematical framework in [2] and [3].
- Mixed-signal computational structures in 55nm GF and state-of-the-art 16nm TSMC CMOS technology that may be more efficient for scientific computing and machine learning, aimed at computational memory architectures.
- Digital stochastic computation and computational structures that compute with probabilities.
- Digital morphological processing blocks for non-linear image processing.
- Physical analog-to-probability converters: architecture and mixed-signal circuits using a Random Telegraph Noise physical random signal generator.
- Charge-based mixed-signal circuits: a mixed-signal vector-vector multiplier architecture that can yield a 2X to 10X energy-efficiency improvement over a comparable optimized digital multiply-accumulate unit fabricated in the same technology.
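As a concrete illustration of computing with probabilities (our sketch, not the BRAINWAY implementation): in classic stochastic computing, a value p in [0, 1] is encoded as the fraction of 1s in a random bitstream, and a single AND gate multiplies two independent streams, since P(a AND b) = P(a) * P(b).

```python
# Toy model of stochastic-computing multiplication: encode each operand
# as a Bernoulli bitstream, AND the streams, and read the result off
# the output 1-density. Accuracy improves with stream length n.
import random

def to_stream(p, n, rng):
    """Encode probability p as an n-bit Bernoulli bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(pa, pb, n=100_000, seed=0):
    rng = random.Random(seed)
    a = to_stream(pa, n, rng)
    b = to_stream(pb, n, rng)
    # Bitwise AND of independent streams; the 1-density estimates pa * pb.
    return sum(x & y for x, y in zip(a, b)) / n

print(sc_multiply(0.5, 0.8))  # ≈ 0.40
```

The appeal in hardware is that one logic gate replaces a multiplier, at the cost of long streams (precision grows only with the square root of stream length).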
[1] A. G. Andreou, “Johns Hopkins on the chip: microsystems and cognitive machines for sustainable, affordable, personalized medicine and health care (invited paper),” IET Electronics Letters (special supplement on semiconductors for personalized medicine), pp. s34–s37, Dec. 2011. http://digital-library.theiet.org/dbt/dbt.jsp?KEY=ELLEAK&Volume=47&Issue=26
[2] A. S. Cassidy and A. G. Andreou, “Beyond Amdahl's Law: an objective function that links multiprocessor performance gains to delay and energy,” IEEE Transactions on Computers, vol. 61, no. 8, pp. 1110–1126, Aug. 2012.
[3] A. S. Cassidy, J. Georgiou, and A. G. Andreou, “Design of silicon brains in the nano-CMOS era: spiking neurons, learning synapses and neural architecture optimization,” Neural Networks, pp. 1–28, Jun. 2013.
Andreas G. Andreou is a professor of electrical and computer engineering, computer science, and the Whitaker Biomedical Engineering Institute at Johns Hopkins University. Andreou is the co-founder of the Johns Hopkins University Center for Language and Speech Processing. Research in the Andreou lab is aimed at brain-inspired microsystems for sensory information and human language processing. Notable microsystems achievements over the last 25 years include a contrast-sensitive silicon retina, the first CMOS polarization-sensitive imager, silicon rods in standard foundry CMOS for single-photon detection, a hybrid silicon/silicone chip-scale incubator, and a large-scale mixed analog/digital associative processor for character recognition. Significant algorithmic research contributions for speech recognition include the vocal tract normalization technique and heteroscedastic linear discriminant analysis, a derivation and generalization of Fisher discriminants in the maximum likelihood framework. In 1996 Andreou was elected an IEEE Fellow, “for his contribution in energy efficient sensory Microsystems.”
Jae-sun Seo, Assistant Professor, Arizona State University
Title: SRAM and RRAM based In-Memory Computing for Deep Learning: Opportunities and Challenges
Date/Time: March 18 (Mon) 15:00-16:30
Deep learning algorithms have been successful across many practical applications, but state-of-the-art algorithms are compute-/memory-intensive. To bring expensive algorithms to a low-power processor, a number of digital CMOS ASIC solutions have been previously proposed, but limitations remain on memory access and footprint. To improve upon the conventional row-by-row operation of memories, several works recently demonstrated “in-memory computing” designs, which perform analog computation inside memory arrays (e.g. along the bitline) by asserting multiple or all rows simultaneously. This tutorial will present recent silicon demonstrations of in-memory computing for deep learning systems, based on both SRAM and denser resistive RAM (RRAM) fabrics. New memory bitcell circuits, array peripheral circuits, architectures, and optimizations for accurate deep learning acceleration will be covered. Promising opportunities of in-memory computing (e.g. large energy gains over digital ASICs), as well as particular challenges (e.g. variability) and new device/circuit design considerations, will be discussed.
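A simplified behavioral view of bitline-style in-memory computing (our model for intuition, not a circuit from the tutorial): with many word lines asserted at once, the bitline accumulates a current proportional to the dot product of the input activations and the stored column of bitcell weights, which an ADC then quantizes.

```python
# Behavioral model of one bitline: analog summation of cell currents
# followed by an idealized uniform ADC. Binary inputs/weights assumed.
import random

def bitline_dot(inputs, weights, adc_levels=16):
    """Sum cell contributions along one bitline, then quantize with an ADC."""
    analog = sum(x * w for x, w in zip(inputs, weights))
    max_val = len(weights)              # full scale: all cells conduct
    step = max_val / adc_levels
    return round(analog / step) * step  # idealized uniform quantizer

inputs  = [random.randint(0, 1) for _ in range(64)]  # word-line activations
weights = [random.randint(0, 1) for _ in range(64)]  # stored bitcells

exact = sum(x * w for x, w in zip(inputs, weights))
print(exact, bitline_dot(inputs, weights))
```

The model makes the energy/accuracy tradeoff visible: one array access yields a whole dot product, but the result is only as precise as the ADC resolution, and real arrays add device variability on top of this quantization.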
Jae-sun Seo received his Ph.D. degree from the University of Michigan in 2010. From 2010 to 2013, he was with the IBM T. J. Watson Research Center, where he worked on cognitive computing chip design for the DARPA SyNAPSE project. In January 2014, he joined Arizona State University as an assistant professor in the School of ECEE. His research interests are energy-efficient hardware design for deep learning and neuromorphic computing. During the summer of 2015, he was visiting faculty at the Intel Circuits Research Lab. He was a recipient of the IBM Outstanding Technical Achievement Award in 2012 and the NSF CAREER Award in 2017.