Computer Science Colloquium
Friday, November 17
2:35pm in Wege
Photonic Collective Communication for Distributed Machine Learning
Distributed ML training and inference require intermediate model parameters on accelerators to be accumulated, reduced, and transferred over the network between accelerators using collective communication primitives. This talk will focus on mechanisms that accelerate collective communication during distributed training and inference on multi-accelerator systems. We leverage novel server-scale photonic interconnects for inter-accelerator communication to tackle this challenge. We harness the programmability of photonics to implement circuit-switched connections between accelerators. We will develop efficient algorithms to leverage these photonic circuits for inter-accelerator collective communication.
Rachee Singh is an assistant professor of computer science at Cornell University. She develops systems and algorithms for programmable optical interconnects. Her work has been awarded a Google PhD Fellowship and the SIGCOMM Doctoral Dissertation Award. She was named a rising star in computer networking by N2Women and a rising star in EECS by UC Berkeley.