OSDI 24 Optimizing Resource Allocation - Detailed Analysis & Overview
USHER: Holistic Interference Avoidance for ...
DistServe: Disaggregating Prefill and Decoding for Goodput- ...
Performance Interfaces for Hardware Accelerators. Jiacheng Ma, Rishabh Iyer, Sahand Kashani, Mahyar Emami, Thomas ...
Llumnix: Dynamic Scheduling for Large Language Model Serving. Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi ...
Harmonizing Efficiency and Practicability: ...
IronSpec: Increasing the Reliability of Formal Specifications. Eli Goldweber, Weixin Yu, Seyed Armin Vakil Ghahani, and Manos ...
Identifying On-/Off-CPU Bottlenecks Together with Blocked Samples. Minwoo Ahn and Jeongmin Han, Sungkyunkwan University; ...
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models. Yao Fu, Leyang Xue, Yeqi Huang, and ...
Data-flow Availability: Achieving Timing Assurance in Autonomous Systems. Ao Li and Ning Zhang, Washington University in St. ...
Characterizing Storage Workloads with Counter Stacks. Jake Wires, Stephen Ingram, Zachary Drudi, Nicholas J. A. Harvey, and ...
Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing. Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, and ...