OSDI '24 Data Flow Availability - Detailed Analysis & Overview
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable - Chaofan Lin, Shanghai Jiao Tong University; Zhenhua ...
Optimizing Resource Allocation in Hyperscale Datacenters: Scalability, Usability, and Experiences - Neeraj Kumar, Pol Mauri Ruiz, ...
Performance Interfaces for Hardware Accelerators - Jiacheng Ma, Rishabh Iyer, Sahand Kashani, Mahyar Emami, Thomas ...
Llumnix: Dynamic Scheduling for Large Language Model Serving - Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi ...
Aragog: Scalable Runtime Verification of Distributed Middleboxes - Nofel Yaseen, University of Pennsylvania; Behnaz Arzani and ...
Identifying On-/Off-CPU Bottlenecks Together with Blocked Samples - Minwoo Ahn and Jeongmin Han, Sungkyunkwan University; ...
Kamino: Efficient VM Allocation at Scale with Latency-Driven Cache-Aware Scheduling - David Domingo, Rutgers University; Hugo ...
Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing - Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, and ...
Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents - Qizheng Zhang, Stanford University; Ali Imran, ...
High-throughput and Flexible Host Networking for Accelerated Computing - Athinagoras Skiadopoulos, Zhiqiang Xie, and Mark ...
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving - Yinmin Zhong and ...
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models - Yao Fu, Leyang Xue, Yeqi Huang, and ...