Xinference: Making Large Model Serving Easy
Weizheng Lu, Lingfeng Xiong, Feng Zhang, Xuye Qin, Yueguo Chen
Abstract
The proliferation of open-source large models necessitates dedicated tools for deployment and accessibility. To mitigate the complexities of model serving, we develop Xinference, an open-source library designed to simplify the deployment and management of large models. Xinference simplifies deployment for users by (a) freeing users from writing code, with built-in support for a wide range of models and OpenAI-compatible APIs; (b) enabling full model-serving lifecycle management; and (c) providing efficient, scalable inference with high throughput and low latency. In comparative experiments with similar products such as BentoML and Ray Serve, Xinference outperforms these tools and offers superior ease of use. Xinference is available at https://github.com/xorbitsai/inference.
- Anthology ID:
- 2024.emnlp-demo.30
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Delia Irazu Hernandez Farias, Tom Hope, Manling Li
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 291–300
- URL:
- https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-demo.30/
- DOI:
- 10.18653/v1/2024.emnlp-demo.30
- Cite (ACL):
- Weizheng Lu, Lingfeng Xiong, Feng Zhang, Xuye Qin, and Yueguo Chen. 2024. Xinference: Making Large Model Serving Easy. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 291–300, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Xinference: Making Large Model Serving Easy (Lu et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/add-emnlp-2024-awards/2024.emnlp-demo.30.pdf
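The abstract notes that Xinference exposes OpenAI-compatible APIs, so a served model can be queried with any standard OpenAI-style HTTP request. A minimal stdlib sketch follows; the endpoint URL, port, and model name are assumptions for illustration, not values taken from the paper.

```python
import json
import urllib.request

# Assumed endpoint of a locally running Xinference server exposing the
# OpenAI-compatible chat-completions route; adjust host/port as needed.
ENDPOINT = "http://localhost:9997/v1/chat/completions"


def build_request(model: str, prompt: str) -> bytes:
    """Build an OpenAI-style chat-completions JSON payload."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()


if __name__ == "__main__":
    # "qwen2-instruct" is a hypothetical name of a model already launched
    # on the server; any served model name works here.
    req = urllib.request.Request(
        ENDPOINT,
        data=build_request("qwen2-instruct", "Hello!"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
```

Because the API mirrors OpenAI's schema, existing OpenAI client libraries can also be pointed at the same base URL without code changes.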