Opensource LLM

Created: by Pradeep Gowda Updated: May 09, 2024 Tagged: llm · opensource

Appears to be an exhaustive survey paper – Chen, Hailin, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, and Shafiq Joty. “ChatGPT’s one-year anniversary: Are open-source large language models catching up?” 2023. https://arxiv.org/abs/2311.16989.

Zhuang, Xiaomin, Yufan Jiang, Qiaozhi He, and Zhihua Wu. “ChuXin: 1.6B technical report,” 2024. https://arxiv.org/abs/2405.04828.

ChuXin is an entirely open-source language model with 1.6 billion parameters. Unlike the majority of works that open-source only the model weights and architecture, the authors have released everything needed to train the model, including the training data, the training process, and the evaluation code. Their goal is to empower and strengthen the open research community, fostering transparency and enabling a new wave of innovation in language modeling. They also extend the context length to 1M tokens through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. The weights for both models are available on Hugging Face to download and use.
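
A minimal sketch of loading the released weights with Hugging Face `transformers`; the repository id `chuxin-llm/Chuxin-1.6B-Base` and the generation settings are assumptions here, not taken from the report – check the model card for the actual name and recommended usage.

```python
# Sketch: download and run the ChuXin 1.6B weights from Hugging Face.
# The repo id below is an assumption; consult the model card for the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chuxin-llm/Chuxin-1.6B-Base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place the 1.6B model on GPU if one is available
    trust_remote_code=True,
)

prompt = "Open-source language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```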