Small LLMs aka SLMs

smoler the better

Created: by Pradeep Gowda Updated: May 10, 2024 Tagged: smol-llm · llm · slm

LLMs that you can run on the desktop or a “regular(ish) PC”.

A look at Apple’s new Transformer-powered predictive text model

The model is used by AppleSpell, an internal macOS application that checks for spelling and grammar mistakes as you type.

The author found the predictive text model in /System/Library/LinguisticData/RequiredAssets_en.bundle/AssetData/en.lm/unilm.bundle. The bundle contains multiple Espresso model files that are used while typing (Espresso appears to be the internal name for the part of CoreML that runs inference on models).

There is a set of 15,000 tokens in unilm.bundle/sp.dat that pretty clearly looks like the vocabulary of a language model.

Read the rest of the blog post to see how the tokenizer works and how the architecture looks (GPT-2-style, roughly 34M parameters with a hidden size of 512), which makes it smaller than even the smallest GPT-2 model.
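As a sanity check on those numbers, here is a back-of-the-envelope parameter count for a GPT-2-style decoder with a 15,000-token vocabulary and a hidden size of 512. The layer count and context length are not given in the excerpt above, so they are treated as free variables purely for illustration.

```python
# Back-of-the-envelope parameter count for a GPT-2-style decoder, using the
# vocab size (~15k) and hidden size (512) reported in the post.
# n_layer and n_ctx are NOT given in the excerpt above; they are assumptions.

def gpt2_style_params(vocab=15_000, d_model=512, n_layer=8, n_ctx=1024):
    embed = vocab * d_model + n_ctx * d_model          # token + position embeddings
    attn  = 4 * (d_model * d_model + d_model)          # q, k, v, output projections
    mlp   = 2 * (d_model * 4 * d_model) + 5 * d_model  # up/down projections + biases
    norms = 2 * 2 * d_model                            # two layernorms per block
    return embed + n_layer * (attn + mlp + norms)

for n in (4, 6, 8, 12):
    print(n, "layers ->", round(gpt2_style_params(n_layer=n) / 1e6, 1), "M params")
```

With roughly 8 layers the total lands near 34M, consistent with the figure quoted above.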


Orca 2: Teaching Small Language Models How to Reason - Microsoft Research.

M2 Max with 64GB RAM: it does ~50 tok/s on a q4-quantized 7B Mistral fine-tune, with speeds comparable to GPT-4.
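For reference, a minimal sketch of running a 4-bit-quantized 7B GGUF model locally with llama-cpp-python (Metal-accelerated on Apple silicon). The model file name is a placeholder, not the fine-tune referenced above.

```python
# Minimal local inference sketch with llama-cpp-python.
# The GGUF filename below is a placeholder; point it at any q4-quantized 7B model.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct.Q4_K_M.gguf",  # placeholder local file
    n_gpu_layers=-1,   # offload all layers to the GPU (Metal on macOS)
    n_ctx=4096,
)

out = llm("Q: Name three small language models.\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```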


moondream

moondream is a computer-vision model that can answer real-world questions about images. It is tiny by today’s standards, with only 1.6B parameters, which enables it to run on a variety of devices, including mobile phones and edge devices.

Apache 2.0. You can use moondream for commercial purposes.

Applications:

  1. Security
  2. Drones and robotics
  3. Retail and shopping
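A minimal sketch of asking moondream a question about an image, following the usage shown on the vikhyatk/moondream2 model card (the model id, methods, and image path are assumptions taken from that card, not verified here):

```python
# Visual question answering with moondream via Hugging Face transformers.
# Model id and methods follow the vikhyatk/moondream2 model card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("shelf.jpg")   # placeholder image path
enc = model.encode_image(image)   # vision encoder pass
print(model.answer_question(enc, "How many items are on the shelf?", tokenizer))
```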

Prem 1B and Prem 1B chat

  • Apache 2.0 license
  • “Our goal is to create models that excel at RAG. Since RAG works by processing information at runtime, the main constraint is LLM size. For RAG, models don’t need to be huge; they just need strong text comprehension to give accurate answers when provided with the right context.” (A minimal sketch of this pattern follows this list.)
  • blog post: SLM Journey Unveiled – “In recent months, the landscape of language models has been enriched by the emergence of several small language models (e.g. TinyLlama, Phi-2, Gemma, and StableLM 2)”
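A minimal sketch of the RAG pattern Prem describes: retrieval happens at runtime and the retrieved passages are injected into the prompt, so the model only needs to comprehend the supplied context. The model id and retriever below are placeholders, not Prem's actual stack.

```python
# Prompt-stuffing RAG sketch with a small language model.
# The model id is assumed for illustration; swap in any small chat model.
from transformers import pipeline

generator = pipeline("text-generation", model="premai-io/prem-1B-chat")  # assumed HF id

def retrieve(query: str) -> list[str]:
    # stand-in for a real vector-store lookup
    return ["Prem 1B is an Apache 2.0-licensed small language model aimed at RAG."]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    out = generator(prompt, max_new_tokens=128, do_sample=False)
    return out[0]["generated_text"]

print(answer("What license does Prem 1B use?"))
```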

macOS desktop

Phone

  • “phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.” Abdin, Marah, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, et al. “Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone,” 2024. https://arxiv.org/abs/2404.14219. (No code or model was released with the paper.)

aiOS™ by Hyperspace “Organizing the World’s AI Agents. Join the world’s largest peer-to-peer AI network and start earning points”