published by devthanos

LLMs

author: devthanos
slug: llms
status: Published
tags: Introduction
summary: Large Language Models
type: en
lng: EN
Base Model: OpenAI
Variants: gpt-3.5-turbo, gpt-4 (coming soon)
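As a rough sketch of how the OpenAI variants are typically called, assuming the official `openai` Python client (v1 or newer) and an API key in the environment:

```python
# Minimal sketch: calling gpt-3.5-turbo through the official openai Python client.
# Assumes the `openai` package (v1+) is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what a large language model is in one sentence."},
    ],
)

print(response.choices[0].message.content)
```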
Base Model: Llama
Variants: meta-llama/Llama-2-70b-chat-hf, meta-llama/Llama-2-13b-chat-hf, meta-llama/Llama-2-7b-chat-hf
Description: Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks Meta tested, and in Meta's human evaluations for helpfulness and safety they are on par with some popular closed-source models such as ChatGPT and PaLM. Llama 2 comes in a range of parameter sizes (7B, 13B, and 70B) as well as pretrained and fine-tuned variations.
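A minimal sketch of running the smallest chat variant locally with Hugging Face transformers; this assumes torch and accelerate are installed, a GPU is available, and you have accepted Meta's license for the gated meta-llama repositories (HF token configured):

```python
# Minimal sketch: generating a reply with Llama-2-7b-chat-hf using its built-in chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain the difference between pretraining and fine-tuning."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200)
# Strip the prompt tokens and print only the newly generated reply.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```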
Base Model: Llama
Variants: codellama/CodeLlama-34b-Instruct-hf, codellama/CodeLlama-70b-Instruct-hf
Description: Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. The variants listed here are the 34B and 70B parameter versions, fine-tuned to follow instructions. These models are designed for general code synthesis and understanding.
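A similar sketch for the instruction-tuned Code Llama checkpoints; the 34B and 70B variants need substantial GPU memory (the smaller codellama/CodeLlama-7b-Instruct-hf checkpoint also exists if resources are limited), and the prompt wrapping assumes the same [INST] ... [/INST] format used by Llama-2-Chat:

```python
# Minimal sketch: asking an instruction-tuned Code Llama variant to write code.
# Assumes transformers, torch, and accelerate are installed and enough GPU memory is available.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="codellama/CodeLlama-34b-Instruct-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Code Llama's instruct variants use the same [INST] ... [/INST] wrapping as Llama-2-Chat.
prompt = "[INST] Write a Python function that checks whether a string is a palindrome. [/INST]"
result = generator(prompt, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])
```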
Base Model: Mistral
Variants: mistralai/Mistral-7B-Instruct-v0.1
Description: The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model, tuned on a variety of publicly available conversation datasets. This model supports function calling and JSON mode.
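JSON mode and function calling are usually features of the serving layer rather than of the raw checkpoint, so the sketch below assumes the model is hosted behind an OpenAI-compatible endpoint; the base URL and API key are placeholders for whatever host you use:

```python
# Minimal sketch of JSON mode against an OpenAI-compatible endpoint serving Mistral-7B-Instruct.
# The base_url and api_key below are placeholders, not a real service.
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "user", "content": "Return the capital of France as JSON with a 'capital' key."},
    ],
    response_format={"type": "json_object"},  # ask the server to constrain output to valid JSON
)
print(response.choices[0].message.content)
```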
Base Model: Mistral
Variants: mistralai/Mixtral-8x7B-Instruct-v0.1
Description: The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts; the Instruct variant listed here is fine-tuned to follow instructions.
Base Model: Mistral
Variants: Open-Orca/Mistral-7B-OpenOrca
Description: Open Orca used their own OpenOrca dataset to fine-tune on top of Mistral 7B. The dataset is an attempt to reproduce the dataset generated for Microsoft Research's Orca paper. The model was trained with OpenChat packing, using Axolotl, on a curated, filtered subset of most of the GPT-4-augmented data; it is the same subset that was used in the OpenOrcaxOpenChat-Preview2-13B model. HF Leaderboard evals placed this model as #1 among all models smaller than 30B at release time, outperforming all other 7B and 13B models. This release provides a first: a fully open model with class-breaking performance, capable of running fully accelerated on even moderate consumer GPUs, with thanks to the Mistral team for leading the way. The model is affectionately codenamed "MistralOrca".
Base Model: Mistral
Variants: HuggingFaceH4/zephyr-7b-beta
Description: Zephyr is a series of language models trained to act as helpful assistants. Zephyr-7B-β is the second model in the series: a fine-tuned version of mistralai/Mistral-7B-v0.1 trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). The Hugging Face H4 team found that removing the in-built alignment of these datasets boosted performance on MT Bench and made the model more helpful; however, this means the model is likely to generate problematic text when prompted to do so.
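A minimal sketch of chatting with Zephyr through the transformers pipeline, including a system message to steer its persona; this assumes a recent transformers release with chat-template support, plus torch and accelerate:

```python
# Minimal sketch: Zephyr accepts a system message, so its persona can be steered per request.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a friendly chatbot that answers concisely."},
    {"role": "user", "content": "What does Direct Preference Optimization do?"},
]

# Render the conversation with the model's own chat template, then generate.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```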