In this post, I review the major alternative models to ChatGPT and GPT-4. I assume you are familiar with OpenAI’s models and want to find out about the alternatives.
Baselines
GPT-4
Price per 1mil tokens:
Prompt: $30
Output: $60
Cost of processing 100 typical emails: $1.20 prompt + $1.20 output = $2.40
ChatGPT
Price per 1mil tokens: $2
Cost of processing 100 typical emails: ~$0.12
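To make the cost figures above reproducible, here is a small sketch of the arithmetic, assuming the 400 prompt / 200 output tokens per typical email stated in the notes at the end of the post:

```python
# Estimate the cost of summarizing a batch of emails, given per-million-token
# prices. Assumes 400 prompt tokens and 200 output tokens per typical email.
def email_batch_cost(prompt_price_per_m, output_price_per_m, n_emails=100,
                     prompt_tokens=400, output_tokens=200):
    prompt_cost = n_emails * prompt_tokens / 1_000_000 * prompt_price_per_m
    output_cost = n_emails * output_tokens / 1_000_000 * output_price_per_m
    return prompt_cost + output_cost

print(email_batch_cost(30, 60))  # GPT-4: $1.20 prompt + $1.20 output = $2.40
print(email_batch_cost(2, 2))    # ChatGPT (same price both ways): $0.12
```

The same function works for any model priced per token, so the contenders below can be plugged in directly.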
Contenders
Claude (instant) by Anthropic
Access: Restricted; you need to request access, but in the meantime you can use the Slack bot or Poe
Commercial use: Yes
Price per 1mil tokens:
Prompt: $1.63
Output: $5.51
Cost of processing 100 typical emails: ~$0.065 prompt + ~$0.110 output = ~$0.175
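The same per-token arithmetic applied to Claude Instant's prices (a sketch using the 400/200-token email assumption from the notes at the end of the post):

```python
# Claude Instant pricing: $1.63 per 1M prompt tokens, $5.51 per 1M output tokens.
# Assumes 100 emails of 400 prompt + 200 output tokens each.
prompt_cost = 100 * 400 / 1_000_000 * 1.63   # = $0.0652
output_cost = 100 * 200 / 1_000_000 * 5.51   # = $0.1102
total = prompt_cost + output_cost            # ~ $0.175
print(round(total, 4))
```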
How to try: Request access, then can use as a Slack bot or via API.
My subjective review:
I tried it as a Slack bot, which I assume uses Claude Instant. It is absolutely at ChatGPT's level, if not better, though in rare cases the results are not as coherent. It has a significant advantage: it usually produces shorter, on-point answers. It is less censored than ChatGPT and GPT-4 and does not respond with “As an AI language model I can’t do this…” Still, it will happily make a joke about a man but won’t make one about a woman, just like ChatGPT.
Dolly V2 by Databricks
Access: Available, with weights
Commercial use: Yes
API: No, self-hosted
Price per 1mil tokens: Depends on how you run it (see the notes at the end of the post).
Cost of processing 100 typical emails: See above. If you use Huggingface and the smallest GPU endpoint, you will probably process 100 emails in one hour, which will cost you $0.60. This is much more expensive than ChatGPT. However, it might become feasible if you process many more emails per hour.
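To see where self-hosting starts to pay off, here is a rough break-even sketch, assuming $0.60/hour for the smallest Hugging Face GPU endpoint and ChatGPT's $2 per 1M tokens (600 tokens per email, as in the notes at the end of the post):

```python
# Break-even throughput: how many emails per hour must the self-hosted
# endpoint handle before it beats ChatGPT's per-token pricing?
gpu_cost_per_hour = 0.60                       # smallest GPU endpoint (assumed rate)
chatgpt_cost_per_email = 600 / 1_000_000 * 2   # 600 tokens at $2 / 1M = $0.0012
break_even = gpu_cost_per_hour / chatgpt_cost_per_email
print(round(break_even))  # emails per hour
```

At roughly 500 emails per hour the two options cost the same; below that rate the ChatGPT API is cheaper.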
How to try: Colab
My subjective review:
Seems to be much worse at conversation than ChatGPT or Claude. There are complaints about incoherent responses. One major advantage is that you have the weights, so you can fine-tune this model. It’s quite small: 12 billion parameters versus GPT-3’s 175 billion.
Alpaca by Stanford (a fine-tune of Meta’s LLaMA)
Access: Restricted, but you can torrent it
Commercial use: No
API: No
Self-hosted: Yes
Price per 1mil tokens: Depends on how you run it
Subjective review:
I only tried the 4-bit quantized alpaca.cpp model, the smallest and fastest variant. The responses are very coherent. The model can be fine-tuned. Unfortunately, the license is very restrictive, so in practice you can only use it for research and personal projects. The model runs very fast on my MacBook M1. I think it gives the best answer to the bicycle question. On controversial topics it just answers, and it will make jokes about both men and women.
StableLM tuned alpha 7B by Stability
Access: Free
Commercial use: Yes
API: No
Self-hosted: Yes
Price per 1mil tokens: Depends on how you run it
How to try: Colab
OpenAssistant Pythia 12B by LAION AI
Access: Free
Commercial use: Yes
API: No
Self-hosted: Yes
Price per 1mil tokens: Depends on how you run it
How to try: official website or Huggingface
Subjective review:
Very coherent and not very censored. An official GUI is provided, which makes it much easier to try. Overall, it can be a useful assistant on par with ChatGPT and GPT-4.
ChatRWKV
Access: Free
Commercial use: Yes
API: No
Self-hosted: Yes
Price per 1mil tokens: Depends on how you run it
How to try: Huggingface
Subjective review:
Quite coherent. Given its relatively small size of 7B parameters, it can be an effective option for your own projects.
Notes
I assume a typical email contains a 400-token prompt and produces a 200-token output summary.
Huggingface inference is $0.06 per hour.
Comments
Nice article Boris. Thanks for putting it up
How much did you play with RWKV? To my eyes it's so much more elegant than transformers, but I've been disappointed by it on "real" issues around data extraction. It seems to be much more of a stochastic parrot, but maybe that boils down to the training more than anything else.