The two easiest ways to solve a task with LLMs are the ChatGPT API and Huggingface Inference Endpoints. Both provide an API you can call from your code to get model outputs. Let's see which is cheaper.
ChatGPT charges $2 for 1 million tokens. Let's translate that into something tangible: email summarization. Say one email takes 400 prompt tokens and 200 completion tokens, i.e. 600 tokens in total. Then processing 100 emails costs $0.12.
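As a sanity check, here is that arithmetic in a few lines of Python (the token counts are the assumptions above, not measurements):

```python
# Back-of-the-envelope ChatGPT cost for email summarization.
PRICE_PER_TOKEN = 2.0 / 1_000_000            # $2 per 1M tokens
PROMPT_TOKENS, COMPLETION_TOKENS = 400, 200  # assumed size of one email job

cost_per_email = (PROMPT_TOKENS + COMPLETION_TOKENS) * PRICE_PER_TOKEN
print(f"100 emails: ${100 * cost_per_email:.2f}")  # -> 100 emails: $0.12
```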
How much does it cost to do the same on Huggingface? Here is how Huggingface charges for Inference Endpoints:
1. On request, an endpoint with a model is spun up. You choose one of the configurations, ranging from $0.60 to $45 per hour for a GPU instance. Once the endpoint is up, querying it is a plain HTTP call (see the sketch after this list).
2. Huggingface counts how many hours the endpoint was used. As far as I understand, the number of hours is rounded up: if I create an instance and use it for a minute, I owe money for the whole hour. You need to delete endpoints yourself; otherwise they will quietly eat up your money.
3. At the end of the month, all costs are summed up and charged.
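For reference, calling a deployed endpoint looks roughly like this. A minimal sketch, assuming a text-generation model is already running; the URL and token are placeholders you get from your own deployment:

```python
import requests

# Placeholders -- taken from your own endpoint's page, not real values.
ENDPOINT_URL = "https://my-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

def summarize(email_text: str) -> str:
    """Send one email to the deployed model and return the generated summary."""
    response = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        json={"inputs": f"Summarize this email:\n\n{email_text}"},
    )
    response.raise_for_status()
    # Text-generation endpoints usually respond with [{"generated_text": ...}]
    return response.json()[0]["generated_text"]
```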
100 emails can definitely be processed within an hour. Using the simplest GPU endpoint, which is enough to fit a small Alpaca model, that hour costs $0.60, five times more than ChatGPT.
It seems like ChatGPT is much better. However, the tables turn when you need to process far more than 100 emails. For example, 10,000 emails cost $12 with ChatGPT. If you can process the same amount within an hour on Huggingface, it is 20 times cheaper.
I measured it: a Huggingface Inference Endpoint processes a typical email in about 3 seconds, so you can process 1,200 emails per hour. Processing those emails with ChatGPT costs $1.44 versus $0.60 for one Huggingface instance hour. Huggingface wins.
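Putting the two pricing models side by side, here is a sketch of the break-even computation under this post's assumptions ($2 per 1M tokens, 600 tokens per email, $0.60 per GPU hour, 1,200 emails per hour; the helper functions are just for illustration):

```python
import math

PRICE_PER_TOKEN = 2.0 / 1_000_000  # ChatGPT: $2 per 1M tokens
TOKENS_PER_EMAIL = 600             # 400 prompt + 200 completion (assumed)
GPU_PRICE_PER_HOUR = 0.60          # cheapest Huggingface GPU endpoint
EMAILS_PER_HOUR = 1200             # measured: ~3 seconds per email

def chatgpt_cost(n_emails: int) -> float:
    return n_emails * TOKENS_PER_EMAIL * PRICE_PER_TOKEN

def huggingface_cost(n_emails: int) -> float:
    hours = math.ceil(n_emails / EMAILS_PER_HOUR)  # billed hours round up
    return hours * GPU_PRICE_PER_HOUR

for n in (100, 1_200, 10_000):
    print(f"{n:>6} emails: ChatGPT ${chatgpt_cost(n):.2f}, "
          f"Huggingface ${huggingface_cost(n):.2f}")
#    100 emails: ChatGPT $0.12,  Huggingface $0.60
#   1200 emails: ChatGPT $1.44,  Huggingface $0.60
#  10000 emails: ChatGPT $12.00, Huggingface $5.40
```

The crossover sits at about 500 emails per GPU hour: below that, per-token pricing wins; above it, the fixed-price instance hour wins.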
Takeaway: Huggingface is most likely cheaper when you have a constant flow of tasks, while ChatGPT is cheaper for one-off computations.