4 Comments

Nice article, Boris. Thanks for putting it up!


How much did you play with RWKV? To my eyes it's so much more elegant than transformers, but I've been disappointed by it on "real" tasks around data extraction. It seems to be much more stochastic-parroty, but maybe that boils down to the training more than anything.

Author · Apr 22, 2023 (edited)

Admittedly I did not play with it much. I would expect their flagship model, trained on the Pile and finetuned (https://huggingface.co/BlinkDL/rwkv-4-raven), to behave roughly like the transformer models. If there is a significant difference while the parameters and data are roughly the same, then perhaps that tells us something about RNNs vs transformers.

What was the request that disappointed you? We can run it through all these models and see what happens.
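Running the same prompt through several models and comparing outputs could be sketched with a small harness like the one below. The model names and the lambda backends are placeholders, not real API calls; the assumption is only that each model can be wrapped as a prompt-to-completion callable.

```python
# A minimal sketch of a side-by-side model comparison harness.
# Each model is assumed to be wrapped as a callable taking a prompt
# string and returning a completion string; the backends below are
# hypothetical stubs, to be swapped for real RWKV/transformer calls.

from typing import Callable, Dict


def compare_models(models: Dict[str, Callable[[str], str]], prompt: str) -> Dict[str, str]:
    """Run the same prompt through every model and collect the outputs."""
    return {name: complete(prompt) for name, complete in models.items()}


if __name__ == "__main__":
    # Stub backends standing in for real models.
    stubs = {
        "rwkv-4-raven": lambda p: f"[rwkv stub] {p}",
        "gpt-4": lambda p: f"[gpt-4 stub] {p}",
    }
    for name, out in compare_models(stubs, "Extract the diagnosis from: ...").items():
        print(f"{name}: {out}")
```

With real backends plugged in, differences between architectures on the same extraction prompt would show up directly in the collected outputs.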

Apr 22, 2023 · Liked by Boris Tseitlin

Already tried those and a few others.

The labeled data is somewhat proprietary and the validation is complex, I'm afraid.

The results are at ai.eurekahealth.com

But I think for now GPT-4 is so far above everything else that there's not much use in doing anything else.

I'm hopeful Anthropic will catch up soon, and on 1-2 year timelines I think more CPU-friendly architectures might come to dominate.
