How much did you play with RWKV? To my eyes it's so much more elegant than transformers, but I've been disappointed by it on "real" issues around data extraction. It seems to be much more stochastic-parroty, but maybe that boils down to the training more than anything.
Admittedly I did not play much with it. I would expect their flagship model trained on the Pile and finetuned (https://huggingface.co/BlinkDL/rwkv-4-raven) to behave roughly like the transformer models. If there is a significant difference while the parameter count and data are roughly the same, then perhaps it tells us something about RNNs vs transformers.
What was the request when it disappointed you? We can run it through all these models and see what happens; a sketch of that comparison is below.
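For instance, a minimal sketch of that comparison via Hugging Face transformers, which has RWKV support. The checkpoint names and the prompt are assumptions on my part; swap in whichever converted RWKV checkpoint and Pile-trained transformer you actually want to compare:

    # Hedged sketch: run the same extraction prompt through an RWKV checkpoint
    # and a Pile-trained transformer baseline. Both checkpoint names below are
    # assumptions -- substitute whatever you actually have on the Hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    prompt = "Extract the patient's age: 'A 54-year-old male presented with ...'"

    for name in ("RWKV/rwkv-raven-1b5",      # assumed HF-converted RWKV-4 Raven
                 "EleutherAI/pythia-1.4b"):  # transformer trained on the Pile
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForCausalLM.from_pretrained(name)
        inputs = tokenizer(prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=32)
        # Decode only the newly generated tokens, not the echoed prompt.
        continuation = out[0][inputs["input_ids"].shape[1]:]
        print(name, "->", tokenizer.decode(continuation, skip_special_tokens=True))

Not a rigorous eval, obviously, but greedy decoding on an identical prompt would at least show whether the RWKV side parrots where the transformer extracts.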
Nice article, Boris. Thanks for putting it up.
Already tried those and a few others.
The labeled data is somewhat proprietary and the validation is complex, I'm afraid.
The results are at ai.eurekahealth.com.
But I think for now GPT-4 is so far above everything else that there's not much use in doing anything else.
I'm hopeful Anthropic will catch up soon, and on 1-2 year timelines I think more CPU-friendly architectures might come to dominate.