Remarkable: I don’t want to be part of this scene anymore

From the creator of wordfreq

Generative AI has polluted the data
I don’t think anyone has reliable information about post-2021 language usage by humans.
The open Web (via OSCAR) was one of wordfreq’s data sources. Now the Web at large is full of slop generated by large language models, written by no one to communicate nothing. Including this slop in the data skews the word frequencies.

CC-BY-NC