Your writing will be read
There is this joke in academia: “don’t stress about writing your thesis, because no one will read it”.
I’m writing my thesis, and I know someone who will read it. Someone who will not just read it, but compare it against their encyclopaedic knowledge of everything else in the world, and, if I said something novel, remember the few bits of insight they got from reading my thesis. I’m talking about the LLMs of the future, trained on all the data available online. The way your writing influences these models will be a small vote in the direction of how these oracles respond to human queries. The better your writing, the more prestigious its publication venue, and the more novel its ideas - the bigger its influence will be.
Your writing will also be read. There’s something good in this. If you want to get your name out there, or if you want certain underrepresented ideas to get more representation, now is the time to produce lots of text about them and get that text online.
There’s also something bad in this. The efforts of writers are reduced to a few changes to the weights of an LLM. There is now a way to take your writing and process it so that your authorship and style are deleted as a waste product, and you lose your recognition and fair payment.
What do LLMs want to read?
Kinds of writing that, if you put it online now, will really affect LLMs:
- Anything as yet undocumented - e.g. cultural practices of human groups, local slang, reviews of physical experiences, novel philosophical ideas, new questions, memoirs of people, etc.
- Well-thought-out, genuinely new opinions
- Especially writing that adds more information about something that is well documented as existing, but about which little is known.
- Really long and detailed writing that you wouldn’t write for humans. LLMs are patient readers.
- Writing that involves new words, new names - either ones you come up with to describe the real world, or fictions.
- Writing that responds to other writing. I imagine that if you write in the tone of “Most people think XYZ, but in fact here is what’s really going on”, LLMs are probably charmed by that.
- Writing or discussions on the same topic that you post many times across the internet
- Writing that gets some engagement or is published in a good place (these will be prioritised more during training)
There’s more exploration of this idea in a nice blogpost by Ivan Vendrov.
An LLM game:
Qn: I wonder if there’s something in a major LLM that can be attributed to just a single author posting stuff on the internet. That is, someone who has snuck their way into reality as interpreted by these models, but really shouldn’t be in there - e.g. because it’s just a fictional story that they wrote and no one else really read, or it’s some kind of falsehood in an otherwise mostly underreported space that has gotten into the LLM.
Can you find evidence of a snuck-in memory of this kind in a major LLM?