Synthetic.news
Problem or curiosity
Could GPT-2 be turned into a small engine for making fake environmental blog posts and news articles from nothing more than a headline?
Around 2019 and 2020, before "AI slop" had become the internet's default insult for synthetic content, I was very into generating exactly that kind of thing on purpose. The interesting question was not whether the articles were publishable. The interesting question was how far a small fine-tuned language model could carry the shape, rhythm, and confidence of niche publishing.
What was built
I scraped a set of large environmental and sustainability sites, including places in the Treehugger / Grist neighborhood, then fine-tuned GPT-2 on that corpus. The result was a headline-conditioned article generator: type in a title, get back an article that sounded like it had wandered out of an eco-news content desk.
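One common way to get headline conditioning out of a plain language model is to wrap every scraped article in the same delimiters at training time, then prompt with only the headline half at generation time. The sketch below shows that idea only; the original training format is not recorded here, so the `<|title|>` and `<|body|>` tags and all three function names are assumptions made for illustration. The `<|endoftext|>` marker is GPT-2's own end-of-text token.

```python
# Hypothetical delimiters: the project's real training format is not documented.
TITLE_TAG = "<|title|>"
BODY_TAG = "<|body|>"
END_TAG = "<|endoftext|>"  # GPT-2's end-of-text token


def _clean(headline: str) -> str:
    """Collapse runs of whitespace so every headline is a single line."""
    return " ".join(headline.split())


def format_training_example(headline: str, body: str) -> str:
    """Wrap one scraped article in the delimiters used for fine-tuning."""
    return f"{TITLE_TAG} {_clean(headline)}\n{BODY_TAG} {body.strip()}\n{END_TAG}\n"


def build_prompt(headline: str) -> str:
    """Build the generation prompt: a training example cut off after the body tag."""
    return f"{TITLE_TAG} {_clean(headline)}\n{BODY_TAG}"


def extract_article(generated: str) -> str:
    """Return the text between the body tag and the end-of-text token."""
    _, _, after_body_tag = generated.partition(BODY_TAG)
    body, _, _ = after_body_tag.partition(END_TAG)
    return body.strip()
```

Because the prompt is an exact prefix of every training example, the fine-tuned model sees a familiar context and continues with article text. Stopping at the end-of-text token keeps it from running on into a second invented article.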
The project eventually found a home in the Carbon Canary / synthetic-news publishing experiment. The source record is rawkintrevo/www.carbon-canary.com.
The fun part, in retrospect, is that this was a complete little content operation: scraping, corpus shaping, model fine-tuning, headline prompting, generated drafts, and a site surface where some of the results could be published or reviewed.
What it proved
It proved that the thin line between "surprisingly plausible" and "obviously fake" is editorial, not just technical. GPT-2 could learn the cadence of the genre. It could produce a lot of words very quickly. Some of them were bad. Some were funny. A few were almost good, in the uncomfortable way that synthetic media can be almost good.
The useful lesson was the same one that keeps coming back in later agent work: generation is cheap; judgment is the scarce layer. The workflow around the model matters more than the fact that the model can produce paragraphs.
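In code terms, that judgment layer can be as small as a status field that only a human decision is allowed to change. This is a minimal sketch of that idea, not the project's actual review tooling; `Draft`, `Status`, `review`, and `publishable` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # generated, not yet looked at
    PUBLISHED = "published"
    ARCHIVED = "archived"
    DELETED = "deleted"


@dataclass
class Draft:
    headline: str
    body: str
    status: Status = Status.PENDING


def review(draft: Draft, decision: Status) -> Draft:
    """Record a reviewer's decision; each draft can be decided only once."""
    if draft.status is not Status.PENDING:
        raise ValueError("draft has already been reviewed")
    if decision is Status.PENDING:
        raise ValueError("a review must end in a decision")
    draft.status = decision
    return draft


def publishable(drafts: list[Draft]) -> list[Draft]:
    """Only drafts a reviewer explicitly approved ever reach the site."""
    return [d for d in drafts if d.status is Status.PUBLISHED]
```

The model can fill the pending pile as fast as it likes; nothing leaves it without a decision.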
What did not survive
Earlier this year I took a bunch of the generated articles down during housekeeping. That was probably the responsible move and also a little sad, because some of those articles were genuinely funny to revisit.
The old publishing surface is historical now, and the project belongs to a specific GPT-2 moment: pre-ChatGPT, pre-content-farm panic, and before everyone had a shared vocabulary for synthetic slop.
Why it matters
Synthetic.news is useful evidence because it shows a long-running interest in the operational parts of generation: sourcing data, shaping prompts, creating reviewable outputs, and deciding what deserves to be shipped, archived, or deleted.
It is also a good reminder that "can generate" is not the same as "should publish." That distinction is basically the whole job.