
35% of the New Web Is Now AI. Here's What That Does to Anyone Citing It.

AI articles are citing AI articles. New 2026 research reveals how the citation loop works, what it does to search results, and how to trace real sources.

Milton Overton

07 Apr 2026 — 12 min read

By mid-2025, 35% of newly published websites were AI-generated. Inside the citation loop quietly rewriting the internet and how to spot it.

You can spot the article almost before you finish the first paragraph.

It opens with a sentence that sounds polished but says very little. It explains the topic in the safest possible way. It offers the same five tips you have already seen on three other sites. The examples are vague. The sources, if there are any, are usually secondary. Nothing about the page suggests the writer tested the advice, interviewed anyone, looked at primary data, or had a reason to publish beyond ranking for a search term.

For a long time, that kind of writing was just called low-quality SEO content. Now it has a much faster engine behind it.

The internet has not gone entirely synthetic. That would be too tidy. Reporters, researchers, teachers, students, and specialists still publish useful work every day. But the newly published web is changing fast. A growing share of articles, explainers, listicles, news summaries, study guides, and how-to pages are being generated or heavily assisted by AI.

The real problem is not that AI can write an article. The real problem is that AI can write millions of articles that look like information without adding much of it.

That is where the AI citation loop begins. One synthetic article summarizes another. A third site rewrites both. A fourth cites the third. A student searching before a deadline lands on a clean-looking page and assumes it is reliable. A chatbot that was trained on that content, or that retrieves it, repeats the claim months later. Over time, the internet starts to feel bigger and smaller at the same time. Bigger because there are more pages. Smaller because so many of them say the same thing.

Here is what the most recent research actually shows about how large the AI content problem has become, and why students, in particular, need new habits when they use online articles as sources.

What "Fake AI Articles" Actually Means

The phrase needs careful handling, because not every AI-assisted article is fake.

A writer might use AI to clean up grammar, summarize notes, organize a draft, or make a difficult explanation easier to read. That does not automatically make the final article dishonest. In most workplaces and classrooms, AI has quietly become part of the normal writing process.

A fake AI article is something narrower: a page that presents itself as a useful human-created resource while showing almost no evidence of human judgment, original research, first-hand experience, expert review, or source verification. It may not disclose AI use. It may cite weak sources. It may repeat claims from other ranking pages without checking where they came from. In worse cases, it includes fabricated facts, fake author profiles, or references to studies that do not exist.

That distinction matters because the internet is not facing one AI content problem. It is facing several at once: AI-assisted writing where a human is still thinking, AI-generated content where the machine produces most of the page, AI content farms publishing thousands of pages mainly to capture traffic, and hallucinated citations entering academic work because no one verified them.

These are not the same. But they all push the web in the same direction: more text, less certainty.

The New Web Is Becoming More Synthetic

The strongest recent estimate comes from an academic paper, The Impact of AI-Generated Text on the Internet. Its authors used the Internet Archive to build a representative sample of websites published between 2022 and 2025, then ran AI text detection across the sample to estimate how much newly published content was AI-generated or AI-assisted.

The headline finding was stark: by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted. Before ChatGPT launched in late 2022, the figure was essentially zero.

That does not mean 35% of the entire internet is fake. It means the newly created web has changed quickly, and the distinction matters. The old internet still contains decades of human-written pages, forums, institutional resources, books, archives, and news coverage. But students searching today, especially on common topics, are mostly exposed to recently published pages. Recently published pages are exactly where AI content is rising fastest.

The same study found something more subtle. The increase in AI-generated text was associated with lower semantic diversity and more positive sentiment. In plain English, AI-heavy writing is making parts of the web sound more similar and more artificially upbeat. The researchers did not find strong evidence that AI text was directly less factually accurate, which complicates the lazy "AI always lies" framing.

The bigger problem may be sameness.

A page can be mostly accurate and still useless. It can contain no obvious lie and still add nothing. It can define the five-paragraph essay, explain plagiarism, recommend planning ahead, and tell students to proofread carefully. None of that is false. But when 500 sites publish versions of the same advice, the internet gains pages without gaining knowledge.

That is the quiet danger of synthetic content. It does not need to be wrong to make the web worse. It only needs to crowd out original work.

Search Results Are Not Immune

A 2026 research project called DeGenTWeb looked specifically at what its authors called LLM-dominant websites: sites where most content appears to have been generated by language models with little human input.

The researchers classified roughly 100,000 sites from Common Crawl and roughly 20,000 sites surfaced through Bing results for 10,000 how-to queries. The share of LLM-dominant sites in Common Crawl rose from 2.1% in late 2022 to 29.4% in early 2025.

That is a significant shift in under three years.
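
To make the size of that shift concrete, the two Common Crawl shares quoted above can be compared directly. The snippet below is only arithmetic on the reported figures; the variable names are illustrative and not taken from the study.

```python
# Shares of LLM-dominant sites in Common Crawl, as quoted above (percent).
share_late_2022 = 2.1
share_early_2025 = 29.4

# Absolute change, in percentage points.
point_change = round(share_early_2025 - share_late_2022, 1)

# Relative change: how many times larger the later share is.
fold_change = round(share_early_2025 / share_late_2022, 1)

print(f"{point_change} percentage points, about {fold_change:.0f}x")
```

On the reported figures, that is a rise of 27.3 percentage points, or roughly a fourteenfold increase in share.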

The search data is even more relevant for everyday users. In 46.6% of the Bing how-to queries studied, at least one of the top 10 results pointed to an LLM-dominant site. For the top 20, the figure rose to 65.7%.

How-to queries are exactly where most students begin. How do I write an essay introduction? How do I cite a website? How do I summarize an article? How do I make my writing sound more natural? Low-stakes searches, but they shape how students write, research, and think about sources. If a student repeatedly lands on pages that are synthetic, shallow, or copied from other shallow pages, nothing visibly goes wrong. The page looks clean. The headings are organized. The advice feels familiar.

The problem is that familiar advice feels trustworthy even when no one has done the work behind it.

Search engines still surface excellent material, and the DeGenTWeb findings are specific to Bing how-to queries rather than the entire web. But they make one thing clear: AI-heavy sites are not sitting unseen in forgotten corners of the internet. They are showing up in normal search results for ordinary questions.

The Content Farm Model Has Adapted to AI

AI content farms are not lone bloggers experimenting with new tools. They are publishing operations.

NewsGuard's AI Tracking Center identified 3,006 AI content farm news and information websites as of March 2026, spanning 16 languages. These sites typically have generic names, publish at high volume, and present themselves in ways that ordinary readers may interpret as legitimate news or information sites.

The business model is not mysterious. Publishing used to require writers, editors, and time. AI compresses all three. A site can generate articles around trending searches, news cycles, celebrity rumors, health questions, finance topics, or education queries. If even a small fraction of pages rank, get shared, or surface in recommendation feeds, ad revenue follows.

This is where the term "AI slop" has caught on. It captures the feeling of endless generated material poured into search results and feeds. But "slop" can make the issue sound unserious. The economics are not unserious. As long as low-cost synthetic content can attract traffic, the incentive to keep making it is structural.

That incentive does not care whether the article helps anyone. It cares whether the page gets impressions.

Education content is especially easy to automate, because so much of it follows predictable formats. A content farm can churn out pages on essay topics, grammar rules, citation formats, book summaries, scholarship advice, admissions essays, and study habits. Some pages may be harmless. Some may be useful at a basic level. But every student has to ask a harder question: who is behind this page, and why should I trust them?

A real teacher can explain what students usually misunderstand. A real researcher can point to primary sources. A real writer can say what worked in practice and what failed. A synthetic article rarely does any of that. It sounds helpful because the language is smooth.

Smoothness is not the same as authority.

Google's Response Shows the Scale of the Problem

Google's March 2024 search update is another important data point. Google described the update as an effort to reduce low-quality, unoriginal content and identify pages that feel created for search engines rather than people. After rollout, Google said users would see 45% less low-quality, unoriginal content in search results.

That update was not aimed solely at AI content. Google's own language focused on scaled content abuse, expired domain abuse, site reputation abuse, and pages created mainly to match search queries without offering real value. Some of that content is AI-generated. Some is human-generated. Most is a mix.

That is the point.

The internet's quality problem is not really "AI versus human." It is original versus unoriginal. Useful versus mass-produced. Verified versus recycled. Written for readers versus written for traffic. AI has simply made the old SEO spam problem faster and cheaper. A site no longer needs a writing team to publish hundreds of pages; an operator can batch-produce article-like pages at speeds that make older content farms look slow.

Search engines can fight this, but they are in a difficult position. They have to index the web rather than pre-approve every sentence. They have to reward helpful content while avoiding false accusations against legitimate publishers. They have to detect scaled manipulation even when the output has been edited to avoid obvious AI fingerprints.

That is hard. For students, the practical lesson is simpler: ranking on Google is not the same as being reliable.

The Citation Problem Is Worse Than Bad Blog Posts

The clearest evidence of AI's reliability problem may come from academic citations, not blog posts.

A 2026 paper, LLM hallucinations in the wild, audited 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. The authors estimated that 146,932 hallucinated citations appeared in 2025 alone.

A hallucinated citation is not just a bad sentence. It is a fake doorway. It points to a source that does not exist, or whose details have been distorted enough that the reference cannot be verified. In academic writing, that is a serious failure, because citations are how claims get checked.

This is the most visible version of the citation loop.

An AI tool generates a plausible but fake reference. A writer fails to verify it. The paper or preprint goes online. Someone copies the citation. A database indexes the paper. A future AI system sees the reference pattern and treats it as part of the knowledge environment. The false citation becomes harder to remove because it no longer lives in only one place.

The same pattern can happen with web articles, even when the citations are not formally academic. A blog post cites a weak article. Another blog rewrites the blog post. A content farm summarizes the rewritten version. A student cites the content farm. A chatbot uses retrieved pages to answer a similar question later. Each step adds distance from the original source. Eventually the claim still sounds confident, but nobody knows where it actually came from.

That is why fake AI articles are not just annoying. They damage traceability. Good writing lets you follow the chain of evidence. Bad synthetic writing hides the chain, breaks it, or replaces it with the appearance of research.

Why This Matters More for Students

Students are not just passive readers of the internet. They are asked to use it as a source machine.

They search for background context. They look up definitions. They read explainers before writing essays. They use articles to understand unfamiliar topics. They collect citations. They compare arguments. They do most of this under deadline pressure, often with limited experience judging whether a source is strong or weak.

That is a difficult combination. The students most likely to rely on weak online sources are often the ones who most need clear guidance. A confident-looking AI-generated article can feel safer than a dense academic paper. It is easier to read. It uses headings. It gives quick answers. It seems neutral. But if it has no real author expertise, no primary sources, no original reporting, and no clear editorial process, it is a poor foundation for serious work.

This does not mean students should avoid the internet. That is unrealistic. It means students need a different habit: do not treat an article as reliable just because it is readable.

Readability is only the first layer. After that, check the author, the publication, the sources, the date, the evidence, and the specificity of the claims. A useful article usually leaves fingerprints of real work. It names studies clearly. It links to primary sources. It uses examples that do not feel generic. It acknowledges uncertainty. It does not pretend every topic has five simple tips.

Fake AI articles often show the opposite pattern. They are broad, smooth, repetitive, and strangely empty. They explain without proving. They cite without tracing. They conclude without adding judgment.

How to Spot an AI Citation Loop

The fastest way to spot a citation loop is to follow one claim backward.

Take a sentence from an article and ask: where did this actually come from? If the article links to another article, open it. If that source links to a third, open that too. After two or three clicks, you should arrive at something solid: a study, a report, a government page, an original interview, a dataset, a book, or a named expert.
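
The backward trace can be sketched in a few lines of Python. This is a minimal illustration, not a working crawler: the link map `CITES`, the `PRIMARY` set, the function name `trace_claim`, and every page name are invented for the example, and in practice deciding what counts as a primary source is a human judgment.

```python
from typing import Optional

# Hypothetical link map: each page -> the one source it cites for the claim.
# None means the page cites nothing. All page names are invented.
CITES = {
    "content-farm.example/post": "blog-b.example/rewrite",
    "blog-b.example/rewrite": "blog-a.example/summary",
    "blog-a.example/summary": "journal.example/study",
    "journal.example/study": None,
    "loop-1.example": "loop-2.example",
    "loop-2.example": "loop-1.example",
}

# Pages the reader has personally judged to be primary sources (assumed).
PRIMARY = {"journal.example/study"}


def trace_claim(start: str, max_hops: int = 3) -> Optional[list[str]]:
    """Follow citations from start. Return the path if it reaches a primary
    source within max_hops clicks, otherwise None."""
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        if current in PRIMARY:
            return path
        nxt = CITES.get(current)
        if nxt is None or nxt in seen:  # dead end, or the chain loops back
            return None
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path if current in PRIMARY else None
```

Calling `trace_claim("content-farm.example/post")` returns the four-page path ending at the study, while `trace_claim("loop-1.example")` returns `None` because the two pages only cite each other.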

If the chain never reaches a primary source, slow down. Another warning sign is repeated phrasing across many websites. If five articles use nearly identical wording, structure, and examples, they are likely drawing from the same generic source pool. That does not automatically make the information false, but it does mean you have not found independent confirmation.
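
One rough way to put a number on repeated phrasing is to compare overlapping word sequences between two pages. The sketch below uses Jaccard similarity over three-word shingles, a common near-duplicate heuristic chosen here for illustration; it is not the method of any study cited in this article, and the sample sentences are invented. A high score suggests shared wording, not proof of AI generation.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Invented sample sentences standing in for three pages.
page_one = "Plan ahead, write a clear thesis, and proofread carefully before you submit."
page_two = "Plan ahead, write a clear thesis, and proofread carefully before submitting."
page_three = "Our lab measured revision time across forty first-year essays."

print(jaccard(page_one, page_two))    # near-identical wording scores high
print(jaccard(page_one, page_three))  # unrelated wording scores zero
```

Where to set the cutoff between "similar" and "independent" is a judgment call that depends on text length and topic.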

Watch for fake precision, too. AI-generated articles often use numbers that look serious but are not clearly sourced. A statistic without a source is not a statistic. It is decoration.
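
As a rough aid, a script can flag sentences that contain a statistic but no visible pointer to a source. The patterns below are crude assumptions made for illustration (a short list of attribution words and a simple number pattern), the function name `unsourced_stats` is hypothetical, and the sample text with its figures is invented. A flagged sentence is a prompt to go looking, not a verdict.

```python
import re

# A number followed by %, "percent", or a magnitude word. Illustrative only.
STAT = re.compile(r"\b\d[\d,.]*\s*(?:%|percent\b|million\b|billion\b)", re.I)

# Crude signs that a sentence points somewhere: a URL or an attribution phrase.
SOURCE_CUE = re.compile(
    r"https?://|according to|\breported\b|\bfound\b|\bstudy\b|\bsurvey\b", re.I
)


def unsourced_stats(text: str) -> list[str]:
    """Return sentences that contain a statistic but no source cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if STAT.search(s) and not SOURCE_CUE.search(s)]


# Invented sample text; the figures are made up for the example.
sample = (
    "Experts agree that 73% of students improve with outlines. "
    "According to the registrar's 2024 report, 41% of first-years "
    "used the writing center. Start early."
)

for sentence in unsourced_stats(sample):
    print("CHECK:", sentence)
```

Here only the first sentence is flagged: it carries a percentage but names no one who measured it.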

Author pages matter as well. A real byline does not guarantee accuracy, but a missing author, a vague editorial team, or a suspiciously generic bio should make you slow down. The same applies to sites that publish across unrelated topics at unusual speed. A page that posts about health, crypto, celebrities, education, legal advice, and products all day long is usually built for traffic, not expertise.

The strongest sources usually have friction. They are not always the easiest to read. They include caveats. They explain methods. They show where the data came from. They sometimes say "we do not know." That is not weakness. That is a sign that a human being is still thinking.

The Future of Writing Is Not AI-Free. It Has to Be More Honest.

It would be naive to say students should never use AI. That is not where the world is going. AI is already inside search, writing software, email, note-taking, research workflows, and classroom debate. The better question is not whether AI touched a piece of writing. The better question is whether a human took responsibility for the final work.

That responsibility includes checking sources. It includes adding original thought. It includes replacing generic phrasing with real reasoning. It includes making sure the final essay sounds like a person who understood the material, not a machine that arranged the most likely sentences.

This is where writing tools have to evolve. Students do not only need faster drafts. They need help turning drafts into something clearer, more personal, more readable, and more honest. Tools like EssayTone fit into that newer workflow when they are used to refine AI-assisted writing into something that sounds more natural and human. The point is not to skip the thinking, but to make sure the final voice on the page is actually yours.

The difference matters.

Using AI to avoid thinking creates more synthetic noise. Using AI carefully, then checking sources and rewriting in your own voice, can still support good writing. The internet does not need more articles that sound finished. It needs more writing that is actually grounded in something.

AI Didn't Kill the Internet. It's Doing Something Worse.

The internet is not dead. But parts of it are being padded. Recent research suggests that a large share of newly published websites and articles now contain AI-generated or AI-assisted text. LLM-dominant sites are appearing in ordinary search results. AI content farms are publishing at scale across languages. Google has changed its search systems to reduce low-quality, unoriginal content. Researchers are finding hallucinated citations entering academic literature at measurable scale.

Together, the evidence points to a clear trend: the web is becoming easier to fill and harder to verify.

For students, that means source judgment is no longer optional. A polished article is not enough. A ranking page is not enough. A confident tone is not enough. The real question is whether the information can be traced back to people, evidence, and original work.

The danger of fake AI articles is not that they all contain false claims. It is that they make weak information look finished. They turn summaries into sources. They turn repetition into authority. And when enough pages copy each other, they create a citation loop that becomes harder to see from the outside.

The answer is not panic. The answer is better habits. Follow the source chain. Prefer primary evidence. Be skeptical of generic advice. Check whether the author has real expertise. Look for original data, not just clean formatting. And when you use AI in your own writing, do not let it flatten your work into the same voice everyone else is publishing.

The future of the internet will not be purely human or purely AI. It will be mixed. The only question worth asking is whether that mix produces more knowledge, or just more text.