<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Temporal Perspective]]></title><description><![CDATA[Reflecting on the past, analyzing the present, and pondering the future.]]></description><link>https://bakagiannis.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!wp28!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f41c29-ff2f-45d1-9fe7-4a8352c68f35_1024x1024.png</url><title>Temporal Perspective</title><link>https://bakagiannis.substack.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 09 Apr 2026 16:41:31 GMT</lastBuildDate><atom:link href="https://bakagiannis.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ioannis Bakagiannis]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[bakagiannis@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[bakagiannis@substack.com]]></itunes:email><itunes:name><![CDATA[Ioannis Bakagiannis]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ioannis Bakagiannis]]></itunes:author><googleplay:owner><![CDATA[bakagiannis@substack.com]]></googleplay:owner><googleplay:email><![CDATA[bakagiannis@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ioannis Bakagiannis]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Six Approaches to AI Content Monetization]]></title><description><![CDATA[And Why None of Them Work (Yet)]]></description><link>https://bakagiannis.substack.com/p/six-approaches-to-ai-content-monetization</link><guid isPermaLink="false">https://bakagiannis.substack.com/p/six-approaches-to-ai-content-monetization</guid><dc:creator><![CDATA[Ioannis Bakagiannis]]></dc:creator><pubDate>Tue, 16 Dec 2025 15:04:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/218b7554-28f3-4fc7-984d-632f4fde9c6c_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://context4gpts.com" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Si0E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 424w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 848w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png" width="500" height="200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50828,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://context4gpts.com&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bakagiannis.substack.com/i/180416357?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Si0E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 424w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 848w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://bakagiannis.substack.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>TL;DR</strong></h2><p>Search is still growing, but clicks are shrinking. Since AI summaries became mainstream, <a href="https://context4gpts.com/resource-center/ai-cannibalizing-publisher-revenue">zero-click behaviour rose from 56% to 69% (May 2024 &#8594; May 2025)</a>&#8212;a structural shift in how <a href="https://www.similarweb.com/blog/marketing/geo/citation-gap-analysis/">value moves through the web</a>].</p><p>Publishers responded with six monetization plays: training licenses, training royalties, inference licensing, pay-per-crawl, vendor marketplaces, and ads inside AI answers.</p><p>AI decouples <strong>content consumption</strong> (when a model learns or retrieves) from <strong><a href="https://bakagiannis.substack.com/p/introduction-to-the-agentic-web-vision">content delivery</a></strong> (when a user receives value). Publishers used to control the delivery moment. That&#8217;s where the meter lived. 
But now the unit that used to pay you (a visit) is no longer required to extract value.</p><p>Each approach is trying to re-attach revenue to the old meter. Each breaks&#8212;economically, technically, operationally, or structurally.</p><div><hr></div><h2><strong>The Real Problem: The Billing Unit Disappeared</strong></h2><p>Publishers don&#8217;t just create information. They built a monetization system around delivery: pages, sessions, subscriptions, ad slots, recirculation, brand trust. The page view wasn&#8217;t vanity; it was the <strong>billing unit</strong>.</p><p>The open web&#8217;s economic loop was brutally simple:</p><blockquote><p><strong>Publish &#8594; rank &#8594; click &#8594; monetize</strong></p></blockquote><p>AI removes that billing unit and changes the loop to:</p><blockquote><p><strong>Retrieve/train &#8594; synthesize &#8594; answer</strong></p></blockquote><p>The &#8220;answer&#8221; happens inside the AI interface, not on your domain. The user gets value without the default monetization surface.</p><div><hr></div><h2><strong>Approach 1: Bulk Training Licensing</strong></h2><h3><strong>Lump-sum nostalgia for a training era that&#8217;s already over</strong></h3><h3><strong>What it promises</strong></h3><p>Large, one-time licensing deals offer publishers significant upfront revenue while maintaining familiar enterprise sales motions. Major publishers negotiate multi-million dollar contracts with leading AI companies, monetizing their archives at scale.</p><h3><strong>How it works</strong></h3><ul><li><p>A publisher licenses an archive (articles + metadata) for <strong>model training</strong> (and sometimes for product features like summaries/citations).</p></li><li><p>The AI company pays a large upfront fee or multi-year commitment.</p></li><li><p>The publisher treats it like syndication: monetize the back catalog while core traffic economics weaken.</p></li></ul><h3><strong>Why it fails</strong></h3><p><strong>1) It doesn&#8217;t scale past the top of the market</strong><br><a href="https://context4gpts.com/resource-center/ai-cannibalizing-publisher-revenue">68% of known commercial agreements are with News or Media companies</a>. Only the largest publishers with significant brand leverage can command meaningful deals. If you&#8217;re not News Corp, Cond&#233; Nast, or The New York Times, you&#8217;re not getting a deal worth discussing in board meetings. <a href="https://kaptur.co/the-hidden-economy-behind-ai-data-licensing-takes-center-stage/">Kaptur</a>, <a href="https://digiday.com/media/2024-in-review-a-timeline-of-the-major-deals-between-publishers-and-ai-companies/">Digiday</a></p><p><strong>2) Content Licensing Agreements Will Concentrate Markets and hurt the Open Web if it is the only viable model</strong><br>Procurement overhead, trust requirements, and integration costs push buyers toward a small number of high-authority sources. A failure to maintain non-discriminatory access will result in the consolidation of both the AI and content production markets. Sources: <a href="https://www.promarket.org/2025/11/20/content-licensing-agreements-will-concentrate-markets-without-standardized-access/">ProMarket &#8212; Content licensing agreements will concentrate markets</a></p><p><strong>3) The value center is moving from training to inference</strong><br>Compute and product value are shifting toward inference and deployment. 
<a href="https://menlovc.com/perspective/2025-mid-year-llm-market-update/">Menlo</a> reports <strong>74% of builders now say most workloads are inference</strong>. That&#8217;s where product value and distribution concentrate. Training becomes less of the <a href="https://www.infoworld.com/article/4087007/ai-is-all-about-inference-now.html">&#8220;rentable choke point.&#8221;</a></p><p><strong>4) Publishers aren&#8217;t set up to &#8220;sell datasets&#8221;</strong><br>Publishers are content creators, not data engineers. Packaging, metadata cleanliness, update semantics, provenance, and compliance are non-trivial&#8212;often pushing publishers toward intermediaries that take margin. Additionally the data that have out-of-the-box can be used only for foundational - next word prediction - training. Currently most of the compute goes to post training for instruction following (e.g. SFT) or behavioural training (e.g. RLHF).</p><p><strong>5) Renewal leverage is legally fragile</strong><br>Even when cash is real, publishers are negotiating under a moving legal landscape: if training ends up broadly protected as transformative fair use, renewals get harder and <a href="https://www.skadden.com/insights/publications/2025/07/fair-use-and-ai-training">&#8220;one-time&#8221;</a> becomes <a href="https://www.jonesday.com/en/insights/2025/06/two-us-courts-address-fair-use-in-genai-training-cases">&#8220;one-and-done.&#8221;</a></p><p><strong>Bottom line:</strong> Bulk licensing is a bridge for a few, not a market design for everyone.</p><div><hr></div><h2><strong>Approach 2: Training Royalties / Model-Output-Level Compensation</strong></h2><h3><strong>Attribution fantasy in neural networks</strong></h3><h3><strong>What it promises</strong></h3><p>&#8220;You should earn forever because the model learned from you.&#8221; Publishers can participate in the long-term upside of AI by receiving ongoing royalties based on how their content is embedded in model weights, with attribution at the model level ensuring fair compensation.</p><h3><strong>How it works</strong></h3><ul><li><p>A model is trained on large datasets that include publisher content (licensed or claimed).</p></li><li><p>An attribution system attempts to estimate how much each publisher contributed to outputs or model capability.</p></li><li><p>Royalties are distributed based on estimated contribution (often framed as &#8220;proportional use&#8221;).</p></li></ul><p>What does not belong here: ProRata&#8217;s <a href="https://www.businesswire.com/news/home/20240806000889/en/ProRata-Invents-Generative-AI-Attribution-Technology-to-Compensate-and-Credit-Content-Owners-While-Facilitating-Fairness-and-Fact">&#8220;fractional attribution&#8221;</a> since it is a solution for RAG synthesis and not training.</p><h3><strong>Why it fails</strong></h3><p><strong>1) Model-level attribution isn&#8217;t auditable in the way money requires</strong><br>The hard constraint isn&#8217;t &#8220;we need better analytics.&#8221; It&#8217;s that modern models store learned representations in distributed weights&#8212;not retrievable &#8220;source records.&#8221; When payments depend on causal attribution, you need something closer to accounting than inference.</p><p>Research confirms this: Given the architectural complexity and intrinsic limitations of today&#8217;s LLMs, failures are not outliers but structural inevitabilities, and their black-box nature makes error diagnosis and causal attribution prohibitively difficult. 
<a href="https://arxiv.org/html/2510.17256v1">LLM Explainability</a>, <a href="https://arxiv.org/html/2510.10161">Large Language Model Sourcing: A Survey</a>, <a href="https://arxiv.org/html/2404.12691v1">Data Authenticity, Consent, &amp; Provenance for AI are all broken</a></p><p><strong>2) It becomes adversarial immediately</strong><br>Once money depends on attribution, everyone optimizes for it&#8212;poisoning, laundering, prompt manipulation, strategic paraphrase.</p><p><strong>3) Output-level citation &#8800; training-level attribution</strong><br>Some systems can track what appears in an answer. That&#8217;s useful, but it&#8217;s not proof of what trained the model (or what shaped latent knowledge). For example watermarking is the prevalent technique of establishing provenance in GenAI but watermark detection tools, especially for text, may be able to provide <a href="https://www.ntia.gov/issues/artificial-intelligence/ai-accountability-policy-report/developing-accountability-inputs-a-deeper-dive/information-flow/ai-output-disclosures">only a statistical confidence score</a>, not a definitive attribution, for the content&#8217;s origins.</p><p><strong>Bottom line:</strong> Royalties that require model-level attribution collapse under audit, dispute, and adversarial pressure.</p><div><hr></div><h2><strong>Approach 3: Direct Inference Licensing</strong></h2><h3><strong>Premium publisher aristocracy that breaks the open web</strong></h3><h3><strong>What it promises</strong></h3><p>Pay at the moment of use: if an AI system retrieves your content to answer a query, you get paid. This <em>sounds</em> aligned with value delivery.</p><h3><strong>How it works</strong></h3><ul><li><p>Publishers provide licensed APIs/feeds directly to LLMs.</p></li><li><p>AI apps call them during inference for freshness/grounding. Original artifacts like articles, blogs or pieces of them are fed back to the model.</p></li><li><p>Billing is per call, per document, per token returned, or contracted tiers. c</p></li></ul><h3><strong>Why it fails</strong></h3><p><strong>1) Works best for top-tier publishers</strong><br>Procurement overhead, trust requirements, and integration costs push buyers toward a small number of high-authority sources.</p><p>The market structure mirrors Approach 1: &#8220;The likely outcome is a dual consolidation: fewer major publishers controlling content supply, and fewer major AI firms controlling demand.&#8221;</p><p><strong>2) It encourages exclusivity and competitive foreclosure</strong><br><a href="https://www.promarket.org/2025/11/18/anticompetitive-acquiescence-in-ai-content-licensing/">&#8220;Anticompetitive acquiescence&#8221;</a> describes when companies acquiesce in lawsuits, licensing, or regulation to raise rivals&#8217; costs&#8212;potentially benefiting if competitors suffer more or potential competitors never enter the market.</p><p>Publishers want guaranteed revenue and preferential placement; AI companies want stable coverage and advantage. The market consolidates around the biggest players, eliminating the long tail. 
For example ChatGPT could use only one news source for America - imagine how detrimental that would be - to ensure coverage.</p><p><strong>3) No clean mapping between value, cost, and pricing unit</strong><br>Inference costs vary wildly by output length, context, model choice, and user behaviour, making &#8220;per query&#8221; pricing hard to reconcile with unit economics.<br>Sources: <a href="https://www.cloudzero.com/blog/inference-cost/">CloudZero &#8212; Your Guide To Inference Cost</a>, <a href="https://www.getmonetizely.com/articles/the-ai-inference-cost-problem-how-to-price-when-compute-costs-vary/">Monetizely &#8212; AI Inference Cost Problem</a></p><p>Publishers want to charge per query. AI companies experience costs per token. Users expect value per answer. There&#8217;s no natural mapping between these three.</p><p>A simple query might trigger complex retrieval logic, pulling from dozens of sources, while a complex query might be answered from cached knowledge. Who pays what, and based on which metric?</p><p><strong>Bottom line:</strong> Works for a small set of premium brands; structurally hostile to broad, open supply. <strong>But</strong> it could work for vertical AI application integration with niche / expert content creators.</p><div><hr></div><h2><strong>Approach 4: Pay-Per-Crawl / Access</strong></h2><h3><strong>Metering theater with enforcement gaps</strong></h3><h3><strong>What it promises</strong></h3><p>Charge bots to access content. Simple, usage-based, publisher-controlled.</p><h3><strong>How it works</strong></h3><ul><li><p>A publisher (often via CDN/proxy like <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/">Cloudflare</a>) classifies automated traffic: allow, block, or require payment.</p></li><li><p>Pricing is typically per request (crawl) or per page accessed, sometimes with tiers.</p></li><li><p>Access protocols/standards like <a href="https://rslstandard.org/rsl">RSL</a>) (an XML-based open standard enabling machine-readable licensing and automated compensation. Publishers add machine-readable terms to robots.txt files) try to formalize &#8220;what uses are allowed&#8221; and &#8220;what costs apply.&#8221;</p></li><li><p>Bots that pay gain access; bots that don&#8217;t are blocked.</p></li></ul><h3><strong>Why it fails</strong></h3><p><strong>1) It penalizes freshness and repeat payments</strong><br>Good AI systems refresh and cross-check. Pay-per-crawl makes that expensive, pushing caching and staleness. This creates an incentive to cache aggressively and crawl less frequently, leading to stale data. The economic model punishes exactly the behaviour that would improve AI quality: frequent, thorough content retrieval. Also the definition of &#8220;caching&#8221; can be extended a great deal, meaning an LLM can cache a search result into storage and never require to fetch that data again since it is getting access to raw content.</p><p><strong>2) Latency and Fragmentation</strong><br>It fragments the web into thousands of toll booths. Every toll booth adds auth/payment/policy checks&#8212;new failure modes in a latency-sensitive stack. 
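</p><p>A minimal sketch of what one of those toll booths adds to a single retrieval, assuming a hypothetical pay-per-crawl handshake: the 402 &#8220;Payment Required&#8221; flow is modelled loosely on the Cloudflare announcement above, but the header names, token scheme, and pricing are illustrative, not any real provider&#8217;s API.</p><pre><code># Illustrative only: a hypothetical pay-per-crawl handshake that a retrieval
# pipeline would have to run for every gated source. Header names and the
# auth scheme are invented for this sketch, not the API of any real provider.
import time
import requests

def fetch_with_toll(url, pay_token):
    start = time.monotonic()
    resp = requests.get(url, headers={"User-Agent": "example-ai-bot/0.1"})
    round_trips = 1
    if resp.status_code == 402:                    # the toll booth says pay first
        price = resp.headers.get("x-crawl-price")  # hypothetical pricing header (not used further here)
        resp = requests.get(                       # retry with proof of payment
            url,
            headers={
                "User-Agent": "example-ai-bot/0.1",
                "Authorization": "Bearer " + pay_token,  # hypothetical scheme
            },
        )
        round_trips += 1
    elapsed = time.monotonic() - start
    return resp.text, round_trips, elapsed

# One extra round trip per gated publisher, plus auth, policy, and billing
# checks, multiplied across every source consulted for a single answer.
</code></pre><p>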
Inference latency introduces <a href="https://www.tensormesh.ai/blog-posts/ai-inference-latency-slow-response-times-and-revenue">hidden</a> and <a href="https://context4gpts.com/resource-center/real-cost-web-scraping">opportunity</a> costs that AI companies cannot afford.</p><p><strong>3) Complexity of implementation appeals only to a handful players</strong> The <a href="https://rslstandard.org/rsl">RSL</a> implementation guide for an AI company is 11 pages long - I know because I made one for testing. And then the application will have to register with a transaction partner - most likely a DSP like in the <a href="https://blog.bidswitch.com/https/blog.bidswitch.com/announcing-the-dynamic-content-ledger">Bidswitch example</a> - in order to execute this content trade. On the other hand publishers need to implement this at the page level. The Standard that is being worked right now by the <a href="https://iabtechlab.com/announcing-content-monetization-protocols-comp-for-ai-working-group/">IAB CoMP working group</a> is an OpenRTB style standard - which is pretty lengthy and detailed - for EACH content page. Then integrate with an ecosystem partner that runs a licensing server for all these pages. I wonder who has the capacity to implement such a thing (<code>thinking emoji)</code>.</p><p><strong>4) Enforcement is the central failure mode</strong><br>Non-compliance is measurable and rising: <strong><a href="https://www.theregister.com/2025/12/08/publishers_say_no_ai_scrapers/">13.26% of AI bot requests ignored robots.txt in Q2 2025</a></strong>. CDNs and infrastructure providers will definitely try to help in that direction but in the end if AI companies can get access to content without paying they will do that. At the same time adds a ton of complexity that smaller publishers <a href="https://stytch.com/blog/how-to-block-ai-web-crawlers/">cannot afford</a>.</p><p><strong>5) Catch 22: Discovery requires the thing you&#8217;re trying to monetize</strong><br>The model assumes crawlers will <em>find</em> your content, evaluate it, and decide to pay. But discovery itself requires access&#8212;the very thing being metered. If you block crawlers by default, you&#8217;re invisible to the index. If you allow free access for discovery, you&#8217;ve already given away the data. It&#8217;s a structural catch-22: to be discovered, you must be crawled; to be crawled under pay-per-access, you must already be known. The system breaks at bootstrap.</p><p>Currently, Google&#8217;s AI Overviews exemplify this: the search index feeds the AI, and the AI reduces clicks, but the index was built on free crawling. Pay-per-crawl assumes a world where discovery and access are separate, but in practice, they collapse into the same request. You can&#8217;t sell access to something that hasn&#8217;t been discovered, and you can&#8217;t be discovered without granting access.</p><p><strong>6) Pricing remains speculative, neither market-tested nor validated at scale</strong><br>Pay-per-crawl assumes a rational pricing equilibrium, but no one has demonstrated how to price access in a way that works for both sides. Publishers set rates hoping to capture value; AI companies face unpredictable costs that compound across thousands of sources. The underlying bet is that regulatory pressure and blocking leverage will force well-capitalized players&#8212; namely Google and OpenAI&#8212;into compliance, creating a de facto standard through coercion rather than market discovery. This is not price formation. 
It&#8217;s a negotiation standoff disguised as a business model.</p><p><strong>Bottom line:</strong> Metering without enforceability RAW ACCESS becomes a leaky tax that degrades quality and still doesn&#8217;t guarantee payment with a flawed design mechanism geared towards the big players.</p><div><hr></div><h2><strong>Approach 5: Vendor Content Marketplaces</strong></h2><h3><strong>Closed platforms with no public proof</strong></h3><h3><strong>What it promises</strong></h3><p>Third-party marketplaces can solve the coordination problem by aggregating publisher content and connecting it with AI companies seeking licensed data, creating network effects and standardized access. Centralized Pay-Per-Query models.</p><h3><strong>How it works</strong></h3><ul><li><p>Publishers integrate vendor tech (gateway, authentication, metering, settlement).</p></li><li><p>The vendor aggregates supply and sells a unified pipe to AI companies.</p></li><li><p>Vendors provide analytics and payouts; they take a cut.</p></li></ul><p>Examples:</p><ul><li><p><a href="https://techcrunch.com/2024/06/26/dappier-is-building-a-marketplace-for-publishers-to-sell-their-content-to-llm-builders/">Dappier marketplace</a></p></li><li><p><a href="https://tollbit.com/blog/akamai-partnership/">TollBit &#8212; Akamai partnership</a></p></li></ul><h3><strong>Why it fails</strong></h3><p><strong>1) Cold-start dynamics are brutal</strong><br>Marketplaces struggle when premium publishers can do direct deals and the long tail can&#8217;t attract demand.</p><p><strong>2) Build on rented land (AGAIN)</strong><br>Publishers learned this lesson with Google: build on someone else&#8217;s infrastructure + distribution, and you&#8217;re subject to their terms, their margins, and their strategic pivots. When Google hit critical mass, it could unilaterally change ranking algorithms, ad share, and traffic flow&#8212;and publishers had no recourse.</p><p>Vendor marketplaces recreate this dynamic. You integrate their APIs, route traffic through their pipes, accept their analytics, and trust their settlement. When they reach scale, they control pricing, terms, and access to demand. If they change margin splits or restrict publisher controls, you have no leverage&#8212;your integration costs and workflow dependencies make switching prohibitively expensive.</p><p>This isn&#8217;t hypothetical. We&#8217;re watching it happen now: Google&#8217;s AI Overviews and AI Mode demonstrate how platforms can unilaterally insert themselves between publishers and users, extracting value without negotiation. Vendor marketplaces promise to prevent this&#8212;while building the exact same structural dependency under a different brand.</p><p>The open web doesn&#8217;t survive by replacing one intermediary with another. 
It survives when its citizens retain control over pricing, distribution, and the ability to exit without penalty.</p><p><strong>3) &#8220;Control&#8221; often becomes lock-in</strong><br>Vendor-specific integration increases switching costs; the vendor often owns demand relationships.</p><p><strong>4) Transparency remains theater, not infrastructure</strong><br><a href="https://iabtechlab.com/announcing-content-monetization-protocols-comp-for-ai-working-group/">IAB Tech Lab&#8217;s working group</a> explicitly called out &#8220;the absence of a marketplace and methods to attribute contribution of content.&#8221;</p><p>Publishers entering these marketplaces have no independent way to verify:</p><ul><li><p>Actual usage volume (how many times their content was retrieved)</p></li><li><p>Realized pricing (what AI companies actually paid per use)</p></li><li><p>Attribution methodology (how operators determine which content was &#8220;used&#8221;)</p></li><li><p>Marketplace margin (what percentage the intermediary captures)</p></li></ul><p>This isn&#8217;t new. We&#8217;ve seen this pattern before in programmatic advertising: walled gardens control measurement, reporting, and settlement, then report numbers that align with their economics, not yours. The difference is that programmatic ad exchanges at least had third-party verification and auditability standards. Vendor content marketplaces don&#8217;t.</p><p>Without open APIs, standardized reporting schemas, or third-party audits, &#8220;transparency&#8221; becomes whatever the vendor chooses to show you. And when the vendor controls both supply access and demand relationships, publishers have no leverage to demand better.</p><p><strong>Bottom line:</strong> A marketplace could be right; a <em>closed</em> marketplace becomes another dependency.</p><div><hr></div><h2><strong>Approach 6: Ads in AI Responses &amp; Affiliate Hybrids</strong></h2><h3><strong>Trust destruction for fragile yield</strong></h3><h3><strong>What it promises</strong></h3><p>Bring the most proven web monetization engine into answer interfaces. But conceal them as content.</p><h3><strong>How it works</strong></h3><ul><li><p>AI application requests content from a website.</p></li><li><p>The publisher along with the help of AdTech vendors inject paid advertising into the retrieved content.</p></li><li><p>OR publishers create advertorial content directly.</p></li><li><p>Publishers get paid in a CPM way based on content access.</p></li></ul><h3><strong>Why it fails</strong></h3><p><strong>1) Ads attack the core product asset: trust</strong><br>If users suspect <a href="https://www.nim.org/en/publications/detail/transparency-without-trust">commercial bias</a>, the answer engine loses the &#8220;utility&#8221; advantage that made it sticky. Trust is the highest adoption lever for an AI company. Losing that will lead to catastrophe.</p><p><strong>2) Open to Fraud</strong><br>It is very straight forward for someone who has been in AdTech and digital advertising to see that this mechanism can be gamed easily through botting. There will be the same MFA issue that the current open web has.</p><p><strong>3) Brand Voice breaks</strong><br>Ads are not displayed as they were integrated into the content of the publisher. The user&#8217;s LLM will do post-processing of the whole context to reply to the user. 
The brand&#8217;s messaging is most likely going to change in a way that is not controllable.</p><p><strong>4) Alignment contamination risk is real</strong><br>Sponsored outputs leaking into training/feedback loops can create persistent commercial bias and AI models do not act on <a href="https://www.alignmentforum.org/">user&#8217;s best interests</a>.</p><p><strong>Bottom line:</strong> Ads can and should exist in conversational interfaces but separate from the actual content with the right disclosure signals.</p><div><hr></div><h2><strong>What a Working Model Must Do</strong></h2><p>A durable solution has to match how AI behaves:</p><ol><li><p><strong>Monetize inference, not training</strong><br>Training is episodic. Inference is continuous. The monetization surface must live at the delivery moment (Not saying that publishers should not do that as well, but it is not the scalable economic model for the industry).</p></li><li><p><strong>Work without perfect attribution</strong><br>No payment system should depend on reconstructing causal contribution inside a black box. This means moving from &#8220;pay for what you contributed&#8221; to &#8220;participate in the value you enabled.&#8221;</p></li><li><p><strong>Prevent caching from zeroing out publisher participation</strong><br>Incentives must keep knowledge providers in the loop when their information is relied on.</p></li><li><p><strong>Preserve the long tail</strong><br>If only the top 50 brands get paid, the web shrinks into an aristocracy.</p></li><li><p><strong>Offer predictable economics for AI builders</strong><br>Cost volatility will push builders to route around the system.</p></li></ol><div><hr></div><h2><strong>Ending with The Paradigm Shift</strong></h2><h3><strong>Monetizing the artifact, not the moment of usefulness</strong></h3><p>In the 2000s, everyone thought print would remain the cash driver while websites were a quirky distribution channel. Publishers invested in print infrastructure, optimized print advertising, and treated web properties as experiments.</p><p>They were wrong. The web didn&#8217;t complement print but it replaced it as the main revenue driver. The business driver shifted from &#8220;how many newspapers do we sell&#8221; to &#8220;how much web traffic do we generate.&#8221;</p><p>Now we&#8217;re making the same mistake again.</p><p>Everyone thinks websites are the cash driver while AI is a quirky distribution channel. Publishers are investing in SEO, optimizing programmatic advertising, and treating AI licensing as an experiment.</p><p>They&#8217;re wrong again.</p><p>AI will complement websites as much as they did complement print. The business driver is shifting from &#8220;how much web traffic do we generate&#8221; to &#8220;how much value do we deliver in AI value chains.&#8221;</p><blockquote><p><strong>Value is created when the AI delivers a useful answer - not when it ingests content.</strong></p></blockquote><p>If the AI can answer from memory or cache, publisher participation goes to zero. If attribution is required, the system becomes non-auditable. If enforcement is required, the system becomes leaky. 
If exclusivity is required, the open web collapses.</p><h3><strong>From artifacts to value flows</strong></h3><p>In the 2000s, many publishers treated the web as &#8220;distribution.&#8221; It became the business model.</p><p>Now, many are treating AI as &#8220;distribution + licensing upside.&#8221; That&#8217;s not what it is.</p><p>AI is becoming the primary interface between questions and knowledge. So the strategic question isn&#8217;t:</p><blockquote><p>&#8220;How do we get paid for our content?&#8221;</p></blockquote><p>It&#8217;s:</p><blockquote><p><strong>&#8220;How does value flow in AI systems where our knowledge is used&#8212;and where can we attach a fair, enforceable, scalable price?&#8221;</strong></p></blockquote><p>Answer that, and you have a survival plan.</p><p>The future of publisher monetization won&#8217;t be a better contract. It will be <strong>a new market structure</strong>, one that prices usefulness at inference time, without impossible attribution and without turning the open web into a gated estate.</p><p>Let&#8217;s build something that works.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Hidden Cost of Web Scraping]]></title><description><![CDATA[Why AI Apps Are Burning Money on Bad Context]]></description><link>https://bakagiannis.substack.com/p/the-hidden-cost-of-web-scraping</link><guid isPermaLink="false">https://bakagiannis.substack.com/p/the-hidden-cost-of-web-scraping</guid><dc:creator><![CDATA[Ioannis Bakagiannis]]></dc:creator><pubDate>Mon, 08 Dec 2025 19:09:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5341c5c7-0fe5-41da-b425-81b6385b6cef_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://context4gpts.com" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Si0E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 424w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 848w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png" width="500" height="200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50828,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://context4gpts.com&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bakagiannis.substack.com/i/180416357?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Si0E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 424w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 848w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://bakagiannis.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>Every AI application founder thinks web scraping is the cheapest way to get context. They&#8217;re wrong. It&#8217;s the most expensive infrastructure choice you can make but usually it is the only one.</p><p>The math seems compelling: why pay for content when you can scrape it for free? Build a few parsers, spin up some proxies, and you have access to the entire web. Your cost is just server time and a couple of engineers maintaining the code. Simple, right?</p><p>Except it&#8217;s not simple. And it&#8217;s definitely not cheap.</p><p>Here&#8217;s what actually happens: Your scrapers burn money through token bloat, create compounding engineering debt, expose you to existential legal risk, degrade answer quality, and kill user trust. 
All while you think you&#8217;re saving money.</p><p>The hidden costs are so high that AI companies relying on scraping are operating on borrowed time. The question isn&#8217;t whether they&#8217;ll realize it&#8217;s expensive&#8212;it&#8217;s whether they&#8217;ll figure it out before their competitors do.</p><h2><strong>The Token Economics Nobody Talks About</strong></h2><p>Let&#8217;s start with the most immediate, measurable cost: tokens.</p><p>When you scrape content, you&#8217;re not just paying for inefficient extraction&#8212;you&#8217;re paying for massive overconsumption. <strong>You ingest entire articles when you only need specific facts, definitions, or relevant excerpts.</strong></p><p><strong>The real comparison isn&#8217;t scraped vs. structured content. It&#8217;s what you ingest vs. what you actually need.</strong></p><h3><strong>The Overconsumption Problem</strong></h3><p>Consider what actually happens in AI applications:</p><p><strong>Scenario 1: Answering a factual question</strong></p><p>- User asks: &#8220;What is the capital of France?&#8221; (assume that the LLM does not have the answer in the training data)</p><p>- <strong>What you need</strong>: A simple fact (~10-20 tokens: &#8220;Paris is the capital of France&#8221;)</p><p>- <strong>What you ingest with scraping</strong>: Full article about Paris (~1,200 tokens)</p><p>- <strong>Waste: 98% </strong>of tokens are unnecessary</p><p><strong>Scenario 2: Getting a definition</strong></p><p>- User asks: &#8220;What is retrieval-augmented generation?&#8221;</p><p>- <strong>What you need</strong>: A concise definition (~50-100 tokens)</p><p>- <strong>What you ingest with scraping**</strong>: Full technical article (~1,500 tokens)</p><p>- <strong>Waste: 93% </strong>of tokens are unnecessary</p><p><strong>Scenario 3: Multi-source research synthesis</strong></p><p>- User asks: &#8220;Compare the economic policies of three different countries&#8221;</p><p>- <strong>What you need</strong>: Relevant excerpts from 10 sources (~100-200 tokens each = 1,000-2,000 tokens total)</p><p>- <strong>What you ingest with scraping</strong>: 10 full articles (~1,200 tokens each on avg = 12,000 tokens)</p><p>- <strong>Waste: 83% </strong>of tokens are unnecessary</p><p>Modern scraping tools like trafilatura and newspaper3k extract main content from HTML[^1]&#8212;but they still give you the entire article. 
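</p><p>A minimal sketch of that gap, assuming trafilatura and a placeholder URL; the four-characters-per-token heuristic is a rough illustrative assumption, not a real tokenizer.</p><pre><code># Illustrative sketch: even a clean extraction hands you the whole article,
# not the one fact the answer needs. URL and token heuristic are placeholders.
import trafilatura

url = "https://example.com/some-news-article"    # placeholder URL
downloaded = trafilatura.fetch_url(url)          # raw HTML of the page
article_text = trafilatura.extract(downloaded)   # main content, boilerplate stripped

approx_tokens = len(article_text or "") // 4     # rough heuristic: ~4 chars per token
print("extracted roughly", approx_tokens, "tokens")
# A targeted excerpt would be an order of magnitude smaller, but nothing in
# the scraped page tells you which 150 tokens actually matter.
</code></pre><p>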
A typical news article contains 600-900 words (~800-1,200 tokens).[^2]</p><p><strong>The problem isn&#8217;t extraction quality&#8212;it&#8217;s that you&#8217;re ingesting 10-100x more content than you need.</strong></p><h3><strong>The Real Token Economics</strong></h3><p><strong>100,000 conversations per month scenario:</strong></p><p>- 2 content retrievals per conversation (VERY conservative)</p><p>- Model: Claude Sonnet 4.5 ($3 input per million tokens)</p><p><strong>Current approach (scraping full articles):</strong></p><p>- Average per retrieval: 1,200 tokens</p><p>- Monthly: 100K &#215; 2 &#215; 1,200 = 240M tokens = <strong>$720/month</strong></p><p><strong>What you actually need (relevant excerpts/facts):</strong></p><p>- Average per retrieval: 150-200 tokens (targeted information)</p><p>- Monthly: 100K &#215; 2 &#215; 175 = 35M tokens = <strong>$105/month</strong></p><p><strong>Real waste: $615/month or $7,380/year</strong></p><p>That&#8217;s not 15-30% overhead, <strong>it&#8217;s 85% waste</strong>.</p><p>At 1 million conversations per month, you&#8217;re burning <strong>$73,800 per year</strong> on unnecessary tokens&#8212;ingesting content you never needed in the first place.</p><p>And this is just input tokens. Output tokens cost more (3-5x input pricing), and when models process massive irrelevant context, they generate longer, less precise outputs&#8212;compounding the waste further.</p><h2><strong>Technical Performance: How Scraping Kills Quality</strong></h2><p>Token costs are just the beginning. <strong>Web scraping degrades AI performance in measurable, research-proven ways</strong>, from context engineering to answer accuracy to user experience.</p><h3><strong>The Signal-to-Noise Problem</strong></h3><p>News articles aren&#8217;t designed for AI consumption, they&#8217;re designed for human browsing and ad monetization.</p><p>A typical HTML page includes:</p><p>- Site-wide navigation, headers, menus</p><p>- Display ads, newsletter forms, trending widgets</p><p>- The actual article (finally)</p><p>- Comment sections (often unmoderated, low-quality)</p><p>- Related articles, site maps, legal links</p><p>- Cookie notices, subscription prompts</p><p>The actual content represents a <strong>small fraction of total HTML tokens</strong>. Tools like Boilerpipe exist specifically to &#8220;detect and remove surplus clutter&#8221; because <strong>web pages contain so much boilerplate</strong> that extraction is a non-trivial engineering problem.[^34]</p><h3><strong>The Lost in the Middle Problem</strong></h3><p>Context engineering has emerged as critical for modern AI applications.[^5] The core insight: context is a finite resource with diminishing marginal returns.[^6]</p><p>Stanford research (Liu et al., 2023) demonstrated that language model performance is highest when relevant information occurs at the beginning or end of input context, and significantly degrades when models must access information in the middle.[^8] Recent research shows that <strong>context length alone hurts LLM performance even with perfect retrieval</strong>.[^9]</p><p>Now consider scraped HTML structure:</p><p>- <strong>Top</strong>: Navigation, headers, site-wide elements</p><p>- <strong>Middle</strong>: Actual article content (what you need)</p><p>- <strong>Bottom</strong>: Comments, ads, footer</p><p>You&#8217;re placing the signal exactly where the model performs worst. Structured content can place relevant information at the beginning, where models excel. 
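</p><p>A minimal sketch of the difference, assuming you already hold pre-chunked content and some relevance scorer (both are placeholders here): rank the evidence and put the strongest chunks at the front of the context, instead of wherever the page layout left them.</p><pre><code># Illustrative sketch: order retrieved chunks by relevance so the strongest
# evidence sits at the start of the context window. The relevance() function
# is a placeholder for whatever similarity score your retriever produces.
def build_context(question, chunks, relevance, top_k=5):
    ranked = sorted(chunks, key=lambda c: relevance(question, c), reverse=True)
    kept = ranked[:top_k]            # keep only what the answer plausibly needs
    return "\n\n".join(kept)         # best material lands first, not mid-context

# With raw scraped HTML there are no chunks to rank: one undifferentiated page,
# with the useful part stranded in the middle.
</code></pre><p>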
Scraping locks you into a structure optimized for humans, not AI.</p><h3><strong>RAG Performance Degradation</strong></h3><p>Multiple research studies document that <strong>retrieval noise and redundancy degrade output quality in RAG systems</strong>.[^35][^36]</p><p>1. RAG systems suffer when encountering noisy or irrelevant documents</p><p>2. Misalignment between retrieved evidence and generated text leads to hallucinations</p><p>3. As context passages increase, &#8220;noise&#8221; also increases</p><p>4. Reader performance may plateau or degrade&#8212;sometimes beyond no-context performance</p><p><strong>Web scraping introduces systematic noise</strong>&#8212;navigation, ads, comments, boilerplate&#8212;that no model architecture can fully compensate for. Scraped HTML often contains all three corruption types within a single page: the article is relevant, the sidebar is irrelevant, and comments may contain counterfactual claims.</p><h3><strong>Hallucination Correlation</strong></h3><p>Research on hallucinations identified types:[^37]</p><ul><li><p><strong>Fabricated (43%)</strong></p></li><li><p><strong>Negations (30%)</strong></p></li><li><p><strong>Contextual (17%)</strong></p></li><li><p><strong>Causality-related (10%)</strong></p></li></ul><p>Recent research demonstrates a direct, measurable relationship between context length with low signal-to-noise ratio and hallucination rates.</p><p><strong>The hallucination rate increases with context length, reaching approximately 45% when context approaches 2,000 tokens.</strong>[^48] This isn&#8217;t theoretical&#8212;it&#8217;s a measured phenomenon across multiple studies.</p><p>Research on RAG systems reveals that <strong>models get &#8220;distracted&#8221; by irrelevant content in documents, particularly in long documents where the answer isn&#8217;t obvious.</strong>[^50] When retrieval granularity is too large, retrieved blocks contain excessive irrelevant content, increasing the cognitive burden on models and causing answers to deviate from the query.[^51]</p><p>Research using mechanistic interpretability (ReDeEP, 2024) revealed the internal mechanism: hallucinations occur when Knowledge FFNs in LLMs overemphasize parametric knowledge while Copying Heads fail to effectively retain or integrate external knowledge from retrieved content<strong>.</strong>[^53]</p><p><strong>The research is unambiguous: noisy context doesn&#8217;t just fail to help&#8212;it actively makes hallucinations more likely.</strong></p><p>Web scraping systematically introduces the exact conditions research identifies as causing hallucinations: long contexts, irrelevant content mixed with signal, poor information positioning, and high noise-to-signal ratios.</p><h3><strong>The Multi-Source Dilemma</strong></h3><p>Research shows that <strong>complete information about a query is rarely found in a single source</strong>.[^39] Natural answers require aggregating information from multiple sources.</p><p>This creates a painful dilemma:</p><p>Single-source or limited scraping:</p><ul><li><p>Lower costs and legal risk</p></li><li><p>But: Incomplete answers, lower quality</p></li></ul><p>Multi-source scraping (50+ publishers):</p><ul><li><p>Better quality and diversity</p></li><li><p>But: Exponential costs, multiplied risks</p></li></ul><p>It&#8217;s a catch-22: you need diversity for quality, but diversity multiplies cost and risk. 
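</p><p>A back-of-the-envelope sketch of that trade-off, reusing the figures quoted earlier (roughly 1,200 input tokens per scraped article versus 150-200 tokens of targeted excerpt, at the $3-per-million-token Claude Sonnet 4.5 input price); the assumption that every conversation consults ten sources is mine, chosen only to show how the waste scales with source diversity.</p><pre><code># Illustrative arithmetic with the numbers quoted above: ~1,200 tokens per
# scraped article vs. ~175 tokens of targeted excerpt, at $3 per million
# input tokens. Ten sources per conversation is an assumption for this sketch.
PRICE_PER_INPUT_TOKEN = 3 / 1_000_000

def monthly_input_cost(conversations, sources_per_answer, tokens_per_source):
    tokens = conversations * sources_per_answer * tokens_per_source
    return tokens * PRICE_PER_INPUT_TOKEN

full_articles = monthly_input_cost(100_000, 10, 1_200)  # scrape every source whole
excerpts = monthly_input_cost(100_000, 10, 175)         # only the relevant passages
print(round(full_articles), round(excerpts))            # about 3600 vs 525 dollars per month
</code></pre><p>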
And even with successful multi-source retrieval, <strong>multi-source synthesis remains challenging</strong>.[^40]</p><h3><strong>The Multi-Turn Conversation Penalty</strong></h3><p>Context bloat compounds throughout multi-turn conversations. Modern AI applications maintain conversation history, user preferences, retrieved content, and system instructions.</p><p>When scraped content consumes 3-4x more tokens than necessary, you&#8217;re forced into painful trade-offs:</p><ul><li><p>Drop older conversation turns (losing continuity)</p></li><li><p>Reduce content sources (sacrificing quality)</p></li><li><p>Compress context through additional LLM calls (adding latency and cost)</p></li><li><p>Sacrifice personalization (making responses less relevant)</p></li></ul><h3><strong>The Latency Tax</strong></h3><p>Context window size has a linear relationship with time to first token.[^11] AWS research shows that <strong>models experience substantial slowdowns when processing contexts exceeding 100,000 tokens</strong>.[^13]</p><p>Users notice lag. DAU/MAU metrics directly correlate with response quality and speed. Apps with DAU/MAU over 50% are world-class; most average 10%.[^15]</p><p>Every second of added latency pushes your stickiness ratio down<strong>.</strong> Since acquiring new users costs 5-7x more than retaining existing ones,[^16] latency isn&#8217;t just a UX problem&#8212;it&#8217;s a profitability problem.</p><h2><strong>Engineering Debt: The Cost That Never Stops</strong></h2><p>When founders evaluate web scraping, they estimate the initial build&#8212;maybe 2-4 weeks for scrapers. Then they move on.</p><p>What they miss: the initial build is only 30-40% of total engineering cost. The other 60-70% is maintenance&#8212;and it never stops.</p><h3><strong>The True Development Cost</strong></h3><p>With AI coding assistants providing 55% faster development,[^17] you have three options:</p><p><strong>Option 1: Build in-house with AI copilots</strong></p><p><strong>Initial development: $60,000-$80,000 over 2-3 months</strong></p><ul><li><p>Requirements and architecture with AI-generated boilerplate ($20K)</p></li><li><p>Core development for 20-30 publishers ($40K)</p></li><li><p>Testing, retry logic, error handling</p></li></ul><p>Hidden assumption: this only works for <strong>20-30 publishers</strong>. Scale to 50+ and complexity explodes non-linearly, pushing costs toward $120-150K.</p><p><strong>Option 2: Third-party services (Tavily, Firecrawl)</strong></p><p>Services promise to eliminate development costs entirely.[^18] But they create new problems:</p><ul><li><p><strong>Loss of control</strong>: Can&#8217;t customize extraction or optimize for your needs</p></li><li><p><strong>Quality unpredictability</strong>: At mercy of their extraction quality</p></li><li><p><strong>Costs scale</strong>: $0.005-$0.008/page compounds quickly</p></li><li><p><strong>Legal liability still yours</strong>: You&#8217;re still responsible for how content is used</p></li></ul><p>For 100K conversations/month (200K pages):</p><ul><li><p>Firecrawl: ~$1,000-$1,600/month ($12-19K/year)</p></li><li><p>Tavily: ~$1,600/month ($19K/year)</p></li></ul><p>Plus engineering time to integrate, monitor, and handle failures.</p><p><strong>Option 3: Hybrid approach (most common)</strong></p><p>Use third-party for MVP, then build custom as you scale. 
This means:</p><ul><li><p>Third-party costs while testing</p></li><li><p>Development costs when you need custom solutions</p></li><li><p><strong>Double the complexity, double the maintenance</strong></p></li></ul><p>The AI copilot productivity boost is real&#8212;but it doesn&#8217;t eliminate the fundamental problem: <strong>web scraping infrastructure is inherently brittle.</strong></p><h3><strong>The Maintenance Nightmare</strong></h3><p>Websites change constantly. Publishers redesign, update HTML, add anti-bot measures, change URL patterns. Every change breaks your scrapers.</p><p>Industry practitioners report that <strong>engineering teams spend 20-30% of their time maintaining existing scrapers</strong>.[^19] For a 5-person team, that&#8217;s one full-time engineer just keeping the lights on.</p><p>Update frequency is relentless:[^20]</p><ul><li><p>1-3 websites out of every 30 require updates each month</p></li><li><p>With strong anti-bot solutions, updates needed monthly or more</p></li><li><p>Each incident requires 2-3 developer days to fix</p></li></ul><p>The opportunity cost is staggering. Those hours could build features that differentiate your product. Instead, they&#8217;re spent reverse-engineering HTML changes and bypassing CAPTCHAs.</p><h3><strong>The Non-Linear Scaling Problem</strong></h3><p>Scraping 5 publishers? Manageable.</p><p>For early-stage prototypes serving fewer than 10,000 queries per month from 3-5 publishers, scraping can be a pragmatic short-term choice. The overhead is contained, legal exposure is minimal, and token costs remain low.</p><p>But this changes fast.</p><p>Scraping 50 publishers? Completely different problem.</p><p>Each publisher has:</p><ul><li><p>Different HTML structure requiring custom parsing logic</p></li><li><p>Different anti-scraping measures (Cloudflare, CAPTCHAs, JavaScript challenges)</p></li><li><p>Different URL patterns and content organization</p></li><li><p>Different update frequencies</p></li><li><p>Different legal terms and enforcement</p></li></ul><p>You can&#8217;t template this. Every publisher is a bespoke engineering problem.</p><p>Real-world example: building a custom solution for a difficult site required <strong>several weeks of developer time&#8212;thousands of dollars</strong> that easily outweighed third-party service fees.[^21]</p><p>Multiply that by 50+ publishers, each with their own quirks, each updating on their own schedule. The maintenance burden scales exponentially.</p><h3><strong>Infrastructure Hidden Costs</strong></h3><p><strong>Proxy Services:</strong>[^22]</p><ul><li><p>Premium residential/mobile proxies: $99+/month</p></li><li><p>Pricing by bandwidth: $6.60/GB and up</p></li><li><p>High-volume scraping: $500-2,000+/month</p></li></ul><p><strong>CAPTCHA Solving:</strong>[^23]</p><ul><li><p>2Captcha: ~$1.16 per 1,000 CAPTCHAs</p></li><li><p>Millions of retrievals per month: thousands in CAPTCHA costs</p></li></ul><p><strong>Storage and Processing:</strong></p><ul><li><p>Scraped HTML storage</p></li><li><p>Extraction pipelines</p></li><li><p>Content databases</p></li><li><p>Data freshness management</p></li></ul><p><strong>Typical monthly infrastructure: $200-$1,000+</strong> depending on scale.</p><h2><strong>Business Risk: Legal Exposure and Trust Crisis</strong></h2><p>Legal risk doesn&#8217;t show up on your monthly cost report until it destroys your company. 
And in 2025, how you handle data matters just as much as what your product does.</p><h3><strong>The Lawsuit Tsunami</strong></h3><p>The legal landscape has shifted dramatically. What was once gray area is now a minefield.</p><p><strong>The New York Times vs. OpenAI:</strong> The NYT has spent <strong>$10.8 million in legal bills</strong> fighting this case&#8212;and it&#8217;s not over. The judge ordered OpenAI to turn over <strong>20 million ChatGPT conversation logs</strong>.[^24][^25][^26]</p><p><strong>News Corp vs. Perplexity AI:</strong> Sued for &#8220;willfully copied copious amounts of copyrighted material.&#8221; Perplexity proudly marketed &#8220;skip the links&#8221;&#8212;directly threatening publisher business models. TollBit revealed Perplexity&#8217;s scrape-to-referral ratio: <strong>369 scrapes for every 1 referral</strong>.[^26][^27]</p><p><strong>Canadian Publishers vs. OpenAI:</strong> Multiple outlets sued for copyright infringement, circumvention of protective measures, breach of terms, and unjust enrichment.[^28]</p><p>The pattern is clear: publishers are aggressively defending their content across multiple jurisdictions.</p><h3><strong>Terms of Service Violations</strong></h3><p>Even if copyright law remains ambiguous, scraping violates explicit Terms of Service&#8212;creating clear breach of contract liability.</p><p><strong>Publisher TOS prohibitions:</strong>[^29]</p><ul><li><p><strong>Ryanair</strong>: Prohibits automated data extraction</p></li><li><p><strong>Meta</strong>: Prohibits collection via automated technology</p></li><li><p><strong>LinkedIn</strong>: Prohibits scraping of member profiles</p></li><li><p><strong>X Corp</strong>: Prohibits scraping in browsewrap and clickwrap agreements</p></li></ul><p>According to 404 Media, <strong>28% of &#8220;most actively maintained, critical sources&#8221; have restricted AI scraping</strong> in the last year.[^30] Researchers call this an &#8220;emerging crisis.&#8221;</p><h3><strong>The Transparency Problem</strong></h3><p>Scraping practices are becoming <strong>publicly measurable and embarrassingly transparent</strong>.</p><p>TollBit&#8217;s 2024 report exposed scrape-to-referral ratios:[^32]</p><ul><li><p><strong>OpenAI: 179:1</strong></p></li><li><p><strong>Perplexity: 369:1</strong></p></li><li><p><strong>Anthropic: 8,692:1</strong></p></li></ul><p>These numbers are cited in lawsuits, reported in press, and discussed in publisher board rooms. When your business model relies on scraping 369 times while sending back 1 referral, you&#8217;re extracting value until publishers shut you down.</p><h3><strong>Reputational Damage</strong></h3><p><strong>Perplexity AI:</strong>[^43]</p><ul><li><p>News Corp lawsuit, Forbes plagiarism accusation</p></li><li><p>Scrape-to-referral ratio of 369:1 became public embarrassment</p></li></ul><p><strong>Meta:</strong>[^44]</p><ul><li><p>Leaked documents showed scraping while ignoring robots.txt</p></li></ul><p><strong>OpenAI:</strong>[^45]</p><ul><li><p>NYT lawsuit costing $10.8M+, Indian copyright suit</p></li><li><p>High-profile legal battles creating negative brand association</p></li></ul><p>These aren&#8217;t obscure technical disputes&#8212;they&#8217;re <strong>front-page news</strong>. 
For every company like OpenAI with resources to weather the storm, dozens of smaller AI applications would be destroyed by similar controversies.</p><h3><strong>Consumer Trust</strong></h3><p>According to Cisco&#8217;s 2024 Consumer Privacy Survey:[^41]</p><p><strong>75% of consumers won&#8217;t buy from companies they don&#8217;t trust with their data</strong></p><p>The same research found:</p><ul><li><p>Consumers who trust providers spent <strong>50% more</strong> on connected devices</p></li><li><p><strong>51% of &#8220;Privacy Actives&#8221; have switched companies</strong> due to data privacy concerns</p></li><li><p><strong>49% of consumers aged 25-34 have switched</strong> over data policies</p></li></ul><p>The mechanism: <strong>How you handle others&#8217; data signals how you&#8217;ll handle users&#8217; data.</strong></p><p>When users discover your AI is built on unlicensed scraping&#8212;violating publisher terms and potentially copyright law&#8212;they infer you&#8217;ll be equally cavalier with their personal data.</p><h3><strong>Enterprise Procurement</strong></h3><p>For B2B AI applications, data sourcing practices are explicit RFP requirements.</p><p>From enterprise AI licensing guidelines:[^42]</p><p>&#8220;All content released through AI services must be: Originally created by the publisher, appropriately licensed from third-party rights holders, used as permitted by rights holders, or used as otherwise permitted by law.&#8221;</p><p>The critical clause: &#8220;<strong>Customer&#8217;s sole responsibility to ensure appropriate rights to all content input to AI service</strong>&#8220;</p><p>Translation: if your AI uses unlicensed scraped content and gets your enterprise customer sued, that&#8217;s on you&#8212;and you won&#8217;t get the contract.</p><p><strong>If you can&#8217;t prove your content is licensed, you can&#8217;t win enterprise deals.</strong></p><h3><strong>The Investor Due Diligence Problem</strong></h3><p>Web scraping isn&#8217;t just a legal risk&#8212;it&#8217;s a <strong>deal risk</strong>.</p><p>When AI companies go through fundraising, M&amp;A, or IPO processes, investor due diligence assesses:</p><ul><li><p>Violation of computer usage laws</p></li><li><p>Consumer privacy compliance</p></li><li><p>Material Non-Public Information handling</p></li><li><p>IP liability exposure</p></li></ul><p><strong>Section 204A of the Investment Advisers Act</strong> requires written policies to prevent MNPI misuse.[^31] For venture-backed companies, web scraping exposure can be a <strong>deal blocker</strong>.</p><p>M&amp;A transactions with companies using unlicensed scraping must carefully allocate liability. Acquirers don&#8217;t want to inherit your legal time bomb.</p><h3><strong>Data Ethics as Competitive Differentiation</strong></h3><p>The market is responding. 
Leading AI companies are pivoting from scraping to licensing.</p><p><strong>Major AI content licensing deals (2024):</strong>[^46]</p><ul><li><p><strong>OpenAI + News Corp</strong>: 5-year deal worth over <strong>$250M</strong></p></li><li><p><strong>OpenAI + Dotdash Meredith</strong>: Worth at least <strong>$16M</strong></p></li><li><p><strong>OpenAI + Axel Springer</strong>: <strong>$25M</strong> one-off payment plus variable fees</p></li></ul><p>PwC&#8217;s 2024 Trust Survey found that <strong>67% of customers prioritize hearing how companies protect data</strong>&#8212;but fewer executives (32%, down from 42%) are actually disclosing privacy policies.[^47]</p><p>That creates opportunity: <strong>AI companies that transparently demonstrate ethical data sourcing can differentiate on trust</strong>, not just model performance.</p><p>&#8220;Ethical data sourcing&#8221; isn&#8217;t compliance theater&#8212;it&#8217;s a <strong>competitive moat</strong>. It unlocks enterprise sales, improves retention, facilitates fundraising, and builds sustainable publisher relationships.</p><h2><strong>The Total Cost: What Scraping Really Costs</strong></h2><p>Let&#8217;s calculate the true cost for a realistic AI application.</p><p><strong>Assumptions:</strong></p><ul><li><p>100,000 conversations per month</p></li><li><p>2 content retrievals per conversation</p></li><li><p>50 target publishers for coverage</p></li><li><p>Claude Sonnet 4.5 ($3 input, $15 output per million tokens)</p></li></ul><h3><strong>Token Costs (Annual)</strong></h3><ul><li><p><strong>Scraped (full articles):</strong> 240M tokens/month = $8,640/year</p></li><li><p><strong>Targeted content (what you actually need):</strong> 35M tokens/month = $1,260/year</p></li><li><p><strong>Real token waste:</strong> $7,380/year (85% waste from overconsumption)</p></li></ul><h3><strong>Engineering Costs (Annual)</strong></h3><p><strong>In-house development:</strong></p><ul><li><p>Initial build (Year 1): $70,000</p></li><li><p>Ongoing maintenance: $120,000 (25% of 3-person team)</p></li></ul><p><strong>Third-party services:</strong></p><ul><li><p>Service costs: $15,000/year</p></li><li><p>Integration/monitoring: $24,000 (15% engineer time)</p></li><li><p>Total: $39,000/year</p></li><li><p><strong>But:</strong> Loss of control, quality unpredictability, legal liability still yours</p></li></ul><h3><strong>Infrastructure Costs (Annual)</strong></h3><ul><li><p>Proxy services: $12,000</p></li><li><p>CAPTCHA solving: $2,784</p></li><li><p>Storage/processing: $6,000</p></li><li><p><strong>Total:</strong> $20,784/year</p></li></ul><h3><strong>Legal Risk Costs (Annual)</strong></h3><ul><li><p>Cease-and-desist responses: $30,000-$75,000</p></li><li><p>Investor due diligence: $25,000-$50,000</p></li><li><p><strong>Conservative estimate:</strong> $55,000 (excluding lawsuits)</p></li></ul><h3><strong>Opportunity Costs (Annual)</strong></h3><p>Engineering time NOT spent on:</p><ul><li><p>Product features driving engagement</p></li><li><p>Model optimization</p></li><li><p>UX improvements</p></li><li><p>New capabilities</p></li></ul><p><strong>Estimated:</strong> $100,000 in lost product value</p><h3><strong>Quality Degradation Costs (Annual)</strong></h3><ul><li><p>User churn from poor quality: $250,000 in lost LTV (5% higher churn at 100K MAU, $50 LTV)</p></li><li><p>Lower DAU/MAU from latency/accuracy: $100,000 in reduced growth</p></li><li><p><strong>Total:</strong> $350,000/year</p></li></ul><h3><strong>TOTAL HIDDEN COST</strong></h3><p><strong>In-house 
(Year 1):</strong> $723,380 <strong>In-house (Ongoing):</strong> $653,380/year</p><p><strong>Third-party (Year 1):</strong> $501,380 <strong>Third-party (Ongoing):</strong> $501,380/year</p><p><strong>The Dilemma:</strong></p><p>Third-party services appear cheaper, but you sacrifice control and quality. Most teams start with third-party, hit limitations, and build custom anyway&#8212;paying for both in transition.</p><p>At 1 million conversations/month (10x scale):</p><ul><li><p><strong>Token waste alone:</strong> ~$74K/year</p></li><li><p><strong>In-house:</strong> ~$6.5M/year</p></li><li><p><strong>Third-party:</strong> ~$5M/year</p></li></ul><p>Neither option is sustainable. Both carry legal risk. Both degrade quality. Both force painful trade-offs.</p><p>This is what &#8220;free&#8221; content actually costs.</p><h2><strong>The Impossible Choice</strong></h2><p>AI application builders face an impossible dilemma.</p><p><strong>Option 1: Keep scraping</strong></p><ul><li><p>Token bloat burning money</p></li><li><p>25% of engineering time on maintenance</p></li><li><p>Legal exposure accumulating</p></li><li><p>Answer quality degrading</p></li><li><p>User trust eroding</p></li><li><p>Can&#8217;t pass enterprise RFPs</p></li><li><p>Cost: $500K-$6.5M+/year depending on scale</p></li></ul><p><strong>Option 2: Blanket licensing with major publishers</strong></p><ul><li><p>OpenAI paying $250M over 5 years to News Corp</p></li><li><p>Only works with OpenAI&#8217;s budget and leverage</p></li><li><p>Doesn&#8217;t cover long tail of publishers</p></li><li><p>Per-publisher negotiations = massive overhead</p></li><li><p>Cost: Millions upfront and ongoing</p></li></ul><p><strong>Option 3: Reduce content coverage</strong></p><ul><li><p>Limit to fewer publishers to reduce burden</p></li><li><p>Accept lower quality and completeness</p></li><li><p>Users get worse experience than competitors</p></li><li><p>Cost: Lost market share</p></li></ul><p><strong>Option 4: Build direct publisher relationships</strong></p><ul><li><p>Negotiate individual licensing deals</p></li><li><p>Requires legal team and business development</p></li><li><p>Each publisher wants different terms</p></li><li><p>Doesn&#8217;t scale beyond 10-20 publishers</p></li><li><p>Cost: Prohibitive for startups</p></li></ul><p>Every option has fatal flaws.</p><p>Scraping is expensive, risky, and degrades quality. Blanket licensing is only viable for giants. Reducing coverage kills competitiveness. Direct relationships don&#8217;t scale.</p><p><strong>The current content sourcing model for AI applications is fundamentally broken.</strong></p><h2><strong>The Strategic Window</strong></h2><p>The market is at an inflection point.</p><p>Publishers have realized AI companies are an existential threat to their traffic and revenue. They&#8217;re fighting back with lawsuits, technical countermeasures, and public pressure. 28% of critical sources have already blocked AI scraping&#8212;and that percentage is growing monthly.</p><p>Anti-bot technology is improving rapidly. Cloudflare Bot Management, PerimeterX, and similar services make scraping exponentially more difficult and expensive. The arms race favors defenders, not scrapers.</p><p>Legal precedents are being established. The NYT lawsuit, News Corp lawsuit, and Canadian publisher actions are creating case law that will make unlicensed scraping increasingly untenable.</p><p>Consumer and enterprise awareness of data ethics is rising. Users care how you source content. 
Enterprises require proof of licensing in RFPs. Investors scrutinize data practices in due diligence.</p><p>Market leaders like OpenAI are pivoting from scraping to licensing&#8212;demonstrating that even companies with effectively unlimited budgets recognize scraping is unsustainable.</p><p><strong>What worked in 2023 doesn&#8217;t work in 2025.</strong></p><p>The AI application founders who recognize web scraping&#8217;s true cost now&#8212;before the legal bills arrive, before the engineering debt becomes unmanageable, before user trust collapses&#8212;will have the opportunity to build on different infrastructure.</p><p>The most sophisticated AI teams are exploring fundamentally different approaches, not incremental fixes to scrapers, but infrastructure built for the realities of the Agentic Web where AI agents become primary internet users and content economics shift from advertising-driven traffic to value-based access.</p><p>Those who continue believing scraping is &#8220;cheap&#8221; will learn the truth the hard way: through $10M legal bills, broken scrapers consuming 30% of engineering time, user churn from poor answer quality, and lost enterprise deals because they can&#8217;t prove content licensing.</p><h2><strong>The Question</strong></h2><p>The current content sourcing model is broken.</p><p>Scraping appears free but costs millions. It burns tokens, wastes engineering time, creates legal exposure, degrades answer quality, and destroys user trust.</p><p>Most founders don&#8217;t realize the true cost&#8212;until they do the math.</p><p>The question isn&#8217;t whether AI applications need to find a better way to source content.</p><p>The question is whether you&#8217;ll figure it out before your competitors do.</p><p>The content sourcing infrastructure for AI applications is fundamentally broken. The companies that recognize this now&#8212;before million-dollar legal bills, before engineering debt consumes 30% of team capacity, before quality degradation kills user trust&#8212;will build on different foundations.</p><p>That infrastructure is being built. The question is whether you&#8217;ll adopt it while you have strategic choice, or be forced to migrate when scraping costs become undeniable.</p><p>The strategic window is open. For now.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>References</strong></h2><p>[^1]: Trafilatura and newspaper3k are popular Python libraries for content extraction from web pages. Trafilatura documentation: <a href="https://trafilatura.readthedocs.io/en/latest/evaluation.html">https://trafilatura.readthedocs.io/en/latest/evaluation.html</a></p><p>[^2]: Griffin Mott Consulting: How Many Words Are Usually In An Article? (2025). 
<a href="https://griffinmottconsulting.com/blog/ideal-article-length/">https://griffinmottconsulting.com/blog/ideal-article-length/</a></p><p>[^3]: Trafilatura can extract metadata, main body text and comments. Documentation: <a href="https://pypi.org/project/trafilatura/0.5.0/">https://pypi.org/project/trafilatura/0.5.0/</a></p><p>[^4]: OpenAI Developer Community: Markdown is 15% more token efficient than JSON. <a href="https://community.openai.com/t/markdown-is-15-more-token-efficient-than-json/841742">https://community.openai.com/t/markdown-is-15-more-token-efficient-than-json/841742</a></p><p>[^5]: Anthropic: Effective Context Engineering for AI Agents. <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents</a></p><p>[^6]: LlamaIndex: Context Engineering - What it is, and techniques to consider. <a href="https://www.llamaindex.ai/blog/context-engineering-what-it-is-and-techniques-to-consider">https://www.llamaindex.ai/blog/context-engineering-what-it-is-and-techniques-to-consider</a></p><p>[^7]: Chroma Research: Context Rot - How Increasing Input Tokens Impacts LLM Performance. <a href="https://research.trychroma.com/context-rot">https://research.trychroma.com/context-rot</a></p><p>[^8]: Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., &amp; Liang, P. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12, 157-173. <a href="https://arxiv.org/abs/2307.03172">https://arxiv.org/abs/2307.03172</a></p><p>[^9]: Context Length Alone Hurts LLM Performance Despite Perfect Retrieval (2025). <a href="https://arxiv.org/html/2510.05381v1">https://arxiv.org/html/2510.05381v1</a></p><p>[^10]: Why Does the Effective Context Length of LLMs Fall Short? (2024). <a href="https://arxiv.org/html/2410.18745v1">https://arxiv.org/html/2410.18745v1</a></p><p>[^11]: Glean: How input token count impacts the latency of AI chat tools. <a href="https://www.glean.com/blog/glean-input-token-llm-latency">https://www.glean.com/blog/glean-input-token-llm-latency</a></p><p>[^12]: Understanding Latency, Throughput, and Context Length in LLM Hosting. <a href="https://www.databasemart.com/blog/llm-hosting-latency-throughput-context-length">https://www.databasemart.com/blog/llm-hosting-latency-throughput-context-length</a></p><p>[^13]: AWS: Optimizing AI responsiveness - Amazon Bedrock latency-optimized inference. <a href="https://aws.amazon.com/blogs/machine-learning/optimizing-ai-responsiveness-a-practical-guide-to-amazon-bedrock-latency-optimized-inference/">https://aws.amazon.com/blogs/machine-learning/optimizing-ai-responsiveness-a-practical-guide-to-amazon-bedrock-latency-optimized-inference/</a></p><p>[^14]: LongICLBench: Long-context LLMs Struggle with Long In-context Learning (2024). <a href="https://arxiv.org/html/2404.02060v3">https://arxiv.org/html/2404.02060v3</a></p><p>[^15]: CleverTap: DAU vs. MAU - App Stickiness Metrics Explained. <a href="https://clevertap.com/blog/dau-vs-mau-app-stickiness-metrics/">https://clevertap.com/blog/dau-vs-mau-app-stickiness-metrics/</a></p><p>[^16]: Gainsight: Essential Guide to DAU/MAU Ratio. <a href="https://www.gainsight.com/essential-guide/product-management-metrics/dau-mau/">https://www.gainsight.com/essential-guide/product-management-metrics/dau-mau/</a></p><p>[^17]: GitHub Copilot users complete tasks 55.8% faster than control groups. 
Source: arXiv - The Impact of AI on Developer Productivity. <a href="https://arxiv.org/abs/2302.06590">https://arxiv.org/abs/2302.06590</a></p><p>[^18]: Tavily pricing: $30-$100+/month. Firecrawl pricing: $16-$333+/month. Sources: <a href="https://docs.tavily.com/documentation/api-credits">https://docs.tavily.com/documentation/api-credits</a> and <a href="https://www.firecrawl.dev/pricing">https://www.firecrawl.dev/pricing</a></p><p>[^19]: Software Engineer average salary 2025-2026: $112-129K base, ~$160K fully loaded with benefits. Source: <a href="https://www.coursera.org/articles/software-engineer-salary">https://www.coursera.org/articles/software-engineer-salary</a></p><p>[^20]: The True Costs of a Web Scraping Project. </p><p>[^20]: How Much Does Web Scraping Cost - The Ultimate Guide. <a href="https://webautomation.io/blog/how-much-does-web-scraping-cost-the-ultimate-guide/">https://webautomation.io/blog/how-much-does-web-scraping-cost-the-ultimate-guide/</a></p><p>[^21]: How Much Does Web Scraping Cost. <a href="https://www.zenrows.com/blog/web-scraping-cost">https://www.zenrows.com/blog/web-scraping-cost</a></p><p>[^22]: Best CAPTCHA Proxies in 2025. <a href="https://www.zenrows.com/blog/captcha-proxies">https://www.zenrows.com/blog/captcha-proxies</a></p><p>[^23]: How Does Proxies Help CAPTCHA Bypass. <a href="https://www.octoparse.com/blog/use-proxies-to-bypass-captcha">https://www.octoparse.com/blog/use-proxies-to-bypass-captcha</a></p><p>[^24]: Lewis Silkin: NYT v OpenAI - Publishing Sector&#8217;s AI Content-Scraping Conundrum. <a href="https://www.lewissilkin.com/insights/2024/01/19/nyt-v-openai-the-publishing-sectors-ai-content-scraping-conundrum">https://www.lewissilkin.com/insights/2024/01/19/nyt-v-openai-the-publishing-sectors-ai-content-scraping-conundrum</a></p><p>[^25]: Judge Orders OpenAI to Hand Over 20 Million ChatGPT Logs in NYT Copyright Clash. <a href="https://www.analyticsinsight.net/news/judge-orders-openai-to-hand-over-20-million-chatgpt-logs-in-nyt-copyright-clash">https://www.analyticsinsight.net/news/judge-orders-openai-to-hand-over-20-million-chatgpt-logs-in-nyt-copyright-clash</a></p><p>[^26]: The Hollywood Reporter: NYT Has Spent $10.8M In Legal Battle With OpenAI. <a href="https://www.hollywoodreporter.com/business/business-news/new-york-times-legal-battle-openai-1236127637/">https://www.hollywoodreporter.com/business/business-news/new-york-times-legal-battle-openai-1236127637/</a></p><p>[^26]: The Register: Major publishers sue Perplexity AI for scraping content. <a href="https://www.theregister.com/2024/10/22/publishers_sue_perplexity_ai/">https://www.theregister.com/2024/10/22/publishers_sue_perplexity_ai/</a></p><p>[^27]: TechCrunch: News outlets accusing Perplexity of plagiarism and unethical web scraping. <a href="https://techcrunch.com/2024/07/02/news-outlets-are-accusing-perplexity-of-plagiarism-and-unethical-web-scraping/">https://techcrunch.com/2024/07/02/news-outlets-are-accusing-perplexity-of-plagiarism-and-unethical-web-scraping/</a></p><p>[^28]: American Bar Association: OpenAI Sued for Data Scraping in Canada. <a href="https://www.americanbar.org/groups/business_law/resources/business-law-today/2025-february/openai-sued-data-scraping-canada/">https://www.americanbar.org/groups/business_law/resources/business-law-today/2025-february/openai-sued-data-scraping-canada/</a></p><p>[^29]: TermsFeed: Terms &amp; Conditions to Stop Screen Scraping. 
<a href="https://www.termsfeed.com/blog/terms-conditions-stop-screen-scraping/">https://www.termsfeed.com/blog/terms-conditions-stop-screen-scraping/</a></p><p>[^30]: 404 Media: The Backlash Against AI Scraping Is Real and Measurable. <a href="https://www.404media.co/the-backlash-against-ai-scraping-is-real-and-measurable/">https://www.404media.co/the-backlash-against-ai-scraping-is-real-and-measurable/</a></p><p>[^31]: Akin Gump: Legal Implications of Web Scraping for Investment Firms. <a href="https://www.akingump.com/a/web/soxXRQ6Nw48FehNvwpdjJ1/2jiuhx/hflr-reprint-to-scrape-or-not-to-scrape-rappaport-altman-handschumacher-4819-0662-7801-v1.pdf">https://www.akingump.com/a/web/soxXRQ6Nw48FehNvwpdjJ1/2jiuhx/hflr-reprint-to-scrape-or-not-to-scrape-rappaport-altman-handschumacher-4819-0662-7801-v1.pdf</a></p><p>[^32]: PYMNTS: Web Scraping Wars - How Businesses Are Fighting AI Data Harvesting. <a href="https://www.pymnts.com/artificial-intelligence-2/2024/web-scraping-wars-how-businesses-are-fighting-ai-data-harvesting">https://www.pymnts.com/artificial-intelligence-2/2024/web-scraping-wars-how-businesses-are-fighting-ai-data-harvesting</a></p><p>[^33]: DropSite News: LEAKED - Top Websites Meta Is Scraping for AI. </p><p>[^34]: Stack Overflow: Algorithm for reading actual content of news articles. <a href="https://stackoverflow.com/questions/1451894/algorithm-for-reading-the-actual-content-of-news-articles-and-ignoring-noise-o">https://stackoverflow.com/questions/1451894/algorithm-for-reading-the-actual-content-of-news-articles-and-ignoring-noise-o</a></p><p>[^35]: Long Context RAG Performance of Large Language Models (2024). <a href="https://arxiv.org/html/2411.03538v1">https://arxiv.org/html/2411.03538v1</a></p><p>[^36]: arXiv: Retrieval-Augmented Generation - A Comprehensive Survey. <a href="https://arxiv.org/html/2506.00054v1">https://arxiv.org/html/2506.00054v1</a></p><p>[^37]: arXiv: A Survey on Hallucination in Large Language Models. <a href="https://arxiv.org/abs/2311.05232">https://arxiv.org/abs/2311.05232</a></p><p>[^38]: arXiv: The Dawn After the Dark - Empirical Study on Factuality Hallucination. <a href="https://arxiv.org/html/2401.03205v1">https://arxiv.org/html/2401.03205v1</a></p><p>[^39]: arXiv: MSRS - Evaluating Multi-Source Retrieval-Augmented Generation. <a href="https://arxiv.org/html/2508.20867">https://arxiv.org/html/2508.20867</a></p><p>[^40]: arXiv: Towards Multi-Source RAG via Synergizing Reasoning and Preference-Driven Retrieval. <a href="https://arxiv.org/html/2411.00689v1">https://arxiv.org/html/2411.00689v1</a></p><p>[^41]: Cisco Newsroom: How safe is our data? Consumers want to know. <a href="https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2024/m10/how-safe-is-our-data-consumers-want-to-know.html">https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2024/m10/how-safe-is-our-data-consumers-want-to-know.html</a></p><p>[^42]: Arphie: What is RFP legal requirements? <a href="https://www.arphie.ai/glossary/rfp-legal-requirements">https://www.arphie.ai/glossary/rfp-legal-requirements</a></p><p>[^43]: TechCrunch: News outlets accusing Perplexity of plagiarism and unethical web scraping. <a href="https://techcrunch.com/2024/07/02/news-outlets-are-accusing-perplexity-of-plagiarism-and-unethical-web-scraping/">https://techcrunch.com/2024/07/02/news-outlets-are-accusing-perplexity-of-plagiarism-and-unethical-web-scraping/</a></p><p>[^44]: DropSite News: LEAKED - Top Websites Meta Is Scraping for AI. 
</p><p>[^45]: Lewis Silkin: NYT v OpenAI - Publishing Sector&#8217;s AI Content-Scraping Conundrum. <a href="https://www.lewissilkin.com/insights/2024/01/19/nyt-v-openai-the-publishing-sectors-ai-content-scraping-conundrum">https://www.lewissilkin.com/insights/2024/01/19/nyt-v-openai-the-publishing-sectors-ai-content-scraping-conundrum</a></p><p>[^46]: Digiday: 2024 in review - Timeline of major deals between publishers and AI companies. <a href="https://digiday.com/media/2024-in-review-a-timeline-of-the-major-deals-between-publishers-and-ai-companies/">https://digiday.com/media/2024-in-review-a-timeline-of-the-major-deals-between-publishers-and-ai-companies/</a></p><p>[^47]: PwC: 2024 Trust Survey - How to earn customer trust. <a href="https://www.pwc.com/us/en/library/trust-in-business-survey/customer-trust-in-your-sector.html">https://www.pwc.com/us/en/library/trust-in-business-survey/customer-trust-in-your-sector.html</a></p><p>[^48]: K2View: RAG hallucination - What is it and how to avoid it. <a href="https://www.k2view.com/blog/rag-hallucination/">https://www.k2view.com/blog/rag-hallucination/</a></p><p>[^49]: Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., &amp; Liang, P. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12, 157-173. <a href="https://arxiv.org/abs/2307.03172">https://arxiv.org/abs/2307.03172</a></p><p>[^50]: TechCrunch: Why RAG won&#8217;t solve generative AI&#8217;s hallucination problem. <a href="https://techcrunch.com/2024/05/04/why-rag-wont-solve-generative-ais-hallucination-problem/">https://techcrunch.com/2024/05/04/why-rag-wont-solve-generative-ais-hallucination-problem/</a></p><p>[^51]: arXiv: A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems - Progress, Gaps, and Future Directions (2025). <a href="https://arxiv.org/html/2507.18910v1">https://arxiv.org/html/2507.18910v1</a></p><p>[^52]: MDPI Mathematics: Hallucination Mitigation for Retrieval-Augmented Large Language Models - A Review (March 2025). <a href="https://www.mdpi.com/2227-7390/13/5/856">https://www.mdpi.com/2227-7390/13/5/856</a></p><p>[^53]: arXiv: ReDeEP - Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability (2024). ICLR 2025. <a href="https://arxiv.org/abs/2410.11414">https://arxiv.org/abs/2410.11414</a></p><p>[^54]: arXiv: Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought (February 2024). <a href="https://arxiv.org/html/2402.04004v2">https://arxiv.org/html/2402.04004v2</a></p><p>[^55]: ACL Anthology: RAG-HAT - A Hallucination-Aware Tuning Pipeline for LLM in Retrieval-Augmented Generation. EMNLP 2024. <a href="https://aclanthology.org/2024.emnlp-industry.113/">https://aclanthology.org/2024.emnlp-industry.113/</a></p>]]></content:encoded></item><item><title><![CDATA[The Billion Dollar Question: What Happens to Publishers When Clicks Disappear?]]></title><description><![CDATA[The traffic catastrophe is real. The alternatives don&#8217;t add up. 
And the infrastructure that could save publishers doesn&#8217;t exist yet.]]></description><link>https://bakagiannis.substack.com/p/the-billion-dollar-question-what</link><guid isPermaLink="false">https://bakagiannis.substack.com/p/the-billion-dollar-question-what</guid><dc:creator><![CDATA[Ioannis Bakagiannis]]></dc:creator><pubDate>Mon, 01 Dec 2025 17:22:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!e-LI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3004a7ff-088f-42f8-98f2-66b8d48d3783_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://context4gpts.com" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Si0E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 424w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 848w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png" width="500" height="200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50828,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://context4gpts.com&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bakagiannis.substack.com/i/180416357?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Si0E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 424w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 848w, 
https://substackcdn.com/image/fetch/$s_!Si0E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1272w, https://substackcdn.com/image/fetch/$s_!Si0E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdd76150-2442-4b2b-8fe4-50ac9446237f_500x200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://bakagiannis.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>Every publisher executive has seen the charts. Traffic down 25%. Down 40%. Down 90%. The numbers keep getting worse, and the explanations keep getting vaguer. &#8220;Algorithm changes.&#8221; &#8220;Market headwinds.&#8221; &#8220;Strategic pivots.&#8221;</p><p>Let&#8217;s be direct: <strong>AI is cannibalizing publisher traffic at an accelerating rate, and no one has figured out how to replace the revenue.</strong></p><p>Between May 2024 and May 2025, <a href="https://click-vision.com/zero-click-search-statistics">zero-click searches surged from 56% to 69%</a>&#8212;a 13 percentage point jump in just 12 months. When Google&#8217;s AI Overviews appear, <a href="https://www.searchenginejournal.com/impact-of-ai-overviews-how-publishers-need-to-adapt/556843/">zero-click rates hit 80-83%</a>. The link economy that sustained digital publishing for 25 years isn&#8217;t declining&#8212;it&#8217;s collapsing.</p><p>This article examines the data no one wants to talk about: the exact magnitude of traffic losses by content vertical, the correlation between AI trust and cannibalization velocity, and most importantly, the math that shows none of the current monetization alternatives can replace what&#8217;s being lost. Then we&#8217;ll explore the open question: what would a real solution actually look like?</p><h2><strong>Part 1: The Cannibalization Pattern</strong></h2><h3><strong>The Numbers Are Worse Than Reported</strong></h3><p>The headline figure&#8212;<a href="https://digiday.com/media/google-ai-overviews-linked-to-25-drop-in-publisher-referral-traffic-new-data-shows/">publishers losing 25-27% of traffic year-over-year</a>&#8212;masks catastrophic variation by content type. When you break down the data by vertical, a disturbing pattern emerges: the content categories publishers thought were &#8220;safe&#8221; are getting hit hardest.</p><p><strong>Educational Content: The Canary in the Coal Mine</strong></p><p><a href="https://www.edtechinnovationhub.com/news/chegg-reports-24-revenue-drop-sues-google-over-ai-impact-on-online-learning">Chegg lost 49% of its non-subscriber traffic between January 2024 and January 2025</a>. Total revenues dropped 24% in Q4 2024. The company laid off 45% of its workforce&#8212;388 people&#8212;and sued Google over AI Overviews &#8220;stealing&#8221; their traffic.</p><p>Why educational content? Because AI answers &#8220;how do I solve this calculus problem?&#8221; or &#8220;explain photosynthesis in simple terms&#8221; <em>perfectly</em>. Students don&#8217;t need to click through to Chegg when ChatGPT gives them the answer in 3 seconds. 
The product&#8212;instant educational content&#8212;is identical, but the user never lands on the publisher&#8217;s site.</p><p><strong>Recipe Content: Death by Convenience</strong></p><p>Food and recipe publishers saw <a href="https://fortune.com/2025/11/26/ai-slop-recipes-thanksgiving-food-blog-collapse-traffic/">traffic declines of 30-50% in 2024</a>. Epicurious.com traffic dropped 37% year-over-year by December 2024. Thanksgiving recipe searches&#8212;traditionally a traffic bonanza&#8212;were down and some reported losses of 40% year-over-year.</p><p>The format is the problem. Recipe content is perfectly structured for AI summarization: ingredients list, numbered steps, cook time, serving size. AI Overviews can display the entire recipe without requiring a single click. When <a href="https://www.searchenginejournal.com/impact-of-ai-overviews-how-publishers-need-to-adapt/556843/">click-through rates drop 34-89% for queries with AI summaries</a>, recipe publishers lose both traffic and the ad revenue that depended on it.</p><p><strong>Travel Content: An Extinction-Level Event</strong></p><p>Travel and tourism sites saw <a href="https://www.theregister.com/2025/06/22/ai_search_starves_publishers/">20% year-over-year traffic declines</a>, but the aggregate number obscures individual catastrophes. The Planet D, a travel blog, <a href="https://www.theregister.com/2025/06/22/ai_search_starves_publishers/">shut down after losing 90% of its traffic to AI Overviews</a>. Individual travel bloggers report <a href="https://www.dangerous-business.com/how-google-and-ai-are-killing-travel-blogs-like-mine/">40% traffic drops and 34% ad income losses</a> year-over-year.</p><p>&#8220;Best hotels in Barcelona.&#8221; &#8220;3-day Rome itinerary.&#8221; &#8220;What to pack for Iceland in winter.&#8221; These are exactly the queries AI excels at answering&#8212;and exactly the high-commercial-intent queries that drove affiliate revenue for travel publishers.</p><p><strong>News: Relatively Resilient (For Now)</strong></p><p>News publishers saw more moderate declines: <a href="https://pressgazette.co.uk/media-audience-and-business-data/uk-and-us-publishers-says-google-ai-is-harming-website-traffic/">median traffic down 7% for news brands versus 14% for non-news brands</a>. Major publishers averaged 10% losses from Google Search.</p><p>Why is news more resilient? Breaking news requires real-time, attributed sources. Google&#8217;s E-E-A-T (Experience, Expertise, Authoritativeness, Trust) guidelines still prioritize established news brands for current events. Legal liability concerns limit how definitively AI can present news without attribution.</p><p>But this resilience is temporary. As AI attribution systems improve and users grow comfortable consuming news through AI interfaces, news publishers will face the same cannibalization that&#8217;s already devastated educational, recipe, and travel content.</p><p><strong>Technology Content: The Cautionary Tale</strong></p><p><a href="https://developers.slashdot.org/story/25/01/10/1729248/stackoverflow-usage-plummets-as-ai-chatbots-rise">Stack Overflow saw a 75% drop in new questions from its 2017 peak</a>. Year-over-year (December 2024 vs. 2023), questions declined 60%. Since ChatGPT launched in November 2022, usage has dropped 76%.</p><p>Stack Overflow&#8217;s collapse reveals something more fundamental than traffic loss: the entire interaction paradigm changed. 
Developers didn&#8217;t just stop clicking through to Stack Overflow&#8212;they stopped <em>needing</em> Stack Overflow. The question-and-answer model became obsolete when AI could generate, debug, and explain code in real-time.</p><p>This isn&#8217;t about Google stealing traffic. It&#8217;s about agentic systems replacing the underlying use case.</p><p>Publishers fixate on Google because they built on rented land&#8212;they never owned distribution. But even if they had, the business would still face existential transformation. The platform shift happening now is category-agnostic.</p><p>Stack Overflow&#8217;s fate isn&#8217;t a cautionary tale about SEO strategy. It&#8217;s a preview of what happens when AI doesn&#8217;t just answer questions&#8212;it <em>performs tasks</em> that eliminate the need for informational content entirely.</p><p>Publishers who think &#8220;we just need better Google rankings&#8221; are solving for 2015. The question isn&#8217;t how to reclaim traffic from AI Overviews. It&#8217;s how to provide value in a world where users don&#8217;t need to visit publisher sites at all because agentic systems handle the entire workflow.</p><p>Adapt to the paradigm shift, or face Stack Overflow&#8217;s destiny&#8212;not because Google took your traffic, but because your entire content category became functionally obsolete.</p><h3><strong>The Trust Paradox: Why &#8220;Safe&#8221; Verticals Are Most Vulnerable</strong></h3><p>Here&#8217;s the uncomfortable truth everyone&#8217;s missing: <strong>The content categories where users trust AI answers most are getting cannibalized fastest.</strong></p><p><a href="https://kpmg.com/us/en/media/news/generative-ai-consumer-trust-survey.html">KPMG&#8217;s 2024 trust survey</a> found 56% of consumers trust AI for educational resources&#8212;the highest of any application. Recipe and how-to content have low trust barriers because the stakes are low (&#8221;what&#8217;s the worst that can happen if the recipe is slightly off?&#8221;). Developers trust AI for code because they can test it immediately.</p><p>Now look at the cannibalization rates:</p><ul><li><p><strong>Educational content (56% trust):</strong> Chegg down 49%</p></li><li><p><strong>Recipe content (low-stakes trust):</strong> Down 30-50%</p></li><li><p><strong>Developer content (high trust):</strong> Stack Overflow questions down 75%</p></li><li><p><strong>Medical content (low trust):</strong> Minimal reported traffic loss</p></li></ul><p>The relationship isn&#8217;t &#8220;more AI trust = more AI usage.&#8221; It&#8217;s <strong>&#8220;more AI trust = fewer clicks to publishers.&#8221;</strong></p><p>When users trust AI answers, they don&#8217;t need to verify by clicking through to the source. The zero-click search becomes the terminal action. <a href="https://click-vision.com/zero-click-search-statistics">When AI Overviews appear, 80% of searches end without a click</a>. For high-trust verticals, that number is likely even higher.</p><p><strong>This creates a devastating implication for publishers:</strong> Medical, legal, and financial content&#8212;currently &#8220;protected&#8221; by low AI trust and regulatory concerns&#8212;will face accelerating cannibalization the moment user trust rises. <a href="https://www.salesforce.com/news/stories/trusted-ai-data-statistics/">75% of workers say accurate data is critical to AI trust</a>. 
As AI accuracy improves in these high-stakes verticals, the trust barrier will fall, and traffic will collapse.</p><p>The conventional wisdom says &#8220;build trust to survive AI.&#8221; The data suggests the opposite: <strong>Publishers in high-trust verticals should fear AI adoption more, because trust drives zero-click behavior.</strong></p><h3><strong>The Inflection Point: What Happens at 80-90% Zero-Click?</strong></h3><p>Zero-click searches jumped from 56% to 69% in 12 months&#8212;a 23% relative increase. If this trajectory continues:</p><ul><li><p><strong>2026:</strong> 75% zero-click</p></li><li><p><strong>2027:</strong> 80% zero-click</p></li><li><p><strong>2028:</strong> 85% zero-click</p></li></ul><p>At 80-90% zero-click rates, the fundamental assumption underlying digital publishing&#8212;that content creation leads to traffic, which leads to monetization&#8212;breaks completely.</p><p><a href="https://www.semrush.com/blog/semrush-ai-overviews-study/">AI Overview appearance rates have doubled</a> from January 2025 (6.49%) to June 2025 (13.14%). Longer queries (8+ words) trigger AI summaries much more frequently. As AI coverage expands from simple queries (&#8221;capital of France&#8221;) to complex queries (&#8221;compare fixed-rate vs. adjustable-rate mortgages for first-time homebuyers in 2025&#8221;), zero-click rates will accelerate.</p><p><strong>What does this mean practically?</strong></p><p>If you&#8217;re a publisher doing 10 million monthly visits today, an 80% zero-click rate means 8 million of those visits disappear. If your revenue model depends on $3 CPMs across 7 ad impressions per visit, you&#8217;ve just lost:</p><p>8,000,000 visits &#215; 7 impressions &#215; $0.003 = <strong>$168,000 per month</strong> = <strong>$2.02 million per year</strong></p><p>That&#8217;s not a &#8220;headwind.&#8221; That&#8217;s an existential threat.</p><h2><strong>Part 2: Why None of the Alternatives Work at Scale</strong></h2><p>The consulting decks all say the same thing: &#8220;diversify revenue streams.&#8221; Subscriptions. Commerce content. Licensing deals. Events. Podcasts. Consulting services.</p><p>Let&#8217;s do the math and see if any of these can actually replace what&#8217;s being lost.</p><h3><strong>Advertising: The Structural Decline</strong></h3><p>Display and programmatic advertising remains the largest revenue source for most publishers, but it&#8217;s in structural decline from multiple directions.</p><p><strong>The Numbers:</strong></p><ul><li><p><a href="https://www.publift.com/blog/programmatic-advertising-trends">U.S. programmatic advertising: $168 billion (2024)</a></p></li><li><p><a href="https://www.therebooting.com/report/the-state-of-publisher-ad-revenue/">Publishers&#8217; share of global ad investment: 27.2% (2024) versus 71% a decade ago</a></p></li><li><p><strong>Over 10 years: -43.8 percentage points</strong></p></li></ul><p>That&#8217;s not just traffic loss from AI. That&#8217;s ad dollars flowing to walled gardens (Google, Facebook, Amazon), ad blocking (<a href="https://www.admonsters.com/ad-blocking-a-54b-problem-for-publishers-in-2024/">costing publishers $54 billion in 2024</a>), and social referral collapse (<a href="https://www.socialmediatoday.com/news/facebook-publisher-referrals-decline-50-percent/715745/">Facebook referrals down 50-58% over 6 years</a>).</p><p><strong>The trajectory is unmistakable:</strong> Advertisers follow attention. 
As users spend more time in AI interfaces (ChatGPT, Perplexity, Google AI Overviews) and less time on publisher sites, ad budgets will follow. The $168 billion programmatic market is growing, but publishers&#8217; share is shrinking.</p><h3><strong>Subscriptions: Limited Addressable Market</strong></h3><p>Subscriptions are the darling of every publisher strategy deck. <a href="https://localmedia.org/2024/01/digital-subscriptions-trends-for-2024-and-what-publishers-can-do-to-grow/">80% of publishers cite digital subscriptions as their most important revenue stream</a>, up from 74% in 2020.</p><p><strong>The growth story:</strong></p><ul><li><p><a href="https://www.inma.org/blogs/reader-revenue/post.cfm/subscription-growth-vs-revenue-growth-it-matters-what-you-measure">Median user subscriptions are up 3x since 2019</a></p></li><li><p><a href="https://www.inma.org/blogs/reader-revenue/post.cfm/subscription-growth-vs-revenue-growth-it-matters-what-you-measure">Median churn rates: &lt;5%</a></p></li><li><p>Newsletter platforms growing: <a href="https://whop.com/blog/newsletter-statistics/">Substack has 5M+ paid subscribers</a>; <a href="https://blog.beehiiv.com/p/2025-state-of-email-newsletters-by-beehiiv">beehiiv sent 15.6 billion emails in 2024</a></p></li></ul><p><strong>The problem:</strong></p><p>Tripling subscriptions from a tiny base still leaves you with a tiny base. The median news brand that tripled its subscribers since 2019 now serves thousands or tens of thousands of paying readers&#8212;compared to millions of ad-supported readers they used to monetize.</p><p><strong>The math:</strong></p><p>Advertising supported a model where publishers earned $3-6 CPMs across millions of free readers. Subscriptions earn $10-20/month from thousands of paying readers. The total addressable market for paid content is a small fraction of the ad-supported audience.</p><p>Elite brands (New York Times, Wall Street Journal, The Information) can make subscriptions work. For the other 99% of publishers, subscriptions are supplemental revenue, not a replacement for advertising.</p><p><strong>The AI acceleration problem:</strong></p><p>Subscriptions depend on top-of-funnel traffic to build awareness and drive conversions. As AI Overviews answer user queries without clicks, fewer users discover publishers organically. The traffic needed to feed subscription funnels is disappearing.</p><h3><strong>AI Data Licensing: Theater vs. Business Model</strong></h3><p>This is where the headlines get confusing. News Corp signs a <a href="https://www.cbinsights.com/research/ai-content-licensing-deals/">$250 million deal with OpenAI</a>. Financial Times and Reuters ink licensing agreements. Surely this is the solution?</p><p><strong>The market:</strong></p><ul><li><p><a href="https://www.emetresearch.ai/blogs/market-report-ai-data-licensing-deals-(2020-present)">Total AI content licensing spend: $816.7 million (2024)</a></p></li><li><p><a href="https://www.cbinsights.com/research/ai-content-licensing-deals/">Average deal size: $24 million per publisher</a></p></li></ul><p><strong>The context:</strong></p><ul><li><p><a href="https://www.admonsters.com/ad-blocking-a-54b-problem-for-publishers-in-2024/">Global revenue lost to ad blocking alone: $54 billion (2024)</a></p></li><li><p>U.S. programmatic advertising: $168 billion (2024)</p></li></ul><p><strong>Do the math:</strong></p><p>The entire 2024 AI licensing market ($816.7 million) is <strong>1.5% of what publishers lost to ad blocking</strong> in the same year. 
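</p><p>The gap is easier to see as a worked calculation. The figures below are the ones cited in this article; the 2030 licensing number is the projection referenced here, and the 10,000-publisher split is purely an illustrative assumption.</p><pre><code># Worked comparison of AI licensing revenue vs. what publishers already lose.
# Figures in billions of USD, as cited in this article; publisher count is illustrative.

licensing_2024 = 0.8167       # total AI content licensing spend, 2024
ad_blocking_loss_2024 = 54.0  # revenue lost to ad blocking, 2024
programmatic_2024 = 168.0     # U.S. programmatic ad market, 2024
licensing_2030 = 11.16        # projected licensing market, 2030

print(f"2024 licensing vs. ad-blocking losses: {licensing_2024 / ad_blocking_loss_2024:.1%}")    # ~1.5%
print(f"2030 licensing vs. current programmatic spend: {licensing_2030 / programmatic_2024:.1%}")  # ~6.6%

# If the 2030 pool were split evenly across 10,000 publishers (illustrative only):
per_publisher = licensing_2030 * 1e9 / 10_000
print(f"Average per publisher: ${per_publisher:,.0f}")  # ~$1.1M, before any skew toward mega-publishers
</code></pre><p>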
Even if licensing reaches $11.16 billion by 2030, that&#8217;s <strong>6.6% of current programmatic spend</strong>&#8212;shared across thousands of publishers globally.</p><p>News Corp&#8217;s $250 million deal sounds impressive until you realize News Corp generates $10+ billion in annual revenue. That licensing deal is 2.5% of annual revenue&#8212;and it&#8217;s one of the largest deals ever signed.</p><p><strong>The distribution problem:</strong></p><p>The &#8220;average&#8221; deal of $24 million is massively skewed by mega-publisher outliers. <a href="https://www.cbinsights.com/research/ai-content-licensing-deals/">OpenAI offers smaller publishers $1-5 million per year</a> for archive access. Mid-tier publishers get far less. Long-tail publishers get nothing.</p><p><strong>The structure problem:</strong></p><p>Most licensing deals pay for <strong>training data</strong>&#8212;historical archives that AI companies use to train their models. This is a one-time value proposition. The ongoing value&#8212;when AI uses publisher content to generate billions of answers per month&#8212;goes largely uncompensated.</p><p><a href="https://digiday.com/media/2024-in-review-a-timeline-of-the-major-deals-between-publishers-and-ai-companies/">Perplexity launched a &#8220;Publishing Program&#8221; in July 2024</a> offering revenue share based on citations, but adoption is limited and the model unproven at scale. Haven&#8217;t heard about this initiative in a while now.</p><p><strong>The uncomfortable truth:</strong> AI licensing deals are PR wins, not business model solutions.</p><h3><strong>Commerce Content &amp; Affiliate Revenue: Killed by the Thing They Pivoted To</strong></h3><p>In the mid-2010s, many publishers pivoted to commerce content and affiliate revenue to reduce ad dependence. <a href="https://www.taboola.com/press-release/tabooladigidaysurvey">87% of publishers now use commerce content as a revenue contributor</a>. For some, it became a top-3 revenue source.</p><p><strong>The success stories:</strong></p><ul><li><p><a href="https://www.inma.org/blogs/world-congress/post.cfm/conde-nast-is-growing-commerce-through-editorial-content">Cond&#233; Nast: $600 million in product sales via editorial content (2024)</a></p></li><li><p><a href="https://project-aeon.com/blogs/media-publishers-are-becoming-e-commerce-powerhouses">New York Times Wirecutter: $101.3 million in affiliate referral revenue (9 months, 2023)</a></p></li></ul><p><strong>The problem:</strong></p><p>Commerce content&#8212;product reviews, buying guides, &#8220;best of&#8221; lists, how-to articles&#8212;is <em>exactly</em> the content type AI cannibalizes most effectively.</p><p><a href="https://www.inma.org/blogs/advertising-initiative-newsletter/post.cfm/how-publishers-are-capturing-revenue-in-the-post-traffic-era">Affiliate revenue share dropped significantly between 2023-2024</a>. Publishers report revenue drops up to 50% following Google AI Overviews rollout in May 2024.</p><p><strong>Why?</strong> Because AI Overviews can summarize product recommendations, synthesize reviews from multiple sources, and provide buying guidance&#8212;all without users clicking through to publisher sites. No click = no affiliate commission.</p><p><strong>The tragic irony:</strong> Publishers pivoted to commerce content to escape advertising dependence. Commerce content became the most vulnerable category to AI cannibalization. 
Publishers optimized for the exact queries AI handles best.</p><h3><strong>The Other Options: Niche, Non-Scalable, or Both</strong></h3><p><strong>Podcasts:</strong> <a href="https://www.iab.com/wp-content/uploads/2024/05/IAB_US_Podcast_Advertising_Revenue_Study_FY2023_May_2024.pdf">U.S. podcast advertising hit $2.43 billion in 2024</a>, growing 12% year-over-year. CPM rates are stable around <a href="https://libsyn.com/blog/september-2024-podcast-ad-rates/">$21-22 for 60-second spots</a>. Great! Except $2.43 billion is 1.4% of the $168 billion programmatic market. Podcasts work for publishers with strong audio offerings, but can&#8217;t replace advertising losses at scale.</p><p><strong>Events:</strong> The global events industry reached <a href="https://www.marketresearchfuture.com/reports/events-industry-market-12035">$1,505.53 billion in 2024</a>, growing at 11.8% CAGR. But most of this goes to venues, production, and B2B conference companies. Publishers capture a sliver&#8212;and only elite publishers with brand equity and infrastructure (WSJ conferences, TechCrunch Disrupt) can monetize meaningfully. For mid-tier publishers losing millions in ad revenue, events can&#8217;t move the needle.</p><p><strong>Consulting, White-Label Partnerships, Syndication:</strong> These are service businesses that don&#8217;t scale. <a href="https://www.hulkapps.com/blogs/ecommerce-hub/publisher-strategies-how-leading-media-houses-are-optimizing-revenue-streams-in-2024">The Independent&#8217;s wine club</a> might be &#8220;a real money spinner,&#8221; but it&#8217;s not replacing tens of millions in lost advertising revenue. These are distractions from the core structural problem.</p><h3><strong>The Revenue Gap That No One Wants to Acknowledge</strong></h3><p>Let&#8217;s put all the numbers in one place:</p><p><strong>What&#8217;s Being Lost:</strong></p><ul><li><p>Publishers&#8217; share of ad investment decline (10-year): <strong>-43.8 percentage points</strong></p></li><li><p>Traffic decline from zero-click search: <strong>-25-90% depending on vertical</strong></p></li></ul><p><strong>What&#8217;s Being Gained:</strong></p><ul><li><p>AI licensing (total market): <strong>$816.7 million (2024)</strong></p></li><li><p>Podcast advertising (total market): <strong>$2.43 billion (2024)</strong></p></li><li><p>Subscription growth: <strong>3x digital subs since 2019 (but from tiny base)</strong></p></li></ul><p>If you&#8217;re a Chief Revenue Officer or Chief Business Officer staring at 30-50% traffic declines and board questions about AI strategy, none of these alternatives&#8212;subscriptions, licensing, commerce, podcasts&#8212;close the gap at the scale and speed your P&amp;L requires. The revenue replacement math doesn&#8217;t work.</p><p>Even if AI licensing reaches $11.16 billion by 2030 and subscriptions double from current levels, publishers face a <strong>multi-billion-dollar structural revenue gap</strong> with no clear path to close it.</p><p>This isn&#8217;t a transition. It&#8217;s not a rough patch. It&#8217;s a fundamental breakdown of the economic model that sustained digital publishing for 25 years.</p><h2><strong>Part 3: The Open Question&#8212;What Would a Real Solution Look Like?</strong></h2><p>The current approaches&#8212;bilateral licensing deals, subscription paywalls, commerce pivots&#8212;aren&#8217;t closing the revenue gap. They&#8217;re rearranging deck chairs on the Titanic.</p><p>So what&#8217;s missing? 
What would an AI monetization solution that <em>actually works at scale</em> need to do?</p><p>The emerging consensus among publishers, AI researchers, and marketplace architects points to infrastructure that doesn&#8217;t exist yet&#8212;but that follows clear requirements based on market dynamics and publisher economics.</p><h3><strong>1. Pay Per Answer, Not Per Archive: The Inference-Time Shift</strong></h3><p>Most AI licensing deals pay for <strong>training data</strong>&#8212;access to historical archives that AI companies use to train their models. This is backward-looking and one-time.</p><p>The real value is <strong>inference-time</strong>&#8212;when AI uses publisher content to generate an answer for a user. This happens billions of times per day across all AI platforms. ChatGPT alone handles an estimated 2.5 billion queries per day, according to industry estimates. Perplexity, Google AI Overviews, Claude, Gemini&#8212;each generates billions of inference events.</p><p>If even a fraction of these inferences use publisher content, and if publishers could capture $0.001 per use, the math changes dramatically:</p><ul><li><p>2.5 billion inferences/day &#215; 30% attribution rate = 750 million paid uses/day</p></li><li><p>750 million uses/day &#215; 365 days &#215; $0.001 = <strong>$274 million/year</strong> (single platform, low estimate)</p></li><li><p>Scale across all AI platforms: <strong>$1B+ annual market</strong></p></li></ul><p>AI interfaces still account for only around 2% of total search volume, and that share is expected to rise significantly, if not become dominant, over the next 5 to 10 years.</p><p><strong>The problem:</strong> Neither the attribution systems that could track which content influenced which AI output, nor the marketplaces that could capture that value, exist today.</p><p><strong>What&#8217;s needed:</strong></p><ul><li><p>Real-time tracking of content usage in AI responses</p></li><li><p>Automated micropayments triggered at inference time, with verifiable provenance</p></li><li><p>Systems that work across all AI platforms, not bilateral deals</p></li></ul><h3><strong>2. From Winner-Take-Most to Long-Tail Economics</strong></h3><p>Current AI licensing is a winner-take-most market. News Corp gets $250 million. New York Times, Financial Times, Wall Street Journal secure deals. Everyone else gets crumbs or nothing.</p><p><a href="https://www.cbinsights.com/research/ai-content-licensing-deals/">The average deal is $24 million</a>, but that&#8217;s skewed by mega-publisher outliers. <a href="https://www.cbinsights.com/research/ai-content-licensing-deals/">OpenAI offers smaller publishers $1-5 million per year</a>&#8212;if they even get a call back.</p><p>Long-tail publishers&#8212;the 10,000+ sites producing quality content in niche verticals&#8212;are completely shut out. Yet collectively, they produce enormous value for AI systems. A specialized cycling publication&#8217;s gear reviews inform AI answers about bikes. A regional news site&#8217;s local reporting shows up in AI summaries. They get nothing.</p>
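<p>To make the inference-time and long-tail arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Every input is an illustrative assumption taken from, or consistent with, the figures cited above (per-use rate, attribution share, query volume); the niche publisher&#8217;s citation count is hypothetical, and none of this reflects any real platform&#8217;s accounting.</p><pre><code># Back-of-the-envelope model of inference-time micropayments.
# All numbers are illustrative assumptions, not measured data.

PRICE_PER_USE = 0.001            # dollars paid per attributed content use
QUERIES_PER_DAY = 2_500_000_000  # single large AI platform (estimate cited above)
ATTRIBUTION_RATE = 0.30          # share of answers that draw on publisher content

paid_uses_per_day = QUERIES_PER_DAY * ATTRIBUTION_RATE
platform_pool_per_year = paid_uses_per_day * 365 * PRICE_PER_USE
print(f"Annual pool, one platform: ${platform_pool_per_year:,.0f}")  # ~$274M

# What a single long-tail publisher might see if its content is cited
# in a modest number of answers per day (hypothetical figure).
citations_per_day = 20_000
publisher_revenue_per_year = citations_per_day * 365 * PRICE_PER_USE
print(f"One niche publisher, 20k citations/day: ${publisher_revenue_per_year:,.0f}/yr")  # ~$7,300
</code></pre><p>Even under these generous assumptions, a niche site earns only a few thousand dollars a year at $0.001 per use, which is why the per-use price, the attribution rate, and the number of participating platforms matter far more than any single headline deal.</p>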
<p><strong>What&#8217;s needed:</strong></p><ul><li><p>Marketplace or platform model (not bilateral negotiations)</p></li><li><p>Low barriers to entry (automated licensing, simple onboarding)</p></li><li><p>Micropayment infrastructure that enables compensation for small publishers</p></li><li><p>Collective bargaining power through network effects</p></li></ul><p>Think of how programmatic advertising scaled: small sites could earn ad revenue through exchanges and SSPs without negotiating directly with every advertiser. AI licensing needs the same infrastructure.</p><h3><strong>3. Provide Granular Publisher Control</strong></h3><p>Right now, publishers have two options: allow AI crawling or block it (via robots.txt), assuming crawlers respect that signal, which many do not. There is no other way for AI applications to reach publisher content.</p><p>Publishers want:</p><ul><li><p><strong>Transparent access:</strong> A clear and legitimate way to share their insights and data.</p></li><li><p><strong>IP protection:</strong> Assurance that shared data is used ONLY for inference purposes.</p></li><li><p><strong>Value-based dynamic pricing:</strong> The ability to charge based on the value delivered to the end user of the AI application.</p></li><li><p><strong>Usage analytics:</strong> Visibility into which content is being used, how often, and by whom.</p></li></ul><p><strong>What&#8217;s needed:</strong></p><ul><li><p>Content licensing APIs or AI-native integrations with granular controls</p></li><li><p>Standardized access and licensing terms</p></li><li><p>Dynamic pricing mechanisms like auctions</p></li><li><p>Real-time dashboards showing usage and revenue</p></li></ul><h3><strong>4. Flip the Power Dynamic: When AI Platforms Need You</strong></h3><p>The current power dynamic is broken. AI companies scrape freely from the public web. They only pay when threatened by lawsuits (New York Times vs. OpenAI) or when they want premium brand partnerships.</p><p>Publishers are in a weak negotiating position because AI companies can:</p><ol><li><p>Scrape public content for free</p></li><li><p>Train models on it without permission</p></li><li><p>Generate answers without attribution</p></li><li><p>Capture all the economic value</p></li></ol><p><strong>What&#8217;s needed to flip this dynamic:</strong></p><ul><li><p><strong>Incentive alignment:</strong> The easiest path to adoption is through incentives. Why would a business start paying for a &#8220;product&#8221; (content) that it already gets for free? There have to be shared incentives at play.</p></li><li><p><strong>Network effects:</strong> Once enough publishers join a marketplace, AI platforms <em>must</em> participate or risk inferior answer quality.</p></li></ul><p><strong>What you should not wait for, because your company will be dead by then:</strong></p><ul><li><p><strong>Regulatory/legal pressure:</strong> Copyright litigation and potential legislation making unlicensed use illegal or expensive.</p></li></ul><h3><strong>5. Standardize Access &amp; Market Dynamics</strong></h3><p>Bilateral deals between individual AI companies and publishers won&#8217;t scale because they reinforce power asymmetry. Without standards, the market fragments into proprietary systems. OpenAI builds one licensing system. Google builds another. Anthropic does something different. Publishers must integrate with each separately, multiplying complexity and reducing adoption.</p>
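<p>To illustrate what &#8220;one integration instead of N&#8221; could look like, here is a minimal sketch of a hypothetical standardized content-access exchange. No such standard exists today; every field name below is an assumption for illustration only:</p><pre><code class="language-python"># Hypothetical standardized content-access exchange (illustrative only; no such spec exists).
# One schema that any AI platform could call, instead of N proprietary integrations.
from dataclasses import dataclass

@dataclass
class ContentRequest:
    query: str            # the user intent the AI platform is answering
    use: str              # "inference-only" vs "training", per publisher policy
    max_price_usd: float  # what the platform is willing to pay for this call
    platform_id: str      # verified identity of the calling AI platform

@dataclass
class ContentResponse:
    payload: str          # licensed summary or excerpt, not the full archive
    source_url: str       # attribution the platform must surface
    price_usd: float      # cleared price for this single use
    license_id: str       # auditable receipt for provenance and payment

def handle(req: ContentRequest) -> ContentResponse:
    """Sketch of a publisher-side handler enforcing granular policy and posted pricing."""
    if req.use != "inference-only":
        raise PermissionError("training use is not licensed")  # granular control
    price = min(req.max_price_usd, 0.001)                      # illustrative posted rate
    return ContentResponse(
        payload="licensed summary of the matching article",
        source_url="https://publisher.example/article",
        price_usd=price,
        license_id="lic-0001",
    )

print(handle(ContentRequest("best gravel bike under $2,000", "inference-only", 0.002, "ai-platform-123")).price_usd)  # 0.001
</code></pre><p>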
The AI industry needs what the advertising industry achieved with programmatic infrastructure: <strong>standard protocols, transparent pricing, and interoperability</strong>.</p><p><strong>What&#8217;s needed:</strong></p><ul><li><p><strong>Universal content access</strong>: AI platforms query a standardized access gateway to retrieve external information and trigger payments.</p></li><li><p><strong>Trust mechanisms</strong>: Similar to how Google measures domain authority, AI content marketplaces need to verify the quality of the &#8220;product&#8221; exchanged, prevent fraud, content theft etc.</p></li><li><p><strong>Open pricing exchanges</strong>: Instead of opaque bilateral negotiations, marketplaces where supply (publisher content) and demand (AI platform usage) set clearing prices through auctions or posted rates. Publishers see what comparable content earns. AI platforms compare pricing across sources.</p></li><li><p><strong>Interoperable payment rails</strong>: Micropayment infrastructure that works across platforms&#8212;whether it&#8217;s blockchain-based, traditional fintech, or hybrid. Publishers shouldn&#8217;t need separate payment integrations for each AI company.</p></li></ul><p>The programmatic advertising analogy is instructive: Before ad exchanges and SSPs standardized inventory access, publishers negotiated directly with every advertiser. Inefficient, non-transparent, limited to large players. OpenRTB and header bidding protocols changed everything. Small publishers gained access to global demand. Advertisers could reach niche audiences at scale.</p><h3><strong>What Would It Take to Actually Solve This?</strong></h3><p>If marketplace infrastructure is the answer, how do you evaluate whether a solution is real or vaporware? Here are the non-negotiable requirements:</p><ol><li><p><strong>Inference-time access &amp; attribution</strong>: Does it track and compensate every usage?</p></li><li><p><strong>Liquidity on both sides</strong>: Are AI platforms already using it, or is it theoretical demand?</p></li><li><p><strong>Transparent pricing discovery</strong>: Can you see market rates, or are you negotiating blind?</p></li><li><p><strong>Low integration friction</strong>: Does it take hours, days or months to go live?</p></li><li><p><strong>Provenance &amp; verification</strong>: Can you prove that the value exchange is auditable?</p></li></ol><p>Any marketplace that can&#8217;t deliver on all five isn&#8217;t solving the structural problem&#8212;it&#8217;s another band-aid.</p><h3><strong>Why Doesn&#8217;t This Exist Yet?</strong></h3><p>If the solution is conceptually clear, why isn&#8217;t anyone building it? 
Short answer: It is SUPER hard.</p><p><strong>Technical barriers:</strong></p><ul><li><p>Content value exchange mechanisms do not exist, not even theoretically</p></li><li><p>Monitoring billions of AI queries across hundreds of platforms requires massive scale</p></li></ul><p><strong>Economic barriers:</strong></p><ul><li><p>AI companies have no incentive to pay voluntarily (marginal cost of scraping = $0)</p></li><li><p>Revenue share reduces AI company margins</p></li><li><p>Chicken-and-egg: publishers won&#8217;t invest without guaranteed buyers; AI companies won&#8217;t pay without publisher participation</p></li></ul><p><strong>Market structure barriers:</strong></p><ul><li><p>Power asymmetry (Big 4 AI companies control 80% of market; thousands of publishers compete)</p></li><li><p>No unified publisher front for collective bargaining</p></li><li><p>Winner-take-most dynamics favor elite publishers</p></li></ul><p><strong>Coordination problems:</strong></p><ul><li><p>No industry standards</p></li><li><p>Free rider problem (individual publisher opt-out doesn&#8217;t stop AI)</p></li><li><p>Regulatory lag (copyright law unclear on AI training; lawsuits pending but slow)</p></li></ul><h3><strong>The Provocative Questions</strong></h3><p>This is where conventional thought leadership would pivot to &#8220;our product solves this.&#8221; Instead, let&#8217;s be honest about what we don&#8217;t know.</p><p><strong>Who will build the content value exchange infrastructure?</strong></p><p>Will it be AI companies (unlikely&#8212;not in their interest)? A consortium of publishers (possible, but coordination is hard)? A third-party marketplace? A regulatory mandate that forces industry standardization?</p><p><strong>Can publishers coordinate to demand fair compensation before it&#8217;s too late?</strong></p><p>The window for collective action is narrowing. AI models are already trained on vast amounts of publisher content. The longer publishers wait, the weaker their negotiating position becomes. But publisher fragmentation&#8212;thousands of independent businesses with different strategies&#8212;makes coordination nearly impossible without external forcing function.</p><p><strong>Is there a business model that aligns AI platforms and content creators?</strong></p><p>The hardest question of them all. This will need its own article.</p><p>Maybe we&#8217;re watching the end of ad-supported digital publishing as we&#8217;ve known it. Maybe only elite publishers with subscription models and massive brand equity survive. Maybe long-tail and mid-tier publishers simply disappear, and the Agentic Web runs on a handful of mega-publishers and AI-generated content.</p><h2><strong>Where do we end up after all that?</strong></h2><p>Let&#8217;s return to the data:</p><ul><li><p>Zero-click searches jumped from 56% to 69% in 12 months</p></li><li><p>Publishers have lost 25-90% of traffic depending on vertical</p></li><li><p>The entire AI licensing market ($816.7M) is 1.5% of publisher ad-blocking losses ($54B)</p></li><li><p>No alternative revenue model scales to replace advertising</p></li></ul><p><strong>The question is not &#8220;what happens to publishers when clicks disappear?&#8221;</strong></p><p><strong>The question is: &#8220;What do we build so publishers don&#8217;t disappear with the clicks?&#8221;</strong></p><p>The infrastructure to capture inference-time value doesn&#8217;t exist at scale. The attribution systems are immature. The payment rails are nascent. The industry standards are absent. 
The regulatory framework is unclear.</p><p>But the need is urgent. Publisher traffic is collapsing <em>right now</em>. Revenues are declining <em>right now</em>. The &#8220;extinction-level event&#8221; some observers describe is not hypothetical&#8212;it&#8217;s happening.</p><p>The infrastructure to solve this doesn&#8217;t exist at scale yet. But the window to shape it is closing. Publishers who join early marketplaces now&#8212;during alpha and beta phases&#8212;will set pricing benchmarks, influence platform design, and capture premium positioning before the market commodifies.</p><p>In 24 months, marketplace access will be table stakes. The question is whether you&#8217;re setting the terms or accepting them.</p><p><strong>What do you think?</strong> Are marketplace economics the answer, or is there another path forward? I&#8217;m deep in the weeds in this space and talking to publishers navigating these decisions.</p><p>If you&#8217;re a working through publisher and content creator AI monetization strategy, I&#8217;d love to hear your perspective. Reach out on <a href="https://www.linkedin.com/in/ibakagiannis/">LinkedIn</a> or through the <a href="https://context4gpts.com">website</a>.</p><h2><strong>Sources</strong></h2><ul><li><p><a href="https://www.searchenginejournal.com/impact-of-ai-overviews-how-publishers-need-to-adapt/556843/">Search Engine Journal - Impact of AI Overviews on Publishers</a></p></li><li><p><a href="https://click-vision.com/zero-click-search-statistics">Click Vision - Zero-Click Search Statistics 2025</a></p></li><li><p><a href="https://digiday.com/media/google-ai-overviews-linked-to-25-drop-in-publisher-referral-traffic-new-data-shows/">Digiday - 25% Drop in Publisher Referral Traffic</a></p></li><li><p><a href="https://www.edtechinnovationhub.com/news/chegg-reports-24-revenue-drop-sues-google-over-ai-impact-on-online-learning">EdTech Innovation Hub - Chegg Revenue Drop</a></p></li><li><p><a href="https://fortune.com/2025/11/26/ai-slop-recipes-thanksgiving-food-blog-collapse-traffic/">Fortune - AI Impact on Recipe Traffic</a></p></li><li><p><a href="https://www.grocerslist.com/blog/ai-overviews-recipe-traffic-strategy">Grocers List - AI Overviews Recipe Strategy</a></p></li><li><p><a href="https://www.theregister.com/2025/06/22/ai_search_starves_publishers/">The Register - AI Search Starves Publishers</a></p></li><li><p><a href="https://www.dangerous-business.com/how-google-and-ai-are-killing-travel-blogs-like-mine/">Dangerous Business - Google Killing Travel Blogs</a></p></li><li><p><a href="https://developers.slashdot.org/story/25/01/10/1729248/stackoverflow-usage-plummets-as-ai-chatbots-rise">Slashdot - Stack Overflow Usage Plummets</a></p></li><li><p><a href="https://pressgazette.co.uk/media-audience-and-business-data/uk-and-us-publishers-says-google-ai-is-harming-website-traffic/">Press Gazette - Google AI Harming Website Traffic</a></p></li><li><p><a href="https://kpmg.com/us/en/media/news/generative-ai-consumer-trust-survey.html">KPMG - Generative AI Consumer Trust Survey</a></p></li><li><p><a href="https://www.salesforce.com/news/stories/trusted-ai-data-statistics/">Salesforce - Trusted AI Data Statistics</a></p></li><li><p><a href="https://www.designrush.com/agency/search-engine-optimization/trends/zero-click-searches">Design Rush - Zero-Click Searches 2025</a></p></li><li><p><a href="https://www.publift.com/blog/programmatic-advertising-trends">Publift - Programmatic Advertising Trends</a></p></li><li><p><a 
href="https://www.therebooting.com/report/the-state-of-publisher-ad-revenue/">The Rebooting - State of Publisher Ad Revenue</a></p></li><li><p><a href="https://www.admonsters.com/ad-blocking-a-54b-problem-for-publishers-in-2024/">AdMonsters - Ad Blocking $54B Problem</a></p></li><li><p><a href="https://www.socialmediatoday.com/news/facebook-publisher-referrals-decline-50-percent/715745/">Social Media Today - Facebook Publisher Referrals Decline 50%</a></p></li><li><p><a href="https://localmedia.org/2024/01/digital-subscriptions-trends-for-2024-and-what-publishers-can-do-to-grow/">Local Media - Digital Subscriptions Trends 2024</a></p></li><li><p><a href="https://www.inma.org/blogs/reader-revenue/post.cfm/subscription-growth-vs-revenue-growth-it-matters-what-you-measure">INMA - Subscription Growth vs Revenue Growth</a></p></li><li><p><a href="https://voices.media/as-the-paid-reader-base-grows-more-slowly-reducing-churn-is-the-focus-for-publishers-going-into-2024/">Voices Media - Reducing Churn Focus 2024</a></p></li><li><p><a href="https://whop.com/blog/newsletter-statistics/">Whop - Newsletter Statistics</a></p></li><li><p><a href="https://blog.beehiiv.com/p/2025-state-of-email-newsletters-by-beehiiv">beehiiv - 2025 State of Email Newsletters</a></p></li><li><p><a href="https://www.cbinsights.com/research/ai-content-licensing-deals/">CB Insights - AI Content Licensing Deals</a></p></li><li><p><a href="https://www.emetresearch.ai/blogs/market-report-ai-data-licensing-deals-(2020-present)">Emet Research - AI Data Licensing Market Report</a></p></li><li><p><a href="https://digiday.com/media/2024-in-review-a-timeline-of-the-major-deals-between-publishers-and-ai-companies/">Digiday - Major Deals Between Publishers and AI Companies</a></p></li><li><p><a href="https://www.taboola.com/press-release/tabooladigidaysurvey">Taboola - Publishers Using Commerce Content</a></p></li><li><p><a href="https://project-aeon.com/blogs/media-publishers-are-becoming-e-commerce-powerhouses">Project Aeon - Publishers E-commerce Powerhouses</a></p></li><li><p><a href="https://www.inma.org/blogs/advertising-initiative-newsletter/post.cfm/how-publishers-are-capturing-revenue-in-the-post-traffic-era">INMA - Post-Traffic Era Revenue</a></p></li><li><p><a href="https://www.iab.com/wp-content/uploads/2024/05/IAB_US_Podcast_Advertising_Revenue_Study_FY2023_May_2024.pdf">IAB - US Podcast Advertising Revenue 2023</a></p></li><li><p><a href="https://libsyn.com/blog/september-2024-podcast-ad-rates/">Libsyn - September 2024 Podcast Ad Rates</a></p></li><li><p><a href="https://www.marketresearchfuture.com/reports/events-industry-market-12035">Market Research Future - Events Industry Market</a></p></li><li><p><a href="https://www.hulkapps.com/blogs/ecommerce-hub/publisher-strategies-how-leading-media-houses-are-optimizing-revenue-streams-in-2024">Hulk Apps - Publisher Strategies 2024</a></p></li><li><p><a href="https://aimagazine.com/articles/how-can-ai-firms-pay-publishers-perplexity-has-a-plan">AI Magazine - Perplexity Plan to Pay Publishers</a></p></li><li><p><a href="https://digitalcontentnext.org/blog/2025/03/06/ai-content-licensing-lessons-from-factiva-and-time/">Digital Content Next - AI Licensing Lessons from TIME</a></p></li><li><p><a href="https://www.prnewswire.com/news-releases/content-credits-revolutionizes-online-content-accessibility-for-publishers-businesses-and-consumers-302183782.html">PR Newswire - Content Credits Launch</a></p></li><li><p><a 
href="https://aijourn.com/why-ai-makes-micropayments-essential-for-publishers-and-creators/">AI Journal - Why AI Makes Micropayments Essential</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[The Agentic Web, Part 4: From Search Bars to Gateways]]></title><description><![CDATA[Search returns links; gateways deliver outcomes. This essay defines what an agentic gateway is, provides a practical framework to assess one, and surveys where those gateways are likely to live.]]></description><link>https://bakagiannis.substack.com/p/the-agentic-web-part-4-from-search</link><guid isPermaLink="false">https://bakagiannis.substack.com/p/the-agentic-web-part-4-from-search</guid><dc:creator><![CDATA[Ioannis Bakagiannis]]></dc:creator><pubDate>Tue, 05 Aug 2025 16:10:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!e-LI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3004a7ff-088f-42f8-98f2-66b8d48d3783_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://ads4gpts.com" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DKn-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DKn-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png" width="500" height="100" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:100,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20780,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://ads4gpts.com&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bakagiannis.substack.com/i/167158003?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!DKn-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>A search bar used to be <em>the</em> gateway to the internet. You typed a query; you got ten links. Today, the dominant pattern is different: synthesized answers, suggested actions, and task flows. Google&#8217;s own guidance acknowledges the shift away from the classic &#8220;ten blue links&#8221; toward richer, multimodal results, and its AI Overviews formalize the answer-first pattern.</p><p>But the Agentic Web is much more than an AI Overview. Today, users increasingly express goals -&#8220;book a refundable flight under &#8364;300 next Friday,&#8221; &#8220;migrate my site to HTTPS without downtime,&#8221; &#8220;draft and file this paperwork&#8221; - and expect systems to execute. 
<strong>Thesis:</strong> the front door to the web is becoming an <em>Agentic Gateway</em>: the place where intent is captured, context is grounded, and actions are orchestrated across tools and services.</p><blockquote><p><strong>Previously in this series</strong></p><ul><li><p><strong><a href="https://bakagiannis.substack.com/p/introduction-to-the-agentic-web-vision">Introduction to the Agentic Web: Vision</a></strong> &#8212; Why the web is shifting from pages to agents and what that enables.</p></li><li><p><strong><a href="https://bakagiannis.substack.com/p/the-agentic-web-part-2-anatomy-of">The Agentic Web, Part 2: Anatomy of an Agent</a></strong> &#8212; What components a competent agent needs (memory, tools, planning).</p></li><li><p><strong><a href="https://bakagiannis.substack.com/p/agentic-web-part-3-evolution-of-web">Part 3: Evolution of Web Infrastructure</a></strong> &#8212; How infrastructure (APIs, auth, payments) unlocks end-to-end execution.</p></li></ul></blockquote><h2><strong>Definition: what is an Agentic Gateway?</strong></h2><p>An <strong>Agentic Gateway</strong> is the front door to an autonomous capability. It&#8217;s the layer a user or a system touches to <em>state intent</em>, and the layer that then <em>interprets, plans, executes, verifies,</em> and <em>hands back outcomes</em>, often by coordinating one or more large language models (LLMs) with tools, data, and people.</p><p>Think of it as the mission control for an agentic workflow:</p><ul><li><p>It translates ambiguous goals (&#8220;ship this feature by Friday,&#8221; &#8220;negotiate a better rate,&#8221; &#8220;compile an investment brief&#8221;) into machine-actionable plans.</p></li><li><p>It orchestrates models, tools, and services to execute those plans.</p></li><li><p>It manages context, personalization, and permissions so the agent does the <em>right</em> thing with the <em>right</em> information.</p></li><li><p>It reports progress, asks for clarification when needed, and knows when it&#8217;s done.</p></li></ul><p>Agentic Gateways can be closed-world or open-world. The difference lies in how well defined the agentic workflow is and how clear the definition of success is. Closed-world gateways have clear(er) feedback loops even though they can still interact with the open world (e.g., a coding agent). When we are talking about the web, we are talking about open-world gateways.</p>
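<p>To make the &#8220;mission control&#8221; description concrete, here is a minimal sketch of the loop an open-world gateway runs. It is an illustrative skeleton under assumptions of my own (the planner and the tools below are toy stand-ins), not any vendor&#8217;s implementation:</p><pre><code class="language-python"># Illustrative agentic-gateway loop; the planner and tools are toy stand-ins.
from typing import Callable

def plan(intent: str, context: dict) -> list[dict]:
    # Stand-in planner: a real gateway would use an LLM to decompose the intent.
    return [{"tool": "search", "args": {"query": intent}},
            {"tool": "book", "args": {"budget": context.get("budget", 300)}}]

def run_gateway(intent: str, context: dict, tools: dict[str, Callable]) -> dict:
    outcomes = []
    for step in plan(intent, context):                # 1. translate the goal into steps
        result = tools[step["tool"]](**step["args"])  # 2. execute via a scoped tool
        if result.get("error"):                       # 3. verify; re-plan or ask the user on failure
            return {"status": "needs_clarification", "step": step, "detail": result["error"]}
        outcomes.append(result)
    return {"status": "done", "outcomes": outcomes}   # 4. hand back an outcome, not a list of links

# Toy usage:
tools = {"search": lambda query: {"offers": [f"refundable fare for: {query}"]},
         "book": lambda budget: {"confirmation": f"booked under {budget} EUR"}}
print(run_gateway("flight to Lisbon next Friday", {"budget": 300}, tools))
</code></pre>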
<h2>The Gateway Analysis Framework (GATE)</h2><p>To evaluate any gateway, we&#8217;ll use the <strong>GATE Framework</strong>&#8212;a four-part lens that maps exactly to the capabilities you need to get outcomes.</p><blockquote><p><strong>Framework G.A.T.E.</strong></p><ol><li><p><strong>Grounding Context (G):</strong> <em>Internally available context</em>&#8212;what the gateway already knows.</p></li><li><p><strong>Action Surface (A):</strong> <em>Execution-connected functionality</em>&#8212;tools and extensibility (plugins, APIs, automations).</p></li><li><p><strong>Translation Layer (T):</strong> <em>Web 2.0 compatibility</em>&#8212;ability to render and interact with today&#8217;s sites (forms, cookies), and to fall back to browsing.</p></li><li><p><strong>Engine Competence (E):</strong> <em>Model attributes</em>&#8212;reasoning, planning, multimodality, latency, model size, deployment infrastructure, and how well the gateway chains skills.</p></li></ol></blockquote><h3>G) Grounding Context (Internal Context &amp; Identity)</h3><p><strong>Definition:</strong> The user-specific state the gateway can access by default (profiles, preferences, organization policy, past tasks) and the authorization practices that go along with it.</p><p><strong>Why it matters:</strong><em> Personalization.</em> Without grounding, the agent guesses. With grounding, it minimizes clarification loops, respects constraints, and avoids wrong actions (e.g., booking with the wrong card).</p><p><strong>Example:</strong> An OS copilot with account access knows your calendar, Wi-Fi networks, and installed apps.</p><p><strong>Implication:</strong> Gateways with durable, privacy-aware memory produce faster, more accurate outcomes.</p><h3>A) Action Surface (Execution Tools)</h3><p><strong>Definition:</strong> The set of functions the gateway can call: native tools, third-party APIs, and a mechanism to add new ones safely (scopes, rate limits, audit logs).</p><p><strong>Why it matters:</strong> Outcomes require verbs&#8212;search, fill, sign, buy, deploy. A bare LLM without tools is a talker, not a doer.</p><p><strong>Example:</strong> A &#8220;pay now&#8221; step via Stripe Checkout or Sessions API inside an agentic flow.</p><p><strong>Implication:</strong> Without tools, you get summaries. With tools, you get completed tasks.</p><h3>T) Translation Layer (Web 2.0 Compatibility)</h3><p><strong>Definition:</strong> The ability to interact with today&#8217;s web in a way that respects how it already works: rendering pages, submitting forms, generating content, responding to email, and so on.</p><p><em><strong>Why it matters:</strong></em> Agentic APIs will lag. A pragmatic gateway must interact with legacy sites and forms while staying inside compliance boundaries.</p><p><strong>Example:</strong> An agentic browser that can open a live product page for inspection and still extract facts or complete checkout.</p><p><strong>Implication:</strong> Compatibility buys coverage; it keeps the agent useful before every site exposes an agent API.</p><h3>E) Engine Attributes (Model &amp; Orchestration)</h3><p><strong>Definition:</strong> The reasoning, planning, and multimodal capabilities that translate intent into plans, call tools, and verify outputs, under latency and cost constraints. It also covers where this engine lives.</p><p><em><strong>Why it matters:</strong></em> Weak planning leads to looped prompts and partial results. Strong engines can decompose tasks, check their work, and recover from tool failure. Infrastructure requirements often dictate how powerful the engine can be.</p><p><strong>Example:</strong> Gateways that cite sources, summarize uncertainty, and integrate tool use tightly reduce error rates.<br><strong>Implication:</strong> Confidence isn&#8217;t just model IQ; it&#8217;s the full reliability stack.</p><p><strong>Putting it all together</strong></p><p>Booking a complex multi-city trip requires stored traveler profiles &amp; preferences (G), the ability to book hotels and air travel (A), robust multi-step reasoning with fast responses (E), and validation that the offers and reservations are actually made on the vendor&#8217;s website (T).</p>
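<p>As a reading aid, the GATE lens can be written down as a tiny scorecard. A minimal sketch follows; the weights and the example scores are illustrative assumptions, not measurements:</p><pre><code class="language-python"># Minimal GATE scorecard; scores (0-5) and weights are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GateScore:
    grounding: int    # G: internal context and identity
    actions: int      # A: execution-connected tools
    translation: int  # T: Web 2.0 compatibility
    engine: int       # E: model and orchestration quality

    def overall(self, weights=(0.25, 0.30, 0.20, 0.25)) -> float:
        parts = (self.grounding, self.actions, self.translation, self.engine)
        return sum(w * s for w, s in zip(weights, parts))

# Hypothetical comparison of two gateway styles (numbers are made up for illustration):
agentic_browser = GateScore(grounding=4, actions=4, translation=5, engine=3)
chat_app = GateScore(grounding=3, actions=4, translation=2, engine=5)
print(round(agentic_browser.overall(), 2), round(chat_app.overall(), 2))  # 3.95 3.6
</code></pre>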
<h2>User Agentic Gateways Evaluation</h2><p>The &#8220;natural home&#8221; of the gateway could be the browser, but user behavior is shifting from passive browsing to conversational and task-centric flows. We have mapped four routes users are actually taking, and we will evaluate each with GATE.</p><h3>Apps (ChatGPT, Claude, etc.)</h3><p>Dedicated AI apps are the <em>default conversational gateways</em> today.</p><ul><li><p><strong>G:</strong> Solid personal memory features are emerging, but portability and third-party authentication are cumbersome.</p></li><li><p><strong>A:</strong> Mature tool ecosystems for search, code, and data. Commerce is emerging as third-party tooling.</p></li><li><p><strong>T:</strong> Can search, scrape, and summarize, but offer no direct compatibility with the traditional web.</p></li><li><p><strong>E:</strong> State-of-the-art reasoning, competitive latency, and probably the best orchestration.</p></li></ul><p><strong>Strength:</strong> Fast innovation and broad tool coverage.</p><p><strong>Risk:</strong> Fragmentation of identity, memory/personalization, and tooling across apps.</p><h3>Integrated AI / Co-pilots (Chrome Extensions)</h3><p>Co-pilots meet users where they work (docs, email, IDEs).</p><ul><li><p><strong>G:</strong> Access to local browsing history and tabs gives rich situational context (if permissions are well-scoped), but there is little to no memory or personalization.</p></li><li><p><strong>A:</strong> Extension APIs execute actions inside the browser (form-fill, DOM click) and call external services; reliability depends on site stability. They are also limited to the page or session at hand.</p></li><li><p><strong>T:</strong> Excellent for Web 2.0 compatibility because they operate &#8220;where the user is,&#8221; but fragile when sites change layouts.</p></li><li><p><strong>E:</strong> The model lives behind a third-party API call, and orchestration options are limited by extension size constraints.</p></li></ul><p><strong>Strength:</strong> Low friction; meets the user in-flow.</p><p><strong>Risk:</strong> They live and die by the attached app/browser. A solution only for today.</p><h3>Agentic Browsers</h3><p>Browsers that ship an agent as a first-class feature attempt to <em>merge gateway and renderer</em>.</p><ul><li><p><strong>G:</strong> A bird&#8217;s-eye view of the user&#8217;s web activity, with built-in identity and workspace memory. Authentication solutions are the most mature.</p></li><li><p><strong>A:</strong> Native headless modes, automation primitives, and deep extensions make them strong executors.
But they unfortunately rely heavily on existing search indexes and scraping, practices that will not stay relevant in the future.</p></li><li><p><strong>T:</strong> Best-in-class rendering and interaction fidelity by design.</p></li><li><p><strong>E:</strong> Competitive models, but heavy memory requirements on device (especially mobile).</p></li></ul><p><strong>Strength:</strong> Deepest integration with the legacy web and the device.</p><p><strong>Risk:</strong> Data privacy issues and incentive alignment with the content providers they are scraping.</p><h3>OS-Embedded AI (phones, computers)</h3><p>The operating system can become the <em>universal gateway</em> across apps, files, and hardware.</p><ul><li><p><strong>G:</strong> Deepest personal context (files, emails, calendars, sensors) with system-level permissions.</p></li><li><p><strong>A:</strong> Can orchestrate across apps (mail, calendar, messages) and invoke device capabilities.</p></li><li><p><strong>T:</strong> Limited direct web manipulation (same as apps), though nothing stops developers from building on-device functionality.</p></li><li><p><strong>E:</strong> Private/local models are increasingly capable, with mixed cloud offload for heavy tasks. This is the biggest potential strength, but it&#8217;s currently a limitation.</p></li></ul><p><strong>Strength:</strong> Strongest personalization with privacy and local identity management.</p><p><strong>Risk:</strong> Model performance. Reasoning is critical in an open-world system, and on-device models can lag the state of the art by a wide margin.</p><h2>Agent-to-Agent Gateways</h2><p>In an open-world setting like the internet, intents range from &#8220;compare 12 EV models&#8221; to &#8220;pay a customs duty&#8221; to &#8220;rebook my flight and preserve seat 14A.&#8221; It is <em>exponentially</em> hard&#8212;practically impossible&#8212;for one agent to include all context, contracts, and capabilities. Two forces make <strong>third-party agents</strong> desirable:</p><ol><li><p><strong>Specialized value beats generality</strong>: a bank&#8217;s agent knows card rules; a retailer&#8217;s agent sees inventory; a logistics agent owns carrier APIs.</p></li><li><p><strong>Fair representation and efficiency</strong>: brands, businesses, and publishers want to speak for themselves, and at the same time gateways don&#8217;t want to re-research settled facts.</p></li></ol><p>For third-party agents to exist, though, there have to be deeper, undeniable incentives tied to existential or monetary value. Generally, we see three reasons for such agents to exist; a small sketch of what the first could look like follows the list.</p><ol><li><p><strong>Monetize execution &amp; context</strong>: Charge per call when the agent makes the gateway &#8220;better&#8221;, that is, when the capability or context adds concrete value. <em>Example:</em> a Stripe payments agent processing checkout, or a sports publisher&#8217;s agent providing the live score of a game.</p></li><li><p><strong>Sell downstream</strong>: Recommend or fulfill products/services and earn margin. <em>Example:</em> BYD&#8217;s agent presenting trims and inventory, or a retail network offering tailored recommendations from its partner stores.</p></li><li><p><strong>Gain distribution</strong>: Use responses to route attention to a creator or brand. <em>Example:</em> Joe Rogan&#8217;s podcast agent offering an opinion about &#8220;who wins: tiger or gorilla&#8221;.</p></li></ol>
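<p>Here is that sketch: a minimal, hypothetical capability manifest for a third-party agent that monetizes execution and context. There is no agreed standard for this today, so every field and price below is an assumption for illustration:</p><pre><code class="language-python"># Hypothetical capability manifest for a third-party agent; all fields are illustrative.
SPORTS_PUBLISHER_AGENT = {
    "agent": "example-sports-publisher",
    "capabilities": [
        {
            "verb": "get_live_score",        # context that makes the gateway's answer better
            "inputs": {"league": "str", "match_id": "str"},
            "price_usd_per_call": 0.002,     # monetize execution and context
            "license": "inference-only",     # payload may not be retained for training
            "attribution_required": True,    # the gateway must cite the source
        },
        {
            "verb": "get_match_report",
            "inputs": {"match_id": "str"},
            "price_usd_per_call": 0.01,
        },
    ],
}

def quote(manifest: dict, verb: str) -> float:
    """Return what a gateway would pay to call this verb, per the manifest."""
    for capability in manifest["capabilities"]:
        if capability["verb"] == verb:
            return capability["price_usd_per_call"]
    raise KeyError(verb)

print(quote(SPORTS_PUBLISHER_AGENT, "get_live_score"))  # 0.002
</code></pre>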
<p>To connect them to a gateway, there are two linking models.</p><h3>Direct-to-Agent</h3><p>The gateway calls a known external agent via an &#8220;agentic API&#8221; like MCP, often with identity and permissions already established.<br><strong>Why it matters:</strong> Low latency, predictable UX, clear accountability.<br><strong>Example:</strong> A user&#8217;s default <em>Payments Agent</em> (e.g., Stripe) handles checkout inside the flow, with pre-authorized methods and receipts.</p><h3>Agentic Marketplace</h3><p>The gateway routes a request to a <em>network</em> to discover the best agent for the intent, then negotiates capabilities and terms.<br><strong>Why it matters:</strong> Coverage and competition, which are useful when the gateway doesn&#8217;t know &#8220;who&#8221; to call.<br><strong>Example:</strong> The user asks &#8220;is the Tesla stock a Buy, a Hold, or a Sell?&#8221;; the gateway then requests information from the network about earnings calls, the latest financials, and expert opinions. The Morningstar agent and the Yahoo agent respond with context that helps the gateway craft a well-rounded response.</p><blockquote><p><strong>Call to Arms:</strong><br>We are working on something exciting in this area. The hardest problems are incentive design, safety, and attribution. Let&#8217;s tackle them together. If you are as passionate about this as we are, reach out!</p></blockquote><h2>Why should I care?</h2><p>We haven&#8217;t found the definitive Agentic Gateway yet&#8212;but we now have a clear way to evaluate contenders with GATE. In the near term, Agentic Browsers are poised to win: they sit where users already act and bridge today&#8217;s Web 2.0 forms and flows. Over the longer horizon, OS-level solutions will most likely prevail by combining deep personal context, permissions, and cross-app execution.</p><p>What this means for you: if you&#8217;re a <strong>publisher, content creator, SaaS, e-commerce store, or platform</strong>, or any of the other internet &#8220;actors&#8221;, you need to ship a third-party agent NOW. Make your capabilities callable (not just readable): expose verbs (quote, book, pay, modify) and content as AI context. Distribution is shifting from pages to agent calls; those without agents will be quietly routed around. Build an agent, get measured by outcomes, and stay visible through the transition.</p><h2>Next and final perspective: Money makes the world go around</h2><p>In the next installment, we&#8217;ll follow the money trail and examine the economic architectures that emerge when agents stop merely reading the web and start signing the checks&#8212;who pays whom, for what, and when.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[The Battle of the Agentic Web Has Begun]]></title><description><![CDATA[Cloudflare&#8217;s decision to block crawlers by default shows that the ad-supported &#8220;link economy&#8221; is dying. A new market for data, discovery, and monetization is coming&#8212;fast.]]></description><link>https://bakagiannis.substack.com/p/the-battle-of-the-agentic-web-has</link><guid isPermaLink="false">https://bakagiannis.substack.com/p/the-battle-of-the-agentic-web-has</guid><dc:creator><![CDATA[Ioannis Bakagiannis]]></dc:creator><pubDate>Fri, 04 Jul 2025 09:44:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!e-LI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3004a7ff-088f-42f8-98f2-66b8d48d3783_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pq8k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pq8k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!Pq8k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!Pq8k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!Pq8k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pq8k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png" width="564" height="112.8" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43fec168-21af-4972-a22a-1d0586c752cc_500x100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:100,&quot;width&quot;:500,&quot;resizeWidth&quot;:564,&quot;bytes&quot;:20780,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bakagiannis.substack.com/i/167507003?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pq8k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!Pq8k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!Pq8k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!Pq8k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43fec168-21af-4972-a22a-1d0586c752cc_500x100.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><strong>A Not-So-Quiet Flip of the Switch</strong></h2><p>On July 1st 2025 Cloudflare, gatekeeper to roughly a fifth of the internet, turned on crawler blocking for every new customer and rolled out a <strong>Pay Per Crawl</strong> program that lets publishers charge bots for every request. (<a href="https://www.theverge.com/news/695501/cloudflare-block-ai-crawlers-default?utm_source=chatgpt.com">theverge.com</a>)<br>With one configuration change, the company challenged a core assumption of Web 2.0: that anyone&#8212;or anything&#8212;may scrape your content as long as it boosts traffic.</p><h2><strong>How Web 2.0 Incentives Broke</strong></h2><h3>Blame Google</h3><p>Let&#8217;s be honest&#8212;this all starts with Google. If Google had been transparently extractive from the beginning, maybe the flawed incentive design of Web 2.0 would&#8217;ve been exposed much sooner. 
The vicious cycle of:</p><ol><li><p><strong>Intent &#8594; Google Search</strong></p></li><li><p><strong>Search &#8594; Clicks</strong></p></li><li><p><strong>Clicks &#8594; On-page ads</strong></p></li><li><p><strong>Ads &#8594; Revenue</strong></p></li></ol><p>Kept everyone on the hamster spinning wheel. Welcome to the link economy. Google&#8217;s crawler infrastructure made it all work&#8212;indexing and ranking the world&#8217;s information, for free, so long as you played by its rules. The crawler was the cost of doing business, the ad auction the profit engine.</p><p>That engine required bots/crawlers that tirelessly roamed the web, harvesting and scoring content. That was acceptable in a world where humans clicked the links. But the AI companies broke that covenant. An agent that answers in its own window never hands the user back to the publisher, so no ad loads, no CPM, no paycheck.</p><h3>Scraping Was Not Cool</h3><p>For most of the past two decades, automated scraping sat just north of plagiarism and just south of denial-of-service on the moral spectrum. Google got a pass because its crawler sent readers back to us, and those eyeballs converted into ad revenue. That quid-pro-quo kept the link economy humming but also kept the need to accommodate crawlers.</p><p>Then came AI. Foundation-model builders vacuumed up massive corpora, and every new chatbot feature&#8212;summaries, opinions, breaking news&#8212;demanded fresh pages. Suddenly scraping wasn&#8217;t vandalism; it was <em>business as usual</em>.</p><h2><strong>Cloudflare Fires the First Shot</strong></h2><p>To Cloudflare&#8217;s credit, they see what&#8217;s going on&#8212;and, most importantly, they take action. Likely driven by pure capitalist motives, as Saanya Ojha <a href="https://saanyaojha.substack.com/p/the-internet-just-flipped-its-default">pointed out</a>, because the opportunity ahead is simply that great. The current scraping model is unsustainable. Publishers are losing traffic and revenue while AI companies profit from their data. Something has to change.</p><h3>The Plan In Action</h3><p>As of July 1 2025, all new customer sites are set to block known AI bots by default unless site owners explicitly allow them. This marks a transition from the old opt-out model to a new, permission-based framework where publishers can choose to allow, block, or monetize access for specific AI crawlers. To support this, Cloudflare introduced a &#8220;Pay&#8209;Per&#8209;Crawl&#8221; feature and then a marketplace that lets site owners charge a flat fee per crawl request, with payments facilitated directly through HTTP protocols.</p><h3>Three Truths Were Spoken</h3><ul><li><p><strong>Misaligned incentives are unsustainable.</strong> If bots keep draining value, creators will lock down their sites or vanish.</p></li><li><p><strong>Permission is non-negotiable.</strong> A default block forces every AI company to declare itself and ask.</p></li><li><p><strong>Publishers must exist in an agent-friendly web.</strong> Data holders deserve a business model that doesn&#8217;t require an ad stack.</p></li></ul><h3><strong>But They Got Some Things Wrong</strong></h3><ul><li><p>Crawling HTML is a crude way to feed an LLM. It drags along layout debris and forces publishers to run parallel CMS instances (web, feed, llms.txt, etc.).</p></li><li><p>Curating relationships between AI companies and every publisher in the world who might have data that would benefit the users of the AI, is not realistic. 
This would need to happen though if there is an API key for each partnership.</p></li><li><p>Offloading discovery to a marketplace <em>without</em> building search defeats the purpose. Matching the right datum to the right query is the hard (and expensive) part.</p></li></ul><p>Also one thing was missed completely: <strong>Discovery economics.</strong> A marketplace is meaningless without matching. Google subsidised retrieval with ads. Who funds matching in a post-ad world? (talk to me if this area is interesting wink wink)</p><h2><strong>Why AI Still Needs the Open Internet</strong></h2><h3><strong>Why Does AI Need Open Internet Data?</strong></h3><p><strong>Pre-training<br></strong> <em>What the model needs:</em> rich language, stylistic nuance, broad domain knowledge<br> <em>Typical sources:</em> historical web, books, structured corpora</p><p><strong>Fine-tuning<br></strong> <em>What the model needs:</em> task-specific examples, up-to-date terminology<br> <em>Typical sources:</em> partner datasets, proprietary logs</p><p><strong>Inference<br></strong> <em>What the model needs:</em> fresh facts, time-sensitive signals, authoritative context<br> <em>Typical sources:</em> APIs, live feeds, plugins</p><p>Three main buckets of external data that power inference usage:</p><ol><li><p><strong>Perspective &amp; opinion.</strong> Essays, forums, niche newsletters etc</p></li><li><p><strong>Live feeds of reality.</strong> Prices, weather, sports scores, shipping schedules etc anything that is being created at live speed or it is time-sensitive.</p></li><li><p><strong>Credibility signals.</strong> Citations, peer review, historical revisions.</p></li></ol><p>Large-scale training demands bulk access; live inference needs low-latency access. Either way, the web remains the richest, messiest, most up-to-date dataset available. No private corpus can match its breadth.</p><h2><strong>Rethinking Internet&#8217;s Data Business Models</strong></h2><p>So what business models actually make sense? I argue that we have to separate these models based on the intended use of the data.</p><h3>Training Data &#8212; the &#8220;Bottle of Wine&#8221; Scenario</h3><p>Once your words flow into the model&#8217;s weights, they stay there, the way wine poured into a stew can never be poured back into the bottle. From my point of view there are three ways of licensing the data:</p><ol><li><p><strong>Metered Royalty<br></strong> <em>Charge a fee every time the AI uses knowledge traced back to your work.</em></p><ul><li><p><em>Appeal:</em> Feels equitable. Pay me when you profit from me.</p></li><li><p><em>Problem:</em> Detecting those moments is like spotting a single grape in that stew. Even the vendor can&#8217;t do it currently, and they have every incentive to undercount.</p></li></ul></li><li><p><strong>Revocable Lease<br></strong> <em>Grant access now, pull it later if terms sour.</em></p><ul><li><p><em>Appeal:</em> Keeps pressure on the AI company to behave.</p></li><li><p><em>Problem:</em> Impossible to &#8220;un-train&#8221; a model without wrecking it; the wine is already simmering. 
The threat is an illusion.</p></li></ul></li><li><p><strong>One-Time, Perpetual Licence<br></strong> <em>Sell the rights up front&#8212;no strings, no meter.</em></p><ul><li><p><em>Appeal:</em> Zero tracking overhead, zero litigation about who owes what.</p></li><li><p><em>Problem:</em> You must be comfortable never clawing back control.</p></li></ul></li></ol><p><strong>Best fit</strong>: <strong>One-Time, Perpetual Licence</strong></p><p>Technical enforcement for options 1 and 2 simply doesn&#8217;t exist at production scale, and every extra audit hop slows the product you hope to monetize. Choosing the perpetual route is less about generosity and more about admitting physics: once the model swallows your data, policing bites is fantasy.</p><h3>Inference Data &#8212; the &#8220;Rental Car&#8221; Scenario</h3><p>Here, your content sits outside the model. The AI calls for it only when a user&#8217;s question needs it, much like renting a car for a day trip. Sounds easy to meter&#8212;until you spot the loopholes. The main one is that inference data often becomes training data after the fact. If the AI company logs the conversation or fine-tunes its orchestrator, your &#8220;real-time&#8221; data just becomes &#8220;forever&#8221; data. I am seeing three main options for monetizing such data:</p><ol><li><p><strong>Pay-Per-Call API<br></strong> <em>Tiny fee every time the model hits your endpoint.</em></p><ul><li><p><em>Upside:</em> Straightforward invoice: X calls &#215; Y cents. No data manipulation.</p></li><li><p><em>Snag:</em> You must trust the AI company&#8217;s logs, or fund a third-party auditor that slows everything down. And if those logs later feed training, you&#8217;re accidentally back in the &#8220;Bottle of Wine.&#8221; Also you have to maintain the API.</p></li></ul></li><li><p><strong>Pay-Per-Crawl </strong>(Cloudflare&#8217;s pitch)<br> <em>Same as above but with a unified interface.</em></p><ul><li><p><em>Upside:</em> Using one &#8220;connector&#8221; to the data instead of managing XXX APIs.</p></li><li><p><em>Snag:</em> Same as above plus you have to correctly route traffic to the &#8220;correct&#8221; spots.</p></li></ul></li><li><p><strong>Gate &amp; Transform<br></strong> <em>License each retrieval <strong>and</strong> strip the payload to the bare minimum: summaries, embeddings, or partial snippets.</em></p><ul><li><p><em>Upside:</em> Your core IP never lands whole on the AI company&#8217;s disk, making downstream training far less valuable.</p></li><li><p><em>Snag:</em> It does not exist. Hit me up if this is interesting to you.</p></li></ul></li></ol><p><strong>Best fit</strong>: <strong>Pay-Per-Crawl (today) but Gate &amp; Transform (tomorrow)</strong><br>The best starting point would be to align with Cloudflare. Limit bot exposure and start getting back something for the consumed data. But if you truly don&#8217;t want your prose immortalized inside someone else&#8217;s model, you must control every retrieval and obscure the raw source. Anything less is a polite invitation for the vendor to turn today&#8217;s rental into tomorrow&#8217;s permanent fleet car.</p><h2>Where Next?</h2><p>For the Agentic Web to materialise, we still need serious infrastructure and protocol work&#8212;see my other articles for the deep dive. 
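</p><p>One concrete piece of that work is the &#8220;Gate &amp; Transform&#8221; idea above. Here is a minimal sketch of what such a licensed-retrieval endpoint could return; the shape and the field names are assumptions for illustration, not an existing protocol:</p><pre><code class="language-python"># Hypothetical "Gate and Transform" retrieval: license each call, never ship the raw article.
import hashlib

ARTICLE = {
    "url": "https://publisher.example/analysis",
    "full_text": "the publisher's original reporting, which never leaves the publisher",
    "summary": "Two-sentence licensed summary of the analysis.",
}

def gate_and_transform(query: str, platform_id: str, price_usd: float = 0.002) -> dict:
    """Return a stripped, licensed payload plus an auditable receipt."""
    receipt = hashlib.sha256(f"{platform_id}|{query}|{ARTICLE['url']}".encode()).hexdigest()[:16]
    return {
        "payload": ARTICLE["summary"],  # summary or embedding only; the raw text stays gated
        "attribution": ARTICLE["url"],
        "price_usd": price_usd,         # charged per retrieval
        "license": "inference-only",    # downstream training is not permitted
        "receipt": receipt,             # provenance trail for later auditing
    }

print(gate_and_transform("what does the analysis conclude?", "ai-platform-123"))
</code></pre><p>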
The bottom line is clear: AI agents must become first-class citizens of the new Internet, and that means fresh rules and new monetisation options.</p><p>I applaud Cloudflare&#8217;s move; it&#8217;s a strong first step, but there&#8217;s plenty left to do. These are exciting times. If this prospect excites you too&#8212;and you&#8217;d like to get involved as a collaborator, investor, or stakeholder with a monetization angle&#8212;drop me a line. I&#8217;d love to talk.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Agentic Web Part 3: Evolution of Web Use]]></title><description><![CDATA[Understand how current web primitives are being transformed through AI-first interaction.]]></description><link>https://bakagiannis.substack.com/p/agentic-web-part-3-evolution-of-web</link><guid isPermaLink="false">https://bakagiannis.substack.com/p/agentic-web-part-3-evolution-of-web</guid><dc:creator><![CDATA[Ioannis Bakagiannis]]></dc:creator><pubDate>Mon, 30 Jun 2025 15:42:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wp28!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f41c29-ff2f-45d1-9fe7-4a8352c68f35_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.ads4gpts.com" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bUVD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!bUVD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!bUVD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!bUVD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bUVD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png" width="500" height="100" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:100,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20780,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.ads4gpts.com&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bakagiannis.substack.com/i/167185859?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bUVD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!bUVD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!bUVD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!bUVD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23a4b958-e1f9-41b2-9153-aec631a6e5ec_500x100.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><strong>In Part 1</strong>, we defined the <em>Agentic Web</em>: a shift from static pages to outcome-driven interactions powered by AI agents.<br><strong>In Part 2</strong>, we examined the anatomy of an AI agent and its web.</p><p>Here, we dive into how core web use is transforming. Browsing gives way to <strong>delegation</strong>. 
The web stops being a place to click and becomes a system that acts.</p><h2>Web 2.0 Core Use Cases</h2><p><strong>Stay informed</strong><br>&#8211; Manually visit news sites, RSS, newsletters<br>&#8211; Search for &#8220;what happened?&#8221;</p><p><strong>Learn &amp; research</strong><br>&#8211; Keyword search &#8594; skim multiple sources &#8594; bookmark or copy&#8211;paste notes</p><p><strong>Communicate &amp; build community</strong><br>&#8211; Email, chat apps, social media feeds, forums</p><p><strong>Consume entertainment</strong><br>&#8211; Stream video/music, play web games, scroll memes</p><p><strong>Discover &amp; buy</strong><br>&#8211; Search + ad/social referrals &#8594; compare offers &#8594; fill checkout forms</p><p><strong>Manage money</strong><br>&#8211; Log in to online banking, trading dashboards, crypto wallets</p><p><strong>Do work &amp; create</strong><br>&#8211; SaaS dashboards, cloud docs, CMS/blog editors</p><p><strong>Book &amp; coordinate services</strong><br>&#8211; Flight portals, ride-hailing, food delivery, tele-health portals</p><p><strong>Self-development &amp; education</strong><br>&#8211; MOOCs, language apps, digital training platforms</p><h2>From One-Size-Fits-All to Adaptive Automation</h2><p>In the early days of AI on the web, interaction was treated as a one-size-fits-all experience: enter a prompt, let the model run, accept the output.</p><p>But this oversimplifies reality. Human behaviour isn&#8217;t uniform&#8212;it's contextual, emotionally layered, and risk-sensitive. Users calibrate trust in AI systems based on the stakes, emotional significance, and potential consequences of each action.</p><h3>Why Maslow Still Matters in the Agentic Web</h3><p>To design automation that feels trustworthy, we must align it with <strong>Maslow&#8217;s Hierarchy of Needs</strong>:</p><ul><li><p><strong>Physiological Needs</strong> &#8211; food, shelter, basic goods</p></li><li><p><strong>Safety Needs</strong> &#8211; health, financial stability, protection</p></li><li><p><strong>Belonging &amp; Love</strong> &#8211; relationships, community, connection</p></li><li><p><strong>Esteem</strong> &#8211; status, achievement, personal value</p></li><li><p><strong>Self-Actualization</strong> &#8211; growth, creativity, purpose</p></li></ul><p>The further up the pyramid a task falls, the more <strong>emotional weight</strong>, <strong>irreversibility</strong>, and <strong>regulatory impact</strong> it tends to carry. 
Consequently, the more <strong>nuanced and collaborative</strong> automation must become.</p><h3>Trust Calibration: Matching Automation to Human Psychology</h3><p><strong>Factor: Cost or Risk</strong></p><ul><li><p><em>Low-Stakes Tasks</em>: $5 household item, news digest</p></li><li><p><em>High-Stakes Tasks</em>: Designer goods, healthcare, legal matters</p></li></ul><p><strong>Factor: Emotional Weight</strong></p><ul><li><p><em>Low-Stakes Tasks</em>: &#8220;Refill dog food&#8221;</p></li><li><p><em>High-Stakes Tasks</em>: &#8220;Plan my wedding menu&#8221;</p></li></ul><p><strong>Factor: Reversibility</strong></p><ul><li><p><em>Low-Stakes Tasks</em>: Easily undone (cancel, edit, re-order)</p></li><li><p><em>High-Stakes Tasks</em>: Difficult to unwind (legal filings, medical decisions)</p></li></ul><p><strong>Factor: Regulation</strong></p><ul><li><p><em>Low-Stakes Tasks</em>: Light or none</p></li><li><p><em>High-Stakes Tasks</em>: Heavily regulated (finance, health, privacy, compliance)</p></li></ul><p><strong>Key Insight</strong>:</p><ul><li><p><strong>Basic-level tasks</strong> &#8594; full automation</p></li><li><p><strong>Mid to upper-level tasks</strong> &#8594; consultative, agent-supported experiences</p></li></ul><h2>Autonomy Spectrum for Core Web Use Cases</h2><p>As we examined, not every task on the web requires&#8212;or deserves&#8212;the same level of oversight. Some can be fully delegated to agents, while others demand active human involvement. The <strong>Autonomy Spectrum</strong> illustrates how common use cases divide across three modes of control: <strong>Agent-Led (full autonomy)</strong>, <strong>Collaborative (partial autonomy)</strong>, and <strong>User-Led (low autonomy)</strong>.</p><p><strong>Use Case: Stay Informed</strong></p><ul><li><p><em>Agent-Led</em>: Daily news digest, sentiment alerts</p></li><li><p><em>Collaborative</em>: Curated deep-dive</p></li><li><p><em>User-Led</em>: Op-ed comparison</p></li></ul><p><strong>Use Case: Learn &amp; Research</strong></p><ul><li><p><em>Agent-Led</em>: Collect abstracts</p></li><li><p><em>Collaborative</em>: Draft literature review</p></li><li><p><em>User-Led</em>: Final thesis</p></li></ul><p><strong>Use Case: Communicate</strong></p><ul><li><p><em>Agent-Led</em>: Auto-sort inbox</p></li><li><p><em>Collaborative</em>: Suggest talking points</p></li><li><p><em>User-Led</em>: Deliver bad news</p></li></ul><p><strong>Use Case: E-Commerce</strong></p><ul><li><p><em>Agent-Led</em>: Restock consumables</p></li><li><p><em>Collaborative</em>: Laptop shortlist</p></li><li><p><em>User-Led</em>: One-of-a-kind art</p></li></ul><p><strong>Use Case: Finance</strong></p><ul><li><p><em>Agent-Led</em>: Pay utilities</p></li><li><p><em>Collaborative</em>: Portfolio rebalance</p></li><li><p><em>User-Led</em>: High-risk investment</p></li></ul><p><strong>Use Case: Travel &amp; Logistics</strong></p><ul><li><p><em>Agent-Led</em>: Book commutes</p></li><li><p><em>Collaborative</em>: Business trip planning</p></li><li><p><em>User-Led</em>: Honeymoon</p></li></ul><p><strong>Use Case: Creative Work</strong></p><ul><li><p><em>Agent-Led</em>: Resize images</p></li><li><p><em>Collaborative</em>: First-pass ad copy</p></li><li><p><em>User-Led</em>: Final brand voice</p></li></ul><p><strong>Use Case: Security &amp; Compliance</strong></p><ul><li><p><em>Agent-Led</em>: Patch vulnerabilities</p></li><li><p><em>Collaborative</em>: Flag unusual logins</p></li><li><p><em>User-Led</em>: Regulatory reports</p></li></ul><h2>Deep Dive: E-Commerce at Two 
Extremes</h2><p><strong>Household Staples (Toilet Paper)</strong></p><ul><li><p><strong>Intent</strong>: &#8220;Buy the usual brand, cheapest price, deliver tomorrow.&#8221;</p></li><li><p><strong>Agent Action</strong>:</p><ul><li><p>Checks price/coupons</p></li><li><p>Verifies discounts</p></li><li><p>Executes payment</p></li></ul></li><li><p><strong>User Involvement</strong>:</p><ul><li><p>Push notification: &#8220;Order placed: $11.20, arrives Tue.&#8221;</p></li></ul></li><li><p><strong>Why It Works</strong>: Low cost, reversible, no emotional weight.</p></li></ul><p><strong>Luxury Apparel (Designer Dress)</strong></p><ul><li><p><strong>Intent</strong>: &#8220;Find a black cocktail dress, budget &#8364;800, deliver before July 10.&#8221;</p></li><li><p><strong>Agent Action</strong>:</p><ul><li><p>Curates options with return policies</p></li><li><p>Flags shipping estimates</p></li></ul></li><li><p><strong>User Involvement</strong>:</p><ul><li><p>Reviews shortlist</p></li><li><p>Confirms preference and payment</p></li></ul></li><li><p><strong>Why Collaboration Matters</strong>: High cost, taste sensitivity, potential return hassle.</p></li></ul><h2>Behaviour Shift: From Browsing to Outcomes</h2><p>We established that web usage is changing from the bottom up. The Agentic Web reframes the question from <strong>&#8220;Where should I click?&#8221;</strong> to <strong>&#8220;What should happen?&#8221;</strong></p><p><strong>Browsing</strong>, the core user behaviour of the current web, is shaken to its core. Many businesses are built around this &#8220;random walk&#8221; pattern of influencing users. With outcome-driven agents, much of this activity diminishes, but not all of it.</p><h3>Behaviours Likely to Fade</h3><p><strong>Fading Task: Typing search queries and clicking through 10 blue links</strong></p><p><em>Why It Disappears</em>: Agents gather, rank, and synthesize facts; users receive direct answers or auto-performed actions.</p><p><strong>Fading Task: Hand-comparing prices and coupon codes</strong></p><p><em>Why It Disappears</em>: Agents benchmark and buy when target conditions are met.</p><p><strong>Fading Task: Filling repetitive forms</strong></p><p><em>Why It Disappears</em>: Agents transmit verified identity and payment tokens via secure APIs.</p><p><strong>Fading Task: Daily email triage</strong></p><p><em>Why It Disappears</em>: Agents auto-sort, draft replies, or resolve routine items.</p><p><strong>Fading Task: SEO-driven &#8220;listicle&#8221; content farms</strong></p><p><em>Why It Disappears</em>: Thin content loses relevance as agents filter for decision-ready information.</p><p><strong>Fading Task: Banner and pre-roll advertising</strong></p><p><em>Why It Disappears</em>: Agents filter non-value ads; commerce shifts to API-level offers and rev-share models.</p><p><strong>Fading Task: Manual social cross-posting &amp; scheduling</strong></p><p><em>Why It Disappears</em>: Agents generate, localize, A/B-test, and auto-publish across platforms.</p><p><strong>Fading Task: One-size-fits-all learning modules</strong></p><p><em>Why It Disappears</em>: Adaptive tutors offer personalized flows.</p><p><strong>Fading Task: First-level customer support chat trees</strong></p><p><em>Why It Disappears</em>: Domain-specific agents resolve routine queries; humans intervene only for edge cases.</p><h3>Why We&#8217;ll Still Load the Site</h3><p>Automation will handle low-stakes tasks, but humans will continue to access traditional websites in critical situations&#8212;where <strong>trust, 
regulation, or experience</strong> are key.</p><p><strong>Reason: Trust and liability</strong></p><p><em>When It Matters</em>: Medical, legal, and financial content where users need to verify the author, credentials, and source authority.</p><p><strong>Reason: Immersive shopping</strong></p><p><em>When It Matters</em>: Augmented reality demos, 3D product views, and virtual try-ons that enhance purchase confidence.</p><p><strong>Reason: Community and story</strong></p><p><em>When It Matters</em>: Forums, comment sections, live events, and newsletters that foster social engagement and ongoing participation.</p><p><strong>Reason: Complex interactivity</strong></p><p><em>When It Matters</em>: Configurators, dashboards, and simulation tools that require real-time input and responsive interfaces.</p><p><strong>Reason: Identity and transactions</strong></p><p><em>When It Matters</em>: Secure checkouts, user account portals, and Know Your Customer (KYC) processes where manual review or confirmation is essential.</p><p><strong>Reason: Emotional or milestone decisions</strong></p><p><em>When It Matters</em>: Life events like planning a wedding, choosing a school, or evaluating surgical options&#8212;situations that demand deep content, visual context, and deliberate comparison.</p><p><strong>Rule of thumb</strong>:<br>If the user must <strong>feel</strong>, <strong>prove</strong>, or <strong>experience</strong> something&#8212;<strong>emotionally</strong>, <strong>legally</strong>, or <strong>interactively</strong>&#8212;they will still open the website or original source.</p><h2>Next: Agentic Interfaces</h2><p>We&#8217;ll shift from back-end rails to the <strong>touchpoints where humans and agents converge</strong>, tracing the emergent patterns that let software signal intent, share control, and fade elegantly into the background.</p><p>In short, <strong>Part 4</strong> maps how interface design must evolve when <strong>autonomy</strong>, not clicks, becomes the primary mode of interaction.</p><div><hr></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/p/agentic-web-part-3-evolution-of-web?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for your interest in my thoughts. 
Now pass the knowledge on!</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/p/agentic-web-part-3-evolution-of-web?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://bakagiannis.substack.com/p/agentic-web-part-3-evolution-of-web?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div><hr></div><p>Learn more about the way ADS4GPTS is changing the monetization of the internet by aligning human and AI incentives</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.ads4gpts.com&quot;,&quot;text&quot;:&quot;Visit ADS4GPTS&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.ads4gpts.com"><span>Visit ADS4GPTS</span></a></p>]]></content:encoded></item><item><title><![CDATA[The Agentic Web Part 2: Anatomy of the Agentic Web]]></title><description><![CDATA[Examine the technical components and workflows of the Agentic web.]]></description><link>https://bakagiannis.substack.com/p/the-agentic-web-part-2-anatomy-of</link><guid isPermaLink="false">https://bakagiannis.substack.com/p/the-agentic-web-part-2-anatomy-of</guid><dc:creator><![CDATA[Ioannis Bakagiannis]]></dc:creator><pubDate>Mon, 30 Jun 2025 15:35:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wp28!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f41c29-ff2f-45d1-9fe7-4a8352c68f35_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.ads4gpts.com" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dhnw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!Dhnw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!Dhnw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!Dhnw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dhnw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png" width="500" height="100" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:100,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20780,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.ads4gpts.com&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bakagiannis.substack.com/i/167164907?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dhnw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!Dhnw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!Dhnw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!Dhnw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda460c1b-d800-474c-ad0a-7d70cde03b7b_500x100.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><strong>Welcome to Part 2.</strong><br><em>Imagine you open an invoice link and your screen fills with a 600-line JSON blob.<br></em> A procurement bot could parse, validate, and pay that bill in under a second.<br> You, meanwhile, are left hunting for the total and wondering if your browser just broke.</p><p>That tiny hiccup captures the core design problem of the <strong>Agentic Web</strong>: people and software agents want the <strong>same data for different reasons via radically different routes.</strong></p><h2><strong>Web 2.0 Actors &#8211; Who&#8217;s Really Online?</strong></h2><p>Ask the average passer-by who &#8220;uses&#8221; the internet and they&#8217;ll picture a human in front of a screen. Reality is more crowded. 
In 2025, automated traffic&#8212;everything from search-engine spiders to health-check pings&#8212;now edges out human clicks, accounting for <strong>51 % of total web requests</strong>.(<a href="https://www.securityweek.com/bot-traffic-surpasses-humans-online-driven-by-ai-and-criminal-innovation/?utm_source=chatgpt.com">securityweek.com</a>) Much of that spike comes from AI crawlers such as those run by OpenAI and Anthropic, which vacuum up content to train their models.</p><p>Understanding the <strong>intent</strong> behind each request&#8212;not just its interface&#8212;matters, because two identical HTTP calls can serve wildly different purposes. Here&#8217;s the cast:</p><h3>Humans (Users)</h3><ul><li><p><strong>What They Care About:</strong><br>Clear outcomes &#8212; trust, delight, task completion</p></li><li><p><strong>Core Traits:</strong><br>Sensory, context-rich, emotion-driven</p></li><li><p><strong>Primary Constraints:</strong><br>Cognitive load, accessibility, privacy expectations</p></li></ul><h3>Bots (Crawlers / Scrapers)</h3><ul><li><p><strong>What They Care About:</strong><br>A comprehensive, fresh link graph; maximum page fetches per crawl cycle</p></li><li><p><strong>Core Traits:</strong><br>Headless, pattern-matching, largely stateless</p></li><li><p><strong>Primary Constraints:</strong><br><code>robots.txt</code>, CAPTCHA walls, IP blocks, bandwidth ceilings</p></li></ul><h3>Application System Processes</h3><p><em>(APIs, webhooks, schedulers, service-mesh sidecars, infra probes)</em></p><ul><li><p><strong>What They Care About:</strong><br>Reliable machine-to-machine orchestration &#8212; payments, health checks, ETL jobs, retries</p></li><li><p><strong>Core Traits:</strong><br>Deterministic, idempotent, authenticated</p></li><li><p><strong>Primary Constraints:</strong><br>Auth tokens/HMAC, exponential back-off, graceful degradation</p></li></ul><p><strong>Interface &#8800; Intent.</strong> A single URL might be fetched by a smartphone browser, a price-monitoring bot, or a serverless cron job. The packet payloads look the same; the motives&#8212;and therefore the design constraints&#8212;do not.</p><h2><strong>Why Bots Exist</strong></h2><p>Automation wins whenever a task can be distilled into repeatable HTTP calls and the payoff per request outstrips the cost of spinning up compute. Cloud minutes are cheap, bandwidth is plentiful, and HTTP is permission-less by design. 
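</p><p>To see why, here is a deliberately rough back-of-the-envelope calculation. Every number is an illustrative assumption, not a measured figure; the only point is the asymmetry between cost and expected payoff.</p><pre><code># Illustrative economics of an automated probe campaign (all numbers assumed).
requests         = 1_000_000     # probes fired in one campaign
cost_per_request = 0.000005      # dollars of compute + bandwidth per request (assumed)
hit_rate         = 0.0005        # fraction of probes that "pay off" (assumed)
value_per_hit    = 2.00          # dollars gained per successful probe (assumed)

total_cost   = requests * cost_per_request           # $5
expected_win = requests * hit_rate * value_per_hit   # $1,000

print(f"cost ${total_cost:,.2f} vs expected payoff ${expected_win:,.2f}")
</code></pre><p>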
That math makes an automated probe or fetch &#8220;always worth a try.&#8221;</p><h3><strong>The Good Side: Essential Infrastructure</strong></h3><ul><li><p><strong>Indexing &amp; Discovery.</strong> Search-engine crawlers trawl billions of pages so humans can find one relevant result in milliseconds&#8212;a feat no manual workforce could fund or finish.</p></li><li><p><strong>Performance &amp; Health.</strong> CDN nodes, uptime monitors, and service-mesh sidecars fire constant pings to keep modern apps reliable.</p></li><li><p><strong>Market Efficiency.</strong> Price-comparison bots, accessibility readers, and research aggregators turn raw pages into actionable data streams.</p></li></ul><p>These &#8220;good bots&#8221; underpin everything from Google Search to real-time stock quotes.</p><h3><strong>The Dark Side: Exploitation</strong></h3><p>The same cost asymmetry fuels a myriad of bad things:</p><h4>Account Takeover</h4><ul><li><p><strong>Bot Tactic:</strong> Credential-stuffing scripts hammer login endpoints</p></li><li><p><strong>Impact:</strong> Mass breaches, identity theft</p></li></ul><h4>Scalping &amp; Reselling</h4><ul><li><p><strong>Bot Tactic:</strong> Millisecond-level checkout automation</p></li><li><p><strong>Impact:</strong> Empty shelves, inflated resale prices</p></li></ul><h4>Content Theft</h4><ul><li><p><strong>Bot Tactic:</strong> Full-site scrapers ignore <code>robots.txt</code></p></li><li><p><strong>Impact:</strong> SEO dilution, lost ad and subscription revenue</p></li></ul><h4>Ad Fraud</h4><ul><li><p><strong>Bot Tactic:</strong> Headless browsers spoof impressions and clicks</p></li><li><p><strong>Impact:</strong> Billions in wasted ad spend</p></li></ul><h3>Scale Works Against You</h3><p>In 2024, <strong>automated traffic officially overtook human traffic</strong> for the first time:</p><blockquote><p>&#128314; Bots generated <strong>51% of all web traffic</strong>, with <strong>37% classified as &#8220;bad&#8221;</strong><br>&#8212; <a href="https://www.imperva.com/">Imperva, 2024</a></p></blockquote><h4><strong>One Protocol, Divergent Motives</strong></h4><p>Two GETs to the same URL can arrive from a Chrome tab, a polite crawler, or a credential-stuffing botnet. The packet footprints match; the intentions diverge.</p><h2><strong>New Kid on the Block: AI Agents</strong></h2><p>Web 2.0 bots execute pre-baked scripts; Web 4.0 agents <strong>reason, plan, and adapt.</strong> They consume a goal (&#8220;book me the cheapest flight tomorrow morning&#8221;), fetch just-in-time context, decide on a sequence of calls, and then act&#8212;without a human hand on every click. 
Researchers have begun calling this emergent layer the <strong>&#8220;Agentic Web,&#8221;</strong> where autonomous software negotiates, purchases, and publishes alongside us.(<a href="https://www.linkedin.com/pulse/agentic-web-how-40-ai-generative-intelligence-redefining-jha-1ydhc?utm_source=chatgpt.com">linkedin.com</a>,<a href="https://www.frontiersin.org/journals/blockchain/articles/10.3389/fbloc.2025.1591907/full?utm_source=chatgpt.com"> frontiersin.org</a>)</p><p>The upgrade isn&#8217;t merely faster scripting; it&#8217;s a shift from <strong>automation</strong> (repeatable tasks) to <strong>autonomy</strong> (goal-directed workflows):</p><ul><li><p><strong>What They Care About:</strong><br>Immediate, accurate outcomes with minimal latency &#8212; clean data, unambiguous instructions, and deterministic results</p></li><li><p><strong>Core Traits:</strong><br>Code-driven, data-hungry, task-oriented<br>Endowed with memory and planning capabilities</p></li><li><p><strong>Primary Constraints:</strong><br>Rate limits<br>Strong authentication<br>Observability<br>Predictable side effects</p></li></ul><p>In short, the web&#8217;s newest participant isn&#8217;t just another bot; it&#8217;s a decision-maker. Understanding its incentives and guardrails is crucial, because when agents act, they do so at machine speed&#8212;but with stakes that feel distinctly human.</p><h3><strong>The Universal Agentic Workflow Framework</strong></h3><p>Autonomous agents outperform dumb bots because they follow a disciplined, <strong>human-modeled loop</strong>: they set a goal, plan, gather context, act, check their own work, and learn from the outcome. Below is the canonical seven-phase cycle every reliable agent must execute on the Agentic Web.</p><h4><strong>1. Intent (Query)</strong></h4><ul><li><p><strong>What Happens:</strong> Extract the user&#8217;s goal and constraints. Clarify ambiguities (e.g., &#8220;Which Dr. Lewis?&#8221;).</p></li><li><p><strong>Why It Matters:</strong> Clear intent prevents downstream rework and misalignment.</p></li><li><p><strong>Actor:</strong> &#129489; Human (user prompt)</p></li></ul><h4><strong>2. Reasoning</strong></h4><ul><li><p><strong>What Happens:</strong> Decompose the goal into ordered sub-tasks. Choose a compliant and efficient approach.</p></li><li><p><strong>Why It Matters:</strong> Poor reasoning in regulated domains can lead to liability.</p></li><li><p><strong>Actor:</strong> &#129302; Interface agent</p></li></ul><h4><strong>3. Context Gathering</strong></h4><ul><li><p><strong>What Happens:</strong> Pull relevant data &#8212; personal preferences, credentials, policy limits, inventory. May coordinate with other agents.</p></li><li><p><strong>Why It Matters:</strong> Even flawless logic fails on stale or incomplete data.</p></li><li><p><strong>Actor:</strong> &#129302; Interface + external agents / data APIs</p></li></ul><h4><strong>4. 
Execution (Tool Calls)</strong></h4><ul><li><p><strong>What Happens:</strong> Call APIs, complete forms, trigger RPA &#8212; all as atomic and reversible steps.</p></li><li><p><strong>Why It Matters:</strong> This is where latency, rate limits, and edge cases become visible.</p></li><li><p><strong>Actor:</strong> &#129302; Interface agent (local tools) + external services</p></li></ul><h4><strong>5. Reflection</strong></h4><ul><li><p><strong>What Happens:</strong> Verify the outcome against the original intent. Compare before/after state. Log discrepancies.</p></li><li><p><strong>Why It Matters:</strong> Catches silent failures and powers the learning loop.</p></li><li><p><strong>Actor:</strong> &#129302; Interface agent</p></li></ul><h4><strong>6. Human Audit</strong></h4><ul><li><p><strong>What Happens:</strong> Pause for review, approval, or override&#8212;especially in high-stakes scenarios.</p></li><li><p><strong>Why It Matters:</strong> Satisfies ethical, legal, and emotional thresholds.</p></li><li><p><strong>Actor:</strong> &#129489; Human</p></li></ul><h4><strong>7. Iterative Feedback</strong></h4><ul><li><p><strong>What Happens:</strong> Store explicit &#128077; / &#128078; or learn from implicit corrections. Continuously update behavior.</p></li><li><p><strong>Why It Matters:</strong> Turns one success into a pattern of accuracy gains.</p></li><li><p><strong>Actor:</strong> &#129489; Human + &#129302; Interface agent</p></li></ul><h4><strong>Who Does What?</strong></h4><ul><li><p><strong>Humans</strong>: supply the <em>query</em>, review high-risk actions, and provide feedback.</p></li><li><p><strong>Interface Agent</strong>: The AI that directly communicates with the human. Plans, personalizes via memory, executes trusted tools, and self-reflects.</p></li><li><p><strong>External Agents/Capabilities</strong>: enrich context (e.g., web search, partner APIs) and perform domain-specific tool calls the Interface agent can&#8217;t.</p></li></ul><p>Two capabilities super-charge that middle layer:</p><ol><li><p><strong>Search</strong> &#8211; live, scoped retrieval of facts the agent doesn&#8217;t yet know.</p></li><li><p><strong>Computer Use</strong> &#8211; browser automation for any site that lacks a public API, keeping agents as versatile as humans with a mouse.</p></li></ol><h3><strong>Agentic Protocol Layer</strong></h3><p>Agents need more than raw HTTP. They rely on a thin but crucial protocol layer that standardises how context is loaded, tasks are handed off, work is orchestrated, and results are audited. 
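</p><p>Before surveying the individual specs, here is a minimal sketch of the seven-phase loop above expressed as code. Every function is a hypothetical stub, not an existing framework API; the point is only to show where planning, tool calls, reflection, and the human audit sit relative to each other.</p><pre><code># Sketch of the seven-phase agentic loop (all functions are illustrative stubs).

def clarify_intent(prompt):                        # 1. Intent
    return {"goal": prompt, "constraints": {"budget_eur": 800}}

def plan(intent):                                  # 2. Reasoning
    return ["search_offers", "rank_offers", "purchase"]

def gather_context(intent):                        # 3. Context gathering
    return {"preferences": {"colour": "black"}, "payment_token": "tok_demo"}

def execute(step, intent, context):                # 4. Execution (tool calls)
    return {"step": step, "status": "ok"}          # a real agent would call external APIs here

def reflect(results, intent):                      # 5. Reflection
    return all(r["status"] == "ok" for r in results)

def human_audit(results, high_stakes):             # 6. Human audit
    return input("Approve? [y/N] ").lower() == "y" if high_stakes else True

def record_feedback(ok):                           # 7. Iterative feedback
    print("feedback stored:", "positive" if ok else "negative")

def run_agent(prompt, high_stakes=False):
    intent  = clarify_intent(prompt)
    steps   = plan(intent)
    context = gather_context(intent)
    results = [execute(step, intent, context) for step in steps]
    ok = reflect(results, intent) and human_audit(results, high_stakes)
    record_feedback(ok)
    return results if ok else None

run_agent("find a black cocktail dress under 800 EUR", high_stakes=True)
</code></pre><p>The interesting engineering lives inside gather_context and execute, which is exactly where the protocol layer comes in. 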
Below is a snapshot of that layer, aligned to today&#8217;s live specs and vendor road-maps.</p><h4><strong>Context &amp; Tool Access</strong></h4><p><strong>Purpose:</strong> Expose data and executable functions to models<br><strong>Key Specs:</strong></p><ul><li><p>Model Context Protocol (MCP) &#8212; <em>"USB-C for AI context"</em> (<a href="https://www.anthropic.com/">Anthropic</a>)</p></li><li><p>Function-calling / Agent-invocation APIs (<a href="https://platform.openai.com/">OpenAI</a>, <a href="https://docs.aws.amazon.com/">AWS</a>, <a href="https://cloud.google.com/">Google Cloud</a>)</p></li></ul><h4><strong>Agent-to-Agent Collaboration</strong></h4><p><strong>Purpose:</strong> Structured task hand-off and negotiation between autonomous agents<br><strong>Key Specs:</strong></p><ul><li><p>A2A (Agent-to-Agent) Protocol (<a href="https://linuxfoundation.org/">Linux Foundation</a>)</p></li><li><p>ACP (Agent Communication Protocol) (<a href="https://agentcommunicationprotocol.dev/">agentcommunicationprotocol.dev</a>)</p></li></ul><h4><strong>Workflow &amp; Orchestration</strong></h4><p><strong>Purpose:</strong> Chain function calls, manage state, retries, and branching logic<br><strong>Key Specs:</strong></p><ul><li><p>LangGraph Patterns (<a href="https://langchain-ai.github.io/">langchain-ai.github.io</a>)</p></li><li><p>Microsoft AutoGen multi-agent workflow engine</p></li></ul><h4><strong>Discovery &amp; Registry</strong></h4><p><strong>Purpose:</strong> Publish and locate agent capabilities<br><strong>Key Specs:</strong></p><ul><li><p>A2A &#8220;Agent Cards&#8221; endpoint</p></li><li><p>OpenAI Plugin Manifest &amp; Function Registry (in development)</p></li></ul><h4><strong>Control-Plane</strong></h4><p><strong>Purpose:</strong> Enforce policy, authentication, rate limits, and capture telemetry<br><strong>Key Specs:</strong></p><ul><li><p>mTLS for agent-to-agent trust</p></li><li><p>OpenTelemetry Gen-AI semantic conventions (<a href="https://opentelemetry.io/blog/2024/otel-generative-ai/">opentelemetry.io</a>)</p></li></ul><h4>Still Missing</h4><p>While this stack reflects the current state of the field, there are key gaps &#8212; especially in:</p><ul><li><p>Event-driven architectures</p></li><li><p>Pub/Sub messaging</p></li><li><p>Multimodal context streaming</p></li><li><p>Agent memory standardization</p></li><li><p>and more</p></li></ul><p>The field is evolving fast, but many foundational elements remain fragmented or immature.</p><p><strong>Why it matters</strong>:<br> <em>HTTP moves the bytes; these specs move intent with accountability.</em> Together they let any compliant agent discover tools, invoke them safely, delegate subtasks to other agents, and surface verifiable receipts&#8212;turning the open web into a programmable operating system rather than a patchwork of ad-hoc scrapers.</p><p><strong>Agents != Bots</strong></p><p>Bots were a necessary workaround, not our destiny. By treating agents as welcomed guests&#8212;complete with their own front door and house rules&#8212;we can ditch the cat-and-mouse games, shrink CAPTCHA fatigue, and make the internet faster, fairer, and more open for everyone.</p><h2><strong>The Fork in the Road: One Web or Two?</strong></h2><p>Over the next 24 months every product team will have to decide whether to extend the current, human-centred web or stand up a parallel rail optimised for autonomous agents. 
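</p><p>To ground the &#8220;Context &amp; Tool Access&#8221; row above before we weigh the options: exposing a function to a model usually means publishing a declaration in roughly the JSON-Schema style sketched below. The field names vary by vendor, and this particular tool is invented for illustration.</p><pre><code># Schematic tool declaration (illustrative; exact field names differ per vendor).
get_offer_tool = {
    "name": "get_offer",
    "description": "Return the current price and stock level for a product.",
    "parameters": {
        "type": "object",
        "properties": {
            "sku":      {"type": "string", "description": "Product identifier"},
            "currency": {"type": "string", "enum": ["EUR", "USD"]},
        },
        "required": ["sku"],
    },
}
</code></pre><p>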
Both paths are open; neither is free of trade-offs.</p><p>Put simply: do AI agents imitate humans (moving a mouse, tapping keys, swiping on a phone), or do they plug directly into a system&#8217;s action controls?</p><h3><strong>Option A &#8212; Keep a Single Surface</strong></h3><p>Sites continue to serve the same URLs humans visit, but enrich them with machine-readable cues (JSON-LD, schema.org, micro-data) or a /.well-known/mcp endpoint so an agent can ask for agent+json while a browser still receives HTML (a minimal sketch of this content negotiation follows at the end of this section).</p><p><strong>Why teams like it</strong></p><ul><li><p><strong>Zero new stack debt.</strong> You evolve, rather than rebuild, your web tier.</p></li><li><p><strong>Universal reach.</strong> Browsers, crawlers, screen-readers, LLMs&#8212;everyone hits the same address.</p></li><li><p><strong>SEO continuity.</strong> Backlinks and ranking signals keep working.</p></li></ul><p><strong>Why it strains over time</strong></p><ul><li><p><strong>Heavy pages hurt agents.</strong> The median desktop page now ships ~2.6 MB of CSS, images, and ads&#8212;bloat that an agent must load, parse, and pay to tokenize, and that also hurts LLM performance. (<a href="https://almanac.httparchive.org/en/2024/page-weight?utm_source=chatgpt.com">almanac.httparchive.org</a>)</p></li><li><p><strong>Blurry security signals.</strong> Helpful booking agents and credential-stuffing botnets look identical in the logs.</p></li><li><p><strong>Publisher revenue risk.</strong> If an LLM scrapes content only to give answers, publishers lose the incentive to publish new content, since their ad-based monetization model simply stops working.</p></li></ul><h3><strong>Option B &#8212; Stand Up a Dedicated Agent Rail</strong></h3><p>Expose a slim, authenticated interface for everything around AI agents: Model Context Protocol (MCP) for data fetches, Agent-to-Agent (A2A) for secure task hand-offs. Agents identify themselves, negotiate rate limits, and receive terse JSON: no CSS, no CAPTCHAs, no token waste.</p><p><strong>Why it&#8217;s compelling</strong></p><ul><li><p><strong>Efficiency gains.</strong> JSON payloads are 20-50&#215; lighter than full HTML, slicing latency, GPU time and carbon.</p></li><li><p><strong>Built-in governance.</strong> Scoped OAuth 2.1 tokens, mTLS and execution receipts make abuse throttling explicit.</p></li><li><p><strong>New business levers.</strong> Context and agentic capabilities become valuable commodities that owners can build novel business models around.</p></li></ul><p><strong>Where it bites</strong></p><ul><li><p><strong>Resource gap.</strong> Small publishers may lack time or talent to run and police a second interface.</p></li><li><p><strong>Fragmentation risk.</strong> Without shared specs, we could repeat the browser-compatibility wars.</p></li><li><p><strong>Discovery reset.</strong> Ranking agents, not pages, demands fresh search paradigms and tooling.</p></li><li><p><strong>Innovation parallels mobile APIs.</strong> Remember when JSON REST services unlocked the app economy? Agent rails promise similar uplift&#8212;atomic capabilities, composable in any interface.</p></li></ul><p>Watch <a href="https://www.ads4gpts.com">ADS4GPTS</a> and our other stealth project closely for innovations in this space.</p><p>Yet the &#8220;two-web&#8221; future is not automatic. Coordination will decide whether we get <strong>USB-C-level interoperability</strong> or a <strong>VHS-vs-Betamax</strong> rerun. 
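</p><p>For Option A, the single-surface approach is, at its core, content negotiation: one URL, two representations. Here is a minimal standard-library sketch that serves terse JSON when a client asks for the agent+json media type mentioned above and plain HTML otherwise. It is an illustration of the idea, not a production pattern, and the article data is a placeholder.</p><pre><code># Minimal dual-surface sketch: same URL, HTML for browsers, JSON for agents.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

ARTICLE = {"title": "Example article", "body": "Full article text goes here."}

class DualSurfaceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if "application/agent+json" in accept:             # agent asks for the terse view
            payload = json.dumps(ARTICLE).encode()
            ctype = "application/agent+json"
        else:                                               # browsers get the usual HTML page
            html = "&lt;h1&gt;" + ARTICLE["title"] + "&lt;/h1&gt;&lt;p&gt;" + ARTICLE["body"] + "&lt;/p&gt;"
            payload = html.encode()
            ctype = "text/html"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), DualSurfaceHandler).serve_forever()
</code></pre><p>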
Projects such as open-source MCP servers and Linux Foundation&#8217;s A2A spec give cause for optimism, but only if product teams treat them as <em>baseline plumbing</em>, not vendor lock-in.(<a href="https://www.gravitee.io/blog/googles-agent-to-agent-a2a-and-anthropics-model-context-protocol-mcp?utm_source=chatgpt.com">gravitee.io</a>,<a href="https://devblogs.microsoft.com/azure-sdk/introducing-the-azure-mcp-server/?utm_source=chatgpt.com"> devblogs.microsoft.com</a>)</p><p>Finally it will be interesting to see the decisions of incumbents in the internet, search and publishing space. Cloudflare&#8217;s CEO Matthew Prince is one of the first to openly talk about this and his stance is clear: this is the end of scraping. This means that Cloudflare will be betting and working on a dedicated rail for agentic data-hungry workflows (<a href="https://www.youtube.com/watch?v=H5C9EL3C82Y">axios</a>).</p><h2><strong>Agents as First-Class Citizens of the Web</strong></h2><p>Whether we choose a single mixed surface or a dedicated rail, one principle must survive the transition: <strong>autonomous agents deserve the same design respect as human users.</strong> An agent is not &#8220;just another bot.&#8221; It carries a person&#8217;s intent, wallet, and liability into the network. Ignoring that status&#8212;forcing agents to scrape, spoof headers, or dodge CAPTCHAs&#8212;doesn&#8217;t merely slow them down; it erodes the very trust we rely on when we delegate tasks to software.</p><h3><strong>Why &#8220;First-Class&#8221; Matters</strong></h3><ol><li><p><strong>Delegated Authority<br></strong> When you ask an agent to &#8220;rebook my flight&#8221; or &#8220;move &#8364;10 000 to Treasury bills,&#8221; you&#8217;ve handed over legal and financial agency. The web must recognise that authority with explicit identity, scoped credentials, and auditable logs. Think of it as a human with power of attorney.</p></li><li><p><strong>Predictable Contracts<br></strong> Human-centred rate limits assume seconds between clicks; agents operate in milliseconds. Treating them as first-class citizens means publishing machine-negotiable SLAs and quota ceilings, so the system fails gracefully instead of rate-banning the user&#8217;s entire day.</p></li><li><p><strong>Security Through Transparency<br></strong> If an agent can declare who it represents and <em>what</em> capability it is invoking, orchestrators can block bad actors with surgical precision. No more collateral damage from blanket CAPTCHA gates or IP blacklists.</p></li><li><p><strong>Economic Alignment<br></strong> Publishers worry about lost ad impressions; users worry about token costs; providers worry about GPU bills. First-class treatment lets us meter, price, and share value explicitly, turning today&#8217;s friction into tomorrow&#8217;s business model.</p></li></ol><h4><strong>The Strategic Upshot</strong></h4><ul><li><p><strong>For Developers:</strong> embracing first-class agents early means fewer brittle work-arounds, lower infra bills, and a cleaner architecture when regulations tighten.</p></li><li><p><strong>For Publishers:</strong> authenticated agents offer a chance to charge for data instead of losing it to unmetered scraping.</p></li><li><p><strong>For Users:</strong> reliable delegation frees them from micro-management, because their digital proxy enjoys the kind of predictable service quality humans already expect.</p></li></ul><p>The web did this once before: browsers became first-class citizens when we moved from Telnet to HTTP 1.0. 
Repeating that leap for autonomous software will decide whether the Agentic Web becomes an open commons or a patchwork of paywalls, scrapers, and broken UX. Treat agents with parity now, and the ecosystem will repay us with speed, safety, and entirely new modes of value creation.</p><h2>Next: Evolution of Web Use</h2><p>We now turn from infrastructure to behaviour, tracking how familiar rituals like searching, shopping, learning, compress into terse prompts once agents shoulder the work. Part 3 sketches this shift from manual browsing to outcome-oriented delegation, revealing what disappears, what endures, and what entirely new habits emerge.</p><div><hr></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/p/the-agentic-web-part-2-anatomy-of?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for your interest in my thoughts. Now pass the knowledge on!</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/p/the-agentic-web-part-2-anatomy-of?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://bakagiannis.substack.com/p/the-agentic-web-part-2-anatomy-of?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div><hr></div><p>Learn more about the way ADS4GPTS is changing the monetization of the internet by aligning human and AI incentives</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.ads4gpts.com&quot;,&quot;text&quot;:&quot;Visit ADS4GPTS&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.ads4gpts.com"><span>Visit ADS4GPTS</span></a></p>]]></content:encoded></item><item><title><![CDATA[Introduction to the Agentic Web: Vision and Definitions]]></title><description><![CDATA[Explore the internet shift from human-driven interactions to autonomous AI agents.]]></description><link>https://bakagiannis.substack.com/p/introduction-to-the-agentic-web-vision</link><guid isPermaLink="false">https://bakagiannis.substack.com/p/introduction-to-the-agentic-web-vision</guid><dc:creator><![CDATA[Ioannis Bakagiannis]]></dc:creator><pubDate>Mon, 30 Jun 2025 15:32:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wp28!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67f41c29-ff2f-45d1-9fe7-4a8352c68f35_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DKn-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DKn-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 424w, 
https://substackcdn.com/image/fetch/$s_!DKn-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DKn-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png" width="500" height="100" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:100,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20780,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bakagiannis.substack.com/i/167158003?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DKn-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 424w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 848w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 1272w, https://substackcdn.com/image/fetch/$s_!DKn-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218bb88f-8eae-42b3-bf64-bd8f0d76f79d_500x100.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><strong>Imagine a World...</strong></h2><p>Consider this scenario: at 7:02 a.m., before you even silence your morning alarm, your personal AI assistant has quietly booked you a cheaper, lower-carbon flight, seamlessly adjusting your calendar to accommodate this change. Later that morning, as you prepare for a marathon scheduled for Sunday, you instruct your AI agent to procure the ideal pair of running shoes within your preferred price range. It swiftly evaluates dozens of retailers, assesses user reviews, checks inventory availability, and completes the transaction. All you experience is the assurance that the best possible outcome has been delivered effortlessly.</p><h2><strong>What Is the "Agentic Web"?</strong></h2><p>The Agentic Web represents the fourth generation of internet evolution, marking a profound shift from human-driven interactions to autonomous AI agents. Unlike previous iterations&#8212;Web 1.0&#8217;s static pages, Web 2.0&#8217;s interactive and social content, and Web 3.0&#8217;s focus on decentralized data&#8212;the Agentic Web is characterized by proactive, context-aware agents executing tasks on behalf of users.</p><p>In simple terms, as articulated by industry commentators, users "no longer interact directly with applications or APIs, but with intelligent agents acting as active, autonomous intermediaries" (dev.to). These agents are not merely passive tools; they possess the capability to perceive context, reason about goals, and autonomously execute tasks, effectively turning the internet from a passive information repository into a dynamic, collaborative ecosystem.</p><h2><strong>The Evolutionary Journey of the Web</strong></h2><p>To truly appreciate the revolutionary potential of the Agentic Web, let's briefly revisit the previous web generations:</p><ul><li><p><strong>Web 1.0 (1990s&#8211;early 2000s):</strong> Primarily read-only, characterized by static HTML pages and limited interactivity.</p></li><li><p><strong>Web 2.0 (mid-2000s&#8211;2010s):</strong> User-generated content and social interactions, exemplified by platforms like Facebook, Wikipedia, and YouTube.</p></li><li><p><strong>Web 3.0 (2010s&#8211;2020s):</strong> Emphasized decentralization, linked data, semantic content, and user data ownership via technologies such as blockchain.</p></li><li><p><strong>Web 4.0 (2020s&#8211;):</strong> Autonomous AI agents become the primary actors, enabling users to simply declare intentions while agents proactively manage complex interactions, transcending manual tasks and navigation.</p></li></ul><blockquote><p><em>As succinctly summarized by one analyst: "If Web 1.0 was read-only, Web 2.0 let us interact and collaborate, and Web 3.0 focused on decentralization and connected data, Web 4.0 introduces autonomous agents capable of reasoning, acting, and collaborating" (dev.to).</em></p></blockquote><h2><strong>Key Differentiators of the Agentic Web</strong></h2><p>What fundamentally distinguishes Web 4.0 from its predecessors is the shift from explicit, manual interactions to implicit, intent-driven experiences. 
Rather than users manually comparing flights or creating complex dashboards, AI agents autonomously navigate across multiple services, perform comparisons, and assemble customized results.</p><p>This reduces cognitive load, increases efficiency, and enables personalized, contextually relevant outcomes.</p><p>Further, personalization in Web 4.0 moves beyond limited recommendation algorithms, evolving into real-time, context-aware adaptability. Agents continuously learn and remember user preferences, past requests, and behaviours, collaborating dynamically among themselves to fulfill complex tasks in a manner completely tailored to each user&#8217;s immediate context (gate.com).</p><p>This represents a move away from one-size-fits-all interfaces to fully bespoke, agent-generated experiences.</p><h2><strong>Why Now? &#8211; Market and Technological Drivers</strong></h2><p>Several crucial factors are driving the timely emergence of the Agentic Web:</p><p><strong>GPU Economics:</strong> The cost of GPU-based computation, essential for training and running sophisticated AI models, has dramatically fallen&#8212;approximately 70% year-over-year. This significant reduction makes the deployment of continuous, autonomous AI agents economically viable, allowing them to operate efficiently in the background without substantial costs.</p><p><strong>AI Efficiency and Execution:</strong> Advances in machine learning, notably in large language models (LLMs), have significantly increased the efficiency, reliability, and effectiveness of AI agents. Today&#8217;s AI can manage complex multi-step tasks, communicate seamlessly with other agents, and maintain consistent, reliable performance.</p><h2><strong>The Two-Gear Internet: Agent vs. Human Speed</strong></h2><p>The introduction of autonomous agents will create a dual-speed internet:</p><ul><li><p><strong>Agent-to-Agent Interactions:</strong> Fast, efficient, continuous, and automatic communication between AI agents.</p></li><li><p><strong>Human-to-Agent and Human-to-Human Interactions:</strong> Necessarily slower due to human processing limitations, but optimized by AI assistance to ensure maximum efficiency and effectiveness.</p></li></ul><p>Interfaces and gateways capable of seamlessly bridging these speeds will be critical. The future of the internet thus includes dynamic, adaptive interfaces designed specifically to mediate various communication channels, optimizing interactions based on context and participants involved.</p><h2><strong>Desired Features of the Agentic Web</strong></h2><ul><li><p><strong>Accountability and Transparency:</strong><br>AI agents must maintain clear audit trails of their decision-making processes, enabling human oversight and accountability. 
High-stakes decisions should require explainability, ensuring trust and compliance with emerging regulatory frameworks.</p></li><li><p><strong>Security and Robustness:</strong><br>Agents must operate within secure, sandboxed environments, utilizing zero-trust architectures and robust authentication protocols to mitigate risks from malicious actors or inadvertent misuse.</p></li><li><p><strong>Privacy Protection:</strong><br>Strong data protection measures, including on-device data processing, encryption, federated learning, and comprehensive user consent frameworks, should be integral to agent design, aligning with stringent data regulations.</p></li><li><p><strong>Fairness and Ethical Compliance:</strong><br>AI agents must actively mitigate biases and promote fairness, undergoing regular bias audits and adhering to clearly defined ethical guidelines and codes of conduct to ensure equitable outcomes.</p></li><li><p><strong>Human Autonomy and Control:</strong><br>Meaningful human oversight must remain central, particularly for critical decisions, preserving human agency and preventing dependency or deskilling.</p></li><li><p><strong>Human-AI Alignment:</strong><br>AI incentives, optimization and monetization procedures should align with human interests.</p></li><li><p><strong>International Collaboration and Standardization:</strong><br>Cross-border cooperation on regulatory frameworks, ethical standards, and technical interoperability is vital to avoid fragmentation and ensure coherent governance across the global digital ecosystem.</p></li></ul><h2><strong>Where Next: </strong></h2><p>As we delve deeper in the subsequent parts of this series, we will examine the gateways, business models, and ethical considerations inherent in the development of the Agentic Web. This exploration will further illustrate the profound implications and opportunities presented by this next-generation internet ecosystem, fundamentally altering how we engage with digital technology.</p><p>Part 2 is the X-ray that shows which bones bend, which joints break, and where entirely new organs are forming.</p><div><hr></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/p/introduction-to-the-agentic-web-vision?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for your interest in my thoughts. 
Now pass the knowledge on!</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/p/introduction-to-the-agentic-web-vision?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://bakagiannis.substack.com/p/introduction-to-the-agentic-web-vision?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div><hr></div><p>Learn more about the way ADS4GPTS is changing the monetization of the internet by aligning human and AI incentives</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.ads4gpts.com&quot;,&quot;text&quot;:&quot;Visit ADS4GPTS&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.ads4gpts.com"><span>Visit ADS4GPTS</span></a></p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bakagiannis.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Temporal Perspective! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>