Letter to Regulators and Lawmakers on Antitrust Concerns Regarding the Integration of Generative AI into Search

By Robert Weissman

Dear Federal Trade Commission Chair Lina Khan, Assistant Attorney General for the Antitrust Division of the Department of Justice Jonathan Kanter, Senator Amy Klobuchar, Senator Mike Lee, Representative Thomas Massie, and Representative J. Luis Correa,

We are writing to urge you to investigate, and to issue guidelines on, the incorporation of large language model (LLM) artificial intelligence (AI) into internet search functionality by Google, Microsoft and potentially other firms. The prospect of Google integrating its new Gemini AI into its standard search function in early 2024 makes this request especially urgent.

Although the precise ways that LLMs will be incorporated into search are obviously evolving, we fear that LLM incorporation may unfairly and substantially injure competition. Even more profoundly, we fear that LLM incorporation could enable dominant search firms effectively to enclose and privatize the open internet.

To avert these dangers, we urge you to launch an immediate investigation of LLM incorporation into search and to issue guidelines on anti-competitive practices as expeditiously as possible.

The core concern we see is simply stated: LLM-provided narrative search results may provide returns that synthesize and effectively appropriate content available on the internet, diminishing the likelihood that users will click through to links of the original authors and providers of the information. This shift in user behavior may cause vast and deep injuries to internet content providers and fundamentally and detrimentally change how the internet works.

We want to emphasize that this concern relates to, but is fundamentally distinct from, issues relating to the training of LLMs.

I. Historical precedent and current context

The practice of search results returning direct answers to questions, rather than simply providing links to relevant sites, is not wholly new, of course – and it has already been a source of major controversy. Content providers for years have complained that dominant search firms are improperly using their content. For example, in 2019 testimony before the House of Representatives antitrust subcommittee, Brian Warner, the founder of CelebrityNetWorth.com, explained that Google’s decision to provide information from his site as a response to search queries – rather than simply a link to the site – caused traffic to his site to plummet, eventually by 80 percent.[1]

Concluded the House antitrust subcommittee in its 2020 report: “Google’s practice of misappropriating third-party content to bootstrap its own rival search services and to keep users on Google’s own webpage is further evidence of its monopoly power and an example of how Google has abused that power. Google seized value from third-party businesses without their consent. These businesses had no effective choice but to allow Google’s misappropriation to continue, given Google’s search dominance. In this way, Google leveraged its search dominance to misappropriate third-party content, free-riding on others’ investments and innovations.”[2]

This practice injures not just competitors but future investment in web-based content, innovation on the internet and the open internet itself. The House antitrust subcommittee report highlighted that multiple companies and investors told the committee they were deterred from new investments and experimenting with new forms of content because of fear that Google would reap the rewards.

The antitrust subcommittee also noted the broader problem of enclosing the internet, highlighting studies showing that a majority of search requests resulted in no clicks outside of the Google ecosystem. A 2022 study found the proportion of zero-click search requests had jumped to 57 percent on mobile and 53 percent on desktop.[3] That study, however, also refined its categorization to show that a substantial portion of zero-click searches in fact reflected users refining their search. About 18 percent of the zero-click searches were refined searches and about 10 percent were within the Google ecosystem.[4] Whatever the precise numbers, the point remains that current search engine practices foreshadow what may be coming at a much greater scale with the integration of LLMs into search.

In the following two sections, we highlight emerging practices and concerns with Google and Microsoft’s LLM-enabled search tools. In Section IV, we describe what we see as the anti-competitive harms, urging you to investigate further and issue guidelines to protect the open internet.

II. Google’s LLM-enabled search

In February 2023, Google announced the release of Bard, its new LLM tool. In the announcement, the company indicated that the LLM would soon be integrated into its search engine.[5]

“Soon,” wrote company CEO Sundar Pichai, “you’ll see AI-powered features in Search that distill complex information and multiple perspectives into easy-to-digest formats, so you can quickly understand the big picture and learn more from the web: whether that’s seeking out additional perspectives, like blogs from people who play both piano and guitar, or going deeper on a related topic, like steps to get started as a beginner. These new AI features will begin rolling out on Google Search soon.”[6]

Bard now provides answers to queries that include narrative responses to complex questions, drawing on information widely available on the web.

Three examples follow, in response to queries we posed on December 26, 2023: 1) asking about the impact of climate change on the Amazon rainforest; 2) querying for an explanation of black holes; and 3) requesting a table on wildlife species in the Serengeti National Park in Tanzania.

NOTE: For examples, download the PDF version of the letter.

In December 2023, Google announced the release of Gemini, its largest and most powerful AI model.[7] The company said that Gemini’s rollout would occur in phases, soon to be incorporated into search, as well as the Chrome browser.[8]

It is the prospect of integration into Google’s dominant search engine that makes it so urgent for you to investigate the integration of LLMs into search and to issue proactive guidelines.

III. Bing’s LLM-enabled search

In February 2023, Microsoft announced that it was integrating an OpenAI-generated LLM into its Bing search tool.[9]

Microsoft explained that the benefit of the LLM integration was to offer more direct replies in search and to provide more complex, direct answers to questions than traditional search can provide. The new tool, it said, would provide:

“Better search. The new Bing gives you an improved version of the familiar search experience, providing more relevant results for simple things like sports scores, stock prices and weather, along with a new sidebar that shows more comprehensive answers if you want them.

“Complete answers. Bing reviews results from across the web to find and summarize the answer you’re looking for. For example, you can get detailed instructions for how to substitute eggs for another ingredient in a cake you are baking right in that moment, without scrolling through multiple results.”[10]

Both of these stated purposes effectively mean that users will get answers inside the search response, with no need to click through to other websites. Bing answers generally provide footnotes, but there is every reason to expect few people to click through; indeed, Bing’s objective is to give users the information they are seeking in the search response.

In September 2023, Microsoft announced that it would incorporate the Bing chat tool into Microsoft 365. Now rebranded as Microsoft Copilot, the AI tool is designed to enable users to access AI tools to manage and manipulate their own information and that of the business enterprise, as well as to provide LLM-enabled search results.[11] In its updated form, Copilot continues to aim to give detailed and synthetic answers in search responses. (“When you ask complex questions, Bing gives you detailed replies.” And: “Copilot looks at search results across the web to offer you a summarized answer and links to its sources.”[12])

Bing and Copilot do provide detailed, narrative answers to queries, based on information on the internet. Three examples follow, in response to queries we posed on December 26, 2023: 1) asking about the impact of climate change on the Amazon rainforest; 2) querying for an explanation of black holes; and 3) requesting a table on wildlife species in the Serengeti National Park in Tanzania.

NOTE: For examples, download the PDF version of the letter.

IV. Integration of LLMs into search poses anti-competitive concerns

Our concern is that search results from dominant companies will effectively enclose and privatize the expansive and diverse information of the internet, built through an incalculable number of volunteer hours and massive public and private monetary investment devoted to developing, formatting and presenting internet content. As noted, this problem already exists, but it threatens to become far worse, as LLM-assisted search results provide not just facts and information snippets but synthetic and complex answers.

Search companies may be able to unfairly benefit from the investment of competitors and gain unfair advantage over them. It will be increasingly difficult for content providers to monetize their investments – or for nonprofits to gain followers or for volunteers to get credit – if users get all the information they are seeking from search without clicking through to the content providers’ websites. In this scenario, the incentives to develop and innovate web-based content will diminish still further, threatening the vigor and even viability of the open internet.

The examples provided above illustrate these concerns. The Copilot-generated table of species in the Serengeti comes directly from a single source,[13] leaving users little reason to click through to that source. Provided with the detailed response from Bard about the impact of climate change on the Amazon, only the most interested users would click through to the linked sources. The Bard response to the query on black holes did not include any links or footnotes, providing no direct pathway to the sources on which it drew.

A further example elaborates on the concern. By way of explaining what it is, Bing prompted us to ask it to prepare a table of volcanic activity over the last 10 years. It replied with a table and provided as its first source a link to starctmag.com.[14] The starctmag.com page included an almost identical table, as revealed below. (As it happens, the starctmag.com table is itself likely derived from some other original source.)

NOTE: For examples, download the PDF version of the letter.

These problems cannot be addressed by private litigation, at least not on a systemic basis. Cases of nearly direct copying might give rise to a private lawsuit for copyright infringement, but in most cases the use of third-party content will be less direct and singular. Where LLM-generated content is synthesized from multiple sources, copyright claims will be difficult or impossible. Compounding the challenge of private enforcement, LLMs will likely provide differing answers to the same question over time. Copyright enforcement will be unrealistic for most small businesses, nonprofits and individuals. And potentially injured parties may well be unaware of how LLMs are using their content. In short, private copyright enforcement is unlikely to address this problem.

Copyright protection will fall short for another, even more important, reason: The problem we are highlighting extends beyond copyright infringement. It involves the unfair leveraging of dominant position by search and Big Tech companies; unfair methods of competition; threats to the open internet; and unjust enclosure and privatization of the information commons.

A broader lens is needed. Looking forward, we urge you to investigate LLMs and search, including by assessing these questions:

  • What impact will LLM search replies have on the internet and the information commons, including on the future generation of content by individuals and for-profit and nonprofit entities?
  • What are the click-through impacts of providing both specific answers and rich, narrative replies in search?
  • What are the mechanisms by which LLM search replies draw from sources on the internet?
  • What property and contractual claims are impacted by LLM search replies?
  • Should LLM search systems owe compensation to content providers?
  • Are private remedies available for content providers unfairly and adversely affected by LLM search replies?
  • Do search companies integrating LLM replies have a fair use right to draw on internet content; and if so, to what extent and in what instances does that fair use right apply?
  • Is the provision of links or footnotes at the end of a rich, narrative reply sufficient to guard against unfair appropriation and the wrongful leveraging of dominant position?
  • Does existing law provide sufficient remedy to the problems posed by LLM incorporation into search?
  • How does the provision of search responses in audio format affect these questions? For example, what role do links or footnotes play if an LLM-enabled search engine provides an audio response to a query?

Concurrently, we encourage the Federal Trade Commission and/or the U.S. Department of Justice to issue guidelines on the fair use of LLMs in search. We urge consideration of two specific policies as part of a broader guidance:

First, for dominant search platforms – which we believe should include Microsoft’s Bing/Copilot because of Copilot’s integration into Microsoft 365 – provision of rich, narrative responses that reflect content drawn from the internet should be considered a prohibited, anti-competitive practice. (Again, we emphasize that we are discussing a practice distinct from training on internet content to develop the LLM.)

Second, at minimum, LLM-enabled narrative search responses that effectively borrow from individual, specific sources must be prohibited. In establishing this principle, the neural network process by which an LLM generates a reply should not matter; what should matter is the effect. If a search response looks like it is relying on a single source – by producing largely identical content – it should not matter how complicated the process by which the response is generated may be.

We acknowledge that the questions we are raising are difficult, but they require rapid answers in light of changing technology.

Thank you for your leadership and for considering these matters.

Sincerely,

 

Robert Weissman

President, Public Citizen

Endnotes

[1] Brian Warner, testimony before the House Judiciary Subcommittee on Antitrust, Commercial and Administrative Law, July 16, 2019, https://www.congress.gov/116/meeting/house/109793/documents/HHRG-116-JU05-20190716-SD015.pdf

[2] Investigation of Competition in Digital Markets, Majority Staff Report and Recommendations, House Judiciary Subcommittee on Antitrust, Commercial and Administrative Law, 2020, page 187, https://democrats-judiciary.house.gov/uploadedfiles/competition_in_digital_markets.pdf

[3] Marcus Tober, “Zero-Clicks Study,” Semrush, October 25, 2022, https://www.semrush.com/blog/zero-clicks-study

[4] See additional analysis: Danny Goodwin, “Google Search Study: 25.6% of Desktop, 17.3% of Mobile are Zero-Click,” Search Engine Land, October 25, 2022, https://searchengineland.com/zero-click-study-semrush-389067

[5] Sundar Pichai, “An Important Next Step on our AI Journey,” The Keyword, Google, February 6, 2023, https://blog.google/technology/ai/bard-google-ai-search-updates

[6] Sundar Pichai, “An Important Next Step on our AI Journey,” The Keyword, Google, February 6, 2023, https://blog.google/technology/ai/bard-google-ai-search-updates

[7] Sundar Pichai, “A Note,” The Keyword, Google, December 6, 2023, https://blog.google/technology/ai/google-gemini-ai/#sundar-note

[8] Demis Hassabis, “Introducing Gemini,” The Keyword, Google, December 6, 2023, https://blog.google/technology/ai/google-gemini-ai/#availability

[9] Yusuf Mehdi, “Reinventing Search with New AI-Powered Microsoft Bing and Edge, Your Copilot for the Web,” Official Microsoft Blog, Microsoft, February 7, 2023, https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web

[10] Yusuf Mehdi, “Reinventing Search with New AI-Powered Microsoft Bing and Edge, Your Copilot for the Web,” Official Microsoft Blog, Microsoft, February 7, 2023, https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web

[11] Yusuf Mehdi, “Announcing Microsoft Copilot, Your Everyday AI Companion,” Official Microsoft Blog, Microsoft, September 21, 2023, https://blogs.microsoft.com/blog/2023/09/21/announcing-microsoft-copilot-your-everyday-ai-companion

[12] “Frequently Asked Questions,” Microsoft, accessed December 26, 2023, https://www.microsoft.com/en-us/bing?ep=258&es=31&form=MA13FV#faq

[13] “21 Common Animals in Serengeti National Park,” Earthlife Expeditions, accessed December 26, 2023, https://www.earthlifeexpeditions.com/common-animals-in-serengeti-national-park

[14] “Organize the Last 10 Years of Volcanic Activity into a Table,” Starctmag.com, November 30, 2023, https://starctmag.com/top-news/organize-the-last-ten-years-of-worldwide-volcanic-activity-into-a-table-24952-2023