Artificial Intelligence company Perplexity has strongly denied allegations of “data theft” after being sued by Reddit in a New York Federal Court.
Perplexity Rejects Reddit’s Claims of Data Theft
The lawsuit, filed on 22 October, names Perplexity—founded by Aravind Srinivas—along with three other entities accused of scraping Reddit content for commercial benefit. Reddit described the alleged activity as an “industrial-scale, unlawful” operation that extracted millions of user comments without authorization.
According to Reddit, this conduct violated the platform’s terms of service and allowed Perplexity to profit from Reddit’s vast repository of user-generated content. The Associated Press reported that the lawsuit seeks to challenge what Reddit considers unauthorized harvesting of data to fuel commercial AI products.
Perplexity operates an AI-powered “answer engine,” a system that allows users to retrieve summarized information from across the internet. The company immediately issued a public defense, calling Reddit’s lawsuit a “sad example” of the tension that arises when public data becomes a major component of a public company’s business model.
Billionaire Showdown! Musk Blasts Reid Hoffman for Allegedly Bankrolling Anti-Tesla Protests
Perplexity contended that it does not train large language models or foundation models like OpenAI’s ChatGPT or Google’s Gemini. Instead, it focuses on building an interface that organizes and presents information to users. In its response, the company claimed that Reddit’s lawsuit is a “show of force” intended to strengthen its bargaining position in licensing negotiations with major AI developers.
Perplexity’s Defense and Counterclaims
In a detailed statement, Perplexity denied the accusation that it ignored Reddit’s licensing requests. The company stated that as an “application-layer” firm, it does not engage in model training and therefore does not require a license to use Reddit content in the way it does. It explained that Reddit insisted on payment despite Perplexity’s lawful access to public Reddit data, a demand the company rejected as “strong-arm tactics.”
The AI startup accused Reddit of attempting to “extort” payments from technology companies through aggressive legal measures. “In any case, we won’t be extorted, and we won’t help Reddit extort Google, even if they’re our (huge) competitor,” the company wrote in a Reddit post. “Perplexity will play fair, but we won’t cave. And we won’t let bigger companies use us in shell games.”
Perplexity further clarified that its use of Reddit content is limited to summarizing existing discussions and referencing Reddit threads for transparency. The company stated that such practices are necessary for users to verify the sources of AI-generated information. It emphasized that all Reddit data used by Perplexity is publicly available and accessed in accordance with legal norms.
Screenshots from Perplexity’s public post on Reddit, shared in two parts, further outlined its stance that Reddit’s demands are inconsistent with fair use principles. The company argued that Reddit’s lawsuit misrepresents the nature of its operations and fails to distinguish between entities that train AI models and those that use AI to retrieve and organize information.
The Broader Legal and Industry Context
The Reddit lawsuit targets not only Perplexity but also several infrastructure companies linked to the AI data ecosystem. The filing names Lithuanian web-scraping firm Oxylabs UAB, Texas-based startup SerpApi, and a proxy service called AWMProxy. Reddit claims that these companies form part of a data supply chain that enables AI tools to access vast amounts of user-generated content from online platforms.
Shock Charge! Musk’s Tesla Zapped by Trump’s Tariff Storm
This marks Reddit’s second lawsuit of this nature. Earlier in June, the company filed a separate case against AI startup Anthropic, alleging similar unauthorized use of Reddit data to train AI models. However, the latest filing extends the scope of Reddit’s legal actions by including smaller, lesser-known firms that provide the technical backbone for data gathering.
According to the Associated Press, Reddit’s legal complaint underscores the platform’s growing emphasis on monetizing access to its user content. The company has previously signed multimillion-dollar licensing deals with Google, OpenAI, and other major firms, allowing them to use Reddit discussions for AI model training. These agreements have become an essential part of Reddit’s revenue model, particularly after its debut on Wall Street.
Perplexity, in contrast, claims that such licensing deals are primarily aimed at model makers, not application-layer platforms. It noted that major AI developers are now scaling back or walking away from expensive licensing contracts with Reddit, citing information from recent earnings reports.
The case brings renewed attention to the ongoing tension between data-rich social media companies and AI firms that rely on publicly available information to operate. While Reddit asserts that its content deserves licensing protections, Perplexity argues that it is using publicly accessible data in a lawful and transparent way.
Trump’s $15 billion lawsuit against The New York Times dismissed by federal judge
The lawsuit is expected to test the boundaries between open internet data and proprietary digital assets—a growing area of legal dispute as AI technologies continue to expand their reliance on user-generated content.