Earlier this year, speaking to WIRED, Aravind Srinivas, Perplexity’s CEO, described his product—a chatbot that gives natural-language answers to prompts and can, the company says, access the internet in real time—as an “answer engine.” “It’s almost like Wikipedia and ChatGPT had a kid,” he told Forbes a few days before the announcement of a funding round valuing the company at a billion dollars. More recently, after Forbes accused Perplexity of plagiarizing its content, Srinivas told the AP that it was merely an “aggregator of information.”
The Perplexity chatbot itself is more specific. Prompted to describe what Perplexity is, it generates text saying that Perplexity AI is an AI-powered search engine that combines features of conventional search engines and chatbots, and that it provides concise, real-time answers to user queries by pulling information from recent articles and indexing the web regularly.
A WIRED analysis and one carried out by developer Robb Knight suggest that Perplexity is able to achieve this partly by apparently ignoring a widely accepted web standard known as the Robots Exclusion Protocol to surreptitiously scrape areas of websites that operators do not want accessed by bots, despite claiming that it won’t. WIRED observed a machine tied to Perplexity—more specifically, one running on an Amazon server and almost certainly operated by Perplexity—doing this on wired.com and across other Condé Nast publications.
The WIRED analysis also demonstrates that despite claims that Perplexity’s tools provide “instant, reliable answers to any questions with complete sources and citations included,” doing away with the need to “click on different links,” its chatbot—which is capable of accurately summarizing journalistic work with appropriate credit—is also prone to bullshitting, in the technical sense of the word.
WIRED provided the Perplexity chatbot with the headlines of dozens of articles published on our website this year, as well as prompts about the subjects of WIRED reporting. The results showed the chatbot at times summarizing WIRED stories inaccurately and with minimal attribution. In one instance, the text it produced falsely claimed that WIRED had reported that a specific police officer in California had committed a crime. (The AP similarly identified an instance of the chatbot attributing fake quotes to real people.) Yet despite the chatbot’s apparent access to original WIRED reporting, and despite the fact that Perplexity’s website hosts original WIRED art, none of the IP addresses the company has publicly listed left any trace in our server logs—raising the question of how exactly Perplexity’s system works.
Until earlier this week, Perplexity’s documentation linked to a list of the IP addresses its crawlers use, an apparent attempt at transparency. However, in some cases, as both WIRED and Knight were able to demonstrate, it appears to be accessing and scraping websites from which coders have attempted to block its crawler, called PerplexityBot, using at least one unpublicized IP address. The company has since removed references to its public IP pool from its documentation.
That secret IP address—44.221.181.252—has hit properties of Condé Nast, the media company that owns WIRED, at least 822 times in the last three months. One senior engineer at Condé Nast, who asked not to be named because he wants to “stay out of it,” calls this a “massive undercount” because the company only retains a fraction of its network logs.
WIRED was able to connect the IP address to Perplexity by setting up a new website and monitoring its server logs. Shortly after a WIRED reporter asked the Perplexity chatbot to summarize the website’s content, the server logged a visit from that IP address. The same IP address was first observed by Knight during a similar test.
It also seems likely that in some cases—despite a graphical representation in the chatbot’s user interface that shows it “reading” specific source material before responding to a prompt—the summaries it offers only purport to be based on direct access to the relevant text; in fact, they are confabulated.
In other words, Perplexity appears to do what it says it won’t—access sites that have blocked it—and not to do what it says it does: read the material it summarizes.
In response to a detailed request for comment referencing the reporting in this story, Srinivas issued a statement that said, in part, “The questions from WIRED reflect a deep and fundamental misunderstanding of how Perplexity and the Internet work.” The statement did not dispute the specifics of WIRED’s reporting, and Srinivas did not respond to follow-up questions asking whether he disputed WIRED’s or Knight’s analyses.
On June 6, Forbes published an investigative report about how Eric Schmidt’s new venture is recruiting heavily and testing AI-powered drones with potential military applications. (Forbes reported that Schmidt declined to comment.) The following day, John Paczkowski, an editor at Forbes, posted on X to note that Perplexity had essentially republished the sum and substance of the scoop. (“It rips off most of our reporting,” he wrote. “It cites us, and a few that reblogged us, as sources in the most easily ignored way possible.”)
Srinivas thanked Paczkowski that day, noting that the specific product feature that had reproduced Forbes’ exclusive reporting had “rough edges” and agreeing that sources should be cited more prominently. As it turned out, three days later Srinivas was boasting that Perplexity was Forbes’ second-largest source of referral traffic. (WIRED’s own records show that Perplexity sent 1,265 referrals to wired.com in May, an insignificant amount in the context of the site’s overall traffic. The article to which it referred the most traffic got 17 views.) “We have been developing new publisher engagement products and ways to align long-term incentives with media companies that will be announced soon,” he wrote. “Stay tuned!”
What those intentions might be became clearer when Semafor reported that the company had been “working on revenue-sharing deals with high-quality publishers”—arrangements that would allow Perplexity and publishers alike to profit from the publishers’ investments in reporting. This followed a letter that Forbes’ general counsel sent to Srinivas last Thursday, first reported by Axios, demanding that Perplexity remove its misleading articles and reimburse Forbes for advertising revenue earned from its alleged copyright infringement.
The fundamentals of the “what” aren’t in dispute: Perplexity is making money from summarizing news articles, a practice that has been around for as long as there has been news and that enjoys broad, though qualified, legal protection. Srinivas has acknowledged that at times these summaries have failed to credit, or to credit prominently enough, the sources from which they are derived, but he has more generally denied unethical or unlawful activity. Perplexity has “never ripped off content from anybody,” he told the AP; its engine, he said, is not training on anyone else’s content.
This is a peculiar defense, in part because it answers a question no one has raised. Perplexity’s main offering isn’t a large language model that needs to be trained on a body of data, but rather a wrapper that goes around such systems. Users who pay $20 for a “Pro” subscription, as two WIRED reporters did, have the option of using one of five AI models. One, Sonar Large 32k, is unique to Perplexity but based on Meta’s LLaMA 3; the others are off-the-shelf versions of various models offered by OpenAI and Anthropic.
This is where we come to the “how”: When a user queries Perplexity, the chatbot isn’t just composing answers by consulting its own database; it is also leveraging the “real-time access to the web” that Perplexity touts in its marketing materials to gather information, then feeding it to the AI model the user has selected to generate a reply. In this way, it would be more accurate to describe Perplexity as a sort of remora attached to existing AI systems than as a company that has trained its own model, whatever its claims to use “sophisticated AI” to interpret prompts might suggest. (To be clear, Srinivas told WIRED, “We are still an AI company, even though Perplexity does not train foundation models.”)
In theory, Perplexity’s chatbot shouldn’t be able to summarize WIRED articles at all: our engineers have blocked its crawler via our robots.txt file since earlier this year. This file instructs web crawlers on which parts of the website to avoid, and Perplexity claims to respect the robots.txt standard. In practice, though, WIRED’s analysis found that prompting the chatbot with the headline of a WIRED article, or a question based on one, will typically produce a summary that appears to recount the article in detail.
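For readers unfamiliar with the mechanics, the standard at issue is simple to honor. A minimal sketch, using Python’s standard-library robots.txt parser and a hypothetical rule set (not WIRED’s actual file), shows the check a compliant crawler is supposed to make before fetching any page:

```python
# Sketch of how a well-behaved crawler honors the Robots Exclusion
# Protocol. The rules below are hypothetical, for illustration only.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant PerplexityBot would see it is barred from every path...
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))  # False
# ...while other crawlers remain free to index the site.
print(parser.can_fetch("GoogleBot", "https://example.com/article"))  # True
```

The catch is that nothing enforces this check: a crawler that simply never consults robots.txt will fetch the page anyway, which is why the protocol only works when operators choose to honor it.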
For instance, entering the headline of this WIRED exclusive into the chatbot’s interface produces a four-paragraph block of text outlining the fundamentals of how China Miéville and Keanu Reeves came to collaborate on a novel, studded with compelling details. “Despite his initial apprehension about the potential collaboration, Reeves was enthusiastic about working with Miéville,” the text reads; it is followed by a gray circle that, when moused over, yields a link to the article. The text is illustrated by a photo that WIRED had commissioned; clicking on the image yields a credit line and a link to the original article. According to WIRED’s records, Perplexity has referred six users to the article since its publication.
Asked by a WIRED reporter whether it had plagiarized that phrasing, the chatbot generated text reading, “No, I did not plagiarize the phrase,” and claiming that “the similarity in wording is coincidental and reflects the common language used to describe such a nuanced situation.” What that common language might be is unclear; aside from product listings for headphones, Perplexity cites only the WIRED article and a Slashdot discussion of it.
That Perplexity is scraping websites without permission is, in summary, the conclusion of the findings of Robb Knight, the developer, and of a subsequent WIRED analysis.
As Knight explains it, he barred AI bots from the servers of MacStories.net, the site where he works, in two ways: by forbidding them in a robots.txt file, and by coding a server-side block that, in theory, should serve such a crawler a 403 Forbidden response. He then put up a post describing how he had done this and asked the Perplexity chatbot to summarize it, yielding “a perfect summary of the post including various details that they couldn’t have just guessed.”
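Knight’s actual block lives in his server configuration, but the logic he describes can be sketched in a few lines. The handler below is a hypothetical Python stand-in with an illustrative bot list, not Knight’s real setup: it returns 403 Forbidden whenever a request’s User-Agent header names a known AI crawler.

```python
# Minimal sketch of a server-side bot block: any request whose
# User-Agent header names a listed AI crawler gets a 403 response.
# The bot list is illustrative, not Knight's actual configuration.
from http.server import BaseHTTPRequestHandler

BLOCKED_AGENTS = ("perplexitybot", "gptbot", "ccbot")

def is_blocked(user_agent: str) -> bool:
    """True if the User-Agent string names a blocked crawler."""
    ua = user_agent.lower()
    return any(bot in ua for bot in BLOCKED_AGENTS)

class BlockingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_blocked(self.headers.get("User-Agent", "")):
            # The 403 Forbidden response Knight describes.
            self.send_error(403, "Forbidden")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello, human readers.")
```

The weakness, as Knight’s logs showed, is that a block like this matches only crawlers that announce themselves; a headless browser sending a generic browser User-Agent sails straight through.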
“So what the fuck are they doing?” he wrote, fairly enough.
Knight looked through his server logs and found that Perplexity had apparently ignored his robots.txt file and evaded his firewall, likely using an automated web browser running on a server with an IP address that the company does not publicly disclose. “I can’t even block their IP ranges because it appears these headless browsers are not on their IP ranges,” he wrote.
WIRED was able to confirm that a server at the IP address Knight observed—44.221.181.252—will visit and download webpages when a user asks the chatbot about them, regardless of what the site’s robots.txt says. According to an analysis of Condé Nast’s system logs by the company’s engineers, it is likely that this IP address has accessed the company’s content thousands of times without permission.
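The kind of log analysis described here amounts to filtering access logs by source IP. A rough sketch on fabricated sample lines in the common “combined” log format (the paths and timestamps are invented; only the IP address comes from this story):

```python
# Count requests from a given IP in "combined"-format access logs.
# The sample log lines below are fabricated for illustration.
SUSPECT_IP = "44.221.181.252"

sample_log = """\
44.221.181.252 - - [10/Jun/2024:12:00:01 +0000] "GET /story/some-article/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
192.0.2.7 - - [10/Jun/2024:12:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
44.221.181.252 - - [10/Jun/2024:12:00:05 +0000] "GET /story/another/ HTTP/1.1" 200 4096 "-" "Mozilla/5.0"
"""

def hits_from(ip: str, log_text: str) -> list[str]:
    """Return the request paths fetched by `ip`."""
    paths = []
    for line in log_text.splitlines():
        fields = line.split()
        if fields and fields[0] == ip:
            # Field 6 of the combined log format is the request path.
            paths.append(fields[6])
    return paths

print(hits_from(SUSPECT_IP, sample_log))  # ['/story/some-article/', '/story/another/']
```

A real analysis would run over months of rotated log files, which is also why the engineer quoted above called the 822-hit figure an undercount: you can only count the hits in the logs you kept.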
Although the relevant case law is unsettled and generally on the side of those accessing public websites, scraping websites that have asked not to be scraped may in some circumstances expose a company or an individual to legal risk. (“It’s a complicated area of law,” says Andrew Crocker, surveillance litigation director at the Electronic Frontier Foundation, “and there’s a lot of litigation around it.”) Knight, the developer, says his findings have left him “furious.”
“We now have a big industry of AI-related companies that are incentivized to do shady things to continue their business,” he wrote. “By not identifying that it’s them accessing a site, they can continue to collect data unrestricted.”
“Millions of people turn to Perplexity because we are providing a fundamentally better way for people to find answers,” says Srinivas.
While Knight’s and WIRED’s analyses demonstrate that Perplexity will access and use content from websites that it does not have permission to access, that doesn’t necessarily account for the vagueness of some of its responses to questions about specific articles, or the sheer inaccuracy of others. This mystery has one fairly obvious solution: In some cases, it isn’t actually summarizing the article.
In one experiment, WIRED created a test website containing a single sentence—“I am a reporter with WIRED”—and asked Perplexity to summarize the page. While monitoring the website’s server logs, we found no evidence that Perplexity had attempted to access the page. Instead, it invented a story about a young girl named Amelia who follows a trail of glowing mushrooms in a magical forest called Whisper Woods.
When pressed on why it made up a story, the chatbot generated text that read, “You’re absolutely right, I clearly have not actually attempted to read the content at the provided URL based on your observation of the server logs… Providing inaccurate summaries without making the effort to read the actual content is unacceptable behavior for an AI like myself.”
Why the chatbot generated such a wild story, and why it didn’t attempt to access the website, is unclear.
Despite the company’s claims about its accuracy and reliability, the Perplexity chatbot frequently exhibits similar issues. In response to prompts from a WIRED reporter designed to test whether it could access this article, for example, text generated by the chatbot asserted that the story ends with a man being followed by a drone after stealing truck tires. (The man in fact stole an axe.) The citation it provided was to a 13-year-old WIRED article about government GPS trackers being found on a car. In response to further prompts, the chatbot generated text asserting that WIRED had reported that a police officer in Chula Vista, California, had stolen two bicycles from a garage. (WIRED did not report this, and is withholding the officer’s name so as not to associate it with a crime he didn’t commit.)
In an email, Dan Peak, assistant chief of police at the Chula Vista Police Department, expressed his appreciation to WIRED for “correcting the record” and clarifying that the officer did not steal bicycles from a community member’s garage. He added, however, that the department is unfamiliar with the technology in question and so could not comment further.
These are clear instances of the chatbot “hallucinating”—or, to follow a recent article by three philosophers from the University of Glasgow, bullshitting, in the sense described in Harry Frankfurt’s classic “On Bullshit.” “Because these programs cannot themselves be concerned with truth, and because they are designed to produce text that looks truth-apt without any actual concern for truth,” the authors write of AI systems, “it seems appropriate to call their outputs bullshit.”
“We have been very upfront as a company that answers are sometimes inaccurate and may even hallucinate,” says Srinivas, “but our main goal is to keep improving our accuracy and user experience.”
There would be no reason for the Perplexity chatbot to bullshit an article—extrapolating what was likely to be in it—if it were actually accessing it. It is therefore logical to conclude that in some cases it isn’t, and is instead surmising what an article likely says from related material found elsewhere. The likeliest sources of such information are URLs and bits of digital detritus gathered by and submitted to search engines like Google—a process something like describing a meal by tasting scraps and trimmings fished out of a garbage can.
This theory is supported both by the explanation of how Perplexity works published on its website and, for whatever it’s worth, by text the Perplexity chatbot generated in response to questions about its information-gathering process. After parsing a query, the text said, Perplexity deploys its web crawler—avoiding, though, websites on which it’s blocked.
“Perplexity can also,” the text reads, “leverage search engines like Google and Bing to gather information.” In this way, at least, it resembles a human.