Google's search algorithm is perhaps the most important system on the Internet, determining which sites survive and what content appears on the web. But how Google ranks websites has long been largely a mystery, pieced together by journalists, researchers, and people working in search engine optimization.
Now, a shocking leak of thousands of pages of internal documents appears to offer the first behind-the-scenes glimpse into how search works — and suggests that Google hasn't been entirely honest about it for years. So far, Google has not responded to multiple requests for comment on the documents' legitimacy.
Rand Fishkin, who has worked in the SEO industry for more than a decade, said a source shared the 2,500-page document with him in the hopes that reports of the leak would refute “lies” that Google employees have told about how its search algorithm works. Fishkin said the document outlines Google's search API and provides a breakdown of the information available to employees.
The details Fishkin shared are dense and technical, and likely more legible to developers and SEO professionals than to the general public. The leak also doesn't necessarily prove that Google uses the specific data and signals it mentions to rank search results. Rather, the documents outline what data Google collects from web pages, sites, and searchers, offering SEO professionals indirect hints about what Google values, SEO expert Mike King wrote in his summary of the documents.
The leaked documents cover topics such as what data Google collects and uses, which sites Google prioritizes on sensitive topics like elections, and how Google treats small websites. According to Fishkin and King, some information in the documents appears to contradict public statements made by Google representatives.
“'Lie' is a harsh term, but it's the only accurate word that can be used here,” King wrote. “While I don't necessarily blame Google representatives for trying to protect their company's proprietary information, I do take issue with their efforts to actively discredit those in marketing, technology, and journalism who have published reproducible findings.”
Google did not respond to The Verge's requests for comment, which included a direct request to dispute the documents' legitimacy. Fishkin told The Verge in an email that the company had not disputed the veracity of the leak, but that a Google employee had asked him to change some of the wording in his post about how certain events were characterized.
Google's secretive search algorithm has spawned an entire industry of marketers who follow Google's published guidelines and apply them on behalf of millions of businesses around the world. Pervasive and often annoying tactics have fed a common view that Google search results are getting worse, crowded with junk that website owners feel they must produce to get people to visit their sites. In The Verge's past coverage of SEO-driven tactics, Google representatives have often fallen back on a familiar defense: that those tactics aren't what Google's guidelines recommend.
However, some details in the leaked documents call into question the accuracy of Google's public statements about how search works.
One example Fishkin and King cited is whether Chrome data is used in rankings at all. Google representatives have repeatedly said they don't use Chrome data to rank pages, yet Chrome is specifically mentioned in the section on how websites appear in search. In the example screenshot below, the documents indicate that the links appearing beneath the main URL for vogue.com may be generated in part using Chrome data.
Another question is what role EEAT plays in rankings. EEAT stands for Experience, Expertise, Authoritativeness, and Trustworthiness, and is a metric Google uses to evaluate the quality of search results. Google representatives have previously said EEAT is not a ranking factor. Fishkin noted that he found few mentions of EEAT by name in the documents.
However, King detailed how Google collects author data from pages and has a field recording whether an entity on a page is its author. The portion of the document King shared says the field is “primarily developed and tailored for news articles, but is also populated for other content (e.g., scientific articles).” While this doesn't confirm that bylines are an explicit ranking signal, it does show that Google at least tracks this attribute. Google representatives have previously said that author bylines don't affect rankings and are therefore something website owners should add for their readers, not for Google.
While the documents aren't conclusive evidence, they offer a detailed and revealing look at a closely guarded black box of systems. The US government's antitrust lawsuit against Google over search has also led to the release of internal documents that offer further insight into how the company's flagship product works.
Google's general secrecy about how search works has led many websites to look alike, as SEO marketers try to outsmart Google based on the few hints the company does provide. Fishkin also faults publications for accepting Google's public claims and promoting them as fact without much further scrutiny.
“Historically, some of the search industry's loudest and most prolific publishers have been content to uncritically repeat Google's public statements. Rather than headlines like, 'Google asserts XYZ, but the evidence suggests otherwise,' they write headlines like, 'Google says XYZ is true,'” Fishkin wrote. “Please, try harder. If this leak and the Justice Department trial can produce just one change, I hope it's this.”