A trove of documents explaining how Google ranks search results was discovered on GitHub by several SEOs, who claim the material contradicts Google's long-standing public statements about its ranking process. Google initially stayed silent on the issue but has since confirmed that the leaked documents are authentic.
The documents, which appear to have been accidentally committed to GitHub by Google's automated tools around mid-March, describe data that Google may have tracked and used in the company's secretive ranking algorithms, offering a rare glimpse into one of the most consequential systems shaping the internet.
In a statement to The Verge, Google spokesperson Davis Thompson said the company advises people to be careful not to make inaccurate inferences about Search based on "out-of-context, out-of-date or incomplete information." "We share extensive information about how Search works and the types of factors our systems weigh, while also working to protect the integrity of our search results from manipulation," he said.
What was leaked and how?
The leaked materials were first outlined by search engine optimization experts Rand Fishkin (SparkToro) and Mike King (iPullRank), who published analyses of the documents and their contents a few days ago. The material was reportedly first discovered by another SEO expert, Erfan Azimi of EA Digital Eagle.
Researchers note that the leak appears to stem from Google's automated tooling: on March 13, a bot committed the documentation under the Apache 2.0 open source license, standard practice for Google's public documentation, and a subsequent commit on May 7 reportedly attempted to undo it.
The leaked documents describe an old version of Google's Content Warehouse API, offering an inside look at how search rankings may work. They do not contain any code, just references to internal systems and projects, and likely served as internal documentation of the processes involved.
It's worth noting that while Google has previously published similarly named Google Cloud API documentation, the GitHub material seems to go much further: it includes references to what Google considers important when ranking web pages for relevance, exactly the kind of detail the SEO community has long sought.
There's still a lot we can speculate about.
The more than 2,500 pages of documentation describe over 14,000 attributes that are accessible through or associated with the API. How much Google actually uses these signals remains a matter of speculation, since the documents contain no information about the weight Google gives each one in its ranking algorithm.
Still, SEO consultants believe the documents contain ample detail and differ significantly from what Google publishes from time to time. Fishkin wrote in his post that the leak contradicts longstanding public statements from Google employees, "notably the company's repeated denials that it employs click-centric user signals or that subdomains are considered separately in rankings."
But some entries do offer clarity.
In his post, King quoted Google Search advocate John Mueller, who said in a video that the company doesn't have anything like a website authority score to measure whether Google considers a site authoritative and worthy of ranking highly in search results.
However, the documents published on GitHub reveal that the compressed quality signals Google stores for documents include a computable "siteAuthority" score. Another attribute records click types as a ranking factor for web searches, and another, which appears as "ChromeInTotal" in the leaked API, uses site views from Chrome as a quality signal.
There are also references in the documentation that confirm what we've known for years: that Google considers factors such as content freshness, authorship, whether a page matches the site's core focus, the alignment of page titles with content, and the average weighted font size of terms in the body of a document.
How much can we believe it?
While some may question the authenticity and currency of the documents, given Google's close secrecy regarding its algorithms, they are sure to be a treasure trove for the SEO, marketing and publishing industries. These leaks, coming on the heels of Google's testimony in the US Department of Justice's antitrust lawsuit, are significant.
The choices Google makes about search have a huge impact on how the internet works for every person and business. Needless to say, there has been a proliferation of experts who claim to have found ways to outwit Google's algorithms through SEO efforts. Google has always been vague about its process, which makes these leaked documents all the more valuable.
Additionally, the company's response, which acknowledged the leak while downplaying it, suggests the industry is probably onto something. We may just have to wait and see how Google responds going forward.