A trove of leaked Google documents has provided an unprecedented look inside Google search, revealing some of the most important factors Google uses to rank content.
What happened. Thousands of documents that appear to have come from Google's internal Content API Warehouse were published on GitHub on March 13 by an automated bot called yoshi-code-bot. The documents were shared with SparkToro co-founder Rand Fishkin earlier this month.
Why we care. The documents give us a glimpse into how Google’s ranking algorithm may work, which is invaluable information for SEOs who can make sense of them. In 2023, a leak gave us unprecedented insight into Yandex Search ranking factors, and it was one of the biggest news stories of that year.
This Google document leak? It will likely be one of the biggest news stories in the history of SEO and Google Search.
What's inside. Here's what we know about the internal documents, thanks to Fishkin and iPullRank CEO Michael King:
- Current: The documents indicate this information was current as of March.
- Ranking features: The API documentation covers 2,596 modules with 14,014 attributes.
- Weighting: The documents do not specify how any of the ranking features are weighted, only that they exist.
- Twiddlers: These are re-ranking functions that “can adjust a document's information retrieval score or change the document's ranking,” King said.
- Demotions: Content can be demoted for a variety of reasons, including:
  - A link doesn't match the target site.
  - SERP signals indicate user dissatisfaction.
  - Product reviews.
  - Location.
  - Exact match domains.
  - Porn.
- Change history: Google apparently keeps a copy of every version of every page it has ever indexed, which means it can “remember” every change ever made to a page. However, Google only uses the last 20 changes to a URL when analyzing links.
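To make the twiddler idea more concrete, here is a minimal Python sketch of re-ranking under stated assumptions. The leak does not describe how twiddlers are implemented or combined, so everything here is hypothetical: the `Doc` type, the flag names, the two demotion functions and the choice to apply each twiddler as a score multiplier are illustrations, not Google's code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    url: str
    ir_score: float  # initial information retrieval score
    flags: set[str] = field(default_factory=set)  # hypothetical demotion signals


# Hypothetical interface: a twiddler returns a multiplier for a document's score.
Twiddler = Callable[[Doc], float]


def exact_match_domain_demotion(doc: Doc) -> float:
    """Hypothetical: halve the score of documents on exact match domains."""
    return 0.5 if "exact_match_domain" in doc.flags else 1.0


def dissatisfaction_demotion(doc: Doc) -> float:
    """Hypothetical: demote documents whose SERP signals suggest unhappy users."""
    return 0.7 if "user_dissatisfaction" in doc.flags else 1.0


def rerank(docs: list[Doc], twiddlers: list[Twiddler]) -> list[tuple[str, float]]:
    """Apply every twiddler to every document, then sort by adjusted score."""
    adjusted = []
    for doc in docs:
        score = doc.ir_score
        for twiddle in twiddlers:
            score *= twiddle(doc)
        adjusted.append((doc.url, score))
    # Highest adjusted score first, so a demoted document can drop in the ranking.
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("a.example", 0.9, {"exact_match_domain"}),
        Doc("b.example", 0.6),
        Doc("c.example", 0.8, {"user_dissatisfaction"}),
    ]
    print(rerank(docs, [exact_match_domain_demotion, dissatisfaction_demotion]))
```

Run as a script, this ranks b.example first, c.example second and a.example last: the page with the best initial score falls to the bottom once its score is halved. Multiplying scores is only one plausible way to combine adjustments; the documents don't say how Google does it.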
Links are important. Shocking, I know. The documents show that link diversity and relevance still matter, and that PageRank still plays a key role in Google's ranking features. The PageRank of a website's homepage is taken into account for every document.
- This doesn't prove that Google spokespeople are lying by saying that links aren't a “top 3 ranking factor” or that they're not that important for rankings. Both things could be true at the same time. Again, we don't know how much weight any of these features receive.
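The leak says PageRank is still in use, but not how Google computes or weights it today. As a rough illustration of the underlying idea, here is the textbook power-iteration form of PageRank in Python, plus a hypothetical `homepage_rank` helper that mirrors the reported behavior of looking up a site's homepage PageRank for any document on that site. The damping factor of 0.85 comes from the original PageRank paper; the graph, the URLs and the helper are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Textbook PageRank computed by power iteration.

    `links` maps each page URL to the URLs it links out to.
    """
    # Every page that appears as a source or a target gets a score.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Pages with no outlinks ("dangling" pages) spread their rank evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its damped rank to every outlink.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop early once scores have converged
            break
    return rank


def homepage_rank(doc_url: str, rank: dict[str, float]) -> float:
    """Hypothetical helper: score of the homepage of the site hosting doc_url."""
    parts = urlsplit(doc_url)
    return rank.get(f"{parts.scheme}://{parts.netloc}/", 0.0)


if __name__ == "__main__":
    graph = {
        "https://a.example/": ["https://a.example/post", "https://b.example/"],
        "https://a.example/post": ["https://a.example/"],
        "https://b.example/": ["https://a.example/"],
    }
    scores = pagerank(graph)
    print(homepage_rank("https://a.example/post", scores))
```

In this toy graph the a.example homepage ends up with the highest score (roughly 0.49) because both other pages link to it, and `homepage_rank` returns that same value for the a.example/post document.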
Successful clicks matter. This isn’t surprising, but according to the documents, if you want to rank well, you need to keep creating great content and user experiences. Google uses a variety of metrics, including badClicks, goodClicks, lastLongestClicks, and unsquashedClicks.
“[Y]ou need to drive more successful clicks using a broader set of queries and earn more link diversity if you want to continue to rank,” King says. “Conceptually, it makes sense because really strong content will do that. Focusing on driving more qualified traffic and improving the user experience will send a signal to Google that your page is worthy of ranking.”
Documents and testimony in the United States v. Google antitrust trial confirmed that Google uses clicks for rankings, specifically its Navboost system, which is “one of the key signals” Google uses for rankings.
Brands matter. Fishkin's biggest takeaway is that brand matters more than anything: “If there's one universal piece of advice I would give to marketers looking to dramatically improve their organic search rankings and traffic, it's: 'Build a brand that's notable, popular, and well-known in your field outside of Google search.'”
Entities are important. Google stores authorship information associated with content and attempts to determine whether an entity is the author of a document.
Site authority. Google uses something called “siteAuthority”.
- Google hinted at this when it launched the Panda update in 2011, publicly stating that “low-quality content on some parts of a website can impact the whole site's rankings.”
- However, Google has since repeatedly denied having a website authority score.
Chrome data. A module called ChromeInTotal indicates that Google uses data from its Chrome browser for search rankings.
Whitelists. Several modules indicate that Google whitelists certain domains related to elections and COVID (isElectionAuthority and isCovidLocalAuthority). That said, it has long been known that Google (and Bing) keep “exception lists” for cases where “certain algorithms unintentionally affect websites.”
Quick clarification. There is some debate as to whether these documents were “leaked” or “discovered.” It is more likely that they were accidentally included in a code review and pushed from Google's internal codebase to GitHub, where they were then discovered.
The source. Erfan Azimi, CEO and director of SEO at digital marketing agency EA Eagle Digital, posted a video and claimed responsibility for sharing the documents with Fishkin. Azimi is not a Google employee.