In the news: On March 27, 2024, more than 2,500 pages of Google's API (Application Programming Interface) documentation, covering 14,014 API attributes, were reportedly leaked on GitHub, where they remained until May 7. These API documents appear to reveal what Google considers important when ranking websites in its search engine. The leak was discovered by Erfan Azimi, founder of search engine optimization (SEO) company EA Eagle Digital, and reported by SEO expert Rand Fishkin.
Based on their analysis of the leaked documents, the two SEO experts said Google uses click data (including good, bad and long clicks) in systems such as NavBoost and Glue. According to testimony by Pandu Nayak, Google's vice president of search, in the US Department of Justice's antitrust case against Google, these systems help rank the content that ultimately appears on the search engine results page. The documents also suggest that Google can filter out clicks it doesn't want to count in its ranking systems while keeping those it does, and that the company measures the length of both clicks and impressions.
The use of click data is notable because Google has previously denied that it factors click data into search rankings. However, it's important to note that Google has not acknowledged the leak. We've reached out to Google and will update this article if we hear back from the company.
Other key findings from the leaked documents:
- Google appears to use Chrome click data when creating sitelinks. Sitelinks are sublinks that appear under a site's main listing; for example, a search for MediaNama shows several sitelinks beneath the main result.
According to Fishkin's analysis of the leaked documents, the sitelinks Google generates also take into account clicks on pages in the Chrome browser.
- Google appears to whitelist certain institutions and sites. Searching the leaked documents for "travel" turns up a model dedicated to “quality travel sites,” which Fishkin says suggests the company maintains a whitelist for travel sites. Similarly, flags in the leaked code for local governments involved with COVID-19 and for election authorities could point to additional whitelists.
You can read Fishkin's full analysis of the leak, including his efforts to verify that the documents belong to Google, here.