A Massive Document Leak Reveals Discrepancies In Google’s Search Practices And Hidden Algorithm Secrets

Google has confirmed the authenticity of a significant leak of some 2,500 internal documents related to its search engine. These documents reveal discrepancies between Google's public statements and its actual practices.

Google has confirmed the authenticity of a significant leak involving some 2,500 internal documents related to its search engine. The leaked documents highlight the inner workings of Google’s algorithms, with one expert asserting that they reveal discrepancies between Google’s public statements and its actual practices.

The search giant has kept the specifics of its ranking algorithms secret, despite their substantial influence over the flow of information, traffic, and ad revenue.

Some of the details in the documents contradict past public statements by Google employees about the factors that influence search rankings. For instance, a Google Search employee said in 2016 that the company does not have a “website authority score.” In the past, Google has also denied using data from its Chrome web browser to determine search rankings.

The leaked documents, however, suggest otherwise. They indicate that Google considers click rates, Chrome browser data, website size, and a factor known as “domain authority” (a measure of a website’s relevance or importance on a particular subject) to rank content.

Michael King, CEO of iPullRank, who published the first analysis of the documents, said, “The main takeaway here is Google tells us one thing and they do another. These documents give us clarity on that. We don’t have the recipe that Google is using for search, but we now have a really clear indication of what the ingredients are.”

According to some experts, the documents also mention modules suggesting Google uses “whitelists” for certain topics, including elections (IsElectionAuthority) and the COVID-19 pandemic (IsCovidLocalAuthority), to identify “quality sources” on these subjects.

Little detail is available on how the whitelists operate, but Google has faced allegations of left-wing bias for years.

A recent analysis by AllSides, a media company, concluded that 63% of articles on Google News were from left-leaning outlets, compared to just 6% from right-leaning sources.

The Media Research Center, a right-leaning watchdog, documented 41 instances of alleged “election interference” by Google since 2008. The report cited data from Dr. Robert Epstein, who has testified before the Senate Judiciary Committee that “biased search results generated by Google’s search algorithm” shifted “at least 2.6 million votes to Hillary Clinton.”

Google has consistently denied any bias against conservative viewpoints and dismissed Epstein’s research as “widely debunked.”

The leaked documents reportedly contain more than 14,000 ranking factors that Google considers when organizing websites, including those of news outlets, businesses, and others. The data surfaced on the code-hosting platform GitHub in March, but it drew public scrutiny only after SEO experts Rand Fishkin and Michael King obtained it and published separate analyses.

“Journalists and publishers of information about SEO and Google Search need to stop uncritically repeating Google’s public statements, and take a much harsher, more adversarial view of the search giant’s representatives,” Fishkin suggests. “When publications repeat Google’s claims as though they are fact, they’re helping Google spin a story that’s only useful to the company and not to practitioners, users, or the public.”

In one example highlighted by Fishkin and King, a section of the documents mentions "chrome_trans_clicks" as a factor in determining which links from a domain appear below the main webpage in search results. Fishkin interprets this to mean that Google considers the number of clicks on pages in Chrome browsers to identify the most popular or important URLs on a site, which then influences the sitelinks feature.

Google implicitly confirmed the authenticity of the documents but warned that they lacked important context and should not be used to draw conclusions about how search works.

“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation,” Google spokesperson Davis Thompson said.

Google also claimed that the documents do not provide a comprehensive, relevant, or up-to-date view of its search ranking algorithm. It remains unclear whether Google has actually implemented any of the ranking factors described or was merely testing them. Even if the factors were in use, their importance in shaping users’ search results cannot be determined, because the documents do not disclose how they are weighted.

Barry Schwartz, a prominent SEO expert and owner of web consultancy RustyBrick, says that the documents offer an interesting yet incomplete view of Google’s search processes. “The question is, we don’t know what they’re weighted, how important are these signals, are they used at all. That’s the issue with this,” Schwartz said.

Despite the uncertainties, King described the leak as “the biggest, most transparent insight that we’ve ever seen into how Google functions.”
