A fortnight ago, news broke that documentation of a Google Search API had been accidentally left on a public server. The leak has since been verified as real, although we don’t know to what extent this matches how search works today.
The internet has been flooded with hot takes on this. The actual text is thousands of words of dense technical documentation. I suspect remarkably few people have actually read the text, much less understood it, and I thus don’t trust most of the hot takes.
We took the leaked text and put GPT-4o to work summarising and analysing it. We split the documentation into chunks, ran each chunk through GPT-4o, had it return an analysis of the text, and then categorised the responses.
This let us filter for specific areas and start drawing novel conclusions.
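For the curious, the pipeline is conceptually simple. Below is a minimal sketch of the approach, assuming the official OpenAI Python SDK; the chunk size, prompt wording, category list, and file name are illustrative placeholders rather than our exact production setup.

```python
# Minimal sketch of the analysis pipeline: chunk the leaked documentation,
# have GPT-4o analyse each chunk, and tag each analysis with a category.
# Chunk size, prompt wording, categories, and the file name are placeholders.
from openai import OpenAI

client = OpenAI()

def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Naive fixed-size chunking; production code would split on section headings."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def analyse_chunk(chunk: str) -> str:
    """Ask GPT-4o for a summary, ranking-relevant notes, and a category label."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Summarise this section of leaked search API documentation "
                       "and note anything relevant to ranking. Then label it with "
                       "one category: clicks, links, content, E-E-A-T, or other.\n\n"
                       + chunk,
        }],
    )
    return response.choices[0].message.content

leaked_text = open("leaked_docs.txt").read()  # hypothetical local copy of the leak
analyses = [analyse_chunk(c) for c in chunk_text(leaked_text)]
```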
One of the most interesting findings is how the leak aligns with information from Google’s recent court cases. Those documents gave the impression that Googlers operate under the assumption that anything they put in writing will eventually come out through legal discovery. The leaked API documentation corroborates and builds upon those earlier disclosures.
By digging into the technical details and reading between the lines, we can glean a clearer understanding of how Google really operates today and where search is headed. So let’s dive in and unpack the most important revelations.
Google’s patents and court cases already tell us a lot
I have a daily search set up for any new PDFs matching either Google’s new patents or its court cases. These already tell us a lot: last year Google argued in court that it does not hold a monopoly over search, and that case put a huge amount of new information into the public record.
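If you want something similar, a scheduled query is easy to set up. Here’s a rough sketch using Google’s Custom Search JSON API; the API key, engine ID, and query wording are placeholders, and any search API with a date filter and a filetype: operator would work just as well.

```python
# Rough sketch of a daily "new PDFs" search via the Custom Search JSON API.
# API key, engine ID, and query are placeholders; run on a daily schedule
# (cron, GitHub Actions, etc.) and pipe the results into email or Slack.
import requests

API_KEY = "YOUR_API_KEY"          # placeholder
ENGINE_ID = "YOUR_SEARCH_ENGINE"  # placeholder

params = {
    "key": API_KEY,
    "cx": ENGINE_ID,
    "q": '(google patent OR "google llc" court filing) filetype:pdf',
    "dateRestrict": "d1",  # only results indexed in the last day
}
results = requests.get("https://www.googleapis.com/customsearch/v1",
                       params=params, timeout=10).json()

for item in results.get("items", []):
    print(item["title"], item["link"])
```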
One got the impression from reading these documents that internally the view was, “If you write this down, it’ll end up being disclosed through discovery at some point,” and indeed this is what happened for the documents in question.
It was from one of these documents that we saw how Google learns from what users do when they make a search. User activity is tracked, and if a user is particularly satisfied or dissatisfied with a search result, that result may be moved up or down. A remarkably amateurish slide was how a Googler explained it internally.
From this, Google is able to use machine learning to predict what the user will want to see in future. Suddenly you’re using ML to rank results, and tracked clicks aren’t technically a ranking factor, because the ML model is the ranking factor. The model is trained on user clicks, but “we don’t use clicks as a ranking factor” remains technically true.
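To make that argument concrete, here’s a toy sketch of the pattern, with invented features and data. This is absolutely not Google’s system; it just shows how a click-trained model lets you rank without consulting clicks directly.

```python
# Illustrative sketch only: a toy ranker trained on click-satisfaction data.
# NOT Google's system; it shows how "clicks" stop being a direct ranking
# factor once a model trained on them does the ranking.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per (query, result) pair. The two
# features (say, relevance and page speed) and the satisfaction labels
# (1 = satisfied click, 0 = dissatisfied) are invented for illustration.
X_train = np.array([
    [0.9, 0.8],
    [0.7, 0.9],
    [0.3, 0.4],
    [0.2, 0.6],
])
y_satisfied = np.array([1, 1, 0, 0])

model = LogisticRegression().fit(X_train, y_satisfied)

# At ranking time no click data is consulted directly: candidates are
# ordered purely by the model's predicted satisfaction probability.
candidates = np.array([[0.8, 0.5], [0.4, 0.9], [0.95, 0.7]])
scores = model.predict_proba(candidates)[:, 1]
ranking = np.argsort(-scores)
print(ranking, scores.round(2))
```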
One of the big claims about the leaked documents is the alleged use of clicks (through Chrome data) as a ranking factor, something Google has previously denied. Given the above, this is not surprising to us. This informs my overall reaction: there are a lot of interesting nuggets in the leak, but I’m seeing this as a further source to explore rather than a groundbreaking and radical change.
That said, new information is good and this cements our current thinking.
Clicks are king
Content is not king. Clicks are king, and the content serves the click. If your content can’t justify or “hold” a click then its rank may not hold.
Given this, we can note a couple of critical items:
1. You must get the click. Your title and meta must attract the click from the SERP.
I feel there’s a lot we can learn from YouTube here, where the thumbnail and click play such a big role.
Creators spending millions of dollars on producing each video claim to spend half the production time just on the title, thumbnail, and first 5 seconds of the video.
We already use FALCON AI to solve this problem for written content: FALCON generates different options at scale and loops through them to predict which specific combination will perform best. We have an update in progress to double-check there’s nothing else that needs to be done on clickworthiness. (There’s a rough sketch of the general idea at the end of this section.)
Getting clicks means you can get more clicks. You must get the click.
2. You must deliver quickly. Once you’ve got the click, you need to deliver on the value you promised. This is where a fast website and fantastic first impression play a role.
Website speed has long been important, but this views it through a different lens: a fast website is valuable not because it’s a ranking factor, but because it allows you to deliver value to the human reader faster.
What the reader then sees on your website is the next hurdle. They’ll be rapidly assessing: What have I clicked on? Is it trustworthy? Am I going to get what I need?
I see design as an important piece here. The internet is filled with the same 20 Unsplash photos and ubiquitous cartoons of people standing around pointing at computer screens. To me, these are a signal of low-effort and commodified content.
This is why Ellipsis has a fantastic full-time Graphic Designer on the team: the feature graphic can signal the effort, quality, and care that has gone into the piece the reader is about to see. That keeps the reader on the page and sends a positive signal to Google.
Multi-modal AI offers an opportunity here: GPT-4o can read a screenshot of a website and tell you what first impression you’re offering. We’ve implemented this for every piece of content we publish: we automatically take a screenshot, send it to GPT-4o, look for opportunities to improve, and if an opportunity passes a sufficient threshold it gets sent to the project lead. (A sketch of this, too, follows at the end of this section.)
Double check the first impression you’re making. This is the outcome of clicks being important.
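To make point 1 above more concrete, here’s a minimal sketch of the general clickworthiness idea: generate several title variants with an LLM and have it score each one. To be clear, this is not FALCON’s implementation; the prompts, model choice, and crude single-number scoring are assumptions for illustration only.

```python
# Minimal sketch only, not FALCON's implementation: generate SERP title
# variants with an LLM, then ask the model to score each for clickworthiness.
# Assumes the official OpenAI Python SDK and an OPENAI_API_KEY in the env.
from openai import OpenAI

client = OpenAI()

def generate_title_variants(topic: str, n: int = 5) -> list[str]:
    """Ask the model for n candidate SERP titles (prompt is illustrative)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Write {n} distinct, compelling SERP titles (max 60 chars) "
                       f"for an article about: {topic}. One per line, no numbering.",
        }],
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

def score_clickworthiness(title: str) -> float:
    """Crude illustrative scorer: ask the model for a 0-10 rating."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Rate this SERP title 0-10 for clickworthiness. "
                       f"Reply with the number only.\n\n{title}",
        }],
    )
    return float(response.choices[0].message.content.strip())

variants = generate_title_variants("the Google Search API documentation leak")
best = max(variants, key=score_clickworthiness)
print(best)
```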
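And for the first-impression check from point 2, the shape of it looks roughly like this. Again, a sketch under assumptions: the prompt, scoring scale, threshold, and file name are illustrative, not our production pipeline.

```python
# Minimal sketch of a first-impression check: send a page screenshot to GPT-4o
# and flag pages that score below a threshold. Prompt, scoring scale,
# threshold, and the screenshot file name are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()

def first_impression_score(screenshot_path: str) -> float:
    """Ask GPT-4o to rate the first impression of a page screenshot, 0-10."""
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "You are assessing the first impression a reader gets "
                         "from this page. Rate trustworthiness and clarity of "
                         "value 0-10. Reply with the number only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return float(response.choices[0].message.content.strip())

THRESHOLD = 7.0  # hypothetical: below this, a human reviews the page
if first_impression_score("homepage.png") < THRESHOLD:  # placeholder file
    print("Flag for the project lead to review")
```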
E-E-A-T is mostly for show – but it’s still an important show
Ellipsis is a fully remote team. When you’re remote, you have to understand: there are 2 parts to working effectively. The first is doing the work and the second is performing the work.
For me, writing this newsletter is a good example of doing the work. Sharing with the team how hard I’ve worked on this and then responding really quickly to suggested edits is a good example of performing the work. You do still need the performance, but it must be balanced with doing the work.
E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) has been a huge trend over the last 12 months. The leaked documents suggest this is slightly overhyped and that Google can’t quite assess E-E-A-T to the level one assumes. There is no specific mention of E-E-A-T in the documents, although terms like “authority” do come up a lot.
We do see evidence that Google extracts metadata like author names from multiple sources on a page, including the URL, byline, NLP analysis, structured data, and sitemaps. Interestingly, Google seems to care about mismatches in author names across these sources. While public documentation allows an author to have many names, inconsistency could potentially create bias, especially for edge cases like plural systems with multiple authors.
However, the documents give us no insight into how much weight Google places on author name consistency or how sophisticated its author analysis is. As with many aspects of E-E-A-T, there’s likely still a gap between what Google aspires to do and what its algorithms can reliably achieve at scale. E-E-A-T remains important for signaling quality to users, but we should be cautious about overestimating Google’s capabilities in this area based on the leaked information.
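As an illustration of the kind of cross-source check the documents hint at, here’s a minimal sketch that pulls an author name from a page’s visible byline and from its JSON-LD structured data, and flags mismatches. The CSS selector and the page structure it expects are assumptions; real sites vary widely, and we don’t know how (or whether) Google weights any mismatch it finds.

```python
# Illustrative sketch: extract author names from two sources on a page
# (visible byline and JSON-LD structured data) and flag mismatches.
# The CSS selector and assumed page structure are guesses; real pages vary.
import json
import requests
from bs4 import BeautifulSoup

def author_names(url: str) -> dict[str, str | None]:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    # Source 1: a visible byline (selector is a guess; adjust per site).
    byline = soup.select_one(".author, [rel=author]")
    byline_name = byline.get_text(strip=True) if byline else None

    # Source 2: schema.org author name from JSON-LD structured data.
    jsonld_name = None
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue
        author = data.get("author") if isinstance(data, dict) else None
        if isinstance(author, dict):
            jsonld_name = author.get("name")
        elif isinstance(author, list) and author and isinstance(author[0], dict):
            jsonld_name = author[0].get("name")

    return {"byline": byline_name, "jsonld": jsonld_name}

names = author_names("https://example.com/some-article")  # placeholder URL
if names["byline"] and names["jsonld"] and names["byline"] != names["jsonld"]:
    print("Author name mismatch across sources:", names)
```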
This does not mean E-E-A-T is not important.
First, the leaked documents may be out of date. Second, the performance of E-E-A-T is what actually matters, for precisely the reasons I’ve discussed above: clicks and the first impression are really important. Well-done E-E-A-T is going to create a great first impression. This gives you a higher ranking.
Hence, performance and delivery: E-E-A-T is mostly for show, but shows get clicks and thus you still need to take it seriously.
Search is becoming more technical
The documents are technical. You need to be technical to read and understand them.
What I’d call “legacy SEO” hails from a time that’s rapidly disappearing. The balance between subjectivity and objectivity is moving towards the objective.
I am bullish that subjectivity is becoming more important in choosing specifically what and how you write (this is part of our overall AI thesis: you have to differentiate with human expertise), but on title selection, how you format the page, how you select your internal links, and so on, it’s increasingly clear that objective answers are the way forward.
Consider one particularly interesting section, which discusses how Google uses text embeddings to assess how similar one page is to all the other pages on a website. Another, on local SEO, discussed how the location of a business relative to an intersection is calculated.
The future of SEO is hearing this and figuring out how to optimise topic selection using text embeddings. If that’s confusing to you or your SEO agency, I’d advise you to find a new one (we’d be happy to have you).
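As a starting point, here’s a minimal sketch of the embeddings idea: embed each existing page on a site, embed a candidate topic, and see how close the candidate sits to what’s already there. It assumes the OpenAI embeddings endpoint; the model choice, the example pages, and the similarity cut-off are illustrative, and this is not a description of Google’s internal system.

```python
# Minimal sketch: use text embeddings to see how similar a candidate topic is
# to existing pages on a site. Assumes the official OpenAI Python SDK; the
# model choice and the 0.8 "too similar" cut-off are illustrative only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

existing_pages = [
    "A guide to choosing a coffee grinder",
    "How to brew pour-over coffee at home",
    "Espresso machine maintenance basics",
]
candidate_topic = "Best burr coffee grinders for espresso"

page_vecs = embed(existing_pages)
topic_vec = embed([candidate_topic])[0]

# Cosine similarity between the candidate topic and every existing page.
similarities = page_vecs @ topic_vec / (
    np.linalg.norm(page_vecs, axis=1) * np.linalg.norm(topic_vec)
)

for page, sim in zip(existing_pages, similarities):
    print(f"{sim:.2f}  {page}")

# Illustrative rule: very high similarity suggests the candidate overlaps with
# an existing page (consider expanding that page instead of writing a new one).
if similarities.max() > 0.8:
    print("Candidate overlaps heavily with an existing page")
```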
Legacy SEO still has a role but it’s on its way out
Related to the above, it was noteworthy to me that whilst the future of search is AI and ML powered, there are still some legacy metrics kicking about. Keyword density in the title and content likely has some role, for example.
Google search is very big and very complicated, so this makes sense. I’d argue you should see legacy metrics like keyword density as technical debt in the algorithm: debt that is either going away or becoming much less important over time.
Keyword density, for example, is absurd in the genAI era: do we really think that including “best coffee grinder” 100 times is going to fool anything in 2024? But it, like other legacy SEO metrics, still carries some weight, and it’s a game we sadly have to engage in: just enough, and not too much.
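For completeness, the metric itself is trivial to compute, which is partly why it’s so easy to over-optimise. A rough sketch (the leak gives no target value, and neither does this):

```python
# Keyword density is just keyword occurrences over total words. Trivial to
# compute and trivially easy to over-optimise. Illustration only; no target
# value is implied by the leak or by this sketch.
import re

def keyword_density(text: str, keyword: str) -> float:
    words = re.findall(r"[a-z0-9']+", text.lower())
    keyword_tokens = keyword.lower().split()
    n = len(keyword_tokens)
    if not words or n == 0:
        return 0.0
    hits = sum(
        words[i:i + n] == keyword_tokens
        for i in range(len(words) - n + 1)
    )
    return hits * n / len(words)

body = "The best coffee grinder for most people is a burr grinder. " * 3
print(f"{keyword_density(body, 'best coffee grinder') * 100:.1f}% of words")
```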
However, it’s important not to get too caught up in the minutiae revealed here. It’s easy to miss the forest for the trees. Obsessing over every single one of the 14,000 ranking factors is a fool’s errand. Many of these factors may be outdated, carry little weight, or only apply conditionally.
Instead, focus on the big picture: creating high-quality, user-focused content that naturally aligns with Google’s overarching goals. Don’t lose sleep over minor technical details at the expense of your overall content strategy. The SEO game is evolving, but the core principles – relevance, quality, authority, and user experience – remain the north star. Navigate by those, and you’ll be well-positioned no matter what new ranking factors come to light.