Yesterday I spent a couple minutes looking for an old post to reference. My headings weren’t clear. WordPress site search wasn’t cutting it. A site: search in Google wasn’t surfacing it.
I couldn’t find it, so it never got linked.
A few minutes is an eternity to spend on something that feels that menial. Once a task crosses a threshold of about 10 seconds, the cognitive load is just too high, and nine times out of 10 it doesn’t get done.
And so we don’t link to the old posts whose key concepts our new posts build on. But.
By not doing that, we have one less way to identify relationships between pieces of content.
One less way for users to take another step into the world we’re creating for them.
One less signal for Google to determine what, on our own site, we think is most important.
I spaced those points for dramatic effect.
The amount of friction determines whether linking gets done
I don’t have an easy solution here. But I got the idea of “bidirectional links” from playing with roamresearch.com (a note-taking tool for networked thought).
It’s actually not technically “bidirectional”; it just generates a footnote with a link on the page you’re linking to, much like a trackback/pingback.
If you’re unfamiliar with pingbacks: a lot of CMSs have a sort of legacy feature where, if someone links to your post, their site “pings” yours. Your CMS listens for those pings, and you have the option to “approve” that linked mention so it’s added below your article.
Of course, this got totally abused as the web became increasingly spammy, but it was a nice thought, and, at least in WordPress, the feature still exists.
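Under the hood, a pingback is an XML-RPC call to a method named `pingback.ping`. It carries two arguments: the URL of the linking post (source) and the URL of the post being linked to (target). WordPress sends and receives these for you through its `xmlrpc.php` endpoint. This minimal Python sketch therefore only builds the request body, using the standard-library `xmlrpc.client` module. The example.com URLs are hypothetical.

```python
import xmlrpc.client


def build_pingback_request(source_url: str, target_url: str) -> str:
    """Serialize a pingback.ping XML-RPC call: source_url links to target_url."""
    # The Pingback spec defines pingback.ping(sourceURI, targetURI).
    return xmlrpc.client.dumps((source_url, target_url), methodname="pingback.ping")


# Hypothetical URLs: a new post that links to an older one on the same site.
payload = build_pingback_request(
    "https://example.com/new-post/",
    "https://example.com/old-post/",
)
print(payload)
```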
So that got me thinking.
How can I automate this process of internal linking to key content?
Hacking internal pingbacks
Okay, fine. No “hacking” required.
It’s managed in the comments area. So I could just go into Comments, filter by pings, search for my domain, and bulk-approve all internal pingbacks.
Then rework the post template to put the list after the post, remove the nofollow from those links, and style them to look more reference-y.
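The bulk-approve step can also be sketched as a query. WordPress stores pingbacks in the `wp_comments` table with `comment_type` set to `'pingback'`. The linking URL goes in `comment_author_url`, and pending items are marked `comment_approved = '0'`. The sketch below runs against a tiny in-memory SQLite stand-in for that table; the rows and the example.com domain are made up. On a real site you would back up the database first and run the equivalent statement against MySQL, or just use the admin UI.

```python
import sqlite3

SITE = "https://example.com"  # hypothetical: your own domain

conn = sqlite3.connect(":memory:")
# Simplified stand-in for wp_comments, keeping only the columns used here.
conn.execute(
    """
    CREATE TABLE wp_comments (
        comment_ID INTEGER PRIMARY KEY,
        comment_post_ID INTEGER,
        comment_author_url TEXT,
        comment_approved TEXT,
        comment_type TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO wp_comments VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "https://example.com/new-post/", "0", "pingback"),  # internal, pending
        (2, 10, "https://spam.example.net/x/", "0", "pingback"),   # external, pending
        (3, 11, "https://example.com/", "0", "comment"),           # regular comment
    ],
)

# Approve pending pingbacks whose source URL is on our own domain.
cur = conn.execute(
    """
    UPDATE wp_comments
       SET comment_approved = '1'
     WHERE comment_type = 'pingback'
       AND comment_approved = '0'
       AND comment_author_url LIKE ?
    """,
    (SITE + "/%",),
)
conn.commit()

approved = [row[0] for row in conn.execute(
    "SELECT comment_ID FROM wp_comments WHERE comment_approved = '1'"
)]
print(cur.rowcount, approved)  # 1 [1]
```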
This also gives me a footprint I can use later when crawling/scraping to identify within-article link relationships between posts.
I made the snippet a gist (it would require some tweaks for non-Genesis WordPress sites).
What I like about this approach is that, as long as I take on the cognitive load of linking to posts when I want to reference them, once the pingback is approved, readers will see both the link I added and the mirrored reference on the old post.
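The gist itself is PHP for Genesis. As a language-neutral illustration of the idea, here is a minimal Python sketch with two parts. First, it renders approved internal pingbacks as a followed reference list. Then, playing the crawler, it recovers the post-to-post edges from that footprint. The `internal-references` class name, the function names, and the URLs are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

REF_CLASS = "internal-references"  # hypothetical class name used as the footprint


def render_references(pingbacks):
    """Render approved internal pingbacks, given as (title, url) pairs, as a list.

    No rel="nofollow" is added, so the links are followed.
    """
    if not pingbacks:
        return ""
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>'
        for title, url in pingbacks
    )
    return f'<ol class="{REF_CLASS}">\n{items}\n</ol>'


class ReferenceListParser(HTMLParser):
    """Collect the hrefs found inside the reference list of a crawled page."""

    def __init__(self):
        super().__init__()
        self.in_refs = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ol" and REF_CLASS in (attrs.get("class") or "").split():
            self.in_refs = True
        elif tag == "a" and self.in_refs and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ol":
            self.in_refs = False


def reference_edges(page_url, html):
    """Return (linking_post, linked_post) edges recovered from the footprint."""
    parser = ReferenceListParser()
    parser.feed(html)
    parser.close()
    return [(href, page_url) for href in parser.hrefs]


# Usage: render the list for an old post, then recover the edges as a crawler would.
refs_html = render_references([
    ("New post", "/new-post/"),
    ("Another post", "/another-post/"),
])
page = '<article><p>Body with <a href="/elsewhere/">a normal link</a>.</p></article>\n' + refs_html
edges = reference_edges("/old-post/", page)
print(edges)  # [('/new-post/', '/old-post/'), ('/another-post/', '/old-post/')]
```

Links in the article body are ignored; only the list carrying the footprint class produces edges.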
Why is this better?
It’s not, necessarily. But when the effort of going back to older posts and linking forward to newer ones is otherwise too high, this is a good alternative.
Over time, you naturally link internally to the most authoritative page on whatever concept you happen to be referencing. And so do others.
You elevate a post by internally linking to it. Then others elevate it further by sharing or linking to it. Then, over time, as inbound links accrue to that page, you can use this pingback method to send crystal-clear signals to Google that these pages are all related.
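One crude way to make “the most authoritative page on a concept” measurable is to count inbound internal links to each candidate page. This is an assumption for illustration, not a claim about how Google weighs links. The edge list and URLs below are made up.

```python
from collections import Counter


def most_linked(edges, candidates):
    """Return the candidate URL with the most inbound internal links.

    edges: iterable of (source_url, target_url) internal links.
    candidates: URLs that cover the same concept.
    Self-links are ignored; ties go to the earliest candidate in the list.
    """
    inbound = Counter(target for source, target in edges if source != target)
    return max(candidates, key=lambda url: inbound[url])


# Hypothetical internal link graph.
edges = [
    ("/a/", "/strategy/keyword-research/"),
    ("/b/", "/strategy/keyword-research/"),
    ("/c/", "/kw-tips/"),
    ("/kw-tips/", "/kw-tips/"),  # self-link, not counted
]
candidates = ["/kw-tips/", "/strategy/keyword-research/"]
print(most_linked(edges, candidates))  # /strategy/keyword-research/
```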
Automating in-article internal linking
The other approach in my mind is to have a script automatically add links across a site to the most authoritative article on a topic wherever that topic is mentioned. This has been banging around in my head ever since I saw this:
In a Feb 2020 Reddit AMA, JR Oakes, a top-tier (data-science-heavy) technical SEO, was asked, “What’s your favorite process that you’ve automated?” and replied:
Internal linking. It finds the best pages to link to. Finds the pages to link from. Even lets you know potentially what to change on the page to include the link. Big efficiency gain, and big results.
JR Oakes Reddit AMA
I’m so glad that Christoph Cemper (who runs LinkResearchTools) said in the comments not to “overthink” this.
In fact, semi-automating internal linking should be totally doable as long as we have either (1) database access or (2) the ability to export and import/overwrite post content.
This can be as manual or near-fully automated as we want. To be careful, I want to lean toward manual.
If I can replace all instances of “keyword research” in post content with a link using that same anchor text, then it goes from looking like:
…useful model in thinking about keyword research. But now instead of…
to looking like this:
…useful model in thinking about <a href="/strategy/keyword-research/">keyword research</a>. But now instead of…
It doesn’t hurt the flow of content for the reader, and now each of the 10 other pages carries a reference that highlights and links to our authoritative post on keyword research.
I went into phpMyAdmin, clicked wp_posts, went to the Search tab, and set conditions on post_content, post_status, and post_type.
That returned a list of 11 posts, and generated this query:
SELECT * FROM `wp_posts` WHERE `post_content` LIKE '%keyword research%' AND `post_status` LIKE 'publish' AND `post_type` LIKE 'post'
So I went to the Export tab and exported the results as CSV to see what we had: lots of references to “keyword research” in all kinds of markup.
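The same query and export can be scripted outside phpMyAdmin. Below is a minimal sketch that uses Python’s built-in `sqlite3` and `csv` modules against a small in-memory stand-in for `wp_posts`. The rows are invented; a real site would connect to its MySQL database instead. The sketch swaps the generated `LIKE 'publish'` and `LIKE 'post'` conditions for exact matches, which are equivalent here because those patterns contain no wildcards.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in for wp_posts with only the columns this query touches.
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, "
    "post_content TEXT, post_status TEXT, post_type TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Intro", "<p>A model for keyword research.</p>", "publish", "post"),
        (2, "Draft", "<p>More keyword research.</p>", "draft", "post"),
        (3, "About", "<p>Our keyword research service.</p>", "publish", "page"),
        (4, "Other", "<p>Nothing relevant here.</p>", "publish", "post"),
    ],
)

# Same filter as the phpMyAdmin query: published posts mentioning the phrase.
rows = conn.execute(
    "SELECT ID, post_title, post_content FROM wp_posts "
    "WHERE post_content LIKE ? AND post_status = 'publish' AND post_type = 'post'",
    ("%keyword research%",),
).fetchall()

# Write the matches out as CSV, similar in spirit to the Export tab.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ID", "post_title", "post_content"])
writer.writerows(rows)
print(buf.getvalue())
```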
We can see that any “keyword research” references we want would be ones that:
- are in paragraph tags, not headings or image blocks
- aren’t already associated with a link, because links in links break code
- are the first reference in that post, if multiple references
We don’t need to annoyingly link every single time, just once per page (typically, the higher in the post the better). We’re simply replacing “keyword research” with something like:
<a href="/strategy/keyword-research/">keyword research</a>
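Those three rules translate fairly directly into code. Here is a minimal, regex-based Python sketch; the function name is hypothetical. It links only the first mention that sits inside a `<p>` and outside any existing `<a>` or tag attribute. It also leaves the post untouched if it already links to the target. Regexes are fine for fairly regular post markup like this, but an HTML parser would be safer for arbitrary content.

```python
import re

PARA_RE = re.compile(r"(<p\b[^>]*>)(.*?)(</p>)", re.IGNORECASE | re.DOTALL)
ANCHOR_RE = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def link_first_mention(html, phrase, href):
    """Link the first mention of `phrase` inside a <p>, outside links and tags.

    Headings and image blocks are skipped because only <p> bodies are searched.
    The post is returned unchanged if it already links to `href` or has no
    eligible mention.
    """
    if f'href="{href}"' in html:
        return html  # already linked once; don't add a second link
    phrase_re = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)

    for para in PARA_RE.finditer(html):
        body = para.group(2)
        # Spans we must not touch: existing <a>...</a> and any tag's attributes.
        protected = [m.span() for m in ANCHOR_RE.finditer(body)]
        protected += [m.span() for m in TAG_RE.finditer(body)]
        for hit in phrase_re.finditer(body):
            if any(s < hit.end() and hit.start() < e for s, e in protected):
                continue
            linked = f'<a href="{href}">{hit.group(0)}</a>'
            new_body = body[:hit.start()] + linked + body[hit.end():]
            start, end = para.span(2)
            return html[:start] + new_body + html[end:]
    return html


post = (
    "<h2>Keyword research</h2>\n"
    '<p><img alt="keyword research" src="chart.png"></p>\n'
    '<p>See <a href="/other/">keyword research tips</a> and our keyword research '
    "model. More keyword research.</p>"
)
linked = link_first_mention(post, "keyword research", "/strategy/keyword-research/")
print(linked)
```

In a real run you would also skip the target post itself, so the keyword research post doesn’t link to itself.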
There are some built-in SQL string functions (such as REPLACE()) that could do this, but any code would be specific to WordPress and would therefore require a lot of testing and a QA workflow.
I’d much rather have a platform-agnostic way to do it:
- export all posts and enrich with our target links
- have a way to “test” that all my links work, that nothing broke
- be able to quickly determine whether our new layout is in fact an improvement
- re-import and overwrite old posts with new enhanced internal linking
- run a crawl to make sure we didn’t break anything
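For the “test that nothing broke” step, one cheap check before re-importing is to compare each post’s links before and after enrichment. There should be no `<a>` nested inside another `<a>`, and at most one new link, pointing at the target. Below is a minimal sketch using Python’s standard `html.parser`. The function names are hypothetical, and the check does not replace the post-import crawl.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Record every <a> href and flag any <a> opened inside another <a>."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.links = []
        self.nested = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.depth:
                self.nested = True
            self.depth += 1
            self.links.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "a" and self.depth:
            self.depth -= 1


def audit(html):
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.links, parser.nested


def check_enrichment(before, after, href):
    """Return a list of problems found comparing a post before/after enrichment."""
    problems = []
    links_before, _ = audit(before)
    links_after, nested = audit(after)
    if nested:
        problems.append("nested <a> tags")
    added = len(links_after) - len(links_before)
    if added not in (0, 1):
        problems.append(f"expected at most one new link, got {added}")
    if added == 1 and links_after.count(href) != links_before.count(href) + 1:
        problems.append(f"new link does not point to {href}")
    return problems


HREF = "/strategy/keyword-research/"
before = '<p>See <a href="/other/">keyword research</a> and our keyword research model.</p>'
good = ('<p>See <a href="/other/">keyword research</a> and our '
        f'<a href="{HREF}">keyword research</a> model.</p>')
bad = ('<p>See <a href="/other/"><a href="' + HREF + '">keyword research</a></a>'
       ' and our keyword research model.</p>')
print(check_enrichment(before, good, HREF))  # []
print(check_enrichment(before, bad, HREF))   # ['nested <a> tags']
```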
Beyond string replacing to add internal links
As soon as we can get our site data properly loaded into a graph database with full-text indexing, we can do some powerful things to find the relationships that should exist between our posts.
For example, with NLP we can identify phrase matches (n-grams) and use simpler algorithms like TF-IDF (term frequency-inverse document frequency) to treat phrases as units to identify and target. But we can take that a step further and identify semantic relations between words.
Existing tools can do this for us, like GraphAware’s Hume, which is basically an NLP-enabled, knowledge-graph-based insights engine.
But we don’t need to do a bunch of unsupervised learning on our own content. We know how our ideas relate; we just need to be able to quickly tag posts based on the weights and characteristics of their topic relatedness.
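To make the TF-IDF idea concrete, treat each two-word phrase (bigram) as a unit. Then weight it by how often it appears in a post relative to how many posts use it at all. Below is a bare-bones, pure-Python sketch. The tokenizer and sample posts are simplifying assumptions; in practice a library such as scikit-learn’s `TfidfVectorizer` (with `ngram_range`) would do this.

```python
import math
import re
from collections import Counter


def ngrams(text, n=2):
    """Lowercase, split into simple word tokens, and join consecutive n-grams."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def tfidf_bigrams(docs):
    """Score every bigram in every doc: (count / bigrams in doc) * log(N / df)."""
    counts = {doc_id: Counter(ngrams(text)) for doc_id, text in docs.items()}
    df = Counter(gram for c in counts.values() for gram in c)  # docs containing gram
    n_docs = len(docs)
    scores = {}
    for doc_id, c in counts.items():
        total = sum(c.values()) or 1
        scores[doc_id] = {
            gram: (k / total) * math.log(n_docs / df[gram]) for gram, k in c.items()
        }
    return scores


# Invented sample posts.
docs = {
    "/strategy/keyword-research/": "Keyword research starts with intent. Keyword research is a model.",
    "/content-briefs/": "A content brief builds on keyword research and search intent.",
    "/about/": "About this site and its author.",
}
scores = tfidf_bigrams(docs)
top = sorted(scores["/content-briefs/"].items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

A phrase that appears in every post gets an IDF of log(1) = 0, so it scores zero no matter how often it is used.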
Still working on that.
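One hypothetical shape for that tagging is to hand-assign each post a few topic weights between 0 and 1. Link candidates can then be ranked by cosine similarity between those weight vectors. The topics, weights, and URLs below are invented, and the `min_score` cutoff is an arbitrary assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {topic: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def related_posts(tags, url, top_n=3, min_score=0.1):
    """Rank other posts by topic similarity to `url`, best first."""
    scored = [
        (other, cosine(tags[url], weights))
        for other, weights in tags.items()
        if other != url
    ]
    scored = [(o, s) for o, s in scored if s >= min_score]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical, hand-assigned topic weights per post.
tags = {
    "/strategy/keyword-research/": {"keyword research": 1.0, "search intent": 0.6},
    "/content-briefs/": {"keyword research": 0.5, "content": 1.0},
    "/search-intent/": {"search intent": 1.0, "keyword research": 0.3},
    "/about/": {"personal": 1.0},
}
print(related_posts(tags, "/strategy/keyword-research/"))
```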
Some notes to self re exporting, enriching, and re-importing enriched site content
- With Screaming Frog, we can extract post content as HTML and as text and load it wherever we want for changes and analysis. You could find/replace your links in Excel or Google Sheets yourself and then re-import the updated CSV.
- You can import and export posts in WordPress with a CSV import/export plugin.
- If you use Squarespace, which has no easy export/import feature, you can still use a WordPress .xml workflow, because WordPress sites are the one thing Squarespace does care about letting people import.
- We can use Neo4j as our website’s database.
- We can sync our WordPress database with Neo4j.
- We can also take a more modern, even serverless, approach. Data can get loaded into Neo4j, we can find our post content instances, replace them with new links, and then export to an easy-to-use format for a static site generator.
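For the static-site route, the last step is writing each enriched post out in a format a generator can read. Below is a minimal sketch. It assumes posts arrive as dicts with hypothetical `slug`, `title`, `date`, and `content` keys. It writes one `.html` file per post with `---`-delimited front matter, a convention used by generators such as Jekyll and Hugo. Keeping the body as HTML avoids re-converting the links to Markdown.

```python
import tempfile
from pathlib import Path


def export_post(post, out_dir):
    """Write one post to `<slug>.html` with YAML-style front matter.

    `post` is assumed to be a dict with hypothetical keys:
    'slug', 'title', 'date', and 'content' (HTML, links already added).
    """
    title = post["title"].replace("\\", "\\\\").replace('"', '\\"')
    front_matter = "\n".join([
        "---",
        f'title: "{title}"',
        f"date: {post['date']}",
        "---",
    ])
    path = Path(out_dir) / f"{post['slug']}.html"
    path.write_text(front_matter + "\n\n" + post["content"] + "\n", encoding="utf-8")
    return path


# Invented sample post.
post = {
    "slug": "keyword-research",
    "title": "Keyword research",
    "date": "2020-02-01",
    "content": '<p>A <a href="/strategy/keyword-research/">keyword research</a> model.</p>',
}
with tempfile.TemporaryDirectory() as out_dir:
    path = export_post(post, out_dir)
    exported_name = path.name
    exported_text = path.read_text(encoding="utf-8")
print(exported_text)
```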