While doing some work for a client, I found that their Jetpack setup loaded all images via i2.wp.com or some variation of it (Jetpack seems to be a go-to for offshore WordPress developers, whom we've been working with more often at Inbound Found).
When all your images load via a CDN like Jetpack’s Site Accelerator (previously known as Photon), it looks like this:
Basically, i2.wp.com/default-url-path?some-params
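To make that URL pattern concrete, here is a minimal shell sketch that recovers the origin image URL from a Site Accelerator URL. It assumes iN.wp.com hosts such as i0.wp.com or i2.wp.com and an origin served over https, and it simply drops query parameters such as resize options. `photon_origin` is a hypothetical name, not part of Jetpack.

```shell
#!/bin/sh
# Hypothetical helper: map a Jetpack Site Accelerator (Photon) URL back to the
# origin image URL it proxies. Assumes the origin is served over https.
photon_origin() {
  # Strip the scheme and the iN.wp.com host, then drop any ?query string.
  printf 'https://%s\n' "$(printf '%s\n' "$1" |
    sed -E 's#^https?://i[0-9]+\.wp\.com/##; s#\?.*$##')"
}

# Example:
# photon_origin 'https://i2.wp.com/example.com/wp-content/uploads/photo.jpg?resize=300%2C200'
```

The mapping is purely textual: the CDN URL carries the origin host and path, which is exactly why the image ends up "living" on wp.com rather than on your own domain.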
A site:domain.com search in Google will return indexed pages. A quick click on the “Images” tab will return the indexed images.
Now, I should probably suss this out more before making claims, but it's always been my understanding that Google adds a layer of relevance when prioritizing pages in its results (i.e., the whole point of a search engine).
So in a search on Google like this:
site:domain.com
You will see Google results for pages ONLY on your website (just replace domain.com with your domain). What you should expect to see is your homepage first, then an about page or other important pages, and so on, with the least relevant pages last.
Same with images.
Again, I'm aware this is an assumption.
The accidental CDN vs no-CDN test
But in our oopsy-daisy test, our dev subdomain was not using Jetpack and was being indexed by G, creating a perfectly flawless double-blind experiment in the wild with an excellent control, a clean independent variable, and no possible confounding variables whatsoever (sarcasm).
Anyway, in G Images, I see only the dev subdomain's images showing up in the first 20 or so image results.
Because crawlers can access the dev site (someone forgot to block robots), which does NOT use Jetpack and has no CDN delivered images, we can compare it to the live site that does have CDN loaded images via Jetpack’s Site Accelerator.
And guess what, our non-CDN near-zero authority dev subdomain is outranking our CDN delivered root domain images:
This also has interesting implications for the “is there such thing as domain authority” debate and the questionable announcement that “Google is getting better at treating subdomains like part of the root domain.”
Our root domain has about 80 to 100 linking root domains, while our dev subdomain has a single link, from a wordpress.org forum post asking a dev question. To be fair, very few of the root domain's links are followed links from high-DA sites.
Why does this matter? Because of the prioritization search engines have to do algorithmically. Results in Google Images are supposed to be a product of the image + landing page combination. The implication here is that G relies on landing page signals (alt tags, PR, internal links, etc.) to determine which images to serve.
That means that the page the image is on should have an impact on that image’s ability to rank in Google.
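If landing page signals like alt text feed into how images rank, a quick audit of a page's markup is cheap. This rough sketch (not an HTML parser) lists `<img>` tags that have no `alt=` attribute; it assumes each tag sits on one line, and `imgs_missing_alt` and the example URL are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print <img> tags that have no
# alt= attribute. Rough check only: assumes each <img ...> tag is on one line.
imgs_missing_alt() {
  grep -oiE '<img[^>]*>' | grep -viE '[[:space:]]alt='
}

# Usage (placeholder URL):
# curl -s https://example.com/some-post/ | imgs_missing_alt
```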
Some worthwhile Twitter threads for the detail-oriented:
- https://twitter.com/randfish/status/1103000473395556353
- https://twitter.com/glenngabe/status/1043118263461392385
Okay, a best practice is to use cdn.yoursite.com/img, not i0.wp.com/yoursite.com/img. Good to know, but only relevant if switching CDNs.
It's also good to know that most CDNs totally ignore this in their default implementations, instead using weird hashes in URLs and managing everything on their own domain properties.
John Mu's point in the second thread above, that there's no “SEO Bonus” for hosting your images on your own domain, is not what we're seeing.
We see old pages, accidentally exposed to Google on a dev subdomain, whose images (tied to those accidentally exposed landing pages) are being served ahead of the images on our much higher-authority, much more linked-to website, with all sorts of pinned images and links from around the web.
Caveats to our very imperfect experiment
This site got moved for a few months to another domain. The dev site may have been left live and the live site 301 or 302’d to the new domain. When we came in, our first recommendation was moving the website back to the original domain as rankings were climbing while the site was down and 302ing. This was happening for months.
All of which is whackadoo, and it makes this accidental experiment questionable at best as a guide to the real world.
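Since the dev site was left live and crawlable, a quick way to check whether a subdomain is closed to crawlers is to look for a blanket `Disallow: /` in its robots.txt. This is a rough sketch: it ignores which `User-agent` group a rule belongs to, and `blocks_all_crawlers` and the dev hostname are hypothetical. Also note that Google documents robots.txt as a crawl control, not a way to keep pages out of the index; a `noindex` (for example an `X-Robots-Tag: noindex` response header) or password protection is the more reliable fix for a dev site.

```shell
#!/bin/sh
# Hypothetical helper: read a robots.txt body on stdin and succeed (exit 0)
# if it contains a blanket "Disallow: /" line. Rough check only: it does not
# track which User-agent group the rule belongs to.
blocks_all_crawlers() {
  grep -qiE '^[[:space:]]*disallow:[[:space:]]*/[[:space:]]*$'
}

# Usage against a live site (placeholder hostname):
# curl -s https://dev.example.com/robots.txt | blocks_all_crawlers \
#   && echo "crawling blocked" || echo "dev site is open to crawlers"
```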
The twist – G doesn’t support rel canonical headers for images
In addition, we don't see any images from the live site at all, even though those image URLs return a Link response header pointing at the correct website:
Here's the response using curl:
$ curl -I https://i0.wp.com/smithhonig.com/wp-content/uploads/2018/05/D110-FBY-CL-3.jpg
Response:
HTTP/1.1 200 OK
Content-Type: image/jpeg
…
Link: <http://smithhonig.com/wp-content/uploads/2018/05/D110-FBY-CL-3.jpg>; rel="canonical"
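To pull just the canonical target out of a response like this one, something along these lines works as a sketch. It assumes the URL in angle brackets comes first on the `Link` line; real `Link` headers can carry several comma-separated values, which this does not fully parse. `canonical_from_headers` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin (e.g. from curl -sI)
# and print the URL of the first Link header marked rel="canonical".
# Simplified: takes the first <URL> on a matching Link line.
canonical_from_headers() {
  tr -d '\r' |
    grep -i '^link:' |
    grep -i 'rel="*canonical' |
    sed -E 's/^[^<]*<([^>]*)>.*$/\1/' |
    head -n 1
}

# Usage:
# curl -sI https://i0.wp.com/smithhonig.com/wp-content/uploads/2018/05/D110-FBY-CL-3.jpg |
#   canonical_from_headers
```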
First of all: HTTP/1.1? Whoa. I guess HTTP/2 isn't fully supported yet, so maybe that's why that decision was made?
Maybe all CDNs are like that. Let's see. A quick check on a client using WPEngine's default CDN setup (I think it's still MaxCDN):
$ curl -I https://18xha94a1d9wy7eno2lex0r7-wpengine.netdna-ssl.com/wp-content/uploads/2018/11/Minwawa-1Before.jpg
HTTP/1.1 200 OK
Content-Type: image/jpeg
…
Weird. No canonical on that image, probably because it doesn't mean a thing. And again the HTTP/1.1 protocol. Are both Jetpack Site Accelerator and MaxCDN outdated, or is there a good reason?
Let's see what EWWW does; they're on the up and up (even though I hate seeing their monthly charges).
$ curl -I https://rockofages-com.exactdn.com/wp-content/uploads/2017/12/Homepage.background.quarry@1x.jpg
HTTP/1.1 200 OK
Content-Type: image/jpeg
…
Link: <https://rockofages.com/wp-content/uploads/2017/12/Homepage.background.quarry@1x.jpg>; rel="canonical"
OK, interesting: once again, we have rel canonical headers and HTTP/1.1.
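Before reading too much into the HTTP/1.1 lines above, it's worth checking what curl actually negotiated: older curl releases, and builds without HTTP/2 support, request HTTP/1.1 by default even when the server offers HTTP/2. This sketch reads the status line from a header dump and prints its protocol token, and the commented usage passes curl's `--http2` flag to ask for HTTP/2 explicitly (that flag errors out on builds without HTTP/2 support). `status_protocol` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read response headers on stdin and print the protocol
# token from the status line, e.g. "HTTP/1.1" or "HTTP/2".
status_protocol() {
  head -n 1 | tr -d '\r' | cut -d ' ' -f 1
}

# Usage: ask curl to negotiate HTTP/2 explicitly before blaming the CDN.
# curl -sI --http2 https://i0.wp.com/smithhonig.com/wp-content/uploads/2018/05/D110-FBY-CL-3.jpg |
#   status_protocol
```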
Whatever, the point is that the image SEO value is not being attributed to the canonical.
Canonical is a suggestion
People forget this. It’s one signal in a sea of signals that Google has to account for in deciding what page to return for a result in what order.
Medium.com knows this. They know that if you put all your content over there, even if they give you the canonical link (which they do by default), their version will probably outrank yours. People are just more likely to link to it, find it, comment on it, and clap on medium.com than on your site.
So having rel canonical tags doesn't “fix your SEO” or 100% protect your content from duplicate content issues. It's just one tool you can use in certain situations.
Rel canonical headers and SEO
Rel canonical can be implemented a bunch of ways. See G Webmasters Docs for examples.
One of the methods listed is a header returned with the response (what we've been looking at in the output pasted above), like:
HTTP/1.1 200 OK
Content-Type: image/jpeg
Link: <http://smithhonig.com/wp-content/uploads/2018/05/D110-FBY-CL-3.jpg>; rel="canonical"
This is the recommended implementation by G:
Pink highlight mine. Why? That’s the twist!
Whoa. Google only supports rel canonical in headers for web search? Versus what? YouTube search? Image search? Let's continue on.
OK, that was written something like 8 years ago, but still. It implies what we thought: rel canonical headers are for web search, which is separate from CDN-delivered images and other assets outside web search.
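For reference, the two rel canonical forms that come up here look like this. A small sketch that prints both the HTML `<link>` element used in a page's `<head>` and the equivalent HTTP `Link` response header for a given URL; `canonical_forms` and the example URL are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the two common rel=canonical forms for a URL.
canonical_forms() {
  # 1. HTML element, placed in the <head> of an HTML page.
  printf '<link rel="canonical" href="%s">\n' "$1"
  # 2. HTTP response header, the form available to non-HTML files.
  printf 'Link: <%s>; rel="canonical"\n' "$1"
}

# canonical_forms https://example.com/wp-content/uploads/photo.jpg
```

Only the second form can be attached to an image response at all, which is why the CDNs above use it.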
The takeaways
Have a policy for governing CDN implementation by content type
If your site lives and dies on image SEO, think hard about whether and how you'll use a third-party CDN. It may make a lot more sense to optimize your site by following CDN design patterns rather than using a CDN, e.g., use hosting that has multiple server locations and does a nice job of serving requests by geo IP.
Put CDN on your own site
For the rest of us, the takeaway here is to use a CDN anyway. Just put it on your own cdn.yoursite.com subdomain so at least it's on your property, and make sure the same image exists at the same URL path on your root domain as a fallback. You'll always have the option to 301 those image URLs to their respective landing pages later. Which, note to self, raises a question I have to look into: whether 301'd image URLs pass PageRank.
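A minimal sketch of the "same path on both hosts" idea, assuming the CDN is mapped to a `cdn.` subdomain of the root domain and everything is served over https. Because only the hostname changes, the root-domain fallback is always derivable from the CDN URL. `to_cdn_url`, `to_origin_url`, and `$img` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a root-domain image URL to the same path on a
# cdn. subdomain. Assumes https and a bare root host (no www).
to_cdn_url() {
  printf '%s\n' "$1" | sed -E 's#^https://([^/]+)/#https://cdn.\1/#'
}

# Hypothetical inverse: map a cdn. URL back to its root-domain fallback.
to_origin_url() {
  printf '%s\n' "$1" | sed -E 's#^https://cdn\.([^/]+)/#https://\1/#'
}

# Check that both copies respond ($img is a placeholder root-domain URL):
# for u in "$(to_cdn_url "$img")" "$img"; do
#   curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' -I "$u"
# done
```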
Don’t link to your image URLs on your site; use a lightbox or no link
Additionally, prevent image clicks from taking users to the image URL because then they’ll link to the CDN version of an image, instead of the page the image is on.
We can see this happening (albeit not a ton) on Pinterest:
This also makes me think that the image attachment URL thing that I always found annoying about WordPress may have some utility in a pinch.
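To find places where a page links straight to an image file, a rough grep over the page HTML is often enough. This is a sketch, not an HTML parser: it assumes double-quoted `href` attributes and a handful of common image extensions, and `linked_image_urls` and the example URL are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and list href targets that point
# directly at image files (jpg, jpeg, png, gif, webp), one per line.
linked_image_urls() {
  grep -oiE 'href="[^"]+\.(jpe?g|png|gif|webp)(\?[^"]*)?"' |
    cut -d '"' -f 2 |
    sort -u
}

# Usage (placeholder URL):
# curl -s https://example.com/some-post/ | linked_image_urls
```

Any CDN hostnames in the output are the links users could copy and share instead of the landing page.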
Ignore rel canonical headers for images atm
Don’t worry about CDN image response headers having a rel canonical link *yet*, but it can’t hurt in case G decides to support that feature in the future.