There are a lot of reasons to let Google power your site search results. For one, the whole thing is complicated.
Even WordPress’ built-in method for ordering keyword search results by relevance is pretty limited. From the WP_Query docs:
‘relevance’ – Order by search terms in the following order: First, whether the entire sentence is matched. Second, if all the search terms are within the titles. Third, if any of the search terms appear in the titles. And, fourth, if the full sentence appears in the contents.
Other reasons to let Google handle it are things like having content across multiple subdomains or databases.
Recently we had a project requirement of returning relevant site search results where the website was on WordPress and the blog was on Hubspot. Given Hubspot’s site search is pretty bad, with no option to order results by something like relevance, we needed an alternative to merging results on the fly.
In addition, WordPress’ site search, though configurable, leaves a lot to be desired. We can’t solve for synonyms without creating indices, or layer in other signals that would help us sort more effectively.
But Google does a nice job of all of these things.
- Google Custom Search: Embed vs JSON API
- Choose what sites and pages to include and exclude on your CSE
- Which API should I use?
- Create a new developer project
- Enable Custom Search API
- Create your API key (tricky UI)
- Test your API key in the browser
- Chain together your request URI
- Restrict your API key to only work on your website and with the Custom Search API
- The full search.php file
Google Custom Search: Embed vs JSON API
A few years ago, Google announced it was shutting down Google Site Search (GSS), and did so in 2018. If you weren’t familiar with it, GSS let you power your website’s search feature with Google results from just your website.
This ability actually lives on in Google Custom Search, but the default method of pasting the embed snippet litters your site search results with ads from other websites.
But there’s also a Google Custom Search JSON API. It’s free up to a daily request threshold, after which you can opt to pay $5 for each additional 1,000 searches.
It requires some development work to chain together the parameters of your request and then parse the results, but the premise is actually quite simple.
Create your site search engine with Google Custom Search (CSE)
The first step, and a pretty neat tool in and of itself, is creating your own CSE. Once you’re logged into your Google account, you can “add” a search engine.
Aside: you can build search engines around multiple sites. For example, you could add your site and competitor sites to quickly see who has what content for a given topic.
Choose what sites and pages to include and exclude on your CSE
You can also use some simple patterns to only allow the searching of specific subdomains or subdirectories.
If you want to exclude sections of your site, that’s slightly hidden, but totally doable:
This can be useful if you want to exclude an images folder or section of your site, like everything in a /careers/ subdirectory.
The way to match URL patterns isn’t 100% intuitive. It’s sort of a simplified version of regular expression matching that uses only the wildcard (*) symbol.
Here are my notes based on tinkering and working through Google’s notes under the “More on URL patterns” section of this page:
Pattern | How it matches – some “gotchas” here |
domain.com | Match all pages on a domain |
domain.com/ | Matches a single page (here, homepage) |
*.domain.com/* | Match all pages of all subdomains |
*.domain.com/*tag*career | Match pages of any subdomain that have the string “tag” and “career” in them (doesn’t have to be in that order or end with career) |
www.domain.com/* | Matches URLs starting with www.domain.com/ (but also matches root domain’s URLs like domain.com/whatevs) |
www.domain.com/*career | Matches all URLs that begin with www.domain.com/ and contain “career” |
www.domain.com/careers/*.html | Matches all URLs that begin with www.domain.com/careers/ and also contain “.html” |
Another way to further refine search results to include or exclude pages is to request that URLs be indexed or removed in Google Search Console. Keep in mind that these requests will apply to Google.com searches as well.
You have a couple more options on the Setup screen:
Search the entire web: should be off
Augment your results with general Web Search results: should be off
Programmatic Access options
- Custom Search JSON API – Limit of 10,000 queries per day.
- Custom Search Site Restricted JSON API – No daily query limit.
Restrict Pages using Knowledge Graph Entities
Restrict pages from the above site list to only those that are about the Entities listed below. You can add up to five (5) Entities to your Search Engine.
Restrict Pages using Schema.org Types
Restrict pages from the above site list to only those that contain Schema.org types from the list below. You can add up to ten (10) schema.org types to your Search Engine.
There are actually some really interesting use cases in these, which are definitely for another time unless you know you want to exclude certain schema types or knowledge graph entities.
Which API should I use?
Both rely on the same parameters, and per query pricing is the same ($5 per 1000 queries) so they’re about the same. But there are some edge case limitations to each.
The short version: you get 100 free queries per day with the default API. If a limit of 10k queries per day would be a problem for your site, and you have 10 or fewer sites in your engine, you’ll want to use the site restricted JSON API (docs).
Google Custom Search JSON API
https://www.googleapis.com/customsearch/v1?[parameters]
Google Custom Search Site Restricted JSON API
https://www.googleapis.com/customsearch/v1/siterestrict?[parameters]
If you need more than 10k queries per day and your Custom Search Engine searches 10 sites or fewer, you may be interested in the Custom Search Site Restricted JSON API, which does not have a daily query limit.
Custom Search docs
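To make that decision concrete, here is a minimal sketch of the rule as I read it from the limits quoted above. The function name choose_cse_endpoint() is hypothetical (it is not part of any Google library), and the thresholds are simply the ones mentioned in this section, so double check them against the current docs before relying on it.

```php
<?php
// Hypothetical helper: picks one of the two endpoints listed above.
// Thresholds (10,000 queries per day, 10 sites) come from the limits quoted in this post.
function choose_cse_endpoint(int $expected_daily_queries, int $sites_in_engine): string
{
    $standard        = 'https://www.googleapis.com/customsearch/v1';
    $site_restricted = 'https://www.googleapis.com/customsearch/v1/siterestrict';

    // The site restricted API only makes sense above the daily limit,
    // and only for engines that search 10 sites or fewer.
    if ($expected_daily_queries > 10000 && $sites_in_engine <= 10) {
        return $site_restricted;
    }

    return $standard;
}

echo choose_cse_endpoint(500, 2), PHP_EOL;   // a small site stays on the standard API
echo choose_cse_endpoint(25000, 2), PHP_EOL; // a busy site with few domains switches
```

For most small sites the first branch never fires, which is why I just use the standard endpoint in the rest of this post.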
Use Google Developer Console to set up your API
Create a new developer project
Once you’ve picked an API, head over to Google Cloud Developer Console and create a new project.
Name your project something recognizable or you will not be able to easily find it when you have a lot of projects.
Honestly, Cloud Console is super confusing. So once created, just click the notification icon to quickly get to your project.
The project home dashboard looks like this:
On your project page, scroll to the first item under Getting Started and click “Explore and enable APIs.”
Enable Custom Search API
Then click “Enable APIs and Services”
Search for the API you want to use. This will allow us to enable that API.
It should pop up, click it.
Hit Enable.
Create your API key (tricky UI)
Now pay attention because, again, this gets super confusing. An API key is not offered as a credentialing option by default.
Once you’re in the Credentials section, click the anchor text that says “Credentials in APIs & Services”:
Now you can add an API key by clicking “Create Credentials.”
It should be the first option, click it:
You should see a popup with your API key.
Test your API key in the browser
Use the copy icon to grab your API key.
At this point we can choose to test our API key in the browser or go ahead and restrict our API key to the site we’ll be working from.
Originally, I made the mistake of restricting the key to the intended live site before testing it, and then using a dev site to work on it.
To quickly check your API key works, you can actually just build your query and run a search in your browser. If you aren’t familiar with how API calls work, this is a good exercise to go through anyway.
So we’re ready for a “Hello World.” We want to check that our API call works. Here’s what we need:
- The Custom Search JSON API URI
- Your API key
- Your Search Engine ID
- A string to run a test search
The Custom Search JSON API URI
Both JSON and Restricted Site JSON API URIs are listed above, but repasting the standard API URI here for convenience:
https://www.googleapis.com/customsearch/v1?[parameters]
Copy your API key:
If you closed the window, you can still click the other copy icon:
Mine is:
AIzaXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXHu4z
Get your search engine ID
Go back to CSE dashboard, select “Edit search engine” > Your search engine from the drop down.
Mine:
Search engine ID: 003982449146190185859:a6m0wtlvljd
Choose a test search phrase
I’m using “hello world” but you should use any phrase that should return more than one result from your website.
Chain together your request URI
https://www.googleapis.com/customsearch/v1?key=INSERT_YOUR_API_KEY&cx=INSERT_YOUR_SEARCH_ENGINE_ID&q=your+search+query
Altogether you get something like this:
https://www.googleapis.com/customsearch/v1?key=AIzaSyB9atlasT0tHeNwBqQuCXegdkY0Ru6RHh0&cx=003982449146190185859:a6m0wtlvljd&q=hello world
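If you would rather not glue the pieces together by hand, the same URI can be assembled in plain PHP. This is just a sketch: build_cse_request_url() is a name I made up for illustration, and it leans on PHP’s built-in http_build_query() to URL-encode the values (so the space in “hello world” becomes a plus sign and the colon in the engine ID becomes %3A).

```php
<?php
// Hypothetical helper: assembles the request URI from the three parts listed above.
function build_cse_request_url(string $api_key, string $engine_id, string $query): string
{
    $base_url = 'https://www.googleapis.com/customsearch/v1';
    $params   = array(
        'key' => $api_key,   // your API key
        'cx'  => $engine_id, // your search engine ID
        'q'   => $query,     // the search phrase, unencoded
    );

    // http_build_query() URL-encodes each value and joins the pairs with "&".
    return $base_url . '?' . http_build_query($params, '', '&');
}

echo build_cse_request_url('INSERT_YOUR_API_KEY', 'INSERT_YOUR_SEARCH_ENGINE_ID', 'hello world'), PHP_EOL;
// https://www.googleapis.com/customsearch/v1?key=INSERT_YOUR_API_KEY&cx=INSERT_YOUR_SEARCH_ENGINE_ID&q=hello+world
```

Letting the function handle encoding means search phrases with ampersands or quotes won’t break the request.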
This returns our JSON data. In this case there are two result objects under “items,” and I’m showing just the first one here so you can see the structure:
{
  "kind": "customsearch#search",
  "url": {
    "type": "application/json",
    "template": "https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&relatedSite={relatedSite?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json"
  },
  "queries": {
    "request": [
      {
        "title": "Google Custom Search - hello world",
        "totalResults": "2",
        "searchTerms": "hello world",
        "count": 2,
        "startIndex": 1,
        "inputEncoding": "utf8",
        "outputEncoding": "utf8",
        "safe": "off",
        "cx": "003982449146190185859:a6m0wtlvljd"
      }
    ]
  },
  "context": {
    "title": "Content Audience"
  },
  "searchInformation": {
    "searchTime": 0.302757,
    "formattedSearchTime": "0.30",
    "totalResults": "2",
    "formattedTotalResults": "2"
  },
  "items": [
    {
      "kind": "customsearch#result",
      "title": "The Expert Content Research Project",
      "htmlTitle": "The Expert Content Research Project",
      "link": "https://contentaudience.com/research/",
      "displayLink": "contentaudience.com",
      "snippet": "... a graph database technology. And then she spent HOURS helping me get \nsome CSV data loaded in for a Hello World! using Neo4j's query language (CQL)\n.",
      "htmlSnippet": "... a graph database technology. And then she spent HOURS helping me get \u003cbr\u003e\nsome CSV data loaded in for a \u003cb\u003eHello World\u003c/b\u003e! using Neo4j's query language (CQL)\u003cbr\u003e\n.",
      "cacheId": "XbxGQAFzjEQJ",
      "formattedUrl": "https://contentaudience.com/research/",
      "htmlFormattedUrl": "https://contentaudience.com/research/",
      "pagemap": {
        "metatags": [
          {
            "viewport": "width=device-width, initial-scale=1",
            "msapplication-tileimage": "https://contentaudience.com/wp-content/uploads/2019/09/content-audience-favicon-8.png"
          }
        ],
        "webpage": [
          {
            "headline": "The Expert Content Research Project"
          }
        ],
        "creativework": [
          {
            "text": "Table Of ContentsResearch QuestionBackgroundThe traditional path for experts with audiencesIt’s clear what works is changingThree-Pronged ApproachInitial Three Question SurveyFollow Up Five...",
            "headline": "The Expert Content Research Project"
          }
        ],
        "wpheader": [
          {
            "description": "Turn your piles of content into recurring revenue",
            "headline": "Content Audience",
            "url": "$150 CONSULT"
          }
        ],
        "sitenavigationelement": [
          {
            "name": "About",
            "url": "About"
          }
        ],
        "searchaction": [
          {
            "query-input": "name=s",
            "target": "https://contentaudience.com/?s={s}"
          }
        ]
      }
    }
  ]
}
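For a site search page, the only parts of that response we really need are the title, link, and snippet of each entry in “items.” Here is a rough sketch of pulling those out in plain PHP. The function name extract_cse_items() and the trimmed-down $sample string are mine, made up for illustration; the field names come from the response above.

```php
<?php
// Hypothetical helper: reduces a Custom Search response body to the fields we display.
function extract_cse_items(string $json_body): array
{
    $data = json_decode($json_body, true); // true => associative arrays instead of objects

    // "items" is absent when nothing matched, and $data is null for invalid JSON.
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return array();
    }

    $items = array();
    foreach ($data['items'] as $item) {
        $items[] = array(
            'title'   => $item['title'] ?? '',
            'link'    => $item['link'] ?? '',
            'snippet' => $item['snippet'] ?? '',
        );
    }

    return $items;
}

// Trimmed-down sample based on the response shown above.
$sample = '{"items":[{"title":"The Expert Content Research Project","link":"https://contentaudience.com/research/","snippet":"..."}]}';
print_r(extract_cse_items($sample));
```

Returning an empty array for “no results” and for “bad JSON” keeps the template code simple, at the cost of not telling the two cases apart.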
Now that we know our request works as intended, we can restrict our API key and get on with developing a new search.php template.
Restrict your API key to only work on your website and with the Custom Search API
Your key will pop up. You can copy it now or later, but go to “Restrict Key” if you want to limit potential abuse.
You can restrict the key in a number of ways. I chose HTTP referrers to limit requests to those coming from my website, e.g. contentaudience.com/?s=some+search, and then I restricted my key to only be used with the Custom Search API:
NOTE: Do not use HTTP referrer option for WordPress sites
Using HTTP referrers as the application restriction is unreliable at best. After a few hours of troubleshooting a 60% request failure rate, I learned that
“…the referer is not reliable and shouldn’t be used for a security check. Additionally, it is not guaranteed to be set.”
via WordPress Stack Exchange
Instead, I’m restricting by the host website’s IP address. For WPEngine and Digital Ocean, these are more stable for my situation. In addition, you can edit your Quotas in Cloud Dev Console to keep any one user from running an unreasonable number of searches in a given period of time.
We hit save, and boom. We have our updated key.
If you imposed an IP restriction, double check the IP restriction works
Retest your URI in the browser and you should see the following:
{
"error": {
"code": 403,
"message": "The supplied API key is not configured for use from this IP address.",
"errors": [
{
"message": "The supplied API key is not configured for use from this IP address.",
"domain": "global",
"reason": "forbidden"
}
],
"status": "PERMISSION_DENIED"
}
}
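Since error bodies follow the same shape, it can help to pull the useful bits out before deciding what to show the visitor. A small sketch, assuming the error structure shown above; describe_cse_error() is a hypothetical name, not something from WordPress or Google.

```php
<?php
// Hypothetical helper: summarizes an error body like the one shown above.
// Returns null when the body has no "error" key (i.e. it is a normal response).
function describe_cse_error(string $json_body): ?array
{
    $data = json_decode($json_body, true);
    if (!is_array($data) || !isset($data['error']) || !is_array($data['error'])) {
        return null;
    }

    $error = $data['error'];

    return array(
        'code'    => (int) ($error['code'] ?? 0),
        'status'  => $error['status'] ?? '',
        'reason'  => $error['errors'][0]['reason'] ?? '', // first entry in "errors", if any
        'message' => $error['message'] ?? '',
    );
}

$body = '{"error":{"code":403,"message":"The supplied API key is not configured for use from this IP address.","errors":[{"message":"The supplied API key is not configured for use from this IP address.","domain":"global","reason":"forbidden"}],"status":"PERMISSION_DENIED"}}';
print_r(describe_cse_error($body));
```

I would log the message rather than echo it to visitors, since it can reveal how the key is configured.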
Dealing with query limits
There are two options for dealing with query restrictions being reached as I see it. Without a credit card for billing, your site searches will be restricted to 100 searches a day and then the API calls will error out.
Option 1) Pay for additional queries
To handle this, we can set up a credit card and pay $5 for every 1,000 additional requests. The alternative is using the embedded Google CSE which returns ads (yuck). In my mind it is well worth the cost. Learn more about setting up billing options.
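To get a feel for what that costs, here is some back-of-the-envelope arithmetic in PHP. It assumes the numbers from this post (100 free queries a day, $5 per additional 1,000) and, as a simplifying assumption of mine, that billing is prorated per query; estimate_daily_cse_cost() is a made-up name.

```php
<?php
// Hypothetical helper: rough daily cost of the Custom Search JSON API.
// Assumes 100 free queries per day and $5 per 1,000 queries after that,
// prorated per query (an assumption; check how Google actually bills).
function estimate_daily_cse_cost(int $queries_per_day, int $free_per_day = 100, float $price_per_thousand = 5.0): float
{
    $billable = max(0, $queries_per_day - $free_per_day);

    return $billable / 1000 * $price_per_thousand;
}

echo estimate_daily_cse_cost(600), PHP_EOL; // 500 billable queries => 2.5
echo estimate_daily_cse_cost(80), PHP_EOL;  // under the free limit => 0
```

So a site running 600 searches a day would pay roughly $2.50 a day under these assumptions.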
Option 2) Conditionally return the default WordPress search results
You get 100 free requests a day. Once you hit that limit, a 403 response code will be returned. We can use that status code to conditionally fall back to serving the default WordPress search results.
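The check itself can be tiny. Below is a sketch of the decision as a plain PHP function. Note the assumption: this post only talks about the 403 case, but I treat anything other than a 200 as a reason to fall back, so a timeout or other error also gets the default results. The function name is hypothetical.

```php
<?php
// Hypothetical helper: decides whether to serve the default WordPress results.
// Assumption: any status other than 200 (403 over quota, failed request, etc.)
// should fall back, not only the 403 mentioned above.
function should_fall_back_to_default_search($response_code): bool
{
    return (int) $response_code !== 200;
}

// In search.php this might be used roughly like:
// if ( should_fall_back_to_default_search( wp_remote_retrieve_response_code( $response ) ) ) {
//     // run the theme's normal search loop here
// }

var_dump(should_fall_back_to_default_search(200)); // bool(false)
var_dump(should_fall_back_to_default_search(403)); // bool(true)
```

The (int) cast matters because wp_remote_retrieve_response_code() returns an empty string when the request itself failed.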
Replacing WordPress site search with our Google CSE results
A quick review of how HTML forms work: to create a search form for your site, you can use a widget or paste something like this (changing the URL instances of my domain to your own):
<form class="search-form" method="get" action="https://contentaudience.com/" role="search" itemprop="potentialAction" itemscope="" itemtype="https://schema.org/SearchAction"> <input type="search" name="s" placeholder="Search this website" itemprop="query-input"> <input class="search-form-submit" type="submit" value="Search"> <meta content="https://contentaudience.com/?s={s}" itemprop="target"> </form>
As HTML, that will output like this:
When someone searches for something like “hello world,” the browser makes a GET request to this URI and takes the user there:
https://contentaudience.com/?s=hello+world
Keep in mind that if you are not using WordPress, the query parameter might be something other than “s,” but it’s usually either “s” for search or “q” for query.
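If you are outside WordPress and not sure which parameter your form uses, you can read the term straight off the URL. A minimal sketch using PHP’s built-in parse_url() and parse_str(); the function name extract_search_term() and the default list of parameter names are my own assumptions.

```php
<?php
// Hypothetical helper: returns the search term from a results URL,
// trying "s" first and then "q".
function extract_search_term(string $url, array $param_names = array('s', 'q')): string
{
    $query_string = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query_string)) {
        return ''; // the URL has no query string
    }

    parse_str($query_string, $params); // also decodes "+" and %XX sequences

    foreach ($param_names as $name) {
        if (isset($params[$name]) && is_string($params[$name])) {
            return $params[$name];
        }
    }

    return '';
}

echo extract_search_term('https://contentaudience.com/?s=hello+world'), PHP_EOL; // hello world
```

Inside WordPress you don’t need any of this, because the platform already parses the “s” parameter for you.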
The search.php file
Most WordPress themes have a search.php file that WordPress looks for to determine what template to return and how to handle the field value (in this case, the user’s query).
Conveniently, WordPress has a function that allows us to grab the search query value:
get_search_query()
This is important because it’s the key parameter for us to pass through to our Google Custom Search API query URI.
// build our request
$key = 'XXXX_PASTE_YOUR_API_KEY_HERE_XXXX';
$cx_search_id = '011042974731601094488:vv2heducjzz';
// grab the raw (unescaped) search query and URL-encode it for the request
$q = rawurlencode( get_search_query( false ) );
$base_url = 'https://www.googleapis.com/customsearch/v1?key=';
$request_url = $base_url . $key . '&cx=' . $cx_search_id . '&q=' . $q;
Now we have built our $request_url just like we did when we ran our browser test earlier. And we can check that it looks good with a quick
echo $request_url;
At this point we need to capture the response returned by our request.
// wp_remote_get() already sends a GET request; SSL verification is left on (the default)
$response = wp_remote_get( $request_url, array( 'timeout' => 10 ) );
$results = array();
The wp_remote_get() function simply performs our request with the parameters we set. We define a results variable as an array so we can pass our search result items in a PHP array.
Having this response object is helpful because now we can also do some error handling by checking the response code and adding some conditional statements:
$response_code = wp_remote_retrieve_response_code( $response );
if ( $response_code == 403 ) {
    echo 'Oopsies, request was blocked.';
}
// capture response object from gcs
$body = wp_remote_retrieve_body( $response );
$results_response_object = json_decode( $body, true ); // convert json to a php associative array
// "items" is missing when there are no results, so default to an empty array
$results = $results_response_object['items'] ?? array();
As you can see from the comments, we’re converting our JSON response body to a PHP associative array with the json_decode() function, and setting $results to the nested “items” being returned.
// get a count of results
$num_results = count( $results );
At this point we can just output a loop of the returned results with some scripting:
// if response is OK
if ( $response_code == 200 ) {
    // if there are results
    if ( $num_results > 0 ) {
        $i = 0;
        echo '<ul>';
        // loop through each result and grab the title, link, blurb
        foreach ( $results as $result ) {
            // escape the link and title; htmlSnippet contains <b> and <br> tags, so sanitize it instead
            echo '<li><h3><a href="' . esc_url( trim( $result['link'] ) ) . '">' . ( $i + 1 ) . '. ' . esc_html( trim( $result['title'] ) ) . '</a></h3><p>' . wp_kses_post( trim( $result['htmlSnippet'] ) ) . '</p></li>';
            $i++;
        }
        echo '</ul>';
    }
}
And that’s about it. My full gist for the search.php template that I am currently using (relies on some Genesis templating) is linked and embedded below.
The full search.php file
And here’s the embedded gist. The styles should be reasonably light, just showing different background colors for alternating results, and should otherwise match your website’s output.
I’ll do my best to keep it updated as I improve error handling.
Let me know if you have questions about adapting it for your purposes.