Last week I wrote about a content-feature-first approach to organizing content. Below are some things you might not have thought about that can help isolate key features for easier analysis. I think writing style is the most interesting, so I’ll start there:
Writing style
Writing style characteristics are more indicative than you might think. It is often easy to pick out common phrases and figure out what they are likely associated with.
In fact, Google now claims it can tell who wrote an article from the article’s contents alone. Identifying the Unabomber from his writing patterns took years; Google says it can now automate that kind of analysis at scale. If that’s what Google can do, what can you see when you actively look for patterns in your own content? I bet a lot.
Check this one out.
As a subscriber to Jonathan Stark’s email list, I have received about 1,000 of his emails. He will often write a story or a conversation and then say “Here’s the thing…”
In fact, as you can see above, about 120 of 1,040 emails follow this structure: words, words, words, then the key point.
Inspecting everything after “Here’s the thing” would give me a distilled view of the range of points he typically makes in his emails. Pulling the key point out of that sample of roughly 12% of his emails would probably uncover his top few points and their sub-points, and he’s probably not even aware of what that mix is.
Scrape the article content and use a regex to pull out everything after “Here’s the thing” at the end of each article. Or, even easier, put the text in a spreadsheet and add a column with this formula:
=SPLIT(A2,"Here's the thing...", FALSE)
With the article text in column A, the formula in column B splits the text in two: what comes before “Here’s the thing” stays in column B, and what comes after it spills into column C. Easy peasy!
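If you’d rather script it than use a spreadsheet, here is a minimal Python sketch of the regex route. It assumes you have already scraped each email into a plain string; the names `PIVOT` and `key_point` are hypothetical, and the pattern accepts either a straight or curly apostrophe plus an optional trailing “...” or “…”, since scraped text can contain either.

```python
import re
from typing import Optional

# Hypothetical pattern: "Here's the thing" with a straight or curly apostrophe,
# any capitalization, optionally followed by "..." or a single "…" character.
PIVOT = re.compile(r"here[’']s the thing(?:\.\.\.|…)?", re.IGNORECASE)


def key_point(text: str) -> Optional[str]:
    """Return the text after the first "Here's the thing", or None if absent."""
    match = PIVOT.search(text)
    if match is None:
        return None
    return text[match.end():].strip()


# Made-up sample emails, for illustration only.
emails = [
    "We talked for an hour about pricing. Here's the thing... value beats hours.",
    "A short note with no pivot phrase.",
    "Long story. Here’s the thing… clients buy outcomes.",
]

points = [p for p in (key_point(e) for e in emails) if p is not None]
share = len(points) / len(emails)  # fraction of emails using the structure
print(points)
print(f"{share:.0%} of emails use the pattern")
```

Run over a full archive, `points` becomes the list of key points you would read through or feed into a frequency counter.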
These patterns are everywhere. Many of us put our key points at the ends of our daily emails.
Reading a few articles for a current prospect, I noticed they say “my client” when telling stories and use “you” language heavily when delivering key takeaways.
When I write “Why?”, it’s because I’m about to explain the point I’m trying to get at.
If you want a head start on identifying the common language that may signal patterns in your writing, you can use a word and phrase frequency counter like the ngram Analyzer I mentioned in a recent email.
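As a rough stand-in for a phrase frequency counter (not the ngram Analyzer itself, just a minimal sketch of the same idea), this Python function counts every run of `n` consecutive words across your posts. The function name and the simple tokenizer are assumptions; this version lowercases everything and ignores sentence boundaries, so some phrases will straddle two sentences.

```python
import re
from collections import Counter
from typing import Iterable


def ngram_counts(texts: Iterable[str], n: int = 2) -> Counter:
    """Count n-word phrases across all texts (lowercased, punctuation dropped)."""
    counts: Counter = Counter()
    for text in texts:
        # Keep letters and apostrophes so contractions like "here's" stay whole.
        words = re.findall(r"[a-z’']+", text.lower())
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


# Made-up sample posts, for illustration only.
posts = [
    "Here's the thing. You should raise prices.",
    "Here's the thing: you know your clients.",
]
for phrase, count in ngram_counts(posts, n=3).most_common(5):
    print(count, phrase)
```

Phrases that show up near the top across many posts are good candidates for the kind of structural signal described above.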
Formatting and HTML patterns
These patterns are easier to spot when there is a primary author, but they exist across authors too. Think about the use of:
- formatting
- heading types (e.g., h3 vs. h4)
- bold text, italics, underlining, strikethrough
- bullet/numbered lists
- block quotes
- HTML patterns
- in-article links
- text in a “Tweet This” widget
- inline CTA placement
Typically, if we look at HTML and formatting patterns on a per-author basis, we can find some interesting things. When do you use italics? When do you bold things? When do you underline? (Hopefully only for links.)
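To make the per-author comparison concrete, here is a minimal sketch using Python’s standard-library `html.parser`. It assumes you can export each post as an `(author, html)` pair; the tag set is an assumption that maps the formatting features above to common HTML tags, and features like a “Tweet This” widget or inline CTAs would need your theme’s own class names, which this sketch does not cover.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed mapping of the formatting features above to common HTML tags.
FEATURE_TAGS = {"h2", "h3", "h4", "strong", "b", "em", "i", "u", "s", "del",
                "ul", "ol", "blockquote", "a"}


class TagCounter(HTMLParser):
    """Counts opening tags that appear in FEATURE_TAGS."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased.
        if tag in FEATURE_TAGS:
            self.counts[tag] += 1


def formatting_profile(html: str) -> Counter:
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def profile_by_author(articles):
    """articles: iterable of (author, html) pairs. Returns author -> Counter."""
    profiles = {}
    for author, html in articles:
        profiles.setdefault(author, Counter()).update(formatting_profile(html))
    return profiles


# Made-up sample posts, for illustration only.
articles = [
    ("ann", "<h3>Tip</h3><p>Use <strong>bold</strong> and <em>this</em>.</p>"),
    ("ann", "<ul><li>One</li></ul><p><a href='/x'>link</a></p>"),
    ("bob", "<h4>Note</h4><blockquote>Quote</blockquote>"),
]
for author, counts in profile_by_author(articles).items():
    print(author, dict(counts))
```

Dividing each count by the author’s number of posts makes authors with different output levels easier to compare.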
Article length
Word count is the obvious measure of article length, but there are subtleties. Depending on your own writing style, word count will often correlate with other interesting features of your content.
On the high end, you may have your full-blown guides or detailed content assets. On the short end, you may occasionally write punchy 3–5 sentence posts like Seth Godin; rounding all of those up into one long mega advice page could cut your page count by 100 with very little effort (recommended).
We can do other interesting things, like look at length over time. I had a client with a few thousand posts. Part of the reason her traffic had fallen off was that her posts had gotten longer, but her audience wasn’t looking for longer-form content.
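Length over time is easy to check once you have publish dates. This minimal sketch assumes each post is a `(publish date, plain text)` pair and uses a naive whitespace word count; the function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date


def word_count(text: str) -> int:
    """Naive word count: number of whitespace-separated tokens."""
    return len(re.findall(r"\S+", text))


def average_length_by_year(posts):
    """posts: iterable of (published date, text). Returns {year: mean words}."""
    totals = defaultdict(lambda: [0, 0])  # year -> [total words, post count]
    for published, text in posts:
        bucket = totals[published.year]
        bucket[0] += word_count(text)
        bucket[1] += 1
    return {year: words / count for year, (words, count) in sorted(totals.items())}


# Made-up sample posts, for illustration only.
sample = [
    (date(2019, 3, 1), "one two three"),
    (date(2019, 6, 1), "one two three four five"),
    (date(2021, 1, 5), "word " * 10),
]
print(average_length_by_year(sample))
```

If the yearly averages climb while traffic falls, that is worth a closer look.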
It’s relatively simple to get word count by post in a form you can sort:
- word count plugins
- export the results of a word count query against your database to CSV
- crawl your posts sitemap XML in Screaming Frog’s list mode (the free version gives you 500 URLs)
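For the Screaming Frog route, you first need the list of post URLs. This minimal sketch assumes a standard sitemap in the sitemaps.org 0.9 format, already loaded as text; it pulls every `<loc>` under `<url>` and splits the list into batches of 500 so each batch fits the free version’s limit. The function names are hypothetical, and a sitemap index file (one that lists other sitemaps) would need an extra step.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <url><loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def batches(urls: list, size: int = 500) -> list:
    """Split a list into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# Made-up sitemap, for illustration only.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/posts/first/</loc></url>
  <url><loc>https://example.com/posts/second/</loc></url>
</urlset>"""

for batch in batches(sitemap_urls(sample_sitemap)):
    print("\n".join(batch))  # paste each batch into list mode
```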
If you use a plugin, you can then show up to 999 posts at a time, sort by word count, and bulk-add a tag such as “long,” “medium,” or “short.”
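The same long/medium/short tagging can be scripted if you have the post text outside WordPress. The cutoffs below are hypothetical placeholders, not recommendations; you might instead derive them from your own archive, for example by splitting your word count distribution into thirds.

```python
import re

# Hypothetical cutoffs; tune them to your own archive.
SHORT_MAX = 300   # fewer than 300 words -> "short"
LONG_MIN = 1500   # 1,500 words or more -> "long"


def word_count(text: str) -> int:
    """Naive word count: number of whitespace-separated tokens."""
    return len(re.findall(r"\S+", text))


def length_tag(text: str) -> str:
    """Map a post's word count to a "short", "medium", or "long" tag."""
    words = word_count(text)
    if words < SHORT_MAX:
        return "short"
    if words >= LONG_MIN:
        return "long"
    return "medium"


def tag_posts(posts: dict) -> dict:
    """posts: mapping of post slug -> text. Returns slug -> length tag."""
    return {slug: length_tag(text) for slug, text in posts.items()}


# Made-up posts, for illustration only.
print(tag_posts({"quick-tip": "Ship it today.", "big-guide": "word " * 2000}))
```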
Taxonomies
Taxonomies are groupings of discrete concepts, plus the governing rules we use to keep those concepts discrete. For our purposes, we’re referring to categories (with subcategories) and tags (labels).
I won’t spend too much time here, since I’ve discussed this a fair amount:
- How not to use tags (externally)
- How to use tags (internally)
Content topics
Topics and terms used will often correlate with the audience segments or groupings you were thinking about when you wrote a given post.
If I’m writing about keyword research, that’s a pretty advanced topic. I personally feel that experts who have recently struck out on their own probably shouldn’t be thinking much about SEO, or spending a lot of money on a website that will become a static artifact while their business rapidly changes.
Others might completely disagree about SEO or the expensive website, but when I discuss those topics I’m speaking (or imagining speaking) to seasoned vets. To me they are advanced, and as a result I’d only curate them where appropriate.
Time
Time is another feature, and it will get its own post (maybe next?).