Summarizing reviews at scale for product comparison during user purchase journey

The client was a direct-to-consumer ecommerce website doing approximately $3 million a year in online sales and $30 million in brick-and-mortar sales.

A key issue uncovered through support chat transcripts, sales and returns data, and traffic data was that users had a lot of trouble comparing similar products. We hypothesized that we could reduce confusion at this late stage of the purchase journey by adding more comparative information to product description pages, so that users could make informed purchasing decisions without having to estimate the differences by combing through hundreds of reviews.

We created a proof of concept for semi-automating a process of summarizing hundreds of products, often each with hundreds to thousands of reviews.

We created a corpus using thousands of reviews across the web for products in a given top level product category. The project included common machine learning exercises in natural language processing:

document similarity
topic clustering
key phrase extraction
sentiment analysis

It quickly became apparent that surfacing similarities in product reviews (e.g. characteristics shared by products at the category level) was easier than surfacing the subtle differences we wanted users to be aware of when comparing products.

To address this, we built a data enrichment workflow with the following steps:

1. Determine components of consideration

We started by using topic clustering at the product category level to uncover patterns in the key factors users discussed in reviews. This allowed us to work from a larger corpus when individual products did not have enough reviews, and it helped us determine what users cared about most.
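As a rough illustration of this step, here is a minimal sketch of category-level clustering using only the Python standard library. It is not the tooling we used on the project: the stopword list, the similarity threshold, the greedy single-pass clustering, and the sample reviews are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "is", "it", "and", "was", "for", "my", "this", "i", "in", "to", "of"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency.
        vectors.append({t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def cluster_reviews(docs, threshold=0.2):
    """Greedy single-pass clustering: each review joins the most similar
    existing cluster (compared against that cluster's first review), or
    starts a new cluster if nothing reaches the threshold."""
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, vectors[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


def top_terms(docs, cluster, k=2):
    """Label a cluster with the terms that appear in the most of its reviews."""
    counts = Counter(t for i in cluster for t in set(tokenize(docs[i])))
    return sorted(counts, key=lambda t: (-counts[t], t))[:k]


# Hypothetical reviews pooled across one product category.
category_reviews = [
    "The battery life is great",
    "Battery life was short",
    "Fits true to size",
    "Size runs large, fits loose",
]
for cluster in cluster_reviews(category_reviews):
    print(top_terms(category_reviews, cluster), cluster)
```

The cluster labels stand in for the "components of consideration" (battery life and fit, in this toy data). On a real corpus a proper topic model or clustering library would likely do better; the point here is only that pooling reviews at the category level gives sparse products enough text to work with.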

2. Surface key differences between products within a category

It’s easy to surface characteristics about a class of products – imagine comparing SUVs to compact cars or fruits to vegetables. But it’s harder to surface distinctions between SUVs or types of apples.

By surfacing the key points users were making with n-gram keyphrase extraction, we could start to distinguish differences between products within a category as they related to a given consideration factor.
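One way to sketch this idea is to rank a product's n-grams by how much more often they appear in its own reviews than in reviews of the other products in the category. The smoothed log-ratio scoring below, the function names, and the tent reviews are all assumptions chosen for illustration, not a record of the project's actual scoring.

```python
import math
import re
from collections import Counter


def ngrams(text, n=2):
    """Lowercase the text and return its word n-grams as strings."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def distinctive_phrases(reviews_by_product, product, n=2, min_count=1, top_k=5):
    """Rank a product's n-grams by the log-ratio of their (add-one smoothed)
    rate in that product's reviews versus the rest of the category."""
    own = Counter(g for r in reviews_by_product[product] for g in ngrams(r, n))
    rest = Counter(
        g
        for other, reviews in reviews_by_product.items()
        if other != product
        for r in reviews
        for g in ngrams(r, n)
    )
    vocab = len(set(own) | set(rest))
    own_total, rest_total = sum(own.values()), sum(rest.values())
    scored = []
    for phrase, count in own.items():
        if count < min_count:
            continue  # skip phrases too rare to trust
        own_rate = (count + 1) / (own_total + vocab)
        rest_rate = (rest[phrase] + 1) / (rest_total + vocab)
        scored.append((phrase, math.log(own_rate / rest_rate)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical reviews for two products in the same category.
reviews = {
    "tent_a": ["easy to set up", "very easy to set up", "a bit heavy to carry"],
    "tent_b": ["easy to set up", "leaks in heavy rain", "leaks in the rain"],
}
for phrase, score in distinctive_phrases(reviews, "tent_b", top_k=3):
    print(f"{phrase}: {score:.2f}")
```

Phrases shared across the category ("easy to set up") score at or below zero, while phrases concentrated in one product ("leaks in") rise to the top. With real review volumes, a higher `min_count` would be needed to keep one-off phrases from dominating the ranking.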

The idea was that copywriters could take our samples and turn them into clarifying copy once (and only if) a product characteristic was distinct enough to be noteworthy on a PDP (product page).

3. Determine whether a difference was good or bad

We’re all different. If a product is too big for me, or larger than I expected, it may be the perfect size for someone else. Ultimately we had to further enrich our keyphrases to get a sense of sentiment: was the product “too big” for most, or “not too big” for many?

Running sentiment analysis on a per-characteristic basis ultimately proved too challenging for the purposes of our proof of concept. Instead, we gave copywriters access to the context around each keyphrase so they could review product information quickly and get a sense of, in our example, a product’s size relative to comparable products.
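A simple way to picture this kind of context access is a keyword-in-context lookup: for a given keyphrase, pull each occurrence with a few words on either side so a person can judge the sentiment. The function name, window size, and sample reviews below are hypothetical, and this is a sketch of the general technique rather than the interface we built.

```python
import re


def phrase_in_context(reviews, phrase, window=4):
    """Return every occurrence of `phrase` with up to `window` words of
    surrounding context on each side. Matching is case-insensitive."""
    target = phrase.lower().split()
    snippets = []
    for review in reviews:
        words = re.findall(r"[\w']+", review)  # punctuation is dropped
        lowered = [w.lower() for w in words]
        for i in range(len(words) - len(target) + 1):
            if lowered[i:i + len(target)] == target:
                start = max(0, i - window)
                end = i + len(target) + window
                snippets.append(" ".join(words[start:end]))
    return snippets


# Hypothetical reviews for a single product.
sample_reviews = [
    "Honestly it was too big for my small kitchen counter",
    "Not too big at all, fits fine",
]
for snippet in phrase_in_context(sample_reviews, "too big", window=2):
    print(snippet)
# it was too big for my
# Not too big at all
```

Seeing "too big for my" next to "Not too big at all" lets a copywriter spot the negation that a keyphrase count alone would hide.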

This proof of concept never ended up being implemented on the site, so we don’t have any information related to increased conversions. But it was a really fun first machine learning project to manage.
