Tools for Finding “unhelpful” Content

Identifying “unhelpful” Content

Google has suggested the Helpful Content classifier flags pages as “unhelpful” because they provide a poor user experience, add no information gain, and are written for search engines rather than humans.

How to Find “unhelpful” Pages on Your Site

I always begin this process by removing pages that are obvious candidates for “unhelpful” content pruning. These are pages that meet all of the following criteria (a sketch of this filter follows the list):

  1. Page doesn’t rank on Google
  2. Page has no inbound links
  3. Page gets no other traffic
  4. Page isn’t part of a conversion journey or site functionality
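
As a rough illustration, here is a minimal sketch of that filter applied to a merged page-metrics file like the one we build later in this post. The file name, column names and boolean conversion-path flag are assumptions for illustration, not outputs of any specific tool:

```python
import pandas as pd

# File and column names are assumptions; adapt them to whatever your
# analytics/crawl exports actually contain.
df = pd.read_csv("page_metrics.csv")

prune_candidates = df[
    df["google_position"].isna()      # 1. doesn't rank on Google
    & (df["inbound_links"] == 0)      # 2. no inbound links
    & (df["sessions"] == 0)           # 3. no other traffic
    & ~df["in_conversion_path"]       # 4. not part of a conversion journey or site functionality
]
prune_candidates["page"].to_csv("prune_list.txt", index=False, header=False)
```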

You can process these pages immediately because they provide no benefit to your site, and since the classifier is based on a threshold in the number of “unhelpful” pages, it is important to remove them right away. You can only recover from an HCU hit by reducing the number of “unhelpful” pages. I assume that threshold isn’t an absolute number but a percentage weighted by other factors like industry, query-space intent and competition.

If you’re in tourism, this is unfortunately a “niche” that Google’s disruptive no-click strategy has entered with AI Overviews. The SEO and publisher communities have noted, and are actively discussing on Twitter, how hard the HCU classifier hit this niche; afterwards, at Google I/O, Google announced its AI travel guides, and AI Overviews are now prominent in these query spaces (the AI Overview results currently include instances of plagiarized content from sites the classifier labeled “unhelpful”).

These pages should be set to draft so you can later review whether they can be repurposed or are better suited to another site. Do not NoIndex the above pages unless they are ranking on another search engine, getting good traffic from sources other than Google (often social), or have links but are definitely “unhelpful” in Google’s opinion. I am currently researching and writing about different ways to block Google Search, Gemini and SGE using robots tags and snippet tags.
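
For reference, here is a minimal sketch of the two tag types mentioned above; a starting point, not a complete blocking strategy:

```html
<!-- Robots tag scoped to googlebot: removes the page from Google's index
     while other search engines can still index it -->
<meta name="googlebot" content="noindex">

<!-- Snippet tag: keeps the page indexed but suppresses the text snippet -->
<meta name="robots" content="max-snippet:0">
```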

“unhelpful” Content Discovery

We’ll use the tools below to gather the data that establishes which pages can be removed because they provide no benefit to the site. Preparation for this process starts with the following:

  1. Gather credentials for Google Analytics, Google Search Console, Majestic, SEMrush, Ahrefs and your ranking reports
  2. Run ScreamingFrog with subscriptions to the above connected
  3. Run a ranking report or use the GSC Search Results report

One of the other things I noticed looking at the Google Analytics of the person I’m assisting is that engagement is very low across the site. If page-level engagement data is available, it can be an indicator of low-quality “unhelpful” content. Really low engagement at the page or site level makes it highly probable that ad implementation or “site performance” is involved.

Using Each Data Source to Identify “unhelpful” Content

I’ve found the best way to organize this data is to collect the metrics using pages as an index, color code each source, import them into one file, then order by page to group them for quick review (a sketch of this merge follows below).
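
Here is a minimal sketch of that page-indexed merge in code; the file and column names are placeholders for whatever your exports actually contain:

```python
import pandas as pd

# Hypothetical exports, each keyed by page URL; names are placeholders.
ga4 = pd.read_csv("ga4_landing_pages.csv")    # e.g. page, sessions, engagement
gsc = pd.read_csv("gsc_performance.csv")      # e.g. page, clicks, impressions, position
links = pd.read_csv("backlinks.csv")          # e.g. page, inbound_links

# Outer-merge on the page URL so every page appears once with all its metrics,
# then sort by page to group the rows for quick review.
merged = (
    ga4.merge(gsc, on="page", how="outer")
       .merge(links, on="page", how="outer")
       .sort_values("page")
)
merged.to_csv("unhelpful_candidates.csv", index=False)
```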

  • ScreamingFrog: the ONLY Essential Tool for SEO!

The beauty of the ScreamingFrog crawler is that it takes all of the tools above that you have credentials for and pulls their data into the crawl data for your site. You can create lists of URLs for each type of issue you find, and the add-ons put all of the data from those sources in one place for analyzing pages for “unhelpful” content.

ScreamingFrog provides most of the data from the tools below, plus page speed, details about titles and other SEO optimization targets. The free version allows you to collect data for a limited number of pages. The paid version allows you to add GA and GSC API data to the crawl data at page level, and you can plug in SEMrush, Majestic and Ahrefs for ranking and other data. ScreamingFrog can cut a lot of time from identifying content to remove or repurpose.

Identifying All Pages on a Site

Primarily I’m going to use ScreamingFrog to crawl the site and get a list of pages with metrics from the add-ons; however, I haven’t subscribed to any of those tools in years. That works for me; I’m not saying it’s for everybody. I also only have the unlicensed version, but I had a subscription when I was more active and did use the tools.

Another way is to take advantage of the sitemap that WordPress builds, which can be found at https://www.example.com/wp-sitemap.xml (or sitemap.xml if you are using the Yoast or All in One SEO plugins).

Replace example.com with your domain, including the www. if you use it. This is the default URL path and file name. Below are the steps I use to convert wp-sitemap.xml into the URL list we use in the next step (a scripted version follows the steps):

  1. Copy the lists of URLs it provides and add them to a text file
  2. Copy the text file contents and paste them into an Excel spreadsheet. I wouldn’t worry about the extra columns yet
  3. Save it for use later
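
If you’d rather script those steps, here is a minimal sketch that walks the WordPress sitemap index and writes every page URL to a text file (swap in your own domain):

```python
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_xml(url):
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())

# wp-sitemap.xml is a sitemap index; follow each child sitemap to collect page URLs.
root = fetch_xml("https://www.example.com/wp-sitemap.xml")  # replace with your domain
urls = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]  # in case it's a plain urlset
for sitemap in root.findall("sm:sitemap/sm:loc", NS):
    child = fetch_xml(sitemap.text)
    urls += [loc.text for loc in child.findall("sm:url/sm:loc", NS)]

with open("all_pages.txt", "w") as f:
    f.write("\n".join(urls))
```
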
  • GA4: Engagement, User Sessions, Direct Traffic and Organic Social KPIs

Google Analytics 4 (GA4) has several useful metrics for starting the “unhelpful” content pruning process. First, log into Google Analytics:

  1. Choose Reports in the left column
  2. Choose “Traffic Acquisition”
  3. Scroll down past the timeline; if the dialog box isn’t set to Session Primary Channel Group, change it to that
  4. Press the + beside “Session Primary Channel Group” and click Page / screen
  5. Choose Landing Page + query string
  6. Click Apply in the lower right
  7. Get the data by clicking the share symbol in the top right, below the profile picture
  8. I choose Download CSV

Note: you choose Landing Page + query string because Facebook properties and groups can often be identified by the query string. I open Excel and import the CSV. You may use Google Sheets or another spreadsheet, but Excel is what I use.

Now segment the channel sources and pages by following these steps (a code version of the same grouping follows the list):

  1. Highlight column A, which is the “channel”
  2. In Excel, go to Sort & Filter, choose A-Z and click
  3. Now you have grouped the “channels” by source
  4. Next, color code each “channel” a different color so it is easy to distinguish them during analysis
  5. Next, highlight the “Landing Page + query string” column and repeat the Sort & Filter step
  6. Now the pages are grouped and you can see all the sources for each page
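
If you prefer code to Excel for this segmentation, here is a minimal pandas sketch of the same grouping. The column names are assumptions based on the GA4 export described above; check them against your actual CSV header:

```python
import pandas as pd

# Column names are assumptions; match them to your GA4 CSV header.
df = pd.read_csv("ga4_traffic_acquisition.csv")

# Mirror the Excel steps: one view grouped by channel, one grouped by page.
by_channel = df.sort_values("Session primary channel group")
by_page = df.sort_values("Landing page + query string")

# Or collapse both into a single pivot: sessions per page per channel.
summary = df.pivot_table(
    index="Landing page + query string",
    columns="Session primary channel group",
    values="Sessions",
    aggfunc="sum",
).fillna(0)
print(summary.head())
```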

This spreadsheet has an incredible amount of useful data for finding “unhelpful” page candidates, including engagement, on-page events, and conversion and revenue data, which is excellent as an indicator of what users like and dislike.

The Referral channel is also useful because it indicates which links are sending traffic to your site. Later you should review these referring sites and try to establish regular links from them. Event data is extremely useful for determining what users did on each page and the revenue generated by each event there. Like event data, engagement metrics can indicate a good or bad user experience.

When assessing “unhelpful” page candidates, poor engagement metrics and, to a lesser extent, event data are important to analyze even if a page meets some of the retention criteria; do a deep dive to discover the types of events that are driving conversion and engagement.

A deep dive on speed metrics is better left to tools like GTmetrix, which provides information on load speed and, in particular, identifies elements that are slowing the page or hurting user experience. If your site runs ad networks and average engagement time is under 20 seconds, ads are a leading cause.

Below is an image of the data in Excel:

Note: FB count is the number of sessions from Facebook properties. I count them and enter the count here because they aren’t grouped and there can be hundreds.

Also note that “users” is the number of unique users, while “sessions” counts both first-time and repeat sessions, so sessions can be higher than the number of unique users.

Breaking Down Organic Search for Traffic Analysis

Another way to get information about organic search is to repeat the above analytics report, but just for organic traffic. Later we’ll add Google Search Console page data to get rankings; in the meantime, analyzing the organic search channel can be uplifting when you see Bing rising and others holding steady, or, in the case of DuckDuckGo, appearing for the first time I’ve ever seen in an analysis of a medium-sized site!

  • GSC: Page and Query Information

Google Search Console is the best place to get ranking and query data. Since the core update now has HCU baked in, I’ll take three snapshots of data at three different time ranges: the last 28 days, 16 months (the maximum GSC retains) and either 3 or 6 months depending on when the site peaked and tanked.

To find this data, go to GSC and log in. Once you’re in:

  1. In the left column under Performance, click “Search Results”
  2. Click the date filter and choose the range you want
  3. In the dialogue box, click Apply
  4. At top right, click Export and save the file locally
  5. Repeat for each time period you want to analyze
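
If you’d rather script these exports than click through the UI, the Search Console API can pull the same data. A minimal sketch, assuming you’ve already completed Google’s OAuth setup and saved an authorized-user token; the property URL and dates are placeholders:

```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Assumes a token saved from a prior OAuth flow; see Google's API quickstart.
creds = Credentials.from_authorized_user_file("token.json")
service = build("searchconsole", "v1", credentials=creds)

response = service.searchanalytics().query(
    siteUrl="https://www.example.com/",  # your verified property
    body={
        "startDate": "2023-08-01",       # placeholder window
        "endDate": "2023-08-28",
        "dimensions": ["page", "query"],
        "rowLimit": 25000,
    },
).execute()

for row in response.get("rows", []):
    page, query = row["keys"]
    print(page, query, row["clicks"], row["impressions"], row["position"])
```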

When you’re reviewing this data, consider that with infinite scroll people are likely looking at more than the top 10 results. Keep in mind that Featured Snippets and AIO (Google AI Overviews) will affect click-throughs negatively.

Adding Ranking Data to our Spreadsheet

To complete the data gathering process we’ll do the following:

  • Go to GSC
  • For the date range, choose one covering a period from before you were hit to the present
  • Under “Performance”, click “Search Results”
  • In the upper right, click the “Export” button, save the file locally and label it pre-HCU
  • Repeat using the same time period as the Google Analytics channel data you gathered earlier
  • Export that and name it post-HCU
  • Check Discover and News to see if there is any activity; if so, repeat the complete process for each

While trying to streamline this process, I decided it’s a good idea to process some of this ranking data before adding it to the main spreadsheet with all the other data. Here are a few steps to speed things up (a scripted version follows the list):

  1. Open the latest “Performance” spreadsheet
  2. In Excel, highlight the “position” column
  3. Go to Sort and choose smallest to largest
  4. Create a tab and call it “ranking”
  5. Highlight the records until you’re down to the worst ranking you think will not affect your site
  6. Put that list in the tab marked “ranking”
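
The same steps as a minimal code sketch; the cutoff value and column name are assumptions, so check them against your actual GSC pages export:

```python
import pandas as pd

# Column name is an assumption; check your GSC "Pages" export header.
df = pd.read_csv("gsc_pages_export.csv")

POSITION_CUTOFF = 20  # assumption: the worst ranking you still consider meaningful

ranking = df[df["Position"] <= POSITION_CUTOFF].sort_values("Position")
ranking.to_csv("ranking_tab.csv", index=False)  # the "ranking" tab's contents
```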

These are pages from which you can gauge what Google thinks is helpful and what’s “unhelpful”. Note there is a caveat on these high-ranking pages: it may not be that they escaped the classifier; they may rank simply because the query space doesn’t have much content.

I would also suggest you read two documents by Google: Creating Helpful Content and the Panda list called “More guidance on building high-quality sites”. I suggest working through some of your pages with these lists open like a checklist; eventually you’ll be able to fix the problems without the lists.

I would be more worried about not cutting enough “unhelpful” pages than about moving too many into drafts or NoIndexing them; both are easily undone, and the content can often be repurposed on the current site or new ones.

Remove the “unhelpful” content by:

  • Putting “unhelpful” pages into draft for processing
  • NoIndexing pages that are still getting traffic, using a googlebot robots tag
  • Pruning or repurposing pages that are too far removed from the topic you started with

Once you have removed these pages, I would strongly advise going to GSC and submitting a sitemap that was created after removing the pages from the site (a sketch for generating one follows).
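
If your CMS or SEO plugin doesn’t rebuild the sitemap for you, here is a minimal sketch that generates one from the list of pages you kept; kept_pages.txt is assumed to be the earlier page list minus everything you pruned:

```python
# kept_pages.txt is assumed: the earlier page list with pruned URLs removed.
with open("kept_pages.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)
with open("sitemap.xml", "w") as f:
    f.write(sitemap)
```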

Why is This Important For SEO and Publishers?

Quite simply, the quicker you get “unhelpful” content off your site, the quicker the clock starts on your full recovery. After a month or so, I’d try adding a new page on a related topic, but in a new query space in IR terms or, in SEO terms, a new keyword.



By Terry Van Horne

Terry Van Horne has been developing and marketing websites since the early ’90s. In 2007, Terry developed a YouTube marketing strategy for WorldMusicSupply; at last check, those 300+ videos had received over 40,000,000 downloads. Since 2009, Terry has been an SEO trainer and mentor in several private SEO communities. Currently Terry is enjoying retirement (i.e., a monthly check that only requires he keep breathing) and renewing his passion for SEO, exploring the next SEO frontier... the semantic web!