Google has taken a stance to lower the rankings of sites that appear to be made up of low-quality and thin content by using a filter they call Panda. In a nutshell, if Google identifies pages on your site that don’t appear to add value to their search results, they can hold it against you. The Panda filter has been rolled into their algorithm, making it harder to know if your site has been hit by it or not.
“Site appears to consist of low-quality or shallow pages which do not provide users with much added value (such as thin affiliate pages, doorway pages, cookie-cutter sites, automatically generated content, or copied content).” – Google
In this post I’m going to explain how you can determine if your site’s search performance is being hindered by this Google filter, using something we call Content Strength.
What is Content Strength?
Content Strength is a simple formula we developed to estimate the chance that low-quality content on your site is causing Google to reduce your search engine visibility.
The Content Strength Formula
Content Strength = Search Traffic Pages / Total Pages
Total Pages = Total pages found on the site when a complete crawl was executed.
Search Traffic Pages = Pages that have driven search traffic over the past 90 days.
How to Find Your Content Strength
Step 1: Determine how many total pages are on your site. We recommend using Screaming Frog to crawl your entire site as a search bot in order to find all your pages. We don’t recommend using a sitemap since it might be missing the type of content we’re trying to uncover.
Step 2: Determine which of those pages have generated organic search traffic over the past 90 days.
If you’re using Google Analytics, below are the steps you should take.
- Select Content > Site Content > Landing Pages.
- Under Advanced Segments select “Non-Paid Search Traffic”.
- Change your date range (past 90 days recommended).
You’ll now have a list of pages which drove visits from organic searches over the past 90 days.
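One practical wrinkle is that a crawler usually reports full URLs while analytics reports landing pages as paths, so the two lists need to be lined up before you can count anything. The sketch below shows one way to do that in Python. The function names `to_path` and `search_traffic_pages` are hypothetical, and the normalization rules (ignore query strings, treat a trailing slash as optional) are assumptions you may need to adjust for your own site.

```python
from urllib.parse import urlsplit


def to_path(url: str) -> str:
    """Reduce a full URL or a bare path to a path that can be compared."""
    # urlsplit keeps only the path part; query strings and fragments are dropped (assumption).
    path = urlsplit(url.strip()).path or "/"
    # Treat "/about" and "/about/" as the same page (assumption).
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    return path


def search_traffic_pages(crawled_urls, landing_pages):
    """Return the crawled pages that also appear as organic landing pages."""
    crawled = {to_path(u) for u in crawled_urls}
    landed = {to_path(p) for p in landing_pages}
    return crawled & landed


# Illustrative data only, not real exports.
crawled = [
    "https://example.com/",
    "https://example.com/blog/post/",
    "https://example.com/tag/news",
]
landing = ["/", "/blog/post"]
matched = search_traffic_pages(crawled, landing)
print(len(matched), "of", len(crawled), "crawled pages drove search traffic")
```

Counting only pages that appear in both lists keeps the numerator from including URLs your crawl never found.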
Step 3: Apply the Content Strength formula. For example:
- The site has 500 pages.
- 200 of those pages are generating organic search traffic.
- The Content Strength would be 200 / 500 = 40%.
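The same arithmetic can be written as a small Python function. This is a minimal sketch; the function name `content_strength` and its input checks are our own illustration, not part of any tool.

```python
def content_strength(search_traffic_pages: int, total_pages: int) -> float:
    """Content Strength = Search Traffic Pages / Total Pages, as a fraction from 0 to 1."""
    if total_pages <= 0:
        raise ValueError("total_pages must be greater than zero")
    if not 0 <= search_traffic_pages <= total_pages:
        raise ValueError("search_traffic_pages must be between 0 and total_pages")
    return search_traffic_pages / total_pages


score = content_strength(200, 500)
print(f"{score:.0%}")  # prints 40%
```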
What is a Good / Bad Content Strength Score?
Although there is no known score that will determine with certainty whether your site is being affected by Google Panda, we developed these ranges to give you a framework to work from.
Low Risk (70-100%) - There is a low chance that your site is being filtered due to low-quality content. We’d still recommend reviewing the pages which aren’t driving search traffic.
Moderate Risk (50-70%) - There is a chance that some of your content could be hindering your organic search performance.
High Risk (30-50%) - There is a very high chance that your site is not performing as well as it could be due to many low-quality pages.
Deadly (0-30%) - If your site is in this range, there is a good chance your site is being penalized.
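If you track scores for several sites, the ranges above are easy to turn into a lookup. Here is a rough sketch in Python. The function name `risk_band` is hypothetical, and because the ranges share their boundary values (30%, 50% and 70%), the code assumes a boundary score belongs to the lower-risk range.

```python
def risk_band(score: float) -> str:
    """Map a Content Strength score (a fraction from 0 to 1) to a risk range."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # Assumption: a score exactly on a boundary falls into the lower-risk range.
    if score >= 0.70:
        return "Low Risk"
    if score >= 0.50:
        return "Moderate Risk"
    if score >= 0.30:
        return "High Risk"
    return "Deadly"


print(risk_band(0.40))  # prints High Risk
```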
How to Improve Your Content Strength Score
Now that you know your Content Strength score, you’re probably wondering how you can improve it. We’ve developed this process to do just that.
Step 1: Review the content which isn’t driving organic search traffic.
Step 2: Categorize your low-quality content.
Rogue Content (Extremely Dangerous) – These are pages that are typically generated by a content management system or coding errors. An example would be WordPress Attachment Pages. These are pages WordPress makes when you attach an image to a blog post. Google doesn’t consider these pages very valuable because they’re not adding value to a visitor’s experience on the site.
Broken Content (Extremely Dangerous) – Broken Content refers to pages that are broken in some way. Ask yourself: if a visitor arrived at this page, would they notice that something was wrong with it? This typically includes broken links (internal and external), broken images, embedded content that no longer works and improper use of HTML or CSS.
Thin Content (Very Dangerous) – These are pages that have little to no content on them. For example, you may have created a blog post that announced a picnic your company recently had, with just the date of the event and a couple of pictures. Google may filter pages like this since they don’t appear to have much value.
Taxonomies (Moderately to Very Dangerous) – Taxonomies occur when content is grouped together with other content. An example would be WordPress Tag pages. When you tag multiple articles as “news” those articles can now be also found by going to the tag URL. Since this content is found elsewhere on the site, Google typically considers them low-quality.
Duplicate Content (Moderately Dangerous) – Google defines Duplicate Content as substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Since the content can be found elsewhere and doesn’t appear to offer anything new, Google considers these low-quality.
Redundant Content (Somewhat Dangerous) – This refers to content that is not identical but is very similar to other content on your website. Let’s say you run a flower shop website and you write a blog post titled “Choosing the Right Flowers for Valentine’s Day,” then a year later you write another post titled “Picking the Right Flowers for Your Valentine.” Though both of these pages are unique, Google is typically only going to show one of these posts in the search results.
Stale Content (Somewhat Dangerous) – This is content that doesn’t fit any of the other categories. This content may appear to be valuable but it isn’t driving any search traffic.
Step 3: Clean up your low-quality content. We’ll go into more detail about this in a future post, but here is a quick overview.
Rogue Content – In most cases you’ll want to either noindex these pages so Google leaves them out of its index or fix the coding errors that generate them.
Broken Content – As long as the page doesn’t fit any of these other categories, you’ll just need to fix what’s broken. If it’s a broken image, replace it with a working image. If it’s a broken link, update or remove it. If it’s broken code, have your developer clean it up.
Thin Content – You’ll want to either enhance these pages with more valuable content or noindex/remove them.
Taxonomies – In most cases you’ll simply want to noindex these pages. Here’s an article that may help.
Duplicate Content – You’ll want to either remove the duplicates and 301 redirect them to the page you want to keep, or place the canonical tag on the duplicates and point it at the page you want to keep.
Redundant Content – Identify the most valuable piece of content and link, redirect or canonicalize the others to that page. Which approach you take depends on whether you see value in keeping the other pages around for your website visitors.
Stale Content – If you feel this content is valuable to your audience then you may want to keep it, right where it is. Take a moment to think about why it might not be satisfying your visitors’ needs. You may find that a few tweaks may make the content more valuable.
Step 4: Recalculate your Content Strength score.
Let’s take our original example and apply the above process.
- The site has 500 pages with 200 driving traffic.
- You’ve now cleaned up 200 pages.
- You recalculate your total pages by subtracting your cleaned-up pages. Your new total pages would be 500 – 200 = 300.
- Your new estimated Content Strength score would be 200 / 300 = 67%.
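The recalculation can be sketched the same way in Python. The function name `projected_content_strength` is hypothetical, and the sketch assumes, as the example above does, that every cleaned-up page comes off the total and that none of those pages were driving search traffic.

```python
def projected_content_strength(
    total_pages: int, search_traffic_pages: int, cleaned_up_pages: int
) -> float:
    """Estimate Content Strength after cleaned-up pages are subtracted from the total."""
    remaining_pages = total_pages - cleaned_up_pages
    if remaining_pages <= 0:
        raise ValueError("cleaned_up_pages must be smaller than total_pages")
    # Assumption: the cleaned-up pages were not among the pages driving search traffic.
    if not 0 <= search_traffic_pages <= remaining_pages:
        raise ValueError("search_traffic_pages must fit within the remaining pages")
    return search_traffic_pages / remaining_pages


estimate = projected_content_strength(500, 200, 200)
print(f"{estimate:.0%}")  # prints 67%
```

Treat the result as an estimate only, since the real score depends on how Google responds to the changes.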
This will give you a prediction of what your Content Strength score should be once Google considers the changes you’ve made. You’ll need to give Google 3-6 months before you can get the true Content Strength score again.
It’s become clearer to SEO professionals that Panda is not a penalty. Rather, it’s a score that’s built into Google’s algorithm, so it applies to every site all the time and isn’t something to address only on a case-by-case basis. Content Strength is something all sites need to consider if they want to perform well in Google.
What are your thoughts and experiences with Content Strength?