RSS feed aggregator on WordPress using the PlanetPlanet plugin

Screenshot of Paddling Planet

Running an RSS feed aggregator site on WordPress has become very easy. All you need to do is install the PlanetPlanet plugin on a fresh WordPress site and add the feed links. The plugin takes care of the rest.

What is an RSS feed aggregator

An RSS feed aggregator is a web site that collects information about posts on other sites in one single place using the RSS feeds that most blogging software makes available.

Usually the sites followed have a shared theme. Paddling Planet, for example, collects posts from a number of sea kayaking related sites.

The purpose is to have a single place to get all the latest news about something.

Why a WordPress plugin

Often such sites are called “Planets” after the first program to do this several decades ago. It was called “planet” or “planetplanet”.

The usual programs for RSS feed aggregator sites are self-contained, written in C or Python, and have to handle reading the feeds, storing the data and generating HTML output all by themselves. Scheduling was usually an external issue, handled by cron or some other means.

A “planet” style WordPress plugin can leave a lot of these tasks for WordPress to handle. That means more regular security updates for many functions, and having many other tools available to manipulate the data.

The PlanetPlanet plugin contains absolutely no code to handle display. The WordPress theme will handle that. Choose and customize your theme in any way you want to display the imported posts.

The plug-in by default uses the WordPress integrated “wp-cron” service to schedule the updates, but updates through an external scheduler are equally possible with the WP-CLI interface.
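
As a rough sketch of external scheduling, the wrapper below calls the plug-in's “wp planet update-all” command through WP-CLI. The install location /var/www/planet, the PLANET_PATH variable and the function name are assumptions made for illustration; only the “planet update-all” command and its log level come from the plug-in.

```shell
#!/bin/sh
# Sketch of a wrapper for an external scheduler such as cron.
# PLANET_PATH is an assumed install location, not something the plug-in defines.
PLANET_PATH="${PLANET_PATH:-/var/www/planet}"

planet_update_all() {
    # --path is WP-CLI's global option naming the WordPress directory.
    wp --path="$PLANET_PATH" planet update-all --log-level=errors
}

# Run the update only when the script is called with "run", so the file
# can also be sourced for its function alone.
if [ "${1:-}" = "run" ]; then
    planet_update_all
fi
```

Saved under a hypothetical name such as /usr/local/bin/planet-update.sh, the script could be run from a crontab entry like “*/30 * * * * /usr/local/bin/planet-update.sh run”. In that case the update frequency in the plug-in settings should be set to “None”, so wp-cron and the external scheduler don't both run updates.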

Posts from the other sites are imported as normal posts in WordPress, and featured images are stored in the media library. They can be manipulated in WordPress like any other posts and media.

WordPress has a “Links” section for keeping a blog-roll. Links can be added, modified and removed easily. The plug-in stores the RSS feeds in this links section, so there’s no need for specialised tables.

WordPress post categories are used to keep track of where the posts originate.

The plug-in intercepts links to imported posts and their categories, and redirects them to the originating site. After all, the purpose of a “planet” style RSS feed aggregator is not to copy content from other sites, but to funnel traffic to the contributing sites. The content should be read there.

Setting up a Planet site

To set up a Planet style site you need to:

  1. Set up a WordPress instance on the site;
  2. Install the Planet Planet plug-in;
  3. Configure the plug-in (see below);
  4. Add feeds to the site.

That’s it. The Planet Planet plug-in will do all the rest unattended.

The plug-in is intended to be in charge of the site. Posts and media will appear and disappear with the regular functioning of the site and plug-in. Post categories will be created for each feed.

The Planet Planet plug-in doesn’t interfere with pages, so there’s no problem in having “About” and “Contact” pages on the site.

Other plugins can be used, as long as they don’t interfere with Planet Planet. I have Google Analytics active on such a site using a plug-in.

Configuring the PlanetPlanet plug-in

The PlanetPlanet plug-in can create an RSS feed aggregator site on any WordPress installation where the plug-in can be installed.

The plug-in has no external dependencies. It requires PHP 7.4 and WordPress 5.8 (because those are the versions I use and have tested it with).

The plug-in has a settings page under the “Settings” menu.

Configuration parameters

  • How often to check feeds
    The frequency of site updates. The options come from the intervals used by the WP-cron scheduler. Select “None” if you run updates by other means, such as an external scheduler.
  • Discard posts older than this
    If set, a weekly task will delete any posts that are older than this date. The value should be something the PHP class DateTime can parse. It can be an absolute date, like “2020-12-15”, or a relative date, like “6 months ago” or “-180 days”. If empty, no posts are ever discarded automatically.
  • Number of errors before feed is suspended
    If a feed causes too many consecutive errors, it is suspended (by marking it as “not visible” in the WordPress Links section). It can be resumed manually.
  • Email for updates
    If you want emails about updates, enter the email address here.
  • Level of detail in mails
    Select when you want to receive emails. The available options are “Only error messages” (only when a feed causes an error), “Errors and updates” (feed errors and new posts) and “Everything” (full debug output with every item in every feed).
  • Timeout for feed requests
    Number of seconds for network timeouts. Increase if you have regular timeouts.
  • User-Agent
    If the default PHP User-Agent causes some feeds not to load, enter a User-Agent here that passes.

Command line interface

The plug-in augments the WP-CLI app with several commands for managing the Planet site.

List feeds

wp planet list [--match=<string>] [--fields=<fields>] [--format=<format>]

List the feeds registered in the “Links” section in WordPress.

The ‘match’ option restricts the output to feeds whose name, url or RSS url contains the text <string>.

The ‘fields’ option can limit the output to some of the fields in the “Links” section. Available fields are link_id, link_name, link_url, link_rss, link_visible, link_updated, link_rating (contains error count) and link_comment.

The output format can be any of: table (default), csv, json, count or yaml.

Update single feeds

wp planet update <feed-id>... [--log-level=<level>]

Update the given feeds immediately, showing the output.

The ‘log-level’ overrides the configured log level on the site. Available values are ‘errors’, ‘messages’ and ‘debug’.

The feed ids can be found using the “wp planet list” command.
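
The two commands can be combined. The sketch below updates every feed whose name or url contains a given text. It assumes that the csv output of “wp planet list” starts with a header row, as WP-CLI's csv formatter normally does; the function name is made up for the example.

```shell
#!/bin/sh
# Sketch: update all feeds matching a search text.
# Assumes the csv output has one header line followed by one feed id per line.
planet_update_matching() {
    # List only the ids of matching feeds and drop the assumed csv header row.
    ids=$(wp planet list --match="$1" --fields=link_id --format=csv | tail -n +2)
    if [ -z "$ids" ]; then
        echo "No feeds match '$1'" >&2
        return 1
    fi
    # $ids is left unquoted on purpose so each id becomes its own argument.
    wp planet update $ids --log-level=messages
}
```

Calling “planet_update_matching kayak” would then update only the feeds with “kayak” in their name, url or RSS url.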

Update all feeds

wp planet update-all [--log-level=<level>]

Like above, but runs a full update of all active feeds.

Purge old posts

wp planet purge [--log-level=<level>]

Purges all posts older than the configured discard date, together with their related media.

This is normally run automatically, on a weekly schedule, if a discard date is configured for the site.

Update featured images

wp planet thumbnail <post-id>... [--log-level=<level>]

Download and attach featured images to those of the given posts that don’t have one yet. The command “wp post list --format=ids” can be used to find the ids.
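
As a minimal sketch, the two commands can be chained so the thumbnail command receives the ids of all posts on the site. The function name is hypothetical, and on a site with very many posts the id list may need to be split into smaller batches.

```shell
#!/bin/sh
# Sketch: pass every post id on the site to the thumbnail command.
planet_thumbnails() {
    # "wp post list --format=ids" prints the post ids separated by spaces.
    ids=$(wp post list --format=ids)
    if [ -z "$ids" ]; then
        echo "No posts found" >&2
        return 1
    fi
    # $ids is left unquoted on purpose so each id becomes its own argument.
    wp planet thumbnail $ids --log-level=errors
}
```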

Search a site for feed urls

wp planet scan <site-url> [--format=<format>]

A web site often offers several feeds, and this command will help find the available feeds for a site. Just give it the url of the site, and it will output a list of available feeds with their names and description.

$ wp planet scan https://rene.seindal.dk/
+-------------------------------------------+----------------------------------------+
| Title                                     | Feed url                               |
+-------------------------------------------+----------------------------------------+
| René Seindal » Feed                       | https://rene.seindal.dk/feed/          |
| René Seindal » Comments Feed              | https://rene.seindal.dk/comments/feed/ |
| René Seindal » René Seindal Comments Feed | https://rene.seindal.dk/about-me/feed/ |
+-------------------------------------------+----------------------------------------+
Success: Found 3 feeds.

Adding a feed

wp planet add <feed-url>

Add the feed to the site. The argument should be the link to an RSS or Atom feed, not the site itself.

The “wp planet scan” command above can be used to find the available feeds.

Site name and site url are deduced from the feed.

Bugs, questions and other issues

Please post your feedback, questions or bug reports on the WordPress plugin support board.