Many WordPress websites now use the ActivityPub plugin, which federates the site with the rest of the Fediverse.
I’ve quite come to like the idea of such an integration — if only people would comment some more 🙂
What has bothered me a bit is that plenty of Fediverse posts coming from WordPress blogs contain the entire post, which is sometimes very long. Get a few of those in a search result on your phone, and you’ll be scrolling a lot for very little gain.
The problem is that the ActivityPub plugin settings offer only two options: the entire content, or just the excerpt. If you actually use the excerpt field as intended, that means either everything or at most 150 characters.
We need something in between. Fortunately, there’s a way.
Shortcodes
Luckily, the things that look like shortcodes in the ActivityPub plugin settings actually are shortcodes.
This means that it is straightforward to make another shortcode to send the ‘right’ amount of text to the Fediverse.
Add to that some of the more recent additions to the WordPress API, notably the WP_HTML_Tag_Processor class, and it is not that difficult to extract and sanitize bits of a post to create an automatic excerpt of intermediate length.
A code example
Below is an example which works for me on some sites.
I often write a few paragraphs of introduction before the first header, and the idea is to extract those paragraphs, sanitizing the markup so that only basic HTML gets through.
The code below imposes a character limit, so a post without any header won’t get too long.
It always selects entire paragraphs, within the character limit.
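As a minimal sketch of that whole-paragraph rule on its own, leaving the HTML parsing aside: the helper name take_whole_paragraphs() is made up for illustration, and the 8 extra characters per paragraph stand for the <P>, </P> and newline that get added on output.

```php
<?php
/**
 * Keep leading paragraphs for as long as they fit in $max_length.
 * Hypothetical helper, for illustration only.
 */
function take_whole_paragraphs( array $paragraphs, int $max_length ): array {
    $kept = [];
    foreach ( $paragraphs as $text ) {
        // Budget for the text plus "<P>", "</P>" and a newline
        $max_length -= strlen( $text ) + 8;
        if ( $max_length < 0 )
            break;    // this paragraph does not fit: stop here
        $kept[] = trim( $text );
    }
    return $kept;
}

// With a budget of 45, the first two paragraphs fit and the third does not
$kept = take_whole_paragraphs(
    [ 'First paragraph.', 'Second one.', 'Third.' ],
    45
);
```

A paragraph is never cut in the middle: once one does not fit, it and everything after it is dropped.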
HTML tags for simple text markup are reduced to the basics. It will pass bold and italics and links. Figures and images are removed. It handles the markup WordPress generates for footnotes and multilingual texts.
It doesn’t expand the post content completely, so patterns aren’t expanded. This is on purpose, as I have a site with various boxes in the margins, which I don’t want included.
/************************************************************
 *
 * Generate a longer excerpt of a post with limited markup.
 *
 * It extracts leading paragraphs from the content, always
 * outputting a set of whole paragraphs within the character
 * limit of $max_length.
 *
 * Processing stops at any unrecognised tag, such as
 * headers, lists, blockquotes etc., and at the <!--more-->
 * marker to allow manual intervention.
 *
 ************************************************************/
function generate_excerpt( $post, $max_length = 1000 ) {
    $post = get_post( $post );
    if ( ! $post )
        return '';

    // Expand in-text shortcodes but not blocks/patterns
    $input = do_shortcode( $post->post_content );

    $paragraphs = [];    // found paragraphs
    $buffer     = '';    // output buffer
    $stashed    = '';    // buffer saved while skipping SUP/FIGURE
    $done       = false;

    // next_token() needs WordPress 6.5 or later
    $parser = new \WP_HTML_Tag_Processor( $input );
    while ( !$done && $parser->next_token() ) {
        switch ( $parser->get_token_type() ) {
        case '#text':
            // The parser may hand back decoded text, so escape
            // it again before it goes into the HTML output
            $buffer .= esc_html( $parser->get_modifiable_text() );
            break;
        case '#comment':
            if ( 'more' === $parser->get_modifiable_text() )
                $done = true;
            break;
        case '#tag':
            $tag = $parser->get_token_name();
            switch ( $tag ) {
            case 'I':    // Copy along
            case 'EM':
            case 'B':
            case 'STRONG':
                if ( $parser->is_tag_closer() )
                    $buffer .= "</$tag>";
                else
                    $buffer .= "<$tag>";
                break;
            case 'A':    // Copy with href
                if ( $parser->is_tag_closer() )
                    $buffer .= "</$tag>";
                else {
                    // get_attribute() can return null or true
                    $href = $parser->get_attribute( 'href' );
                    $buffer .= sprintf( '<A HREF="%s">',
                        esc_url( is_string( $href ) ? $href : '' ) );
                }
                break;
            case 'SUP':
            case 'FIGURE':
                // Footnote and figure markup -- ignore
                // everything up to the closing tag -- This is
                // very primitive - sorry!
                if ( $parser->is_tag_closer() )
                    $buffer = $stashed;
                else
                    $stashed = $buffer;
                break;
            case 'IMG':
                // Images -- skip the markup
                break;
            case 'BDO':
                // Text in other languages -- skip the markup
                break;
            case 'P':
                if ( $parser->is_tag_closer() ) {
                    // Take into account <P> </P> and a newline
                    $max_length -= strlen( $buffer ) + 8;
                    if ( $max_length >= 0 )
                        $paragraphs[] = trim( $buffer );
                    else
                        $done = true;
                    $buffer = '';
                }
                break;
            default:
                // Unknown markup will stop processing, but
                // also ditch the partially processed
                // paragraph. This might not be optimal, but
                // it resolves the problem of having
                // surprising breaks in the text.
                $done = true;
            }
        }
    }

    if ( empty( $paragraphs ) )
        return get_the_excerpt( $post );
    return '<P>' . join( "</P>\n<P>", $paragraphs ) . '</P>';
}
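The function still has to be registered as a shortcode before the plugin can use it. A minimal sketch, assuming the plugin expands its post content template with the post being federated set up as the current post, so that get_post() without arguments finds it. The tag name long_excerpt and its length attribute are made up for this example.

```php
<?php
/**
 * Shortcode callback: [long_excerpt] or [long_excerpt length="600"].
 * Tag name and "length" attribute are hypothetical.
 */
function long_excerpt_shortcode( $atts ) {
    $atts = shortcode_atts( [ 'length' => 1000 ], $atts, 'long_excerpt' );

    // Assumption: the current post is the one being federated
    $post = get_post();
    if ( ! $post )
        return '';

    return generate_excerpt( $post, (int) $atts['length'] );
}
add_shortcode( 'long_excerpt', 'long_excerpt_shortcode' );
```

With this in place, putting [long_excerpt] in the plugin's post content template, in the spot where the content or excerpt shortcode would otherwise go, should send the leading paragraphs instead. How the template is expanded may differ between plugin versions, so it is worth checking the result on a test post first.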
Lastly, a thank you to @pfefferle@mastodon.social for the ActivityPub plugin, which is fantastic.