Post length with ActivityPub

Many WordPress websites now use the ActivityPub plugin, which federates the site with the rest of the Fediverse.

I’ve quite come to like the idea of such an integration — if only people would comment some more 🙂

What has bothered me a bit is that plenty of Fediverse posts coming from WordPress blogs contain the entire post, which is sometimes very long. Get a few of those in a search result on your phone, and you’ll be scrolling a lot for very little gain.

The problem is that the ActivityPub plugin settings only offer two options: the entire content, or just the excerpt. If you actually use the excerpt field as intended, the choice is between everything and at most 150 characters.

We need something in between. Fortunately, there’s a way.

Shortcodes

Luckily, the things that look like shortcodes in the ActivityPub plugin settings actually are shortcodes.

This means that it is straightforward to make another shortcode to send the ‘right’ amount of text to the Fediverse.

Add to that one of the more recent additions to the WordPress API, the WP_HTML_Tag_Processor class, and it is not that difficult to extract and sanitize bits of a post to create an automatic excerpt of intermediate length.

A code example

Below is an example which works for me on some sites.

I often write a few paragraphs of introduction before the first heading, and the idea is to extract those paragraphs, sanitizing the markup to make sure only basic HTML gets through.

The code below imposes a character limit, so a post without any heading won’t get too long.

It always selects entire paragraphs, within the character limit.

HTML tags for simple text markup are reduced to the basics. Bold, italics and links pass through. Figures, images and footnote markers are removed, and the markup WordPress generates for text in other languages is stripped while the text itself is kept.

It doesn’t expand the post content completely, so patterns aren’t expanded. This is on purpose, as I have a site with various boxes in the margins, which I don’t want included.


/************************************************************
 *
 * Generate a longer excerpt of a post with limited markup.
 *
 * It extracts leading paragraphs from the content, outputting
 * always a set of whole paragraphs, within the character
 * limit of $max_length.
 *
 * Processing stops at any unrecognised tag, such as
 * headers, lists, blockquotes etc., and at the <!--more-->
 * marker to allow manual intervention.
 *
 ************************************************************/

function generate_excerpt( $post, $max_length = 1000 ) {
    $post = get_post( $post );
    if ( ! $post )      // unknown post -- nothing to extract
        return '';

    // Expand in-text shortcodes but not blocks/patterns
    $input = do_shortcode( $post->post_content );

    $paragraphs = [];   // found paragraphs
    $buffer = '';       // output buffer
    $stashed = '';      // buffer saved while skipping <sup>/<figure>

    $done = false;

    $parser = new \WP_HTML_Tag_Processor( $input );
    while ( !$done && $parser->next_token() ) {

        switch ( $parser->get_token_type() ) {
        case '#text':
            // Text comes back decoded, so escape it again for output
            $buffer .= esc_html( $parser->get_modifiable_text() );
            break;

        case '#comment':
            if ( 'more' === $parser->get_modifiable_text() )
                $done = true;
            break;

        case '#tag':
            $tag = $parser->get_token_name();

            switch ( $tag ) {
            case 'I':   // Copy along
            case 'EM':
            case 'B':
            case 'STRONG':
                if ( $parser->is_tag_closer() )
                    $buffer .= "</$tag>";
                else
                    $buffer .= "<$tag>";
                break;

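            case 'A':   // Copy with href
                if ( $parser->is_tag_closer() )
                    $buffer .= "</$tag>";
                else {
                    // get_attribute() can return null or true; the
                    // value is decoded, so escape it for output
                    $href = $parser->get_attribute( 'href' );
                    $buffer .= sprintf( '<A HREF="%s">', is_string( $href ) ? esc_url( $href ) : '' );
                }
                break;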


            case 'SUP':
            case 'FIGURE':
                // Footnote and figure markup -- ignore
                // everything up to the closing tag -- This is
                // very primitive - sorry!
                if ( $parser->is_tag_closer() )
                    $buffer = $stashed;
                else
                    $stashed = $buffer;
                break;

            case 'IMG':
                // Images -- skip the markup
                break;

            case 'BDO':
                // Text in other languages -- skip the markup
                break;

            case 'P':
                if ( $parser->is_tag_closer() ) {
                    // Take into account <P> </P> and a newline;
                    // mb_strlen() counts characters, not bytes
                    $max_length -= mb_strlen( $buffer ) + 8;
                    if ( $max_length >= 0 )
                        $paragraphs[] = trim( $buffer );
                    else
                        $done = true;

                    $buffer = '';
                }
                break;

            default:
                // Unknown markup will stop processing, but
                // also ditch the partially processed
                // paragraph. This might not be optimal, but
                // it resolves the problem of having
                // surprising breaks in the text.

                $done = true;
            }
        }
    }

    if ( empty( $paragraphs ) )
        return get_the_excerpt( $post );

    return '<P>' . join( "</P>\n<P>", $paragraphs ) . '</P>';
}
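
To tie the function above back to the shortcode idea, here is a minimal sketch of how it might be exposed as a shortcode. It goes in the theme's functions.php or a small plugin. The shortcode name ap_long_excerpt and the length attribute are made up for this example and are not part of the ActivityPub plugin. It also assumes that get_post() returns the post being federated while the plugin renders its template, which I haven't verified for every version.

```php
// Sketch only: 'ap_long_excerpt' is a hypothetical shortcode name, and
// generate_excerpt() is the function defined above.
add_action( 'init', function () {
    add_shortcode( 'ap_long_excerpt', function ( $atts ) {
        // 'length' is an illustrative attribute; 1000 mirrors the
        // default of generate_excerpt().
        $atts = shortcode_atts( [ 'length' => 1000 ], $atts, 'ap_long_excerpt' );

        // Assumption: the current post is available via get_post()
        // while the ActivityPub plugin renders its content template.
        $post = get_post();
        if ( ! $post )
            return '';

        return generate_excerpt( $post, (int) $atts['length'] );
    } );
} );
```

With something like this in place, [ap_long_excerpt] could go in the plugin's custom post content template where [ap_content] or [ap_excerpt] would otherwise be, for instance as [ap_long_excerpt length="1500"] to allow a bit more text.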

Lastly, a thank you to @pfefferle@mastodon.social for the ActivityPub plugin, which is fantastic.


Comments

4 responses to “Post length with ActivityPub”

  1. René Seindal

    @renes-old-blog @pfefferle Hi Matthias, I tried to mention you in this post, but I haven't figured out how yet.

    1. Matthias Pfefferle

      @seindal @renes-old-blog @pfefferle@notiz.blog seemed to work fine, but blogs do not yet accept direct mentions, so you could use my Mastodon profile the next time. Isn’t there an attribute for the excerpt shortcode to define the length? Haven’t looked in a while… I am getting old 😌

      1. René Seindal

        @pfefferle @renes-old-blog @pfefferle@notiz.blog
        If post_excerpt is set, that is used.

  2. René Seindal

    @BeAware @renes-old-blog
    Yes, the official app.
