Post length with ActivityPub

Many WordPress websites now use the ActivityPub plugin, which federates the site with the rest of the Fediverse.

I’ve quite come to like the idea of such an integration — if only people would comment some more 🙂

What has bothered me a bit is that plenty of Fediverse posts coming from WordPress blogs contain the entire post, which is sometimes very long. Get a few of those in a search result on your phone, and you’ll be scrolling a lot for very little gain.

The problem is that the ActivityPub plugin settings offer only two choices: the entire content or just the excerpt. If you actually use the excerpt field as intended, that means either everything or at most 150 characters.

We need something in between. Fortunately, there’s a way.


The things that look like shortcodes in the ActivityPub plugin settings actually are shortcodes.

This means that it is straightforward to make another shortcode to send the ‘right’ amount of text to the Fediverse.
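As a rough sketch of what registering such a shortcode might look like: the tag `my_long_excerpt`, the callback name and the `length` attribute are all made up for illustration, and the sketch assumes that the plugin has set up the current post when it renders its template, so that `get_post()` returns it. It hands the work to the `generate_excerpt()` function from the code example further down in this post.

```php
<?php
/**
 * Shortcode callback: return a medium-length excerpt of the current post.
 * The function name and the "length" attribute are hypothetical.
 */
function my_long_excerpt_shortcode( $atts ) {
    // Merge user-supplied attributes with the defaults.
    $atts = shortcode_atts( [ 'length' => 1000 ], $atts );

    // Assumption: the current post is available while the template renders.
    $post = get_post();
    if ( ! $post ) {
        return '';
    }

    return generate_excerpt( $post, (int) $atts['length'] );
}

// Guarded so the file can also be loaded outside WordPress.
if ( function_exists( 'add_shortcode' ) ) {
    add_shortcode( 'my_long_excerpt', 'my_long_excerpt_shortcode' );
}
```

With something like this in place, `[my_long_excerpt length="800"]` could go into the plugin's post content template where the built-in content or excerpt shortcode would otherwise be.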

Add to that one of the more recent additions to the WordPress API, the WP_HTML_Tag_Processor class, and it is not that difficult to extract and sanitize bits of a post to create an automatic excerpt of intermediate length.

A code example

Below is an example which works for me on some sites.

I often write a few paragraphs of introduction before the first header, and the idea is to extract those paragraphs, sanitizing the code to make sure only basic HTML gets through.

The code below imposes a character limit, so a post without any header won’t get too long.

It always selects entire paragraphs, within the character limit.
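The "whole paragraphs within a budget" rule can be shown on its own, without any WordPress involved. This is only an illustration: `take_whole_paragraphs()` is a hypothetical helper, and the overhead of 8 stands in for the paragraph tags and newline wrapped around each paragraph.

```php
<?php
/**
 * Keep leading paragraphs for as long as they fit within $max_length.
 * Hypothetical helper for illustration only.
 */
function take_whole_paragraphs( array $paragraphs, int $max_length, int $overhead = 8 ): array {
    $kept = [];
    foreach ( $paragraphs as $paragraph ) {
        // Charge the paragraph and its wrapper against the budget.
        $max_length -= strlen( $paragraph ) + $overhead;
        if ( $max_length < 0 ) {
            break;  // would overflow: stop here, never cut a paragraph
        }
        $kept[] = $paragraph;
    }
    return $kept;
}

// Three 40-character paragraphs against a budget of 100:
// the first two cost 48 each, the third no longer fits.
$kept = take_whole_paragraphs(
    [ str_repeat( 'a', 40 ), str_repeat( 'b', 40 ), str_repeat( 'c', 40 ) ],
    100
);
echo count( $kept ), "\n";  // prints 2
```

Note that `strlen()` counts bytes, so text with many multibyte characters uses up the budget faster; `mb_strlen()` would count characters instead.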

HTML tags for simple text markup are reduced to the basics. It will pass bold and italics and links. Figures and images are removed. It handles the markup WordPress generates for footnotes and multilingual texts.

It doesn’t expand the post content completely, so patterns aren’t expanded. This is on purpose, as I have a site with various boxes in the margins, which I don’t want included.

/**
 * Generate a longer excerpt of a post with limited markup.
 * It extracts leading paragraphs from the content, always
 * outputting a set of whole paragraphs, within the character
 * limit of $max_length.
 * Processing stops at any unrecognised tag, such as
 * headers, lists, blockquotes etc., and at the <!--more-->
 * marker to allow manual intervention.
 */
function generate_excerpt( $post, $max_length = 1000 ) {
    $post = get_post( $post );

    // Expand in-text shortcodes but not blocks/patterns
    $input = do_shortcode( $post->post_content );

    $paragraphs = [];   // found paragraphs
    $buffer = '';       // output buffer
    $stashed = '';      // buffer saved while skipping SUP/FIGURE

    $done = false;

    $parser = new \WP_HTML_Tag_Processor( $input );
    while ( !$done && $parser->next_token() ) {

        switch ( $parser->get_token_type() ) {
        case '#text':
            // The text comes back decoded, so escape it again
            $buffer .= esc_html( $parser->get_modifiable_text() );
            break;

        case '#comment':
            if ( 'more' === $parser->get_modifiable_text() )
                $done = true;
            break;

        case '#tag':
            $tag = $parser->get_token_name();

            switch ( $tag ) {
            case 'I':   // Copy along
            case 'EM':
            case 'B':
            case 'STRONG':
                if ( $parser->is_tag_closer() )
                    $buffer .= "</$tag>";
                else
                    $buffer .= "<$tag>";
                break;

            case 'A':   // Copy with href
                if ( $parser->is_tag_closer() )
                    $buffer .= "</$tag>";
                else {
                    // get_attribute() may return null or true
                    $href = $parser->get_attribute( 'href' );
                    $href = is_string( $href ) ? esc_url( $href ) : '';
                    $buffer .= sprintf( '<A HREF="%s">', $href );
                }
                break;

            case 'SUP':
            case 'FIGURE':
                // Footnote and figure markup -- ignore
                // everything up to the closing tag -- This is
                // very primitive - sorry!
                if ( $parser->is_tag_closer() )
                    $buffer = $stashed;
                else
                    $stashed = $buffer;
                break;

            case 'IMG':
                // Images -- skip the markup
                break;

            case 'BDO':
                // Text in other languages -- skip the markup
                break;

            case 'P':
                if ( $parser->is_tag_closer() ) {
                    // Take into account <P> </P> and a newline
                    $max_length -= strlen( $buffer ) + 8;
                    if ( $max_length >= 0 )
                        $paragraphs[] = trim( $buffer );
                    else
                        $done = true;

                    $buffer = '';
                }
                break;

            default:
                // Unknown markup will stop processing, but
                // also ditch the partially processed
                // paragraph. This might not be optimal, but
                // it resolves the problem of having
                // surprising breaks in the text.
                $done = true;
                break;
            }
            break;
        }
    }

    if ( empty( $paragraphs ) )
        return get_the_excerpt( $post );

    return '<P>' . join( "</P>\n<P>", $paragraphs ) . '</P>';
}

Lastly, a thank you to the people behind the ActivityPub plugin, which is fantastic.


6 responses to “Post length with ActivityPub”

  1. @renes-old-blog @pfefferle Hi Matthias, I tried to mention you in this post, but I haven't figured out how yet.

    1. @seindal @renes-old-blog seemed to work fine, but blogs does not yet accept direct mentions, so you could use my mastodon profile the next time. Isn‘t there an attribute for the excerpt shortcode to define the length? Haven’t looked in a while… I am getting old 😌

  2. @renes-old-blog @seindal well, AFAIK, most apps shorten the post with an option to expand to show the full post. Which app do you use that doesn't do this?🤔 Official Mastodon app?😬

      1. @seindal @renes-old-blog damn, that sucks! Sorry to hear that.

        If you're on iOS, I'd consider using something like @Mona or @IceCubesApp that have this feature to shorten posts with an option to expand.

        If you're on Android, I'd consider using @moshidon or @megalodon

        If those don't look like apps you'd enjoy, I'm out of ideas because this is strictly a Mastodon issue. They don't allow "article" type posts which are how WP defines them. I don't see this changing anytime soon but you can request a feature on their roadmap page:
