Skip to:

Opened 13 years ago

Closed 13 years ago

#3452 closed defect (bug) (fixed)

Long Inserted Image in Post doesn't Render in Activity Stream

Reported by: nahummadrid's profile nahummadrid Owned by:
Milestone: 1.5 Priority: normal
Severity: normal Version: 1.5
Component: Activity Keywords: has-patch


Let's say I insert an image in the body of a post and the html is unusually long because the title, alt, or just the image file name, the posted image thumbnail doesn't render correctly in the activity stream item for that post. The html gets cut off because of the excerpt length and the only thing that shows in the activity is the cutoff html <img src="file...[...]

Attachments (4)

3452.01.patch (7.9 KB) - added by boonebgorges 13 years ago.
3473.02.patch (8.0 KB) - added by boonebgorges 13 years ago.
3473.03.patch (10.7 KB) - added by boonebgorges 13 years ago.
3473.04.patch (10.6 KB) - added by boonebgorges 13 years ago.

Download all attachments as: .zip

Change History (21)

#1 @boonebgorges
13 years ago

  • Component changed from Core to Activity
  • Milestone changed from Awaiting Review to 1.5

I noticed this in one of my plugins as well. We should be able to fix it by doing some selective HTML removal before creating the excerpt.

There should be two steps:

  • make sure that we don't excerpt in the middle of a tag (will force_balance_tags() do this?)
  • exclude the content of tags from the character count when excerpting (maybe we can remove tags from the content before excerpting, store them in a stack while creating the excerpt, and then put them back?. Alternatively, we could count up the number of characters in the HTML tags of an excerpt, and then re-excerpt with normal_excerpt_length + characters_in_html_tags as the length)

#2 @DJPaul
13 years ago

So.. this isn't a regression, or a bug introduced in 1.5? Maybe punt to 1.5.1 unless someone comes up with a patch?

#3 @boonebgorges
13 years ago

It's introduced in 1.5 with the "Read More" functionality. I don't think we excerpted comments before 1.5.

#4 @r-a-y
13 years ago

CakePHP uses this to truncate while preserving HTML:

License is MIT.

#5 @boonebgorges
13 years ago

  • Keywords has-patch added

That function from CakePHP is pretty awesome. I was able to drop it in more or less as-is, and it seems to be working great. I modified it a bit to match BP coding standards, to add filters, and to do some backward compatibility.

As for function arguments. In 1.2.9, the parameters looked like this:
function bp_create_excerpt( $text, $length = 123, $filter_shortcodes = true )
The previous BP 1.5 version added a fourth parameter, $append_text, which defaulted to ' [&hellip;]'. The Cake function worked a bit differently:
function truncate( $text, $length = 123, $options = array() )
where $options is an array (which includes 'ending', corresponding to our $append_text). I decided to switch over to the Cake method, to minimize refactoring. I've maintained backward compatibility with BP 1.2.9's function arguments (see the first line, where I do is_bool() on $options). The correct arguments from here on out are spelled out in the PHPDoc for the function.

Please give it a go with some different HTML-strewn content and see if you get what you'd expect.

#6 @DJPaul
13 years ago

Some thoughts:

  • we should i18n "[&hellip;]"
  • the "class_exists( 'Multibyte' );" check won't do anything since BP/WP doesn't have that class (guess it's from cakephp).
  • it uses mb_substr, mb_strlen, mb_strrpos. WP core has fallback version of mb_substr, but not the others.

#7 @boonebgorges
13 years ago

Good call on the first two points.

Re: mb_ string functions. Why do we need fallbacks? The official docs say that all three functions have been around since PHP 4.0.6. I'm guessing WP's fallback in compat.php is left over from when PHP < 4.0 was supported. (Or am I missing something, and the multibyte string functions need an extra PHP dependency?)

#8 @DJPaul
13 years ago

Issue is that mbstring is a non-default extension. You have to set it when PHP is compiled (or find an appropriate binary). Looks like PHP 5.4 is the first release which will have it enabled by default (, under "General Improvements")

#9 @boonebgorges
13 years ago

OK, good call. I hadn't seen that.

In 3452.02.patch I implement your first two suggestions.

As for the multibyte issue. In most places, I think that the non-multibyte equivalent will be fine.

  • Using strlen() instead of mb_strlen() means that sometimes the excerpt will end up being shorter than intended (because multibyte characters will have each of their bytes counted), but this is not the end of the world. (In languages where it makes a huge difference, because most characters are multibyte, a filter can be placed on 'bp_excerpt_length' to make the number larger by default. Imprecise, but better than nothing.)
  • Using strrpos() instead of mb_strrpos() - doesn't really make a difference, as long as we're consistent. mb_strrpos() is used in Cake to get the final position of a whitespace character, so that words aren't broken. strrpos( $truncate, ' ' ); and mb_strrpos( $truncate, ' ' ); will give different values, but as long as we're consistent in how they're used (and as long as we use either substr or mb_substr, as appropriate, when trimming) I don't think there will be an issue.

The remaining problem with multibyte, aside from excerpt length, is that MB characters can sometimes get cut in the middle. When !$exact (ie when we only break on word breaks) it won't be an issue, since we'll be chopping everything after the last space anyway. For other cases, I've stolen some code from wp_html_excerpt() to remove partial entities at the end of the string.

In 3452.02.patch, I have simply taken out the mb_ functions. What do you think instead of providing our own versions of these functions (maybe in bp-core-wpabstraction.php or something)? They would simply return the non-multibyte return value if the MB function doesn't exist. That way, the mb_ functions could be used if they exist, which it seems like will be more likely to be true in parts of the world where it's more important.

#10 @DJPaul
13 years ago

In languages where it makes a huge difference, because most characters are multibyte, a filter can be placed on 'bp_excerpt_length' to make the number larger by default.

I think it's a very bad idea to expect users to create a filter just to get their language parsed right. The average BP user shouldn't know or care about any of the filters.

What do you think instead of providing our own versions of these functions (maybe in bp-core-wpabstraction.php or something)? They would simply return the non-multibyte return value if the MB function doesn't exist.


#11 @boonebgorges
13 years ago

3473.03.patch restores the multibyte functions, and adds fallback versions (taken from MediaWiki) of the necessary functions, in wp-core-abstraction.php.

#12 @boonebgorges
13 years ago

DJPaul points out that I have to remove the function prefixes. See 3473.04.patch.

#13 @DJPaul
13 years ago

I just tested with a blog post with pictures and a lot of other body text. I get the picture and the generated excerpt in the activity stream fine, but when I click the "read more" button, it doesn't return any extra text. e.g. the javascript seems to work, but the rest of the post isn't returned. Maybe the "read more" link shouldn't be shown for these actions.

Last edited 13 years ago by DJPaul (previous) (diff)

#14 @boonebgorges
13 years ago

It's fetching the *activity item*, not the blog post. The activity item is itself an excerpt of the blog post. (Maybe this suggests that we shouldn't be doing Read More excerpting for activity items of type new_blog_post)

#15 @DJPaul
13 years ago

Let's update the patch to not add the "read more" link for new blog posts and blog comments, and probably forum updates, too. I think those are the only types where we link out to the original content? Other than this, the patch seemed to work fine for me.

#16 @boonebgorges
13 years ago

Instead of making an explicit exception for certain activity types, I went with a slightly different method. The check
if ( strlen( $excerpt ) > $excerpt_length ) )
wasn't working right anymore, because the excerpt could contain additional characters due to HTML tags that would make this conditional true, even if the rendered result (when Read More was clicked) would show content which is less than $excerpt_length. So instead, I run *every* item through bp_create_excerpt(). If it's too short, then bp_create_excerpt() just returns the original string; otherwise it truncates. Then, before appending the 'Read More' link, I check to see whether the return value of bp_create_excerpt() is equivalent to the original string - a true test of whether it's been truncated. It seems much more elegant and efficient now.

#17 @boonebgorges
13 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [4989]) Modifies bp_create_excerpt() to take account of HTML tag length when truncating long texts. Thanks to CakePHP for the function. Refactors BP use of bp_create_excerpt(), where necessary, to account for new function argument structure. Reworks bp_activity_truncate_entry() so that already-truncated activity items (like blog posts) do not get a Read More link when they contain HTML. Introduces fallback functions for some multibyte string functions, to make bp_create_excerpt() work properly. Thanks to MediaWiki for these fallbacks. Fixes #3452. Props DJPaul for all the help with this patch.

Note: See TracTickets for help on using tickets.