Opened 10 years ago
Closed 9 years ago
#6254 closed defect (bug) (fixed)
Chunk of Unicode text does not show any excerpt in BP activity stream at all
Reported by: | rosyteddy | Owned by: | boonebgorges |
---|---|---|---|
Milestone: | 2.3 | Priority: | normal |
Severity: | normal | Version: | |
Component: | (not sure) | Keywords: | |
Cc: |
Description
Chunk of Unicode text does not show any excerpt in BP activity stream at all – just shows a “read” link.
For example (in WP 4.1.1 and BP 2.2.1), post the following in the What's New box (select all of the text below, then copy and paste it):
मृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृमृ
You get nothing as an excerpt. Is this error on my end? Can you reproduce the error?
Thanks for your help.
Change History (7)
#2
@
10 years ago
One instance I can imagine would be posting a series of emoji. Say, fifty snowflakes to imply a blizzard. I suspect we will want to special-case this and use an mb_ function if it's available.
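Some background on why an mb_ function matters here: in PHP, strlen() counts bytes while mb_strlen() counts characters. The following is a minimal Python sketch of the same distinction, not BuddyPress code; the variable names are made up for illustration.

```python
# Fifty snowflake characters (U+2744), as in the blizzard example.
snowflakes = "\u2744" * 50

char_count = len(snowflakes)                  # counts code points
byte_count = len(snowflakes.encode("utf-8"))  # counts UTF-8 bytes (3 per snowflake)

# Cutting at a byte offset can land inside a character; the partial
# character is dropped here by decoding with errors="ignore".
cut_by_bytes = snowflakes.encode("utf-8")[:100].decode("utf-8", errors="ignore")

print(char_count, byte_count, len(cut_by_bytes))
```

A length limit measured in bytes would therefore be reached three times sooner for this text than a limit measured in characters.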
#3
@
10 years ago
@boonebgorges These are actual use cases, where users have entered various things: long texts, multiple pictures and other unexpected stuff. I started hobby sites with a flat forum 14 years ago, and have since experimented with Drupal, WordPress and others, and have seen many things users can do. If BP is like that by design, we may need to warn users that long Indic words without spaces may not get auto-excerpts. Certain Indic mantras (religious chants) contain such long words. Conjoined words or "Juktashara" in Indic scripts may actually appear much shorter in width and yet have more than 200 to 300 characters.
Thanks for your help.
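To illustrate the point about text that looks short but counts as long, here is a small Python sketch (an illustration only, with made-up variable names). It uses the syllable from the ticket description, which renders as one glyph but consists of two code points.

```python
# The syllable from the ticket description: the letter MA (U+092E)
# followed by the vowel sign VOCALIC R (U+0943), rendered as one glyph.
syllable = "\u092e\u0943"

visible_units = 150
text = syllable * visible_units

# len() counts code points, so the count is twice what the eye sees.
code_points = len(text)

print(visible_units, code_points)
```

So 150 visible syllables with no spaces already exceed a 225-character limit when characters are counted as code points.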
#4
@
10 years ago
Most of our excerpt generator function was borrowed from CakePHP. It might be interesting to see if we are using the latest version (perhaps this is a bug?), and consider reporting it upstream.
#5
@
10 years ago
- Keywords 2nd-opinion added; reporter-feedback removed
- Milestone changed from Awaiting Review to 2.3
Thanks all. Just to reiterate: I don't think this issue has anything to do with Unicode or character sets in particular. Any unbroken 226+ character sequence at the beginning of an activity excerpt will produce the same thing.
DJPaul - Thanks for the reminder about CakePHP. Here's their truncate() method that we stole from: https://github.com/cakephp/cakephp/blob/d21b046fc829e46cb6b7f103034b71fc78d9a192/lib/Cake/Utility/String.php#L519. It appears that the behavior described here - that when exact=false, an initial word longer than $length will result in a zero-length excerpt - is actually expected behavior. Here's their test: https://github.com/cakephp/cakephp/blob/d21b046fc829e46cb6b7f103034b71fc78d9a192/lib/Cake/Test/Case/Utility/StringTest.php#L470
IMO, the fix that makes sense is to add a special case to bp_create_excerpt() that will detect when an excerpt would be trimmed to zero-length due to this issue, and if so, to essentially force exact=true. So foo...barbaz, a 226-character string, will be truncated to foo...barba […]. This will result in mid-word truncation in these cases, but it's probably better than the current behavior. The only question is whether this should be default behavior, and if so, whether it should be configurable - ie, do we need a new value of 'exact' that would be something like 'nonempty', and should it be the default?
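The special case proposed here can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the bp_create_excerpt() implementation: the function name excerpt_never_empty, the default ending string, and the choice not to count the ending against the limit are all hypothetical.

```python
def excerpt_never_empty(text: str, length: int = 225, ending: str = " [...]") -> str:
    """Truncate at a word boundary, falling back to a mid-word cut.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(text) <= length:
        return text  # short enough, nothing to truncate

    head = text[:length]
    cut = head.rfind(" ")  # last space inside the limit, or -1

    if cut > 0:
        kept = head[:cut]  # keep whole words only
    else:
        kept = head        # no usable space: cut mid-word instead of returning nothing

    return kept + ending


print(excerpt_never_empty("x" * 226))
```

With this fallback, a 226-character unbroken string keeps its first 225 characters, matching the foo...barba example, while ordinary text is still cut between words.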
#6
@
9 years ago
- Keywords 2nd-opinion removed
Thinking about this more, it's hard for me to imagine a case where someone would want the current behavior - truncating to 0 with a first word that's longer than the $length. In cases where it does matter for a plugin's purpose, let's let them handle it themselves. I'm going to change the behavior so that strings will never be truncated to zero in these cases.
I can confirm the behavior, but I don't think this is really a bug, and I'm not sure it's specific to Unicode characters.
Basically, what's happening is this. When an activity post has more than 225 characters, BP will truncate it and add ... [Read More] at the end. The truncating function, bp_create_excerpt(), is smart enough not to split in the middle of a word. It will show as many whole words as will fit under the 225 character limit. But the text you've entered here has no spaces before 225 characters, so BP is interpreting it as one very long word, which won't fit under 225 characters. The same thing happens if you have a string of 300 a's or any other character.
For this reason, I don't think it's a bug, but before closing the ticket I would like to know how you came across this. Were you just doing some testing, or did you find an actual situation where someone entered an unbroken 400 character string as an activity update? If I thought that there were real-life situations where this might happen, I'd be more inclined to try to find a "fix".
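The whole-word behavior described in this comment can be reproduced with a small Python sketch. It is a simplified illustration, not the actual PHP function: truncate_whole_words is a hypothetical name, and it returns only the kept text, without the Read More link.

```python
def truncate_whole_words(text: str, length: int = 225) -> str:
    """Keep as many whole words as fit within `length` characters.

    Hypothetical helper for illustration; it mimics the reported behavior.
    """
    if len(text) <= length:
        return text  # nothing to truncate

    head = text[:length]
    cut = head.rfind(" ")  # last space inside the limit, or -1

    # With no space before the limit, the first "word" does not fit,
    # so no words are kept at all.
    return head[:cut] if cut != -1 else ""


print(repr(truncate_whole_words("a" * 300)))
print(len(truncate_whole_words("hello world " * 30)))
```

The unbroken 300-character string yields an empty excerpt, which is why only the link is shown, while the spaced text is cut at the last space before the limit.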