Opened 4 years ago

Closed 3 years ago

#2194 closed defect (bug) (fixed)

RSS feed serious accent problem

Reported by: chouf1
Owned by: sorich87
Milestone: 1.5
Priority: normal
Severity:
Version:
Component: Activity
Keywords: has-patch needs-testing
Cc: chouf1, rogercoathup

Description

In the RSS feed, words with accents are not rendered correctly.

In the title I have this: admin a téléchargé
In the excerpt, I have admin a téléchargé

In bp-activity-templatetags:910

putting this

$title = trim( strip_tags( utf8_encode( html_entity_decode( $content[0] ) ) ) );

partially solved the problem for the title, but this is not satisfactory. For the description, the filter didn't work either.

See an example here: http://bp-fr.net/activity/feed/
There are some descriptions with accents at the bottom of the page...


Attachments (3)

2194.diff (1.7 KB) - added by sorich87 3 years ago.
2194.001.diff (2.0 KB) - added by cnorris23 3 years ago.
2194.002.diff (6.8 KB) - added by cnorris23 3 years ago.

Change History (49)

comment:1 chouf1 4 years ago

To solve the accent problem in the description, I use a lightly modified version of the return statement from 1.1.3:

return apply_filters( 'bp_get_activity_feed_item_description', html_entity_decode( str_replace( '%s', '', $content ), ENT_COMPAT, 'UTF-8' ) );
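
A minimal sketch of why the charset argument matters here. It assumes the feed is served as UTF-8 and that the text being decoded contains HTML entities such as &eacute;. The sample string and the is_valid_utf8() helper are illustrative and are not part of BuddyPress. Before PHP 5.4, html_entity_decode() defaulted to ISO-8859-1 when no charset was passed, so a decoded é became the single byte 0xE9, which is not valid UTF-8:

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
// preg_match() with the /u modifier returns false on invalid UTF-8 input.
function is_valid_utf8( $s ) {
    return 1 === preg_match( '//u', $s );
}

// Illustrative sample, not taken from a real activity item.
$content = 'admin a t&eacute;l&eacute;charg&eacute;';

// Explicit ISO-8859-1 reproduces the pre-5.4 default: each é is the byte 0xE9.
$latin1 = html_entity_decode( $content, ENT_COMPAT, 'ISO-8859-1' );

// Explicit UTF-8, as in the return statement above: each é is the byte pair 0xC3 0xA9.
$utf8 = html_entity_decode( $content, ENT_COMPAT, 'UTF-8' );

var_dump( is_valid_utf8( $latin1 ) ); // bool(false)
var_dump( is_valid_utf8( $utf8 ) );   // bool(true)
```

A feed declared as UTF-8 that carries the first string would contain stray bytes, which readers commonly display as replacement characters.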

comment:2 chouf1 4 years ago

  • Milestone changed from 1.3 to 1.2.4
  • Resolution set to fixed
  • Status changed from new to closed

I solved the accent problem for the title and the description.

For the title (line 910), in function bp_get_activity_feed_item_title(), replace

$title = trim( strip_tags( html_entity_decode( utf8_encode( $content[0] ) ) ) );

by

$title = trim( strip_tags( utf8_encode( html_entity_decode( $content[0] ) ) ) );

and

return apply_filters( 'bp_get_activity_feed_item_title', $title );

by

return apply_filters( 'bp_get_activity_feed_item_title', utf8_decode( str_replace( ':', '', $title ), ENT_COMPAT, 'UTF-8' ) );

and for the description (line 944), in function bp_get_activity_feed_item_description(), replace

return apply_filters( 'bp_get_activity_feed_item_description', html_entity_decode( str_replace( '%s', '', $content ) ) );

by

return apply_filters( 'bp_get_activity_feed_item_description', html_entity_decode( str_replace( '%s', '', $content ), ENT_COMPAT, 'UTF-8' ) );

comment:3 chouf1 4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:4 chouf1 4 years ago

I'm OK with my modification, but the description title is now missing.

http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fbp-fr.net%2Factivity%2F

and I have a description title at least. Does this mean that the title is intended to appear twice in the original code?
I guess it is not necessary, because the output is ultimately correct, with a non-bold title, which is a minor difference.

comment:5 johnjamesjacoby 4 years ago

Are you able to formulate your fixes into a patch so I can test it?

We could use some help handling issues like these since all of our native languages are English.

comment:6 chouf1 4 years ago

  • Cc chouf1 added
  • Resolution set to fixed
  • Status changed from reopened to closed

OK, this was a bit weird to debug, but we finally found the trick.

Here are the final results, tested on bp-fr.net. They should also be tested on sites in other languages (de, es, dk, etc.).

bp-activity/bp-templatetags.php

Line 901, function bp_get_activity_feed_item_title():
replace line 910 by
$title = trim( html_entity_decode( strip_tags( $content[0] ), ENT_COMPAT, 'UTF-8' ) );

Line 944, function bp_get_activity_feed_item_description():
replace line 952 by
return apply_filters( 'bp_get_activity_feed_item_description', html_entity_decode( str_replace( '%s', '', $content ), ENT_COMPAT, 'UTF-8' ) );

Additional (not innocent) question:
Why does the title contain a CDATA section? This is unusual, because UTF-8 should always be used in RSS.

Unless the title is not UTF-8 somewhere. If you could explain this to me, I would appreciate it, thanks!

comment:7 rogercoathup 4 years ago

  • Cc rogercoathup added
  • Milestone changed from 1.2.4 to 1.2.6
  • Priority changed from major to normal
  • Resolution fixed deleted
  • Status changed from closed to reopened

This problem still exists in 1.2.5

e.g. feed://clubbelote.com/activite/feed/

Club Belote a cr�� le sujet test thread sur le forum du groupe Club belote
Yesterday, 3:43 PM
Club Belote a cr�� le sujet test thread sur le forum du groupe Club belote: test

comment:8 DJPaul 4 years ago

  • Milestone changed from 1.2.6 to 1.3

No time left to get this into 1.2.6.

sorich87 3 years ago

comment:10 sorich87 3 years ago

  • Keywords has-patch added
  • Owner set to sorich87
  • Status changed from reopened to accepted

comment:11 boonebgorges 3 years ago

Strangely, I'm unable to reproduce the original problem with accents in RSS feeds on the latest trunk. They are rendering properly in the RSS feed. Can I ask someone with the original problem (chouf1 or rogercoathup?) to test 2194.diff to ensure that it's solving the problem?

comment:12 chouf1 3 years ago

The same problem remains in 1.2.7...
I already gave the solution here:
http://trac.buddypress.org/ticket/2680
It was marked as a duplicate by DJPaul, and apparently not read!
sorich87, you give the same solution as I did 3 months earlier.
But now things are OK with 2194.diff.

Last edited 3 years ago by chouf1

comment:13 cnorris23 3 years ago

I can't replicate this in trunk either, even under the French translation. @chouf1, the text you posted in your description looks like what happens when you change the MySQL database character set and/or collation of a table. Could this be the case? Also, using the BP 1.2.8 code as-is, do you have the same issue for new activity feed items, or is it just on old items?

comment:14 cnorris23 3 years ago

  • Keywords reporter-feedback added

comment:15 chouf1 3 years ago

When using the 1.2.8 file as-is, the error is still there.
I have had an absolutely blank activity RSS page after each upgrade of BP since 1.2.5. This has nothing to do with old items or any DB settings, which I never worked on. My DB has been utf8 since BP 0.9!

The solution is to put this on line 1031:

$title =  strip_tags(html_entity_decode( $content[0],ENT_COMPAT,"UTF-8"  )) ;

and this for line 1071:

return apply_filters( 'bp_get_activity_feed_item_description',  html_entity_decode( str_replace( '%s', ' ', $content ), ENT_COMPAT,  'UTF-8' ) );

Last edited 3 years ago by chouf1

comment:16 cnorris23 3 years ago

Thank you for looking into this. I can tell you're frustrated, but very few people can reproduce this, so we're trying to troubleshoot. As far as old items go, you didn't really answer the question. I asked if it was on old items only, or it was happening on new items too. I also wasn't very clear, which is my fault, on what I meant by using the BP 1.2.8 code as is. What I meant for you to do was to do something to create a new activity while using the raw, un-patched BP 1.2.8 code and see if you have the problem still. Just because you never touched DB settings doesn't mean something funky didn't occur. Obviously a change occurred in 1.2.5 that either a) uncovered a previously existing problem, or b) caused a problem. That's what we're all trying to figure out. That being said, I have one more thing to ask of you. Could you go into your database to see if you can find the activity item you used as reference when you posted this ticket? I would like you to check if you have "admin a téléchargé" in the database, or "admin a téléchargé."

comment:17 DJPaul 3 years ago

In trunk and branch, I fixed bug #1445 where bbPress would install some of its tables with the latin1 charset in error (on new installs, not upgrades). Therefore, trying to recreate this on a new install of trunk or 1.2.8, *not* 1.2.7, may not trigger the error, if the problems are related and if the invalid RSS items are coming from the Forums tables. I think this bug should be tested on 1.2.7.

P.S. I believe that html_entity_decode does not work correctly with UTF-8 for PHP versions < 5; it will throw "cannot yet handle MBCS in html_entity_decode" warnings: http://bugs.php.net/bug.php?id=25670. I think the trick is to wrap the output of html_entity_decode in utf8_encode. However, this needs confirming on a PHP4 installation.
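
A rough sketch of the ordering suggested above: decode entities as ISO-8859-1 first, then convert the result to UTF-8. The latin1_to_utf8() helper is a hypothetical stand-in that mirrors what utf8_encode() does (utf8_encode() itself is deprecated as of PHP 8.2), and the sample strings are illustrative. It also shows the risk of the approach: running the same conversion on text that is already UTF-8 double-encodes it.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8, byte by byte.
function latin1_to_utf8( $s ) {
    $out = '';
    for ( $i = 0, $n = strlen( $s ); $i < $n; $i++ ) {
        $b = ord( $s[ $i ] );
        if ( $b < 0x80 ) {
            $out .= chr( $b ); // ASCII passes through unchanged
        } else {
            // U+0080..U+00FF becomes a two-byte UTF-8 sequence
            $out .= chr( 0xC0 | ( $b >> 6 ) ) . chr( 0x80 | ( $b & 0x3F ) );
        }
    }
    return $out;
}

$e_acute = "\xC3\xA9"; // "é" as UTF-8 bytes

// Decode to ISO-8859-1, then convert: the result is valid UTF-8.
$decoded = html_entity_decode( 't&eacute;l&eacute;charg&eacute;', ENT_COMPAT, 'ISO-8859-1' );
$fixed   = latin1_to_utf8( $decoded );
var_dump( $fixed === "t{$e_acute}l{$e_acute}charg{$e_acute}" ); // bool(true)

// Converting text that is already UTF-8 double-encodes it ("Ã©" instead of "é").
$double = latin1_to_utf8( $e_acute );
var_dump( bin2hex( $double ) ); // string(8) "c383c2a9"
```

The conversion is only safe when the input is known to be ISO-8859-1, which is why passing 'UTF-8' directly to html_entity_decode() is the simpler option where the PHP version supports it.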

comment:18 boonebgorges 3 years ago

I just installed a fresh version of BP 1.2.7, and I was not able to reproduce the problem. Accented characters in the activity RSS feed are coming through just fine, as well as in the trunk. All my tables were created with encoding utf8_general_ci, fwiw.

Any more details on how to reproduce would be appreciated. chouf1, if you could check the content of the activity items in the database as cnorris23 suggests, that would help to narrow down why some of us are unable to reproduce on our setups.

comment:19 cnorris23 3 years ago

Ran across this comment:6:ticket:2038 while going through Trac. This removes a significant portion of doubt, in my mind, that this is an issue with the data already in the DB, not a BP issue. #2038 and this ticket are almost exactly 3 weeks apart.

comment:20 chouf1 3 years ago

I didn't find anything like téléchargé in the activity table.
For your past questions:
1) I never changed the character set and/or collation of a table.
2) In 1.2.5 and 1.2.6 I had the same problem. I modified the code and everything went well for me until the next release.
3) After the 1.2.8 update, the RSS feed disappeared again.

Currently, the bp-fr.net RSS feed is working with the code I give here: http://bp-fr.net/2010/10/reparer-votre-fil-rss/

comment:21 cnorris23 3 years ago

@chouf1 you're using the Group Documents plugin right?

comment:22 cnorris23 3 years ago

  • Resolution set to invalid
  • Status changed from accepted to closed

Per comment:21, that was a leading question. I actually know the answer. The issue here comes from the French Group Documents translation file. Therefore, this is not an issue with BuddyPress, and explains why no one was able to replicate the issue.

comment:23 johnjamesjacoby 3 years ago

cnorris23 and chouf1 - Your diligence is much appreciated. Thanks for clearing this up.

comment:24 cnorris23 3 years ago

@johnjamesjacoby no worries. glad we can finally put this one to rest.

@chouf1 For your reference, since you translated the group documents plugin into French, in the po file, 'téléchargé' should actually appear as 'téléchargé', rather than 't&eacute;l&eacute;charg&eacute;'. I couldn't tell you why, but this is what's causing the problem. It's correct in two spots, and incorrect in three. One of which is causing your headache.

comment:25 chouf1 3 years ago

I don't understand what is happening.
I deactivated the translation of Group Documents.
I uploaded a brand new 1.2.8 bp-activity-templatetags file.

The feed is blank, even though the page source is available...

Then I changed the accents in the BP translation (using normal accented letters instead of &eacute;).

I cleared the browser cache and called the feed.
Now I have this: http://bp-fr.net/activity/feed/

For me, this means that accented letters in the DB are currently returned correctly, even if they are badly interpreted by the browser.

Note that the item is bad and the description is OK!

Last edited 3 years ago by chouf1

comment:26 chouf1 3 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

comment:27 chouf1 3 years ago

Sorry, but after the changes explained above, I have this in the feed:

Dan a répondu au sujet Blogs impossible à supprimer sur le forum Utilisation, configuration, optimisation de buddyPress
lundi 21 mars 2011 21:25

Dan a répondu au sujet Blogs impossible à supprimer sur le forum Utilisation, configuration, optimisation de buddyPress: 								

0 commentaires

To get good title output, I put this in the activity-templatetags file again:

$title =  strip_tags(html_entity_decode( $content[0],ENT_COMPAT,"UTF-8"  )) ;

I suggest trimming before insertion into the DB. In my opinion, trim is not necessary when outputting the feed.

Also a question: why are the title and description identical? The title doesn't need to mention "x posted", and the description doesn't need to repeat the title...

Last edited 3 years ago by chouf1

comment:28 cnorris23 3 years ago

Finally able to confirm this. Taking a cue from WP's creation of rss titles and descriptions, I present the following patch. The approach I've taken more or less mirrors how WP does things, and how things will be done after the switch to CPTs. This patch was created against the 1.2.x branch, because trunk activity feeds seem to be broken at the moment, but the patch should still apply cleanly.

@chouf1 you can ignore what I said in comment:24. You may continue to use entities or accentuated characters. However, be advised that the group documents translation still has some errors.

cnorris23 3 years ago

comment:29 cnorris23 3 years ago

  • Keywords needs-testing added; reporter-feedback removed

comment:30 boonebgorges 3 years ago

cnorris23 - Thanks for the patch. Can I ask how you reproduced the problem?

comment:31 follow-up: chouf1 3 years ago

@cnorris23 thank you for your efforts.
I applied your patch and tested the feeds with a test text containing many apostrophes.
I also activated and deactivated the group doc translation file; this doesn't change anything.

The site activity stream is correct
http://bp-fr.net/activity/feed/

The group activity feed is not correct
http://bp-fr.net/groups/groupe-de-tests/feed/

My groups activity feed is also incorrect
http://bp-fr.net/members/admin/activity/groups/feed/

This also needs adjustment, I presume ;-)
I've also contacted Peter Anselmo, the group doc plugin author.

Last edited 3 years ago by chouf1

comment:32 in reply to: ↑ 31 ; follow-up: chouf1 3 years ago

Replying to chouf1:

In addition, I noticed a subtle change: there are no more excerpts since, I guess, the update to WP 3.1 (in my case, 2011/02/23).

See here:
http://hub.tccd.edu/activity/feed/
and here:
http://bp-fr.net/groups/sites-francophones-utilisant-buddypress/feed/

comment:33 in reply to: ↑ 32 cnorris23 3 years ago

@chouf1 I'm not sure I understand what you're saying. Are you saying that the patch doesn't work? Also, "a subtle change" gives absolutely no information as to what the problem may be. Could you be more descriptive?

As to the group documents translation file, I'm not totally sure what you're trying to say, but there are a few spots in the file, which you will need to fix and resubmit to Mr. Anselmo, that don't contain accentuated characters or HTML entities. I think they were for error messages, so you may not have come across them, and may never, but they should be fixed. As an example, there are a few instances where téléchargé shows up instead of 't&eacute;l&eacute;charg&eacute;' or 'téléchargé'.

comment:34 chouf1 3 years ago

@cnorris23 - the patch is working great.

But did you forget to look at the links?
If you go to the URLs, you will see that the titles still show some errors, for example &#39; for an apostrophe...
These "subtle" details must be corrected in the other feed files, in the same manner as you changed it in the already patched bp-activity-sitewide-feed.php: in this case, the removal of CDATA.

I also changed the accent translation in the group document mo file. But I don't understand why you mention this file. My activity table contains the word téléchargé only 6 times, and these instances are all over 6 or 7 months old, so they no longer appear in the current activity feed.
And they never will again, because I manually sanitized the activity table before your patch was published.

comment:35 cnorris23 3 years ago

Okay patch updated. This time the patch was done against trunk, now that feeds are fixed. Hopefully, we've got this one now :)

cnorris23 3 years ago

comment:36 chouf1 3 years ago

Everything is OK for me now ! Thank you for this long work ;-)

comment:37 ultimateuser 3 years ago

I am running the latest versions of WordPress and BuddyPress. I've installed all 3 patches (2194, 2194.001, 2194.002).

However, the latest patch mentions changes to "bp-activity/bp-activity-template.php". I do not have this file; I only have bp-activity-templatetags.php.

It still doesn't work for me.

What should I do now? Is there a link to all the changed files, which I could upload (without having to change each file manually)?

comment:38 cnorris23 3 years ago

@chouf1 No worries. Glad we've finally got this working. Thanks for your persistence.

@ultimateuser 2194 and 2194.001 are incomplete patches and were written for the 1.2 branch. The milestone for a fix is 1.3, so 2194.002 was made against trunk. If you open the patch in your favorite code editor, and change "bp-activity/bp-activity-template.php" to "bp-activity/bp-activity-templatetags.php", you may be able to get all or most of the patch to apply. Past that, the only way would be to manually patch the files.

comment:39 ultimateuser 3 years ago

@cnorris

What do you mean by:

and change "bp-activity/bp-activity-template.php" to "bp-activity/bp-activity-templatetags.php"

I'm trying to understand but don't get it...

comment:40 cnorris23 3 years ago

However, the latest patch mentions changes to "bp-activity/bp-activity-template.php". I do not have this file; I only have bp-activity-templatetags.php.

The line that references "bp-activity/bp-activity-template.php", change that to "bp-activity/bp-activity-templatetags.php".

comment:41 ultimateuser 3 years ago

@cnorris

Sorry, I am a beginner and don't fully understand your instructions.

OK, by "the line", which file's line do you mean? Or do you mean changing the file name?

How do I install a patch anyway?

It would be so much easier if I could download the correct files somewhere and replace the faulty ones with them...

comment:42 cnorris23 3 years ago

@ultimateuser

Sorry, I assumed you knew what I was talking about based on your original post. The short answer is, no there are no files at the moment, as a fix has not yet been committed. Even when it has been committed, you'll need to be using 1.3+ which hasn't been released. Unless you'd like to learn about using subversion, the only way to get this fix on the 1.2 branch is to change each file manually.

comment:43 DJPaul 3 years ago

See also #3040

comment:44 cnorris23 3 years ago

DJPaul and boonebgorges,

I was reading through the IRC logs for the day, and wanted to fill you in on the question you two had about the CDATA tag being removed.

When I was troubleshooting the ticket, I kept finding that if an entity was used rather than an accented character, the actual entity would display. The expected behavior was that the browser would then convert the entity. I was testing in Firefox, so out of curiosity, and because I knew Firefox was sometimes weird in the way it cached feeds, I tried things in Safari. That's when I found CDATA was the issue.

After scanning the RSS docs, I found that while it's not explicitly stated that CDATA in a title tag is incorrect (feedvalidator.org doesn't flag it either), there didn't seem to be a place where it was stated that it could be expected. Conversely, the docs for the description tag do state that CDATA may appear. It seems to be a gray area that the two browsers have handled differently. Since strip_tags is run on the feed titles, I chose to remove the CDATA from the title tags, which seems to clear up the problem. Note that WP doesn't use CDATA in the feed titles either.

I only tested in Firefox 3.6 and Safari 5, but since both Chrome and Safari are WebKit based, I assume that Chrome never had the issue. Hope that explains it, and sorry for the length. It's the reason I didn't address it earlier ;)
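
To make the CDATA point concrete, here is a small sketch of how an XML parser treats the same character reference inside and outside a CDATA section. It assumes PHP's SimpleXML extension is available; the one-element documents and the numeric reference &#233; are illustrative and are not taken from the BuddyPress feed templates.

```php
<?php
// Illustrative one-element documents, not real feed output.
$with_cdata    = '<item><title><![CDATA[admin a t&#233;l&#233;charg&#233;]]></title></item>';
$without_cdata = '<item><title>admin a t&#233;l&#233;charg&#233;</title></item>';

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$a = simplexml_load_string( $with_cdata, 'SimpleXMLElement', LIBXML_NOCDATA );
$b = simplexml_load_string( $without_cdata );

// Inside CDATA the reference is literal text, so the reader receives "&#233;".
var_dump( (string) $a->title );

// Outside CDATA the parser resolves the reference to "é" (UTF-8 bytes C3 A9).
var_dump( (string) $b->title );
```

Whether a feed reader goes on to decode the literal entity as HTML is up to the reader, which may be why the two browsers behaved differently.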

comment:45 boonebgorges 3 years ago

cnorris23 - You're right that WP doesn't use CDATA for these, which is a good enough argument for me.

comment:46 boonebgorges 3 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [4373]) Fixes text encoding in RSS titles for better character support. Fixes #2194. Props cnorris23