BuddyPress.org

Opened 12 years ago

Last modified 18 months ago

#4831 new defect (bug)

BP bases site data/activity entries on WP search engine instructions

Reported by: hnla
Owned by:
Milestone: Awaiting Contributions
Priority: normal
Severity: minor
Version:
Component: Blogs
Keywords:
Cc: hnla

Description

If the WP dashboard option under Settings > Reading is checked to discourage search engines from indexing the site or page (via robots.txt / meta tags), BuddyPress uses that option value to determine whether it shows or records activity stream data.

The contention is this behaviour is wrong!

Say I have a site that is essentially a private community, and it has sub blogs. I have checked the box because I do not want Google indexing them; however, I do want those sites recording their activity. Presently this can't happen unless I uncheck the indexing option and let search engines index.

BP appears to take the WP option 'blog_public' to be literal even though as far as I can see in /wp-includes/functions.php L:1077 & /wp-includes/general-template.php L:1709 WP uses the option value to set robots.txt & meta tags respectively.

In bp-blogs-functions.php l:141 we check the value of $_POST['blog_public'] to determine whether blog activity is recorded to the activity stream.

My argument is that using this option or POST value to decide what to do is wrong: it can't be determined whether the user has set the index option for the right or wrong reasons, and regardless, the option is simply there to set an instruction to search engines that has no formal power; it doesn't have to be honoured. BP is thus saying that, as this option has a value, we are going to assume the user doesn't want their sub site activity recorded across the primary BP site, but that feels a bit of an assumption.

I can understand the argument that users may indeed want to set their blog as 'Private', and that this is the only means BP has to look for some clue as to the user's desire, but it's not a good one given this 'blog_public' option really has little to do with privacy in the strict sense.

I also realise there is likely little that can be done to effect a change as an option would need to be provided from individual blogs and also likely would really need to be a new option altogether.

However I first ran across this issue reported by a user in respect of iirc forum (bbp) posts not displaying in activity stream and on a non MS install.

Change History (13)

#1 @johnjamesjacoby
12 years ago

  • Milestone changed from Awaiting Review to Future Release

The reason we use this is it's the only available WordPress setting that comes close to being what we want. If we're going to introduce the concept of blog-visibility, I'd like to have real access control instead of just a piece of meta-data.

Moving to Future Release.

#2 @hnla
12 years ago

I realised that this was the reason it was used as such, and agree it would need to be real access control, not this piece of meta data, which is what worried me: meta data of this form and purpose isn't suitable as it's not defined as privacy. I just wanted to get the 'issue' logged, as this is the second time it's come to my attention as mildly problematical under different circumstances. It's easy to work around, but nonetheless...

#3 @hnla
12 years ago

Further thoughts:
I'm going to argue that the principle of hiding this activity data based on this meta option ought to be removed. If it's mooted that it is not the best approach why is it being used, why indeed is there the notion of privacy attempted when it's based on a setting that I would hazard a guess few site owner/users are aware of, at least as a setting that provides this activity behaviour? Do users expect to be able to hide blogs? Would they if it were a plain vanilla WP/MS install? This is a perceived requirement that BP introduces, yet we don't explicitly offer the option, we don't say "check this box it will remove blog activity from the stream".

#4 @boonebgorges
12 years ago

  • Component changed from Core to Blogs

If it's mooted that it is not the best approach why is it being used, why indeed is there the notion of privacy attempted when it's based on a setting that I would hazard a guess few site owner/users are aware of

I've always understood the logic thus: If your blog posts are sent to the network activity feed, they will be indexed by Google (assuming that your BP_ROOT_BLOG is open to crawlers). So, if you have actively marked your blog as no-robots, then it follows that you wouldn't want your content crawled on the activity stream either.

I agree that it's perhaps not the ideal setup, but simply reverting it could be even worse. Right now, the worst that happens is that some activity doesn't get recorded - annoying. If we ignored the blog_public setting, on the other hand, the worst that could happen is that people would unwittingly have what they thought were "private" blog entries showing up in search engines. This, IMO, is more than annoying: it's a violation.

we don't explicitly offer the option, we don't say "check this box it will remove blog activity from the stream"

At a very minimum, we should filter the text on Settings > Privacy (or supersede it - I can't remember how filterable it is) to say "Please note that this setting will prevent your blog posts from appearing in the sitewide activity stream." Or perhaps we could add a whole other section to the Settings > Privacy screen, along the lines of:

Activity Settings

[x] I would like posts and comments from my blog to appear in the sitewide activity stream
[x] I would not like....

#5 @hnla
12 years ago

  • Component changed from Blogs to Core

I've always understood the logic thus: If your blog posts are sent to the network activity feed, they will be indexed by Google (assuming that your BP_ROOT_BLOG is open to crawlers). So, if you have actively marked your blog as no-robots, then it follows that you wouldn't want your content crawled on the activity stream either.

My problem here would be "your blog": it seems from the bp function that we aren't actually checking the blog options value so can't differentiate between blogs, we simply place a blanket dictate that all sub blogs won't be able to feed into the root blog activity feed. Perhaps one blog does want to, yet another does not?

<snip>If we ignored the blog_public setting, on the other hand, the worst that could happen is that people would unwittingly have what they thought were "private" blog entries showing up in search engines

The danger here is twofold. 'blog_public' might suggest something other than what it really is, an instruction to search engines that is not mandatory, and having people assume even for a minute that what was happening here was setting a privacy level is in itself dangerous. Nothing about this setting truly has anything to do with privacy, and any user assuming it has is technically being misled. Even if BP prevents the activity from being recorded, that does not actually prevent search engines accessing that content via some other avenue, and somewhere that content will have been indexed unless one fancies gathering a list of the bad bots and adding them to one's .htaccess file. However, I realise that if the blog_public setting is false then somewhere, somehow, we do need to pay heed to that setting.

At a very minimum, we should filter the text on Settings > Privacy (or supersede it - I can't remember how filterable it is)

It does appear filterable, or at least a hook is provided, 'blog_privacy_selector', which oddly changes the nature of the settings block if hooked into. Not sure why, but it does let one pass some additional text:

// Print an extra notice inside the site visibility settings block.
function bp_blogs_activity_feed() {
	_e( "Enabling the setting 'Discourage search engines from indexing this site' will remove all MS blogs content from the BP activity feed", 'buddypress' );
}
add_action( 'blog_privacy_selector', 'bp_blogs_activity_feed' );

That would be best run from bp-blogs-functions, I guess, and based on what state the blog is set to, e.g. flip the message: 'site visibility is set to discourage, BP is not adding blogs to activity feed' / 'enabling this setting will remove blogs from activity feed'.
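As a rough illustration of the message flip suggested above, the notice text could be chosen from the site's current blog_public value. This is only a sketch: the function name sketch_privacy_notice is hypothetical, the value is passed in directly rather than read from WordPress, and the wording is taken from the comment above.

```php
<?php
// Hypothetical helper (not part of BuddyPress): choose the notice text
// from the site's current blog_public value.
function sketch_privacy_notice( int $blog_public ): string {
    if ( 1 !== $blog_public ) {
        // Search engines are already discouraged for this site.
        return 'Site visibility is set to discourage, BP is not adding blogs to activity feed';
    }
    // Site is currently open to search engines: warn about the side effect.
    return 'Enabling this setting will remove blogs from activity feed';
}
```

In a real install the returned string would presumably be echoed from a callback on the 'blog_privacy_selector' hook, with the value read via get_option( 'blog_public' ).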

Looking further at the functions in bp-blogs-functions.php, I do see functions that take a param to set a blog as 'not tracked', but feel the check for $blog_id ought to use blog_option($blog_id, 'blog_public'); then it's a viable check for individual blog settings options.
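A per-blog check of the kind described above might look roughly like this. It is a sketch under assumptions: sketch_blog_is_trackable and the stand-in option store are hypothetical, and the option lookup is injected so the snippet runs without WordPress. On multisite, WordPress's get_blog_option( $id, $option ) could be passed as the lookup.

```php
<?php
// Hypothetical helper (not part of BuddyPress): decide per blog whether it
// should be tracked, using an injected option lookup.
function sketch_blog_is_trackable( int $blog_id, callable $get_option ): bool {
    // Only an explicit 1 counts as public.
    return 1 === (int) $get_option( $blog_id, 'blog_public' );
}

// Stand-in option store so the sketch runs without WordPress.
$sketch_options = [
    2 => [ 'blog_public' => '1' ], // open to search engines
    3 => [ 'blog_public' => '0' ], // search engines discouraged
];

// Stand-in lookup; falls back to '1' when nothing is stored for the blog.
$sketch_lookup = function ( int $id, string $key ) use ( $sketch_options ) {
    return $sketch_options[ $id ][ $key ] ?? '1';
};
```

Because the lookup is keyed by blog ID, one sub blog can be tracked while another is not.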

Last edited 12 years ago by hnla

#6 @boonebgorges
12 years ago

it seems from the bp function that we aren't actually checking the blog options value so can't differentiate between blogs, we simply place a blanket dictate that all sub blogs won't be able to feed into the root blog activity feed

Where do you get this idea? https://buddypress.trac.wordpress.org/browser/tags/1.6.4/bp-blogs/bp-blogs-functions.php#L174 This is a blog-specific check, and has nothing to do with the BP root blog.

#7 @hnla
12 years ago

I'm referencing trunk and there are 1.7 additions that cause your line numberings to be off.

Originally I focussed on this:

$is_private = !empty( $_POST['blog_public'] ) && (int) $_POST['blog_public'] ? false : true;

As the test for whether to record activity.

But I have realised there's a need to fully grasp all of that file & its functions.
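For reference, the expression quoted above can be exercised in isolation. The wrapper below is a hypothetical helper that takes the request data as an array instead of reading $_POST, purely to show how different submitted values evaluate.

```php
<?php
// Hypothetical wrapper around the quoted expression; $post stands in for $_POST.
function sketch_is_private_from_request( array $post ): bool {
    // Same logic as the quoted line: a non-empty, non-zero value means "not private".
    return ! empty( $post['blog_public'] ) && (int) $post['blog_public'] ? false : true;
}

// Sample inputs as they might arrive from a form submission.
$sketch_cases = [
    'public'  => [ 'blog_public' => '1' ],
    'hidden'  => [ 'blog_public' => '0' ],
    'missing' => [],
];
```

Note that a missing field and a submitted '0' are both treated as private, because empty() is true for each.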

#8 @boonebgorges
12 years ago

If you're talking about https://buddypress.trac.wordpress.org/browser/trunk/bp-blogs/bp-blogs-functions.php#L166, then the purpose of that line is to check whether the *creation of a blog* ("Hugo created a new blog....") should be recorded in the activity stream. The blog_public field is relative to the newly-created blog, *not* the BP root blog.

Blog posts are handled further down, in bp_blogs_record_post().

#9 @hnla
12 years ago

Got ya; like I said, I needed to read all of the functions in more detail. Ignore the latter wafflings as they are a digression. The issue is as originally outlined, albeit not easily dealt with. However, the settings are hookable, so it would be possible to at least add a little extra text that may help users understand what the extended consequences of search engine visibility are, or even, through the do_action, add a BP setting for specifically setting an activity visible option.

#10 @boonebgorges
7 years ago

  • Component changed from Core to Blogs
  • Milestone changed from Awaiting Contributions to Up Next

Going to bring this one back to life.

It's probably fine to use blog_public to determine the *default* behavior. See #1226, #6274. However, it's not currently possible to filter this in a fine-grained way. I'd suggest a few layers of fixes:

  1. All instances of the bp_is_blog_public filter should receive the blog ID as a parameter. Things get tricky with switch_to_blog() in multisite, and it'd be much easier to reason about what's happening if we had the blog ID to work with. This might be best if we moved it to a centralized bp_is_blog_public() function.
  2. We should un-hardcode the 0 === (int) checks. A number of plugins extend the blog_public setting by adding additional potential values. A more sensible default check might be 1 !== (int).
  3. That said, I think we can go further, by abstracting the specific blog_public checks in such a way that they can be filtered in a targeted way. For example, instead of the hardcoded check here https://buddypress.trac.wordpress.org/browser/tags/2.9.3/src/bp-blogs/classes/class-bp-blogs-component.php?marks=98#L86, perhaps we'd have something like this:

     if ( apply_filters( 'bp_blogs_record_activity_for_site', bp_is_blog_public( $blog_id ), $blog_id ) ) {

This way, you could filter bp_is_blog_public, or you could do finer-grained filtering (as when you *want* activity items to be recorded, but you want them to be recognized as non-public when determining hide_sitewide).

Any thoughts or objections to this proposal?
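The layered proposal above could be sketched roughly as follows. Everything here is an assumption made for illustration: SketchFilters is a minimal stand-in for the WordPress filter API so the snippet runs on its own, the sketch_* function names are hypothetical, and the blog_public values are hard-coded sample data rather than real site options.

```php
<?php
// Minimal stand-in for add_filter()/apply_filters(), for illustration only.
final class SketchFilters {
    private static array $callbacks = [];

    public static function add( string $tag, callable $callback ): void {
        self::$callbacks[ $tag ][] = $callback;
    }

    public static function apply( string $tag, $value, ...$args ) {
        foreach ( self::$callbacks[ $tag ] ?? [] as $callback ) {
            $value = $callback( $value, ...$args );
        }
        return $value;
    }
}

// Sample data: blog ID => blog_public value (-2 mimics a plugin-defined value).
function sketch_get_blog_public( int $blog_id ): int {
    $values = [ 1 => 1, 2 => 0, 3 => -2 ];
    return $values[ $blog_id ] ?? 1;
}

// Layers 1 and 2: centralized check, blog ID passed to the filter,
// and anything other than 1 treated as non-public.
function sketch_is_blog_public( int $blog_id ): bool {
    $is_public = 1 === sketch_get_blog_public( $blog_id );
    return (bool) SketchFilters::apply( 'bp_is_blog_public', $is_public, $blog_id );
}

// Layer 3: a separate, targeted filter for the "record activity" decision.
function sketch_should_record_activity( int $blog_id ): bool {
    return (bool) SketchFilters::apply(
        'bp_blogs_record_activity_for_site',
        sketch_is_blog_public( $blog_id ),
        $blog_id
    );
}

// Example: site 2 discourages search engines but still wants activity recorded.
SketchFilters::add(
    'bp_blogs_record_activity_for_site',
    function ( bool $record, int $blog_id ): bool {
        return 2 === $blog_id ? true : $record;
    }
);
```

With the two decisions separated like this, site 2 stays non-public for visibility purposes while its activity is still recorded.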

#11 @DJPaul
7 years ago

Improvements along the lines you've suggested seem good.

I think what could be worth considering is adding an explicit site option to control this. I'd like to be able to pick between "use SEO setting", "always", "always if blog post is not private", or "never". etc.
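A site option along these lines might reduce to a small decision function. This is only a sketch: the function name and the mode keys are hypothetical labels for the four choices listed above, and the blog_public value and post status are passed in directly instead of being read from WordPress.

```php
<?php
// Hypothetical decision function for an explicit "record activity" site option.
function sketch_record_post_activity( string $mode, int $blog_public, string $post_status ): bool {
    switch ( $mode ) {
        case 'always':
            return true;
        case 'never':
            return false;
        case 'always_if_post_not_private':
            // 'private' is the WordPress post status for private posts.
            return 'private' !== $post_status;
        case 'use_seo_setting':
        default:
            // Fall back to the current behaviour: follow blog_public.
            return 1 === $blog_public;
    }
}
```

Defaulting unknown modes to the SEO setting keeps the existing behaviour for sites that never touch the new option.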

#12 @boonebgorges
7 years ago

That is a good thought, @DJPaul. The kinds of code-level abstraction I've described would be the first step toward this kind of site option. I'll try to work on it soon - it would be a natural fit for a GDPR-focused release.

#13 @DJPaul
6 years ago

  • Milestone changed from Up Next to Awaiting Contributions