Skip to:
Content

BuddyPress.org

Opened 12 years ago

Closed 12 years ago

#4086 closed defect (bug) (fixed)

Friends, Groups & Co are handled as 404 & nothing gets indexed by search engines

Reported by: wpdennis's profile wpdennis Owned by:
Milestone: 1.6 Priority: normal
Severity: major Version:
Component: Core Keywords: dev-feedback has-patch
Cc:

Description

I've found a very critical bug in handling detail pages in buddypress. It will throw a 404 internally, is setting wrong canonical urls and results in a strange behaviour of functions like is_single().


Just have a look into the html source of:
http://buddypress.org/community/groups/installing-buddypress/home/
The canonical tag says "http://buddypress.org/community/groups/"

The result is: nothing gets indexed by search engines. The forum posts have wrong canonical tags, too. So, Google doesn't know any of the existing topics. Each one is redirecting to the group index:
https://www.google.de/#q=site:buddypress.org+inurl%3Aforum%2Ftopic
(This query would show a lot of posts in the forums, if the canonical wouldn't disallow it)

Further problems are, that functions like "is_single()" have unpredictable results.

Example: is_single() returns FALSE at
http://buddypress.org/community/groups/installing-buddypress/
BUT if you have an image uploaded in wordpress with the name "installing-buddypress", is_single() will return TRUE at
http://buddypress.org/community/groups/installing-buddypress/


A try to break it down

I implemented a quick backtrace if wp_query->is_single is set.

A call to http://buddypress.org/community/groups/installing-buddypress/
(just as an example) will result in:

#0  WP_Query->__set(is_single, ) called at [/wp/wp-includes/query.php:1290]
#1  WP_Query->init_query_flags() called at [/wp/wp-includes/query.php:1344]
#2  WP_Query->init() called at [/wp/wp-includes/query.php:2911]
#3  WP_Query->query(Array ([attachment] => installing-buddypress)) called at [/wp/wp-includes/class-wp.php:460]
#4  WP->query_posts() called at [/wp/wp-includes/class-wp.php:507]
#5  WP->main() called at [/wp/wp-includes/functions.php:1570]
#6  wp() called at [/wp/wp-blog-header.php:14]
#7  require(/wp/wp-blog-header.php) called at [/wp/index.php:17]

This is just the initialization and init_query_flags() will set is_single to FALSE.

The second call is because the query gets parsed. The group is handled as attachement. If an attachement with the same name exists, it will set is_single to TRUE. If not, a 404 is handled internally.

#0  WP_Query->__set(is_single, 1) called at [/wp/wp-includes/query.php:1466]
#1  WP_Query->parse_query() called at [/wp/wp-includes/query.php:1921]
#2  WP_Query->get_posts() called at [/wp/wp-includes/query.php:2913]
#3  WP_Query->query(Array ([attachment] => installing-buddypress)) called at [/wp/wp-includes/class-wp.php:460]
#4  WP->query_posts() called at [/wp/wp-includes/class-wp.php:507]
#5  WP->main() called at [/wp/wp-includes/functions.php:1570]
#6  wp() called at [/wp/wp-blog-header.php:14]
#7  require(/wp/wp-blog-header.php) called at [/wp/index.php:17]

With existing attachement:
parse_query() sets is_single to TRUE.

Without existing attachement:
parse_query() sets is_single to FALSE.

If an attachement exists, is_single isn't set again. If no attachement exists, wordpress handles it as an 404. In this case set(is_single) is called a third time:

#0  WP_Query->__set(is_single, ) called at [/wp/wp-includes/query.php:1290]
#1  WP_Query->init_query_flags() called at [/wp/wp-includes/query.php:1871]
#2  WP_Query->set_404() called at [/wp/wp-includes/class-wp.php:484]
#3  WP->handle_404() called at [/wp/wp-includes/class-wp.php:508]
#4  WP->main() called at [/wp/wp-includes/functions.php:1570]
#5  wp() called at [/wp/wp-blog-header.php:14]
#6  require(/wp/wp-blog-header.php) called at [/wp/index.php:17]

Because of the init_query_flags() is_single is now set to FALSE. So is_single() will return FALSE again.


There are multiple bad things we should investigate:

  1. The wrong canonical tags are a blocker. Nothing gets indexed by search engines. Because of a plugin (YOAST SEO) my canonical tags gets stripped out in the most cases - except on groups where an attachement with the same name exists.
  1. Is it avoidable to throw a 404 internally?
  1. is_single() should return the same value for each group (I rely on it for a breadcrumb menu, for example). This would be the case, if we can avoid the 404, I think.

Important: This isn't group specific, so I set "Core" as Component for this ticket. Members have the exact same issues. Every profile throws an 404 and has a wrong canonical url.

To check this, just add

global $wp_query;
print_r($wp_query->query_vars);

to the theme template file for member or group home. You'll find the error=>404 except you have an image with the same name. Then it'll contain attachement=>groupOrMemberName.

I would love to hear some oppinions from more experienced bp-developers, before I digg in the wrong direction.

Attachments (1)

4086.01.diff (1.1 KB) - added by boonebgorges 12 years ago.

Download all attachments as: .zip

Change History (5)

#1 @boonebgorges
12 years ago

  • Keywords reporter-feedback added
  • Severity changed from blocker to major

I've moved the rel="canonical" issue into a separate ticket. See #4087.

I can't reproduce the 404 issue. BP content, such as members and groups, is displayed using WP pages. On all of my installations, this is working correctly - dumping $wp_query->query_vars shows no error, and correctly queries pagename for the current component. It's possible that you have a strange configuration that is causing the problem, something that may be caused by another plugin or by an issue with BP itself. I'm going to do some more investigation, but wpdennis, any additional information you can provide about your setup (other plugins installed, pretty permalink settings, WP/BP versions) would be quite helpful.

I'm marking down the priority as this is clearly not an issue that is affecting all installations, or it would be impossible to use any BuddyPress site.

#2 @wpdennis
12 years ago

Seperating the canonical issue is a good idea, thank you.

I've disabled all relevant plugins in my test case. But to be sure, I just installed a new BuddyPress test site. Every instance is running with WordPress 3.3.1 and BuddyPress 1.5.4.

In the new test environment I have no additional plugins installed and nothing activated but BuddyPress itself. I'm using the default theme, too.

I print out $wp_query->query_vars in a new group and it says [attachment] => test :
http://bptest.dennispietsch.de/groups/test/

On this url it says [error] => 404 :
http://bptest.dennispietsch.de/groups/test/home/

I found the reason why your query_vars are looking good, too: the permalink config. I'm using "/%postname%/" in every installation. If I set it to anything else, I get the results you described. Thanks for pointing me into that direction!

"or it would be impossible to use any BuddyPress site."

I have a productive installation with that 404/attachment behavior. If I wouldn't rely on is_single() in my breadcrumb menu AND have a picture with the same name as one of the groups, I wouldn't have discovered this issue. So, it is very likely to run a BuddyPress site without recognizing this behavior.

But the real "blocker" is the canonical issue. If we seperate the tickets, the issue described in this ticket isn't a real "blocker" anymore.

#3 @boonebgorges
12 years ago

  • Keywords has-patch added; reporter-feedback removed
  • Milestone changed from Awaiting Review to 1.6

Confirmed. Here's an outline of what's happening.

When you set permalinks to /%postname%/, a flag called use_verbose_page_rules is set in $wp_rewrite. The purpose of $wp_rewrite->use_verbose_page_rules is to verify the existence of a requested page before returning it as a match in WP::parse_request(). WP uses the function get_page_by_path() to do this verification. get_page_by_path(), in turn, uses the following logic:

1) bust the query request up into chunks (so that /members/admin/ becomes 'members' and 'admin')
2) do a direct query for pages that match one of the chunks
3) cycle through the results to see if the page_name matches the last part of the request, as in cases where a URL is http://example.com/boone/is/very/cool/ and the post_name is cool (with very as post_parent, is as post_parent of very, etc - the standard WP page setup)

In the case of BP pages, steps (1) and (2) go just fine - WP successfully finds our page called members - but it fails at step (3). That's because members is not the *last* chunk of the request, which in turn is because we are constructing our URLs in a non-standard way. As a result, the function returns false, and WP::parse_request() fails to locate a match, which is the source of the 404. (BTW, having an attachment or a manually created page with post_name = 'admin' will trick get_page_by_path() into returning true, which explains wpdennis's perplexing attachment report in the OP.)

There are a couple of things we could do to fix this.

a) Manually set $wp_rewrite->use_verbose_page_rules = false in our catchuri routine, after we have done our own checks to make sure that we are, indeed, viewing a BP page. (See 4086.01.diff.) This should be safe, since by this point, we have already decided conclusively whether or not we'll be displaying BP content; if not, we return (if ( empty( $matches ) ) return false). The downside is that it seems a bit hackish.
b) Pass a patch upstream to WP that filters the return value of get_page_by_path(). We can filter it, check to see whether we are viewing a BP page (we'll already have this information in the $bp global, because of the existing catchuri routine), and then return a relevant value. This solution has the advantage of actually addressing the underlying cause of the discrepancy (namely, that we match page URIs to pages in the DB in a different way than WP does). However, we'll have to convince the WP team that it's a good idea (we are, after all, "doing it wrong"), and it will add about 20 lines of code to our codebase where (a) will only add one.

Thoughts?

Last edited 12 years ago by boonebgorges (previous) (diff)

#4 @boonebgorges
12 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [5970]) During the catchuri process, manually set wp_rewrite->use_verbose_page_rules to false after we have verified that we are viewing a BuddyPress page.

Once BP has done its own matching against bp_pages, there is no need to allow WP to do a check against get_page_by_path() to ensure that the requested page exists - particularly when WP's check will fail, due to BP's unorthodox URI structure. Thus this changeset will fix the problem of WP_Query returning a 404 status when permalinks are set to /%postname%/.
Fixes #4086

Note: See TracTickets for help on using tickets.