Skip to:
Content

Opened 2 years ago

Last modified 7 months ago

#6789 new enhancement

XProfile: do not store serialized arrays for multi-value profile field data

Reported by: Offereins Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Extended Profile Keywords:
Cc: slaFFik, dcavins, espellcaste@…, lmoffereins@…

Description

At the moment, data from multi-value profile fields (selectboxes/multiselect) are stored as serialized arrays in $bp->profile->table_name_data. This prohibits querying users by this data using BP_Xprofile_Query, which is obviously something BP needs to continue to grow. So let us do something about it (with love)!

At the moment, backpat is the greatest concern, but that should not stop us. Please join the discussion on how to make this happen.

Attachments (7)

6789.01.diff (2.2 KB) - added by Offereins 7 months ago.
profile type supports_multiple_values class property
6789.02.diff (3.5 KB) - added by Offereins 7 months ago.
.01 + bp_xprofile_field_supports_multiple_values
6789.docs.01.patch (1009 bytes) - added by Offereins 7 months ago.
Docfix for xprofile_sanitize_data_value_before_save
6789.docs.02.patch (2.2 KB) - added by Offereins 7 months ago.
Filter docs for BP_XProfile_ProfileData::save
6789.03.patch (18.4 KB) - added by Offereins 7 months ago.
.01 + .02 + first pass at multiple row data
6789.04.patch (22.5 KB) - added by Offereins 7 months ago.
.03 + improvements
6789.05.patch (22.6 KB) - added by Offereins 7 months ago.
.04 + improvements

Download all attachments as: .zip

Change History (23)

#1 @slaFFik
2 years ago

  • Cc slaFFik added

#2 @dcavins
2 years ago

  • Cc dcavins added

#3 @espellcaste
2 years ago

  • Cc espellcaste@… added

#4 @boonebgorges
2 years ago

  • Milestone changed from Awaiting Review to Future Release

First, <3. It would be great to do away with this kind of serialized data.

Second, some initial thoughts.

  • The most obvious way to store the data is to split individual entries into their own rows in wp_bp_xprofile_data.
  • But this will mean that user_id+field_id will no longer be a de facto unique identifier for xprofile_data. That is, a given user may have more than one row in xprofile_data corresponding to a given field.
  • Various parts of our query API expect to pull only a single row from the table. Eg, BP_XProfile_ProfileData::populate(), which does $wpdb->get_row(). In this case we'll probably need to do get_results() and then figure out how far up the stack we need to add support for arrays. Another example is BP_XProfile_ProfileData::get_data_for_user(), which keys data by field_id. Here, we'll either need to do the array conversion right in the query method (so the data structure can remain the same in, eg, BP_XProfile_Group::get(), or we'll have to change the way data is keyed and then make necessary modifications to consuming methods.
  • In all of these cases, we're going to have to weigh backward compatibility against design. Returning arrays from functions that didn't previous return arrays is almost certainly going to break something somewhere.
  • We should do a scan for plugins that interact directly with the xprofile_data table, especially those that expect serialized data. If we find that lots of folks are doing this, we may decide to provide legacy support for it. We did something similar with user last_activity data (you can continue to access/modify the data in usermeta, though BP throws a bunch of _doing_it_wrong() notices). Ideally, we would simply drop support for this kind of direct access; but this will take some good documentation and publicity.
  • The migration/update routine is going to be pretty large, and will probably need to be broken into chunks. We can do some background migration (eg, fire off a cron job during update that migrates data in batches), but since this is asynchronous, we'd need to maintain backward compatibility for the old data format, at least for a version or two.

Would love to hear more thoughts on any of the above, or anything I am overlooking :)

#5 @Offereins
2 years ago

  • Cc lmoffereins@… added

I have nothing to add at the moment. I'm not that much invested in the current way of how it works, so perhaps the other guys can weigh in. Maybe we should start with a proper set of tests having in place before we rock it - though I'm not really into that game.

This ticket was mentioned in Slack in #buddypress by offereins. View the logs.


18 months ago

#7 @thomaslhotta
18 months ago

It's possible to do this using "LIKE". This probably does not offer the best performance, but it also doesn't require any of the changes listed above.

Right now this only works with "OR" relations if one wants to search for multiple values. When using "AND" every value causes a new JOIN, what really kills performance. The reason for this is found in BP_XProfile_Query on line 519:

 } elseif ( isset( $sibling['field_id'] ) && isset( $clause['field_id'] ) && $sibling['field_id'] === $clause['field_id'] ) {
     $compatible_compares = array( '!=', 'NOT IN', 'NOT LIKE' );
 }

Adding "LIKE" her allows "AND" queries to work fine.

This ticket was mentioned in Slack in #buddypress by offereins. View the logs.


15 months ago

This ticket was mentioned in Slack in #buddypress by offereins. View the logs.


14 months ago

@Offereins
7 months ago

profile type supports_multiple_values class property

#10 @Offereins
7 months ago

So, to get a start somewhere, I decided to first identify whether fields (and their types) would accept multiple values. This is to find a way to still support fields that want to save their values as serialized arrays versus fields that would have the data as separate value rows.

By default this is the case for the Multiselect and Checkbox field types. While they already are uniquely identified by their supports_multiple_defaults class property, that prop actually signals a different function than saving multiple values. That is why I introduce the supports_multiple_values property/flag - also just for these types.

This ticket was mentioned in Slack in #buddypress by dcavins. View the logs.


7 months ago

#12 @Offereins
7 months ago

Just stumbled upon the (hypothetical) issue of xprofile meta, which is equally possible for data(!) as for fields and groups. The unique id of a profile data object currently equals the data row's id. This apparently gets messed up when multiple rows of data exist for a single data object. In case of meta, this triggers the question of which id to use to store meta for the data object.

For now I'm suggesting to use the last row's item id. For storing profile data meta, this is probably not so much a problem. Querying profile data by meta is a different story perhaps, as you get a single data row where multiple may apply for the field - though it's probably advised to use the default API functions anyway. You know, filtering.

@Offereins
7 months ago

.01 + bp_xprofile_field_supports_multiple_values

#13 @Offereins
7 months ago

Latest patch adds a function to check the field's support for multiple values. It follows the same logic as bp_xprofile_is_richtext_enabled_for_field().

@Offereins
7 months ago

Docfix for xprofile_sanitize_data_value_before_save

@Offereins
7 months ago

Filter docs for BP_XProfile_ProfileData::save

#14 @Offereins
7 months ago

In the process of working on this, I stumbled on a few errors/shortcomings in the XProfile ProfileData inline documentation. Two patches are added to cover this.

@Offereins
7 months ago

.01 + .02 + first pass at multiple row data

#15 @Offereins
7 months ago

Some notes on the WIP changes presented in the latest patch.

In BP_XProfile_ProfileData

  • populate() - multiple values are returned as an array per field.
  • save() - multiple values are stored as multiple rows. The previous rows are deleted through calling $this->delete(). The new id of the data object is the one from the first created data row.
  • get_data_for_user() - multiple values are returned as an array per field.
  • get_fielddataid_byid() - no changes, but the main data id is returned in case of multiple-value fields.
  • get_value_by_id() - multiple values are returned as an array per field.
  • get_value_byfieldname() - multiple values are returned as an array per field.
  • get_random() - a subquery is used to find the random field id, then all related data is fetched.

In xprofile functions

  • xprofile_get_field_data() - multiple values are intercepted through bp_xprofile_field_supports_multiple_values(), but further logic stands as before.
  • xprofile_set_field_data() - multiple values are sent in as an array of serialized values (in case of multiple array values).
  • bp_unserialize_profile_field() - now also parses arrays into comma-separated lists.

Further checking on template functions parsing field data is TBD.

Considerations

Since multiple values previously were stored and returned as serialized arrays, should we now return non-serialized row data also as a (re)serialized array - to keep backpat in order? Then data would be stored in the new fashion, but returned as is done previously. Any ideas about that?

@Offereins
7 months ago

.03 + improvements

@Offereins
7 months ago

.04 + improvements

#16 @Offereins
7 months ago

Reiterating on this, the latest uploaded patch (.05) contains the following fixes to .03:

In BP_XProfile_ProfileData

Empty lists for multiple-values are stored as a serialized empty array. This is to signal that the field is deliberately empty, ignoring the need to suggest defaults in the editing interface. The current patch suggests to continue to do so. The logic is implemented for the following methods comparing array() === maybe_unserialize( $value ):

  • populate()
  • save()
  • get_data_for_user()
  • get_value_by_id()
  • get_value_byfieldname()
  • get_random()

In xprofile functions

  • xprofile_sanitize_data_value_before_save() - don't bail the filtering when the field value is actually valid: 0 in case of a number field. See similar checks in xprofile_set_field_data().
Note: See TracTickets for help on using tickets.