Opened 9 years ago
Last modified 7 years ago
#6789 new enhancement
XProfile: do not store serialized arrays for multi-value profile field data
Reported by: | Offereins | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Contributions | Priority: | normal |
Severity: | normal | Version: | |
Component: | Extended Profile | Keywords: | |
Cc: | slaFFik, dcavins, espellcaste@…, lmoffereins@… |
Description
At the moment, data from multi-value profile fields (selectboxes/multiselect) are stored as serialized arrays in $bp->profile->table_name_data
. This prohibits querying users by this data using BP_Xprofile_Query
, which is obviously something BP needs to continue to grow. So let us do something about it (with love)!
At the moment, backpat is the greatest concern, but that should not stop us. Please join the discussion on how to make this happen.
Attachments (9)
Change History (30)
#5
@
9 years ago
- Cc lmoffereins@… added
I have nothing to add at the moment. I'm not that much invested in the current way of how it works, so perhaps the other guys can weigh in. Maybe we should start with a proper set of tests having in place before we rock it - though I'm not really into that game.
This ticket was mentioned in Slack in #buddypress by offereins. View the logs.
8 years ago
#7
@
8 years ago
It's possible to do this using "LIKE". This probably does not offer the best performance, but it also doesn't require any of the changes listed above.
Right now this only works with "OR" relations if one wants to search for multiple values. When using "AND" every value causes a new JOIN, what really kills performance. The reason for this is found in BP_XProfile_Query on line 519:
} elseif ( isset( $sibling['field_id'] ) && isset( $clause['field_id'] ) && $sibling['field_id'] === $clause['field_id'] ) { $compatible_compares = array( '!=', 'NOT IN', 'NOT LIKE' ); }
Adding "LIKE" her allows "AND" queries to work fine.
This ticket was mentioned in Slack in #buddypress by offereins. View the logs.
8 years ago
This ticket was mentioned in Slack in #buddypress by offereins. View the logs.
8 years ago
#10
@
7 years ago
So, to get a start somewhere, I decided to first identify whether fields (and their types) would accept multiple values. This is to find a way to still support fields that want to save their values as serialized arrays versus fields that would have the data as separate value rows.
By default this is the case for the Multiselect
and Checkbox
field types. While they already are uniquely identified by their supports_multiple_defaults
class property, that prop actually signals a different function than saving multiple values. That is why I introduce the supports_multiple_values
property/flag - also just for these types.
This ticket was mentioned in Slack in #buddypress by dcavins. View the logs.
7 years ago
#12
@
7 years ago
Just stumbled upon the (hypothetical) issue of xprofile meta, which is equally possible for data(!) as for fields and groups. The unique id
of a profile data object currently equals the data row's id
. This apparently gets messed up when multiple rows of data exist for a single data object. In case of meta, this triggers the question of which id to use to store meta for the data object.
For now I'm suggesting to use the last row's item id. For storing profile data meta, this is probably not so much a problem. Querying profile data by meta is a different story perhaps, as you get a single data row where multiple may apply for the field - though it's probably advised to use the default API functions anyway. You know, filtering.
#13
@
7 years ago
Latest patch adds a function to check the field's support for multiple values. It follows the same logic as bp_xprofile_is_richtext_enabled_for_field()
.
#14
@
7 years ago
In the process of working on this, I stumbled on a few errors/shortcomings in the XProfile ProfileData inline documentation. Two patches are added to cover this.
#15
@
7 years ago
Some notes on the WIP changes presented in the latest patch.
In BP_XProfile_ProfileData
populate()
- multiple values are returned as an array per field.save()
- multiple values are stored as multiple rows. The previous rows are deleted through calling$this->delete()
. The new id of the data object is the one from the first created data row.get_data_for_user()
- multiple values are returned as an array per field.get_fielddataid_byid()
- no changes, but the main data id is returned in case of multiple-value fields.get_value_by_id()
- multiple values are returned as an array per field.get_value_byfieldname()
- multiple values are returned as an array per field.get_random()
- a subquery is used to find the random field id, then all related data is fetched.
In xprofile functions
xprofile_get_field_data()
- multiple values are intercepted throughbp_xprofile_field_supports_multiple_values()
, but further logic stands as before.xprofile_set_field_data()
- multiple values are sent in as an array of serialized values (in case of multiple array values).bp_unserialize_profile_field()
- now also parses arrays into comma-separated lists.
Further checking on template functions parsing field data is TBD.
Considerations
Since multiple values previously were stored and returned as serialized arrays, should we now return non-serialized row data also as a (re)serialized array - to keep backpat in order? Then data would be stored in the new fashion, but returned as is done previously. Any ideas about that?
#16
@
7 years ago
Reiterating on this, the latest uploaded patch (.05) contains the following fixes to .03:
In BP_XProfile_ProfileData
Empty lists for multiple-values are stored as a serialized empty array. This is to signal that the field is deliberately empty, ignoring the need to suggest defaults in the editing interface. The current patch suggests to continue to do so. The logic is implemented for the following methods comparing array() === maybe_unserialize( $value )
:
populate()
save()
get_data_for_user()
get_value_by_id()
get_value_byfieldname()
get_random()
In xprofile functions
xprofile_sanitize_data_value_before_save()
- don't bail the filtering when the field value is actually valid: 0 in case of a number field. See similar checks inxprofile_set_field_data()
.
This ticket was mentioned in Slack in #buddypress by offereins. View the logs.
7 years ago
#18
@
7 years ago
@Offereins Thanks for your work on this. It is exciting to see it taking shape. Overall, the decisions you've made seem smart to me. The handling of empty/falsey fields tested well for me, and the supports_multiple_values
flag feels correct.
I'm attaching a new patch 6789.06.patch that includes updates to documentation, as well as some changes to work after recent changes to the treatment of serialized data throughout BP. See [11692].
I have some random feedback. In no particular order.
- The serialization dances are complicated, perhaps too complicated. For example: Data coming from multiple-value fields is being passed through
array_map( 'maybe_unserialize' )
in multiple places. In what situation is it possible for a multiselect or checkbox field to store serializable data for its individual options? Certainly it's not possible in BP itself, and I wonder if it's even possible to do it now with a custom field type. Likewise: if a field does *not* support multiple values, what is the value in running it throughmaybe_unserialize()
? Have you seen examples of custom field types that are storing structured data in the profile_data table in this way? (Related: we will need to perform a review to ensure thatmaybe_unserialize()
calls are always symmetrical; for security reasons, we should never run any user-provided data throughmaybe_unserialize()
that didn't go throughmaybe_serialize()
on the way in.) - You've added some missing filter documentation in
BP_XProfile_Data::save()
. It's unrelated to the ticket at hand, so I'd go ahead and commit it separately. No need to wait for this ticket. - When saving multiple values in
BP_XProfile_Data::save()
, you call thedelete()
method. This will trigger thexprofile_data_before_delete
hook, which feels incorrect to me. As much as I like the idea of not duplicating code, I think we should make a direct DB query here (or break the existingDELETE
query into its own hookless method, which would be called from both places). - You suggested earlier to "use the last row's item id" for storing xprofile meta, but you're returning the *first* row's data ID from the
save()
method. There's no technical issue here, especially since your patch doesn't actually use xprofile data meta at all (nor does BP itself? though it would perhaps be nice to move 'visibility' here; see #6350). It might be wise to standardize here. The docs for thesave()
method should probably indicate this fact about return values as well.
Since multiple values previously were stored and returned as serialized arrays, should we now return non-serialized row data also as a (re)serialized array - to keep backpat in order? Then data would be stored in the new fashion, but returned as is done previously. Any ideas about that?
Ugh - I assume here you mean the data
property of BP_XProfile_ProfileData
objects, because xprofile_get_field_data()
and other places do the necessary unserialization. I guess the answer is yes, we should reserialize so as to maintain backward compatibility with this specific field. But it suggests to me that perhaps we could set the data
property with the serialized value, but then stop using the property internally. Instead, introduce something like a get_data()
method, which would return structured data, and then use get_data()
throughout BP. What do you think of an idea like that?
#19
@
7 years ago
@boonebgorges Thanks for taking a look at this. I'm hoping we can work this into a solid solution for the issue of serialized data.
In reply to your random feedback, the following.
- I find it plausible that a custom profile field would store its data as a serialized array. One of such examples would be an address field that comes with a preformatted layout and stores data for street, city, postal code, etcetera. I'm unsure how such a field would store its data otherwise within the current table structure of BuddyPress - or is this a potential case for field data metdata? A second example would be a column type field which allows admins to define their own subfields. I'm basically talking about nested fields. Following from this, one can imagine a multi-type field which supports listing multiple nested fields of the aforementioned types. For this case I'm mainly thinking about a variation on my
BP XProfile Relationship Field
plugin, which would enable you to make a listed version of a single-type field. Whether it is plausible enough to support this in core is up for debate. I might be biased. - Handled in r11866.
- Agreed. I'm leaning towards making a hookless method for the delete logic.
- That should be fixed. Do you prefer using the first value's key or the second? I haven't applied field data metadata before, but we should consider it since it is technically supported.
Concerning your last point, I'm not quite sure I understand your meaning of an optional get_data()
method. Would it serve as a default data getter where the data would always be unserialized/sanitized if applicable - so we can remove the unserializing from any logic outside of the profiledata class?
#20
@
7 years ago
I've attached a new patch, with the following changes:
- Updated after [11866].
- Adds a unit test for the behavior of caching multi-value fields in
get_data_for_user()
- the test helped me understand what you did, so let's throw it in there - Adds a fix for a bug that came up during this test: I've removed the
$this->exists()
check from theif
block in thesave()
method (see line 260 of 6789.06.patch). When setting the data viaxprofile_set_field_data()
, the multi-value logic in that block was not kicking in because the generated user in the test didn't yet have the data set (ie,$this->exists()
failed). It seems to me that the only important part of theexists
check is to delete the existing value, so I moved it down a couple of lines. This could use a review from you, though, as you may understand all these if/else statements better than I do.
Some responses to your comments
I find it plausible that a custom profile field would store its data as a serialized array.
Gotcha. I think this seems fine, especially if you yourself have done something close to this. I just wanted to hear someone else confirm that we weren't overengineering :)
Do you prefer using the first value's key or the second?
Standardizing on the first seems to make sense to me, though it's mostly arbitrary. One very small argument: if you ever did a $wpdb->get_row( "SELECT * FROM ..." )
query to fetch this data, you'd get the row with the lowest id
.
Concerning your last point, I'm not quite sure I understand your meaning of an optional get_data() method.
The way I see it, it was a mistake that serialized data was ever available. I understand why serialization was/is important for *storage*, but that should've been totally invisible to external functions and/or other classes. However, as a matter of historical fact, the $value
property (I said $data
above but I meant $value
, sorry!!) might contain serialized data. So I think we need to keep this behavior for backward compatibility. At the same time, this would be a good time for us to compartmentalize serialization in our standard treatment. So, something like this:
// In `BP_XProfile_Data::populate()`:
...
$raw_value = is_array( $profiledata->value ) ? array_map( 'stripslashes', $profiledata->value ) : stripslashes( $profiledata->value );
$this->raw_value = $raw_value;
$this->value = maybe_serialize( $raw_value );
...
// A new method in `BP_XProfile_Data`:
public function get_value() {
return $this->raw_value;
}
Then, anywhere in BP, we would no longer reference the $value
property directly, only get_value()
. (Probably something parallel for set_value()
?) But, now that I'm looking through the codebase, I don't actually see anywhere where we ever internally reference the $value
property of a BP_XProfile_ProfileData
object; pretty much anywhere where we fetch user profile data, it's done through a direct SQL query that is never actually turned into a BP_XProfile_ProfileData
object.
So, maybe there's no point in doing what I've suggested here, at least not until we come to a point where we rethink some of the internals. If you agree, you can ignore my comment :-D
Next steps
To move forward, I think the next step is to think about data migration. We don't currently have a framework for migration; see #6841. Setting aside the details of that framework for a moment, we should think about whether we need some sort of internal backward compatibility layer for unmigrated data as part of an initial release. Say this change is part of v3.1. Sites running v3.0 will have serialized data for multi-value fields. 3.1 will include a migration routine that'll move the data to the new non-serialized format. But what if I upgrade to 3.1 but have not yet run the 3.1 migration routine (either because it's asynchronous, or because it requires manual intervention, or both)? Queries like the one in get_data_for_user()
will fail for multi-value fields - they'll return a value like array( '{a:some serialized array}' )
.
What can we do to mitigate this? It seems to me that we can't have fallback support for non-migrated, serialized data, because - as you note in your comment above - multi-value fields might have serialized data for *one* of the multi-values (so is_serialized()
will not be a reliable check for non-migrated data). It might be that there's nothing we can do about it, and we'll just need to inform admins that such fields may not work correctly until the migration has been run - but it's something we should think about a bit.
(Sorry for the long comment - trying to be thorough!!)
First, <3. It would be great to do away with this kind of serialized data.
Second, some initial thoughts.
wp_bp_xprofile_data
.xprofile_data
. That is, a given user may have more than one row inxprofile_data
corresponding to a given field.BP_XProfile_ProfileData::populate()
, which does$wpdb->get_row()
. In this case we'll probably need to doget_results()
and then figure out how far up the stack we need to add support for arrays. Another example isBP_XProfile_ProfileData::get_data_for_user()
, which keys data by field_id. Here, we'll either need to do the array conversion right in the query method (so the data structure can remain the same in, eg,BP_XProfile_Group::get()
, or we'll have to change the way data is keyed and then make necessary modifications to consuming methods.xprofile_data
table, especially those that expect serialized data. If we find that lots of folks are doing this, we may decide to provide legacy support for it. We did something similar with userlast_activity
data (you can continue to access/modify the data in usermeta, though BP throws a bunch of_doing_it_wrong()
notices). Ideally, we would simply drop support for this kind of direct access; but this will take some good documentation and publicity.Would love to hear more thoughts on any of the above, or anything I am overlooking :)