Opened 2 years ago

Last modified 9 months ago

#6841 new enhancement

Framework for bulk data handling after updates

Reported by: boonebgorges
Owned by:
Milestone: Future Release
Priority: high
Severity: normal
Version:
Component: New User Experience
Keywords:
Cc: imath, espellcaste@…, stephen@…

Description

We have had in the past, and will have in the future, cases where large amounts of data need to be processed after a major BP update. The case I'm currently working on is #6413 - profile field visibility - but it's come up before: moving user last_activity to the activity table, migrating signups to wp_signups, etc. On large (or even medium) installations, handling huge amounts of data in a single pageload can easily exhaust system resources. We need a better way.

On #6413, I suggested one possibility: a background batch processor, based on wp-cron. This is good because it requires no UI, and uses existing tools. It's bad for a bunch of reasons. wp-cron is buggy and prone to race conditions. wp-cron jobs can't run frequently enough for our purposes, leaving us in a state of half-migration for long periods of time. wp-cron is not reliably usable on all installations. Etc.
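As a rough, language-neutral illustration of the batch idea (plain Python, not WordPress code): each scheduled run handles one bounded batch and saves a cursor, so the next run resumes where the last one stopped. The names `run_batch`, `state`, and `BATCH_SIZE` are hypothetical, and a dict stands in for whatever persistent storage a real implementation would use.

```python
BATCH_SIZE = 3  # hypothetical; a real value would be tuned per installation


def run_batch(rows, state, migrate_row, batch_size=BATCH_SIZE):
    """Process one bounded batch and advance the saved cursor.

    `state` stands in for persisted storage. Returns True when no rows remain.
    """
    start = state.get("cursor", 0)
    batch = rows[start:start + batch_size]
    for row in batch:
        migrate_row(row)
    state["cursor"] = start + len(batch)
    return state["cursor"] >= len(rows)


rows = list(range(8))
migrated = []
state = {}
runs = 0
done = False
while not done:  # each iteration stands in for one scheduled run
    done = run_batch(rows, state, migrated.append)
    runs += 1
```

Between runs the data is half-migrated, which is the state the paragraph above warns about.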

Another possibility is a loopback batch processor. It could also work in the background, but would directly fire off an asynchronous request (wp_remote_post() or whatever) instead of waiting for the next cron run. So: BP update is pageload A. A fires off a wp_remote_post(), triggering pageload B. B runs a batch process, and when it's finished, fires wp_remote_post() to C. And so forth, until the batches are completed. One problem here is that self-requests can sometimes be blocked by webserver configuration. Another is that an out-of-control loop, due to faulty data or a bug in the implementation, could set off an infinite series of requests that would be difficult for the average BP user to debug or stop.
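One way to picture the A → B → C chain, and a possible guard against the runaway loop, is the Python sketch below. It makes no HTTP requests: a queue stands in for the loopback call, and `MAX_HOPS`, `handle_request`, and the `job` dict are all hypothetical names invented for illustration.

```python
MAX_HOPS = 1000  # hypothetical safety cap on chained requests


def handle_request(job, send_next):
    """One 'pageload': run a batch, then request the next pageload if needed."""
    if job["hops"] >= MAX_HOPS or job.get("stopped"):
        job["aborted"] = True  # refuse to continue the chain
        return
    job["hops"] += 1
    job["remaining"] = max(0, job["remaining"] - job["batch_size"])
    if job["remaining"] > 0:
        send_next(job)


def simulate(job):
    """Stand-in for the loopback: queue the next request instead of sending it."""
    queue = [job]
    while queue:
        handle_request(queue.pop(0), queue.append)
    return job


healthy = simulate({"hops": 0, "remaining": 10, "batch_size": 4})
# A faulty batch size of 0 never makes progress; the cap ends the chain.
faulty = simulate({"hops": 0, "remaining": 5, "batch_size": 0})
```

A cap like this only bounds the damage. It does not help an administrator work out why the loop ran away in the first place.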

The last option is to use an AJAX-powered interface. This avoids a lot of the complications described above. It'll work on all servers, and a bad migration can be stopped by closing the browser window. But (a) it requires JS, and (b) it requires user action. Item (a) means that we have to build no-JS compatibility of at least a rudimentary kind (I think, though maybe I could be convinced otherwise). Item (b) means that the admin may decide not to run the migrator right away - or at all. And this has ramifications for the way we build data migrations: the post-migration schema must always be able to fall back on the pre-migration schema, since the migration may not be finished for days or weeks or months. (In fact, this is probably true no matter what route we take.)

Does anyone have strong opinions on this? If we're going to go with an AJAX interface, let's have a discussion about the UX. How do we nag the administrator to run the updater after BP update? Do we need a new admin panel? Etc.

Change History (14)

#1 @sbrajesh
2 years ago

Boone, Those are some really good points.
In my opinion, the AJAX-based solution is best suited.

  1. The tool is aimed at site admins, not end users, so there is no obligation to provide a no-JS alternative. A no-JS alternative is nice to have, but these days it is difficult to find any site that does not use JS. In our case especially (community building), I don't think a no-JS alternative is even required. It is not going to improve the user experience, so why worry about it?

As for implementation, we already have a Tools menu in BuddyPress, so this functionality could live in that section. As for nagging, how about something similar to what we currently do for the component page association?

#2 @johnjamesjacoby
2 years ago

For live sites, I don't think any of these solutions are viable. Any active site will run into partially migrated data resulting in malformed objects, incomplete data-sets, and eventually cache pollution.

For potentially large data migrations, I'd like to propose instead that we copy existing tables into temporary tables, perform an Ajax style migration similar to bbPress 2's converter tool, check the validity of the migration, and then rename tables only once an administrator can confirm that everything looks right.
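A small sketch of that copy, migrate, verify, then rename sequence, using Python's built-in `sqlite3` so it runs on its own. The table names `items`, `items_new`, and `items_old`, and both helper functions, are hypothetical. It also ignores rows written to the original table while the copy is in progress, which a live site would have to handle.

```python
import sqlite3


def build_shadow_table(conn, transform, batch_size=2):
    """Fill `items_new` from `items` in batches; return (new_count, old_count)."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS items_new")
    cur.execute("CREATE TABLE items_new (id INTEGER PRIMARY KEY, value TEXT)")
    last_id = 0
    while True:
        batch = cur.execute(
            "SELECT id, value FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            break
        cur.executemany(
            "INSERT INTO items_new (id, value) VALUES (?, ?)",
            [(row_id, transform(value)) for row_id, value in batch],
        )
        last_id = batch[-1][0]
    conn.commit()
    new_count = cur.execute("SELECT COUNT(*) FROM items_new").fetchone()[0]
    old_count = cur.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    return new_count, old_count


def swap_tables(conn):
    """Rename only after the migration has been checked and confirmed."""
    conn.execute("ALTER TABLE items RENAME TO items_old")
    conn.execute("ALTER TABLE items_new RENAME TO items")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO items (value) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

new_count, old_count = build_shadow_table(conn, str.upper)
if new_count == old_count:  # stand-in for the administrator's confirmation
    swap_tables(conn)
```

Until `swap_tables` runs, readers only ever see the untouched original table, so they never encounter a half-migrated one.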

Smaller changes that can be handled in only a few queries can probably avoid using this tool, using either dbDelta or something one-off.

#3 @boonebgorges
2 years ago

> For live sites, I don't think any of these solutions are viable. Any active site will run into partially migrated data resulting in malformed objects, incomplete data-sets, and eventually cache pollution.

I'm not worried about cache pollution per se (at least not at the object cache, which we handle), but I too am concerned about how we're going to deal with partially migrated data on a live site.

Whatever we do, version x.y always has to be prepared for the data schema of version x.y-1 (and maybe further back), because we can't depend on the migration being immediate. johnjamesjacoby's suggestion is appealing because it means that the backward compatibility can look like this:

if ( we have done the migration ) {
    do the new stuff
} else {
    do the old stuff
}

instead of having to accommodate mixed cases.
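To make the contrast concrete, here is a hedged Python sketch of the two read paths. The `store` dict and both function names are hypothetical stand-ins. The first function is the single switch shown above. The second is the per-row fallback that a batch-by-batch, in-place migration would need.

```python
def read_all_or_nothing(user_id, store):
    """Single switch: safe only if the move to the new schema is atomic."""
    source = store["new"] if store["migration_done"] else store["old"]
    return source[user_id]


def read_mixed(user_id, store):
    """Per-row fallback: needed when rows migrate one batch at a time."""
    if user_id in store["new"]:
        return store["new"][user_id]
    return store["old"][user_id]


# A half-finished migration: user 1 has been moved, user 2 has not.
store = {
    "old": {1: "public", 2: "friends"},
    "new": {1: "adminsonly"},
    "migration_done": False,
}
```

With the single switch, every caller sees one consistent schema. With the mixed path, every reader has to know about both schemas.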

> For potentially large data migrations, I'd like to propose instead that we copy existing tables into temporary tables, perform an Ajax style migration similar to bbPress 2's converter tool, check the validity of the migration, and then rename tables only once an administrator can confirm that everything looks right.

I think this is a good idea, though I'm unsure what the "confirm that everything looks right" part would look like. If we have automated ways of checking data integrity, and the migration passes muster according to these tools, then in what situation would the admin want to reject or postpone the migration? Seems unlikely that you would, say, examine every database row by hand to make sure they are ok?

#4 @imath
2 years ago

+1 to the "last option" : AJAX-powered interface

> (b) it requires user action

I'd say not necessarily. We display the Welcome screen without any user action; we could use the same technique to open the Upgrade page, asking the Admin to wait :)

It appears I'll need to do some upgrading in the next version of BuddyDrive, so I've been "javascripting" a bit this afternoon. Here's what it looks like: https://cldup.com/mxxigK1aOo.mov
I've chosen to add a notice but, as I said just above, we could redirect the user to the page. Moreover, you'll see that the upgrade starts as soon as the page is loaded.

While working on it, I realized we may need to run more than one upgrade task when BuddyPress is updated, because:

  • we may change various things from one version to another (eg #6413 #6482)
  • the user may upgrade from let's say 2.5 to 2.8. In this case, upgrade tasks for 2.6, 2.7 and 2.8 should probably be processed.
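A possible way to select the tasks for such a multi-version jump, sketched in Python. The task names and the `(version, name)` registration format are made up for illustration. Versions are compared as integer tuples so that, for example, 2.10 sorts after 2.9.

```python
def parse_version(version):
    """Turn '2.10' into (2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pending_tasks(tasks, installed, target):
    """Return task names for versions in (installed, target], oldest first."""
    low, high = parse_version(installed), parse_version(target)
    selected = [
        (parse_version(version), name)
        for version, name in tasks
        if low < parse_version(version) <= high
    ]
    # The sort is stable, so tasks for the same version keep registration order.
    selected.sort(key=lambda item: item[0])
    return [name for _, name in selected]


# Hypothetical task list; several tasks may belong to one version.
tasks = [
    ("2.8", "task_c"),
    ("2.6", "task_a"),
    ("2.7", "task_b"),
    ("2.5", "already_done"),
    ("2.6", "task_a2"),
]
to_run = pending_tasks(tasks, installed="2.5", target="2.8")
```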

@sbrajesh's comment made me think we could build an "extendable" interface that could be used by:

  • BuddyPress plugins, for their own needs,
  • Repair tools,
  • Any other batch process, like sending emails for instance.
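Such an extendable interface might look roughly like the registry below, sketched in Python. `BatchRegistry` and its two callbacks are hypothetical and not an existing BuddyPress API. Each task supplies a function that counts its items and a function that processes one slice of them.

```python
class BatchRegistry:
    """Hypothetical registry that plugins and tools add batch tasks to."""

    def __init__(self):
        self._tasks = {}

    def register(self, name, count_items, process_batch):
        if name in self._tasks:
            raise ValueError(f"task already registered: {name}")
        self._tasks[name] = (count_items, process_batch)

    def run(self, name, batch_size=50):
        """Run one task to completion in slices; return the items processed."""
        count_items, process_batch = self._tasks[name]
        total = count_items()
        processed = 0
        while processed < total:
            size = min(batch_size, total - processed)
            process_batch(processed, size)  # (offset, number of items)
            processed += size
        return processed


# Usage: a plugin registers an email-sending task.
recipients = [f"user{i}@example.com" for i in range(7)]
sent = []
registry = BatchRegistry()
registry.register(
    "send_emails",
    count_items=lambda: len(recipients),
    process_batch=lambda offset, size: sent.extend(recipients[offset:offset + size]),
)
processed = registry.run("send_emails", batch_size=3)
```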

#5 @johnjamesjacoby
2 years ago

It may be too much work for a v1, but when the migration is complete, I'm imagining some type of visual confirmation that can confidently say "5,836 rows out of 5,836 were successfully migrated." I doubt that data integrity checks are doable, unless we take random samples or something, but it was a big deal with the bbPress converter to be able to say: "your old forums have 893,837 posts, and we successfully moved everything over; go look at it, it's pretty cool."

We also might be able to do highly optimized queries that can move the data around without needing to loop through every single row. It just depends on the nature of the migration.
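For instance, a single set-based `INSERT ... SELECT` can move rows without a row-by-row loop in application code. The Python `sqlite3` sketch below uses simplified, hypothetical `usermeta` and `activity` tables that only loosely echo the last_activity move mentioned in the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE usermeta (user_id INTEGER, meta_key TEXT, meta_value TEXT);
    CREATE TABLE activity (user_id INTEGER, type TEXT, date_recorded TEXT);
    INSERT INTO usermeta VALUES
        (1, 'last_activity', '2015-01-01 10:00:00'),
        (2, 'last_activity', '2015-02-01 11:00:00'),
        (2, 'nickname', 'two');
    """
)

# One statement copies every matching row; no per-row loop is needed.
conn.execute(
    """
    INSERT INTO activity (user_id, type, date_recorded)
    SELECT user_id, 'last_activity', meta_value
    FROM usermeta
    WHERE meta_key = 'last_activity'
    """
)
conn.commit()

moved = conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0]
expected = conn.execute(
    "SELECT COUNT(*) FROM usermeta WHERE meta_key = 'last_activity'"
).fetchone()[0]
```

Comparing `moved` with `expected` gives the kind of "N rows out of N" confirmation described above.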

Then there's the "why not use restful end-points instead" argument. Honestly, I'm much more comfortable handling all of this internally and as close to the metal as we can.

The bbPress converter has had hundreds of hours put into making it stable and scalable, so if we can repurpose its heart for BuddyPress, that would be pretty cool.

#6 @espellcaste
2 years ago

  • Cc espellcaste@… added

#7 @netweb
2 years ago

I'll add an entirely useless comment so I can add myself to the CC list ¯\_(ツ)_/¯

#8 @netweb
2 years ago

  • Cc stephen@… added

This is weird/broken/whatever trying to get yourself CC'd, added a fresh comment to #meta291 to get the same core notifications things up and running for the bb's

This ticket was mentioned in Slack in #buddypress by imath.


2 years ago

#10 @boonebgorges
2 years ago

  • Milestone changed from Awaiting Review to 2.6
  • Priority changed from normal to high

Our lack of a proper upgrader is holding back a couple of important tickets. Let's make this a priority in 2.6.

This ticket was mentioned in Slack in #buddypress by imath.


22 months ago

This ticket was mentioned in Slack in #buddypress by mamaduka.


22 months ago

#13 @DJPaul
20 months ago

  • Milestone changed from 2.6 to Future Release

This ticket was mentioned in Slack in #buddypress by mamaduka.


9 months ago
