Message ID | 20240111120505.4135257-1-yukuai1@huaweicloud.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | mdadm: stop using 'idle' for sysfs api "sync_action" to wake up sync_thread | expand |
On Thu, 11 Jan 2024 20:05:05 +0800 Yu Kuai <yukuai1@huaweicloud.com> wrote: > From: Yu Kuai <yukuai3@huawei.com> > > Echo 'idle' to "sync_action" is supposed to stop sync_thread while new > sync_thread can still start. However, currently this behaviour is not > correct, echo 'idle' will actually try to stop sync_thread and then > start a new sync_thread. And mdadm relies on this wrong behaviour in > some places. > > In kernel, if resync is not done yet, then recovery/reshape/check/repair > can't not start in the first place, and if resync is done, echo 'resync' > behaves the same as echo 'idle' for now. Hi Kuai, From the last part I understand that in case of resync/reshape frozen thread is unblocked, not restarted. I miss some explanation about that here. So far I understand is: "Setting "resync" or "reshape" allow to continue frozen sync_thread instead restarting it. Setting "resync" if resync is done, has same effect as "idle" so it is safe." Please describe setting "reshape", I can see that you use it in one place, I think that with reshape we need to be more careful but you are the expert here, maybe it is same as "resync"? > > Hence replace echo 'idle' with echo 'resync/reshape' when trying to > continue frozed sync_thread. There should be no functional changes and > prevent regressions after fixing that echo 'idle' will start new > sync_thread in kernel. Ok, so this is kind of preparing for kernel fix. Got it. > > Signed-off-by: Yu Kuai <yukuai3@huawei.com> > --- I think that I understand purpose of the change. You are trying to avoid thread restarting if not needed and remove reference to incorrect "idle" usage of mdadm. Unfortunately, the changes you need to make have strong reference to kernel implementation. It requires to well describe them because blame is volatile. I would like to propose separate enum to not rely on kernel states naming, some proposals: /* So far I understand write "resync" for both cases */ SYNC_ACTION_RESYNC_START SYNC_ACTION_RESYNC_CONTINUE /* So far I understand write "reshape" for both cases * SYNC_ACTION_RESHAPE_START SYNC_ACTION_RESHAPE_CONTINUE /* Highlight known bug in comment and use "resync"? /* SYNC_ACTION_IDLE /* If needed? */ SYNC_ACTION_ABORT It needs to be handled by proper function which will have comments describing what is written to kernel and why. In userspace, I need more user/reader friendly code. I want to know what we exactly requested from kernel. In some cases we would expect to restart thread is some other cases just to continue frozen one. I would like to know what was a purpose of request in the particular case even if now the same action is used behind. Let me know what you think. Thanks, Mariusz
Hi, 在 2024/01/12 19:44, Mariusz Tkaczyk 写道: > On Thu, 11 Jan 2024 20:05:05 +0800 > Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> From: Yu Kuai <yukuai3@huawei.com> >> >> Echo 'idle' to "sync_action" is supposed to stop sync_thread while new >> sync_thread can still start. However, currently this behaviour is not >> correct, echo 'idle' will actually try to stop sync_thread and then >> start a new sync_thread. And mdadm relies on this wrong behaviour in >> some places. >> >> In kernel, if resync is not done yet, then recovery/reshape/check/repair >> can't not start in the first place, and if resync is done, echo 'resync' >> behaves the same as echo 'idle' for now. > > Hi Kuai, >>From the last part I understand that in case of resync/reshape frozen thread is > unblocked, not restarted. > > I miss some explanation about that here. So far I understand is: > > "Setting "resync" or "reshape" allow to continue frozen sync_thread instead > restarting it. Setting "resync" if resync is done, has same effect as "idle" so > it is safe." > > Please describe setting "reshape", I can see that you use it in one place, I > think that with reshape we need to be more careful but you are the expert here, > maybe it is same as "resync"? The only place to use "reshape" is that reshape is frozed before, and echo "reshape" can continue the interrupted reshape. Of coures, echo "idle/resync" can also continue the interrupted reshape. > >> >> Hence replace echo 'idle' with echo 'resync/reshape' when trying to >> continue frozed sync_thread. There should be no functional changes and >> prevent regressions after fixing that echo 'idle' will start new >> sync_thread in kernel. > > Ok, so this is kind of preparing for kernel fix. Got it. >> >> Signed-off-by: Yu Kuai <yukuai3@huawei.com> >> --- > > I think that I understand purpose of the change. You are trying to avoid thread > restarting if not needed and remove reference to incorrect "idle" usage of > mdadm. > Unfortunately, the changes you need to make have strong reference to kernel > implementation. It requires to well describe them because blame is volatile. > > I would like to propose separate enum to not rely on kernel states naming, some > proposals: > > /* So far I understand write "resync" for both cases */ > SYNC_ACTION_RESYNC_START > SYNC_ACTION_RESYNC_CONTINUE > > /* So far I understand write "reshape" for both cases * > SYNC_ACTION_RESHAPE_START > SYNC_ACTION_RESHAPE_CONTINUE > > /* Highlight known bug in comment and use "resync"? /* > SYNC_ACTION_IDLE > /* If needed? */ > SYNC_ACTION_ABORT The enum sounds good, and I thought about enum in kerenl as well, currently resync/recovery/reshape/... status is determined by combination of flags, which is hard to follow. > > It needs to be handled by proper function which will have comments describing > what is written to kernel and why. In userspace, I need more user/reader > friendly code. > I want to know what we exactly requested from kernel. In some cases we would > expect to restart thread is some other cases just to continue frozen one. I > would like to know what was a purpose of request in the particular case even if > now the same action is used behind. Sounds like a good plan, and looks like the first thing to do is to sort out all the places to use sysfs api 'sync_action' in mdadm, then we'll know to define the new helper(I just grep the case 'idle' for now). Thanks, Kuai > > Let me know what you think. > > Thanks, > Mariusz > > . >
diff --git a/Grow.c b/Grow.c index 8fa97875..05498c6f 100644 --- a/Grow.c +++ b/Grow.c @@ -843,7 +843,7 @@ static void unfreeze(struct supertype *st) if (sra && sysfs_get_str(sra, NULL, "sync_action", buf, 20) > 0 && strcmp(buf, "frozen\n") == 0) - sysfs_set_str(sra, NULL, "sync_action", "idle"); + sysfs_set_str(sra, NULL, "sync_action", "reshape"); sysfs_free(sra); } } diff --git a/Manage.c b/Manage.c index 91532266..91483112 100644 --- a/Manage.c +++ b/Manage.c @@ -344,7 +344,7 @@ int Manage_stop(char *devname, int fd, int verbose, int will_retry) backwards = 1; if (sysfs_get_ll(mdi, NULL, "reshape_position", &position) != 0) { /* reshape must have finished now */ - sysfs_set_str(mdi, NULL, "sync_action", "idle"); + sysfs_set_str(mdi, NULL, "sync_action", "resync"); goto done; } sysfs_get_two(mdi, NULL, "chunk_size", &chunk1, &chunk2); @@ -386,7 +386,7 @@ int Manage_stop(char *devname, int fd, int verbose, int will_retry) * the reshape monitor */ if (sync_max < old_sync_max) sysfs_set_num(mdi, NULL, "sync_max", sync_max); - sysfs_set_str(mdi, NULL, "sync_action", "idle"); + sysfs_set_str(mdi, NULL, "sync_action", "resync"); /* That should have set things going again. Now we * wait a little while (3 second max) for sync_completed @@ -1717,7 +1717,7 @@ int Manage_subdevs(char *devname, int fd, } free(tst); if (frozen > 0) - sysfs_set_str(&info, NULL, "sync_action","idle"); + sysfs_set_str(&info, NULL, "sync_action", "resync"); if (test && count == 0) return 2; return 0; @@ -1725,7 +1725,7 @@ int Manage_subdevs(char *devname, int fd, abort: free(tst); if (frozen > 0) - sysfs_set_str(&info, NULL, "sync_action","idle"); + sysfs_set_str(&info, NULL, "sync_action", "resync"); return !test && busy ? 2 : 1; } diff --git a/msg.c b/msg.c index 45cd4504..d9f08ebf 100644 --- a/msg.c +++ b/msg.c @@ -252,7 +252,7 @@ int unblock_subarray(struct mdinfo *sra, const int unfreeze) sysfs_set_str(sra, NULL, "metadata_version", buf) || (unfreeze && sysfs_attribute_available(sra, NULL, "sync_action") && - sysfs_set_str(sra, NULL, "sync_action", "idle"))) + sysfs_set_str(sra, NULL, "sync_action", "resync"))) rc = -1; return rc; }