Message ID | 20250306094924.1353269-2-zlang@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | isolate privatens way | expand |
On Thu, Mar 06, 2025 at 05:49:23PM +0800, Zorro Lang wrote: > Currently we have 3 ways to run a test case in _run_seq(): > > if [ -n "${HAVE_PRIVATENS}" ]; then > ./tools/run_privatens "./$seq" > ... > elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then > systemd-run --quiet --unit "${unit}" --scope \ > ./tools/run_setsid "./$seq" & > ... > else > ./tools/run_setsid "./$seq" & > ... > fi > > The "privatens" way brings in some regressions. We need more time > to develop and test this way, it's not time let it to be the > first default choice, so isolate the HAVE_PRIVATENS initialization > by a TRY_PRIVATENS parameter, and disable it by default. > > Set TRY_PRIVATENS=yes to give "privatens" a try, otherwise run in > old ways. This patch can be removed after "privatens" way is stable. > > Signed-off-by: Zorro Lang <zlang@kernel.org> > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> > --- > check | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/check b/check > index ea92b0d62..cb2f19d08 100755 > --- a/check > +++ b/check > @@ -674,10 +674,13 @@ _stash_test_status() { > esac > } > > -# Can we run in a private pid/mount namespace? > -HAVE_PRIVATENS= > -./tools/run_privatens bash -c "exit 77" > -test $? -eq 77 && HAVE_PRIVATENS=yes > +# Don't try "privatens" by default, it's experimental for now. > +if [ "$TRY_PRIVATENS" = "yes" ];then > + # Can we run in a private pid/mount namespace? > + HAVE_PRIVATENS= > + ./tools/run_privatens bash -c "exit 77" > + test $? -eq 77 && HAVE_PRIVATENS=yes > +fi Works for me - I have basically the same patch in my check-parallel stack because this breaks check-parallel mount namespacing for reasons I don't yet understand. Creating a new mount namespace for each test that is run appears to turn the private mount namespace that each check instance is executed in back into a globally shared mount namespace inside each individual test mount namespace. i.e. when check uses private namespaces I can see all the mounts from inside each test namespace from the init namespace and every test can see every mount that every other test runs again. That is the problem I originally used mount namespaces in check-parallel to avoid. I have no idea if this is how mount namespace nesting is actually supposed to work (seems completely broken to me!), but the only solution I've found that works so far is to turn off HAVE_PRIVATENS in check as it is redundant when it is run from check-parallel. FWIW, given that check-parallel now runs in it's own private PID namespace with it's own /proc and /tmp, the original problem of needing to "confine pkill to only the child processes of this test instance" has gone away entirely. i.e. check-parallel does not need check to to restrict pkill to children of the current test being run anymore. Hence I think we can probably remove the new process isolation shenanigans and revert the _pkill wrappers to plain pkill calls again as the private PID namespacing the check-parallel does means pkill does the right thing for both check and check-parallel now. -Dave.
On Thu, Mar 06, 2025 at 10:05:13PM +1100, Dave Chinner wrote: > On Thu, Mar 06, 2025 at 05:49:23PM +0800, Zorro Lang wrote: > > Currently we have 3 ways to run a test case in _run_seq(): > > > > if [ -n "${HAVE_PRIVATENS}" ]; then > > ./tools/run_privatens "./$seq" > > ... > > elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then > > systemd-run --quiet --unit "${unit}" --scope \ > > ./tools/run_setsid "./$seq" & > > ... > > else > > ./tools/run_setsid "./$seq" & > > ... > > fi > > > > The "privatens" way brings in some regressions. We need more time > > to develop and test this way, it's not time let it to be the > > first default choice, so isolate the HAVE_PRIVATENS initialization > > by a TRY_PRIVATENS parameter, and disable it by default. > > > > Set TRY_PRIVATENS=yes to give "privatens" a try, otherwise run in > > old ways. This patch can be removed after "privatens" way is stable. > > > > Signed-off-by: Zorro Lang <zlang@kernel.org> > > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> > > --- > > check | 11 +++++++---- > > 1 file changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/check b/check > > index ea92b0d62..cb2f19d08 100755 > > --- a/check > > +++ b/check > > @@ -674,10 +674,13 @@ _stash_test_status() { > > esac > > } > > > > -# Can we run in a private pid/mount namespace? > > -HAVE_PRIVATENS= > > -./tools/run_privatens bash -c "exit 77" > > -test $? -eq 77 && HAVE_PRIVATENS=yes > > +# Don't try "privatens" by default, it's experimental for now. > > +if [ "$TRY_PRIVATENS" = "yes" ];then > > + # Can we run in a private pid/mount namespace? > > + HAVE_PRIVATENS= > > + ./tools/run_privatens bash -c "exit 77" > > + test $? -eq 77 && HAVE_PRIVATENS=yes > > +fi > > Works for me - I have basically the same patch in my check-parallel > stack because this breaks check-parallel mount namespacing for > reasons I don't yet understand. > > Creating a new mount namespace for each test that is run appears > to turn the private mount namespace that each check instance is > executed in back into a globally shared mount namespace inside > each individual test mount namespace. > > i.e. when check uses private namespaces I can see all the mounts > from inside each test namespace from the init namespace and every > test can see every mount that every other test runs again. > > That is the problem I originally used mount namespaces in > check-parallel to avoid. > > I have no idea if this is how mount namespace nesting is actually > supposed to work (seems completely broken to me!), but the only > solution I've found that works so far is to turn off HAVE_PRIVATENS > in check as it is redundant when it is run from check-parallel. > > FWIW, given that check-parallel now runs in it's own private PID > namespace with it's own /proc and /tmp, the original problem of > needing to "confine pkill to only the child processes of this test > instance" has gone away entirely. i.e. check-parallel does not need check to > to restrict pkill to children of the current test being run anymore. > > Hence I think we can probably remove the new process isolation > shenanigans and revert the _pkill wrappers to plain pkill calls > again as the private PID namespacing the check-parallel does means > pkill does the right thing for both check and check-parallel now. Hi Dave, Thanks for your reviewing. Do you mean revert below commit directly? commit 247ab01fa227647c290cf82696f1b0bbf87a0177 Author: Darrick J. Wong <djwong@kernel.org> Date: Mon Feb 3 14:00:28 2025 -0800 check: run tests in a private pid/mount namespace I isolate the HAVE_PRIVATENS rather than reverting it, due to I'm not sure if you or Darrick still would like to improve that test way. Thanks, Zorro > > -Dave. > -- > Dave Chinner > david@fromorbit.com >
On Thu, Mar 06, 2025 at 09:47:23PM +0800, Zorro Lang wrote: > On Thu, Mar 06, 2025 at 10:05:13PM +1100, Dave Chinner wrote: > > On Thu, Mar 06, 2025 at 05:49:23PM +0800, Zorro Lang wrote: > > > Currently we have 3 ways to run a test case in _run_seq(): > > > > > > if [ -n "${HAVE_PRIVATENS}" ]; then > > > ./tools/run_privatens "./$seq" > > > ... > > > elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then > > > systemd-run --quiet --unit "${unit}" --scope \ > > > ./tools/run_setsid "./$seq" & > > > ... > > > else > > > ./tools/run_setsid "./$seq" & > > > ... > > > fi > > > > > > The "privatens" way brings in some regressions. We need more time > > > to develop and test this way, it's not time let it to be the > > > first default choice, so isolate the HAVE_PRIVATENS initialization > > > by a TRY_PRIVATENS parameter, and disable it by default. > > > > > > Set TRY_PRIVATENS=yes to give "privatens" a try, otherwise run in > > > old ways. This patch can be removed after "privatens" way is stable. > > > > > > Signed-off-by: Zorro Lang <zlang@kernel.org> > > > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> > > > --- > > > check | 11 +++++++---- > > > 1 file changed, 7 insertions(+), 4 deletions(-) > > > > > > diff --git a/check b/check > > > index ea92b0d62..cb2f19d08 100755 > > > --- a/check > > > +++ b/check > > > @@ -674,10 +674,13 @@ _stash_test_status() { > > > esac > > > } > > > > > > -# Can we run in a private pid/mount namespace? > > > -HAVE_PRIVATENS= > > > -./tools/run_privatens bash -c "exit 77" > > > -test $? -eq 77 && HAVE_PRIVATENS=yes > > > +# Don't try "privatens" by default, it's experimental for now. > > > +if [ "$TRY_PRIVATENS" = "yes" ];then > > > + # Can we run in a private pid/mount namespace? > > > + HAVE_PRIVATENS= > > > + ./tools/run_privatens bash -c "exit 77" > > > + test $? -eq 77 && HAVE_PRIVATENS=yes > > > +fi > > > > Works for me - I have basically the same patch in my check-parallel > > stack because this breaks check-parallel mount namespacing for > > reasons I don't yet understand. > > > > Creating a new mount namespace for each test that is run appears > > to turn the private mount namespace that each check instance is > > executed in back into a globally shared mount namespace inside > > each individual test mount namespace. > > > > i.e. when check uses private namespaces I can see all the mounts > > from inside each test namespace from the init namespace and every > > test can see every mount that every other test runs again. > > > > That is the problem I originally used mount namespaces in > > check-parallel to avoid. > > > > I have no idea if this is how mount namespace nesting is actually > > supposed to work (seems completely broken to me!), but the only > > solution I've found that works so far is to turn off HAVE_PRIVATENS > > in check as it is redundant when it is run from check-parallel. > > > > FWIW, given that check-parallel now runs in it's own private PID > > namespace with it's own /proc and /tmp, the original problem of > > needing to "confine pkill to only the child processes of this test > > instance" has gone away entirely. i.e. check-parallel does not need check to > > to restrict pkill to children of the current test being run anymore. > > > > Hence I think we can probably remove the new process isolation > > shenanigans and revert the _pkill wrappers to plain pkill calls > > again as the private PID namespacing the check-parallel does means > > pkill does the right thing for both check and check-parallel now. > > Hi Dave, > > Thanks for your reviewing. Do you mean revert below commit directly? > > commit 247ab01fa227647c290cf82696f1b0bbf87a0177 > Author: Darrick J. Wong <djwong@kernel.org> > Date: Mon Feb 3 14:00:28 2025 -0800 > > check: run tests in a private pid/mount namespace > > I isolate the HAVE_PRIVATENS rather than reverting it, due to I'm not > sure if you or Darrick still would like to improve that test way. I'm talking about removing all the setsid and namspace bits that we added to try to solve the need for pkill --parent. i.e. this series of patches: 949bdf8ea check: deprecate using process sessions to isolate test instances 247ab01fa check: run tests in a private pid/mount namespace 88d60f434 common: fix pkill by running test program in a separate session c71349150 common/rc: hoist pkill to a helper function 77548e606 common/rc: create a wrapper for the su command Using private PID namespaces in check-parallel has solved the original problem that required the pkill --parent usage and the fsstress binary rename/copy workaround by wrapping process isolation outside of each check instance. Hence check and the internal check infrastructure doesn't need to ever care about process isolation to solve this check-parallel problem anymore. i.e. the isolation fix should have been made to check-parallel, not to check. It is not ideal to have to walk all this stuff back, but this is what happens when an RFC-scope feature prototype is merged without allowing adequate time for review or give the author a chance to address at least one round of review comments and bug fixes before it gets merged.... -Dave.
On Fri, Mar 07, 2025 at 07:56:28AM +1100, Dave Chinner wrote: > On Thu, Mar 06, 2025 at 09:47:23PM +0800, Zorro Lang wrote: > > On Thu, Mar 06, 2025 at 10:05:13PM +1100, Dave Chinner wrote: > > > On Thu, Mar 06, 2025 at 05:49:23PM +0800, Zorro Lang wrote: > > > > Currently we have 3 ways to run a test case in _run_seq(): > > > > > > > > if [ -n "${HAVE_PRIVATENS}" ]; then > > > > ./tools/run_privatens "./$seq" > > > > ... > > > > elif [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then > > > > systemd-run --quiet --unit "${unit}" --scope \ > > > > ./tools/run_setsid "./$seq" & > > > > ... > > > > else > > > > ./tools/run_setsid "./$seq" & > > > > ... > > > > fi > > > > > > > > The "privatens" way brings in some regressions. We need more time > > > > to develop and test this way, it's not time let it to be the > > > > first default choice, so isolate the HAVE_PRIVATENS initialization > > > > by a TRY_PRIVATENS parameter, and disable it by default. > > > > > > > > Set TRY_PRIVATENS=yes to give "privatens" a try, otherwise run in > > > > old ways. This patch can be removed after "privatens" way is stable. > > > > > > > > Signed-off-by: Zorro Lang <zlang@kernel.org> > > > > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> > > > > --- > > > > check | 11 +++++++---- > > > > 1 file changed, 7 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/check b/check > > > > index ea92b0d62..cb2f19d08 100755 > > > > --- a/check > > > > +++ b/check > > > > @@ -674,10 +674,13 @@ _stash_test_status() { > > > > esac > > > > } > > > > > > > > -# Can we run in a private pid/mount namespace? > > > > -HAVE_PRIVATENS= > > > > -./tools/run_privatens bash -c "exit 77" > > > > -test $? -eq 77 && HAVE_PRIVATENS=yes > > > > +# Don't try "privatens" by default, it's experimental for now. > > > > +if [ "$TRY_PRIVATENS" = "yes" ];then > > > > + # Can we run in a private pid/mount namespace? > > > > + HAVE_PRIVATENS= > > > > + ./tools/run_privatens bash -c "exit 77" > > > > + test $? -eq 77 && HAVE_PRIVATENS=yes > > > > +fi > > > > > > Works for me - I have basically the same patch in my check-parallel > > > stack because this breaks check-parallel mount namespacing for > > > reasons I don't yet understand. > > > > > > Creating a new mount namespace for each test that is run appears > > > to turn the private mount namespace that each check instance is > > > executed in back into a globally shared mount namespace inside > > > each individual test mount namespace. > > > > > > i.e. when check uses private namespaces I can see all the mounts > > > from inside each test namespace from the init namespace and every > > > test can see every mount that every other test runs again. > > > > > > That is the problem I originally used mount namespaces in > > > check-parallel to avoid. > > > > > > I have no idea if this is how mount namespace nesting is actually > > > supposed to work (seems completely broken to me!), but the only > > > solution I've found that works so far is to turn off HAVE_PRIVATENS > > > in check as it is redundant when it is run from check-parallel. > > > > > > FWIW, given that check-parallel now runs in it's own private PID > > > namespace with it's own /proc and /tmp, the original problem of > > > needing to "confine pkill to only the child processes of this test > > > instance" has gone away entirely. i.e. check-parallel does not need check to > > > to restrict pkill to children of the current test being run anymore. > > > > > > Hence I think we can probably remove the new process isolation > > > shenanigans and revert the _pkill wrappers to plain pkill calls > > > again as the private PID namespacing the check-parallel does means > > > pkill does the right thing for both check and check-parallel now. > > > > Hi Dave, > > > > Thanks for your reviewing. Do you mean revert below commit directly? > > > > commit 247ab01fa227647c290cf82696f1b0bbf87a0177 > > Author: Darrick J. Wong <djwong@kernel.org> > > Date: Mon Feb 3 14:00:28 2025 -0800 > > > > check: run tests in a private pid/mount namespace > > > > I isolate the HAVE_PRIVATENS rather than reverting it, due to I'm not > > sure if you or Darrick still would like to improve that test way. > > I'm talking about removing all the setsid and namspace bits that we > added to try to solve the need for pkill --parent. i.e. this series > of patches: > > 949bdf8ea check: deprecate using process sessions to isolate test instances > 247ab01fa check: run tests in a private pid/mount namespace > 88d60f434 common: fix pkill by running test program in a separate session > c71349150 common/rc: hoist pkill to a helper function > 77548e606 common/rc: create a wrapper for the su command > > Using private PID namespaces in check-parallel has solved the > original problem that required the pkill --parent usage and the > fsstress binary rename/copy workaround by wrapping process isolation > outside of each check instance. Hence check and the internal > check infrastructure doesn't need to ever care about process > isolation to solve this check-parallel problem anymore. If "check" doesn't need these namespace isolation things, that will be good to me. That'll help to isolate the issues on check or on check-parallel. And we won't bring in regressions to check when try to fix check-parallel. Now let me merge this patch at first, to avoid the regressions on btrfs. Then please feel free to send patches to remove/reduce the coupling of two testing ways (check and check-parallel:) > > i.e. the isolation fix should have been made to check-parallel, > not to check. > > It is not ideal to have to walk all this stuff back, but this is > what happens when an RFC-scope feature prototype is merged without > allowing adequate time for review or give the author a chance to > address at least one round of review comments and bug fixes before > it gets merged.... Sorry, I've learned from this. I thought I've given it lots of test, and fixed all bug reports at that time. And I asked several times if there's any objection about merging it in next release. As there's not objection from anyone in 2 weeks, so I decided to give it a chance, to save the super complicated rebasing workload... Anyway, next time, for each complex patchset on infrastructure, I'll always give it 3+ release cycles time. And never merge it if without 3 acks from xfs, ext4 and btrfs, as there're too many different test ways I can't test cover. Sorry again for the mess this time, hope we can back to track soon. Thanks, Zorro > > -Dave. > -- > Dave Chinner > david@fromorbit.com >
On Fri, Mar 07, 2025 at 11:47:22PM +0800, Zorro Lang wrote: > On Fri, Mar 07, 2025 at 07:56:28AM +1100, Dave Chinner wrote: > > I'm talking about removing all the setsid and namspace bits that we > > added to try to solve the need for pkill --parent. i.e. this series > > of patches: > > > > 949bdf8ea check: deprecate using process sessions to isolate test instances > > 247ab01fa check: run tests in a private pid/mount namespace > > 88d60f434 common: fix pkill by running test program in a separate session > > c71349150 common/rc: hoist pkill to a helper function > > 77548e606 common/rc: create a wrapper for the su command > > > > Using private PID namespaces in check-parallel has solved the > > original problem that required the pkill --parent usage and the > > fsstress binary rename/copy workaround by wrapping process isolation > > outside of each check instance. Hence check and the internal > > check infrastructure doesn't need to ever care about process > > isolation to solve this check-parallel problem anymore. > > If "check" doesn't need these namespace isolation things, that will be good > to me. That'll help to isolate the issues on check or on check-parallel. > And we won't bring in regressions to check when try to fix check-parallel. > > Now let me merge this patch at first, to avoid the regressions on btrfs. > Then please feel free to send patches to remove/reduce the coupling of > two testing ways (check and check-parallel:) Sure. FWIW, it's not necessary for you to test whether the merged version of check-parallel runs correctly right now. If some other change breaks it, I will fix it and send patches to fix it. I'm much more concerned that changes made to support check-parallel don't break check... > > i.e. the isolation fix should have been made to check-parallel, > > not to check. > > > > It is not ideal to have to walk all this stuff back, but this is > > what happens when an RFC-scope feature prototype is merged without > > allowing adequate time for review or give the author a chance to > > address at least one round of review comments and bug fixes before > > it gets merged.... > > Sorry, I've learned from this. I thought I've given it lots of test, > and fixed all bug reports at that time. And I asked several times > if there's any objection about merging it in next release. As there's > not objection from anyone in 2 weeks, so I decided to give it a chance, > to save the super complicated rebasing workload... Sure, not having anyone review a prototype after a couple of weeks is pretty common - it does not mean it is ready for merge. It typically means that the author will then post a newer, more complete version when they are ready, and then maybe people will look at it. Hence my comment on where I wanted people to focus their review - not on the check-parallel stuff, but on the generic infrastructure changes (i.e. "get the process underway") because they were much more important to get right than the check-parallel script. IOWs, I was not asking anyone to bypass the normal cyclic review-repost process - I was simply doing what ever experienced developer does when posting a large patchset: guiding reviewers to the important areas of the patch set to focus on first. > Anyway, next time, > for each complex patchset on infrastructure, I'll always give it 3+ > release cycles time. And never merge it if without 3 acks from xfs, > ext4 and btrfs, And that, IMO, is swinging too far the other way. If the code is complete and ready to go and nobody will review it, then the above rule makes it impossible for the author to get their code merged. now we have the opposite problem: people have to carry large infrastructure changes for a long time because they cannot get anyone to review it. The normal process is to let review-and-repost cycles occur naturally (e.g. every couple of weeks for a few versions), with the author indicating what the important things to focus review on in the patchset are. When that cycle reaches it's end, a request for merge will be made (e.g. a pull request or a repost with all patches containing RVBs). This shouldn't take months, and if the reviewers focus on the things that the author asks them to look at first then the important things end up being ready for merge earlier rather than later. IOWs, there is no need for placing arbitrary bars to jump over if the process is allowed to follow it's natural path. Problems typically only arise when the process is not followed, so there needs to be a very good reason for stepping outside normal processes or having arbitrary rules that nobody really knows when/if they might trigger... -Dave.
diff --git a/check b/check index ea92b0d62..cb2f19d08 100755 --- a/check +++ b/check @@ -674,10 +674,13 @@ _stash_test_status() { esac } -# Can we run in a private pid/mount namespace? -HAVE_PRIVATENS= -./tools/run_privatens bash -c "exit 77" -test $? -eq 77 && HAVE_PRIVATENS=yes +# Don't try "privatens" by default, it's experimental for now. +if [ "$TRY_PRIVATENS" = "yes" ];then + # Can we run in a private pid/mount namespace? + HAVE_PRIVATENS= + ./tools/run_privatens bash -c "exit 77" + test $? -eq 77 && HAVE_PRIVATENS=yes +fi # Can we run systemd scopes? HAVE_SYSTEMD_SCOPES=