Message ID | 20191021133215.26518-1-ian.jackson@eu.citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [OSSTEST] guest_prepare_disk: Only do the umount if we set an env var | expand |
On Mon, Oct 21, 2019 at 02:32:15PM +0100, Ian Jackson wrote: > This call to guest_umount_lv is here for the benefit of ad-hoc reruns > of (eg) ts-guest-start tidy up any ad-hoc messing about (eg from > earlier runs of ts-debian-fixup or something). It is not needed in > production runs. > > Serendipitously, this osstest code discovered a bug in the Linux > blkback: when tearing down, it sets the backend state to 6 before it > has closed the underlying block devices. This ultimately means that > after "xl destroy" or "xl shutdown -w" there is a period when the > guest's open handle onto its storage is still open. This is wrong. > > This detection depends on us winning a tricky race. So it shows up in > osstest as a very low probability heisenbug. The bug is currently in > all versions of Linux and causing a bit of a nuisance. > > It would be best to add a proper check for this bug. However, this is > quite fiddly: really, it ought to be done as close to the xl command > completion as possible, in the same ssh invocation. That would > involve a fair bit of plumbing and ad-hocery. I don't think that > would be proportionate for such a low-impact bug. > > So instead in this patch I just disable this cleanup code in the > troublesome case, unless it is explicitly requested by the user > setting OSSTEST_GUEST_DISK_MOUNT_CLEANUP to a trueish value. (This > would be reasonably convenient for the ad-hoc testing that this call > serves.) > > Thanks to Roger for diagnosing the Linux kernel bug. > > CC: Roger Pau Monne <roger.pau@citrix.com> > CC: Jürgen Groß <jgross@suse.com> > Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Thanks: Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm index 78f47480..6b0ee7a2 100644 --- a/Osstest/TestSupport.pm +++ b/Osstest/TestSupport.pm @@ -1938,7 +1938,8 @@ sub guest_create_paused ($) { sub guest_prepare_disk ($) { my ($gho) = @_; - guest_umount_lv($gho->{Host}, $gho); + guest_umount_lv($gho->{Host}, $gho) + if $ENV{'OSSTEST_GUEST_DISK_MOUNT_CLEANUP'}; return if ($gho->{Diskfmt} // 'none') eq "none";
This call to guest_umount_lv is here for the benefit of ad-hoc reruns of (eg) ts-guest-start tidy up any ad-hoc messing about (eg from earlier runs of ts-debian-fixup or something). It is not needed in production runs. Serendipitously, this osstest code discovered a bug in the Linux blkback: when tearing down, it sets the backend state to 6 before it has closed the underlying block devices. This ultimately means that after "xl destroy" or "xl shutdown -w" there is a period when the guest's open handle onto its storage is still open. This is wrong. This detection depends on us winning a tricky race. So it shows up in osstest as a very low probability heisenbug. The bug is currently in all versions of Linux and causing a bit of a nuisance. It would be best to add a proper check for this bug. However, this is quite fiddly: really, it ought to be done as close to the xl command completion as possible, in the same ssh invocation. That would involve a fair bit of plumbing and ad-hocery. I don't think that would be proportionate for such a low-impact bug. So instead in this patch I just disable this cleanup code in the troublesome case, unless it is explicitly requested by the user setting OSSTEST_GUEST_DISK_MOUNT_CLEANUP to a trueish value. (This would be reasonably convenient for the ad-hoc testing that this call serves.) Thanks to Roger for diagnosing the Linux kernel bug. CC: Roger Pau Monne <roger.pau@citrix.com> CC: Jürgen Groß <jgross@suse.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/TestSupport.pm | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)