Message ID | 20190109054845.93810-1-yangerkun@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | common/scsi_debug: make sure scsi_debug been removed correctly | expand |
On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: > Case generic/108 sometimes will fail while testing ext2, and the reson > is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now > add the loop to do rmmod to make sure scsi_debug can be removed > correctly. Why does 'rmmod scsi_debug' randomly fail? What bug does ext2 have that prevents the scsi debug module from being released and hence removed? Cheers, Dave.
Dave Chinner wrote on 2019/1/10 5:02: > On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: >> Case generic/108 sometimes will fail while testing ext2, and the reson >> is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now >> add the loop to do rmmod to make sure scsi_debug can be removed >> correctly. > > Why does 'rmmod scsi_debug' randomly fail? > > What bug does ext2 have that prevents the scsi debug module from > being released and hence removed? It's not a bug with ext2, ever been existing in ext4 too. This patch is a reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' commit d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why need this is that the behavior of udev cannot be speculated, so scsi_debug may rmmod failed since udev scan open the device and take the reference of module scsi_debug. Thanks, Kun. > > Cheers, > > Dave. >
On Sat, Jan 12, 2019 at 10:20:26AM +0800, yangerkun wrote: > > > Dave Chinner wrote on 2019/1/10 5:02: > > On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: > > > Case generic/108 sometimes will fail while testing ext2, and the reson > > > is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now > > > add the loop to do rmmod to make sure scsi_debug can be removed > > > correctly. > > > > Why does 'rmmod scsi_debug' randomly fail? > > > > What bug does ext2 have that prevents the scsi debug module from > > being released and hence removed? > > It's not a bug with ext2, ever been existing in ext4 too. This patch is a > reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' commit > d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why need this is > that the behavior of udev cannot be speculated, so scsi_debug may rmmod > failed since udev scan open the device and take the reference of module > scsi_debug. It's probably a good thing to have the specific reason in commit log too. Thanks, Eryu
On Sat, Jan 12, 2019 at 10:20:26AM +0800, yangerkun wrote: > > > Dave Chinner wrote on 2019/1/10 5:02: > >On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: > >>Case generic/108 sometimes will fail while testing ext2, and the reson > >>is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now > >>add the loop to do rmmod to make sure scsi_debug can be removed > >>correctly. > > > >Why does 'rmmod scsi_debug' randomly fail? > > > >What bug does ext2 have that prevents the scsi debug module from > >being released and hence removed? > > It's not a bug with ext2, ever been existing in ext4 too. This patch > is a reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' > commit d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why > need this is that the behavior of udev cannot be speculated, so > scsi_debug may rmmod failed since udev scan open the device and take > the reference of module scsi_debug. IOWs, you copied a hack from cryptsetup tests because you didn't know about $UDEV_SETTLE_PROG and didn't think to ask if anyone knew a solution to this problem? Cheers, Dave.
Dave Chinner wrote on 2019/1/15 11:47: > On Sat, Jan 12, 2019 at 10:20:26AM +0800, yangerkun wrote: >> >> >> Dave Chinner wrote on 2019/1/10 5:02: >>> On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: >>>> Case generic/108 sometimes will fail while testing ext2, and the reson >>>> is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now >>>> add the loop to do rmmod to make sure scsi_debug can be removed >>>> correctly. >>> >>> Why does 'rmmod scsi_debug' randomly fail? >>> >>> What bug does ext2 have that prevents the scsi debug module from >>> being released and hence removed? >> >> It's not a bug with ext2, ever been existing in ext4 too. This patch >> is a reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' >> commit d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why >> need this is that the behavior of udev cannot be speculated, so >> scsi_debug may rmmod failed since udev scan open the device and take >> the reference of module scsi_debug. > > IOWs, you copied a hack from cryptsetup tests because you didn't > know about $UDEV_SETTLE_PROG and didn't think to ask if anyone knew > a solution to this problem? I have test to add udev settle before really do 'rmmod scsi_debug', and the result is the same, sometimes 'rmmod scsi_debug' may failed too. For this question, i have ask Ondrej Konzina, the question and answer are as follow: Question: "Hi, While we are testing fs with xfstests, generic/108 will fail at some times with the reson that rmmod find the refcnt of scsi_debug module isn't 0 by checking /proc/modules. And this your patch is really describe the same thing. But, i am confusing about the scene as you said: cleanup() { if [ -d "$MNT_DIR" ] ; then umount -f $MNT_DIR 2>/dev/null rmdir $MNT_DIR 2>/dev/null fi rm -f $LOCAL_FILE 2> /dev/null - udevadm settle >/dev/null 2>&1 - rmmod scsi_debug 2> /dev/null + scsi_debug_teardown "$DEV" || exit 100 } After the settle, all the event has been finish, so who will trigger the scan, and how to scan? Can you help to explain this? Thanks a lot! " Answer: "Well, almost true. But by our experience udev settle command behaviour is not strictly defined (and also discouraged from using even by udev maintainers themselves IIRC). For example it will not synchronize you against events coming after you trigger the settle command. And you have zero control over when the event is added. So the long-term goal is to get rid of it from all tests in cryptsetup (not being on top of our todo list as you can clearly see if you grep scripts across tests subdirectory :)). Due to many existing udev versions among many distributions we found out there's no more reliable way than trying to remove scsi_debug in retry loop. And that loop also handles some weird versions (legacy versions but we do sometimes test even old distros) where udev fails to remove device nodes from /dev from time to time." Since settle can not resolve the problem, so there has no choice add the loop to do these. Maybe it is not a good way for this problem, do you have any suggestion? Cheer, Kun. > > Cheers, > > Dave. >
On Tue, Jan 15, 2019 at 01:31:24PM +0800, yangerkun wrote: > > > Dave Chinner wrote on 2019/1/15 11:47: > >On Sat, Jan 12, 2019 at 10:20:26AM +0800, yangerkun wrote: > >> > >> > >>Dave Chinner wrote on 2019/1/10 5:02: > >>>On Wed, Jan 09, 2019 at 01:48:45PM +0800, yangerkun wrote: > >>>>Case generic/108 sometimes will fail while testing ext2, and the reson > >>>>is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now > >>>>add the loop to do rmmod to make sure scsi_debug can be removed > >>>>correctly. > >>> > >>>Why does 'rmmod scsi_debug' randomly fail? > >>> > >>>What bug does ext2 have that prevents the scsi debug module from > >>>being released and hence removed? > >> > >>It's not a bug with ext2, ever been existing in ext4 too. This patch > >>is a reference to 'https://gitlab.com/cryptsetup/cryptsetup.git' > >>commit d7b9ed05f0931b416c33c8eb2ff1e6efa39270ff, and the reason why > >>need this is that the behavior of udev cannot be speculated, so > >>scsi_debug may rmmod failed since udev scan open the device and take > >>the reference of module scsi_debug. > > > >IOWs, you copied a hack from cryptsetup tests because you didn't > >know about $UDEV_SETTLE_PROG and didn't think to ask if anyone knew > >a solution to this problem? > > I have test to add udev settle before really do 'rmmod scsi_debug', > and the result is the same, sometimes 'rmmod scsi_debug' may failed > too.i So what is generating the events that udev is running that pins the scsi debug device given that it is no longer in use? > For this question, i have ask Ondrej Konzina, the question and > answer are as follow: > > Question: > "Hi, > > While we are testing fs with xfstests, generic/108 will fail at some > times with the reson that rmmod find the refcnt of scsi_debug module > isn't 0 by checking /proc/modules. And this your patch is really > describe the same thing. But, i am confusing about the scene as you said: > cleanup() { > if [ -d "$MNT_DIR" ] ; then > umount -f $MNT_DIR 2>/dev/null > rmdir $MNT_DIR 2>/dev/null > fi > rm -f $LOCAL_FILE 2> /dev/null > - udevadm settle >/dev/null 2>&1 > - rmmod scsi_debug 2> /dev/null > + scsi_debug_teardown "$DEV" || exit 100 > } > > After the settle, all the event has been finish, so who will trigger the > scan, and how to scan? Can you help to explain this? Thanks a lot! > " > Answer: > "Well, almost true. But by our experience udev settle command > behaviour is not strictly defined (and also discouraged from using > even by udev maintainers themselves IIRC). For example it will not > synchronize you against events coming after you trigger the settle > command. So, in summary: We used to have wait loops like you are proposing, then the udevadm settle command was added and we replaced all the wait loops with settles because that worked reliably. Now we're at the stage where udevadm settle doesn't work any more, the maintainers have effectively deprecated it without a replacement, and so we have to go back to hard coding hacky wait loops everywhere to retry until udev releases whatever random devices it holds references to? Forget I said anything, I'm rapidly coming to the conclusion that I have far too high standards for writing "working" software these days. -Dave.
diff --git a/common/scsi_debug b/common/scsi_debug index d9aa0bd..890ec68 100644 --- a/common/scsi_debug +++ b/common/scsi_debug @@ -45,11 +45,13 @@ _put_scsi_debug_dev() { lsmod | grep -wq scsi_debug || return - n=2 + n=15 # use redirection not -q option of modprobe here, because -q of old # modprobe is only quiet when the module is not found, not when the # module is in use. - while [ $n -ge 0 ] && ! modprobe -nr scsi_debug >/dev/null 2>&1; do + while [ $n -ge 0 ]; do + modprobe -r scsi_debug >/dev/null 2>&1 + [ $? -eq 0 ] && return 0 sleep 1 n=$((n-1)) done
Case generic/108 sometimes will fail while testing ext2, and the reson is that rmmod scsi_debug in _put_scsi_debug_dev may randomly fail. Now add the loop to do rmmod to make sure scsi_debug can be removed correctly. Signed-off-by: yangerkun <yangerkun@huawei.com> --- common/scsi_debug | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)