diff mbox

dmsetup hangs forever

Message ID 97CEEE909D5FE84999281EA4585EB0ED0165DE36@DGGEMA505-MBS.china.huawei.com (mailing list archive)
State Rejected, archived
Delegated to: Mike Snitzer
Headers show

Commit Message

Zhangyanfei (YF) Oct. 26, 2017, 8:07 a.m. UTC
Hello

I find an issue when use  dmsetup in the situation udev event timeout.

Dmsetup use the dm_udev_wait function sync with udev event.When use the dmsetup generate a new dm-disk, if the raw disk is abnormal(for example ,a ipsan disk hung IO request), the udevd daemon handle the dm-disk udev event maybe timeout, and will not notify the dmsetup  by semaphore. And because the  dm_udev_wait use the semop to sync with udevd, if udevd event timeout, the dmsetup will hung forever even when the raw disk be recovery.

I wonder if we could use the semtimedop instead semop to add the timeout in function  dm_udev_wait. If the udevd daemon timeout when handle the dm event, the dm_udev_wait could timeout too, and the dmsetup could return error.

This is my patch base lvm2-2.02.115-3:

>From 6961f824abe8eb15f8328f2f4bd39f4cdbf65d4f Mon Sep 17 00:00:00 2001
From: fengtiantian <fengtiantian@huawei.com<mailto:fengtiantian@huawei.com>>
Date: Mon, 23 Oct 2017 19:29:55 +0800
Subject: [PATCH] add timeout when fail to wait udev

---
libdm/libdm-common.c |  7 ++++++-
tools/dmsetup.c      | 20 ++++++++++++++++----
2 files changed, 22 insertions(+), 5 deletions(-)


       out:
        if (!_udev_cookie)
-               (void) dm_udev_wait(cookie);
+               udev_wait_r =  dm_udev_wait(cookie);

        dm_task_destroy(dmt);

+       if (!udev_wait_r)
+               return 0;
+
        return r;
}

@@ -1267,7 +1275,8 @@ static int _simple(int task, const char *name, uint32_t event_nr, int display)
        int udev_wait_flag = task == DM_DEVICE_RESUME ||
                             task == DM_DEVICE_REMOVE;
        int r = 0;
-
+       int udev_wait_r = 1;
+
        struct dm_task *dmt;

        if (!(dmt = dm_task_create(task)))
@@ -1326,13 +1335,16 @@ static int _simple(int task, const char *name, uint32_t event_nr, int display)

       out:
        if (!_udev_cookie && udev_wait_flag)
-               (void) dm_udev_wait(cookie);
+               udev_wait_r = dm_udev_wait(cookie);

        if (r && display && _switches[VERBOSE_ARG])
                r = _display_info(dmt);

        dm_task_destroy(dmt);

+       if(!udev_wait_r)
+               return 0;
+
        return r;
}

--
1.8.3.1
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Zdenek Kabelac Oct. 26, 2017, 8:39 a.m. UTC | #1
Dne 26.10.2017 v 10:07 Zhangyanfei (YF) napsal(a):
> Hello
> 
> I find an issue when use  dmsetup in the situation udev event timeout.
> 
> Dmsetup use the dm_udev_wait function sync with udev event.When use the 
> dmsetup generate a new dm-disk, if the raw disk is abnormal(for example ,a 
> ipsan disk hung IO request), the udevd daemon handle the dm-disk udev event 
> maybe timeout, and will not notify the dmsetup  by semaphore. And because the 
>   dm_udev_wait use the semop to sync with udevd, if udevd event timeout, the 
> dmsetup will hung forever even when the raw disk be recovery.
> 
> I wonder if we could use the semtimedop instead semop to add the timeout in 
> function  dm_udev_wait. If the udevd daemon timeout when handle the dm event, 
> the dm_udev_wait could timeout too, and the dmsetup could return error.
> 
> This is my patch base lvm2-2.02.115-3:


Hi


Unfortunately the same argument why this can't really work still applies.

If the  dm will start to timeout on it's own - without coordination with udev,
your system's logic will end-up with one big mess.

So if the dm would handle timeout - you would also need to provide mechanism 
to correct associated services around it.

The main case here is - it's mandatory it's udev finalizing any timeouts so 
it's in sync with db content.

Moreover if you start to timeout - you typically mask some system failure. In 
majority of cases I've ever seen - it's been always a bug from this category 
(buggy udev rule, or service). So it's always better to fix the bug then keep 
it masked.

AFAIK I'd like to see the semaphore to go away - but it needs wider cooperation.


Regards

Zdenek


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Zhangyanfei (YF) Oct. 27, 2017, 8 a.m. UTC | #2
Hello

If the udevd daemon would not timeout, I think dmsetup mandatory wait udev finalizing any timeouts is good idea.
But udevd would timeout in 180 sencond and kill the event process( systemd-udevd[39029]: timeout: killing). In this situation, I think mandatory wait udev finalizing is useless, because udev has been killed and can't coordination dmsetup forever. So I think it's better to tell the one who call the dmsetup, the process error return, than let the process wait forever. 

If not add the dmsetup timeout mechanism, which strategy to solve this issue better?
1、guarantee the udev never timeout.(but I think it is difficult to make sure any udev event will finish in 180 sencond in any abnormal situation)
2、modify the udev daemon, if udev event timeout,also notify the dmsetup it's done.
3、the one who call the command dmsetup needed timeout itself.

Thanks

-----邮件原件-----
发件人: Zdenek Kabelac [mailto:zkabelac@redhat.com] 
发送时间: 2017年10月26日 16:39
收件人: Zhangyanfei (YF) <yanfei.zhang@huawei.com>; agk@redhat.com; christophe.varoqui@opensvc.com
抄送: dm-devel@redhat.com; guijianfeng <guijianfeng@huawei.com>; Fengtiantian <fengtiantian@huawei.com>; linux-lvm@redhat.com
主题: Re: [dm-devel] dmsetup hangs forever

Dne 26.10.2017 v 10:07 Zhangyanfei (YF) napsal(a):
> Hello
> 
> I find an issue when use  dmsetup in the situation udev event timeout.
> 
> Dmsetup use the dm_udev_wait function sync with udev event.When use 
> the dmsetup generate a new dm-disk, if the raw disk is abnormal(for 
> example ,a ipsan disk hung IO request), the udevd daemon handle the 
> dm-disk udev event maybe timeout, and will not notify the dmsetup  by 
> semaphore. And because the
>   dm_udev_wait use the semop to sync with udevd, if udevd event 
> timeout, the dmsetup will hung forever even when the raw disk be recovery.
> 
> I wonder if we could use the semtimedop instead semop to add the 
> timeout in function  dm_udev_wait. If the udevd daemon timeout when 
> handle the dm event, the dm_udev_wait could timeout too, and the dmsetup could return error.
> 
> This is my patch base lvm2-2.02.115-3:


Hi


Unfortunately the same argument why this can't really work still applies.

If the  dm will start to timeout on it's own - without coordination with udev, your system's logic will end-up with one big mess.

So if the dm would handle timeout - you would also need to provide mechanism to correct associated services around it.

The main case here is - it's mandatory it's udev finalizing any timeouts so it's in sync with db content.

Moreover if you start to timeout - you typically mask some system failure. In majority of cases I've ever seen - it's been always a bug from this category (buggy udev rule, or service). So it's always better to fix the bug then keep it masked.

AFAIK I'd like to see the semaphore to go away - but it needs wider cooperation.


Regards

Zdenek



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Zdenek Kabelac Oct. 27, 2017, 12:28 p.m. UTC | #3
Dne 27.10.2017 v 10:00 Zhangyanfei (YF) napsal(a):
> Hello
> 
> If the udevd daemon would not timeout, I think dmsetup mandatory wait udev finalizing any timeouts is good idea.
> But udevd would timeout in 180 sencond and kill the event process( systemd-udevd[39029]: timeout: killing). In this situation, I think 

The point here is - udev should not be 'randomly' killing workers - and if it 
needs so - it should provide a 'recovery' path mechanism, where such path
after killing worker is supported.

If this mechanism is missing - you end-up with a device which is NOT present 
in udev DB - yet - there is some 'esoteric' dm device which is there
and you don't know if it's usable or not.

mandatory wait udev finalizing is useless, because udev has been killed and 
can't coordination dmsetup forever. So I think it's better to tell the one who 
call the dmsetup, the process error return, than let the process wait forever.
> 
> If not add the dmsetup timeout mechanism, which strategy to solve this issue better?

You can always use i.e. cron job and every 15 minutes run:

'dmsetup udevcomplete_all 15'

To complete any blocked dmsetup/lvm command on a cookie...

The keypoint here is - you have system with broken device list - so any next 
i.e. lvm2 command may 'freeze'  while it would try to read a device that can 
freeze the reading task - this is certainly bad case.

> 1、guarantee the udev never timeout.(but I think it is difficult to make sure any udev event will finish in 180 sencond in any abnormal situation)
> 2、modify the udev daemon, if udev event timeout,also notify the dmsetup it's done.
> 3、the one who call the command dmsetup needed timeout itself.

In principle timeouts are BAD when we talk about storage.
There needs to be some clear 'state-machine' mechanism.

It doesn't make much sense to kill worker which actually might not even be 
possible if the worker freezes inside kernel.


And then there is even 2nd. category of 'kills' of workers - this happens on 
overloaded machines running gazillion 'virtual guest' instances without any 
resource check mechanism with assumption "OOM" killer is beautiful resource 
manager in kernel - however such systems are destined for a near-time reboot - 
  anyone who tries to 'recover' from  an OOM doesn't realize how complex that 
would be - so this case is not worth dmsetup timeout either.


Regards

Zdenek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/libdm/libdm-common.c b/libdm/libdm-common.c
index bd51645..ede2fea 100644
--- a/libdm/libdm-common.c
+++ b/libdm/libdm-common.c
@@ -59,6 +59,8 @@  union semun
#endif
#endif

+#define UDEV_SEM_TIMEOUT 300
+
static char _dm_dir[PATH_MAX] = DEV_DIR DM_DIR;
static char _sysfs_dir[PATH_MAX] = "/sys/";
static char _path0[PATH_MAX];           /* path buffer, safe 4kB on stack */
@@ -2476,6 +2478,9 @@  static int _udev_wait(uint32_t cookie)
{
        int semid;
        struct sembuf sb = {0, 0, 0};
+       struct timespec timeout;
+       timeout.tv_sec = UDEV_SEM_TIMEOUT;
+       timeout.tv_nsec = 0;

        if (!cookie || !dm_udev_get_sync_support())
                return 1;
@@ -2496,7 +2501,7 @@  static int _udev_wait(uint32_t cookie)
                             cookie, semid);

repeat_wait:
-       if (semop(semid, &sb, 1) < 0) {
+       if (semtimedop(semid, &sb, 1,&timeout) < 0) {
                if (errno == EINTR)
                        goto repeat_wait;
                else if (errno == EIDRM)
diff --git a/tools/dmsetup.c b/tools/dmsetup.c
index 4202dbb..0840031 100644
--- a/tools/dmsetup.c
+++ b/tools/dmsetup.c
@@ -619,6 +619,7 @@  static int _load(CMD_ARGS)
static int _create(CMD_ARGS)
{
        int r = 0;
+       int udev_wait_r = 1;
        struct dm_task *dmt;
        const char *file = NULL;
        uint32_t cookie = 0;
@@ -695,18 +696,22 @@  static int _create(CMD_ARGS)

       out:
        if (!_udev_cookie)
-               (void) dm_udev_wait(cookie);
+               udev_wait_r = dm_udev_wait(cookie);

        if (r && _switches[VERBOSE_ARG])
                r = _display_info(dmt);

        dm_task_destroy(dmt);

+       if(!udev_wait_r)
+               return 0;
+
        return r;
}
static int _do_rename(const char *name, const char *new_name, const char *new_uuid) {
        int r = 0;
+       int udev_wait_r = 1;
        struct dm_task *dmt;
        uint32_t cookie = 0;
        uint16_t udev_flags = 0;
@@ -751,10 +756,13 @@  static int _do_rename(const char *name, const char *new_name, const char *new_uu