Message ID | 5A914B45.8080200@xtaotech.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On 2/24/18 5:23 AM, Yang Joseph wrote: > hello, > > Before the repair process, xfs_repair will check if user specified device already > has a writable mountpoint. And it will stat all the mountpoints of the system. If there > is a dead mountpoint, this checking will be blocked and xfs_repair will enter 'D' state. > > So, if user can make sure there is no writable mountpoint, xfs_repair can skip > this checking. This PR add a new option '-F' to do this job. > > Your suggestions are appreciated! For starters, this patch seems whitespace mangled and does not apply for me. However, repairing a filesystem that is actually mounted read-write could cause serious damage. Giving the admin tools to do this carries significant risk; we already call it "dangerous" to repair even an ro-mounted filesystem. Do you have a testcase to produce a "dead" mountpoint which causes xfs_repair to go into "D" state? Where was xfs_repair stuck? That sounds like a bug worth fixing, but I am much less excited about adding options which could do serious damage to a filesystem. Thanks, -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Feb 24, 2018 at 11:56:44AM -0600, Eric Sandeen wrote: > On 2/24/18 5:23 AM, Yang Joseph wrote: > > hello, > > > > Before the repair process, xfs_repair will check if user specified device already > > has a writable mountpoint. And it will stat all the mountpoints of the system. If there > > is a dead mountpoint, this checking will be blocked and xfs_repair will enter 'D' state. So why is the mount point dead? That kinda means that the filesystem is still mounted, but something has hung somewhere and the filesystem may still have active references to the underlying device and be doing stuff that is modifying the filesystem.... And if the device is still busy, then you aren't going to be able to mount the repaired device, anyway, because the block device is still busy... > That sounds like a bug worth fixing, but I am much > less excited about adding options which could do serious damage > to a filesystem. TO me it sounds like something that should be fixed by a reboot, not by adding dangerous options to xfs_repair... Cheers, Dave.
On 2/24/18 4:04 PM, Dave Chinner wrote: >> That sounds like a bug worth fixing, but I am much >> less excited about adding options which could do serious damage >> to a filesystem. > TO me it sounds like something that should be fixed by a reboot, not > by adding dangerous options to xfs_repair... Yes, that was my point - if it's hung there is a bug somewhere, that should be fixed - not necessarily in xfsprogs ;) But without more info about the situation, I don't know. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
In our case, there is a mountpoint of ceph-fuse type and this mountpoint is abnormal. I execute 'xfs_repair -n /dev/nbd4' cmd. Then xfs_repair is blocked in stat() systemcall. '/dev/nbd4' has no relationship with the ceph-fuse mountpoint. [root@compute5 ~]# ps aux | grep xfs_repair root 16469 0.0 0.0 114744 564 ? D 10:50 0:00 xfs_repair -n /dev/nbd4 [root@compute5 ~]# cat /proc/16469/stack [<ffffffffa04b953d>] __fuse_request_send+0x13d/0x2c0 [fuse] [<ffffffffa04b96d2>] fuse_request_send+0x12/0x20 [fuse] [<ffffffffa04be67a>] fuse_do_getattr+0x11a/0x2e0 [fuse] [<ffffffffa04bfba5>] fuse_update_attributes+0x75/0x80 [fuse] [<ffffffffa04bfbf3>] fuse_getattr+0x43/0x50 [fuse] [<ffffffff81203976>] vfs_getattr+0x46/0x80 [<ffffffff81203aa5>] vfs_fstatat+0x75/0xc0 [<ffffffff81203ffe>] SYSC_newstat+0x2e/0x60 [<ffffffff812042de>] SyS_newstat+0xe/0x10 [<ffffffff81697809>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff The stat() is from the following code: // libxfs/linux.c:platform_check_mount() while ((mnt = getmntent(f)) != NULL) { if (stat64(mnt->mnt_fsname, &mst) < 0) <---------<<<< unconditionally stat all mountpoints continue; xfs_repair have to check all mountpoints of the system to make sure there is no writable mount point of user specified device. If there is one abnormal mountpoint, event it not related to user specified device, xfs_repair will be blocked. I can make sure there is no writable mountpoint of /dev/nbd4, so xfs_repair don't need to check all mountpoints of the system. This is why I want to add this '-F' option. Because there are lots of other services on this node, I can't reboot the machine. thx Yang Honggang On 02/25/2018 06:04 AM, Dave Chinner wrote: > On Sat, Feb 24, 2018 at 11:56:44AM -0600, Eric Sandeen wrote: >> On 2/24/18 5:23 AM, Yang Joseph wrote: >>> hello, >>> >>> Before the repair process, xfs_repair will check if user specified device already >>> has a writable mountpoint. And it will stat all the mountpoints of the system. If there >>> is a dead mountpoint, this checking will be blocked and xfs_repair will enter 'D' state. > So why is the mount point dead? > > That kinda means that the filesystem is still mounted, but something > has hung somewhere and the filesystem may still have active > references to the underlying device and be doing stuff that is > modifying the filesystem.... > > And if the device is still busy, then you aren't going to be able to > mount the repaired device, anyway, because the block device is still > busy... > >> That sounds like a bug worth fixing, but I am much >> less excited about adding options which could do serious damage >> to a filesystem. > TO me it sounds like something that should be fixed by a reboot, not > by adding dangerous options to xfs_repair... > > Cheers, > > Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 26, 2018 at 10:59:07AM +0800, Yang Joseph wrote: > In our case, there is a mountpoint of ceph-fuse type and this mountpoint is > abnormal. > I execute 'xfs_repair -n /dev/nbd4' cmd. Then xfs_repair is blocked in > stat() > systemcall. '/dev/nbd4' has no relationship with the ceph-fuse mountpoint. > > [root@compute5 ~]# ps aux | grep xfs_repair > root 16469 0.0 0.0 114744 564 ? D 10:50 0:00 xfs_repair > -n /dev/nbd4 > > [root@compute5 ~]# cat /proc/16469/stack > [<ffffffffa04b953d>] __fuse_request_send+0x13d/0x2c0 [fuse] > [<ffffffffa04b96d2>] fuse_request_send+0x12/0x20 [fuse] > [<ffffffffa04be67a>] fuse_do_getattr+0x11a/0x2e0 [fuse] > [<ffffffffa04bfba5>] fuse_update_attributes+0x75/0x80 [fuse] > [<ffffffffa04bfbf3>] fuse_getattr+0x43/0x50 [fuse] > [<ffffffff81203976>] vfs_getattr+0x46/0x80 > [<ffffffff81203aa5>] vfs_fstatat+0x75/0xc0 > [<ffffffff81203ffe>] SYSC_newstat+0x2e/0x60 > [<ffffffff812042de>] SyS_newstat+0xe/0x10 > [<ffffffff81697809>] system_call_fastpath+0x16/0x1b > [<ffffffffffffffff>] 0xffffffffffffffff > So, you have a mount point stuck because fuse can't connect. Why should xfs_repair workaround this issue? > The stat() is from the following code: > > // libxfs/linux.c:platform_check_mount() > while ((mnt = getmntent(f)) != NULL) { > if (stat64(mnt->mnt_fsname, &mst) < 0) <---------<<<< unconditionally > stat all mountpoints > continue; > > xfs_repair have to check all mountpoints of the system to make sure there is > no writable mount point of user specified device. If there is one abnormal > mountpoint, event it not related to user specified device, xfs_repair will > be blocked. > > I can make sure there is no writable mountpoint of /dev/nbd4, so xfs_repair > don't need to check all mountpoints of the system. This is why I want to add > this '-F' option. > While I understand your point, I wonder why you can't close the specific fuse connection here, and, if the right approach for you wouldn't be able to close this fuse connection, instead of hack xfs_repair to bypass mount point checks. In any way, I think '-F' is really not a good argument for such force, it could easily be used by mistake in place of, let's say '-f', if such option is ever to be implemented, it should be typo-safe, something like --force. But still, I think the right approach here would be fuse to provide a way to force a close on the specific connection. > Because there are lots of other services on this node, I can't reboot the > machine. > > thx > > Yang Honggang > > > > > hello, > > > > > > > > Before the repair process, xfs_repair will check if user specified device already > > > > has a writable mountpoint. And it will stat all the mountpoints of the system. If there > > > > is a dead mountpoint, this checking will be blocked and xfs_repair will enter 'D' state. > > So why is the mount point dead? > > > > That kinda means that the filesystem is still mounted, but something > > has hung somewhere and the filesystem may still have active > > references to the underlying device and be doing stuff that is > > modifying the filesystem.... > > > > And if the device is still busy, then you aren't going to be able to > > mount the repaired device, anyway, because the block device is still > > busy... > > > > > That sounds like a bug worth fixing, but I am much > > > less excited about adding options which could do serious damage > > > to a filesystem. > > TO me it sounds like something that should be fixed by a reboot, not > > by adding dangerous options to xfs_repair... > > > > Cheers, > > > > Dave. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 26, 2018 at 10:59:07AM +0800, Yang Joseph wrote: > In our case, there is a mountpoint of ceph-fuse type and this mountpoint is > abnormal. > I execute 'xfs_repair -n /dev/nbd4' cmd. Then xfs_repair is blocked in > stat() > systemcall. '/dev/nbd4' has no relationship with the ceph-fuse mountpoint. > > [root@compute5 ~]# ps aux | grep xfs_repair > root 16469 0.0 0.0 114744 564 ? D 10:50 0:00 xfs_repair > -n /dev/nbd4 > > [root@compute5 ~]# cat /proc/16469/stack > [<ffffffffa04b953d>] __fuse_request_send+0x13d/0x2c0 [fuse] > [<ffffffffa04b96d2>] fuse_request_send+0x12/0x20 [fuse] > [<ffffffffa04be67a>] fuse_do_getattr+0x11a/0x2e0 [fuse] > [<ffffffffa04bfba5>] fuse_update_attributes+0x75/0x80 [fuse] > [<ffffffffa04bfbf3>] fuse_getattr+0x43/0x50 [fuse] > [<ffffffff81203976>] vfs_getattr+0x46/0x80 > [<ffffffff81203aa5>] vfs_fstatat+0x75/0xc0 > [<ffffffff81203ffe>] SYSC_newstat+0x2e/0x60 > [<ffffffff812042de>] SyS_newstat+0xe/0x10 > [<ffffffff81697809>] system_call_fastpath+0x16/0x1b > [<ffffffffffffffff>] 0xffffffffffffffff > > The stat() is from the following code: > > // libxfs/linux.c:platform_check_mount() > while ((mnt = getmntent(f)) != NULL) { > if (stat64(mnt->mnt_fsname, &mst) < 0) <---------<<<< unconditionally > stat all mountpoints > continue; > > xfs_repair have to check all mountpoints of the system to make sure there is > no writable mount point of user specified device. If there is one abnormal > mountpoint, event it not related to user specified device, xfs_repair will > be blocked. > > I can make sure there is no writable mountpoint of /dev/nbd4, so xfs_repair > don't need to check all mountpoints of the system. This is why I want to add > this '-F' option. > > Because there are lots of other services on this node, I can't reboot the > machine. > Suffice it to say that I agree with the other comments that this probably isn't something we want to "fix" in xfs_repair... but given your particular circumstances, would a lazy unmount of the borked mount allow the repair to proceed? Brian > thx > > Yang Honggang > > On 02/25/2018 06:04 AM, Dave Chinner wrote: > > On Sat, Feb 24, 2018 at 11:56:44AM -0600, Eric Sandeen wrote: > > > On 2/24/18 5:23 AM, Yang Joseph wrote: > > > > hello, > > > > > > > > Before the repair process, xfs_repair will check if user specified device already > > > > has a writable mountpoint. And it will stat all the mountpoints of the system. If there > > > > is a dead mountpoint, this checking will be blocked and xfs_repair will enter 'D' state. > > So why is the mount point dead? > > > > That kinda means that the filesystem is still mounted, but something > > has hung somewhere and the filesystem may still have active > > references to the underlying device and be doing stuff that is > > modifying the filesystem.... > > > > And if the device is still busy, then you aren't going to be able to > > mount the repaired device, anyway, because the block device is still > > busy... > > > > > That sounds like a bug worth fixing, but I am much > > > less excited about adding options which could do serious damage > > > to a filesystem. > > TO me it sounds like something that should be fixed by a reboot, not > > by adding dangerous options to xfs_repair... > > > > Cheers, > > > > Dave. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/libxfs.h b/include/libxfs.h index c5fb396..9d5d3ee 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -112,6 +112,8 @@ typedef struct libxfs_xinit { int rcreat; /* try to create realtime subvolume */ int setblksize; /* attempt to set device blksize */ int usebuflock; /* lock xfs_buf_t's - for MT usage */ + int force_repair; /* ignore writable mount checking */ + /* output results */ dev_t ddev; /* device for data subvolume */ dev_t logdev; /* device for log subvolume */ diff --git a/libxfs/init.c b/libxfs/init.c index 7bde8b7..8c0b418 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -199,7 +199,8 @@ libxfs_device_close(dev_t dev) } static int -check_open(char *path, int flags, char **rawfile, char **blockfile) +check_open(char *path, int flags, char **rawfile, char **blockfile, + int force_repair) { int readonly = (flags & LIBXFS_ISREADONLY); int inactive = (flags & LIBXFS_ISINACTIVE); @@ -225,7 +226,8 @@ check_open(char *path, int flags, char **rawfile, char **blockfile) if (!readonly && !inactive && platform_check_ismounted(path, *blockfile, NULL, 1)) return 0; - if (inactive && check_isactive(path, *blockfile, ((readonly|dangerously)?1:0))) + if (inactive && !force_repair && check_isactive(path, *blockfile, + ((readonly|dangerously)?1:0))) return 0; return 1; @@ -270,7 +272,7 @@ libxfs_init(libxfs_init_t *a) radix_tree_init(); if (a->volname) { - if(!check_open(a->volname,flags,&rawfile,&blockfile)) + if(!check_open(a->volname, flags, &rawfile, &blockfile, a->force_repair)) goto done; fd = open(rawfile, O_RDONLY); dname = a->dname = a->volname; @@ -284,7 +286,7 @@ libxfs_init(libxfs_init_t *a) platform_findsizes(dname, a->dfd, &a->dsize, &a->dbsize); } else { - if (!check_open(dname, flags, &rawfile, &blockfile)) + if (!check_open(dname, flags, &rawfile, &blockfile, a->force_repair)) goto done; a->ddev = libxfs_device_open(rawfile, a->dcreat, flags, a->setblksize); @@ -302,7 +304,7 @@ libxfs_init(libxfs_init_t *a) platform_findsizes(dname, a->logfd, &a->logBBsize, &a->lbsize); } else { - if (!check_open(logname, flags, &rawfile, &blockfile)) + if (!check_open(logname, flags, &rawfile, &blockfile, a->force_repair)) goto done; a->logdev = libxfs_device_open(rawfile, a->lcreat, flags, a->setblksize); @@ -320,7 +322,7 @@ libxfs_init(libxfs_init_t *a) platform_findsizes(dname, a->rtfd, &a->rtsize, &a->rtbsize); } else { - if (!check_open(rtname, flags, &rawfile, &blockfile)) + if (!check_open(rtname, flags, &rawfile, &blockfile, a->force_repair)) goto done; a->rtdev = libxfs_device_open(rawfile, a->rcreat, flags, a->setblksize); diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8 index 85e4dc9..78e1125 100644 --- a/man/man8/xfs_repair.8 +++ b/man/man8/xfs_repair.8 @@ -4,7 +4,7 @@ xfs_repair \- repair an XFS filesystem .SH SYNOPSIS .B xfs_repair [ -.B \-dfLnPv +.B \-dfLnPvF ] [ .B \-m .I maxmem @@ -162,6 +162,9 @@ ag_stride is enabled. .B \-v Verbose output. May be specified multiple times to increase verbosity. .TP +.B \-F +Force to repair, ignore writable mount checking. +.TP .B \-d Repair dangerously. Allow .B xfs_repair diff --git a/repair/globals.h b/repair/globals.h index c7bbe6f..085ea0f 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -104,6 +104,7 @@ EXTERN char *rt_name; /* Name of realtime device */ EXTERN int rt_spec; /* Realtime dev specified as option */ EXTERN int convert_lazy_count; /* Convert lazy-count mode on/off */ EXTERN int lazy_count; /* What to set if to if converting */ +EXTERN int force_repair; /* ignore writable mount checking */ /* misc status variables */ diff --git a/repair/init.c b/repair/init.c index 609229c..a3b4539 100644 --- a/repair/init.c +++ b/repair/init.c @@ -90,6 +90,8 @@ xfs_init(libxfs_init_t *args) args->usebuflock = do_prefetch; args->setblksize = 0; args->isdirect = LIBXFS_DIRECT; + args->force_repair = force_repair; + if (no_modify) args->isreadonly = (LIBXFS_ISREADONLY | LIBXFS_ISINACTIVE); else if (dangerously) diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index b2dd91b..81864c3 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -97,6 +97,7 @@ usage(void) " -o subopts Override default behaviour, refer to man page.\n" " -t interval Reporting interval in seconds.\n" " -d Repair dangerously.\n" +" -F Force to repair, ignore writable mount checking.\n" " -V Reports version and exits.\n"), progname); exit(1); } @@ -214,12 +215,13 @@ process_args(int argc, char **argv) ag_stride = 0; thread_count = 1; report_interval = PROG_RPT_DEFAULT; + force_repair = 0; /* * XXX have to add suboption processing here * attributes, quotas, nlinks, aligned_inos, sb_fbits */ - while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPt:")) != EOF) { + while ((c = getopt(argc, argv, "c:o:fl:m:r:LnDvVdPt:F")) != EOF) { switch (c) { case 'D': dumpcore = 1; @@ -329,6 +331,9 @@ process_args(int argc, char **argv) case 't': report_interval = (int)strtol(optarg, NULL, 0); break; + case 'F': + force_repair = 1; + break; case '?': usage(); }