
[1/1] xfs: fallback to readonly during recovery

Message ID: 20200210211037.1930-1-vfazio@xes-inc.com
State: New, archived

Commit Message

Vincent Fazio Feb. 10, 2020, 9:10 p.m. UTC
Previously, XFS would fail to mount if there was an error during log
recovery. This can occur as a result of inevitable I/O errors when
trying to apply the log on read-only ATA devices since the ATA layer
does not support reporting a device as read-only.

Now, if there's an error during log recovery, fall back to norecovery
mode and mark the filesystem as read-only in the XFS and VFS layers.

This roughly approximates the 'errors=remount-ro' mount option in ext4
but is implicit and the scope only covers errors during log recovery.
Since XFS is the default filesystem for some distributions, this change
allows users to continue to use XFS on these read-only ATA devices.

Reviewed-by: Aaron Sierra <asierra@xes-inc.com>
Signed-off-by: Vincent Fazio <vfazio@xes-inc.com>
---
 fs/xfs/xfs_log.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Eric Sandeen Feb. 10, 2020, 9:43 p.m. UTC | #1
On 2/10/20 3:10 PM, Vincent Fazio wrote:
> Previously, XFS would fail to mount if there was an error during log
> recovery. This can occur as a result of inevitable I/O errors when
> trying to apply the log on read-only ATA devices since the ATA layer
> does not support reporting a device as read-only.
> 
> Now, if there's an error during log recovery, fall back to norecovery
> mode and mark the filesystem as read-only in the XFS and VFS layers.
> 
> This roughly approximates the 'errors=remount-ro' mount option in ext4
> but is implicit and the scope only covers errors during log recovery.
> Since XFS is the default filesystem for some distributions, this change
> allows users to continue to use XFS on these read-only ATA devices.

What is the workload or scenario where you need this behavior?

I'm not a big fan of ~silently mounting a filesystem with latent errors,
tbh, but maybe you can explain a bit more about the problem you're solving
here?

Thanks,
-Eric

> Reviewed-by: Aaron Sierra <asierra@xes-inc.com>
> Signed-off-by: Vincent Fazio <vfazio@xes-inc.com>
> ---
>  fs/xfs/xfs_log.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index f6006d94a581..f5b3528ee028 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -739,7 +739,6 @@ xfs_log_mount(
>  			xfs_warn(mp, "log mount/recovery failed: error %d",
>  				error);
>  			xlog_recover_cancel(mp->m_log);
> -			goto out_destroy_ail;
>  		}
>  	}
>  
> @@ -3873,10 +3872,17 @@ xfs_log_force_umount(
>  	/*
>  	 * If this happens during log recovery, don't worry about
>  	 * locking; the log isn't open for business yet.
> +	 *
> +	 * Attempt a read-only, norecovery mount. Ensure the VFS layer is updated.
>  	 */
>  	if (!log ||
>  	    log->l_flags & XLOG_ACTIVE_RECOVERY) {
> -		mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
> +
> +		xfs_notice(mp,
> +"Falling back to no-recovery mode. Filesystem will be inconsistent.");
> +		mp->m_flags |= (XFS_MOUNT_RDONLY | XFS_MOUNT_NORECOVERY);
> +		mp->m_super->s_flags |= SB_RDONLY;
> +
>  		if (mp->m_sb_bp)
>  			mp->m_sb_bp->b_flags |= XBF_DONE;
>  		return 0;
>
Aaron Sierra Feb. 10, 2020, 10:31 p.m. UTC | #2
> From: "Eric Sandeen" <sandeen@sandeen.net>
> Sent: Monday, February 10, 2020 3:43:50 PM

> On 2/10/20 3:10 PM, Vincent Fazio wrote:
>> Previously, XFS would fail to mount if there was an error during log
>> recovery. This can occur as a result of inevitable I/O errors when
>> trying to apply the log on read-only ATA devices since the ATA layer
>> does not support reporting a device as read-only.
>> 
>> Now, if there's an error during log recovery, fall back to norecovery
>> mode and mark the filesystem as read-only in the XFS and VFS layers.
>> 
>> This roughly approximates the 'errors=remount-ro' mount option in ext4
>> but is implicit and the scope only covers errors during log recovery.
>> Since XFS is the default filesystem for some distributions, this change
>> allows users to continue to use XFS on these read-only ATA devices.
> 
> What is the workload or scenario where you need this behavior?
> 
> I'm not a big fan of ~silently mounting a filesystem with latent errors,
> tbh, but maybe you can explain a bit more about the problem you're solving
> here?

Hi Eric,

We use SSDs from multiple vendors that can be configured at power-on (via
GPIO) to be read-write or write-protected. When write-protected we get I/O
errors for any writes that reach the device. We believe that behavior is
correct.

We have found that XFS fails during log recovery even when the log is clean
(apparently due to metadata writes immediately before actual recovery).
Vincent and I believe that mounting read-only without recovery should be
fine even when the log is not clean, since the filesystem will be consistent,
even if out-of-date.
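
For reference, that is essentially what one gets today with an explicit

    mount -o ro,norecovery /dev/sdX /mnt

(the device and mountpoint are placeholders); the point below is that we'd
prefer not to have to vary mount options between systems.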

Our customers' use often requires nonvolatile memory to be write-protected
or not based on the device being installed in a development or deployed
system. It is ideal for them to be able to mount their filesystems read-
write when possible and read-only when not without having to alter mount
options.

Aaron

> Thanks,
> -Eric
> 
>> Reviewed-by: Aaron Sierra <asierra@xes-inc.com>
>> Signed-off-by: Vincent Fazio <vfazio@xes-inc.com>
>> ---
>>  fs/xfs/xfs_log.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>> index f6006d94a581..f5b3528ee028 100644
>> --- a/fs/xfs/xfs_log.c
>> +++ b/fs/xfs/xfs_log.c
>> @@ -739,7 +739,6 @@ xfs_log_mount(
>>  			xfs_warn(mp, "log mount/recovery failed: error %d",
>>  				error);
>>  			xlog_recover_cancel(mp->m_log);
>> -			goto out_destroy_ail;
>>  		}
>>  	}
>>  
>> @@ -3873,10 +3872,17 @@ xfs_log_force_umount(
>>  	/*
>>  	 * If this happens during log recovery, don't worry about
>>  	 * locking; the log isn't open for business yet.
>> +	 *
>> +	 * Attempt a read-only, norecovery mount. Ensure the VFS layer is updated.
>>  	 */
>>  	if (!log ||
>>  	    log->l_flags & XLOG_ACTIVE_RECOVERY) {
>> -		mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
>> +
>> +		xfs_notice(mp,
>> +"Falling back to no-recovery mode. Filesystem will be inconsistent.");
>> +		mp->m_flags |= (XFS_MOUNT_RDONLY | XFS_MOUNT_NORECOVERY);
>> +		mp->m_super->s_flags |= SB_RDONLY;
>> +
>>  		if (mp->m_sb_bp)
>>  			mp->m_sb_bp->b_flags |= XBF_DONE;
>>  		return 0;
Eric Sandeen Feb. 10, 2020, 11:40 p.m. UTC | #3
On 2/10/20 4:31 PM, Aaron Sierra wrote:
>> From: "Eric Sandeen" <sandeen@sandeen.net>
>> Sent: Monday, February 10, 2020 3:43:50 PM
> 
>> On 2/10/20 3:10 PM, Vincent Fazio wrote:
>>> Previously, XFS would fail to mount if there was an error during log
>>> recovery. This can occur as a result of inevitable I/O errors when
>>> trying to apply the log on read-only ATA devices since the ATA layer
>>> does not support reporting a device as read-only.
>>>
>>> Now, if there's an error during log recovery, fall back to norecovery
>>> mode and mark the filesystem as read-only in the XFS and VFS layers.
>>>
>>> This roughly approximates the 'errors=remount-ro' mount option in ext4
>>> but is implicit and the scope only covers errors during log recovery.
>>> Since XFS is the default filesystem for some distributions, this change
>>> allows users to continue to use XFS on these read-only ATA devices.
>>
>> What is the workload or scenario where you need this behavior?
>>
>> I'm not a big fan of ~silently mounting a filesystem with latent errors,
>> tbh, but maybe you can explain a bit more about the problem you're solving
>> here?
> 
> Hi Eric,
> 
> We use SSDs from multiple vendors that can be configured at power-on (via
> GPIO) to be read-write or write-protected. When write-protected we get I/O
> errors for any writes that reach the device. We believe that behavior is
> correct.
> 
> We have found that XFS fails during log recovery even when the log is clean
> (apparently due to metadata writes immediately before actual recovery).

There should be no log recovery if it's clean ...

And I don't see that here - a clean log on a readonly device simply mounts
RO for me by default, with no muss, no fuss.

# mkfs.xfs -f fsfile
...
# losetup /dev/loop0 fsfile
# mount /dev/loop0 mnt
# touch mnt/blah
# umount mnt
# blockdev --setro /dev/loop0
# dd if=/dev/zero of=/dev/loop0 bs=4k count=1
dd: error writing ‘/dev/loop0’: Operation not permitted
# mount /dev/loop0 mnt
mount: /dev/loop0 is write-protected, mounting read-only
# dmesg
[  419.941649] /dev/loop0: Can't open blockdev
[  419.947106] XFS (loop0): Mounting V5 Filesystem
[  419.952895] XFS (loop0): Ending clean mount
# uname -r
5.5.0

> Vincent and I believe that mounting read-only without recovery should be
> fine even when the log is not clean, since the filesystem will be consistent,
> even if out-of-date.

I think that you may be making too many assumptions here, i.e. that "log
recovery failure leaves the filesystem in a consistent state" - and that
may not be true in all cases.

IOWS, transitioning to a new RO state for your particular case may be safe,
but I'm not sure that's universally true for all log replay failures.

> Our customers' use often requires nonvolatile memory to be write-protected
> or not based on the device being installed in a development or deployed
> system. It is ideal for them to be able to mount their filesystems read-
> write when possible and read-only when not without having to alter mount
> options.

From my example above, I'd like to understand more why/how you have a
clean log that fails to mount by default on a readonly block device...
in my testing, no writes get sent to the device when mounting a clean
log.

-Eric
Brian Foster Feb. 11, 2020, 12:55 p.m. UTC | #4
On Mon, Feb 10, 2020 at 05:40:03PM -0600, Eric Sandeen wrote:
> On 2/10/20 4:31 PM, Aaron Sierra wrote:
> >> From: "Eric Sandeen" <sandeen@sandeen.net>
> >> Sent: Monday, February 10, 2020 3:43:50 PM
> > 
> >> On 2/10/20 3:10 PM, Vincent Fazio wrote:
> >>> Previously, XFS would fail to mount if there was an error during log
> >>> recovery. This can occur as a result of inevitable I/O errors when
> >>> trying to apply the log on read-only ATA devices since the ATA layer
> >>> does not support reporting a device as read-only.
> >>>
> >>> Now, if there's an error during log recovery, fall back to norecovery
> >>> mode and mark the filesystem as read-only in the XFS and VFS layers.
> >>>
> >>> This roughly approximates the 'errors=remount-ro' mount option in ext4
> >>> but is implicit and the scope only covers errors during log recovery.
> >>> Since XFS is the default filesystem for some distributions, this change
> >>> allows users to continue to use XFS on these read-only ATA devices.
> >>
> >> What is the workload or scenario where you need this behavior?
> >>
> >> I'm not a big fan of ~silently mounting a filesystem with latent errors,
> >> tbh, but maybe you can explain a bit more about the problem you're solving
> >> here?
> > 
> > Hi Eric,
> > 
> > We use SSDs from multiple vendors that can be configured at power-on (via
> > GPIO) to be read-write or write-protected. When write-protected we get I/O
> > errors for any writes that reach the device. We believe that behavior is
> > correct.
> > 
> > We have found that XFS fails during log recovery even when the log is clean
> > (apparently due to metadata writes immediately before actual recovery).
> 
> There should be no log recovery if it's clean ...
> 
> And I don't see that here - a clean log on a readonly device simply mounts
> RO for me by default, with no muss, no fuss.
> 
> # mkfs.xfs -f fsfile
> ...
> # losetup /dev/loop0 fsfile
> # mount /dev/loop0 mnt
> # touch mnt/blah
> # umount mnt
> # blockdev --setro /dev/loop0
> # dd if=/dev/zero of=/dev/loop0 bs=4k count=1
> dd: error writing ‘/dev/loop0’: Operation not permitted
> # mount /dev/loop0 mnt
> mount: /dev/loop0 is write-protected, mounting read-only
> # dmesg
> [  419.941649] /dev/loop0: Can't open blockdev
> [  419.947106] XFS (loop0): Mounting V5 Filesystem
> [  419.952895] XFS (loop0): Ending clean mount
> # uname -r
> 5.5.0
> 
> > Vincent and I believe that mounting read-only without recovery should be
> > fine even when the log is not clean, since the filesystem will be consistent,
> > even if out-of-date.
> 
> I think that you may be making too many assumptions here, i.e. that "log
> recovery failure leaves the filesystem in a consistent state" - and that
> may not be true in all cases.
> 
> IOWS, transitioning to a new RO state for your particular case may be safe,
> but I'm not sure that's universally true for all log replay failures.
> 

Agreed. Just to double down on this bit, this is definitely a misguided
assumption. Generally speaking, XFS logging places ordering rules on
metadata writes to the filesystem such that we can guarantee we can
always recover to a consistent point after a crash. By skipping recovery
of a dirty log, you are actively bypassing that mechanism.

For example, if a filesystem transaction modifies several objects, those
objects are logged in a transaction and committed to the physical log.
Once the transaction is committed to the physical log, the individual
objects are free to be written back in any arbitrary order because of
the transactional guarantee that log recovery provides. So nothing
prevents one object from being written back while another is reused (and
re-pinned) before a crash that leaves the filesystem in a corrupted
state. Log recovery is required to update the associated metadata
objects and make the fs consistent again.
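
To make that concrete, here is a tiny user-space sketch -- not XFS code, and
the struct, block values, and output are invented purely for illustration --
of why a committed-but-unreplayed transaction leaves metadata inconsistent
if recovery is skipped:

#include <stdio.h>

struct txn {
    int new_a;      /* value destined for metadata block A */
    int new_b;      /* value destined for metadata block B */
    int committed;  /* nonzero once the txn is in the physical log */
};

int main(void)
{
    int disk_a = 1, disk_b = 1;         /* consistent on-disk state: A == B */
    struct txn log = { 2, 2, 1 };       /* committed transaction: A = B = 2 */

    disk_a = log.new_a;                 /* writeback of A completes */
    /* ...crash before B is written back... */

    printf("skip recovery:  A=%d B=%d -> %s\n", disk_a, disk_b,
           disk_a == disk_b ? "consistent" : "INCONSISTENT");

    if (log.committed) {                /* log replay on the next mount */
        disk_a = log.new_a;
        disk_b = log.new_b;
    }

    printf("after recovery: A=%d B=%d -> %s\n", disk_a, disk_b,
           disk_a == disk_b ? "consistent" : "INCONSISTENT");
    return 0;
}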

In short, it's probably safer to assume any filesystem mounted with a
dirty log and norecovery is in fact corrupted as opposed to the other
way around.

Brian

> > Our customers' use often requires nonvolatile memory to be write-protected
> > or not based on the device being installed in a development or deployed
> > system. It is ideal for them to be able to mount their filesystems read-
> > write when possible and read-only when not without having to alter mount
> > options.
> 
> From my example above, I'd like to understand more why/how you have a
> clean log that fails to mount by default on a readonly block device...
> in my testing, no writes get sent to the device when mounting a clean
> log.
> 
> -Eric
>
Vincent Fazio Feb. 11, 2020, 2:04 p.m. UTC | #5
All,

On 2/11/20 6:55 AM, Brian Foster wrote:
> On Mon, Feb 10, 2020 at 05:40:03PM -0600, Eric Sandeen wrote:
>> On 2/10/20 4:31 PM, Aaron Sierra wrote:
>>>> From: "Eric Sandeen" <sandeen@sandeen.net>
>>>> Sent: Monday, February 10, 2020 3:43:50 PM
>>>> On 2/10/20 3:10 PM, Vincent Fazio wrote:
>>>>> Previously, XFS would fail to mount if there was an error during log
>>>>> recovery. This can occur as a result of inevitable I/O errors when
>>>>> trying to apply the log on read-only ATA devices since the ATA layer
>>>>> does not support reporting a device as read-only.
>>>>>
>>>>> Now, if there's an error during log recovery, fall back to norecovery
>>>>> mode and mark the filesystem as read-only in the XFS and VFS layers.
>>>>>
>>>>> This roughly approximates the 'errors=remount-ro' mount option in ext4
>>>>> but is implicit and the scope only covers errors during log recovery.
>>>>> Since XFS is the default filesystem for some distributions, this change
>>>>> allows users to continue to use XFS on these read-only ATA devices.
>>>> What is the workload or scenario where you need this behavior?
>>>>
>>>> I'm not a big fan of ~silently mounting a filesystem with latent errors,
>>>> tbh, but maybe you can explain a bit more about the problem you're solving
>>>> here?
>>> Hi Eric,
>>>
>>> We use SSDs from multiple vendors that can be configured at power-on (via
>>> GPIO) to be read-write or write-protected. When write-protected we get I/O
>>> errors for any writes that reach the device. We believe that behavior is
>>> correct.
>>>
>>> We have found that XFS fails during log recovery even when the log is clean
>>> (apparently due to metadata writes immediately before actual recovery).
>> There should be no log recovery if it's clean ...
>>
>> And I don't see that here - a clean log on a readonly device simply mounts
>> RO for me by default, with no muss, no fuss.
>>
>> # mkfs.xfs -f fsfile
>> ...
>> # losetup /dev/loop0 fsfile
>> # mount /dev/loop0 mnt
>> # touch mnt/blah
>> # umount mnt
>> # blockdev --setro /dev/loop0
>> # dd if=/dev/zero of=/dev/loop0 bs=4k count=1
>> dd: error writing ‘/dev/loop0’: Operation not permitted
>> # mount /dev/loop0 mnt
>> mount: /dev/loop0 is write-protected, mounting read-only
>> # dmesg
>> [  419.941649] /dev/loop0: Can't open blockdev
>> [  419.947106] XFS (loop0): Mounting V5 Filesystem
>> [  419.952895] XFS (loop0): Ending clean mount
>> # uname -r
>> 5.5.0
>>
I think it's important to note that you're calling `blockdev --setro` 
here, which sets the device RO at the block layer...

As mentioned in the commit message, the SSDs we work with are ATA 
devices and there is no such mechanism in the ATA spec to report to the 
block layer that the device is RO. What we run into is this:

xfs_log_mount
     xfs_log_recover
         xfs_find_tail
             xfs_clear_stale_blocks
                 xlog_write_log_records
                     xlog_bwrite

The xlog_bwrite fails and triggers the call to xfs_force_shutdown. In
this specific scenario, we know the log is clean because XFS_MOUNT_WAS_CLEAN
is set in the log flags; however, the stale blocks cannot be cleared because
the device is write-protected. The call to xfs_clear_stale_blocks cannot be
skipped because, as mentioned before, ATA devices do not have a mechanism to
report that they're read-only.

>>> Vincent and I believe that mounting read-only without recovery should be
>>> fine even when the log is not clean, since the filesystem will be consistent,
>>> even if out-of-date.
>> I think that you may be making too many assumptions here, i.e. that "log
>> recovery failure leaves the filesystem in a consistent state" - and that
>> may not be true in all cases.
>>
>> IOWS, transitioning to a new RO state for your particular case may be safe,
>> but I'm not sure that's universally true for all log replay failures.
>>
> Agreed. Just to double down on this bit, this is definitely a misguided
> assumption. Generally speaking, XFS logging places ordering rules on
> metadata writes to the filesystem such that we can guarantee we can
> always recover to a consistent point after a crash. By skipping recovery
> of a dirty log, you are actively bypassing that mechanism.
>
> For example, if a filesystem transaction modifies several objects, those
> objects are logged in a transaction and committed to the physical log.
> Once the transaction is committed to the physical log, the individual
> objects are free to be written back in any arbitrary order because of
> the transactional guarantee that log recovery provides. So nothing
> prevents one object from being written back while another is reused (and
> re-pinned) before a crash that leaves the filesystem in a corrupted
> state. Log recovery is required to update the associated metadata
> objects and make the fs consistent again.
>
> In short, it's probably safer to assume any filesystem mounted with a
> dirty log and norecovery is in fact corrupted as opposed to the other
> way around.
>
> Brian
>
>>> Our customers' use often requires nonvolatile memory to be write-protected
>>> or not based on the device being installed in a development or deployed
>>> system. It is ideal for them to be able to mount their filesystems read-
>>> write when possible and read-only when not without having to alter mount
>>> options.
>>  From my example above, I'd like to understand more why/how you have a
>> clean log that fails to mount by default on a readonly block device...
>> in my testing, no writes get sent to the device when mounting a clean
>> log.
>>
>> -Eric
>>
Eric Sandeen Feb. 11, 2020, 2:29 p.m. UTC | #6
On 2/11/20 8:04 AM, Vincent Fazio wrote:
> All,

...

> As mentioned in the commit message, the SSDs we work with are ATA devices and there is no such mechanism in the ATA spec to report to the block layer that the device is RO. What we run into is this:
> 
> xfs_log_mount
>     xfs_log_recover
>         xfs_find_tail
>             xfs_clear_stale_blocks
>                 xlog_write_log_records
>                     xlog_bwrite
> 
> the xlog_bwrite fails and triggers the call to xfs_force_shutdown. In this specific scenario, we know the log is clean as XFS_MOUNT_WAS_CLEAN is set in the log flags, however the stale blocks cannot be removed due to the device being write-protected. the call to xfs_clear_stale_blocks cannot be obviated because, as mentioned before, ATA devices do not have a mechanism to report that they're read-only.

Ok, at least now we see where the writes are coming from.  A device that
is /marked/ readonly won't get into xfs_clear_stale_blocks.  I'm not sure
if we could just skip the xfs_clear_stale_blocks call if XFS_MOUNT_WAS_CLEAN
is set, or if head == tail and no recovery is needed.  If so, then maybe
rearranging the call to xfs_clear_stale_blocks could help.  I'll let people
who know more log details than I do chime in on that though.

-Eric
Darrick J. Wong Feb. 11, 2020, 3:10 p.m. UTC | #7
On Tue, Feb 11, 2020 at 08:29:59AM -0600, Eric Sandeen wrote:
> On 2/11/20 8:04 AM, Vincent Fazio wrote:
> > All,
> 
> ...
> 
> > As mentioned in the commit message, the SSDs we work with are ATA devices and there is no such mechanism in the ATA spec to report to the block layer that the device is RO. What we run into is this:
> > 
> > xfs_log_mount
> >     xfs_log_recover
> >         xfs_find_tail
> >             xfs_clear_stale_blocks
> >                 xlog_write_log_records
> >                     xlog_bwrite
> > 
> > the xlog_bwrite fails and triggers the call to xfs_force_shutdown.

If your device doesn't accept write requests then mark it read only.

--D

> > In this specific scenario, we know the log is clean as
> > XFS_MOUNT_WAS_CLEAN is set in the log flags, however the stale
> > blocks cannot be removed due to the device being write-protected.
> > the call to xfs_clear_stale_blocks cannot be obviated because, as
> > mentioned before, ATA devices do not have a mechanism to report that
> > they're read-only.
> 
> Ok, at least now we see where the writes are coming from.  A device that
> is /marked/ readonly won't get into xfs_clear_stale_blocks.  I'm not sure
> if we could just skip the xfs_clear_stale_blocks call if XFS_MOUNT_WAS_CLEAN
> is set, or if head == tail and no recovery is needed.  If so, then maybe
> rearranging the call to xfs_clear_stale_blocks could help.  I'll let people
> who know more log details than I do chime in on that though.
> 
> -Eric
Dave Chinner Feb. 11, 2020, 8:04 p.m. UTC | #8
On Tue, Feb 11, 2020 at 08:04:01AM -0600, Vincent Fazio wrote:
> All,
> 
> On 2/11/20 6:55 AM, Brian Foster wrote:
> > On Mon, Feb 10, 2020 at 05:40:03PM -0600, Eric Sandeen wrote:
> > > On 2/10/20 4:31 PM, Aaron Sierra wrote:
> > > > > From: "Eric Sandeen" <sandeen@sandeen.net>
> > > > > Sent: Monday, February 10, 2020 3:43:50 PM
> > > > > On 2/10/20 3:10 PM, Vincent Fazio wrote:
> > > > > > Previously, XFS would fail to mount if there was an error during log
> > > > > > recovery. This can occur as a result of inevitable I/O errors when
> > > > > > trying to apply the log on read-only ATA devices since the ATA layer
> > > > > > does not support reporting a device as read-only.
> > > > > > 
> > > > > > Now, if there's an error during log recovery, fall back to norecovery
> > > > > > mode and mark the filesystem as read-only in the XFS and VFS layers.
> > > > > > 
> > > > > > This roughly approximates the 'errors=remount-ro' mount option in ext4
> > > > > > but is implicit and the scope only covers errors during log recovery.
> > > > > > Since XFS is the default filesystem for some distributions, this change
> > > > > > allows users to continue to use XFS on these read-only ATA devices.
> > > > > What is the workload or scenario where you need this behavior?
> > > > > 
> > > > > I'm not a big fan of ~silently mounting a filesystem with latent errors,
> > > > > tbh, but maybe you can explain a bit more about the problem you're solving
> > > > > here?
> > > > Hi Eric,
> > > > 
> > > > We use SSDs from multiple vendors that can be configured at power-on (via
> > > > GPIO) to be read-write or write-protected. When write-protected we get I/O
> > > > errors for any writes that reach the device. We believe that behavior is
> > > > correct.
> > > > 
> > > > We have found that XFS fails during log recovery even when the log is clean
> > > > (apparently due to metadata writes immediately before actual recovery).
> > > There should be no log recovery if it's clean ...
> > > 
> > > And I don't see that here - a clean log on a readonly device simply mounts
> > > RO for me by default, with no muss, no fuss.
> > > 
> > > # mkfs.xfs -f fsfile
> > > ...
> > > # losetup /dev/loop0 fsfile
> > > # mount /dev/loop0 mnt
> > > # touch mnt/blah
> > > # umount mnt
> > > # blockdev --setro /dev/loop0
> > > # dd if=/dev/zero of=/dev/loop0 bs=4k count=1
> > > dd: error writing ‘/dev/loop0’: Operation not permitted
> > > # mount /dev/loop0 mnt
> > > mount: /dev/loop0 is write-protected, mounting read-only
> > > # dmesg
> > > [  419.941649] /dev/loop0: Can't open blockdev
> > > [  419.947106] XFS (loop0): Mounting V5 Filesystem
> > > [  419.952895] XFS (loop0): Ending clean mount
> > > # uname -r
> > > 5.5.0
> > > 
> I think it's important to note that you're calling `blockdev --setro` here,
> which sets the device RO at the block layer...
> 
> As mentioned in the commit message, the SSDs we work with are ATA devices
> and there is no such mechanism in the ATA spec to report to the block layer
> that the device is RO. What we run into is this:

This sounds like you are trying to solve the wrong problem - this
isn't actually a filesystem issue. The fundamental problem is you
have a read-only device that isn't being marked by the kernel as
read-only, and everything goes wrong after that.

Write a udev rule to catch these SSDs at instantiation time and mark
them read only via software. That way everything understands the
device is read only and behaves correctly, rather than need to make
every layer above the block device understand that a read-write
device is actually read-only...
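
Roughly something like this (untested sketch; the model string is a
placeholder to be filled in from `udevadm info -a`, the blockdev path may
differ by distro, and you'd need some hardware-specific way to key off the
GPIO/write-protect state):

# /etc/udev/rules.d/99-wp-ssd.rules (sketch)
# Match the write-protected SSD by its reported model and mark the whole
# disk read-only at the block layer before anything tries to mount it.
ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", \
  ATTRS{model}=="EXAMPLE-SSD-MODEL", RUN+="/sbin/blockdev --setro $devnode"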

Cheers,

Dave.

Patch

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index f6006d94a581..f5b3528ee028 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -739,7 +739,6 @@  xfs_log_mount(
 			xfs_warn(mp, "log mount/recovery failed: error %d",
 				error);
 			xlog_recover_cancel(mp->m_log);
-			goto out_destroy_ail;
 		}
 	}
 
@@ -3873,10 +3872,17 @@  xfs_log_force_umount(
 	/*
 	 * If this happens during log recovery, don't worry about
 	 * locking; the log isn't open for business yet.
+	 *
+	 * Attempt a read-only, norecovery mount. Ensure the VFS layer is updated.
 	 */
 	if (!log ||
 	    log->l_flags & XLOG_ACTIVE_RECOVERY) {
-		mp->m_flags |= XFS_MOUNT_FS_SHUTDOWN;
+
+		xfs_notice(mp,
+"Falling back to no-recovery mode. Filesystem will be inconsistent.");
+		mp->m_flags |= (XFS_MOUNT_RDONLY | XFS_MOUNT_NORECOVERY);
+		mp->m_super->s_flags |= SB_RDONLY;
+
 		if (mp->m_sb_bp)
 			mp->m_sb_bp->b_flags |= XBF_DONE;
 		return 0;