
[v2] exfat: check disk status during buffer write

Message ID 20240723105412.3615926-1-dongliang.cui@unisoc.com (mailing list archive)
State New

Commit Message

Dongliang Cui July 23, 2024, 10:54 a.m. UTC
We found that when a large file is written through buffered writes and
the disk becomes inaccessible, exFAT does not return an error, so the
writing process does not stop.

To easily reproduce this issue, you can follow the steps below:

1. format a device to exFAT and then mount (with a full disk erase)
2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192
3. eject the device

You may find that the dd process does not stop immediately and may
continue for a long time.

The root cause is that during a buffered write, exFAT does not need to
access the disk to look up directory entries or the FAT (whereas FAT
would) every time data is written. Instead, exFAT simply marks the
buffer dirty and returns, leaving the actual I/O to the writeback
process.

If the disk cannot be accessed at this point, the error is returned
only to the writeback process. The process that issued the write never
sees it, so it cannot be reported to user space.

When the disk cannot be accessed normally, an error should be returned
to stop the writing process.

Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
---
Changes in v2:
 - Refer to the block_device_ejected in ext4 for determining the
   device status.
 - Change the disk_check process to exfat_get_block to cover all
   buffer write scenarios.
---
---
 fs/exfat/inode.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
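
The behaviour described in the commit message can also be seen from
user space: a buffered write() may succeed as soon as the page cache is
dirtied, and an I/O error from the device typically only surfaces when
the caller waits for writeback, for example through fsync(). The
following is a minimal user-space sketch, assuming a Linux/POSIX
system. write_and_sync() is a hypothetical helper introduced for
illustration and is not part of this patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path using buffered
 * I/O, then call fsync() so that an error hit during writeback is
 * reported to the caller. Returns 0 on success or a negative errno.
 */
static int write_and_sync(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, err = 0;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			err = -errno;
			goto out;
		}
		if (n == 0) {
			err = -EIO;	/* no progress, give up */
			goto out;
		}
		p += n;
		len -= (size_t)n;
	}

	/* write() only dirtied the page cache; fsync() waits for the device. */
	if (fsync(fd) < 0)
		err = -errno;
out:
	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}
```

A caller such as dd without conv=fsync never makes the fsync() call, so
it depends on write() itself failing once the device is gone, which is
the case this patch tries to cover.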

Comments

Sungjong Seo July 24, 2024, 7:03 a.m. UTC | #1
> We found that when writing a large file through buffer write, if the
> disk is inaccessible, exFAT does not return an error normally, which
> leads to the writing process not stopping properly.
> 
> To easily reproduce this issue, you can follow the steps below:
> 
> 1. format a device to exFAT and then mount (with a full disk erase)
> 2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192
> 3. eject the device
> 
> You may find that the dd process does not stop immediately and may
> continue for a long time.
> 
> The root cause of this issue is that during buffer write process,
> exFAT does not need to access the disk to look up directory entries
> or the FAT table (whereas FAT would do) every time data is written.
> Instead, exFAT simply marks the buffer as dirty and returns,
> delegating the writeback operation to the writeback process.
> 
> If the disk cannot be accessed at this time, the error will only be
> returned to the writeback process, and the original process will not
> receive the error, so it cannot be returned to the user side.
> 
> When the disk cannot be accessed normally, an error should be returned
> to stop the writing process.
> 
> Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com>
> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> ---
> Changes in v2:
>  - Refer to the block_device_ejected in ext4 for determining the
>    device status.
>  - Change the disk_check process to exfat_get_block to cover all
>    buffer write scenarios.
> ---
> ---
>  fs/exfat/inode.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
> index dd894e558c91..463cebb19852 100644
> --- a/fs/exfat/inode.c
> +++ b/fs/exfat/inode.c
> @@ -8,6 +8,7 @@
>  #include <linux/mpage.h>
>  #include <linux/bio.h>
>  #include <linux/blkdev.h>
> +#include <linux/backing-dev-defs.h>
>  #include <linux/time.h>
>  #include <linux/writeback.h>
>  #include <linux/uio.h>
> @@ -275,6 +276,13 @@ static int exfat_map_new_buffer(struct
> exfat_inode_info *ei,
>  	return 0;
>  }
> 
> +static int exfat_block_device_ejected(struct super_block *sb)
> +{
> +	struct backing_dev_info *bdi = sb->s_bdi;
> +
> +	return bdi->dev == NULL;
> +}
Have you tested with this again?

> +
>  static int exfat_get_block(struct inode *inode, sector_t iblock,
>  		struct buffer_head *bh_result, int create)
>  {
> @@ -290,6 +298,9 @@ static int exfat_get_block(struct inode *inode,
> sector_t iblock,
>  	sector_t valid_blks;
>  	loff_t pos;
> 
> +	if (exfat_block_device_ejected(sb))
This looks better than the modified location in the last patch.
However, the caller of this function may not be interested in exfat
error handling, so here we should call exfat_fs_error_ratelimit()
with an appropriate error message.

> +		return -ENODEV;
> +
>  	mutex_lock(&sbi->s_lock);
>  	last_block = EXFAT_B_TO_BLK_ROUND_UP(i_size_read(inode), sb);
>  	if (iblock >= last_block && !create)
> --
> 2.25.1
dongliang cui July 24, 2024, 7:24 a.m. UTC | #2
On Wed, Jul 24, 2024 at 3:03 PM Sungjong Seo <sj1557.seo@samsung.com> wrote:
>
> > We found that when writing a large file through buffer write, if the
> > disk is inaccessible, exFAT does not return an error normally, which
> > leads to the writing process not stopping properly.
> >
> > To easily reproduce this issue, you can follow the steps below:
> >
> > 1. format a device to exFAT and then mount (with a full disk erase)
> > 2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192
> > 3. eject the device
> >
> > You may find that the dd process does not stop immediately and may
> > continue for a long time.
> >
> > The root cause of this issue is that during buffer write process,
> > exFAT does not need to access the disk to look up directory entries
> > or the FAT table (whereas FAT would do) every time data is written.
> > Instead, exFAT simply marks the buffer as dirty and returns,
> > delegating the writeback operation to the writeback process.
> >
> > If the disk cannot be accessed at this time, the error will only be
> > returned to the writeback process, and the original process will not
> > receive the error, so it cannot be returned to the user side.
> >
> > When the disk cannot be accessed normally, an error should be returned
> > to stop the writing process.
> >
> > Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com>
> > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > ---
> > Changes in v2:
> >  - Refer to the block_device_ejected in ext4 for determining the
> >    device status.
> >  - Change the disk_check process to exfat_get_block to cover all
> >    buffer write scenarios.
> > ---
> > ---
> >  fs/exfat/inode.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
> > index dd894e558c91..463cebb19852 100644
> > --- a/fs/exfat/inode.c
> > +++ b/fs/exfat/inode.c
> > @@ -8,6 +8,7 @@
> >  #include <linux/mpage.h>
> >  #include <linux/bio.h>
> >  #include <linux/blkdev.h>
> > +#include <linux/backing-dev-defs.h>
> >  #include <linux/time.h>
> >  #include <linux/writeback.h>
> >  #include <linux/uio.h>
> > @@ -275,6 +276,13 @@ static int exfat_map_new_buffer(struct
> > exfat_inode_info *ei,
> >       return 0;
> >  }
> >
> > +static int exfat_block_device_ejected(struct super_block *sb)
> > +{
> > +     struct backing_dev_info *bdi = sb->s_bdi;
> > +
> > +     return bdi->dev == NULL;
> > +}
> Have you tested with this again?
Yes, I tested it in this way. The user side can receive the -ENODEV error
after the device is ejected.
dongliang.cui@deivice:/data/tmp # dd if=/dev/zero of=test.img bs=1M count=10240
dd: test.img: write error: No such device
1274+0 records in
1273+1 records out
1335635968 bytes (1.2 G) copied, 8.060 s, 158 M/s

>
> > +
> >  static int exfat_get_block(struct inode *inode, sector_t iblock,
> >               struct buffer_head *bh_result, int create)
> >  {
> > @@ -290,6 +298,9 @@ static int exfat_get_block(struct inode *inode,
> > sector_t iblock,
> >       sector_t valid_blks;
> >       loff_t pos;
> >
> > +     if (exfat_block_device_ejected(sb))
> This looks better than the modified location in the last patch.
> However, the caller of this function may not be interested in exfat
> error handling, so here we should call exfat_fs_error_ratelimit()
> with an appropriate error message.
Thank you for the reminder. I will make the changes in the next version.

>
> > +             return -ENODEV;
> > +
> >       mutex_lock(&sbi->s_lock);
> >       last_block = EXFAT_B_TO_BLK_ROUND_UP(i_size_read(inode), sb);
> >       if (iblock >= last_block && !create)
> > --
> > 2.25.1
>
>
Sungjong Seo July 24, 2024, 7:50 a.m. UTC | #3
> On Wed, Jul 24, 2024 at 3:03 PM Sungjong Seo <sj1557.seo@samsung.com>
> wrote:
> >
[snip]
> > >
> > > +static int exfat_block_device_ejected(struct super_block *sb)
> > > +{
> > > +     struct backing_dev_info *bdi = sb->s_bdi;
> > > +
> > > +     return bdi->dev == NULL;
> > > +}
> > Have you tested with this again?
> Yes, I tested it in this way. The user side can receive the -ENODEV error
> after the device is ejected.
> dongliang.cui@deivice:/data/tmp # dd if=/dev/zero of=test.img bs=1M
> count=10240
> dd: test.img: write error: No such device
> 1274+0 records in
> 1273+1 records out
> 1335635968 bytes (1.2 G) copied, 8.060 s, 158 M/s
Oops! write() now seems to return ENODEV, which the man page does not
list. In exfat_map_cluster it was necessary to distinguish between error
values when returning them, but now that explicitly differentiated error
messages will be printed, that is no longer needed. So why not return EIO
again? It seems appropriate to return EIO instead of ENODEV from the
read/write syscalls.

> 
> >
> > > +
> > >  static int exfat_get_block(struct inode *inode, sector_t iblock,
> > >               struct buffer_head *bh_result, int create)
> > >  {
> > > @@ -290,6 +298,9 @@ static int exfat_get_block(struct inode *inode,
> > > sector_t iblock,
> > >       sector_t valid_blks;
> > >       loff_t pos;
> > >
> > > +     if (exfat_block_device_ejected(sb))
> > This looks better than the modified location in the last patch.
> > However, the caller of this function may not be interested in exfat
> > error handling, so here we should call exfat_fs_error_ratelimit()
> > with an appropriate error message.
> Thank you for the reminder. I will make the changes in the next version.
Sounds good!

> 
> >
> > > +             return -ENODEV;
> > > +
> > >       mutex_lock(&sbi->s_lock);
> > >       last_block = EXFAT_B_TO_BLK_ROUND_UP(i_size_read(inode), sb);
> > >       if (iblock >= last_block && !create)
> > > --
> > > 2.25.1
> >
> >
dongliang cui July 24, 2024, 8:27 a.m. UTC | #4
On Wed, Jul 24, 2024 at 3:50 PM Sungjong Seo <sj1557.seo@samsung.com> wrote:
>
> > On Wed, Jul 24, 2024 at 3:03 PM Sungjong Seo <sj1557.seo@samsung.com>
> > wrote:
> > >
> [snip]
> > > >
> > > > +static int exfat_block_device_ejected(struct super_block *sb)
> > > > +{
> > > > +     struct backing_dev_info *bdi = sb->s_bdi;
> > > > +
> > > > +     return bdi->dev == NULL;
> > > > +}
> > > Have you tested with this again?
> > Yes, I tested it in this way. The user side can receive the -ENODEV error
> > after the device is ejected.
> > dongliang.cui@deivice:/data/tmp # dd if=/dev/zero of=test.img bs=1M
> > count=10240
> > dd: test.img: write error: No such device
> > 1274+0 records in
> > 1273+1 records out
> > 1335635968 bytes (1.2 G) copied, 8.060 s, 158 M/s
> Oops! write() now seems to return ENODEV, which the man page does not
> list. In exfat_map_cluster it was necessary to distinguish between error
> values when returning them, but now that explicitly differentiated error
> messages will be printed, that is no longer needed. So why not return EIO
> again? It seems appropriate to return EIO instead of ENODEV from the
> read/write syscalls.
Yes, indeed.
I will make the changes all together in the next version.
Thanks!
>
> >
> > >
> > > > +
> > > >  static int exfat_get_block(struct inode *inode, sector_t iblock,
> > > >               struct buffer_head *bh_result, int create)
> > > >  {
> > > > @@ -290,6 +298,9 @@ static int exfat_get_block(struct inode *inode,
> > > > sector_t iblock,
> > > >       sector_t valid_blks;
> > > >       loff_t pos;
> > > >
> > > > +     if (exfat_block_device_ejected(sb))
> > > This looks better than the modified location in the last patch.
> > > However, the caller of this function may not be interested in exfat
> > > error handling, so here we should call exfat_fs_error_ratelimit()
> > > with an appropriate error message.
> > Thank you for the reminder. I will make the changes in the next version.
> Sounds good!
>
> >
> > >
> > > > +             return -ENODEV;
> > > > +
> > > >       mutex_lock(&sbi->s_lock);
> > > >       last_block = EXFAT_B_TO_BLK_ROUND_UP(i_size_read(inode), sb);
> > > >       if (iblock >= last_block && !create)
> > > > --
> > > > 2.25.1
> > >
> > >
>
>
Hellwig, Christoph July 24, 2024, 1:07 p.m. UTC | #5
> +static int exfat_block_device_ejected(struct super_block *sb)
> +{
> +	struct backing_dev_info *bdi = sb->s_bdi;
> +
> +	return bdi->dev == NULL;
> +}

NAK, file systems have no business looking at this.  What you probably
really want is to implement the ->shutdown method for exfat so it gets
called on device removal.
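
The idea behind this suggestion can be sketched as a small user-space
model: instead of inspecting the backing device on every call, the file
system is told once that the device is gone, records that in its own
state, and fails later operations based on that flag. This is only an
illustration and not kernel code. struct model_sb, model_shutdown() and
model_get_block() are hypothetical names. In the kernel, the callback
would be the shutdown member of struct super_operations, and the flag
would live in exfat's own superblock info.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mount state a file system keeps. */
struct model_sb {
	bool shutdown;	/* set once the backing device has been removed */
};

/* Models a ->shutdown callback: called once on device removal. */
static void model_shutdown(struct model_sb *sb)
{
	sb->shutdown = true;
}

/* Models the block-mapping step on the buffered write path. */
static int model_get_block(const struct model_sb *sb)
{
	if (sb->shutdown)
		return -EIO;	/* EIO rather than ENODEV, as suggested in review */
	return 0;
}
```

With this shape the write path only tests file-system-private state, so
it does not need to look at sb->s_bdi at all.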
Sungjong Seo July 25, 2024, 6 a.m. UTC | #6
> > +static int exfat_block_device_ejected(struct super_block *sb)
> > +{
> > +	struct backing_dev_info *bdi = sb->s_bdi;
> > +
> > +	return bdi->dev == NULL;
> > +}
> 
> NAK, file systems have no business looking at this.  What you probably
> really want is to implement the ->shutdown method for exfat so it gets
> called on device removal.

Oh! Thank you for your additional comments. I completely missed this part.
I agree with what you said. Implementing ->shutdown seems to be the
right decision.
dongliang cui July 26, 2024, 6:18 a.m. UTC | #7
On Thu, Jul 25, 2024 at 2:00 PM Sungjong Seo <sj1557.seo@samsung.com> wrote:
>
> > > +static int exfat_block_device_ejected(struct super_block *sb)
> > > +{
> > > +   struct backing_dev_info *bdi = sb->s_bdi;
> > > +
> > > +   return bdi->dev == NULL;
> > > +}
> >
> > NAK, file systems have no business looking at this.  What you probably
> > really want is to implement the ->shutdown method for exfat so it gets
> > called on device removal.
>
> Oh! Thank you for your additional comments. I completely missed this part.
> I agree with what you said. Implementing ->shutdown seems to be the
> right decision.
>
Thank you for your suggestions. I'll test it out this way.

Patch

diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index dd894e558c91..463cebb19852 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -8,6 +8,7 @@ 
 #include <linux/mpage.h>
 #include <linux/bio.h>
 #include <linux/blkdev.h>
+#include <linux/backing-dev-defs.h>
 #include <linux/time.h>
 #include <linux/writeback.h>
 #include <linux/uio.h>
@@ -275,6 +276,13 @@  static int exfat_map_new_buffer(struct exfat_inode_info *ei,
 	return 0;
 }
 
+static int exfat_block_device_ejected(struct super_block *sb)
+{
+	struct backing_dev_info *bdi = sb->s_bdi;
+
+	return bdi->dev == NULL;
+}
+
 static int exfat_get_block(struct inode *inode, sector_t iblock,
 		struct buffer_head *bh_result, int create)
 {
@@ -290,6 +298,9 @@  static int exfat_get_block(struct inode *inode, sector_t iblock,
 	sector_t valid_blks;
 	loff_t pos;
 
+	if (exfat_block_device_ejected(sb))
+		return -ENODEV;
+
 	mutex_lock(&sbi->s_lock);
 	last_block = EXFAT_B_TO_BLK_ROUND_UP(i_size_read(inode), sb);
 	if (iblock >= last_block && !create)