Message ID | 20220411213346.762302-4-trondmy@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Ensure mapping errors are reported only once | expand |
在 2022/4/12 5:33, trondmy@kernel.org 写道: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > Any errors reported by the write() system call need to be cleared from > the file descriptor's error tracking. The current call to nfs_wb_all() > causes the error to be reported, but since it doesn't call > file_check_and_advance_wb_err(), we can end up reporting the same error > a second time when the application calls fsync(). > > Note that since Linux 4.13, the rule is that EIO may be reported for > write(), but it must be reported by a subsequent fsync(), so let's just > drop reporting it in write. > > The check for nfs_ctx_key_to_expire() is just a duplicate to the one > already in nfs_write_end(), so let's drop that too. > > Reported-by: ChenXiaoSong <chenxiaosong2@huawei.com> > Fixes: ce368536dd61 ("nfs: nfs_file_write() should check for writeback errors") > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > --- > fs/nfs/file.c | 33 +++++++++++++-------------------- > 1 file changed, 13 insertions(+), 20 deletions(-) # 1. wb error mechanism of other filesystem Other filesystem only clear the wb error when calling fsync(), async write will not clear the wb error. # 2. still report unexpected error ... again After merging this patchset(5 patches), second `dd` of the following reproducer will still report unexpected error: No space left on device. Reproducer: nfs server | nfs client ------------------------------|--------------------------------------------- # No space left on server | fallocate -l 100G /svr/nospc | | mount -t nfs $nfs_server_ip:/ /mnt | | # Expected error: No space left on device | dd if=/dev/zero of=/mnt/file count=1 ibs=10K | | # Release space on mountpoint | rm /mnt/nospc | | # Unexpected error: No space left on device | dd if=/dev/zero of=/mnt/file count=1 ibs=10K # 3. my patchset https://patchwork.kernel.org/project/linux-nfs/list/?series=628066 My patchset can fix bug of above reproducer. filemap_sample_wb_err() always return 0 if old writeback error have not been consumed. filemap_check_wb_err() will return the old error even if there is no new writeback error between filemap_sample_wb_err() and filemap_check_wb_err(). ```c since = filemap_sample_wb_err() = 0 errseq_sample if (!(old & ERRSEQ_SEEN)) // nobody see the error return 0; nfs_wb_all // no new error error = filemap_check_wb_err(..., since) != 0 // unexpected error ``` So we still need record writeback error in address_space flags. The writeback error in address_space flags is not used to be reported to userspace, it is just used to detect if there is new error while writeback. if we want to report nuanced writeback error, it is better to detect wb error from filemap_check_errors(), and then return -(file->f_mapping->wb_err & MAX_ERRNO) to userspace without consume it. ```c nfs_mapping_set_error mapping_set_error __filemap_set_wb_err // record error sequence errseq_set set_bit(..., &mapping->flags) // record address_space flag // it is not used to be reported, just used to detect error = filemap_check_errors // -ENOSPC or -EIO test_and_clear_bit(..., &mapping->flags) // error bit cleared // now we try to return nuanced writeback error if (error) return filemap_check_wb_err(file->f_mapping, 0); return -(file->f_mapping->wb_err & MAX_ERRNO) ```
On Tue, 2022-04-12 at 14:24 +0800, chenxiaosong (A) wrote: > 在 2022/4/12 5:33, trondmy@kernel.org 写道: > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > Any errors reported by the write() system call need to be cleared > > from > > the file descriptor's error tracking. The current call to > > nfs_wb_all() > > causes the error to be reported, but since it doesn't call > > file_check_and_advance_wb_err(), we can end up reporting the same > > error > > a second time when the application calls fsync(). > > > > Note that since Linux 4.13, the rule is that EIO may be reported > > for > > write(), but it must be reported by a subsequent fsync(), so let's > > just > > drop reporting it in write. > > > > The check for nfs_ctx_key_to_expire() is just a duplicate to the > > one > > already in nfs_write_end(), so let's drop that too. > > > > Reported-by: ChenXiaoSong <chenxiaosong2@huawei.com> > > Fixes: ce368536dd61 ("nfs: nfs_file_write() should check for > > writeback errors") > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > fs/nfs/file.c | 33 +++++++++++++-------------------- > > 1 file changed, 13 insertions(+), 20 deletions(-) > > > # 1. wb error mechanism of other filesystem > > Other filesystem only clear the wb error when calling fsync(), async > write will not clear the wb error. > > > # 2. still report unexpected error ... again > > After merging this patchset(5 patches), second `dd` of the following > reproducer will still report unexpected error: No space left on > device. > > Reproducer: > nfs server | nfs client > ------------------------------|-------------------------------------- > ------- > # No space left on server | > fallocate -l 100G /svr/nospc | > | mount -t nfs $nfs_server_ip:/ /mnt > | > | # Expected error: No space left on > device > | dd if=/dev/zero of=/mnt/file count=1 > ibs=10K > | > | # Release space on mountpoint > | rm /mnt/nospc > | > | # Unexpected error: No space left on > device > | dd if=/dev/zero of=/mnt/file count=1 > ibs=10K > > > # 3. my patchset > > https://patchwork.kernel.org/project/linux-nfs/list/?series=628066 > > My patchset can fix bug of above reproducer. > > filemap_sample_wb_err() always return 0 if old writeback error > have not been consumed. filemap_check_wb_err() will return the old > error > even if there is no new writeback error between > filemap_sample_wb_err() and > filemap_check_wb_err(). > > ```c > since = filemap_sample_wb_err() = 0 > errseq_sample > if (!(old & ERRSEQ_SEEN)) // nobody see the error > return 0; > nfs_wb_all // no new error > error = filemap_check_wb_err(..., since) != 0 // unexpected error > ``` > > So we still need record writeback error in address_space flags. The > writeback > error in address_space flags is not used to be reported to userspace, > it > is just > used to detect if there is new error while writeback. I understand all that. The point you appear to be missing is that this is in fact in agreement with the documented behaviour in the write(2) and fsync(2) manpages. These errors are supposed to be reported once, even if they were caused by a write to a different file descriptor. In fact, even with your change, if you make the second 'dd' call fsync (by adding a conv=fsync), I would expect it to report the exact same ENOSPC error, and that would be correct behaviour! So my patches are more concerned with the fact that we appear to be reporting the same error more than once, rather than the fact that we're reporting them in the second attempt at I/O. As far as I'm concerned, that is the main change that is needed to meet the behaviour that is documented in the manpages.
在 2022/4/12 20:16, Trond Myklebust 写道: > I understand all that. The point you appear to be missing is that this > is in fact in agreement with the documented behaviour in the write(2) > and fsync(2) manpages. These errors are supposed to be reported once, > even if they were caused by a write to a different file descriptor. > > In fact, even with your change, if you make the second 'dd' call fsync > (by adding a conv=fsync), I would expect it to report the exact same > ENOSPC error, and that would be correct behaviour! > > So my patches are more concerned with the fact that we appear to be > reporting the same error more than once, rather than the fact that > we're reporting them in the second attempt at I/O. As far as I'm > concerned, that is the main change that is needed to meet the behaviour > that is documented in the manpages. After merging my patchset, when make the second 'dd' call fsync (by adding a conv=fsync), it can report ENOSPC error by calling fsync() syscall. And when make the second 'dd' sync write (by adding a oflag=sync), it can report ENOSPC error too: ```c write ksys_write vfs_write new_sync_write call_write_iter nfs_file_write generic_write_sync vfs_fsync_range nfs_file_fsync ```
在 2022/4/12 21:13, chenxiaosong (A) 写道: > 在 2022/4/12 20:16, Trond Myklebust 写道: >> I understand all that. The point you appear to be missing is that this >> is in fact in agreement with the documented behaviour in the write(2) >> and fsync(2) manpages. These errors are supposed to be reported once, >> even if they were caused by a write to a different file descriptor. >> >> In fact, even with your change, if you make the second 'dd' call fsync >> (by adding a conv=fsync), I would expect it to report the exact same >> ENOSPC error, and that would be correct behaviour! >> >> So my patches are more concerned with the fact that we appear to be >> reporting the same error more than once, rather than the fact that >> we're reporting them in the second attempt at I/O. As far as I'm >> concerned, that is the main change that is needed to meet the behaviour >> that is documented in the manpages. > > > After merging my patchset, when make the second 'dd' call fsync (by > adding a conv=fsync), it can report ENOSPC error by calling fsync() > syscall. > > And when make the second 'dd' sync write (by adding a oflag=sync), it > can report ENOSPC error too: > > ```c > write > ksys_write > vfs_write > new_sync_write > call_write_iter > nfs_file_write > generic_write_sync > vfs_fsync_range > nfs_file_fsync > ``` > On other filesystem, wb error is only cleared when doing fsync() or sync write(), it should not clear the wb error when doing async write(). Your patch will clear the wb error when doing async write(). And my patchset mainly fix following problem: filemap_sample_wb_err() always return 0 if old writeback error have not been consumed. filemap_check_wb_err() will return the old error even if there is no new writeback error between filemap_sample_wb_err() and filemap_check_wb_err(). ```c since = filemap_sample_wb_err() = 0 errseq_sample if (!(old & ERRSEQ_SEEN)) // nobody see the error return 0; nfs_wb_all // no new error error = filemap_check_wb_err(..., since) != 0 // unexpected error ```
在 2022/4/12 21:29, chenxiaosong (A) 写道: > 在 2022/4/12 21:13, chenxiaosong (A) 写道: > On other filesystem, wb error is only cleared when doing fsync() or sync > write(), it should not clear the wb error when doing async write(). Your > patch will clear the wb error when doing async write(). > > And my patchset mainly fix following problem: > > filemap_sample_wb_err() always return 0 if old writeback error > have not been consumed. filemap_check_wb_err() will return the old error > even if there is no new writeback error between filemap_sample_wb_err() and > filemap_check_wb_err(). > > ```c > since = filemap_sample_wb_err() = 0 > errseq_sample > if (!(old & ERRSEQ_SEEN)) // nobody see the error > return 0; > nfs_wb_all // no new error > error = filemap_check_wb_err(..., since) != 0 // unexpected error > ``` > After merging your patchset, NFS will report and clear wb error on _async_ write(), even there is no new wb error while nfs_wb_all, is this reasonable? And more importantly, we can not detect new error by using filemap_sample_wb_err()/filemap_check_wb_err() while nfs_wb_all(),just as I described: ```c since = filemap_sample_wb_err() = 0 errseq_sample if (!(old & ERRSEQ_SEEN)) // nobody see the error return 0; nfs_wb_all // no new error error = filemap_check_wb_err(..., since) != 0 // unexpected error ``` .
diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 95e1236d95c5..8211a7aa799c 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -598,18 +598,6 @@ static const struct vm_operations_struct nfs_file_vm_ops = { .page_mkwrite = nfs_vm_page_mkwrite, }; -static int nfs_need_check_write(struct file *filp, struct inode *inode, - int error) -{ - struct nfs_open_context *ctx; - - ctx = nfs_file_open_context(filp); - if (nfs_error_is_fatal_on_server(error) || - nfs_ctx_key_to_expire(ctx, inode)) - return 1; - return 0; -} - ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; @@ -637,7 +625,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) if (iocb->ki_flags & IOCB_APPEND || iocb->ki_pos > i_size_read(inode)) { result = nfs_revalidate_file_size(inode, file); if (result) - goto out; + return result; } nfs_clear_invalid_mapping(file->f_mapping); @@ -673,17 +661,22 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) } result = generic_write_sync(iocb, written); if (result < 0) - goto out; - + return result; +out: /* Return error values */ error = filemap_check_wb_err(file->f_mapping, since); - if (nfs_need_check_write(file, inode, error)) { - int err = nfs_wb_all(inode); - if (err < 0) - result = err; + switch (error) { + default: + break; + case -EDQUOT: + case -EFBIG: + case -ENOSPC: + nfs_wb_all(inode); + error = file_check_and_advance_wb_err(file); + if (error < 0) + result = error; } nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written); -out: return result; out_swapfile: