Message ID | 5A6E8092.8090701@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
It looks good to me. Reviewed-by: Changwei Ge <ge.changwei@h3c.com> On 2018/1/29 10:02, piaojun wrote: > We should not reuse the dirty bh in jbd2 directly due to the following > situation: > > 1. When removing extent rec, we will dirty the bhs of extent rec and > truncate log at the same time, and hand them over to jbd2. > 2. The bhs are submitted to jbd2 area successfully. > 3. The write-back thread of device help flush the bhs to disk but > encounter write error due to abnormal storage link. > 4. After a while the storage link become normal. Truncate log flush > worker triggered by the next space reclaiming found the dirty bh of > truncate log and clear its 'BH_Write_EIO' and then set it uptodate in > __ocfs2_journal_access(): > > ocfs2_truncate_log_worker > ocfs2_flush_truncate_log > __ocfs2_flush_truncate_log > ocfs2_replay_truncate_records > ocfs2_journal_access_di > __ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata. > > 5. Then jbd2 will flush the bh of truncate log to disk, but the bh of > extent rec is still in error state, and unfortunately nobody will > take care of it. > 6. At last the space of extent rec was not reduced, but truncate log > flush worker have given it back to globalalloc. That will cause > duplicate cluster problem which could be identified by fsck.ocfs2. > > Sadlly we can hardly revert this but set fs read-only in case of > ruining atomicity and consistency of space reclaim. > > Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access") > > Signed-off-by: Jun Piao <piaojun@huawei.com> > Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> > --- > fs/ocfs2/journal.c | 23 ++++++++++++----------- > 1 file changed, 12 insertions(+), 11 deletions(-) > > diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c > index 3630443..e5dcea6 100644 > --- a/fs/ocfs2/journal.c > +++ b/fs/ocfs2/journal.c > @@ -666,23 +666,24 @@ static int __ocfs2_journal_access(handle_t *handle, > /* we can safely remove this assertion after testing. */ > if (!buffer_uptodate(bh)) { > mlog(ML_ERROR, "giving me a buffer that's not uptodate!\n"); > - mlog(ML_ERROR, "b_blocknr=%llu\n", > - (unsigned long long)bh->b_blocknr); > + mlog(ML_ERROR, "b_blocknr=%llu, b_state=0x%lx\n", > + (unsigned long long)bh->b_blocknr, bh->b_state); > > lock_buffer(bh); > /* > - * A previous attempt to write this buffer head failed. > - * Nothing we can do but to retry the write and hope for > - * the best. > + * A previous transaction with a couple of buffer heads fail > + * to checkpoint, so all the bhs are marked as BH_Write_EIO. > + * For current transaction, the bh is just among those error > + * bhs which previous transaction handle. We can't just clear > + * its BH_Write_EIO and reuse directly, since other bhs are > + * not written to disk yet and that will cause metadata > + * inconsistency. So we should set fs read-only to avoid > + * further damage. > */ > if (buffer_write_io_error(bh) && !buffer_uptodate(bh)) { > - clear_buffer_write_io_error(bh); > - set_buffer_uptodate(bh); > - } > - > - if (!buffer_uptodate(bh)) { > unlock_buffer(bh); > - return -EIO; > + return ocfs2_error(osb->sb, "A previous attempt to " > + "write this buffer head failed\n"); > } > unlock_buffer(bh); > } >
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index 3630443..e5dcea6 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -666,23 +666,24 @@ static int __ocfs2_journal_access(handle_t *handle, /* we can safely remove this assertion after testing. */ if (!buffer_uptodate(bh)) { mlog(ML_ERROR, "giving me a buffer that's not uptodate!\n"); - mlog(ML_ERROR, "b_blocknr=%llu\n", - (unsigned long long)bh->b_blocknr); + mlog(ML_ERROR, "b_blocknr=%llu, b_state=0x%lx\n", + (unsigned long long)bh->b_blocknr, bh->b_state); lock_buffer(bh); /* - * A previous attempt to write this buffer head failed. - * Nothing we can do but to retry the write and hope for - * the best. + * A previous transaction with a couple of buffer heads fail + * to checkpoint, so all the bhs are marked as BH_Write_EIO. + * For current transaction, the bh is just among those error + * bhs which previous transaction handle. We can't just clear + * its BH_Write_EIO and reuse directly, since other bhs are + * not written to disk yet and that will cause metadata + * inconsistency. So we should set fs read-only to avoid + * further damage. */ if (buffer_write_io_error(bh) && !buffer_uptodate(bh)) { - clear_buffer_write_io_error(bh); - set_buffer_uptodate(bh); - } - - if (!buffer_uptodate(bh)) { unlock_buffer(bh); - return -EIO; + return ocfs2_error(osb->sb, "A previous attempt to " + "write this buffer head failed\n"); } unlock_buffer(bh); }