Message ID | 20180816112426.12218-1-lchen@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fix crash on ocfs2_duplicate_clusters_by_page | expand |
Hi Larry, Do you have a method to reproduce this issue? I would be nice that such a method is provided :-) On 2018/8/16 19:24, Larry Chen wrote: > ocfs2_duplicate_clusters_by_page may crash if an extent's page is dirty. > When a page has not been written back, it is still in dirty state. If at > that moment, ocfs2_duplicate_clusters_by_page is called against this > page, the crash happens. > > To fix this bug, we can just unlock the page and wait the page until > it's not dirty. > > I don't know whether the patch is appropriate, so I need comments, > thanks. > > The following is the core dump: > > kernel BUG at /root/code/ocfs2/refcounttree.c:2961! > __ocfs2_move_extent+0x80/0x450 [ocfs2] Because we may work on different base of git tree. Can you paste in what function the crash occurs? I guess you might miss something with your backtrace. Function __ocfs2_move_extent shouldn't be in file refcounttree.c Thanks, Changwei > ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] > ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] > __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] > ocfs2_move_extents+0x180/0x3b0 [ocfs2] > ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] > ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] > ocfs2_ioctl+0x253/0x640 [ocfs2] > do_vfs_ioctl+0x90/0x5f0 > SyS_ioctl+0x74/0x80 > do_syscall_64+0x74/0x140 > entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > > To: mfasheh@versity.com, > jlbec@evilplan.org > Cc: linux-kernel@vger.kernel.org, > ocfs2-devel@oss.oracle.com, > akpm@linux-foundation.org > > Signed-off-by: Larry Chen <lchen@suse.com> > --- > fs/ocfs2/refcounttree.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c > index 7869622af22a..ee3b9dbbc310 100644 > --- a/fs/ocfs2/refcounttree.c > +++ b/fs/ocfs2/refcounttree.c > @@ -2946,6 +2946,7 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, > if (map_end & (PAGE_SIZE - 1)) > to = map_end & (PAGE_SIZE - 1); > > +retry: > page = find_or_create_page(mapping, page_index, GFP_NOFS); > if (!page) { > ret = -ENOMEM; > @@ -2957,8 +2958,13 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, > * In case PAGE_SIZE <= CLUSTER_SIZE, This page > * can't be dirtied before we CoW it out. > */ > - if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) > - BUG_ON(PageDirty(page)); > + if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) { > + if (PageDirty(page)) { > + unlock_page(page); > + cond_resched(); > + goto retry; > + } > + } > > if (!PageUptodate(page)) { > ret = block_read_full_page(page, ocfs2_get_block);
Hi Changwei, It's easy to reproduce the bug. The bug was found during developing a defragmentation tool for ocfs2. To achieve that, I use ioctl interface against each file like: struct ocfs2_move_extents me = { .me_start = 0, .me_len = buf->st_size, .me_flags = OCFS2_MOVE_EXT_FL_AUTO_DEFRAG, }; fd = open64(file, O_RDWR, 0400); ret = ioctl(fd, OCFS2_IOC_MOVE_EXT, &me); Then I setup a cluster with only 1 node, and keep writing and deleting a file while another process keep using the ioctl to defragment the file. After the patch applied, the bug disappeared, but I'm not sure whether it has been totally fixed. Thanks Larry On 08/16/2018 07:43 PM, Changwei Ge wrote: > Hi Larry, > > Do you have a method to reproduce this issue? > > I would be nice that such a method is provided :-) > > > On 2018/8/16 19:24, Larry Chen wrote: >> ocfs2_duplicate_clusters_by_page may crash if an extent's page is dirty. >> When a page has not been written back, it is still in dirty state. If at >> that moment, ocfs2_duplicate_clusters_by_page is called against this >> page, the crash happens. >> >> To fix this bug, we can just unlock the page and wait the page until >> it's not dirty. >> >> I don't know whether the patch is appropriate, so I need comments, >> thanks. >> >> The following is the core dump: >> >> kernel BUG at /root/code/ocfs2/refcounttree.c:2961! >> __ocfs2_move_extent+0x80/0x450 [ocfs2] > > Because we may work on different base of git tree. > Can you paste in what function the crash occurs? > I guess you might miss something with your backtrace. Function > __ocfs2_move_extent shouldn't be in > file refcounttree.c > > Thanks, > Changwei > >> ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] >> ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] >> __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] >> ocfs2_move_extents+0x180/0x3b0 [ocfs2] >> ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] >> ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] >> ocfs2_ioctl+0x253/0x640 [ocfs2] >> do_vfs_ioctl+0x90/0x5f0 >> SyS_ioctl+0x74/0x80 >> do_syscall_64+0x74/0x140 >> entry_SYSCALL_64_after_hwframe+0x3d/0xa2 >> >> To: mfasheh@versity.com, >> jlbec@evilplan.org >> Cc: linux-kernel@vger.kernel.org, >> ocfs2-devel@oss.oracle.com, >> akpm@linux-foundation.org >> >> Signed-off-by: Larry Chen <lchen@suse.com> >> --- >> fs/ocfs2/refcounttree.c | 10 ++++++++-- >> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c >> index 7869622af22a..ee3b9dbbc310 100644 >> --- a/fs/ocfs2/refcounttree.c >> +++ b/fs/ocfs2/refcounttree.c >> @@ -2946,6 +2946,7 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, >> if (map_end & (PAGE_SIZE - 1)) >> to = map_end & (PAGE_SIZE - 1); >> >> +retry: >> page = find_or_create_page(mapping, page_index, GFP_NOFS); >> if (!page) { >> ret = -ENOMEM; >> @@ -2957,8 +2958,13 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, >> * In case PAGE_SIZE <= CLUSTER_SIZE, This page >> * can't be dirtied before we CoW it out. >> */ >> - if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) >> - BUG_ON(PageDirty(page)); >> + if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) { >> + if (PageDirty(page)) { >> + unlock_page(page); >> + cond_resched(); >> + goto retry; >> + } >> + } >> >> if (!PageUptodate(page)) { >> ret = block_read_full_page(page, ocfs2_get_block); > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >
This is backtrace printed from crash bt command crash> bt PID: 7655 TASK: ffff8f2b44e70640 CPU: 0 COMMAND: "defragfs.ocfs2" #0 [ffff9fbdc887f830] machine_kexec at ffffffff8405bd81 #1 [ffff9fbdc887f880] __crash_kexec at ffffffff8411be5e #2 [ffff9fbdc887f938] crash_kexec at ffffffff8411caed #3 [ffff9fbdc887f950] oops_end at ffffffff8402e0fe #4 [ffff9fbdc887f970] do_trap at ffffffff8402c196 #5 [ffff9fbdc887f9b0] do_error_trap at ffffffff8402c210 #6 [ffff9fbdc887fa60] invalid_op at ffffffff84800fa1 [exception RIP: ocfs2_duplicate_clusters_by_page+822] RIP: ffffffffc08fb256 RSP: ffff9fbdc887fb10 RFLAGS: 00010202 RAX: 000fffffc000107d RBX: 0000000000301000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffd4ea010a9440 RDI: ffffd4ea010a9440 RBP: ffffd4ea010a9440 R8: ffffd4ea010a945c R9: ffffd4ea010a9440 R10: 0000000000000000 R11: 0000000000000040 R12: 00000000003e0000 R13: ffff8f2b76559800 R14: 0000000000300000 R15: 000000000019ed8f ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9fbdc887fb78] __ocfs2_move_extent at ffffffffc0901e70 [ocfs2] #8 [ffff9fbdc887fc08] ocfs2_defrag_extent at ffffffffc0902a18 [ocfs2] #9 [ffff9fbdc887fc88] __ocfs2_move_extents_range at ffffffffc0903814 [ocfs2] #10 [ffff9fbdc887fd20] ocfs2_move_extents at ffffffffc0903b60 [ocfs2] #11 [ffff9fbdc887fd80] ocfs2_ioctl_move_extents at ffffffffc0903ec3 [ocfs2] #12 [ffff9fbdc887fe10] ocfs2_ioctl at ffffffffc08e5a53 [ocfs2] #13 [ffff9fbdc887fe88] do_vfs_ioctl at ffffffff84256480 #14 [ffff9fbdc887fef8] sys_ioctl at ffffffff84256a54 #15 [ffff9fbdc887ff30] do_syscall_64 at ffffffff84003ae4 #16 [ffff9fbdc887ff50] entry_SYSCALL_64_after_hwframe at ffffffff8480009a RIP: 00007f1442027477 RSP: 00007ffd295e0268 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 00007ffd295e4350 RCX: 00007f1442027477 RDX: 00007ffd295e0290 RSI: 0000000040406f06 RDI: 0000000000000004 RBP: 00007ffd295e02e0 R8: 000000000000ffff R9: 00007f1442500500 R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffd295e0450 R13: 0000000000000008 R14: 00007ffd295e0350 R15: 00000000ffffffff ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b On 08/17/2018 11:05 AM, Larry Chen wrote: > Hi Changwei, > > It's easy to reproduce the bug. > > The bug was found during developing a defragmentation tool for ocfs2. > > To achieve that, I use ioctl interface against each file like: > > > struct ocfs2_move_extents me = { > .me_start = 0, > .me_len = buf->st_size, > .me_flags = OCFS2_MOVE_EXT_FL_AUTO_DEFRAG, > }; > > fd = open64(file, O_RDWR, 0400); > > > ret = ioctl(fd, OCFS2_IOC_MOVE_EXT, &me); > > Then I setup a cluster with only 1 node, and keep writing and deleting a > file while another process keep using the ioctl to defragment the file. > > After the patch applied, the bug disappeared, but I'm not sure whether > it has been totally fixed. > > Thanks > Larry > > > On 08/16/2018 07:43 PM, Changwei Ge wrote: >> Hi Larry, >> >> Do you have a method to reproduce this issue? >> >> I would be nice that such a method is provided :-) >> >> >> On 2018/8/16 19:24, Larry Chen wrote: >>> ocfs2_duplicate_clusters_by_page may crash if an extent's page is dirty. >>> When a page has not been written back, it is still in dirty state. If at >>> that moment, ocfs2_duplicate_clusters_by_page is called against this >>> page, the crash happens. >>> >>> To fix this bug, we can just unlock the page and wait the page until >>> it's not dirty. >>> >>> I don't know whether the patch is appropriate, so I need comments, >>> thanks. >>> >>> The following is the core dump: >>> >>> kernel BUG at /root/code/ocfs2/refcounttree.c:2961! >>> __ocfs2_move_extent+0x80/0x450 [ocfs2] >> >> Because we may work on different base of git tree. >> Can you paste in what function the crash occurs? >> I guess you might miss something with your backtrace. Function >> __ocfs2_move_extent shouldn't be in >> file refcounttree.c >> >> Thanks, >> Changwei >> >>> ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] >>> ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] >>> __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] >>> ocfs2_move_extents+0x180/0x3b0 [ocfs2] >>> ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] >>> ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] >>> ocfs2_ioctl+0x253/0x640 [ocfs2] >>> do_vfs_ioctl+0x90/0x5f0 >>> SyS_ioctl+0x74/0x80 >>> do_syscall_64+0x74/0x140 >>> entry_SYSCALL_64_after_hwframe+0x3d/0xa2 >>> >>> To: mfasheh@versity.com, >>> jlbec@evilplan.org >>> Cc: linux-kernel@vger.kernel.org, >>> ocfs2-devel@oss.oracle.com, >>> akpm@linux-foundation.org >>> >>> Signed-off-by: Larry Chen <lchen@suse.com> >>> --- >>> fs/ocfs2/refcounttree.c | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c >>> index 7869622af22a..ee3b9dbbc310 100644 >>> --- a/fs/ocfs2/refcounttree.c >>> +++ b/fs/ocfs2/refcounttree.c >>> @@ -2946,6 +2946,7 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, >>> if (map_end & (PAGE_SIZE - 1)) >>> to = map_end & (PAGE_SIZE - 1); >>> >>> +retry: >>> page = find_or_create_page(mapping, page_index, GFP_NOFS); >>> if (!page) { >>> ret = -ENOMEM; >>> @@ -2957,8 +2958,13 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, >>> * In case PAGE_SIZE <= CLUSTER_SIZE, This page >>> * can't be dirtied before we CoW it out. >>> */ >>> - if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) >>> - BUG_ON(PageDirty(page)); >>> + if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) { >>> + if (PageDirty(page)) { >>> + unlock_page(page); >>> + cond_resched(); >>> + goto retry; >>> + } >>> + } >>> >>> if (!PageUptodate(page)) { >>> ret = block_read_full_page(page, ocfs2_get_block); >> >> _______________________________________________ >> Ocfs2-devel mailing list >> Ocfs2-devel@oss.oracle.com >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel >> > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >
Hello Larry, >>> On 2018/8/16 at 19:24, in message <20180816112426.12218-1-lchen@suse.com>, Larry Chen <lchen@suse.com> wrote: > ocfs2_duplicate_clusters_by_page may crash if an extent's page is dirty. > When a page has not been written back, it is still in dirty state. If at > that moment, ocfs2_duplicate_clusters_by_page is called against this > page, the crash happens. > > To fix this bug, we can just unlock the page and wait the page until > it's not dirty. > > I don't know whether the patch is appropriate, so I need comments, > thanks. > > The following is the core dump: > > kernel BUG at /root/code/ocfs2/refcounttree.c:2961! > __ocfs2_move_extent+0x80/0x450 [ocfs2] > ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] > ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] > __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] > ocfs2_move_extents+0x180/0x3b0 [ocfs2] > ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] > ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] > ocfs2_ioctl+0x253/0x640 [ocfs2] > do_vfs_ioctl+0x90/0x5f0 > SyS_ioctl+0x74/0x80 > do_syscall_64+0x74/0x140 > entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > > To: mfasheh@versity.com, > jlbec@evilplan.org > Cc: linux-kernel@vger.kernel.org, > ocfs2-devel@oss.oracle.com, > akpm@linux-foundation.org > > Signed-off-by: Larry Chen <lchen@suse.com> > --- > fs/ocfs2/refcounttree.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c > index 7869622af22a..ee3b9dbbc310 100644 > --- a/fs/ocfs2/refcounttree.c > +++ b/fs/ocfs2/refcounttree.c > @@ -2946,6 +2946,7 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, > if (map_end & (PAGE_SIZE - 1)) > to = map_end & (PAGE_SIZE - 1); > > +retry: > page = find_or_create_page(mapping, page_index, GFP_NOFS); > if (!page) { > ret = -ENOMEM; > @@ -2957,8 +2958,13 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, > * In case PAGE_SIZE <= CLUSTER_SIZE, This page > * can't be dirtied before we CoW it out. > */ > - if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) > - BUG_ON(PageDirty(page)); > + if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) { > + if (PageDirty(page)) { > + unlock_page(page); Here, if we find this page is dirty, could we write this page to the disk initiatively? rather than wait for the page become clean by VM mechanism. Thanks Gang > + cond_resched(); > + goto retry; > + } > + } > > if (!PageUptodate(page)) { > ret = block_read_full_page(page, ocfs2_get_block); > -- > 2.13.7 > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c index 7869622af22a..ee3b9dbbc310 100644 --- a/fs/ocfs2/refcounttree.c +++ b/fs/ocfs2/refcounttree.c @@ -2946,6 +2946,7 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, if (map_end & (PAGE_SIZE - 1)) to = map_end & (PAGE_SIZE - 1); +retry: page = find_or_create_page(mapping, page_index, GFP_NOFS); if (!page) { ret = -ENOMEM; @@ -2957,8 +2958,13 @@ int ocfs2_duplicate_clusters_by_page(handle_t *handle, * In case PAGE_SIZE <= CLUSTER_SIZE, This page * can't be dirtied before we CoW it out. */ - if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) - BUG_ON(PageDirty(page)); + if (PAGE_SIZE <= OCFS2_SB(sb)->s_clustersize) { + if (PageDirty(page)) { + unlock_page(page); + cond_resched(); + goto retry; + } + } if (!PageUptodate(page)) { ret = block_read_full_page(page, ocfs2_get_block);
ocfs2_duplicate_clusters_by_page may crash if an extent's page is dirty. When a page has not been written back, it is still in dirty state. If at that moment, ocfs2_duplicate_clusters_by_page is called against this page, the crash happens. To fix this bug, we can just unlock the page and wait the page until it's not dirty. I don't know whether the patch is appropriate, so I need comments, thanks. The following is the core dump: kernel BUG at /root/code/ocfs2/refcounttree.c:2961! __ocfs2_move_extent+0x80/0x450 [ocfs2] ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2] ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2] __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2] ocfs2_move_extents+0x180/0x3b0 [ocfs2] ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2] ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2] ocfs2_ioctl+0x253/0x640 [ocfs2] do_vfs_ioctl+0x90/0x5f0 SyS_ioctl+0x74/0x80 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 To: mfasheh@versity.com, jlbec@evilplan.org Cc: linux-kernel@vger.kernel.org, ocfs2-devel@oss.oracle.com, akpm@linux-foundation.org Signed-off-by: Larry Chen <lchen@suse.com> --- fs/ocfs2/refcounttree.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)