| Submitter | Nick Piggin |
|---|---|
| Date | Jan. 5, 2009, 6:48 a.m. |
| Message ID | <20090105064838.GA5209@wotan.suse.de> |
| Download | mbox | patch |
| Permalink | /patch/789/ |
| State | New, archived |
| Headers | show |
Comments
On 2009-01-05 00:48 Nick Piggin said the following: > I believe this patch should solve it. Please test and confirm before > I send it upstream. 3 systems with 2.6.27.10 have worked overnight with dd running continuously. They all failed within 20 minutes without the patch. Thank you, Roman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Nick Piggin wrote: > I believe this patch should solve it. Please test and confirm before > I send it upstream. My test successfully completed two times, writing a 900GB file in each run. I used a patched 2.6.27.10 x86_64 kernel. On an unpatched system this test usually fails before reaching 100GB. Thank you for fixing this issue that quick. Regards, Peter. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Hello, I'm having a problem with a system having an XFS filesystem on RAID locking up fairly consistently when writing large amounts of data to it, with several kernels, including 2.6.38.2 and 2.6.39.3, on both AMD and Intel multi-core processors. The kernel always logs this several times: BUG: soft lockup - CPU#2 stuck for 67s! [kswapd0:33] With different CPU# numbers, but always in kswapd0. Eventually the system will really lock up, requiring a reset. During soft lockups (when file transfer apparently stalled), merely typing "ps aux" would often cause the lockup to end immediately. After googling I found this page: https://patchwork.kernel.org/patch/789/ An unpatched vanilla 2.6.39.3 consistently locked up, however after patching it (adding a barrier() after all 4 instances of if (!page_cache_get_speculative(page))) the lockups never happened anymore, and file transfer has been steady. I also tested it with ext4, which doesn't give lockups on unpatched kernels, but unfortunately mkfs.ext4 cannot create filesystems larger than 16TB yet, so I have to use XFS instead. On Mon, Jan 05, 2009 at 06:48:38AM -0000, Nick Piggin wrote: > I believe this patch should solve it. Please test and confirm before > I send it upstream. Further comments on that thread in 2009 indicated the patch was very useful, but it doesn't seem to have been applied upstream. Is there any reason this patch should not be applied? If necessary I can submit a reworked patch for 2.6.39.3 or 3.0 when that comes out. > --- > An XFS workload showed up a bug in the lockless pagecache patch. Basically it > would go into an "infinite" loop, although it would sometimes be able to break > out of the loop! The reason is a missing compiler barrier in the "increment > reference count unless it was zero" case of the lockless pagecache protocol in > the gang lookup functions. > > This would cause the compiler to use a cached value of struct page pointer to > retry the operation with, rather than reload it. So the page might have been > removed from pagecache and freed (refcount==0) but the lookup would not correctly > notice the page is no longer in pagecache, and keep attempting to increment the > refcount and failing, until the page gets reallocated for something else. This > isn't a data corruption because the condition will be detected if the page has > been reallocated. However it can result in a lockup. > > Add a the required compiler barrier and comment to fix this. [...] > Index: linux-2.6/mm/filemap.c > =================================================================== > --- linux-2.6.orig/mm/filemap.c 2009-01-05 17:22:57.000000000 +1100 > +++ linux-2.6/mm/filemap.c 2009-01-05 17:28:40.000000000 +1100 > @@ -794,8 +794,19 @@ repeat: > if (unlikely(page == RADIX_TREE_RETRY)) > goto restart; > > - if (!page_cache_get_speculative(page)) > + if (!page_cache_get_speculative(page)) { > + /* > + * A failed page_cache_get_speculative operation does > + * not imply any barriers (Documentation/atomic_ops.txt), > + * and as such, we must force the compiler to deref the > + * radix-tree slot again rather than using the cached > + * value (because we need to give up if the page has been > + * removed from the radix-tree, rather than looping until > + * it gets reused for something else). > + */ > + barrier(); > goto repeat; > + } > > /* Has the page moved? */ > if (unlikely(page != *((void **)pages[i]))) { > @@ -850,8 +861,11 @@ repeat: > if (page->mapping == NULL || page->index != index) > break; > > - if (!page_cache_get_speculative(page)) > + if (!page_cache_get_speculative(page)) { > + /* barrier: see find_get_pages() */ > + barrier(); > goto repeat; > + } > > /* Has the page moved? */ > if (unlikely(page != *((void **)pages[i]))) { > @@ -904,8 +918,11 @@ repeat: > if (unlikely(page == RADIX_TREE_RETRY)) > goto restart; > > - if (!page_cache_get_speculative(page)) > + if (!page_cache_get_speculative(page)) { > + /* barrier: see find_get_pages() */ > + barrier(); > goto repeat; > + } > > /* Has the page moved? */ > if (unlikely(page != *((void **)pages[i]))) {
On 07/14/2011 01:23 PM, Guus Sliepen wrote: > I'm having a problem with a system having an XFS filesystem on RAID locking up > fairly consistently when writing large amounts of data to it, with several > kernels, including 2.6.38.2 and 2.6.39.3, on both AMD and Intel multi-core > processors. The kernel always logs this several times: > > BUG: soft lockup - CPU#2 stuck for 67s! [kswapd0:33] ... >> I believe this patch should solve it. Please test and confirm before >> I send it upstream. > > Further comments on that thread in 2009 indicated the patch was very useful, > but it doesn't seem to have been applied upstream. Is there any reason this > patch should not be applied? Hello Guus This Bugzilla entry documents the XFS bug from 2009 in detail including links: http://oss.sgi.com/bugzilla/show_bug.cgi?id=805 The problem was finally solved by a patch proposed by Linus. This is the reason the original patch developed by Nick never made it into the kernel. My tests back then showed that both patches fixed the problem. It seems you have found a test case where just Nick's patch helps. Regards, Peter. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thu, Jul 14, 2011 at 08:03:09PM +0200, Peter Klotz wrote: > On 07/14/2011 01:23 PM, Guus Sliepen wrote: > > >I'm having a problem with a system having an XFS filesystem on RAID locking up > >fairly consistently when writing large amounts of data to it, with several > >kernels, including 2.6.38.2 and 2.6.39.3, on both AMD and Intel multi-core > >processors. The kernel always logs this several times: > > > >BUG: soft lockup - CPU#2 stuck for 67s! [kswapd0:33] [...] > This Bugzilla entry documents the XFS bug from 2009 in detail > including links: > > http://oss.sgi.com/bugzilla/show_bug.cgi?id=805 Aha, I did not look at that before. > The problem was finally solved by a patch proposed by Linus. This is > the reason the original patch developed by Nick never made it into > the kernel. > > My tests back then showed that both patches fixed the problem. > > It seems you have found a test case where just Nick's patch helps. Yes. I agree with Linus that the root cause should be fixed, not the symptoms. I don't have time to dive in the kernel code myself, but I do have several nearly identical machines where I can test things on. I will be happy to test out patches and/or different kernel versions or kernel configurations, and I can provide dmesg output and perhaps other information if necessary.
Patch
Index: linux-2.6/mm/filemap.c =================================================================== --- linux-2.6.orig/mm/filemap.c 2009-01-05 17:22:57.000000000 +1100 +++ linux-2.6/mm/filemap.c 2009-01-05 17:28:40.000000000 +1100 @@ -794,8 +794,19 @@ repeat: if (unlikely(page == RADIX_TREE_RETRY)) goto restart; - if (!page_cache_get_speculative(page)) + if (!page_cache_get_speculative(page)) { + /* + * A failed page_cache_get_speculative operation does + * not imply any barriers (Documentation/atomic_ops.txt), + * and as such, we must force the compiler to deref the + * radix-tree slot again rather than using the cached + * value (because we need to give up if the page has been + * removed from the radix-tree, rather than looping until + * it gets reused for something else). + */ + barrier(); goto repeat; + } /* Has the page moved? */ if (unlikely(page != *((void **)pages[i]))) { @@ -850,8 +861,11 @@ repeat: if (page->mapping == NULL || page->index != index) break; - if (!page_cache_get_speculative(page)) + if (!page_cache_get_speculative(page)) { + /* barrier: see find_get_pages() */ + barrier(); goto repeat; + } /* Has the page moved? */ if (unlikely(page != *((void **)pages[i]))) { @@ -904,8 +918,11 @@ repeat: if (unlikely(page == RADIX_TREE_RETRY)) goto restart; - if (!page_cache_get_speculative(page)) + if (!page_cache_get_speculative(page)) { + /* barrier: see find_get_pages() */ + barrier(); goto repeat; + } /* Has the page moved? */ if (unlikely(page != *((void **)pages[i]))) {