mbox series

[0/2] Preamture OOM fixes

Message ID 20190401082958.26470-1-nborisov@suse.com (mailing list archive)
Headers show
Series Preamture OOM fixes | expand

Message

Nikolay Borisov April 1, 2019, 8:29 a.m. UTC
David reported that following the async cow path refactoring that got into 
misc-next he started seeing premature OOM kills due to large page allocations 
(order 4). This was due to now the whole async context (alongisde ancillary 
structures, describing chunks) being allocated in a single kmalloc call. The 
first patch of this series aims to fix that by converting the kmalloc call to 
kvmalloc. This will have virtually no perfromance impact for small writes and 
for larger ones will fallback to vmalloc. 

However, while testing the first patch I observed BUG_ONs in the async
csum calculation path to trigger. This is clearly caused by the fact that 
the first patch introduces a fairly large memory allocation. Nevertheless, 
having large allocations in the csum path without a fallback to some sort of 
order 0 page allocation or at least graceful handling of memory errors is gross.
Patch 2 aims to rectify this by switching the allocation in btrfs_csum_one_bio
to using kvmalloc, giving greated chance for success. Even if patch 1 is ignored
nothing prevents the csum calculation to trigger the bug on depending on the memory 
usage/fragmentation of the server so IMO it's beneficial either ways.

Nikolay Borisov (2):
  btrfs: Use kvmalloc for allocating compressed path context
  btrfs: Switch memory allocations in async csum calculation path to
    kvmalloc

 fs/btrfs/file-item.c    | 15 +++++++++++----
 fs/btrfs/inode.c        |  9 +++++++--
 fs/btrfs/ordered-data.c |  3 ++-
 3 files changed, 20 insertions(+), 7 deletions(-)

Comments

David Sterba April 2, 2019, 11:39 p.m. UTC | #1
On Mon, Apr 01, 2019 at 11:29:56AM +0300, Nikolay Borisov wrote:
> David reported that following the async cow path refactoring that got into 
> misc-next he started seeing premature OOM kills due to large page allocations 
> (order 4). This was due to now the whole async context (alongisde ancillary 
> structures, describing chunks) being allocated in a single kmalloc call. The 
> first patch of this series aims to fix that by converting the kmalloc call to 
> kvmalloc. This will have virtually no perfromance impact for small writes and 
> for larger ones will fallback to vmalloc. 
> 
> However, while testing the first patch I observed BUG_ONs in the async
> csum calculation path to trigger. This is clearly caused by the fact that 
> the first patch introduces a fairly large memory allocation. Nevertheless, 
> having large allocations in the csum path without a fallback to some sort of 
> order 0 page allocation or at least graceful handling of memory errors is gross.
> Patch 2 aims to rectify this by switching the allocation in btrfs_csum_one_bio
> to using kvmalloc, giving greated chance for success. Even if patch 1 is ignored
> nothing prevents the csum calculation to trigger the bug on depending on the memory 
> usage/fragmentation of the server so IMO it's beneficial either ways.

For the record: the patches that introduced the preallocation + followup
cleanups have been removed from misc-next until we're sure that the
supposed change in IO submission path does not introduce further
surprises.

The potential performance impact is about to be evaluated. The error
not-handling currently depends on the kmalloc behaviour that does not
fail for small allocations, so current code is safer for users.

The preallocation provides a nicer way to handle errors early and fails
gracefully, unfortunately this is out of the safe zone of kmalloc so
this introduces a new problem.

The io submission path is probably the biggest and worst BUG_ON error
handling misuse, but also the hardest one to fix. We'll get there, but
as the prealloc patches showed we need to be careful.