btrfs: extent_io: Handle memory allocation failure in __clear_extent_bit()
diff mbox series

Message ID 20190418072114.4573-1-wqu@suse.com
State New
Headers show
Series
  • btrfs: extent_io: Handle memory allocation failure in __clear_extent_bit()
Related show

Commit Message

Qu Wenruo April 18, 2019, 7:21 a.m. UTC
There is a BUG_ON() in __clear_extent_bit() for memory allocation
failure.

While comment of __clear_extent_bit() says it can return error, but we
always return 0.

Some __clear_extent_bit() callers just ignore the return value, while
some still expect error.

Let's return proper error for this memory allocation anyway, to remove
that BUG_ON() as a first step, so at least we can continue test.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Nikolay Borisov April 18, 2019, 7:24 a.m. UTC | #1
On 18.04.19 г. 10:21 ч., Qu Wenruo wrote:
> There is a BUG_ON() in __clear_extent_bit() for memory allocation
> failure.
> 
> While comment of __clear_extent_bit() says it can return error, but we
> always return 0.
> 
> Some __clear_extent_bit() callers just ignore the return value, while
> some still expect error.
> 
> Let's return proper error for this memory allocation anyway, to remove
> that BUG_ON() as a first step, so at least we can continue test.

I remember Josef did some changes into this code and said that prealloc
shouldn't fail because this will cause mayhem down the road i.e. proper
error handling is missing. If anything I think it should be added first
and then remove the BUG_ONs.

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 828708f6510c..f1bfc632ef7b 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -743,7 +743,10 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>  
>  	if (state->start < start) {
>  		prealloc = alloc_extent_state_atomic(prealloc);
> -		BUG_ON(!prealloc);
> +		if (!prealloc) {
> +			err = -ENOMEM;
> +			goto out;
> +		}
>  		err = split_state(tree, state, prealloc, start);
>  		if (err)
>  			extent_io_tree_panic(tree, err);
> @@ -766,7 +769,10 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>  	 */
>  	if (state->start <= end && state->end > end) {
>  		prealloc = alloc_extent_state_atomic(prealloc);
> -		BUG_ON(!prealloc);
> +		if (!prealloc) {
> +			err = -ENOMEM;
> +			goto out;
> +		}
>  		err = split_state(tree, state, prealloc, end + 1);
>  		if (err)
>  			extent_io_tree_panic(tree, err);
> @@ -801,6 +807,8 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>  	if (prealloc)
>  		free_extent_state(prealloc);
>  
> +	if (err)
> +		return err;
>  	return 0;
>  
>  }
>
Qu Wenruo April 18, 2019, 7:30 a.m. UTC | #2
On 2019/4/18 下午3:24, Nikolay Borisov wrote:
> 
> 
> On 18.04.19 г. 10:21 ч., Qu Wenruo wrote:
>> There is a BUG_ON() in __clear_extent_bit() for memory allocation
>> failure.
>>
>> While comment of __clear_extent_bit() says it can return error, but we
>> always return 0.
>>
>> Some __clear_extent_bit() callers just ignore the return value, while
>> some still expect error.
>>
>> Let's return proper error for this memory allocation anyway, to remove
>> that BUG_ON() as a first step, so at least we can continue test.
> 
> I remember Josef did some changes into this code and said that prealloc
> shouldn't fail because this will cause mayhem down the road i.e. proper
> error handling is missing. If anything I think it should be added first
> and then remove the BUG_ONs.

That's true, we could have some strange lockup due to
lock_extent_bits(), as if some clear_extent_bits() failed due to ENOMEM
and caller just ignore the error, we could have a lockup.

I'll try to pre-allocate certain amount of extent_state as the last
chance of redemption.

Anyway, such BUG_ON() right after kmalloc() is really a blockage for
error injection test.

Thanks,
Qu

> 
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent_io.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 828708f6510c..f1bfc632ef7b 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -743,7 +743,10 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>>  
>>  	if (state->start < start) {
>>  		prealloc = alloc_extent_state_atomic(prealloc);
>> -		BUG_ON(!prealloc);
>> +		if (!prealloc) {
>> +			err = -ENOMEM;
>> +			goto out;
>> +		}
>>  		err = split_state(tree, state, prealloc, start);
>>  		if (err)
>>  			extent_io_tree_panic(tree, err);
>> @@ -766,7 +769,10 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>>  	 */
>>  	if (state->start <= end && state->end > end) {
>>  		prealloc = alloc_extent_state_atomic(prealloc);
>> -		BUG_ON(!prealloc);
>> +		if (!prealloc) {
>> +			err = -ENOMEM;
>> +			goto out;
>> +		}
>>  		err = split_state(tree, state, prealloc, end + 1);
>>  		if (err)
>>  			extent_io_tree_panic(tree, err);
>> @@ -801,6 +807,8 @@ int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
>>  	if (prealloc)
>>  		free_extent_state(prealloc);
>>  
>> +	if (err)
>> +		return err;
>>  	return 0;
>>  
>>  }
>>
David Sterba April 18, 2019, 11:38 a.m. UTC | #3
On Thu, Apr 18, 2019 at 03:30:20PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/4/18 下午3:24, Nikolay Borisov wrote:
> > 
> > 
> > On 18.04.19 г. 10:21 ч., Qu Wenruo wrote:
> >> There is a BUG_ON() in __clear_extent_bit() for memory allocation
> >> failure.
> >>
> >> While comment of __clear_extent_bit() says it can return error, but we
> >> always return 0.
> >>
> >> Some __clear_extent_bit() callers just ignore the return value, while
> >> some still expect error.
> >>
> >> Let's return proper error for this memory allocation anyway, to remove
> >> that BUG_ON() as a first step, so at least we can continue test.
> > 
> > I remember Josef did some changes into this code and said that prealloc
> > shouldn't fail because this will cause mayhem down the road i.e. proper
> > error handling is missing. If anything I think it should be added first
> > and then remove the BUG_ONs.
> 
> That's true, we could have some strange lockup due to
> lock_extent_bits(), as if some clear_extent_bits() failed due to ENOMEM
> and caller just ignore the error, we could have a lockup.

Not only lockup but unhandled failed extent range locking totally breaks
assumptions that the following code makes and this would lead to
unpredictable corruptions. Just count how many lock_extent_bits calls
are there. And any caller of __set_extent_bit. There are so many that
the BUG_ON is the measure of last resort to prevent worse problems.

> I'll try to pre-allocate certain amount of extent_state as the last
> chance of redemption.

This only lowers the chances to hit the allocation error but there's
always a case when certain amount + 1 would be needed.

> Anyway, such BUG_ON() right after kmalloc() is really a blockage for
> error injection test.

Maybe, but the code is not yet in the state to inject memory allocation
faiulres to that particular path (ie. the state changes).
Qu Wenruo April 18, 2019, 11:51 a.m. UTC | #4
On 2019/4/18 下午7:38, David Sterba wrote:
> On Thu, Apr 18, 2019 at 03:30:20PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/4/18 下午3:24, Nikolay Borisov wrote:
>>>
>>>
>>> On 18.04.19 г. 10:21 ч., Qu Wenruo wrote:
>>>> There is a BUG_ON() in __clear_extent_bit() for memory allocation
>>>> failure.
>>>>
>>>> While comment of __clear_extent_bit() says it can return error, but we
>>>> always return 0.
>>>>
>>>> Some __clear_extent_bit() callers just ignore the return value, while
>>>> some still expect error.
>>>>
>>>> Let's return proper error for this memory allocation anyway, to remove
>>>> that BUG_ON() as a first step, so at least we can continue test.
>>>
>>> I remember Josef did some changes into this code and said that prealloc
>>> shouldn't fail because this will cause mayhem down the road i.e. proper
>>> error handling is missing. If anything I think it should be added first
>>> and then remove the BUG_ONs.
>>
>> That's true, we could have some strange lockup due to
>> lock_extent_bits(), as if some clear_extent_bits() failed due to ENOMEM
>> and caller just ignore the error, we could have a lockup.
> 
> Not only lockup but unhandled failed extent range locking totally breaks
> assumptions that the following code makes and this would lead to
> unpredictable corruptions. Just count how many lock_extent_bits calls
> are there. And any caller of __set_extent_bit. There are so many that
> the BUG_ON is the measure of last resort to prevent worse problems.
> 
>> I'll try to pre-allocate certain amount of extent_state as the last
>> chance of redemption.
> 
> This only lowers the chances to hit the allocation error but there's
> always a case when certain amount + 1 would be needed.

Lower chance is already good enough (TM) for low possibility (0.001)
error injection.

And, for real world low memory case, lower chance in btrfs means higher
chance in other subsystem, less chance user will blame btrfs. :)

> 
>> Anyway, such BUG_ON() right after kmalloc() is really a blockage for
>> error injection test.
> 
> Maybe, but the code is not yet in the state to inject memory allocation
> faiulres to that particular path (ie. the state changes).

With last-chance reservation, we can make state related memory
allocation almost always to success even memory allocation failure
injected (if the possibility is low and low concurrency)
And the last-chance reservation can be configured at compile/module load
time, making it flex enough for most cases.

The main reason I'm doing such error injection test is to ensure write
time tree checker is not the cause of the lockup.

Of course I can directly inject error into btrfs_check_leaf_full() and
btrfs_check_node(), and filter the stack to ensure it only happen in
write time, and that's already what I'm crafting, based on the bcc error
inject example and kprobe return value overriding.

But it will never be a bad idea to explore what can go wrong.
And "always BUG_ON()" -> "good enough (TM)" already looks like a
improvement to me.

Thanks,
Qu
Qu Wenruo April 18, 2019, 11:54 a.m. UTC | #5
On 2019/4/18 下午7:51, Qu Wenruo wrote:
>
>
> On 2019/4/18 下午7:38, David Sterba wrote:
>> On Thu, Apr 18, 2019 at 03:30:20PM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2019/4/18 下午3:24, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 18.04.19 г. 10:21 ч., Qu Wenruo wrote:
>>>>> There is a BUG_ON() in __clear_extent_bit() for memory allocation
>>>>> failure.
>>>>>
>>>>> While comment of __clear_extent_bit() says it can return error, but we
>>>>> always return 0.
>>>>>
>>>>> Some __clear_extent_bit() callers just ignore the return value, while
>>>>> some still expect error.
>>>>>
>>>>> Let's return proper error for this memory allocation anyway, to remove
>>>>> that BUG_ON() as a first step, so at least we can continue test.
>>>>
>>>> I remember Josef did some changes into this code and said that prealloc
>>>> shouldn't fail because this will cause mayhem down the road i.e. proper
>>>> error handling is missing. If anything I think it should be added first
>>>> and then remove the BUG_ONs.
>>>
>>> That's true, we could have some strange lockup due to
>>> lock_extent_bits(), as if some clear_extent_bits() failed due to ENOMEM
>>> and caller just ignore the error, we could have a lockup.
>>
>> Not only lockup but unhandled failed extent range locking totally breaks
>> assumptions that the following code makes and this would lead to
>> unpredictable corruptions. Just count how many lock_extent_bits calls
>> are there. And any caller of __set_extent_bit. There are so many that
>> the BUG_ON is the measure of last resort to prevent worse problems.
>>
>>> I'll try to pre-allocate certain amount of extent_state as the last
>>> chance of redemption.
>>
>> This only lowers the chances to hit the allocation error but there's
>> always a case when certain amount + 1 would be needed.
>
> Lower chance is already good enough (TM) for low possibility (0.001)
> error injection.
>
> And, for real world low memory case, lower chance in btrfs means higher
> chance in other subsystem, less chance user will blame btrfs. :)
>
>>
>>> Anyway, such BUG_ON() right after kmalloc() is really a blockage for
>>> error injection test.
>>
>> Maybe, but the code is not yet in the state to inject memory allocation
>> faiulres to that particular path (ie. the state changes).
>
> With last-chance reservation, we can make state related memory
> allocation almost always to success even memory allocation failure
> injected (if the possibility is low and low concurrency)
> And the last-chance reservation can be configured at compile/module load
> time, making it flex enough for most cases.

Forgot to mention, for that method, I'll definitely keep the BUG_ON() on
@prealloc.

Just make the allocation part fall back to use fs_info->last_chance[] to
grab a valid memory slot.

Thanks,
Qu

>
> The main reason I'm doing such error injection test is to ensure write
> time tree checker is not the cause of the lockup.
>
> Of course I can directly inject error into btrfs_check_leaf_full() and
> btrfs_check_node(), and filter the stack to ensure it only happen in
> write time, and that's already what I'm crafting, based on the bcc error
> inject example and kprobe return value overriding.
>
> But it will never be a bad idea to explore what can go wrong.
> And "always BUG_ON()" -> "good enough (TM)" already looks like a
> improvement to me.
>
> Thanks,
> Qu
>
Nikolay Borisov April 18, 2019, 12:27 p.m. UTC | #6
On 18.04.19 г. 14:54 ч., Qu Wenruo wrote:
<snip>

> 
> Forgot to mention, for that method, I'll definitely keep the BUG_ON() on
> @prealloc.
> 
> Just make the allocation part fall back to use fs_info->last_chance[] to
> grab a valid memory slot.

Before you go and re-invent the wheel I will suggest you take a look at
the mempool api (include/linux/mempool.h). AFAIK it's purpose is to
exactly ensure memory allocations from it will not fail (depending on
how the pool is configured)

> 
> Thanks,
> Qu
> 
>>
>> The main reason I'm doing such error injection test is to ensure write
>> time tree checker is not the cause of the lockup.
>>
>> Of course I can directly inject error into btrfs_check_leaf_full() and
>> btrfs_check_node(), and filter the stack to ensure it only happen in
>> write time, and that's already what I'm crafting, based on the bcc error
>> inject example and kprobe return value overriding.
>>
>> But it will never be a bad idea to explore what can go wrong.
>> And "always BUG_ON()" -> "good enough (TM)" already looks like a
>> improvement to me.
>>
>> Thanks,
>> Qu
>>
>
Qu Wenruo April 18, 2019, 12:44 p.m. UTC | #7
On 2019/4/18 下午8:27, Nikolay Borisov wrote:
> 
> 
> On 18.04.19 г. 14:54 ч., Qu Wenruo wrote:
> <snip>
> 
>>
>> Forgot to mention, for that method, I'll definitely keep the BUG_ON() on
>> @prealloc.
>>
>> Just make the allocation part fall back to use fs_info->last_chance[] to
>> grab a valid memory slot.
> 
> Before you go and re-invent the wheel I will suggest you take a look at
> the mempool api (include/linux/mempool.h). AFAIK it's purpose is to
> exactly ensure memory allocations from it will not fail (depending on
> how the pool is configured)

Know that before.
But should_fail_slab() will ignore GFP flags, thus it doesn't make sense
in error injection case.

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> The main reason I'm doing such error injection test is to ensure write
>>> time tree checker is not the cause of the lockup.
>>>
>>> Of course I can directly inject error into btrfs_check_leaf_full() and
>>> btrfs_check_node(), and filter the stack to ensure it only happen in
>>> write time, and that's already what I'm crafting, based on the bcc error
>>> inject example and kprobe return value overriding.
>>>
>>> But it will never be a bad idea to explore what can go wrong.
>>> And "always BUG_ON()" -> "good enough (TM)" already looks like a
>>> improvement to me.
>>>
>>> Thanks,
>>> Qu
>>>
>>
Josef Bacik April 18, 2019, 2:10 p.m. UTC | #8
On Thu, Apr 18, 2019 at 10:24:10AM +0300, Nikolay Borisov wrote:
> 
> 
> On 18.04.19 г. 10:21 ч., Qu Wenruo wrote:
> > There is a BUG_ON() in __clear_extent_bit() for memory allocation
> > failure.
> > 
> > While comment of __clear_extent_bit() says it can return error, but we
> > always return 0.
> > 
> > Some __clear_extent_bit() callers just ignore the return value, while
> > some still expect error.
> > 
> > Let's return proper error for this memory allocation anyway, to remove
> > that BUG_ON() as a first step, so at least we can continue test.
> 
> I remember Josef did some changes into this code and said that prealloc
> shouldn't fail because this will cause mayhem down the road i.e. proper
> error handling is missing. If anything I think it should be added first
> and then remove the BUG_ONs.

I'm ok with changing this, but it needs to be more than just one patch.  Make
alloc_extent_state_atomic() error injectable and then use either error-inject.py
in bcc or use
https://github.com/josefbacik/debug-scripts/blob/master/inject-error.py to test
it and fix all the fallout from not handling lock_extent() succeeding.

You can use https://github.com/josefbacik/debug-scripts/blob/master/codepaths.py
to find all call sites for inject-error to make sure you are catching all the
cases.  Thanks,

Josef
Qu Wenruo April 18, 2019, 2:15 p.m. UTC | #9
On 2019/4/18 下午10:10, Josef Bacik wrote:
> On Thu, Apr 18, 2019 at 10:24:10AM +0300, Nikolay Borisov wrote:
>>
>>
>> On 18.04.19 г. 10:21 ч., Qu Wenruo wrote:
>>> There is a BUG_ON() in __clear_extent_bit() for memory allocation
>>> failure.
>>>
>>> While comment of __clear_extent_bit() says it can return error, but we
>>> always return 0.
>>>
>>> Some __clear_extent_bit() callers just ignore the return value, while
>>> some still expect error.
>>>
>>> Let's return proper error for this memory allocation anyway, to remove
>>> that BUG_ON() as a first step, so at least we can continue test.
>>
>> I remember Josef did some changes into this code and said that prealloc
>> shouldn't fail because this will cause mayhem down the road i.e. proper
>> error handling is missing. If anything I think it should be added first
>> and then remove the BUG_ONs.
> 
> I'm ok with changing this, but it needs to be more than just one patch.  Make
> alloc_extent_state_atomic() error injectable and then use either error-inject.py
> in bcc or use
> https://github.com/josefbacik/debug-scripts/blob/master/inject-error.py to test
> it and fix all the fallout from not handling lock_extent() succeeding.

Exactly what I'm going to do. With some blend from bcc inject.py, for
stack filter to provide more precious match.

> 
> You can use https://github.com/josefbacik/debug-scripts/blob/master/codepaths.py
> to find all call sites for inject-error to make sure you are catching all the
> cases.  Thanks,

That script is awesome!

Thanks,
Qu

> 
> Josef
>
Qu Wenruo April 22, 2019, 5:44 a.m. UTC | #10
On 2019/4/18 下午7:54, Qu Wenruo wrote:
>
>
> On 2019/4/18 下午7:51, Qu Wenruo wrote:
>>
>>
>> On 2019/4/18 下午7:38, David Sterba wrote:
>>> On Thu, Apr 18, 2019 at 03:30:20PM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2019/4/18 下午3:24, Nikolay Borisov wrote:
>>>>>
>>>>>
>>>>> On 18.04.19 г. 10:21 ч., Qu Wenruo wrote:
>>>>>> There is a BUG_ON() in __clear_extent_bit() for memory allocation
>>>>>> failure.
>>>>>>
>>>>>> While comment of __clear_extent_bit() says it can return error, but we
>>>>>> always return 0.
>>>>>>
>>>>>> Some __clear_extent_bit() callers just ignore the return value, while
>>>>>> some still expect error.
>>>>>>
>>>>>> Let's return proper error for this memory allocation anyway, to remove
>>>>>> that BUG_ON() as a first step, so at least we can continue test.
>>>>>
>>>>> I remember Josef did some changes into this code and said that prealloc
>>>>> shouldn't fail because this will cause mayhem down the road i.e. proper
>>>>> error handling is missing. If anything I think it should be added first
>>>>> and then remove the BUG_ONs.
>>>>
>>>> That's true, we could have some strange lockup due to
>>>> lock_extent_bits(), as if some clear_extent_bits() failed due to ENOMEM
>>>> and caller just ignore the error, we could have a lockup.
>>>
>>> Not only lockup but unhandled failed extent range locking totally breaks
>>> assumptions that the following code makes and this would lead to
>>> unpredictable corruptions. Just count how many lock_extent_bits calls
>>> are there. And any caller of __set_extent_bit. There are so many that
>>> the BUG_ON is the measure of last resort to prevent worse problems.
>>>
>>>> I'll try to pre-allocate certain amount of extent_state as the last
>>>> chance of redemption.
>>>
>>> This only lowers the chances to hit the allocation error but there's
>>> always a case when certain amount + 1 would be needed.
>>
>> Lower chance is already good enough (TM) for low possibility (0.001)
>> error injection.
>>
>> And, for real world low memory case, lower chance in btrfs means higher
>> chance in other subsystem, less chance user will blame btrfs. :)
>>
>>>
>>>> Anyway, such BUG_ON() right after kmalloc() is really a blockage for
>>>> error injection test.
>>>
>>> Maybe, but the code is not yet in the state to inject memory allocation
>>> faiulres to that particular path (ie. the state changes).
>>
>> With last-chance reservation, we can make state related memory
>> allocation almost always to success even memory allocation failure
>> injected (if the possibility is low and low concurrency)
>> And the last-chance reservation can be configured at compile/module load
>> time, making it flex enough for most cases.
>
> Forgot to mention, for that method, I'll definitely keep the BUG_ON() on
> @prealloc.
>
> Just make the allocation part fall back to use fs_info->last_chance[] to
> grab a valid memory slot.

This makes no sense in fact, for bcc inject tool, modify the predicts
can filter out the NOFAIL case, so all of my previous last-chance method
just makes no sense anyway.

I'm re-testing using the refined predicates: '(!(gfpflags &
__GFP_NOFAIL)) => submit_one_bio()'

And just now, I hit the same lock_extent_bits() hang with NOFAIL filter.
So this is definitely something wrong in the submit_one_bio() path.

Thanks,
Qu
>
> Thanks,
> Qu
>
>>
>> The main reason I'm doing such error injection test is to ensure write
>> time tree checker is not the cause of the lockup.
>>
>> Of course I can directly inject error into btrfs_check_leaf_full() and
>> btrfs_check_node(), and filter the stack to ensure it only happen in
>> write time, and that's already what I'm crafting, based on the bcc error
>> inject example and kprobe return value overriding.
>>
>> But it will never be a bad idea to explore what can go wrong.
>> And "always BUG_ON()" -> "good enough (TM)" already looks like a
>> improvement to me.
>>
>> Thanks,
>> Qu
>>

Patch
diff mbox series

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 828708f6510c..f1bfc632ef7b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -743,7 +743,10 @@  int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 
 	if (state->start < start) {
 		prealloc = alloc_extent_state_atomic(prealloc);
-		BUG_ON(!prealloc);
+		if (!prealloc) {
+			err = -ENOMEM;
+			goto out;
+		}
 		err = split_state(tree, state, prealloc, start);
 		if (err)
 			extent_io_tree_panic(tree, err);
@@ -766,7 +769,10 @@  int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	 */
 	if (state->start <= end && state->end > end) {
 		prealloc = alloc_extent_state_atomic(prealloc);
-		BUG_ON(!prealloc);
+		if (!prealloc) {
+			err = -ENOMEM;
+			goto out;
+		}
 		err = split_state(tree, state, prealloc, end + 1);
 		if (err)
 			extent_io_tree_panic(tree, err);
@@ -801,6 +807,8 @@  int __clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	if (prealloc)
 		free_extent_state(prealloc);
 
+	if (err)
+		return err;
 	return 0;
 
 }