
[2/3] Btrfs: lzo compression must free at least PAGE_SIZE

Message ID 20170519133835.27843-3-nefelim4ag@gmail.com (mailing list archive)
State New, archived

Commit Message

Timofey Titovets May 19, 2017, 1:38 p.m. UTC
If data compression didn't free at least one PAGE_SIZE, it is useless to store that compressed extent.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/lzo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Lionel Bouton May 19, 2017, 2:17 p.m. UTC | #1
Hi,

On 19/05/2017 at 15:38, Timofey Titovets wrote:
> If data compression didn't free at least one PAGE_SIZE, it is useless to store that compressed extent.
>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/lzo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index bd0b0938..637ef1b0 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -207,7 +207,7 @@ static int lzo_compress_pages(struct list_head *ws,
>  		}
>  
>  		/* we're making it bigger, give up */
> -		if (tot_in > 8192 && tot_in < tot_out) {
> +		if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE) {
>  			ret = -E2BIG;
>  			goto out;
>  		}
I'm not familiar with this code but I was surprised by the test: you
would expect compression to be beneficial when it frees an actual page,
not when it reduces the data by a page size. So unless I'm
misunderstanding the context, shouldn't it be something like:

if (tot_in > 8192 && ((tot_in % PAGE_SIZE) <= (tot_out % PAGE_SIZE)))

but looking at the code I see that this is in a while loop and there's
another test just after the loop in the existing code:

        if (tot_out > tot_in)
                goto out;

There are a couple of things I don't understand, but isn't this designed
to stream data through compression in small chunks before writing it at
the end? So isn't this later test the proper location to detect whether
compression was beneficial?

You might not save a page early on, while the while loop is still working
on a subset of the data to compress, but after enough data has been
processed you could save a page. It seems odd that your modification could
abort compression early on even though the savings could reach a full page
after enough iterations.
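
For instance (hypothetical numbers, PAGE_SIZE = 4096, data compressing
steadily to 3/4 of its size):

        after page 3: tot_in = 12288, tot_out = 9216
                -> 12288 < 9216 + 4096, so the modified test aborts
        after page 4: tot_in = 16384, tot_out = 12288
                -> 16384 >= 12288 + 4096, exactly one page saved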

Isn't what you want something like:

        if (tot_out % PAGE_SIZE >= tot_in % PAGE_SIZE)
                goto out;

after the loop?
The >= instead of > would avoid decompression in the case where the
compressed data is smaller but uses the same space on disk.
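
(As a hypothetical illustration with PAGE_SIZE = 4096: tot_in = 8192 and
tot_out = 5000 is smaller in bytes but still occupies two pages on disk,
so storing the compressed version buys nothing.)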

Best regards,

Lionel
Lionel Bouton May 19, 2017, 8:19 p.m. UTC | #2
On 19/05/2017 at 16:17, Lionel Bouton wrote:
> Hi,
>
> On 19/05/2017 at 15:38, Timofey Titovets wrote:
>> If data compression didn't free at least one PAGE_SIZE, it is useless to store that compressed extent.
>>
>> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
>> ---
>>  fs/btrfs/lzo.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
>> index bd0b0938..637ef1b0 100644
>> --- a/fs/btrfs/lzo.c
>> +++ b/fs/btrfs/lzo.c
>> @@ -207,7 +207,7 @@ static int lzo_compress_pages(struct list_head *ws,
>>  		}
>>  
>>  		/* we're making it bigger, give up */
>> -		if (tot_in > 8192 && tot_in < tot_out) {
>> +		if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE) {
>>  			ret = -E2BIG;
>>  			goto out;
>>  		}
> I'm not familiar with this code but I was surprised by the test: you
> would expect compression to be beneficial when it frees an actual page,
> not when it reduces the data by a page size. So unless I'm
> misunderstanding the context, shouldn't it be something like:
>
> if (tot_in > 8192 && ((tot_in % PAGE_SIZE) <= (tot_out % PAGE_SIZE)))
>
> but looking at the code I see that this is in a while loop and there's
> another test just after the loop in the existing code:
>
>         if (tot_out > tot_in)
>                 goto out;
>
> There are a couple of things I don't understand, but isn't this designed
> to stream data through compression in small chunks before writing it at
> the end? So isn't this later test the proper location to detect whether
> compression was beneficial?
>
> You might not save a page early on, while the while loop is still working
> on a subset of the data to compress, but after enough data has been
> processed you could save a page. It seems odd that your modification could
> abort compression early on even though the savings could reach a full page
> after enough iterations.
>
> Isn't what you want something like:
>
>         if (tot_out % PAGE_SIZE >= tot_in % PAGE_SIZE)
>                 goto out;
>
> after the loop?
> The >= instead of > would avoid decompression in the case where the
> compressed data is smaller but uses the same space on disk.

I was too focused on other problems, and taking a fresh look at what I
wrote, I'm embarrassed by what I read.
The number of pages used for a given amount of data should be
(amount / PAGE_SIZE) + ((amount % PAGE_SIZE) == 0 ? 0 : 1); this seems
common enough a thing to compute that the kernel might have a macro
defined for it.
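
(The kernel does define such a macro: DIV_ROUND_UP() in
include/linux/kernel.h performs exactly this round-up division.)
A minimal sketch, where the helper name pages_used() is hypothetical:

        #include <linux/kernel.h>       /* DIV_ROUND_UP() */
        #include <linux/mm.h>           /* PAGE_SIZE */

        /* number of PAGE_SIZE pages needed to hold `amount` bytes */
        static inline unsigned long pages_used(unsigned long amount)
        {
                return DIV_ROUND_UP(amount, PAGE_SIZE);
        }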
Timofey Titovets May 19, 2017, 9:15 p.m. UTC | #3
2017-05-19 23:19 GMT+03:00 Lionel Bouton <lionel-subscription@bouton.name>:
> I was too focused on other problems, and taking a fresh look at what I
> wrote, I'm embarrassed by what I read.
> The number of pages used for a given amount of data should be
> (amount / PAGE_SIZE) + ((amount % PAGE_SIZE) == 0 ? 0 : 1); this seems
> common enough a thing to compute that the kernel might have a macro
> defined for it.

If I understand the code correctly, the logic of comparing the
input/output sizes in bytes is enough (IMHO) for skipping the compression
of incompressible data, and as btrfs uses PAGE_SIZE as its data cluster
size (and, if I understand correctly, its minimum IO size), that logic
can be improved for the case when the compressed data ends up in the
range 126976 < compressed_size < 131072.
The easiest way to improve that case, I think, is to treat the compressed
size as PAGE_SIZE bigger.
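
(Concretely, assuming a 128K input extent and PAGE_SIZE = 4096:
tot_in = 131072 occupies 32 pages, and any compressed size above
126976 bytes still rounds up to 32 pages, so no space is actually
freed even though the byte count shrank.)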

JFYI:
I once read on the list that btrfs doesn't compress data if the data is
smaller than PAGE_SIZE, because it's useless (I didn't find that in the
code, so I think I don't fully understand the compress routine).

After some time I got the idea that if btrfs decides whether to store
data compressed or not by comparing the input/output sizes (i.e. if the
compressed size is bigger than the input data, don't store the compressed
version), this logic can be improved by also checking whether the
compression profit is more than PAGE_SIZE, because if it isn't, the
compressed data will pass the check but consume the same amount of space,
and as a result will show no gain.
Lionel Bouton May 20, 2017, 3:47 p.m. UTC | #4
On 19/05/2017 at 23:15, Timofey Titovets wrote:
> 2017-05-19 23:19 GMT+03:00 Lionel Bouton
> <lionel-subscription@bouton.name>:
>> I was too focused on other problems, and taking a fresh look at what I
>> wrote, I'm embarrassed by what I read. The number of pages used for a
>> given amount of data should be (amount / PAGE_SIZE) +
>> ((amount % PAGE_SIZE) == 0 ? 0 : 1); this seems common enough a thing
>> to compute that the kernel might have a macro defined for it.
>> If I understand the code correctly, the logic of comparing the
>> input/output sizes in bytes is enough (IMHO)

As I suspected, I missed context: the name of the function makes it clear
it is supposed to work on whole pages, so you are right about the
comparison.

What I'm still unsure of is whether the test is at the right spot. The
inner loop seems to work on chunks of at most
in_len = min(len, PAGE_SIZE)
bytes of data;
for example, on anything with len >= 4 * PAGE_SIZE and PAGE_SIZE = 4096,
it seems to me there's a problem.

if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE)

tot_in > 8192 is true starting at the 3rd page being processed in my example.

If the first 3 pages don't manage to free one full page (i.e. the
function doesn't reach at least a 2/3 compression ratio at that point),
the modified second condition is true and the compression is aborted.
This happens even if continuing the compression would end up freeing one
page (tot_out is expected to rise more slowly than tot_in on compressible
data, so the difference could grow and reach a full PAGE_SIZE).

Am I still confused by something?

Best regards,

Lionel
Timofey Titovets May 20, 2017, 4:23 p.m. UTC | #5
2017-05-20 18:47 GMT+03:00 Lionel Bouton <lionel-subscription@bouton.name>:
> On 19/05/2017 at 23:15, Timofey Titovets wrote:
>> 2017-05-19 23:19 GMT+03:00 Lionel Bouton
>> <lionel-subscription@bouton.name>:
>>> I was too focused on other problems, and taking a fresh look at what I
>>> wrote, I'm embarrassed by what I read. The number of pages used for a
>>> given amount of data should be (amount / PAGE_SIZE) +
>>> ((amount % PAGE_SIZE) == 0 ? 0 : 1); this seems common enough a thing
>>> to compute that the kernel might have a macro defined for it.
>> If I understand the code correctly, the logic of comparing the
>> input/output sizes in bytes is enough (IMHO)
>
> As I suspected, I missed context: the name of the function makes it clear
> it is supposed to work on whole pages, so you are right about the
> comparison.
>
> What I'm still unsure of is whether the test is at the right spot. The
> inner loop seems to work on chunks of at most
> in_len = min(len, PAGE_SIZE)
> bytes of data;
> for example, on anything with len >= 4 * PAGE_SIZE and PAGE_SIZE = 4096,
> it seems to me there's a problem.
>
> if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE)
>
> tot_in > 8192 is true starting at the 3rd page being processed in my example.
>
> If the first 3 pages don't manage to free one full page (i.e. the
> function doesn't reach at least a 2/3 compression ratio at that point),
> the modified second condition is true and the compression is aborted.
> This happens even if continuing the compression would end up freeing one
> page (tot_out is expected to rise more slowly than tot_in on compressible
> data, so the difference could grow and reach a full PAGE_SIZE).
>
> Am I still confused by something?
>
> Best regards,
>
> Lionel

You're right, the size must be checked after all the data has been
compressed, so I will fix that and update the patch set, thanks.
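
A minimal sketch of what that follow-up could look like, assuming the
PAGE_SIZE margin simply moves to the existing test after the loop (the
exact shape of the updated series is an assumption, not the actual v2
patch):

        /* after the while loop: keep the compressed result only if it
         * frees at least one full page compared to the input
         */
        if (tot_out + PAGE_SIZE > tot_in) {
                ret = -E2BIG;
                goto out;
        }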

Patch

diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index bd0b0938..637ef1b0 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -207,7 +207,7 @@ static int lzo_compress_pages(struct list_head *ws,
 		}
 
 		/* we're making it bigger, give up */
-		if (tot_in > 8192 && tot_in < tot_out) {
+		if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE) {
 			ret = -E2BIG;
 			goto out;
 		}