[v3,0/4] brd discard patches

Message ID: 2dacc73-854-e71c-1746-99b017401c9a@redhat.com

Message

Mikulas Patocka Aug. 10, 2023, 10:07 a.m. UTC
Hi

Here I'm submitting the ramdisk discard patches for the next merge window.
If you want me to make any more changes, please let me know.

Changes since version 2:
There are no functional changes; I only moved the switch statement
conversion to a separate patch, so that it's easier to review.

Patch 1: introduce a switch statement in brd_submit_bio
Patch 2: extend the rcu regions to cover read and write
Patch 3: enable discard
Patch 4: implement write zeroes
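
For orientation, Patch 1's conversion might look roughly like this (a guess against the current block-layer API, not the actual patch; brd_rw_bio() is an assumed helper name):

```
static void brd_submit_bio(struct bio *bio)
{
	struct brd_device *brd = bio->bi_bdev->bd_disk->private_data;

	switch (bio_op(bio)) {
	case REQ_OP_READ:
	case REQ_OP_WRITE:
		/* assumed helper wrapping the existing bio_for_each_segment() loop */
		brd_rw_bio(brd, bio);
		break;
	default:
		/* later patches would add REQ_OP_DISCARD / REQ_OP_WRITE_ZEROES here */
		bio->bi_status = BLK_STS_NOTSUPP;
		break;
	}
	bio_endio(bio);
}
```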

Mikulas

Comments

Li Nan Nov. 10, 2023, 1:22 a.m. UTC | #1
friendly ping...

On 2023/8/10 18:07, Mikulas Patocka wrote:
> Hi
> 
> Here I'm submitting the ramdisk discard patches for the next merge window.
> If you want me to make any more changes, please let me know.
> 
> Changes since version 2:
> There are no functional changes; I only moved the switch statement
> conversion to a separate patch, so that it's easier to review.
> 
> Patch 1: introduce a switch statement in brd_submit_bio
> Patch 2: extend the rcu regions to cover read and write
> Patch 3: enable discard
> Patch 4: implement write zeroes
> 
> Mikulas
>
Mikulas Patocka Nov. 14, 2023, 1:59 p.m. UTC | #2
On Fri, 10 Nov 2023, Li Nan wrote:

> friendly ping...

Jens? Do you want this patch series or not?

Mikulas

> On 2023/8/10 18:07, Mikulas Patocka wrote:
> > Hi
> > 
> > Here I'm submitting the ramdisk discard patches for the next merge window.
> > If you want me to make any more changes, please let me know.
> > 
> > Changes since version 2:
> > There are no functional changes; I only moved the switch statement
> > conversion to a separate patch, so that it's easier to review.
> > 
> > Patch 1: introduce a switch statement in brd_submit_bio
> > Patch 2: extend the rcu regions to cover read and write
> > Patch 3: enable discard
> > Patch 4: implement write zeroes
> > 
> > Mikulas
> > 
> 
> -- 
> Thanks,
> Nan
>
Ming Lei Jan. 19, 2024, 8:41 a.m. UTC | #3
Hi Mikulas,

On Thu, Aug 10, 2023 at 12:07:07PM +0200, Mikulas Patocka wrote:
> Hi
> 
> Here I'm submitting the ramdisk discard patches for the next merge window.
> If you want me to make any more changes, please let me know.

brd discard was removed in f09a06a193d9 ("brd: remove discard support")
in 2017 because it was just a driver-private write_zero, and the user can get
the same result with fallocate(FALLOC_FL_ZERO_RANGE).
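
For reference, a minimal userspace sketch of that fallocate() call; the device path and the range are only examples:

```
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/falloc.h>

int main(void)
{
	int fd = open("/dev/ram0", O_RDWR);	/* example device path */

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Zero the first 1 MiB of the device. */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, 1 << 20) < 0)
		perror("fallocate");
	close(fd);
	return 0;
}
```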

Also, you only mentioned the motivation in the V1 cover letter:

https://lore.kernel.org/linux-block/alpine.LRH.2.02.2209151604410.13231@file01.intranet.prod.int.rdu2.redhat.com/

```
Zdenek asked me to write it, because we use brd in the lvm2 testsuite and
it would be beneficial to run the testsuite with discard enabled in order
to test discard handling.
```

But we have lots of test disks with discard support: loop, scsi_debug,
null_blk, ublk, ..., so one question is why brd discard is
a must for the lvm2 testsuite to cover (lvm) discard handling?

The reason why brd didn't support discard by freeing pages is the risk of
writeback deadlock, see:

commit f09a06a193d9 ("brd: remove discard support")

-static void discard_from_brd(struct brd_device *brd,
-                       sector_t sector, size_t n)
-{
-       while (n >= PAGE_SIZE) {
-               /*
-                * Don't want to actually discard pages here because
-                * re-allocating the pages can result in writeback
-                * deadlocks under heavy load.
-                */
-               if (0)
-                       brd_free_page(brd, sector);
-               else
-                       brd_zero_page(brd, sector);
-               sector += PAGE_SIZE >> SECTOR_SHIFT;
-               n -= PAGE_SIZE;
-       }
-}

However, you didn't mention how your patches address this potential
risk; care to document it? I can't find any mention of this problem.

BTW, your patches look more complicated than the original, removed
discard implementation. If the above questions get addressed,
I am happy to review the following patches.


Thanks,
Ming
Mikulas Patocka Jan. 22, 2024, 4:30 p.m. UTC | #4
Hi


On Fri, 19 Jan 2024, Ming Lei wrote:

> Hi Mikulas,
> 
> On Thu, Aug 10, 2023 at 12:07:07PM +0200, Mikulas Patocka wrote:
> > Hi
> > 
> > Here I'm submitting the ramdisk discard patches for the next merge window.
> > If you want me to make any more changes, please let me know.
> 
> brd discard was removed in f09a06a193d9 ("brd: remove discard support")
> in 2017 because it was just a driver-private write_zero, and the user can get
> the same result with fallocate(FALLOC_FL_ZERO_RANGE).
> 
> Also, you only mentioned the motivation in the V1 cover letter:
> 
> https://lore.kernel.org/linux-block/alpine.LRH.2.02.2209151604410.13231@file01.intranet.prod.int.rdu2.redhat.com/
> 
> ```
> Zdenek asked me to write it, because we use brd in the lvm2 testsuite and
> it would be beneficial to run the testsuite with discard enabled in order
> to test discard handling.
> ```
> 
> But we have lots of test disks with discard support: loop, scsi_debug,
> null_blk, ublk, ..., so one question is why brd discard is
> a must for the lvm2 testsuite to cover (lvm) discard handling?

We should ask Zdeněk Kabeláč about it - he is the expert on the lvm2
testsuite.

> The reason why brd didn't support discard by freeing pages is the risk of
> writeback deadlock, see:
> 
> commit f09a06a193d9 ("brd: remove discard support")
> 
> -static void discard_from_brd(struct brd_device *brd,
> -                       sector_t sector, size_t n)
> -{
> -       while (n >= PAGE_SIZE) {
> -               /*
> -                * Don't want to actually discard pages here because
> -                * re-allocating the pages can result in writeback
> -                * deadlocks under heavy load.
> -                */
> -               if (0)
> -                       brd_free_page(brd, sector);
> -               else
> -                       brd_zero_page(brd, sector);
> -               sector += PAGE_SIZE >> SECTOR_SHIFT;
> -               n -= PAGE_SIZE;
> -       }
> -}
> 
> However, you didn't mention how your patches address this potential
> risk; care to document it? I can't find any mention of this problem.

The writeback deadlock can happen even without discard - if the machine 
runs out of memory while writing data to a ramdisk. But the probability is 
increased when discard is used, because pages are freed and re-allocated 
more often.

Generally, the admin should make sure that the machine has enough 
available memory when creating a ramdisk - then, the deadlock can't 
happen.

A ramdisk has no limit on the number of allocated pages, so when it runs out
of memory, the OOM killer will try to kill unrelated processes and the
machine will hang. If there is a risk of exhausting the available memory,
the admin should use tmpfs instead of a ramdisk - tmpfs can be configured
with a limit and it can also swap out pages.
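
For illustration, a minimal sketch of mounting such a size-capped tmpfs from C via mount(2); the mount point and the 1G limit are just example values:

```
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* Mount a tmpfs capped at 1 GiB; pages beyond the limit are refused
	 * rather than eating all of RAM, and existing pages can be swapped. */
	if (mount("tmpfs", "/mnt/test", "tmpfs", 0, "size=1G") < 0) {
		perror("mount");
		return 1;
	}
	return 0;
}
```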

> BTW, your patches look more complicated than the original, removed
> discard implementation. If the above questions get addressed,
> I am happy to review the following patches.

My patches actually free the discarded pages. The original discard 
implementation just overwrote the pages with zeroes without freeing them.

Mikulas

> 
> 
> Thanks,
> Ming
>
Ming Lei Jan. 23, 2024, 2:49 a.m. UTC | #5
On Mon, Jan 22, 2024 at 05:30:07PM +0100, Mikulas Patocka wrote:
> Hi
> 
> 
> On Fri, 19 Jan 2024, Ming Lei wrote:
> 
> > Hi Mikulas,
> > 
> > On Thu, Aug 10, 2023 at 12:07:07PM +0200, Mikulas Patocka wrote:
> > > Hi
> > > 
> > > Here I'm submitting the ramdisk discard patches for the next merge window.
> > > If you want me to make any more changes, please let me know.
> > 
> > brd discard was removed in f09a06a193d9 ("brd: remove discard support")
> > in 2017 because it was just a driver-private write_zero, and the user can get
> > the same result with fallocate(FALLOC_FL_ZERO_RANGE).
> > 
> > Also, you only mentioned the motivation in the V1 cover letter:
> > 
> > https://lore.kernel.org/linux-block/alpine.LRH.2.02.2209151604410.13231@file01.intranet.prod.int.rdu2.redhat.com/
> > 
> > ```
> > Zdenek asked me to write it, because we use brd in the lvm2 testsuite and
> > it would be beneficial to run the testsuite with discard enabled in order
> > to test discard handling.
> > ```
> > 
> > But we have lots of test disks with discard support: loop, scsi_debug,
> > null_blk, ublk, ..., so one question is why brd discard is
> > a must for the lvm2 testsuite to cover (lvm) discard handling?
> 
> We should ask Zdeněk Kabeláč about it - he is the expert on the lvm2
> testsuite.
> 
> > The reason why brd didn't support discard by freeing pages is the risk of
> > writeback deadlock, see:
> > 
> > commit f09a06a193d9 ("brd: remove discard support")
> > 
> > -static void discard_from_brd(struct brd_device *brd,
> > -                       sector_t sector, size_t n)
> > -{
> > -       while (n >= PAGE_SIZE) {
> > -               /*
> > -                * Don't want to actually discard pages here because
> > -                * re-allocating the pages can result in writeback
> > -                * deadlocks under heavy load.
> > -                */
> > -               if (0)
> > -                       brd_free_page(brd, sector);
> > -               else
> > -                       brd_zero_page(brd, sector);
> > -               sector += PAGE_SIZE >> SECTOR_SHIFT;
> > -               n -= PAGE_SIZE;
> > -       }
> > -}
> > 
> > However, you didn't mention how your patches address this potential
> > risk; care to document it? I can't find any mention of this problem.
> 
> The writeback deadlock can happen even without discard - if the machine 
> runs out of memory while writing data to a ramdisk. But the probability is 
> increased when discard is used, because pages are freed and re-allocated 
> more often.

Yeah, I agree. What I meant is that this needs to be documented,
given that discard is re-introduced and the original deadlock comment isn't
addressed.

> 
> Generally, the admin should make sure that the machine has enough 
> available memory when creating a ramdisk - then, the deadlock can't 
> happen.
> 
> A ramdisk has no limit on the number of allocated pages, so when it runs out
> of memory, the OOM killer will try to kill unrelated processes and the
> machine will hang. If there is a risk of exhausting the available memory,
> the admin should use tmpfs instead of a ramdisk - tmpfs can be configured
> with a limit and it can also swap out pages.
> 
> > BTW, your patches look more complicated than the original, removed
> > discard implementation. If the above questions get addressed,
> > I am happy to review the following patches.
> 
> My patches actually free the discarded pages. The original discard 
> implementation just overwrote the pages with zeroes without freeing them.

The original implementation supported discarding by freeing pages, but
it was just bypassed unconditionally by:

               if (0)
                       brd_free_page(brd, sector);
               else
                       brd_zero_page(brd, sector);

However, a page could be freed by discard while it is being consumed in brd_do_bvec().

Maybe your patch "brd: extend the rcu regions to cover read and write"
can be simplified a bit, such as:

- grab the RCU read lock in brd_do_bvec()
- release the RCU read lock when allocating a page via alloc_page() in
  brd_insert_page()
- change the page freeing to go through an RCU grace period

Or avoid it by holding a page reference:

- grab a page reference in brd_lookup_page() when it is called from
  copy_to_brd() or copy_from_brd(), and drop it after the page has been consumed
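
A minimal sketch of the first (RCU) idea, with a simplified copy helper; brd_lookup_page() is an assumed helper here, and this only illustrates the locking pattern, not the actual series:

```
/*
 * Illustration only: the read side holds rcu_read_lock() across the page
 * lookup and the copy, and discard frees pages only after an RCU grace
 * period, so a page cannot vanish while brd_do_bvec() is touching it.
 */
static void copy_from_brd(void *dst, struct brd_device *brd,
			  sector_t sector, size_t n)
{
	unsigned int offset = (sector & (PAGE_SECTORS - 1)) << SECTOR_SHIFT;
	struct page *page;
	void *src;

	rcu_read_lock();
	page = brd_lookup_page(brd, sector);
	if (page) {
		src = kmap_local_page(page);
		memcpy(dst, src + offset, n);
		kunmap_local(src);
	} else {
		/* unallocated (or discarded) range reads as zeroes */
		memset(dst, 0, n);
	}
	rcu_read_unlock();
}
```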


Thanks,
Ming