[v2,3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range

Message ID 20220421125348.62483-4-linmiaohe@huawei.com (mailing list archive)
State New
Series A few fixup patches for mm

Commit Message

Miaohe Lin April 21, 2022, 12:53 p.m. UTC
Once the MADV_FREE operation has succeeded, callers can expect they might
get zero-fill pages if accessing the memory again. Therefore it should be
safe to delete the hwpoison entry and swapin error entry. There is no
reason to kill the process if it has called MADV_FREE on the range.

Suggested-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/madvise.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

David Hildenbrand April 21, 2022, 1:25 p.m. UTC | #1
On 21.04.22 14:53, Miaohe Lin wrote:
> Once the MADV_FREE operation has succeeded, callers can expect they might
> get zero-fill pages if accessing the memory again. Therefore it should be
> safe to delete the hwpoison entry and swapin error entry. There is no
> reason to kill the process if it has called MADV_FREE on the range.
> 
> Suggested-by: Alistair Popple <apopple@nvidia.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/madvise.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d6592488b51..5f4537511532 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  			swp_entry_t entry;
>  
>  			entry = pte_to_swp_entry(ptent);
> -			if (non_swap_entry(entry))
> -				continue;
> -			nr_swap--;
> -			free_swap_and_cache(entry);
> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			if (!non_swap_entry(entry)) {
> +				nr_swap--;
> +				free_swap_and_cache(entry);
> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			} else if (is_hwpoison_entry(entry) ||
> +				   is_swapin_error_entry(entry)) {
> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			}
>  			continue;
>  		}
>  

Reading the man page, that should be fine, but it might not be required.

"[...] the kernel can free the pages at any time. Once pages in the
range have been freed, the caller will see zero-fill-on-demand pages
upon subsequent page references."


LGTM

Acked-by: David Hildenbrand <david@redhat.com>
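
The man page wording quoted above can be checked from userspace. The following is only a sketch, assuming Linux with glibc and a kernel that supports MADV_FREE; madv_free_probe() and the probe_result names are made up for illustration and are not part of the patch. It writes a pattern to an anonymous page, calls MADV_FREE, and reports whether a later read still sees the old data or a zero-filled page. Either result is allowed, which is why dropping a hwpoison or swapin error entry in the range should not surprise the caller.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and MADV_FREE on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* What a read observed after MADV_FREE (illustrative names only). */
enum probe_result {
	PROBE_ERROR = -1,     /* mmap() or madvise() failed, see errno */
	PROBE_OLD_DATA = 0,   /* page not freed yet: old contents visible */
	PROBE_ZERO_FILL = 1,  /* page was freed: zero-fill-on-demand page */
	PROBE_UNEXPECTED = 2  /* anything else; not expected per the man page */
};

enum probe_result madv_free_probe(void)
{
	const unsigned char pattern = 0x5a;
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	unsigned char seen;
	void *map;

	if (page <= 0)
		return PROBE_ERROR;

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return PROBE_ERROR;

	/* Dirty the page so there is something for the kernel to discard. */
	memset(map, pattern, (size_t)page);

	if (madvise(map, (size_t)page, MADV_FREE) != 0) {
		munmap(map, (size_t)page);
		return PROBE_ERROR;
	}

	/* The kernel may or may not have freed the page by now. */
	p = map;
	seen = p[0];
	munmap(map, (size_t)page);

	if (seen == pattern)
		return PROBE_OLD_DATA;
	if (seen == 0)
		return PROBE_ZERO_FILL;
	return PROBE_UNEXPECTED;
}
```

Without memory pressure the probe will usually report the old data, since the kernel frees such pages lazily. A caller has to be prepared for both outcomes.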
Miaohe Lin April 21, 2022, 1:44 p.m. UTC | #2
On 2022/4/21 21:25, David Hildenbrand wrote:
> On 21.04.22 14:53, Miaohe Lin wrote:
>> Once the MADV_FREE operation has succeeded, callers can expect they might
>> get zero-fill pages if accessing the memory again. Therefore it should be
>> safe to delete the hwpoison entry and swapin error entry. There is no
>> reason to kill the process if it has called MADV_FREE on the range.
>>
>> Suggested-by: Alistair Popple <apopple@nvidia.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/madvise.c | 13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 4d6592488b51..5f4537511532 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>>  			swp_entry_t entry;
>>  
>>  			entry = pte_to_swp_entry(ptent);
>> -			if (non_swap_entry(entry))
>> -				continue;
>> -			nr_swap--;
>> -			free_swap_and_cache(entry);
>> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>> +			if (!non_swap_entry(entry)) {
>> +				nr_swap--;
>> +				free_swap_and_cache(entry);
>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>> +			} else if (is_hwpoison_entry(entry) ||
>> +				   is_swapin_error_entry(entry)) {
>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>> +			}
>>  			continue;
>>  		}
>>  
> 
> Reading the man page, that should be fine, but it might not be required.
> 
> "[...] the kernel can free the pages at any time. Once pages in the
> range have been freed, the caller will see zero-fill-on-demand pages
> upon subsequent page references."

Yes, this part is not mentioned in the man page.

> 
> 
> LGTM
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> 

Many thanks for your quick response and review!
Peter Xu April 21, 2022, 2:28 p.m. UTC | #3
On Thu, Apr 21, 2022 at 08:53:48PM +0800, Miaohe Lin wrote:
> Once the MADV_FREE operation has succeeded, callers can expect they might
> get zero-fill pages if accessing the memory again. Therefore it should be
> safe to delete the hwpoison entry and swapin error entry. There is no
> reason to kill the process if it has called MADV_FREE on the range.
> 
> Suggested-by: Alistair Popple <apopple@nvidia.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/madvise.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d6592488b51..5f4537511532 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  			swp_entry_t entry;
>  
>  			entry = pte_to_swp_entry(ptent);
> -			if (non_swap_entry(entry))
> -				continue;
> -			nr_swap--;
> -			free_swap_and_cache(entry);
> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);

Nitpick: IMHO you don't need to invert non_swap_entry(); keeping it as is
generates a smaller diff. Just add the new code above "continue".

> +			if (!non_swap_entry(entry)) {
> +				nr_swap--;
> +				free_swap_and_cache(entry);
> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			} else if (is_hwpoison_entry(entry) ||
> +				   is_swapin_error_entry(entry)) {
> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);

Since it's been discussed and you're reposting a new version anyway, why
not start by reusing either hwpoison or pte markers?  Or do you think
dropping the new swap entry again should be left for the future?

Thanks,
Miaohe Lin April 22, 2022, 2:47 a.m. UTC | #4
On 2022/4/21 22:28, Peter Xu wrote:
> On Thu, Apr 21, 2022 at 08:53:48PM +0800, Miaohe Lin wrote:
>> Once the MADV_FREE operation has succeeded, callers can expect they might
>> get zero-fill pages if accessing the memory again. Therefore it should be
>> safe to delete the hwpoison entry and swapin error entry. There is no
>> reason to kill the process if it has called MADV_FREE on the range.
>>
>> Suggested-by: Alistair Popple <apopple@nvidia.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/madvise.c | 13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 4d6592488b51..5f4537511532 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>>  			swp_entry_t entry;
>>  
>>  			entry = pte_to_swp_entry(ptent);
>> -			if (non_swap_entry(entry))
>> -				continue;
>> -			nr_swap--;
>> -			free_swap_and_cache(entry);
>> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> 
> Nitpick: IMHO you don't need to invert non_swap_entry(); keeping it as is
> generates a smaller diff. Just add the new code above "continue".

I tried this way, but that led to long line splitting, so I rewrote the code like this.
If you prefer to just add the new code above "continue", I will do it in the next version.

> 
>> +			if (!non_swap_entry(entry)) {
>> +				nr_swap--;
>> +				free_swap_and_cache(entry);
>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>> +			} else if (is_hwpoison_entry(entry) ||
>> +				   is_swapin_error_entry(entry)) {
>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> 
> Since it's been discussed and you're reposting a new version anyway, why
> not start by reusing either hwpoison or pte markers?  Or do you think
> dropping the new swap entry again should be left for the future?
> 

IMHO if we reuse hwpoison markers, there are some places where we would need to distinguish them and
process them differently (and probably comment them well too), which would make the code more complicated
and somewhat hard to follow. The "swapin error marker" here is the most straightforward. And if pte markers
support the "swapin error case" in the future, I think it's fine to switch to them then.
Does this make sense to you?

Thanks a lot!

> Thanks,
>
Peter Xu April 22, 2022, 2:52 a.m. UTC | #5
On Fri, Apr 22, 2022 at 10:47:32AM +0800, Miaohe Lin wrote:
> On 2022/4/21 22:28, Peter Xu wrote:
> > On Thu, Apr 21, 2022 at 08:53:48PM +0800, Miaohe Lin wrote:
> >> Once the MADV_FREE operation has succeeded, callers can expect they might
> >> get zero-fill pages if accessing the memory again. Therefore it should be
> >> safe to delete the hwpoison entry and swapin error entry. There is no
> >> reason to kill the process if it has called MADV_FREE on the range.
> >>
> >> Suggested-by: Alistair Popple <apopple@nvidia.com>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/madvise.c | 13 ++++++++-----
> >>  1 file changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/mm/madvise.c b/mm/madvise.c
> >> index 4d6592488b51..5f4537511532 100644
> >> --- a/mm/madvise.c
> >> +++ b/mm/madvise.c
> >> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >>  			swp_entry_t entry;
> >>  
> >>  			entry = pte_to_swp_entry(ptent);
> >> -			if (non_swap_entry(entry))
> >> -				continue;
> >> -			nr_swap--;
> >> -			free_swap_and_cache(entry);
> >> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> > 
> > Nitpick: IMHO you don't need to invert non_swap_entry(); keeping it as is
> > generates a smaller diff. Just add the new code above "continue".
> 
> I tried this way, but that led to long line splitting, so I rewrote the code like this.
> If you prefer to just add the new code above "continue", I will do it in the next version.

No worry then, feel free to keep it as is.

> 
> > 
> >> +			if (!non_swap_entry(entry)) {
> >> +				nr_swap--;
> >> +				free_swap_and_cache(entry);
> >> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> >> +			} else if (is_hwpoison_entry(entry) ||
> >> +				   is_swapin_error_entry(entry)) {
> >> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> > 
> > Since it's been discussed and you're reposting a new version anyway, why
> > not start by reusing either hwpoison or pte markers?  Or do you think
> > dropping the new swap entry again should be left for the future?
> > 
> 
> IMHO if we reuse hwpoison markers, there are some places where we would need to distinguish them and
> process them differently (and probably comment them well too), which would make the code more complicated
> and somewhat hard to follow. The "swapin error marker" here is the most straightforward. And if pte markers
> support the "swapin error case" in the future, I think it's fine to switch to them then.
> Does this make sense to you?

Yeah it's fine.  If the pte marker things can finally land as expected,
maybe I can try it out as the 2nd user of it. :)
Miaohe Lin April 22, 2022, 3:15 a.m. UTC | #6
On 2022/4/22 10:52, Peter Xu wrote:
> On Fri, Apr 22, 2022 at 10:47:32AM +0800, Miaohe Lin wrote:
>> On 2022/4/21 22:28, Peter Xu wrote:
>>> On Thu, Apr 21, 2022 at 08:53:48PM +0800, Miaohe Lin wrote:
>>>> Once the MADV_FREE operation has succeeded, callers can expect they might
>>>> get zero-fill pages if accessing the memory again. Therefore it should be
>>>> safe to delete the hwpoison entry and swapin error entry. There is no
>>>> reason to kill the process if it has called MADV_FREE on the range.
>>>>
>>>> Suggested-by: Alistair Popple <apopple@nvidia.com>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/madvise.c | 13 ++++++++-----
>>>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>> index 4d6592488b51..5f4537511532 100644
>>>> --- a/mm/madvise.c
>>>> +++ b/mm/madvise.c
>>>> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>>>>  			swp_entry_t entry;
>>>>  
>>>>  			entry = pte_to_swp_entry(ptent);
>>>> -			if (non_swap_entry(entry))
>>>> -				continue;
>>>> -			nr_swap--;
>>>> -			free_swap_and_cache(entry);
>>>> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>>>
>>> Nitpick: IMHO you don't need to invert non_swap_entry(); keeping it as is
>>> generates a smaller diff. Just add the new code above "continue".
>>
>> I tried this way, but that led to long line splitting, so I rewrote the code like this.
>> If you prefer to just add the new code above "continue", I will do it in the next version.
> 
> No worry then, feel free to keep it as is.

Will keep it. Thanks!

>>
>>>
>>>> +			if (!non_swap_entry(entry)) {
>>>> +				nr_swap--;
>>>> +				free_swap_and_cache(entry);
>>>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>>>> +			} else if (is_hwpoison_entry(entry) ||
>>>> +				   is_swapin_error_entry(entry)) {
>>>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>>>
>>> Since it's been discussed and you're reposting a new version anyway, why
>>> not start by reusing either hwpoison or pte markers?  Or do you think
>>> dropping the new swap entry again should be left for the future?
>>>
>>
>> IMHO if we reuse hwpoison markers, there are some places where we would need to distinguish them and
>> process them differently (and probably comment them well too), which would make the code more complicated
>> and somewhat hard to follow. The "swapin error marker" here is the most straightforward. And if pte markers
>> support the "swapin error case" in the future, I think it's fine to switch to them then.
>> Does this make sense to you?
> 
> Yeah it's fine.  If the pte marker things can finally land as expected,
> maybe I can try it out as the 2nd user of it. :)

Sounds good to me. And if needed, I am glad to do it then. Thanks! ;)

>
Patch

diff --git a/mm/madvise.c b/mm/madvise.c
index 4d6592488b51..5f4537511532 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -624,11 +624,14 @@  static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			swp_entry_t entry;
 
 			entry = pte_to_swp_entry(ptent);
-			if (non_swap_entry(entry))
-				continue;
-			nr_swap--;
-			free_swap_and_cache(entry);
-			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			if (!non_swap_entry(entry)) {
+				nr_swap--;
+				free_swap_and_cache(entry);
+				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			} else if (is_hwpoison_entry(entry) ||
+				   is_swapin_error_entry(entry)) {
+				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			}
 			continue;
 		}
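
The review discussed two ways to structure this branch: the inverted test used in the posted patch, and keeping the original non_swap_entry() test with the new case added above "continue". The sketch below is a userspace model only, not kernel code. The enum, the struct and all model_* and handle_* names are hypothetical stand-ins, and handle_as_suggested() is one possible reading of the review suggestion. It is meant to show that both shapes treat every kind of entry the same way, so the choice is purely about diff size and line length.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for non-present PTE kinds seen by the loop. */
enum entry_kind {
	ENTRY_SWAP,          /* a real swap entry */
	ENTRY_HWPOISON,      /* models is_hwpoison_entry() */
	ENTRY_SWAPIN_ERROR,  /* models is_swapin_error_entry() */
	ENTRY_MIGRATION      /* some other non-swap entry, left untouched */
};

/* What the branch did with the entry. */
struct outcome {
	bool freed_swap;   /* models nr_swap-- and free_swap_and_cache() */
	bool cleared_pte;  /* models pte_clear_not_present_full() */
};

static bool model_non_swap_entry(enum entry_kind k)
{
	return k != ENTRY_SWAP;
}

static bool model_is_error_entry(enum entry_kind k)
{
	return k == ENTRY_HWPOISON || k == ENTRY_SWAPIN_ERROR;
}

/* Shape of the posted patch: inverted test with an else-if branch. */
struct outcome handle_as_posted(enum entry_kind k)
{
	struct outcome o = { false, false };

	if (!model_non_swap_entry(k)) {
		o.freed_swap = true;
		o.cleared_pte = true;
	} else if (model_is_error_entry(k)) {
		o.cleared_pte = true;
	}
	return o; /* models the shared "continue" */
}

/* Assumed shape of the review suggestion: original test kept. */
struct outcome handle_as_suggested(enum entry_kind k)
{
	struct outcome o = { false, false };

	if (model_non_swap_entry(k)) {
		if (model_is_error_entry(k))
			o.cleared_pte = true;
		return o; /* models the early "continue" */
	}
	o.freed_swap = true;
	o.cleared_pte = true;
	return o;
}
```

In both shapes only real swap entries touch the swap accounting, hwpoison and swapin error entries are cleared without it, and any other non-swap entry (a migration entry in this model) is skipped as before.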