Message ID | 20210417094039.51711-6-linmiaohe@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | close various race windows for swap | expand |
Miaohe Lin <linmiaohe@huawei.com> writes: > When I was investigating the swap code, I found the below possible race > window: > > CPU 1 CPU 2 > ----- ----- > shmem_swapin > swap_cluster_readahead > if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) { > swapoff > si->flags &= ~SWP_VALID; > .. > synchronize_rcu(); > .. You have removed these code in the previous patches of the series. And they are not relevant in this patch. > si->swap_file = NULL; > struct inode *inode = si->swap_file->f_mapping->host;[oops!] > > Close this race window by using get/put_swap_device() to guard against > concurrent swapoff. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") No. This isn't the commit that introduces the race condition. Please recheck your git blame result. Best Regards, Huang, Ying > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> > --- > mm/shmem.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/mm/shmem.c b/mm/shmem.c > index 26c76b13ad23..936ba5595297 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -1492,15 +1492,21 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma) > static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, > struct shmem_inode_info *info, pgoff_t index) > { > + struct swap_info_struct *si; > struct vm_area_struct pvma; > struct page *page; > struct vm_fault vmf = { > .vma = &pvma, > }; > > + /* Prevent swapoff from happening to us. */ > + si = get_swap_device(swap); > + if (unlikely(!si)) > + return NULL; > shmem_pseudo_vma_init(&pvma, info, index); > page = swap_cluster_readahead(swap, gfp, &vmf); > shmem_pseudo_vma_destroy(&pvma); > + put_swap_device(si); > > return page; > }
On 2021/4/19 10:15, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> When I was investigating the swap code, I found the below possible race >> window: >> >> CPU 1 CPU 2 >> ----- ----- >> shmem_swapin >> swap_cluster_readahead >> if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) { >> swapoff >> si->flags &= ~SWP_VALID; >> .. >> synchronize_rcu(); >> .. > > You have removed these code in the previous patches of the series. And > they are not relevant in this patch. Yes, I should change these. Thanks. > >> si->swap_file = NULL; >> struct inode *inode = si->swap_file->f_mapping->host;[oops!] >> >> Close this race window by using get/put_swap_device() to guard against >> concurrent swapoff. >> >> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > > No. This isn't the commit that introduces the race condition. Please > recheck your git blame result. > I think this is really hard to find exact commit. I used git blame and found this race should be existed when this is introduced. Any suggestion ? Thanks. > Best Regards, > Huang, Ying > >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >> --- >> mm/shmem.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/mm/shmem.c b/mm/shmem.c >> index 26c76b13ad23..936ba5595297 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -1492,15 +1492,21 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma) >> static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, >> struct shmem_inode_info *info, pgoff_t index) >> { >> + struct swap_info_struct *si; >> struct vm_area_struct pvma; >> struct page *page; >> struct vm_fault vmf = { >> .vma = &pvma, >> }; >> >> + /* Prevent swapoff from happening to us. */ >> + si = get_swap_device(swap); >> + if (unlikely(!si)) >> + return NULL; >> shmem_pseudo_vma_init(&pvma, info, index); >> page = swap_cluster_readahead(swap, gfp, &vmf); >> shmem_pseudo_vma_destroy(&pvma); >> + put_swap_device(si); >> >> return page; >> } > . >
Miaohe Lin <linmiaohe@huawei.com> writes: > On 2021/4/19 10:15, Huang, Ying wrote: >> Miaohe Lin <linmiaohe@huawei.com> writes: >> >>> When I was investigating the swap code, I found the below possible race >>> window: >>> >>> CPU 1 CPU 2 >>> ----- ----- >>> shmem_swapin >>> swap_cluster_readahead >>> if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) { >>> swapoff >>> si->flags &= ~SWP_VALID; >>> .. >>> synchronize_rcu(); >>> .. >> >> You have removed these code in the previous patches of the series. And >> they are not relevant in this patch. > > Yes, I should change these. Thanks. > >> >>> si->swap_file = NULL; >>> struct inode *inode = si->swap_file->f_mapping->host;[oops!] >>> >>> Close this race window by using get/put_swap_device() to guard against >>> concurrent swapoff. >>> >>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") >> >> No. This isn't the commit that introduces the race condition. Please >> recheck your git blame result. >> > > I think this is really hard to find exact commit. I used git blame and found > this race should be existed when this is introduced. Any suggestion ? > Thanks. I think the commit that introduces the race condition is commit 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested or not") Best Regards, Huang, Ying >> Best Regards, >> Huang, Ying >> >>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>> --- >>> mm/shmem.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/mm/shmem.c b/mm/shmem.c >>> index 26c76b13ad23..936ba5595297 100644 >>> --- a/mm/shmem.c >>> +++ b/mm/shmem.c >>> @@ -1492,15 +1492,21 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma) >>> static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, >>> struct shmem_inode_info *info, pgoff_t index) >>> { >>> + struct swap_info_struct *si; >>> struct vm_area_struct pvma; >>> struct page *page; >>> struct vm_fault vmf = { >>> .vma = &pvma, >>> }; >>> >>> + /* Prevent swapoff from happening to us. */ >>> + si = get_swap_device(swap); >>> + if (unlikely(!si)) >>> + return NULL; >>> shmem_pseudo_vma_init(&pvma, info, index); >>> page = swap_cluster_readahead(swap, gfp, &vmf); >>> shmem_pseudo_vma_destroy(&pvma); >>> + put_swap_device(si); >>> >>> return page; >>> } >> . >>
On 2021/4/19 15:04, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> On 2021/4/19 10:15, Huang, Ying wrote: >>> Miaohe Lin <linmiaohe@huawei.com> writes: >>> >>>> When I was investigating the swap code, I found the below possible race >>>> window: >>>> >>>> CPU 1 CPU 2 >>>> ----- ----- >>>> shmem_swapin >>>> swap_cluster_readahead >>>> if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) { >>>> swapoff >>>> si->flags &= ~SWP_VALID; >>>> .. >>>> synchronize_rcu(); >>>> .. >>> >>> You have removed these code in the previous patches of the series. And >>> they are not relevant in this patch. >> >> Yes, I should change these. Thanks. >> >>> >>>> si->swap_file = NULL; >>>> struct inode *inode = si->swap_file->f_mapping->host;[oops!] >>>> >>>> Close this race window by using get/put_swap_device() to guard against >>>> concurrent swapoff. >>>> >>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") >>> >>> No. This isn't the commit that introduces the race condition. Please >>> recheck your git blame result. >>> >> >> I think this is really hard to find exact commit. I used git blame and found >> this race should be existed when this is introduced. Any suggestion ? >> Thanks. > > I think the commit that introduces the race condition is commit > 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested or > not") > Thanks. The commit log only describes one race condition. And for that one, this should be correct Fixes tag. But there are still many other race conditions inside swap_cluster_readahead, such as swap_readpage() called from swap_cluster_readahead. This tag could not cover the all race windows. > Best Regards, > Huang, Ying > >>> Best Regards, >>> Huang, Ying >>> >>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>> --- >>>> mm/shmem.c | 6 ++++++ >>>> 1 file changed, 6 insertions(+) >>>> >>>> diff --git a/mm/shmem.c b/mm/shmem.c >>>> index 26c76b13ad23..936ba5595297 100644 >>>> --- a/mm/shmem.c >>>> +++ b/mm/shmem.c >>>> @@ -1492,15 +1492,21 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma) >>>> static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, >>>> struct shmem_inode_info *info, pgoff_t index) >>>> { >>>> + struct swap_info_struct *si; >>>> struct vm_area_struct pvma; >>>> struct page *page; >>>> struct vm_fault vmf = { >>>> .vma = &pvma, >>>> }; >>>> >>>> + /* Prevent swapoff from happening to us. */ >>>> + si = get_swap_device(swap); >>>> + if (unlikely(!si)) >>>> + return NULL; >>>> shmem_pseudo_vma_init(&pvma, info, index); >>>> page = swap_cluster_readahead(swap, gfp, &vmf); >>>> shmem_pseudo_vma_destroy(&pvma); >>>> + put_swap_device(si); >>>> >>>> return page; >>>> } >>> . >>> > . >
Miaohe Lin <linmiaohe@huawei.com> writes: > On 2021/4/19 15:04, Huang, Ying wrote: >> Miaohe Lin <linmiaohe@huawei.com> writes: >> >>> On 2021/4/19 10:15, Huang, Ying wrote: >>>> Miaohe Lin <linmiaohe@huawei.com> writes: >>>> >>>>> When I was investigating the swap code, I found the below possible race >>>>> window: >>>>> >>>>> CPU 1 CPU 2 >>>>> ----- ----- >>>>> shmem_swapin >>>>> swap_cluster_readahead >>>>> if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) { >>>>> swapoff >>>>> si->flags &= ~SWP_VALID; >>>>> .. >>>>> synchronize_rcu(); >>>>> .. >>>> >>>> You have removed these code in the previous patches of the series. And >>>> they are not relevant in this patch. >>> >>> Yes, I should change these. Thanks. >>> >>>> >>>>> si->swap_file = NULL; >>>>> struct inode *inode = si->swap_file->f_mapping->host;[oops!] >>>>> >>>>> Close this race window by using get/put_swap_device() to guard against >>>>> concurrent swapoff. >>>>> >>>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") >>>> >>>> No. This isn't the commit that introduces the race condition. Please >>>> recheck your git blame result. >>>> >>> >>> I think this is really hard to find exact commit. I used git blame and found >>> this race should be existed when this is introduced. Any suggestion ? >>> Thanks. >> >> I think the commit that introduces the race condition is commit >> 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested or >> not") >> > > Thanks. > The commit log only describes one race condition. And for that one, this should be correct > Fixes tag. But there are still many other race conditions inside swap_cluster_readahead, > such as swap_readpage() called from swap_cluster_readahead. This tag could not cover the > all race windows. No. swap_readpage() in swap_cluster_readahead() is OK. Because __read_swap_cache_async() is called before that, so the swap entry will be marked with SWAP_HAS_CACHE, and page will be locked. Best Regards, Huang, Ying >> Best Regards, >> Huang, Ying >> >>>> Best Regards, >>>> Huang, Ying >>>> >>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>>> --- >>>>> mm/shmem.c | 6 ++++++ >>>>> 1 file changed, 6 insertions(+) >>>>> >>>>> diff --git a/mm/shmem.c b/mm/shmem.c >>>>> index 26c76b13ad23..936ba5595297 100644 >>>>> --- a/mm/shmem.c >>>>> +++ b/mm/shmem.c >>>>> @@ -1492,15 +1492,21 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma) >>>>> static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, >>>>> struct shmem_inode_info *info, pgoff_t index) >>>>> { >>>>> + struct swap_info_struct *si; >>>>> struct vm_area_struct pvma; >>>>> struct page *page; >>>>> struct vm_fault vmf = { >>>>> .vma = &pvma, >>>>> }; >>>>> >>>>> + /* Prevent swapoff from happening to us. */ >>>>> + si = get_swap_device(swap); >>>>> + if (unlikely(!si)) >>>>> + return NULL; >>>>> shmem_pseudo_vma_init(&pvma, info, index); >>>>> page = swap_cluster_readahead(swap, gfp, &vmf); >>>>> shmem_pseudo_vma_destroy(&pvma); >>>>> + put_swap_device(si); >>>>> >>>>> return page; >>>>> } >>>> . >>>> >> . >>
On 2021/4/19 15:41, Huang, Ying wrote: > Miaohe Lin <linmiaohe@huawei.com> writes: > >> On 2021/4/19 15:04, Huang, Ying wrote: >>> Miaohe Lin <linmiaohe@huawei.com> writes: >>> >>>> On 2021/4/19 10:15, Huang, Ying wrote: >>>>> Miaohe Lin <linmiaohe@huawei.com> writes: >>>>> >>>>>> When I was investigating the swap code, I found the below possible race >>>>>> window: >>>>>> >>>>>> CPU 1 CPU 2 >>>>>> ----- ----- >>>>>> shmem_swapin >>>>>> swap_cluster_readahead >>>>>> if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) { >>>>>> swapoff >>>>>> si->flags &= ~SWP_VALID; >>>>>> .. >>>>>> synchronize_rcu(); >>>>>> .. >>>>> >>>>> You have removed these code in the previous patches of the series. And >>>>> they are not relevant in this patch. >>>> >>>> Yes, I should change these. Thanks. >>>> >>>>> >>>>>> si->swap_file = NULL; >>>>>> struct inode *inode = si->swap_file->f_mapping->host;[oops!] >>>>>> >>>>>> Close this race window by using get/put_swap_device() to guard against >>>>>> concurrent swapoff. >>>>>> >>>>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") >>>>> >>>>> No. This isn't the commit that introduces the race condition. Please >>>>> recheck your git blame result. >>>>> >>>> >>>> I think this is really hard to find exact commit. I used git blame and found >>>> this race should be existed when this is introduced. Any suggestion ? >>>> Thanks. >>> >>> I think the commit that introduces the race condition is commit >>> 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested or >>> not") >>> >> >> Thanks. >> The commit log only describes one race condition. And for that one, this should be correct >> Fixes tag. But there are still many other race conditions inside swap_cluster_readahead, >> such as swap_readpage() called from swap_cluster_readahead. This tag could not cover the >> all race windows. > > No. swap_readpage() in swap_cluster_readahead() is OK. Because > __read_swap_cache_async() is called before that, so the swap entry will > be marked with SWAP_HAS_CACHE, and page will be locked. > Oh... I missed this. Many thanks for your remind. > Best Regards, > Huang, Ying > >>> Best Regards, >>> Huang, Ying >>> >>>>> Best Regards, >>>>> Huang, Ying >>>>> >>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> >>>>>> --- >>>>>> mm/shmem.c | 6 ++++++ >>>>>> 1 file changed, 6 insertions(+) >>>>>> >>>>>> diff --git a/mm/shmem.c b/mm/shmem.c >>>>>> index 26c76b13ad23..936ba5595297 100644 >>>>>> --- a/mm/shmem.c >>>>>> +++ b/mm/shmem.c >>>>>> @@ -1492,15 +1492,21 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma) >>>>>> static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, >>>>>> struct shmem_inode_info *info, pgoff_t index) >>>>>> { >>>>>> + struct swap_info_struct *si; >>>>>> struct vm_area_struct pvma; >>>>>> struct page *page; >>>>>> struct vm_fault vmf = { >>>>>> .vma = &pvma, >>>>>> }; >>>>>> >>>>>> + /* Prevent swapoff from happening to us. */ >>>>>> + si = get_swap_device(swap); >>>>>> + if (unlikely(!si)) >>>>>> + return NULL; >>>>>> shmem_pseudo_vma_init(&pvma, info, index); >>>>>> page = swap_cluster_readahead(swap, gfp, &vmf); >>>>>> shmem_pseudo_vma_destroy(&pvma); >>>>>> + put_swap_device(si); >>>>>> >>>>>> return page; >>>>>> } >>>>> . >>>>> >>> . >>> > . >
diff --git a/mm/shmem.c b/mm/shmem.c index 26c76b13ad23..936ba5595297 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1492,15 +1492,21 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma) static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, struct shmem_inode_info *info, pgoff_t index) { + struct swap_info_struct *si; struct vm_area_struct pvma; struct page *page; struct vm_fault vmf = { .vma = &pvma, }; + /* Prevent swapoff from happening to us. */ + si = get_swap_device(swap); + if (unlikely(!si)) + return NULL; shmem_pseudo_vma_init(&pvma, info, index); page = swap_cluster_readahead(swap, gfp, &vmf); shmem_pseudo_vma_destroy(&pvma); + put_swap_device(si); return page; }
When I was investigating the swap code, I found the below possible race window: CPU 1 CPU 2 ----- ----- shmem_swapin swap_cluster_readahead if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) { swapoff si->flags &= ~SWP_VALID; .. synchronize_rcu(); .. si->swap_file = NULL; struct inode *inode = si->swap_file->f_mapping->host;[oops!] Close this race window by using get/put_swap_device() to guard against concurrent swapoff. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> --- mm/shmem.c | 6 ++++++ 1 file changed, 6 insertions(+)