diff mbox series

[1/2] mm/gup.c: fix the wrong comments

Message ID 20190408023746.16916-1-sjhuang@iluvatar.ai (mailing list archive)
State New, archived
Headers show
Series [1/2] mm/gup.c: fix the wrong comments | expand

Commit Message

Huang Shijie April 8, 2019, 2:37 a.m. UTC
When CONFIG_HAVE_GENERIC_GUP is defined, the kernel will use its own
get_user_pages_fast().

In the following scenario, we will may meet the bug in the DMA case:
	    .....................
	    get_user_pages_fast(start,,, pages);
	        ......
	    sg_alloc_table_from_pages(, pages, ...);
	    .....................

The root cause is that sg_alloc_table_from_pages() requires the
page order to keep the same as it used in the user space, but
get_user_pages_fast() will mess it up.

So change the comments, and make it more clear for the driver
users.

Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
---
 mm/gup.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Matthew Wilcox (Oracle) April 8, 2019, 2:13 p.m. UTC | #1
On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> When CONFIG_HAVE_GENERIC_GUP is defined, the kernel will use its own
> get_user_pages_fast().
> 
> In the following scenario, we will may meet the bug in the DMA case:
> 	    .....................
> 	    get_user_pages_fast(start,,, pages);
> 	        ......
> 	    sg_alloc_table_from_pages(, pages, ...);
> 	    .....................
> 
> The root cause is that sg_alloc_table_from_pages() requires the
> page order to keep the same as it used in the user space, but
> get_user_pages_fast() will mess it up.

I don't understand how get_user_pages_fast() can return the pages in a
different order in the array from the order they appear in userspace.
Can you explain?
Huang Shijie April 9, 2019, 1:08 a.m. UTC | #2
On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > When CONFIG_HAVE_GENERIC_GUP is defined, the kernel will use its own
> > get_user_pages_fast().
> > 
> > In the following scenario, we will may meet the bug in the DMA case:
> > 	    .....................
> > 	    get_user_pages_fast(start,,, pages);
> > 	        ......
> > 	    sg_alloc_table_from_pages(, pages, ...);
> > 	    .....................
> > 
> > The root cause is that sg_alloc_table_from_pages() requires the
> > page order to keep the same as it used in the user space, but
> > get_user_pages_fast() will mess it up.
> 
> I don't understand how get_user_pages_fast() can return the pages in a
> different order in the array from the order they appear in userspace.
> Can you explain?
Please see the code in gup.c:

	int get_user_pages_fast(unsigned long start, int nr_pages,
				unsigned int gup_flags, struct page **pages)
	{
		.......
		if (gup_fast_permitted(start, nr_pages)) {
			local_irq_disable();
			gup_pgd_range(addr, end, gup_flags, pages, &nr);               // The @pages array maybe filled at the first time.
			local_irq_enable();
			ret = nr;
		}
		.......
		if (nr < nr_pages) {
			/* Try to get the remaining pages with get_user_pages */
			start += nr << PAGE_SHIFT;
			pages += nr;                                                  // The @pages is moved forward.

			if (gup_flags & FOLL_LONGTERM) {
				down_read(&current->mm->mmap_sem);
				ret = __gup_longterm_locked(current, current->mm,      // The @pages maybe filled at the second time
							    start, nr_pages - nr,
							    pages, NULL, gup_flags);
				up_read(&current->mm->mmap_sem);
			} else {
				/*
				 * retain FAULT_FOLL_ALLOW_RETRY optimization if
				 * possible
				 */
				ret = get_user_pages_unlocked(start, nr_pages - nr,    // The @pages maybe filled at the second time.
							      pages, gup_flags);
			}
		}


		.....................

BTW, I do not know why we mess up the page order. It maybe used in some special case.

Thanks
Huang Shijie
Matthew Wilcox (Oracle) April 9, 2019, 2:49 a.m. UTC | #3
On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > The root cause is that sg_alloc_table_from_pages() requires the
> > > page order to keep the same as it used in the user space, but
> > > get_user_pages_fast() will mess it up.
> > 
> > I don't understand how get_user_pages_fast() can return the pages in a
> > different order in the array from the order they appear in userspace.
> > Can you explain?
> Please see the code in gup.c:
> 
> 	int get_user_pages_fast(unsigned long start, int nr_pages,
> 				unsigned int gup_flags, struct page **pages)
> 	{
> 		.......
> 		if (gup_fast_permitted(start, nr_pages)) {
> 			local_irq_disable();
> 			gup_pgd_range(addr, end, gup_flags, pages, &nr);               // The @pages array maybe filled at the first time.

Right ... but if it's not filled entirely, it will be filled part-way,
and then we stop.

> 			local_irq_enable();
> 			ret = nr;
> 		}
> 		.......
> 		if (nr < nr_pages) {
> 			/* Try to get the remaining pages with get_user_pages */
> 			start += nr << PAGE_SHIFT;
> 			pages += nr;                                                  // The @pages is moved forward.

Yes, to the point where gup_pgd_range() stopped.

> 			if (gup_flags & FOLL_LONGTERM) {
> 				down_read(&current->mm->mmap_sem);
> 				ret = __gup_longterm_locked(current, current->mm,      // The @pages maybe filled at the second time

Right.

> 				/*
> 				 * retain FAULT_FOLL_ALLOW_RETRY optimization if
> 				 * possible
> 				 */
> 				ret = get_user_pages_unlocked(start, nr_pages - nr,    // The @pages maybe filled at the second time.
> 							      pages, gup_flags);

Yes.  But they'll be in the same order.

> BTW, I do not know why we mess up the page order. It maybe used in some special case.

I'm not discounting the possibility that you've found a bug.
But documenting that a bug exists is not the solution; the solution is
fixing the bug.
Huang Shijie April 9, 2019, 3:04 a.m. UTC | #4
On Mon, Apr 08, 2019 at 07:49:29PM -0700, Matthew Wilcox wrote:
> On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> > On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > > The root cause is that sg_alloc_table_from_pages() requires the
> > > > page order to keep the same as it used in the user space, but
> > > > get_user_pages_fast() will mess it up.
> > > 
> > > I don't understand how get_user_pages_fast() can return the pages in a
> > > different order in the array from the order they appear in userspace.
> > > Can you explain?
> > Please see the code in gup.c:
> > 
> > 	int get_user_pages_fast(unsigned long start, int nr_pages,
> > 				unsigned int gup_flags, struct page **pages)
> > 	{
> > 		.......
> > 		if (gup_fast_permitted(start, nr_pages)) {
> > 			local_irq_disable();
> > 			gup_pgd_range(addr, end, gup_flags, pages, &nr);               // The @pages array maybe filled at the first time.
> 
> Right ... but if it's not filled entirely, it will be filled part-way,
> and then we stop.
> 
> > 			local_irq_enable();
> > 			ret = nr;
> > 		}
> > 		.......
> > 		if (nr < nr_pages) {
> > 			/* Try to get the remaining pages with get_user_pages */
> > 			start += nr << PAGE_SHIFT;
> > 			pages += nr;                                                  // The @pages is moved forward.
> 
> Yes, to the point where gup_pgd_range() stopped.
> 
> > 			if (gup_flags & FOLL_LONGTERM) {
> > 				down_read(&current->mm->mmap_sem);
> > 				ret = __gup_longterm_locked(current, current->mm,      // The @pages maybe filled at the second time
> 
> Right.
> 
> > 				/*
> > 				 * retain FAULT_FOLL_ALLOW_RETRY optimization if
> > 				 * possible
> > 				 */
> > 				ret = get_user_pages_unlocked(start, nr_pages - nr,    // The @pages maybe filled at the second time.
> > 							      pages, gup_flags);
> 
> Yes.  But they'll be in the same order.
> 
> > BTW, I do not know why we mess up the page order. It maybe used in some special case.
> 
> I'm not discounting the possibility that you've found a bug.
> But documenting that a bug exists is not the solution; the solution is
> fixing the bug.
I do not think it is a bug :)

If we use the get_user_pages_unlocked(), DMA is okay, such as:
                     ....
		     get_user_pages_unlocked()
		     sg_alloc_table_from_pages()
	             .....

I think the comment is not accurate enough. So just add more comments, and tell the driver
users how to use the GUPs.

Thanks
Huang Shijie
Matthew Wilcox (Oracle) April 9, 2019, 11:19 a.m. UTC | #5
On Tue, Apr 09, 2019 at 11:04:18AM +0800, Huang Shijie wrote:
> On Mon, Apr 08, 2019 at 07:49:29PM -0700, Matthew Wilcox wrote:
> > On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> > > On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > > > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > > > The root cause is that sg_alloc_table_from_pages() requires the
> > > > > page order to keep the same as it used in the user space, but
> > > > > get_user_pages_fast() will mess it up.
> > > > 
> > > > I don't understand how get_user_pages_fast() can return the pages in a
> > > > different order in the array from the order they appear in userspace.
> > > > Can you explain?
> > > Please see the code in gup.c:
> > > 
> > > 	int get_user_pages_fast(unsigned long start, int nr_pages,
> > > 				unsigned int gup_flags, struct page **pages)
> > > 	{
> > > 		.......
> > > 		if (gup_fast_permitted(start, nr_pages)) {
> > > 			local_irq_disable();
> > > 			gup_pgd_range(addr, end, gup_flags, pages, &nr);               // The @pages array maybe filled at the first time.
> > 
> > Right ... but if it's not filled entirely, it will be filled part-way,
> > and then we stop.
> > 
> > > 			local_irq_enable();
> > > 			ret = nr;
> > > 		}
> > > 		.......
> > > 		if (nr < nr_pages) {
> > > 			/* Try to get the remaining pages with get_user_pages */
> > > 			start += nr << PAGE_SHIFT;
> > > 			pages += nr;                                                  // The @pages is moved forward.
> > 
> > Yes, to the point where gup_pgd_range() stopped.
> > 
> > > 			if (gup_flags & FOLL_LONGTERM) {
> > > 				down_read(&current->mm->mmap_sem);
> > > 				ret = __gup_longterm_locked(current, current->mm,      // The @pages maybe filled at the second time
> > 
> > Right.
> > 
> > > 				/*
> > > 				 * retain FAULT_FOLL_ALLOW_RETRY optimization if
> > > 				 * possible
> > > 				 */
> > > 				ret = get_user_pages_unlocked(start, nr_pages - nr,    // The @pages maybe filled at the second time.
> > > 							      pages, gup_flags);
> > 
> > Yes.  But they'll be in the same order.
> > 
> > > BTW, I do not know why we mess up the page order. It maybe used in some special case.
> > 
> > I'm not discounting the possibility that you've found a bug.
> > But documenting that a bug exists is not the solution; the solution is
> > fixing the bug.
> I do not think it is a bug :)
> 
> If we use the get_user_pages_unlocked(), DMA is okay, such as:
>                      ....
> 		     get_user_pages_unlocked()
> 		     sg_alloc_table_from_pages()
> 	             .....
> 
> I think the comment is not accurate enough. So just add more comments, and tell the driver
> users how to use the GUPs.

gup_fast() and gup_unlocked() should return the pages in the same order.
If they do not, then it is a bug.
Ira Weiny April 9, 2019, 2:55 p.m. UTC | #6
> On Tue, Apr 09, 2019 at 11:04:18AM +0800, Huang Shijie wrote:
> > On Mon, Apr 08, 2019 at 07:49:29PM -0700, Matthew Wilcox wrote:
> > > On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> > > > On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > > > > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > > > > The root cause is that sg_alloc_table_from_pages() requires
> > > > > > the page order to keep the same as it used in the user space,
> > > > > > but
> > > > > > get_user_pages_fast() will mess it up.
> > > > >
> > > > > I don't understand how get_user_pages_fast() can return the
> > > > > pages in a different order in the array from the order they appear in
> userspace.
> > > > > Can you explain?
> > > > Please see the code in gup.c:
> > > >
> > > > 	int get_user_pages_fast(unsigned long start, int nr_pages,
> > > > 				unsigned int gup_flags, struct page **pages)
> > > > 	{
> > > > 		.......
> > > > 		if (gup_fast_permitted(start, nr_pages)) {
> > > > 			local_irq_disable();
> > > > 			gup_pgd_range(addr, end, gup_flags, pages, &nr);
> // The @pages array maybe filled at the first time.
> > >
> > > Right ... but if it's not filled entirely, it will be filled
> > > part-way, and then we stop.
> > >
> > > > 			local_irq_enable();
> > > > 			ret = nr;
> > > > 		}
> > > > 		.......
> > > > 		if (nr < nr_pages) {
> > > > 			/* Try to get the remaining pages with
> get_user_pages */
> > > > 			start += nr << PAGE_SHIFT;
> > > > 			pages += nr;                                                  // The
> @pages is moved forward.
> > >
> > > Yes, to the point where gup_pgd_range() stopped.
> > >
> > > > 			if (gup_flags & FOLL_LONGTERM) {
> > > > 				down_read(&current->mm->mmap_sem);
> > > > 				ret = __gup_longterm_locked(current,
> current->mm,      // The @pages maybe filled at the second time
> > >
> > > Right.
> > >
> > > > 				/*
> > > > 				 * retain FAULT_FOLL_ALLOW_RETRY
> optimization if
> > > > 				 * possible
> > > > 				 */
> > > > 				ret = get_user_pages_unlocked(start,
> nr_pages - nr,    // The @pages maybe filled at the second time.
> > > > 							      pages, gup_flags);
> > >
> > > Yes.  But they'll be in the same order.
> > >
> > > > BTW, I do not know why we mess up the page order. It maybe used in
> some special case.
> > >
> > > I'm not discounting the possibility that you've found a bug.
> > > But documenting that a bug exists is not the solution; the solution
> > > is fixing the bug.
> > I do not think it is a bug :)
> >
> > If we use the get_user_pages_unlocked(), DMA is okay, such as:
> >                      ....
> > 		     get_user_pages_unlocked()
> > 		     sg_alloc_table_from_pages()
> > 	             .....
> >
> > I think the comment is not accurate enough. So just add more comments,
> > and tell the driver users how to use the GUPs.
> 
> gup_fast() and gup_unlocked() should return the pages in the same order.
> If they do not, then it is a bug.

Is there a reproducer for this?  Or do you have some debug output which shows this problem?

Ira
Ira Weiny April 9, 2019, 8:23 p.m. UTC | #7
On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > When CONFIG_HAVE_GENERIC_GUP is defined, the kernel will use its own
> > > get_user_pages_fast().
> > > 
> > > In the following scenario, we will may meet the bug in the DMA case:
> > > 	    .....................
> > > 	    get_user_pages_fast(start,,, pages);
> > > 	        ......
> > > 	    sg_alloc_table_from_pages(, pages, ...);
> > > 	    .....................
> > > 
> > > The root cause is that sg_alloc_table_from_pages() requires the
> > > page order to keep the same as it used in the user space, but
> > > get_user_pages_fast() will mess it up.
> > 
> > I don't understand how get_user_pages_fast() can return the pages in a
> > different order in the array from the order they appear in userspace.
> > Can you explain?
> Please see the code in gup.c:
> 
> 	int get_user_pages_fast(unsigned long start, int nr_pages,
> 				unsigned int gup_flags, struct page **pages)
> 	{
> 		.......
> 		if (gup_fast_permitted(start, nr_pages)) {
> 			local_irq_disable();
> 			gup_pgd_range(addr, end, gup_flags, pages, &nr);               // The @pages array maybe filled at the first time.
> 			local_irq_enable();
> 			ret = nr;
> 		}
> 		.......
> 		if (nr < nr_pages) {
> 			/* Try to get the remaining pages with get_user_pages */
> 			start += nr << PAGE_SHIFT;
> 			pages += nr;                                                  // The @pages is moved forward.
> 
> 			if (gup_flags & FOLL_LONGTERM) {
> 				down_read(&current->mm->mmap_sem);
> 				ret = __gup_longterm_locked(current, current->mm,      // The @pages maybe filled at the second time
>

Neither this nor the get_user_pages_unlocked is filling the pages a second
time.  It is adding to the page array having moved start and the page array
forward.

Are you doing a FOLL_LONGTERM GUP?  Or are you in the else clause below when
you get this bug?

Ira

> 							    start, nr_pages - nr,
> 							    pages, NULL, gup_flags);
> 				up_read(&current->mm->mmap_sem);
> 			} else {
> 				/*
> 				 * retain FAULT_FOLL_ALLOW_RETRY optimization if
> 				 * possible
> 				 */
> 				ret = get_user_pages_unlocked(start, nr_pages - nr,    // The @pages maybe filled at the second time.
> 							      pages, gup_flags);
> 			}
> 		}
> 
> 
> 		.....................
> 
> BTW, I do not know why we mess up the page order. It maybe used in some special case.
> 
> Thanks
> Huang Shijie
>
Huang Shijie April 10, 2019, 1:18 a.m. UTC | #8
On Tue, Apr 09, 2019 at 01:23:16PM -0700, Ira Weiny wrote:
> On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> > On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > > When CONFIG_HAVE_GENERIC_GUP is defined, the kernel will use its own
> > > > get_user_pages_fast().
> > > > 
> > > > In the following scenario, we will may meet the bug in the DMA case:
> > > > 	    .....................
> > > > 	    get_user_pages_fast(start,,, pages);
> > > > 	        ......
> > > > 	    sg_alloc_table_from_pages(, pages, ...);
> > > > 	    .....................
> > > > 
> > > > The root cause is that sg_alloc_table_from_pages() requires the
> > > > page order to keep the same as it used in the user space, but
> > > > get_user_pages_fast() will mess it up.
> > > 
> > > I don't understand how get_user_pages_fast() can return the pages in a
> > > different order in the array from the order they appear in userspace.
> > > Can you explain?
> > Please see the code in gup.c:
> > 
> > 	int get_user_pages_fast(unsigned long start, int nr_pages,
> > 				unsigned int gup_flags, struct page **pages)
> > 	{
> > 		.......
> > 		if (gup_fast_permitted(start, nr_pages)) {
> > 			local_irq_disable();
> > 			gup_pgd_range(addr, end, gup_flags, pages, &nr);               // The @pages array maybe filled at the first time.
> > 			local_irq_enable();
> > 			ret = nr;
> > 		}
> > 		.......
> > 		if (nr < nr_pages) {
> > 			/* Try to get the remaining pages with get_user_pages */
> > 			start += nr << PAGE_SHIFT;
> > 			pages += nr;                                                  // The @pages is moved forward.
> > 
> > 			if (gup_flags & FOLL_LONGTERM) {
> > 				down_read(&current->mm->mmap_sem);
> > 				ret = __gup_longterm_locked(current, current->mm,      // The @pages maybe filled at the second time
> >
> 
> Neither this nor the get_user_pages_unlocked is filling the pages a second
The get_user_pages_unlocked() will call the handle_mm_fault which will allocate a
new page for the empty PTE, and save the new page into the @pages array.


> time.  It is adding to the page array having moved start and the page array
> forward.

Yes. This will mess up the page order.

I will read the code again to check if I am wrong :)

> 
> Are you doing a FOLL_LONGTERM GUP?  Or are you in the else clause below when
> you get this bug?
I do not use FOLL_LONGTERM, I just use the FOLL_WRITE.

So it seems it runs into the else clause below.

Thanks
Huang Shijie

> 
> Ira
> 
> > 							    start, nr_pages - nr,
> > 							    pages, NULL, gup_flags);
> > 				up_read(&current->mm->mmap_sem);
> > 			} else {
> > 				/*
> > 				 * retain FAULT_FOLL_ALLOW_RETRY optimization if
> > 				 * possible
> > 				 */
> > 				ret = get_user_pages_unlocked(start, nr_pages - nr,    // The @pages maybe filled at the second time.
> > 							      pages, gup_flags);
> > 			}
> > 		}
> > 
> >
Huang Shijie April 10, 2019, 1:20 a.m. UTC | #9
On Tue, Apr 09, 2019 at 02:55:31PM +0000, Weiny, Ira wrote:
> > On Tue, Apr 09, 2019 at 11:04:18AM +0800, Huang Shijie wrote:
> > > On Mon, Apr 08, 2019 at 07:49:29PM -0700, Matthew Wilcox wrote:
> > > > On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> > > > > On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > > > > > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > > > > > The root cause is that sg_alloc_table_from_pages() requires
> > > > > > > the page order to keep the same as it used in the user space,
> > > > > > > but
> > > > > > > get_user_pages_fast() will mess it up.
> > > > > >
> > > > > > I don't understand how get_user_pages_fast() can return the
> > > > > > pages in a different order in the array from the order they appear in
> > userspace.
> > > > > > Can you explain?
> > > > > Please see the code in gup.c:
> > > > >
> > > > > 	int get_user_pages_fast(unsigned long start, int nr_pages,
> > > > > 				unsigned int gup_flags, struct page **pages)
> > > > > 	{
> > > > > 		.......
> > > > > 		if (gup_fast_permitted(start, nr_pages)) {
> > > > > 			local_irq_disable();
> > > > > 			gup_pgd_range(addr, end, gup_flags, pages, &nr);
> > // The @pages array maybe filled at the first time.
> > > >
> > > > Right ... but if it's not filled entirely, it will be filled
> > > > part-way, and then we stop.
> > > >
> > > > > 			local_irq_enable();
> > > > > 			ret = nr;
> > > > > 		}
> > > > > 		.......
> > > > > 		if (nr < nr_pages) {
> > > > > 			/* Try to get the remaining pages with
> > get_user_pages */
> > > > > 			start += nr << PAGE_SHIFT;
> > > > > 			pages += nr;                                                  // The
> > @pages is moved forward.
> > > >
> > > > Yes, to the point where gup_pgd_range() stopped.
> > > >
> > > > > 			if (gup_flags & FOLL_LONGTERM) {
> > > > > 				down_read(&current->mm->mmap_sem);
> > > > > 				ret = __gup_longterm_locked(current,
> > current->mm,      // The @pages maybe filled at the second time
> > > >
> > > > Right.
> > > >
> > > > > 				/*
> > > > > 				 * retain FAULT_FOLL_ALLOW_RETRY
> > optimization if
> > > > > 				 * possible
> > > > > 				 */
> > > > > 				ret = get_user_pages_unlocked(start,
> > nr_pages - nr,    // The @pages maybe filled at the second time.
> > > > > 							      pages, gup_flags);
> > > >
> > > > Yes.  But they'll be in the same order.
> > > >
> > > > > BTW, I do not know why we mess up the page order. It maybe used in
> > some special case.
> > > >
> > > > I'm not discounting the possibility that you've found a bug.
> > > > But documenting that a bug exists is not the solution; the solution
> > > > is fixing the bug.
> > > I do not think it is a bug :)
> > >
> > > If we use the get_user_pages_unlocked(), DMA is okay, such as:
> > >                      ....
> > > 		     get_user_pages_unlocked()
> > > 		     sg_alloc_table_from_pages()
> > > 	             .....
> > >
> > > I think the comment is not accurate enough. So just add more comments,
> > > and tell the driver users how to use the GUPs.
> > 
> > gup_fast() and gup_unlocked() should return the pages in the same order.
> > If they do not, then it is a bug.
> 
> Is there a reproducer for this?  Or do you have some debug output which shows this problem?
Is Matthew right?

 " gup_fast() and gup_unlocked() should return the pages in the same order.
 If they do not, then it is a bug."

If Matthew is right,
I need more time to debug the DMA issue...
	

Thanks
Huang Shijie
 

> 
> Ira
>
Ira Weiny April 10, 2019, 6:04 p.m. UTC | #10
On Wed, Apr 10, 2019 at 09:20:36AM +0800, Huang Shijie wrote:
> On Tue, Apr 09, 2019 at 02:55:31PM +0000, Weiny, Ira wrote:
> > > On Tue, Apr 09, 2019 at 11:04:18AM +0800, Huang Shijie wrote:
> > > > On Mon, Apr 08, 2019 at 07:49:29PM -0700, Matthew Wilcox wrote:
> > > > > On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> > > > > > On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > > > > > > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > > > > > > The root cause is that sg_alloc_table_from_pages() requires
> > > > > > > > the page order to keep the same as it used in the user space,
> > > > > > > > but
> > > > > > > > get_user_pages_fast() will mess it up.
> > > > > > >
> > > > > > > I don't understand how get_user_pages_fast() can return the
> > > > > > > pages in a different order in the array from the order they appear in
> > > userspace.
> > > > > > > Can you explain?
> > > > > > Please see the code in gup.c:
> > > > > >
> > > > > > 	int get_user_pages_fast(unsigned long start, int nr_pages,
> > > > > > 				unsigned int gup_flags, struct page **pages)
> > > > > > 	{
> > > > > > 		.......
> > > > > > 		if (gup_fast_permitted(start, nr_pages)) {
> > > > > > 			local_irq_disable();
> > > > > > 			gup_pgd_range(addr, end, gup_flags, pages, &nr);
> > > // The @pages array maybe filled at the first time.
> > > > >
> > > > > Right ... but if it's not filled entirely, it will be filled
> > > > > part-way, and then we stop.
> > > > >
> > > > > > 			local_irq_enable();
> > > > > > 			ret = nr;
> > > > > > 		}
> > > > > > 		.......
> > > > > > 		if (nr < nr_pages) {
> > > > > > 			/* Try to get the remaining pages with
> > > get_user_pages */
> > > > > > 			start += nr << PAGE_SHIFT;
> > > > > > 			pages += nr;                                                  // The
> > > @pages is moved forward.
> > > > >
> > > > > Yes, to the point where gup_pgd_range() stopped.
> > > > >
> > > > > > 			if (gup_flags & FOLL_LONGTERM) {
> > > > > > 				down_read(&current->mm->mmap_sem);
> > > > > > 				ret = __gup_longterm_locked(current,
> > > current->mm,      // The @pages maybe filled at the second time
> > > > >
> > > > > Right.
> > > > >
> > > > > > 				/*
> > > > > > 				 * retain FAULT_FOLL_ALLOW_RETRY
> > > optimization if
> > > > > > 				 * possible
> > > > > > 				 */
> > > > > > 				ret = get_user_pages_unlocked(start,
> > > nr_pages - nr,    // The @pages maybe filled at the second time.
> > > > > > 							      pages, gup_flags);
> > > > >
> > > > > Yes.  But they'll be in the same order.
> > > > >
> > > > > > BTW, I do not know why we mess up the page order. It maybe used in
> > > some special case.
> > > > >
> > > > > I'm not discounting the possibility that you've found a bug.
> > > > > But documenting that a bug exists is not the solution; the solution
> > > > > is fixing the bug.
> > > > I do not think it is a bug :)
> > > >
> > > > If we use the get_user_pages_unlocked(), DMA is okay, such as:
> > > >                      ....
> > > > 		     get_user_pages_unlocked()
> > > > 		     sg_alloc_table_from_pages()
> > > > 	             .....
> > > >
> > > > I think the comment is not accurate enough. So just add more comments,
> > > > and tell the driver users how to use the GUPs.
> > > 
> > > gup_fast() and gup_unlocked() should return the pages in the same order.
> > > If they do not, then it is a bug.
> > 
> > Is there a reproducer for this?  Or do you have some debug output which shows this problem?
> Is Matthew right?
> 
>  " gup_fast() and gup_unlocked() should return the pages in the same order.
>  If they do not, then it is a bug."

Yes I think he is...

Ira

> 
> If Matthew is right,
> I need more time to debug the DMA issue...
> 	
> 
> Thanks
> Huang Shijie
>  
> 
> > 
> > Ira
> > 
>
Ira Weiny April 10, 2019, 6:08 p.m. UTC | #11
On Wed, Apr 10, 2019 at 09:18:50AM +0800, Huang Shijie wrote:
> On Tue, Apr 09, 2019 at 01:23:16PM -0700, Ira Weiny wrote:
> > On Tue, Apr 09, 2019 at 09:08:33AM +0800, Huang Shijie wrote:
> > > On Mon, Apr 08, 2019 at 07:13:13AM -0700, Matthew Wilcox wrote:
> > > > On Mon, Apr 08, 2019 at 10:37:45AM +0800, Huang Shijie wrote:
> > > > > When CONFIG_HAVE_GENERIC_GUP is defined, the kernel will use its own
> > > > > get_user_pages_fast().
> > > > > 
> > > > > In the following scenario, we will may meet the bug in the DMA case:
> > > > > 	    .....................
> > > > > 	    get_user_pages_fast(start,,, pages);
> > > > > 	        ......
> > > > > 	    sg_alloc_table_from_pages(, pages, ...);
> > > > > 	    .....................
> > > > > 
> > > > > The root cause is that sg_alloc_table_from_pages() requires the
> > > > > page order to keep the same as it used in the user space, but
> > > > > get_user_pages_fast() will mess it up.
> > > > 
> > > > I don't understand how get_user_pages_fast() can return the pages in a
> > > > different order in the array from the order they appear in userspace.
> > > > Can you explain?
> > > Please see the code in gup.c:
> > > 
> > > 	int get_user_pages_fast(unsigned long start, int nr_pages,
> > > 				unsigned int gup_flags, struct page **pages)
> > > 	{
> > > 		.......
> > > 		if (gup_fast_permitted(start, nr_pages)) {
> > > 			local_irq_disable();
> > > 			gup_pgd_range(addr, end, gup_flags, pages, &nr);               // The @pages array maybe filled at the first time.
> > > 			local_irq_enable();
> > > 			ret = nr;
> > > 		}
> > > 		.......
> > > 		if (nr < nr_pages) {
> > > 			/* Try to get the remaining pages with get_user_pages */
> > > 			start += nr << PAGE_SHIFT;
> > > 			pages += nr;                                                  // The @pages is moved forward.
> > > 
> > > 			if (gup_flags & FOLL_LONGTERM) {
> > > 				down_read(&current->mm->mmap_sem);
> > > 				ret = __gup_longterm_locked(current, current->mm,      // The @pages maybe filled at the second time
> > >
> > 
> > Neither this nor the get_user_pages_unlocked is filling the pages a second
> The get_user_pages_unlocked() will call the handle_mm_fault which will allocate a
> new page for the empty PTE, and save the new page into the @pages array.

But shouldn't this happen if get_user_pages_unlocked() is called directly?

> 
> 
> > time.  It is adding to the page array having moved start and the page array
> > forward.
> 
> Yes. This will mess up the page order.
> 
> I will read the code again to check if I am wrong :)
> 
> > 
> > Are you doing a FOLL_LONGTERM GUP?  Or are you in the else clause below when
> > you get this bug?
> I do not use FOLL_LONGTERM, I just use the FOLL_WRITE.
> 
> So it seems it runs into the else clause below.

Ok thanks,
Ira

> 
> Thanks
> Huang Shijie
> 
> > 
> > Ira
> > 
> > > 							    start, nr_pages - nr,
> > > 							    pages, NULL, gup_flags);
> > > 				up_read(&current->mm->mmap_sem);
> > > 			} else {
> > > 				/*
> > > 				 * retain FAULT_FOLL_ALLOW_RETRY optimization if
> > > 				 * possible
> > > 				 */
> > > 				ret = get_user_pages_unlocked(start, nr_pages - nr,    // The @pages maybe filled at the second time.
> > > 							      pages, gup_flags);
> > > 			}
> > > 		}
> > > 
> > > 
>
diff mbox series

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 22acdd0f79ff..fb11ff90ba3b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1129,10 +1129,6 @@  EXPORT_SYMBOL(get_user_pages_locked);
  *  with:
  *
  *      get_user_pages_unlocked(tsk, mm, ..., pages);
- *
- * It is functionally equivalent to get_user_pages_fast so
- * get_user_pages_fast should be used instead if specific gup_flags
- * (e.g. FOLL_FORCE) are not required.
  */
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 			     struct page **pages, unsigned int gup_flags)
@@ -2147,6 +2143,10 @@  int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  * If not successful, it will fall back to taking the lock and
  * calling get_user_pages().
  *
+ * Note this routine may fill the pages array with entries in a
+ * different order than get_user_pages_unlocked(), which may cause
+ * issues for callers expecting the routines to be equivalent.
+ *
  * Returns number of pages pinned. This may be fewer than the number
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
  * were pinned, returns -errno.