
[v3] mm, drm/i915: mark pinned shmemfs pages as unevictable

Message ID 20181031081945.207709-1-vovoy@chromium.org (mailing list archive)
State New, archived
Series [v3] mm, drm/i915: mark pinned shmemfs pages as unevictable

Commit Message

Kuo-Hsin Yang Oct. 31, 2018, 8:19 a.m. UTC
The i915 driver uses shmemfs to allocate backing storage for gem
objects. These shmemfs pages can be pinned (increased ref count) by
shmem_read_mapping_page_gfp(). When a lot of pages are pinned, vmscan
wastes a lot of time scanning these pinned pages. In some extreme cases,
all pages in the inactive anon LRU are pinned; since only the inactive
anon LRU is scanned due to inactive_ratio, the system cannot swap and
invokes the oom-killer. Mark these pinned pages as unevictable to speed
up vmscan.

Add check_move_lru_page() to move a page to the appropriate LRU list.

This patch was inspired by Chris Wilson's change [1].

[1]: https://patchwork.kernel.org/patch/9768741/

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
---
The previous mapping_set_unevictable patch is worse on gem_syslatency
because it defers to vmscan to move these pages to the unevictable list,
and the test measures the latency of allocating 2 MiB pages. This
performance impact can be avoided by explicitly moving pages to the
unevictable list in the i915 functions.

Chris, can you help run the "igt/benchmarks/gem_syslatency -t 120 -b -m"
test with this patch on your testing machines? I tried the test on a
Celeron N4000 machine with 4 GB of RAM. The mean value with this patch is
similar to that with the mlock patch.

x tip-mean.txt # current stock i915
+ lock_vma-mean.txt # the old mlock patch
* mapping-mean.txt # this patch

   N        Min        Max     Median        Avg     Stddev
x 60    548.898   2563.653   2149.573   1999.273    480.837
+ 60    479.049   2119.902   1964.399   1893.226    314.736
* 60    455.358   3212.368   1991.308   1903.686    411.448
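For a rough reading of the ministat summary above, the two patches can be compared by their mean improvement over stock i915 (an illustrative back-of-the-envelope check, not part of the original posting):

```python
# Mean gem_syslatency values (usec) from the ministat summary above.
tip_mean = 1999.273      # current stock i915
mlock_mean = 1893.226    # the old mlock patch
mapping_mean = 1903.686  # this patch

# Percent improvement in mean latency relative to stock i915.
mlock_gain = (tip_mean - mlock_mean) / tip_mean * 100
mapping_gain = (tip_mean - mapping_mean) / tip_mean * 100

print(f"mlock: {mlock_gain:.2f}%, mapping: {mapping_gain:.2f}%")
# mlock: 5.30%, mapping: 4.78%
```

Both patches land within about half a percentage point of each other, consistent with "the mean value with this patch is similar to that with the mlock patch."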

Changes for v3:
 Use check_move_lru_page instead of shmem_unlock_mapping to move pages
 to appropriate lru lists.

Changes for v2:
 Squashed the two patches.

 Documentation/vm/unevictable-lru.rst |  4 +++-
 drivers/gpu/drm/i915/i915_gem.c      | 20 +++++++++++++++++++-
 include/linux/swap.h                 |  1 +
 mm/vmscan.c                          | 20 +++++++++++++++++---
 4 files changed, 40 insertions(+), 5 deletions(-)

Comments

Chris Wilson Oct. 31, 2018, 9:41 a.m. UTC | #1
Quoting Kuo-Hsin Yang (2018-10-31 08:19:45)
> The i915 driver uses shmemfs to allocate backing storage for gem
> objects. These shmemfs pages can be pinned (increased ref count) by
> shmem_read_mapping_page_gfp(). When a lot of pages are pinned, vmscan
> wastes a lot of time scanning these pinned pages. In some extreme case,
> all pages in the inactive anon lru are pinned, and only the inactive
> anon lru is scanned due to inactive_ratio, the system cannot swap and
> invokes the oom-killer. Mark these pinned pages as unevictable to speed
> up vmscan.
> 
> Add check_move_lru_page() to move page to appropriate lru list.
> 
> This patch was inspired by Chris Wilson's change [1].
> 
> [1]: https://patchwork.kernel.org/patch/9768741/
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
> ---
> The previous mapping_set_unevictable patch is worse on gem_syslatency
> because it defers to vmscan to move these pages to the unevictable list
> and the test measures latency to allocate 2MiB pages. This performance
> impact can be solved by explicit moving pages to the unevictable list in
> the i915 function.
> 
> Chris, can you help to run the "igt/benchmarks/gem_syslatency -t 120 -b -m"
> test with this patch on your testing machine? I tried to run the test on
> a Celeron N4000, 4GB Ram machine. The mean value with this patch is
> similar to that with the mlock patch.

Will do. As you are confident, I'll try a few different machines. :)
-Chris
Kuo-Hsin Yang Oct. 31, 2018, 10:42 a.m. UTC | #2
On Wed, Oct 31, 2018 at 5:42 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Will do. As you are confident, I'll try a few different machines. :)
> -Chris
Great! Thanks for your help. :)

Vovo
Dave Hansen Oct. 31, 2018, 2:19 p.m. UTC | #3
On 10/31/18 1:19 AM, owner-linux-mm@kvack.org wrote:
> -These are currently used in two places in the kernel:
> +These are currently used in three places in the kernel:
>  
>   (1) By ramfs to mark the address spaces of its inodes when they are created,
>       and this mark remains for the life of the inode.
> @@ -154,6 +154,8 @@ These are currently used in two places in the kernel:
>       swapped out; the application must touch the pages manually if it wants to
>       ensure they're in memory.
>  
> + (3) By the i915 driver to mark pinned address space until it's unpinned.

mlock() and ramfs usage are pretty easy to track down.  /proc/$pid/smaps
or /proc/meminfo can show us mlock(), and good ol' 'df' and friends can
show us the extent of ramfs pinned memory.

With these, if we see "Unevictable" in meminfo bump up, we at least have
a starting point to find the cause.

Do we have an equivalent for i915?
Michal Hocko Oct. 31, 2018, 2:24 p.m. UTC | #4
On Wed 31-10-18 16:19:45, Kuo-Hsin Yang wrote:
[...]
> The previous mapping_set_unevictable patch is worse on gem_syslatency
> because it defers to vmscan to move these pages to the unevictable list
> and the test measures latency to allocate 2MiB pages. This performance
> impact can be solved by explicit moving pages to the unevictable list in
> the i915 function.

As I mentioned on the previous version and its testing results: are you
sure that the lazy unevictable page collecting is the real problem
here? The test case was generating a lot of page cache, and we simply do
not reclaim anon LRUs at all. Maybe I have misunderstood the test,
though. I am also wondering whether unevictable page culling can
really be visible when we do the anon LRU reclaim, because the swap path
is quite expensive on its own.
Dave Hansen Oct. 31, 2018, 2:40 p.m. UTC | #5
On 10/31/18 7:24 AM, Michal Hocko wrote:
> I am also wondering whether unevictable page culling can
> really be visible when we do the anon LRU reclaim, because the swap path
> is quite expensive on its own.

Didn't we create the unevictable lists in the first place because
scanning alone was observed to be so expensive in some scenarios?

Or am I misunderstanding your question?
Michal Hocko Oct. 31, 2018, 4:42 p.m. UTC | #6
On Wed 31-10-18 07:40:14, Dave Hansen wrote:
> On 10/31/18 7:24 AM, Michal Hocko wrote:
> > I am also wondering whether unevictable page culling can
> > really be visible when we do the anon LRU reclaim, because the swap path
> > is quite expensive on its own.
> 
> Didn't we create the unevictable lists in the first place because
> scanning alone was observed to be so expensive in some scenarios?

Yes, that is the case. I might have just misunderstood the code: I thought
those pages were already on the LRU when the unevictable flag was set and
we would only move these pages to the unevictable list lazily during
reclaim. If the flag is set at the time the page is added to the
LRU, then it should get to the proper LRU list right away. But then I do
not understand the test results from the previous run at all.
Kuo-Hsin Yang Nov. 1, 2018, 11:28 a.m. UTC | #7
On Thu, Nov 1, 2018 at 12:42 AM Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 31-10-18 07:40:14, Dave Hansen wrote:
> > Didn't we create the unevictable lists in the first place because
> > scanning alone was observed to be so expensive in some scenarios?
>
> Yes, that is the case. I might have just misunderstood the code: I thought
> those pages were already on the LRU when the unevictable flag was set and
> we would only move these pages to the unevictable list lazily during
> reclaim. If the flag is set at the time the page is added to the
> LRU, then it should get to the proper LRU list right away. But then I do
> not understand the test results from the previous run at all.

"gem_syslatency -t 120 -b -m" allocates a lot of anon pages; it consists of
these looping threads:
  * ncpu threads to allocate i915 shmem buffers; these buffers are freed by
the i915 shrinker.
  * ncpu threads to mmap, write, and munmap a 2 MiB mapping.
  * 1 thread to cat all files to /dev/null

Without the unevictable patch, after rebooting and running
"gem_syslatency -t 120 -b -m", I got these custom vmstat:
  pgsteal_kswapd_anon 29261
  pgsteal_kswapd_file 1153696
  pgsteal_direct_anon 255
  pgsteal_direct_file 13050
  pgscan_kswapd_anon 14524536
  pgscan_kswapd_file 1488683
  pgscan_direct_anon 1702448
  pgscan_direct_file 25849

And meminfo shows a large anon LRU size during the test.
  # cat /proc/meminfo | grep -i "active("
  Active(anon):     377760 kB
  Inactive(anon):  3195392 kB
  Active(file):      19216 kB
  Inactive(file):    16044 kB

With this patch, the custom vmstat after test:
  pgsteal_kswapd_anon 74962
  pgsteal_kswapd_file 903588
  pgsteal_direct_anon 4434
  pgsteal_direct_file 14969
  pgscan_kswapd_anon 2814791
  pgscan_kswapd_file 1113676
  pgscan_direct_anon 526766
  pgscan_direct_file 32432

The anon pgscan count is reduced.
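A quick sanity check on the vmstat counters above (a derived calculation, not from the original mail) shows both the drop in total anon scanning and the improvement in scan-to-reclaim efficiency:

```python
# vmstat counters quoted above, without and with the patch.
before_scan = 14524536 + 1702448   # pgscan_kswapd_anon + pgscan_direct_anon
before_steal = 29261 + 255         # pgsteal_kswapd_anon + pgsteal_direct_anon
after_scan = 2814791 + 526766
after_steal = 74962 + 4434

# Total anon scanning drops ~4.9x, and the number of anon pages scanned
# per page actually reclaimed improves from ~550 to ~42.
print(before_scan / after_scan)    # ~4.86
print(before_scan / before_steal)  # ~550
print(after_scan / after_steal)    # ~42
```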
Kuo-Hsin Yang Nov. 1, 2018, 12:06 p.m. UTC | #8
On Wed, Oct 31, 2018 at 10:19 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 10/31/18 1:19 AM, owner-linux-mm@kvack.org wrote:
> > -These are currently used in two places in the kernel:
> > +These are currently used in three places in the kernel:
> >
> >   (1) By ramfs to mark the address spaces of its inodes when they are created,
> >       and this mark remains for the life of the inode.
> > @@ -154,6 +154,8 @@ These are currently used in two places in the kernel:
> >       swapped out; the application must touch the pages manually if it wants to
> >       ensure they're in memory.
> >
> > + (3) By the i915 driver to mark pinned address space until it's unpinned.
>
> mlock() and ramfs usage are pretty easy to track down.  /proc/$pid/smaps
> or /proc/meminfo can show us mlock() and good ol' 'df' and friends can
> show us ramfs the extent of pinned memory.
>
> With these, if we see "Unevictable" in meminfo bump up, we at least have
> a starting point to find the cause.
>
> Do we have an equivalent for i915?

AFAIK, there is no way to get the i915 unevictable page count; some
modification to i915 debugfs would be required.
Chris Wilson Nov. 1, 2018, 12:20 p.m. UTC | #9
Quoting Chris Wilson (2018-10-31 09:41:55)
> Quoting Kuo-Hsin Yang (2018-10-31 08:19:45)
> > The i915 driver uses shmemfs to allocate backing storage for gem
> > objects. These shmemfs pages can be pinned (increased ref count) by
> > shmem_read_mapping_page_gfp(). When a lot of pages are pinned, vmscan
> > wastes a lot of time scanning these pinned pages. In some extreme case,
> > all pages in the inactive anon lru are pinned, and only the inactive
> > anon lru is scanned due to inactive_ratio, the system cannot swap and
> > invokes the oom-killer. Mark these pinned pages as unevictable to speed
> > up vmscan.
> > 
> > Add check_move_lru_page() to move page to appropriate lru list.
> > 
> > This patch was inspired by Chris Wilson's change [1].
> > 
> > [1]: https://patchwork.kernel.org/patch/9768741/
> > 
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Dave Hansen <dave.hansen@intel.com>
> > Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
> > ---
> > The previous mapping_set_unevictable patch is worse on gem_syslatency
> > because it defers to vmscan to move these pages to the unevictable list
> > and the test measures latency to allocate 2MiB pages. This performance
> > impact can be solved by explicit moving pages to the unevictable list in
> > the i915 function.
> > 
> > Chris, can you help to run the "igt/benchmarks/gem_syslatency -t 120 -b -m"
> > test with this patch on your testing machine? I tried to run the test on
> > a Celeron N4000, 4GB Ram machine. The mean value with this patch is
> > similar to that with the mlock patch.
> 
> Will do. As you are confident, I'll try a few different machines. :)

I had one anomalous result with Ivybridge, but 3/4 different machines
confirm this is effective. I normalized the latency results from each
such that 0 is the baseline median latency (no i915 activity) and 1 is
the median latency with i915 running drm-tip.

    N           Min           Max        Median           Avg        Stddev
ivb 120      0.701641       2.79209       1.24469     1.3333911    0.40871825
byt 120     -0.108194     0.0777012     0.0485302    0.01343581   0.061524734
bxt 120     -0.262057       6.27002     0.0801667    0.15963388    0.63528121
kbl 120    -0.0891262       1.22326    -0.0245336   0.041492506    0.14929689

Just need to go back and check on ivb, perhaps running on a few older
chipsets as well. But the evidence so far indicates that this eliminates
the impact of i915 activity on the performance of shrink_page_list,
reducing the number of crippling stalls under memory pressure and often
preventing them.
-Chris
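The normalization Chris describes can be sketched as follows; only the formula (baseline median maps to 0, drm-tip median maps to 1) comes from the mail, and the sample numbers here are hypothetical:

```python
def normalize(latency, baseline_median, drmtip_median):
    """Map a latency sample so the no-i915 baseline median is 0 and
    the median with i915 running drm-tip is 1."""
    return (latency - baseline_median) / (drmtip_median - baseline_median)

# Hypothetical medians for illustration only.
print(normalize(150.0, 100.0, 300.0))  # 0.25: a quarter of the way to drm-tip
```

Negative values, as seen in the byt and kbl rows, mean the patched run was faster than even the idle baseline median.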
Michal Hocko Nov. 1, 2018, 1:09 p.m. UTC | #10
On Thu 01-11-18 19:28:46, Vovo Yang wrote:
> On Thu, Nov 1, 2018 at 12:42 AM Michal Hocko <mhocko@kernel.org> wrote:
> > On Wed 31-10-18 07:40:14, Dave Hansen wrote:
> > > Didn't we create the unevictable lists in the first place because
> > > scanning alone was observed to be so expensive in some scenarios?
> >
> > Yes, that is the case. I might have just misunderstood the code: I thought
> > those pages were already on the LRU when the unevictable flag was set and
> > we would only move these pages to the unevictable list lazily during
> > reclaim. If the flag is set at the time the page is added to the
> > LRU, then it should get to the proper LRU list right away. But then I do
> > not understand the test results from the previous run at all.
> 
> "gem_syslatency -t 120 -b -m" allocates a lot of anon pages, it consists of
> these looping threads:
>   * ncpu threads to alloc i915 shmem buffers, these buffers are freed by i915
> shrinker.
>   * ncpu threads to mmap, write, and munmap a 2 MiB mapping.
>   * 1 thread to cat all files to /dev/null
> 
> Without the unevictable patch, after rebooting and running
> "gem_syslatency -t 120 -b -m", I got these custom vmstat:
>   pgsteal_kswapd_anon 29261
>   pgsteal_kswapd_file 1153696
>   pgsteal_direct_anon 255
>   pgsteal_direct_file 13050
>   pgscan_kswapd_anon 14524536
>   pgscan_kswapd_file 1488683
>   pgscan_direct_anon 1702448
>   pgscan_direct_file 25849
> 
> And meminfo shows large anon lru size during test.
>   # cat /proc/meminfo | grep -i "active("
>   Active(anon):     377760 kB
>   Inactive(anon):  3195392 kB
>   Active(file):      19216 kB
>   Inactive(file):    16044 kB
> 
> With this patch, the custom vmstat after test:
>   pgsteal_kswapd_anon 74962
>   pgsteal_kswapd_file 903588
>   pgsteal_direct_anon 4434
>   pgsteal_direct_file 14969
>   pgscan_kswapd_anon 2814791
>   pgscan_kswapd_file 1113676
>   pgscan_direct_anon 526766
>   pgscan_direct_file 32432
> 
> The anon pgscan count is reduced.

OK, so that explains my question about the test case. Even though you
generate a lot of page cache, the amount is still too small to trigger
mostly-pagecache reclaim, so the anon LRUs are scanned as well.

Now to the difference from the previous version, which simply set the
UNEVICTABLE flag on the mapping. Am I right assuming that the pages are
already on the LRU at that time? Is there any reason the mapping cannot
have the flag set before they are added to the LRU?
Dave Hansen Nov. 1, 2018, 2:30 p.m. UTC | #11
On 11/1/18 5:06 AM, Vovo Yang wrote:
>> mlock() and ramfs usage are pretty easy to track down.  /proc/$pid/smaps
>> or /proc/meminfo can show us mlock() and good ol' 'df' and friends can
>> show us ramfs the extent of pinned memory.
>>
>> With these, if we see "Unevictable" in meminfo bump up, we at least have
>> a starting point to find the cause.
>>
>> Do we have an equivalent for i915?
> AFAIK, there is no way to get i915 unevictable page count, some
> modification to i915 debugfs is required.

Is something like this feasible to add to this patch set before it gets
merged?  For now, it's probably easy to tell if i915 is at fault because
if the unevictable memory isn't from mlock or ramfs, it must be i915.

But, if we leave it as-is, it'll just defer the issue to the fourth user
of the unevictable list, who will have to come back and add some
debugging for this.

Seems prudent to just do it now.
Kuo-Hsin Yang Nov. 2, 2018, 12:35 p.m. UTC | #12
On Thu, Nov 1, 2018 at 9:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> OK, so that explain my question about the test case. Even though you
> generate a lot of page cache, the amount is still too small to trigger
> pagecache mostly reclaim and anon LRUs are scanned as well.
>
> Now to the difference with the previous version which simply set the
> UNEVICTABLE flag on mapping. Am I right assuming that pages are already
> at LRU at the time? Is there any reason the mapping cannot have the flag
> set before they are added to the LRU?

I checked again. When I run gem_syslatency, it sets the unevictable flag
first and then adds pages to the LRU, so my explanation of the previous
test result is wrong. It should not be necessary to explicitly move
these pages to the unevictable list for this test case. The performance
improvement of this patch on kbl might instead be due to not calling
shmem_unlock_mapping.

The perf result of a shmem lock test shows find_get_entries is the
most expensive part of shmem_unlock_mapping.
85.32%--ksys_shmctl
        shmctl_do_lock
         --85.29%--shmem_unlock_mapping
                   |--45.98%--find_get_entries
                   |           --10.16%--radix_tree_next_chunk
                   |--16.78%--check_move_unevictable_pages
                   |--16.07%--__pagevec_release
                   |           --15.67%--release_pages
                   |                      --4.82%--free_unref_page_list
                   |--4.38%--pagevec_remove_exceptionals
                    --0.59%--_cond_resched
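Reading the profile above: the percentages are fractions of all samples, so relative to shmem_unlock_mapping itself, find_get_entries accounts for over half of its cost (a derived figure, not stated in the original mail):

```python
shmem_unlock_mapping = 85.29  # share of total samples, from the perf output
find_get_entries = 45.98      # its most expensive child

# Share of shmem_unlock_mapping's own time spent in find_get_entries.
print(f"{find_get_entries / shmem_unlock_mapping * 100:.1f}%")  # 53.9%
```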
Kuo-Hsin Yang Nov. 2, 2018, 1:22 p.m. UTC | #13
On Thu, Nov 1, 2018 at 10:30 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 11/1/18 5:06 AM, Vovo Yang wrote:
> >> mlock() and ramfs usage are pretty easy to track down.  /proc/$pid/smaps
> >> or /proc/meminfo can show us mlock() and good ol' 'df' and friends can
> >> show us ramfs the extent of pinned memory.
> >>
> >> With these, if we see "Unevictable" in meminfo bump up, we at least have
> >> a starting point to find the cause.
> >>
> >> Do we have an equivalent for i915?

Chris helped to answer this question:
Though it includes a few non-shmemfs objects, see
debugfs/dri/0/i915_gem_objects and the "bound objects".

Example i915_gem_objects output:
  591 objects, 95449088 bytes
  55 unbound objects, 1880064 bytes
  533 bound objects, 93040640 bytes
  ...

> > AFAIK, there is no way to get i915 unevictable page count, some
> > modification to i915 debugfs is required.
>
> Is something like this feasible to add to this patch set before it gets
> merged?  For now, it's probably easy to tell if i915 is at fault because
> if the unevictable memory isn't from mlock or ramfs, it must be i915.
>
> But, if we leave it as-is, it'll just defer the issue to the fourth user
> of the unevictable list, who will have to come back and add some
> debugging for this.
>
> Seems prudent to just do it now.
Dave Hansen Nov. 2, 2018, 2:05 p.m. UTC | #14
On 11/2/18 6:22 AM, Vovo Yang wrote:
> On Thu, Nov 1, 2018 at 10:30 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> On 11/1/18 5:06 AM, Vovo Yang wrote:
>>>> mlock() and ramfs usage are pretty easy to track down.  /proc/$pid/smaps
>>>> or /proc/meminfo can show us mlock() and good ol' 'df' and friends can
>>>> show us ramfs the extent of pinned memory.
>>>>
>>>> With these, if we see "Unevictable" in meminfo bump up, we at least have
>>>> a starting point to find the cause.
>>>>
>>>> Do we have an equivalent for i915?
> Chris helped to answer this question:
> Though it includes a few non-shmemfs objects, see
> debugfs/dri/0/i915_gem_objects and the "bound objects".
> 
> Example i915_gem_object output:
>   591 objects, 95449088 bytes
>   55 unbound objects, 1880064 bytes
>   533 bound objects, 93040640 bytes

Do those non-shmemfs objects show up on the unevictable list?  How far
can the amount of memory on the unevictable list and the amount
displayed in this "bound objects" value diverge?
Michal Hocko Nov. 2, 2018, 6:26 p.m. UTC | #15
On Fri 02-11-18 20:35:11, Vovo Yang wrote:
> On Thu, Nov 1, 2018 at 9:10 PM Michal Hocko <mhocko@kernel.org> wrote:
> > OK, so that explain my question about the test case. Even though you
> > generate a lot of page cache, the amount is still too small to trigger
> > pagecache mostly reclaim and anon LRUs are scanned as well.
> >
> > Now to the difference with the previous version which simply set the
> > UNEVICTABLE flag on mapping. Am I right assuming that pages are already
> > at LRU at the time? Is there any reason the mapping cannot have the flag
> > set before they are added to the LRU?
> 
> I checked again. When I run gem_syslatency, it sets unevictable flag
> first and then adds pages to LRU, so my explanation to the previous
> test result is wrong. It should not be necessary to explicitly move
> these pages to unevictable list for this test case.

OK, that starts to make sense finally.

> The performance
> improvement of this patch on kbl might be due to not calling
> shmem_unlock_mapping.

Yes, that one can get quite expensive. find_get_entries is really
pointless here because you already have your pages. Abstracting
check_move_unevictable_pages into a pagevec API sounds like a reasonable
compromise between code duplication and exporting a relatively low-level
API.

> The perf result of a shmem lock test shows find_get_entries is the
> most expensive part of shmem_unlock_mapping.
> 85.32%--ksys_shmctl
>         shmctl_do_lock
>          --85.29%--shmem_unlock_mapping
>                    |--45.98%--find_get_entries
>                    |           --10.16%--radix_tree_next_chunk
>                    |--16.78%--check_move_unevictable_pages
>                    |--16.07%--__pagevec_release
>                    |           --15.67%--release_pages
>                    |                      --4.82%--free_unref_page_list
>                    |--4.38%--pagevec_remove_exceptionals
>                     --0.59%--_cond_resched
Kuo-Hsin Yang Nov. 5, 2018, 11:24 a.m. UTC | #16
On Fri, Nov 2, 2018 at 10:05 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 11/2/18 6:22 AM, Vovo Yang wrote:
> > Chris helped to answer this question:
> > Though it includes a few non-shmemfs objects, see
> > debugfs/dri/0/i915_gem_objects and the "bound objects".
> >
> > Example i915_gem_object output:
> >   591 objects, 95449088 bytes
> >   55 unbound objects, 1880064 bytes
> >   533 bound objects, 93040640 bytes
>
> Do those non-shmemfs objects show up on the unevictable list?  How far
> can the amount of memory on the unevictable list and the amount
> displayed in this "bound objects" value diverge?

Those non-shmemfs objects would not show up on the unevictable list.

In the typical use case, the size of GTT-bound objects (on the
unevictable list) is very close to the bound size in i915_gem_objects.
E.g. on my laptop, i915_gem_objects shows 110075904 bytes of bound
objects, and there are 109760512 bytes of GTT-bound objects; the
difference is about 0.3%.
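The quoted divergence can be checked directly from the two byte counts (an illustrative calculation, not part of the original mail):

```python
bound = 110075904      # "bound objects" bytes reported by i915_gem_objects
gtt_bound = 109760512  # bytes of GTT-bound objects on the unevictable list

# How far the debugfs "bound objects" figure overstates the
# unevictable-list size, as a percentage.
print(f"{(bound - gtt_bound) / bound * 100:.2f}%")  # 0.29%
```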

Patch

diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
index fdd84cb8d511..a812fb55136d 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -143,7 +143,7 @@  using a number of wrapper functions:
 	Query the address space, and return true if it is completely
 	unevictable.
 
-These are currently used in two places in the kernel:
+These are currently used in three places in the kernel:
 
  (1) By ramfs to mark the address spaces of its inodes when they are created,
      and this mark remains for the life of the inode.
@@ -154,6 +154,8 @@  These are currently used in two places in the kernel:
      swapped out; the application must touch the pages manually if it wants to
      ensure they're in memory.
 
+ (3) By the i915 driver to mark pinned address space until it's unpinned.
+
 
 Detecting Unevictable Pages
 ---------------------------
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0c8aa57ce83b..6dc3ecef67e4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2387,6 +2387,7 @@  i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj,
 {
 	struct sgt_iter sgt_iter;
 	struct page *page;
+	struct address_space *mapping;
 
 	__i915_gem_object_release_shmem(obj, pages, true);
 
@@ -2395,6 +2396,9 @@  i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj,
 	if (i915_gem_object_needs_bit17_swizzle(obj))
 		i915_gem_object_save_bit_17_swizzle(obj, pages);
 
+	mapping = file_inode(obj->base.filp)->i_mapping;
+	mapping_clear_unevictable(mapping);
+
 	for_each_sgt_page(page, sgt_iter, pages) {
 		if (obj->mm.dirty)
 			set_page_dirty(page);
@@ -2402,6 +2406,10 @@  i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj,
 		if (obj->mm.madv == I915_MADV_WILLNEED)
 			mark_page_accessed(page);
 
+		lock_page(page);
+		check_move_lru_page(page);
+		unlock_page(page);
+
 		put_page(page);
 	}
 	obj->mm.dirty = false;
@@ -2559,6 +2567,7 @@  static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	 * Fail silently without starting the shrinker
 	 */
 	mapping = obj->base.filp->f_mapping;
+	mapping_set_unevictable(mapping);
 	noreclaim = mapping_gfp_constraint(mapping, ~__GFP_RECLAIM);
 	noreclaim |= __GFP_NORETRY | __GFP_NOWARN;
 
@@ -2630,6 +2639,10 @@  static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 		}
 		last_pfn = page_to_pfn(page);
 
+		lock_page(page);
+		check_move_lru_page(page);
+		unlock_page(page);
+
 		/* Check that the i965g/gm workaround works. */
 		WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
 	}
@@ -2673,8 +2686,13 @@  static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 err_sg:
 	sg_mark_end(sg);
 err_pages:
-	for_each_sgt_page(page, sgt_iter, st)
+	mapping_clear_unevictable(mapping);
+	for_each_sgt_page(page, sgt_iter, st) {
+		lock_page(page);
+		check_move_lru_page(page);
+		unlock_page(page);
 		put_page(page);
+	}
 	sg_free_table(st);
 	kfree(st);
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index d8a07a4f171d..a812f24d69f2 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -370,6 +370,7 @@  static inline int node_reclaim(struct pglist_data *pgdat, gfp_t mask,
 
 extern int page_evictable(struct page *page);
 extern void check_move_unevictable_pages(struct page **, int nr_pages);
+extern void check_move_lru_page(struct page *page);
 
 extern int kswapd_run(int nid);
 extern void kswapd_stop(int nid);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 62ac0c488624..2399ccaa15e7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4184,12 +4184,11 @@  int page_evictable(struct page *page)
 
 #ifdef CONFIG_SHMEM
 /**
- * check_move_unevictable_pages - check pages for evictability and move to appropriate zone lru list
+ * check_move_unevictable_pages - move evictable pages to appropriate evictable
+ * lru lists
  * @pages:	array of pages to check
  * @nr_pages:	number of pages to check
  *
- * Checks pages for evictability and moves them to the appropriate lru list.
- *
  * This function is only used for SysV IPC SHM_UNLOCK.
  */
 void check_move_unevictable_pages(struct page **pages, int nr_pages)
@@ -4234,3 +4233,18 @@  void check_move_unevictable_pages(struct page **pages, int nr_pages)
 	}
 }
 #endif /* CONFIG_SHMEM */
+
+/**
+ * check_move_lru_page - check page for evictability and move it to
+ * appropriate zone lru list
+ * @page: page to be move to appropriate lru list
+ *
+ * If this function fails to isolate an unevictable page, vmscan will handle it
+ * when it attempts to reclaim the page.
+ */
+void check_move_lru_page(struct page *page)
+{
+	if (!isolate_lru_page(page))
+		putback_lru_page(page);
+}
+EXPORT_SYMBOL(check_move_lru_page);