Message ID | fb91b40d258414b0fdce2c380752e48daa6a70d6.camel@HansenPartnership.com (mailing list archive)
---|---
State | New, archived
Series | [PATCH] fs/dax: fix compile problem on parisc and mips
On Thu, Dec 03, 2020 at 04:33:10PM -0800, James Bottomley wrote:
> These platforms define PMD_ORDER in asm/pgtable.h
I think that's the real problem, though.
#define PGD_ORDER 1 /* Number of pages per pgd */
#define PMD_ORDER 1 /* Number of pages per pmd */
#define PGD_ALLOC_ORDER (2 + 1) /* first pgd contains pmd */
#else
#define PGD_ORDER 1 /* Number of pages per pgd */
#define PGD_ALLOC_ORDER (PGD_ORDER + 1)
That should clearly be PMD_ALLOC_ORDER, not PMD_ORDER. Or even
PAGES_PER_PMD like the comment calls it, because I really think
that doing an order-3 (8 pages) allocation for the PGD is wrong.
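For context: PGD_ALLOC_ORDER is a buddy-allocator page order, so (2 + 1)
means 1 << 3 = 8 contiguous pages for every process PGD. A minimal
userspace sketch of that arithmetic, assuming 4KiB pages (illustrative
only, not the parisc code):

	#include <stdio.h>

	#define PGD_ALLOC_ORDER	(2 + 1)	/* as quoted above */

	int main(void)
	{
		unsigned long page_size = 4096;			/* assumed 4KiB pages */
		unsigned long pages = 1UL << PGD_ALLOC_ORDER;	/* order 3 -> 8 pages */

		printf("order %d = %lu pages = %lu KiB per PGD\n",
		       PGD_ALLOC_ORDER, pages, pages * page_size / 1024);
		return 0;
	}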
On 12/4/20 4:48 AM, Matthew Wilcox wrote:
> On Thu, Dec 03, 2020 at 04:33:10PM -0800, James Bottomley wrote:
>> These platforms define PMD_ORDER in asm/pgtable.h
>
> I think that's the real problem, though.
>
> #define PGD_ORDER	1 /* Number of pages per pgd */
> #define PMD_ORDER	1 /* Number of pages per pmd */
> #define PGD_ALLOC_ORDER	(2 + 1) /* first pgd contains pmd */
> #else
> #define PGD_ORDER	1 /* Number of pages per pgd */
> #define PGD_ALLOC_ORDER	(PGD_ORDER + 1)
>
> That should clearly be PMD_ALLOC_ORDER, not PMD_ORDER.  Or even
> PAGES_PER_PMD like the comment calls it, because I really think
> that doing an order-3 (8 pages) allocation for the PGD is wrong.

We need a spinlock to protect parallel accesses to the PGD,
search for pgd_spinlock().
This spinlock is stored behind the memory used for the PGD, which
is why we allocate more memory (and waste 3 pages).

Helge
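For readers searching for it: pgd_spinlock() derives the lock address
from the pgd pointer itself, roughly along these lines. This is a sketch
of the scheme Helge describes; the exact offset is an assumption, see
arch/parisc/include/asm/pgalloc.h for the real code:

	/*
	 * Sketch of the lock-behind-the-PGD scheme: the PGD proper
	 * occupies the first 1 << PGD_ORDER pages of the order
	 * PGD_ALLOC_ORDER allocation, and the spinlock lives in the
	 * spare pages behind it.  The offset here is an assumption.
	 */
	static inline spinlock_t *pgd_spinlock(pgd_t *pgd)
	{
		return (spinlock_t *)((char *)pgd + (PAGE_SIZE << PGD_ORDER));
	}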
On Fri, Dec 04, 2020 at 08:57:37AM +0100, Helge Deller wrote:
> On 12/4/20 4:48 AM, Matthew Wilcox wrote:
> > On Thu, Dec 03, 2020 at 04:33:10PM -0800, James Bottomley wrote:
> >> These platforms define PMD_ORDER in asm/pgtable.h
> >
> > I think that's the real problem, though.
> >
> > #define PGD_ORDER	1 /* Number of pages per pgd */
> > #define PMD_ORDER	1 /* Number of pages per pmd */
> > #define PGD_ALLOC_ORDER	(2 + 1) /* first pgd contains pmd */
> > #else
> > #define PGD_ORDER	1 /* Number of pages per pgd */
> > #define PGD_ALLOC_ORDER	(PGD_ORDER + 1)
> >
> > That should clearly be PMD_ALLOC_ORDER, not PMD_ORDER.  Or even
> > PAGES_PER_PMD like the comment calls it, because I really think
> > that doing an order-3 (8 pages) allocation for the PGD is wrong.
>
> We need a spinlock to protect parallel accesses to the PGD,
> search for pgd_spinlock().
> This spinlock is stored behind the memory used for the PGD, which
> is why we allocate more memory (and waste 3 pages).

There are better places to store it than that!  For example, storing
it in the struct page, like many architectures do for split ptlocks.
You'll have to skip the _pt_pad_1 so it doesn't get confused with
being a compound_head, but something like this:

	struct {	/* PA-RISC PGD */
		unsigned long _pa_pad_1;	/* compound_head */
#if ALLOC_SPLIT_PA_PTLOCKS
		spinlock_t *pa_ptl;
#else
		spinlock_t pa_ptl;
#endif
	};

inside struct page (linux/mm_types.h) should do the trick.  You'll
still need to allocate them separately if various debugging options
are enabled (see ALLOC_SPLIT_PTLOCKS for details), but usually this
will save you a lot of memory.

You could also fill in pt_mm like x86 does for its pgds, and then use
mm->page_table_lock to protect whatever the PGD lock currently
protects.  Maybe page_table_lock is sometimes held when calling
ptep_set_wrprotect() and sometimes isn't; then this wouldn't work.

Also, could you fix the comments?  They don't match the code:

#define PGD_ORDER	1 /* Number of pages per pgd */

should be

#define PGD_ORDER	1 /* 2 pages (8KiB) per pgd */
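Assuming a struct page member like the one sketched above, the lock
lookup would then go through the page rather than through spare pages
tacked onto the allocation; roughly (pa_pgd_lock() is a hypothetical
name, not an existing kernel helper):

	/* Hypothetical helper: find the PGD's lock via its struct page. */
	static inline spinlock_t *pa_pgd_lock(pgd_t *pgd)
	{
		struct page *page = virt_to_page(pgd);

#if ALLOC_SPLIT_PA_PTLOCKS
		return page->pa_ptl;	/* lock allocated separately */
#else
		return &page->pa_ptl;	/* lock embedded in struct page */
#endif
	}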
On 2020-12-04 7:44 a.m., Matthew Wilcox wrote:
> You'll still need to allocate them separately if various debugging
> options are enabled (see ALLOC_SPLIT_PTLOCKS for details), but
> usually this will save you a lot of memory.

We need all we can get:

(.mlocate): page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 2 PID: 28271 Comm: (.mlocate) Not tainted 5.9.11+ #1
Hardware name: 9000/800/rp3440
Backtrace:
 [<000000004018d050>] show_stack+0x50/0x70
 [<0000000040826354>] dump_stack+0xbc/0x130
 [<000000004033dc14>] warn_alloc+0x144/0x1e0
 [<000000004033e930>] __alloc_pages_slowpath.constprop.0+0xc80/0xcf8
 [<000000004033ec48>] __alloc_pages_nodemask+0x2a0/0x2f0
 [<0000000040351a2c>] cache_alloc_refill+0x6b4/0xe50
 [<000000004035416c>] __kmalloc+0x5e4/0x740
 [<00000000040ddbe8>] nfsd_reply_cache_init+0x1d0/0x360 [nfsd]
 [<00000000040d1118>] nfsd_init_net+0xb0/0x1c0 [nfsd]
 [<00000000406a5860>] ops_init+0x68/0x178
 [<00000000406a5b24>] setup_net+0x1b4/0x348
 [<00000000406a6e20>] copy_net_ns+0x1f0/0x450
 [<00000000401e6578>] create_new_namespaces+0x1b0/0x418
 [<00000000401e72ac>] unshare_nsproxy_namespaces+0x8c/0xf0
 [<00000000401b6acc>] ksys_unshare+0x1bc/0x440
 [<00000000401b6d70>] sys_unshare+0x20/0x38
 [<0000000040188018>] syscall_exit+0x0/0x14

Mem-Info:
active_anon:1209957 inactive_anon:438171 isolated_anon:0
 active_file:38971 inactive_file:21741 isolated_file:0
 unevictable:4662 dirty:144 writeback:0
 slab_reclaimable:45748 slab_unreclaimable:51548
 mapped:34940 shmem:1471859 pagetables:5429 bounce:0
 free:213676 free_pcp:317 free_cma:0
Node 0 active_anon:4839828kB inactive_anon:1753200kB active_file:155884kB inactive_file:86964kB unevictable:18648kB isolated(anon):0kB isolated(file):0kB mapped:139760kB dirty:576kB writeback:0kB shmem:5887436kB writeback_tmp:0kB kernel_stack:6032kB all_unreclaimable? no
Normal free:853948kB min:11448kB low:19636kB high:27824kB reserved_highatomic:0KB active_anon:4839828kB inactive_anon:1753544kB active_file:155884kB inactive_file:86964kB unevictable:18648kB writepending:576kB present:8386560kB managed:8211756kB mlocked:18648kB pagetables:21716kB bounce:0kB free_pcp:1168kB local_pcp:352kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 58860*4kB (UME) 48155*8kB (UME) 11414*16kB (UME) 1291*32kB (UME) 134*64kB (UME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 853192kB
1607717 total pagecache pages
73822 pages in swap cache
Swap cache stats: add 21252454, delete 21178083, find 5702723/6815498
Free swap  = 33742652kB
Total swap = 49758652kB
2096640 pages RAM
0 pages HighMem/MovableOnly
43701 pages reserved

Cheers,
Dave
On Fri, Dec 04, 2020 at 08:28:47AM -0500, John David Anglin wrote:
> (.mlocate): page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
> [<000000004035416c>] __kmalloc+0x5e4/0x740
> [<00000000040ddbe8>] nfsd_reply_cache_init+0x1d0/0x360 [nfsd]

Oof, order 5.  Fortunately, that one was already fixed by commit
8c38b705b4f4ca4e7f9cc116141bc38391917c30.
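For scale: order:5 means 1 << 5 = 32 contiguous pages, i.e. 128KiB with
4KiB pages, and the buddy dump above shows 0*128kB blocks free, so the
request could not be satisfied despite ~850MB free overall. A quick
check of that arithmetic, with the page size as an assumption:

	#include <stdio.h>

	int main(void)
	{
		unsigned int order = 5;			/* from the failure above */
		unsigned long kib = (1UL << order) * 4;	/* assuming 4KiB pages */

		printf("order-%u allocation needs %lu KiB contiguous\n", order, kib);
		return 0;
	}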
diff --git a/fs/dax.c b/fs/dax.c
index 5b47834f2e1b..4d3b0db5c321 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -50,7 +50,7 @@ static inline unsigned int pe_order(enum page_entry_size pe_size)
 #define PG_PMD_NR	(PMD_SIZE >> PAGE_SHIFT)
 
 /* The order of a PMD entry */
-#define PMD_ORDER	(PMD_SHIFT - PAGE_SHIFT)
+#define DAX_PMD_ORDER	(PMD_SHIFT - PAGE_SHIFT)
 
 static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
 
@@ -98,7 +98,7 @@ static bool dax_is_locked(void *entry)
 static unsigned int dax_entry_order(void *entry)
 {
 	if (xa_to_value(entry) & DAX_PMD)
-		return PMD_ORDER;
+		return DAX_PMD_ORDER;
 	return 0;
 }
 
@@ -1471,7 +1471,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct address_space *mapping = vma->vm_file->f_mapping;
-	XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, PMD_ORDER);
+	XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, DAX_PMD_ORDER);
 	unsigned long pmd_addr = vmf->address & PMD_MASK;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	bool sync;
@@ -1530,7 +1530,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * entry is already in the array, for instance), it will return
 	 * VM_FAULT_FALLBACK.
 	 */
-	entry = grab_mapping_entry(&xas, mapping, PMD_ORDER);
+	entry = grab_mapping_entry(&xas, mapping, DAX_PMD_ORDER);
 	if (xa_is_internal(entry)) {
 		result = xa_to_internal(entry);
 		goto fallback;
@@ -1696,7 +1696,7 @@ dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, unsigned int order)
 	if (order == 0)
 		ret = vmf_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
 #ifdef CONFIG_FS_DAX_PMD
-	else if (order == PMD_ORDER)
+	else if (order == DAX_PMD_ORDER)
 		ret = vmf_insert_pfn_pmd(vmf, pfn, FAULT_FLAG_WRITE);
 #endif
 	else
These platforms define PMD_ORDER in asm/pgtable.h.  This means that as
soon as dax.c included asm/pgtable.h in commit 11cf9d863dcb ("fs/dax:
Deposit pagetable even when installing zero page"), we clash with
PMD_ORDER introduced by cfc93c6c6c96 ("dax: Convert
dax_insert_pfn_mkwrite to XArray") and we get this problem:

/home/jejb/git/linux-build/fs/dax.c:53: warning: "PMD_ORDER" redefined
   53 | #define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT)
      |
In file included from /home/jejb/git/linux-build/include/linux/pgtable.h:6,
                 from /home/jejb/git/linux-build/include/linux/mm.h:33,
                 from /home/jejb/git/linux-build/include/linux/bvec.h:14,
                 from /home/jejb/git/linux-build/include/linux/blk_types.h:10,
                 from /home/jejb/git/linux-build/include/linux/genhd.h:19,
                 from /home/jejb/git/linux-build/include/linux/blkdev.h:8,
                 from /home/jejb/git/linux-build/fs/dax.c:10:
/home/jejb/git/linux-build/arch/parisc/include/asm/pgtable.h:124: note: this is the location of the previous definition
  124 | #define PMD_ORDER 1 /* Number of pages per pmd */
      |
make[2]: *** Deleting file 'fs/dax.o'

Fix by renaming dax's PMD_ORDER to DAX_PMD_ORDER.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 fs/dax.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
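The underlying failure is an ordinary preprocessor redefinition; a
minimal standalone illustration of the shape of the clash (not the real
kernel headers):

	/* clash.c -- illustrative only, not the kernel headers */
	#define PMD_ORDER 1				/* plays asm/pgtable.h */
	#define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT)	/* plays fs/dax.c */

	int main(void) { return 0; }

	/* $ gcc clash.c
	 * clash.c:3: warning: "PMD_ORDER" redefined */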