Message ID | 20190514025604.9997-1-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/nvdimm: Use correct #defines instead of opencoding | expand |
On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote: > > The nfpn related change is needed to fix the kernel message > > "number of pfns truncated from 2617344 to 163584" > > The change makes sure the nfpns stored in the superblock is right value. > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > --- > drivers/nvdimm/pfn_devs.c | 6 +++--- > drivers/nvdimm/region_devs.c | 8 ++++---- > 2 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > index 347cab166376..6751ff0296ef 100644 > --- a/drivers/nvdimm/pfn_devs.c > +++ b/drivers/nvdimm/pfn_devs.c > @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > * when populating the vmemmap. This *should* be equal to > * PMD_SIZE for most architectures. > */ > - offset = ALIGN(start + reserve + 64 * npfns, > - max(nd_pfn->align, PMD_SIZE)) - start; > + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, > + max(nd_pfn->align, PMD_SIZE)) - start; No, I think we need to record the page-size into the superblock format otherwise this breaks in debug builds where the struct-page size is extended. > } else if (nd_pfn->mode == PFN_MODE_RAM) > offset = ALIGN(start + reserve, nd_pfn->align) - start; > else > @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > return -ENXIO; > } > > - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; > + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; Similar comment, if the page size is variable then the superblock needs to explicitly account for it.
On 5/14/19 9:28 AM, Dan Williams wrote: > On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V > <aneesh.kumar@linux.ibm.com> wrote: >> >> The nfpn related change is needed to fix the kernel message >> >> "number of pfns truncated from 2617344 to 163584" >> >> The change makes sure the nfpns stored in the superblock is right value. >> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >> --- >> drivers/nvdimm/pfn_devs.c | 6 +++--- >> drivers/nvdimm/region_devs.c | 8 ++++---- >> 2 files changed, 7 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c >> index 347cab166376..6751ff0296ef 100644 >> --- a/drivers/nvdimm/pfn_devs.c >> +++ b/drivers/nvdimm/pfn_devs.c >> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) >> * when populating the vmemmap. This *should* be equal to >> * PMD_SIZE for most architectures. >> */ >> - offset = ALIGN(start + reserve + 64 * npfns, >> - max(nd_pfn->align, PMD_SIZE)) - start; >> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, >> + max(nd_pfn->align, PMD_SIZE)) - start; > > No, I think we need to record the page-size into the superblock format > otherwise this breaks in debug builds where the struct-page size is > extended. > >> } else if (nd_pfn->mode == PFN_MODE_RAM) >> offset = ALIGN(start + reserve, nd_pfn->align) - start; >> else >> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) >> return -ENXIO; >> } >> >> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; >> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; > > Similar comment, if the page size is variable then the superblock > needs to explicitly account for it. > PAGE_SIZE is not really variable. What we can run into is the issue you mentioned above. The size of struct page can change which means the reserved space for keeping vmemmap in device may not be sufficient for certain kernel builds. I was planning to add another patch that fails namespace init if we don't have enough space to keep the struct page. Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? -aneesh
On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote: > > On 5/14/19 9:28 AM, Dan Williams wrote: > > On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V > > <aneesh.kumar@linux.ibm.com> wrote: > >> > >> The nfpn related change is needed to fix the kernel message > >> > >> "number of pfns truncated from 2617344 to 163584" > >> > >> The change makes sure the nfpns stored in the superblock is right value. > >> > >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > >> --- > >> drivers/nvdimm/pfn_devs.c | 6 +++--- > >> drivers/nvdimm/region_devs.c | 8 ++++---- > >> 2 files changed, 7 insertions(+), 7 deletions(-) > >> > >> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > >> index 347cab166376..6751ff0296ef 100644 > >> --- a/drivers/nvdimm/pfn_devs.c > >> +++ b/drivers/nvdimm/pfn_devs.c > >> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >> * when populating the vmemmap. This *should* be equal to > >> * PMD_SIZE for most architectures. > >> */ > >> - offset = ALIGN(start + reserve + 64 * npfns, > >> - max(nd_pfn->align, PMD_SIZE)) - start; > >> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, > >> + max(nd_pfn->align, PMD_SIZE)) - start; > > > > No, I think we need to record the page-size into the superblock format > > otherwise this breaks in debug builds where the struct-page size is > > extended. > > > >> } else if (nd_pfn->mode == PFN_MODE_RAM) > >> offset = ALIGN(start + reserve, nd_pfn->align) - start; > >> else > >> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >> return -ENXIO; > >> } > >> > >> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; > >> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; > > > > Similar comment, if the page size is variable then the superblock > > needs to explicitly account for it. > > > > PAGE_SIZE is not really variable. What we can run into is the issue you > mentioned above. The size of struct page can change which means the > reserved space for keeping vmemmap in device may not be sufficient for > certain kernel builds. > > I was planning to add another patch that fails namespace init if we > don't have enough space to keep the struct page. > > Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? So that the kernel has a chance to identify cases where the superblock it is handling was created on a system with different PAGE_SIZE assumptions.
On 5/14/19 9:42 AM, Dan Williams wrote: > On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V > <aneesh.kumar@linux.ibm.com> wrote: >> >> On 5/14/19 9:28 AM, Dan Williams wrote: >>> On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V >>> <aneesh.kumar@linux.ibm.com> wrote: >>>> >>>> The nfpn related change is needed to fix the kernel message >>>> >>>> "number of pfns truncated from 2617344 to 163584" >>>> >>>> The change makes sure the nfpns stored in the superblock is right value. >>>> >>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>> --- >>>> drivers/nvdimm/pfn_devs.c | 6 +++--- >>>> drivers/nvdimm/region_devs.c | 8 ++++---- >>>> 2 files changed, 7 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c >>>> index 347cab166376..6751ff0296ef 100644 >>>> --- a/drivers/nvdimm/pfn_devs.c >>>> +++ b/drivers/nvdimm/pfn_devs.c >>>> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) >>>> * when populating the vmemmap. This *should* be equal to >>>> * PMD_SIZE for most architectures. >>>> */ >>>> - offset = ALIGN(start + reserve + 64 * npfns, >>>> - max(nd_pfn->align, PMD_SIZE)) - start; >>>> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, >>>> + max(nd_pfn->align, PMD_SIZE)) - start; >>> >>> No, I think we need to record the page-size into the superblock format >>> otherwise this breaks in debug builds where the struct-page size is >>> extended. >>> >>>> } else if (nd_pfn->mode == PFN_MODE_RAM) >>>> offset = ALIGN(start + reserve, nd_pfn->align) - start; >>>> else >>>> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) >>>> return -ENXIO; >>>> } >>>> >>>> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; >>>> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; >>> >>> Similar comment, if the page size is variable then the superblock >>> needs to explicitly account for it. >>> >> >> PAGE_SIZE is not really variable. What we can run into is the issue you >> mentioned above. The size of struct page can change which means the >> reserved space for keeping vmemmap in device may not be sufficient for >> certain kernel builds. >> >> I was planning to add another patch that fails namespace init if we >> don't have enough space to keep the struct page. >> >> Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? > > So that the kernel has a chance to identify cases where the superblock > it is handling was created on a system with different PAGE_SIZE > assumptions. > The reason to do that is we don't have enough space to keep struct page backing the total number of pfns? If so, what i suggested above should handle that. or are you finding any other reason why we should fail a namespace init with a different PAGE_SIZE value? My another patch handle the details w.r.t devdax alignment for which devdax got created with PAGE_SIZE 4K but we are now trying to load that in a kernel with PAGE_SIZE 64k. -aneesh
On Mon, May 13, 2019 at 9:46 PM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote: > > On 5/14/19 9:42 AM, Dan Williams wrote: > > On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V > > <aneesh.kumar@linux.ibm.com> wrote: > >> > >> On 5/14/19 9:28 AM, Dan Williams wrote: > >>> On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V > >>> <aneesh.kumar@linux.ibm.com> wrote: > >>>> > >>>> The nfpn related change is needed to fix the kernel message > >>>> > >>>> "number of pfns truncated from 2617344 to 163584" > >>>> > >>>> The change makes sure the nfpns stored in the superblock is right value. > >>>> > >>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > >>>> --- > >>>> drivers/nvdimm/pfn_devs.c | 6 +++--- > >>>> drivers/nvdimm/region_devs.c | 8 ++++---- > >>>> 2 files changed, 7 insertions(+), 7 deletions(-) > >>>> > >>>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > >>>> index 347cab166376..6751ff0296ef 100644 > >>>> --- a/drivers/nvdimm/pfn_devs.c > >>>> +++ b/drivers/nvdimm/pfn_devs.c > >>>> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >>>> * when populating the vmemmap. This *should* be equal to > >>>> * PMD_SIZE for most architectures. > >>>> */ > >>>> - offset = ALIGN(start + reserve + 64 * npfns, > >>>> - max(nd_pfn->align, PMD_SIZE)) - start; > >>>> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, > >>>> + max(nd_pfn->align, PMD_SIZE)) - start; > >>> > >>> No, I think we need to record the page-size into the superblock format > >>> otherwise this breaks in debug builds where the struct-page size is > >>> extended. > >>> > >>>> } else if (nd_pfn->mode == PFN_MODE_RAM) > >>>> offset = ALIGN(start + reserve, nd_pfn->align) - start; > >>>> else > >>>> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >>>> return -ENXIO; > >>>> } > >>>> > >>>> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; > >>>> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; > >>> > >>> Similar comment, if the page size is variable then the superblock > >>> needs to explicitly account for it. > >>> > >> > >> PAGE_SIZE is not really variable. What we can run into is the issue you > >> mentioned above. The size of struct page can change which means the > >> reserved space for keeping vmemmap in device may not be sufficient for > >> certain kernel builds. > >> > >> I was planning to add another patch that fails namespace init if we > >> don't have enough space to keep the struct page. > >> > >> Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? > > > > So that the kernel has a chance to identify cases where the superblock > > it is handling was created on a system with different PAGE_SIZE > > assumptions. > > > > The reason to do that is we don't have enough space to keep struct page > backing the total number of pfns? If so, what i suggested above should > handle that. > > or are you finding any other reason why we should fail a namespace init > with a different PAGE_SIZE value? I want the kernel to be able to start understand cross-architecture and cross-configuration geometries. Which to me means incrementing the info-block version and recording PAGE_SIZE and sizeof(struct page) in the info-block directly. > My another patch handle the details w.r.t devdax alignment for which > devdax got created with PAGE_SIZE 4K but we are now trying to load that > in a kernel with PAGE_SIZE 64k. Sure, but what about the reverse? These info-block format assumptions are as fundamental as the byte-order of the info-block, it needs to be cross-arch compatible and the x86 assumptions need to be fully lifted.
Dan Williams <dan.j.williams@intel.com> writes: > On Mon, May 13, 2019 at 9:46 PM Aneesh Kumar K.V > <aneesh.kumar@linux.ibm.com> wrote: >> >> On 5/14/19 9:42 AM, Dan Williams wrote: >> > On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V >> > <aneesh.kumar@linux.ibm.com> wrote: >> >> >> >> On 5/14/19 9:28 AM, Dan Williams wrote: >> >>> On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V >> >>> <aneesh.kumar@linux.ibm.com> wrote: >> >>>> >> >>>> The nfpn related change is needed to fix the kernel message >> >>>> >> >>>> "number of pfns truncated from 2617344 to 163584" >> >>>> >> >>>> The change makes sure the nfpns stored in the superblock is right value. >> >>>> >> >>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >> >>>> --- >> >>>> drivers/nvdimm/pfn_devs.c | 6 +++--- >> >>>> drivers/nvdimm/region_devs.c | 8 ++++---- >> >>>> 2 files changed, 7 insertions(+), 7 deletions(-) >> >>>> >> >>>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c >> >>>> index 347cab166376..6751ff0296ef 100644 >> >>>> --- a/drivers/nvdimm/pfn_devs.c >> >>>> +++ b/drivers/nvdimm/pfn_devs.c >> >>>> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) >> >>>> * when populating the vmemmap. This *should* be equal to >> >>>> * PMD_SIZE for most architectures. >> >>>> */ >> >>>> - offset = ALIGN(start + reserve + 64 * npfns, >> >>>> - max(nd_pfn->align, PMD_SIZE)) - start; >> >>>> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, >> >>>> + max(nd_pfn->align, PMD_SIZE)) - start; >> >>> >> >>> No, I think we need to record the page-size into the superblock format >> >>> otherwise this breaks in debug builds where the struct-page size is >> >>> extended. >> >>> >> >>>> } else if (nd_pfn->mode == PFN_MODE_RAM) >> >>>> offset = ALIGN(start + reserve, nd_pfn->align) - start; >> >>>> else >> >>>> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) >> >>>> return -ENXIO; >> >>>> } >> >>>> >> >>>> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; >> >>>> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; >> >>> >> >>> Similar comment, if the page size is variable then the superblock >> >>> needs to explicitly account for it. >> >>> >> >> >> >> PAGE_SIZE is not really variable. What we can run into is the issue you >> >> mentioned above. The size of struct page can change which means the >> >> reserved space for keeping vmemmap in device may not be sufficient for >> >> certain kernel builds. >> >> >> >> I was planning to add another patch that fails namespace init if we >> >> don't have enough space to keep the struct page. >> >> >> >> Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? >> > >> > So that the kernel has a chance to identify cases where the superblock >> > it is handling was created on a system with different PAGE_SIZE >> > assumptions. >> > >> >> The reason to do that is we don't have enough space to keep struct page >> backing the total number of pfns? If so, what i suggested above should >> handle that. >> >> or are you finding any other reason why we should fail a namespace init >> with a different PAGE_SIZE value? > > I want the kernel to be able to start understand cross-architecture > and cross-configuration geometries. Which to me means incrementing the > info-block version and recording PAGE_SIZE and sizeof(struct page) in > the info-block directly. > >> My another patch handle the details w.r.t devdax alignment for which >> devdax got created with PAGE_SIZE 4K but we are now trying to load that >> in a kernel with PAGE_SIZE 64k. > > Sure, but what about the reverse? These info-block format assumptions > are as fundamental as the byte-order of the info-block, it needs to be > cross-arch compatible and the x86 assumptions need to be fully lifted. Something like the below (Not tested). I am not sure what we will init the page_size for minor version < 3. This will mark the namespace disabled if the PAGE_SIZE and sizeof(struct page) doesn't match with the values used during namespace create. diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h index dde9853453d3..d6e0933d0dd4 100644 --- a/drivers/nvdimm/pfn.h +++ b/drivers/nvdimm/pfn.h @@ -36,6 +36,9 @@ struct nd_pfn_sb { __le32 end_trunc; /* minor-version-2 record the base alignment of the mapping */ __le32 align; + /* minor-version-3 record the page size and struct page size */ + __le32 page_size; + __le32 page_struct_size; u8 padding[4000]; __le64 checksum; }; diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 6f9f78858018..bbc1d792d7f3 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -477,6 +477,15 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) if (__le16_to_cpu(pfn_sb->version_minor) < 2) pfn_sb->align = 0; + if (__le16_to_cpu(pfn_sb->version_minor) < 3) { + /* + * For a large part we use PAGE_SIZE. But we + * do have some accounting code using SIZE_4K. + */ + pfn_sb->page_size = cpu_to_le32(PAGE_SIZE); + pfn_sb->page_struct_size = cpu_to_le32(64); + } + switch (le32_to_cpu(pfn_sb->mode)) { case PFN_MODE_RAM: case PFN_MODE_PMEM: @@ -504,6 +513,12 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) return -EOPNOTSUPP; } + if (le32_to_cpu(pfn_sb->page_size) != PAGE_SIZE) + return -EOPNOTSUPP; + + if (le32_to_cpu(pfn_sb->page_struct_size) != sizeof(struct page)) + return -EOPNOTSUPP; + if (!nd_pfn->uuid) { /* * When probing a namepace via nd_pfn_probe() the uuid @@ -798,7 +813,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) memcpy(pfn_sb->uuid, nd_pfn->uuid, 16); memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16); pfn_sb->version_major = cpu_to_le16(1); - pfn_sb->version_minor = cpu_to_le16(2); + pfn_sb->version_minor = cpu_to_le16(3); pfn_sb->start_pad = cpu_to_le32(start_pad); pfn_sb->end_trunc = cpu_to_le32(end_trunc); pfn_sb->align = cpu_to_le32(nd_pfn->align);
> -----Original Message----- > From: Linux-nvdimm <linux-nvdimm-bounces@lists.01.org> On Behalf Of > Aneesh Kumar K.V > Sent: Tuesday, May 21, 2019 4:51 AM > Subject: Re: [PATCH] mm/nvdimm: Use correct #defines instead of > opencoding > ... > @@ -36,6 +36,9 @@ struct nd_pfn_sb { > __le32 end_trunc; > /* minor-version-2 record the base alignment of the mapping */ > __le32 align; > + /* minor-version-3 record the page size and struct page size > */ > + __le32 page_size; > + __le32 page_struct_size; > u8 padding[4000]; > __le64 checksum; > }; You might need to reduce the padding size to offset the extra added fields.
On Tue, May 21, 2019 at 2:51 AM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote: > > Dan Williams <dan.j.williams@intel.com> writes: > > > On Mon, May 13, 2019 at 9:46 PM Aneesh Kumar K.V > > <aneesh.kumar@linux.ibm.com> wrote: > >> > >> On 5/14/19 9:42 AM, Dan Williams wrote: > >> > On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V > >> > <aneesh.kumar@linux.ibm.com> wrote: > >> >> > >> >> On 5/14/19 9:28 AM, Dan Williams wrote: > >> >>> On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V > >> >>> <aneesh.kumar@linux.ibm.com> wrote: > >> >>>> > >> >>>> The nfpn related change is needed to fix the kernel message > >> >>>> > >> >>>> "number of pfns truncated from 2617344 to 163584" > >> >>>> > >> >>>> The change makes sure the nfpns stored in the superblock is right value. > >> >>>> > >> >>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > >> >>>> --- > >> >>>> drivers/nvdimm/pfn_devs.c | 6 +++--- > >> >>>> drivers/nvdimm/region_devs.c | 8 ++++---- > >> >>>> 2 files changed, 7 insertions(+), 7 deletions(-) > >> >>>> > >> >>>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > >> >>>> index 347cab166376..6751ff0296ef 100644 > >> >>>> --- a/drivers/nvdimm/pfn_devs.c > >> >>>> +++ b/drivers/nvdimm/pfn_devs.c > >> >>>> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >> >>>> * when populating the vmemmap. This *should* be equal to > >> >>>> * PMD_SIZE for most architectures. > >> >>>> */ > >> >>>> - offset = ALIGN(start + reserve + 64 * npfns, > >> >>>> - max(nd_pfn->align, PMD_SIZE)) - start; > >> >>>> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, > >> >>>> + max(nd_pfn->align, PMD_SIZE)) - start; > >> >>> > >> >>> No, I think we need to record the page-size into the superblock format > >> >>> otherwise this breaks in debug builds where the struct-page size is > >> >>> extended. > >> >>> > >> >>>> } else if (nd_pfn->mode == PFN_MODE_RAM) > >> >>>> offset = ALIGN(start + reserve, nd_pfn->align) - start; > >> >>>> else > >> >>>> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >> >>>> return -ENXIO; > >> >>>> } > >> >>>> > >> >>>> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; > >> >>>> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; > >> >>> > >> >>> Similar comment, if the page size is variable then the superblock > >> >>> needs to explicitly account for it. > >> >>> > >> >> > >> >> PAGE_SIZE is not really variable. What we can run into is the issue you > >> >> mentioned above. The size of struct page can change which means the > >> >> reserved space for keeping vmemmap in device may not be sufficient for > >> >> certain kernel builds. > >> >> > >> >> I was planning to add another patch that fails namespace init if we > >> >> don't have enough space to keep the struct page. > >> >> > >> >> Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? > >> > > >> > So that the kernel has a chance to identify cases where the superblock > >> > it is handling was created on a system with different PAGE_SIZE > >> > assumptions. > >> > > >> > >> The reason to do that is we don't have enough space to keep struct page > >> backing the total number of pfns? If so, what i suggested above should > >> handle that. > >> > >> or are you finding any other reason why we should fail a namespace init > >> with a different PAGE_SIZE value? > > > > I want the kernel to be able to start understand cross-architecture > > and cross-configuration geometries. Which to me means incrementing the > > info-block version and recording PAGE_SIZE and sizeof(struct page) in > > the info-block directly. > > > >> My another patch handle the details w.r.t devdax alignment for which > >> devdax got created with PAGE_SIZE 4K but we are now trying to load that > >> in a kernel with PAGE_SIZE 64k. > > > > Sure, but what about the reverse? These info-block format assumptions > > are as fundamental as the byte-order of the info-block, it needs to be > > cross-arch compatible and the x86 assumptions need to be fully lifted. > > Something like the below (Not tested). I am not sure what we will init the page_size > for minor version < 3. This will mark the namespace disabled if the > PAGE_SIZE and sizeof(struct page) doesn't match with the values used > during namespace create. Yes, this is on the right track. I would special-case page_size == 0 as 4096 and page_struct_size == 0 as 64. If either of those is non-zero then the info-block version needs to be revved and it needs to be crafted to make older kernels fail to parse it. There was an earlier attempt to implement minimum info-block versions here: https://lore.kernel.org/lkml/155000670159.348031.17631616775326330606.stgit@dwillia2-desk3.amr.corp.intel.com/ ...but that was dropped in favor of the the "sub-section" patches.
On 5/21/19 9:37 PM, Dan Williams wrote: > On Tue, May 21, 2019 at 2:51 AM Aneesh Kumar K.V > <aneesh.kumar@linux.ibm.com> wrote: .... >> >> Something like the below (Not tested). I am not sure what we will init the page_size >> for minor version < 3. This will mark the namespace disabled if the >> PAGE_SIZE and sizeof(struct page) doesn't match with the values used >> during namespace create. > > Yes, this is on the right track. > > I would special-case page_size == 0 as 4096 and page_struct_size == 0 > as 64. If either of those is non-zero then the info-block version > needs to be revved and it needs to be crafted to make older kernels > fail to parse it. > page_size = SZ_4K implies we fail to enable namesepaces created on ppc64 till now. We do work fine with page_size = PAGE_SIZE. It is a few error check and pfn_sb->npfns that got wrong values. We do reserve the correct space for the required pfns even when we recorded wrong pfn_sb->npfs. > There was an earlier attempt to implement minimum info-block versions here: > > https://lore.kernel.org/lkml/155000670159.348031.17631616775326330606.stgit@dwillia2-desk3.amr.corp.intel.com/ > > ...but that was dropped in favor of the the "sub-section" patches. > Ok i will pick that too. -aneesh
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 347cab166376..6751ff0296ef 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) * when populating the vmemmap. This *should* be equal to * PMD_SIZE for most architectures. */ - offset = ALIGN(start + reserve + 64 * npfns, - max(nd_pfn->align, PMD_SIZE)) - start; + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, + max(nd_pfn->align, PMD_SIZE)) - start; } else if (nd_pfn->mode == PFN_MODE_RAM) offset = ALIGN(start + reserve, nd_pfn->align) - start; else @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) return -ENXIO; } - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; pfn_sb->mode = cpu_to_le32(nd_pfn->mode); pfn_sb->dataoff = cpu_to_le64(offset); pfn_sb->npfns = cpu_to_le64(npfns); diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b4ef7d9ff22e..2d8facea5a03 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -994,10 +994,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, struct nd_mapping_desc *mapping = &ndr_desc->mapping[i]; struct nvdimm *nvdimm = mapping->nvdimm; - if ((mapping->start | mapping->size) % SZ_4K) { - dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n", - caller, dev_name(&nvdimm->dev), i); - + if ((mapping->start | mapping->size) % PAGE_SIZE) { + dev_err(&nvdimm_bus->dev, + "%s: %s mapping%d is not 4K aligned\n", + caller, dev_name(&nvdimm->dev), i); return NULL; }
The nfpn related change is needed to fix the kernel message "number of pfns truncated from 2617344 to 163584" The change makes sure the nfpns stored in the superblock is right value. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> --- drivers/nvdimm/pfn_devs.c | 6 +++--- drivers/nvdimm/region_devs.c | 8 ++++---- 2 files changed, 7 insertions(+), 7 deletions(-)