diff mbox

[v2,06/18] pc: implement NVDIMM device abstract

Message ID 1439563931-12352-7-git-send-email-guangrong.xiao@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Xiao Guangrong Aug. 14, 2015, 2:51 p.m. UTC
Introduce "pc-nvdimm" device and it has two parameters:
- @file, which is the backed memory file for NVDIMM device

- @configdata, specify if we need to reserve 128k at the end of
  @file for nvdimm device's config data. Default is false

If @configdata is false, Qemu will build a static and readonly
namespace in memory and use it serveing for
DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests.
This is good for the user who want to pass whole nvdimm device
and make its data is complete visible to guest

We can use "-device pc-nvdimm,file=/dev/pmem,configdata" in the
Qemu command to create NVDIMM device for the guest

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 default-configs/i386-softmmu.mak   |  1 +
 default-configs/x86_64-softmmu.mak |  1 +
 hw/Makefile.objs                   |  2 +-
 hw/mem/Makefile.objs               |  1 +
 hw/mem/nvdimm/pc-nvdimm.c          | 99 ++++++++++++++++++++++++++++++++++++++
 include/hw/mem/pc-nvdimm.h         | 31 ++++++++++++
 6 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 hw/mem/nvdimm/pc-nvdimm.c
 create mode 100644 include/hw/mem/pc-nvdimm.h

Comments

Stefan Hajnoczi Aug. 25, 2015, 2:57 p.m. UTC | #1
On Fri, Aug 14, 2015 at 10:51:59PM +0800, Xiao Guangrong wrote:
> +static void set_file(Object *obj, const char *str, Error **errp)
> +{
> +    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> +    if (nvdimm->file) {
> +        g_free(nvdimm->file);
> +    }

g_free(NULL) is a nop so it's safe to replace the if with just
g_free(nvdimm->file).
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong Aug. 26, 2015, 9:37 a.m. UTC | #2
On 08/25/2015 10:57 PM, Stefan Hajnoczi wrote:
> On Fri, Aug 14, 2015 at 10:51:59PM +0800, Xiao Guangrong wrote:
>> +static void set_file(Object *obj, const char *str, Error **errp)
>> +{
>> +    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>> +
>> +    if (nvdimm->file) {
>> +        g_free(nvdimm->file);
>> +    }
>
> g_free(NULL) is a nop so it's safe to replace the if with just
> g_free(nvdimm->file).
>

Yeah, the man page says you're right. Will clean it up.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Igor Mammedov Sept. 2, 2015, 9:58 a.m. UTC | #3
On Fri, 14 Aug 2015 22:51:59 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> Introduce "pc-nvdimm" device and it has two parameters:
Why do you use prefix "pc-", I suppose we potentially
could use this device not only with x86 targets but with
other targets as well.
I'd just drop 'pc' prefix through out patchset.

> - @file, which is the backed memory file for NVDIMM device
Could you try to split device into backend/frontend parts,
like it's done with pc-dimm. As I understand it's preferred
way to implement this kind of devices.
Then you could reuse memory backends that we already have
including file backend.

So CLI could look like:
-object memory-backend-file,id=mem0,file=/storage/foo
-device nvdimm,memdev=mem0,configdata=on

> 
> - @configdata, specify if we need to reserve 128k at the end of
>   @file for nvdimm device's config data. Default is false
> 
> If @configdata is false, Qemu will build a static and readonly
> namespace in memory and use it serveing for
> DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests.
> This is good for the user who want to pass whole nvdimm device
> and make its data is complete visible to guest
> 
> We can use "-device pc-nvdimm,file=/dev/pmem,configdata" in the
> Qemu command to create NVDIMM device for the guest
PS:
please try to fix commit message spelling/grammar wise.

[...]
> +++ b/hw/mem/nvdimm/pc-nvdimm.c
> @@ -0,0 +1,99 @@
> +/*
> + * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
s/Implement/Implementation/ in all new files
an maybe s/NVDIMM (A // as it's reduntant

[...]
> +static bool has_configdata(Object *obj, Error **errp)
> +{
> +    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> +    return nvdimm->configdata;
> +}
> +
> +static void set_configdata(Object *obj, bool value, Error **errp)
> +{
> +    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
> +
> +    nvdimm->configdata = value;
> +}
usually for property setters/getters we use form:
 "device_prefix"_[g|s]et_foo
so
 nvdim_get_configdata ...

[...]

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong Sept. 2, 2015, 10:36 a.m. UTC | #4
On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> On Fri, 14 Aug 2015 22:51:59 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> Introduce "pc-nvdimm" device and it has two parameters:
> Why do you use prefix "pc-", I suppose we potentially
> could use this device not only with x86 targets but with
> other targets as well.
> I'd just drop 'pc' prefix through out patchset.

Yeah, the prefix is stolen from pc-dimm, will drop this
prefix as your suggestion.

>
>> - @file, which is the backed memory file for NVDIMM device
> Could you try to split device into backend/frontend parts,
> like it's done with pc-dimm. As I understand it's preferred
> way to implement this kind of devices.
> Then you could reuse memory backends that we already have
> including file backend.

I considered it too and Stefan, Paolo got the some idea in
V1's review, however:

| However, file-based memory used by NVDIMM is special, it divides the file
| to two parts, one part is used as PMEM and another part is used to store
| NVDIMM's configure data.
|
| Maybe we can introduce "end-reserved" property to reserve specified size
| at the end of the file. Or create a new class type based on
| memory-backend-file (named nvdimm-backend-file) class to hide this magic
| thing?

Your idea?

>
> So CLI could look like:
> -object memory-backend-file,id=mem0,file=/storage/foo
> -device nvdimm,memdev=mem0,configdata=on
>
>>
>> - @configdata, specify if we need to reserve 128k at the end of
>>    @file for nvdimm device's config data. Default is false
>>
>> If @configdata is false, Qemu will build a static and readonly
>> namespace in memory and use it serveing for
>> DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests.
>> This is good for the user who want to pass whole nvdimm device
>> and make its data is complete visible to guest
>>
>> We can use "-device pc-nvdimm,file=/dev/pmem,configdata" in the
>> Qemu command to create NVDIMM device for the guest
> PS:
> please try to fix commit message spelling/grammar wise.

Sorry for my careless, will try it fix them.

>
> [...]
>> +++ b/hw/mem/nvdimm/pc-nvdimm.c
>> @@ -0,0 +1,99 @@
>> +/*
>> + * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
> s/Implement/Implementation/ in all new files
> an maybe s/NVDIMM (A // as it's reduntant

Okay, will drop it.

>
> [...]
>> +static bool has_configdata(Object *obj, Error **errp)
>> +{
>> +    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>> +
>> +    return nvdimm->configdata;
>> +}
>> +
>> +static void set_configdata(Object *obj, bool value, Error **errp)
>> +{
>> +    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
>> +
>> +    nvdimm->configdata = value;
>> +}
> usually for property setters/getters we use form:
>   "device_prefix"_[g|s]et_foo
> so
>   nvdim_get_configdata ...

Good to me.

Thanks for your review, Igor!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Igor Mammedov Sept. 2, 2015, 11:31 a.m. UTC | #5
On Wed, 2 Sep 2015 18:36:43 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> 
> 
> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> > On Fri, 14 Aug 2015 22:51:59 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >> Introduce "pc-nvdimm" device and it has two parameters:
> > Why do you use prefix "pc-", I suppose we potentially
> > could use this device not only with x86 targets but with
> > other targets as well.
> > I'd just drop 'pc' prefix through out patchset.
> 
> Yeah, the prefix is stolen from pc-dimm, will drop this
> prefix as your suggestion.
> 
> >
> >> - @file, which is the backed memory file for NVDIMM device
> > Could you try to split device into backend/frontend parts,
> > like it's done with pc-dimm. As I understand it's preferred
> > way to implement this kind of devices.
> > Then you could reuse memory backends that we already have
> > including file backend.
> 
> I considered it too and Stefan, Paolo got the some idea in
> V1's review, however:
> 
> | However, file-based memory used by NVDIMM is special, it divides the file
> | to two parts, one part is used as PMEM and another part is used to store
> | NVDIMM's configure data.
> |
> | Maybe we can introduce "end-reserved" property to reserve specified size
> | at the end of the file. Or create a new class type based on
> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
> | thing?
I'd go with separate backend/frontend idea.

Question is if this config area is part backend or frontend?
If we pass-through NVDIMM device do we need to set configdata=true
and QEMU would skip building config structures and use structures
that are already present on passed-through device in that place?


> 
> Your idea?
[...]
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong Sept. 6, 2015, 6:07 a.m. UTC | #6
On 09/02/2015 07:31 PM, Igor Mammedov wrote:
> On Wed, 2 Sep 2015 18:36:43 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
>>> On Fri, 14 Aug 2015 22:51:59 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>> Introduce "pc-nvdimm" device and it has two parameters:
>>> Why do you use prefix "pc-", I suppose we potentially
>>> could use this device not only with x86 targets but with
>>> other targets as well.
>>> I'd just drop 'pc' prefix through out patchset.
>>
>> Yeah, the prefix is stolen from pc-dimm, will drop this
>> prefix as your suggestion.
>>
>>>
>>>> - @file, which is the backed memory file for NVDIMM device
>>> Could you try to split device into backend/frontend parts,
>>> like it's done with pc-dimm. As I understand it's preferred
>>> way to implement this kind of devices.
>>> Then you could reuse memory backends that we already have
>>> including file backend.
>>
>> I considered it too and Stefan, Paolo got the some idea in
>> V1's review, however:
>>
>> | However, file-based memory used by NVDIMM is special, it divides the file
>> | to two parts, one part is used as PMEM and another part is used to store
>> | NVDIMM's configure data.
>> |
>> | Maybe we can introduce "end-reserved" property to reserve specified size
>> | at the end of the file. Or create a new class type based on
>> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
>> | thing?
> I'd go with separate backend/frontend idea.
>
> Question is if this config area is part backend or frontend?

Configdata area is used to store nvdimm device's configuration, normally, it's
namespace info.

Currently, we chosen configdata located at the end of nvdimm's backend-memory
as it's easy to configure / use and configdata is naturally non-volatile and it
is like the layout on physical device.

However, using two separated backed-memory is okay, for example:
-object memory-backend-file,id=mem0,file=/storage/foo
-object memory-backend-file,id=mem1,file=/storage/bar
-device nvdimm,memdev=mem0,configdata=mem1
then configdata is written to a single backend.

Which one is better for you? :)

> If we pass-through NVDIMM device do we need to set configdata=true
> and QEMU would skip building config structures and use structures
> that are already present on passed-through device in that place?
>

The file specified by @file is something like a normal disk, like /dev/sda/,
host process can use whole space on it. If we want to directly pass it to guest,
we can specify 'configdata=false'. If we allow guest to 'partition' (create
namespace on) it then we use 'configdata=true' to reserve some space to store
its partition info (namesapce info).

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Igor Mammedov Sept. 7, 2015, 1:40 p.m. UTC | #7
On Sun, 6 Sep 2015 14:07:21 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> 
> 
> On 09/02/2015 07:31 PM, Igor Mammedov wrote:
> > On Wed, 2 Sep 2015 18:36:43 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >>
> >>
> >> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> >>> On Fri, 14 Aug 2015 22:51:59 +0800
> >>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>
> >>>> Introduce "pc-nvdimm" device and it has two parameters:
> >>> Why do you use prefix "pc-", I suppose we potentially
> >>> could use this device not only with x86 targets but with
> >>> other targets as well.
> >>> I'd just drop 'pc' prefix through out patchset.
> >>
> >> Yeah, the prefix is stolen from pc-dimm, will drop this
> >> prefix as your suggestion.
> >>
> >>>
> >>>> - @file, which is the backed memory file for NVDIMM device
> >>> Could you try to split device into backend/frontend parts,
> >>> like it's done with pc-dimm. As I understand it's preferred
> >>> way to implement this kind of devices.
> >>> Then you could reuse memory backends that we already have
> >>> including file backend.
> >>
> >> I considered it too and Stefan, Paolo got the some idea in
> >> V1's review, however:
> >>
> >> | However, file-based memory used by NVDIMM is special, it divides the file
> >> | to two parts, one part is used as PMEM and another part is used to store
> >> | NVDIMM's configure data.
> >> |
> >> | Maybe we can introduce "end-reserved" property to reserve specified size
> >> | at the end of the file. Or create a new class type based on
> >> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
> >> | thing?
> > I'd go with separate backend/frontend idea.
> >
> > Question is if this config area is part backend or frontend?
> 
> Configdata area is used to store nvdimm device's configuration, normally, it's
> namespace info.
> 
> Currently, we chosen configdata located at the end of nvdimm's backend-memory
> as it's easy to configure / use and configdata is naturally non-volatile and it
> is like the layout on physical device.
> 
> However, using two separated backed-memory is okay, for example:
> -object memory-backend-file,id=mem0,file=/storage/foo
> -object memory-backend-file,id=mem1,file=/storage/bar
> -device nvdimm,memdev=mem0,configdata=mem1
> then configdata is written to a single backend.
> 
> Which one is better for you? :)
> 
> > If we pass-through NVDIMM device do we need to set configdata=true
> > and QEMU would skip building config structures and use structures
> > that are already present on passed-through device in that place?
> >
> 
> The file specified by @file is something like a normal disk, like /dev/sda/,
> host process can use whole space on it. If we want to directly pass it to guest,
> we can specify 'configdata=false'. If we allow guest to 'partition' (create
> namespace on) it then we use 'configdata=true' to reserve some space to store
> its partition info (namesapce info).
As far as I understand currently linux provides to userspace only one interface
which is block device i.e. /dev/sdX and on top of it userspace can put
PM/DAX aware filesystem and use files from it. In either cases kernel
just provides access to separate namespaces and not to a whole NVDIMM which
includes 'labels area'. Hence /dev/sdX is not passed-though NVDIMM,
so we could consider it as just a file/storage that could be used by userspace.

Lets assume that NVDIMM should always have 'labels area'.
In that case I'd always reserve space for it and
 * format it (build a new one) if backend doesn't have a
   valid labels area dropping configdata parameter along the way
 * or if backing-file already has valid labels area I'd just use it.

If you need to make labels area readonly you can introduce 'NVDIMM.readonly_labels'
option and just use labels backend's without allowing changes writeback.
IT would be better to make it another series on top of basic NVDIMM implementation
if there is an actual usecase for it.

PS:
Also when you write commit messages, comment and name variables try to use terms from
relevant spec and mention specs where you describe data structures from them.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong Sept. 8, 2015, 2:03 p.m. UTC | #8
On 09/07/2015 09:40 PM, Igor Mammedov wrote:
> On Sun, 6 Sep 2015 14:07:21 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 09/02/2015 07:31 PM, Igor Mammedov wrote:
>>> On Wed, 2 Sep 2015 18:36:43 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>>
>>>>
>>>> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
>>>>> On Fri, 14 Aug 2015 22:51:59 +0800
>>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>>>
>>>>>> Introduce "pc-nvdimm" device and it has two parameters:
>>>>> Why do you use prefix "pc-", I suppose we potentially
>>>>> could use this device not only with x86 targets but with
>>>>> other targets as well.
>>>>> I'd just drop 'pc' prefix through out patchset.
>>>>
>>>> Yeah, the prefix is stolen from pc-dimm, will drop this
>>>> prefix as your suggestion.
>>>>
>>>>>
>>>>>> - @file, which is the backed memory file for NVDIMM device
>>>>> Could you try to split device into backend/frontend parts,
>>>>> like it's done with pc-dimm. As I understand it's preferred
>>>>> way to implement this kind of devices.
>>>>> Then you could reuse memory backends that we already have
>>>>> including file backend.
>>>>
>>>> I considered it too and Stefan, Paolo got the some idea in
>>>> V1's review, however:
>>>>
>>>> | However, file-based memory used by NVDIMM is special, it divides the file
>>>> | to two parts, one part is used as PMEM and another part is used to store
>>>> | NVDIMM's configure data.
>>>> |
>>>> | Maybe we can introduce "end-reserved" property to reserve specified size
>>>> | at the end of the file. Or create a new class type based on
>>>> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
>>>> | thing?
>>> I'd go with separate backend/frontend idea.
>>>
>>> Question is if this config area is part backend or frontend?
>>
>> Configdata area is used to store nvdimm device's configuration, normally, it's
>> namespace info.
>>
>> Currently, we chosen configdata located at the end of nvdimm's backend-memory
>> as it's easy to configure / use and configdata is naturally non-volatile and it
>> is like the layout on physical device.
>>
>> However, using two separated backed-memory is okay, for example:
>> -object memory-backend-file,id=mem0,file=/storage/foo
>> -object memory-backend-file,id=mem1,file=/storage/bar
>> -device nvdimm,memdev=mem0,configdata=mem1
>> then configdata is written to a single backend.
>>
>> Which one is better for you? :)
>>
>>> If we pass-through NVDIMM device do we need to set configdata=true
>>> and QEMU would skip building config structures and use structures
>>> that are already present on passed-through device in that place?
>>>
>>
>> The file specified by @file is something like a normal disk, like /dev/sda/,
>> host process can use whole space on it. If we want to directly pass it to guest,
>> we can specify 'configdata=false'. If we allow guest to 'partition' (create
>> namespace on) it then we use 'configdata=true' to reserve some space to store
>> its partition info (namesapce info).
> As far as I understand currently linux provides to userspace only one interface
> which is block device i.e. /dev/sdX and on top of it userspace can put
> PM/DAX aware filesystem and use files from it. In either cases kernel
> just provides access to separate namespaces and not to a whole NVDIMM which
> includes 'labels area'. Hence /dev/sdX is not passed-though NVDIMM,
> so we could consider it as just a file/storage that could be used by userspace.
>

Yes, it is.

> Lets assume that NVDIMM should always have 'labels area'.
> In that case I'd always reserve space for it and
>   * format it (build a new one) if backend doesn't have a
>     valid labels area dropping configdata parameter along the way
>   * or if backing-file already has valid labels area I'd just use it.

Yes.

>
> If you need to make labels area readonly you can introduce 'NVDIMM.readonly_labels'
> option and just use labels backend's without allowing changes writeback.
> IT would be better to make it another series on top of basic NVDIMM implementation
> if there is an actual usecase for it.

I'd prefer the way that discards not only its label data but also the whole nvdimm device,
that is, open(, RDONLY) + mmap(, MAP_PRIVATE), the idea was raised by Stefan.

The 'configdata = false' in this patchset does not aim at making label readonly, it provides
a way to make the file in a single partition. For example, you create a image in /dev/pmem0,
pass it to guest, then the whole file will appear at /dev/pmem0 in guest, and guest can directly
use the image in that device. Under this case, no file region is reserved and the label data is
build in memory which can not be updated by guest.

>
> PS:
> Also when you write commit messages, comment and name variables try to use terms from
> relevant spec and mention specs where you describe data structures from them.

Parts of the names/definitions were stolen from Kernel NVDIMM driver, i will update it to
let them reflect the specs. Thanks, Igor.






--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Igor Mammedov Sept. 10, 2015, 9:47 a.m. UTC | #9
On Tue, 8 Sep 2015 22:03:01 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> 
> 
> On 09/07/2015 09:40 PM, Igor Mammedov wrote:
> > On Sun, 6 Sep 2015 14:07:21 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >>
> >>
> >> On 09/02/2015 07:31 PM, Igor Mammedov wrote:
> >>> On Wed, 2 Sep 2015 18:36:43 +0800
> >>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 09/02/2015 05:58 PM, Igor Mammedov wrote:
> >>>>> On Fri, 14 Aug 2015 22:51:59 +0800
> >>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>>>
> >>>>>> Introduce "pc-nvdimm" device and it has two parameters:
> >>>>> Why do you use prefix "pc-", I suppose we potentially
> >>>>> could use this device not only with x86 targets but with
> >>>>> other targets as well.
> >>>>> I'd just drop 'pc' prefix through out patchset.
> >>>>
> >>>> Yeah, the prefix is stolen from pc-dimm, will drop this
> >>>> prefix as your suggestion.
> >>>>
> >>>>>
> >>>>>> - @file, which is the backed memory file for NVDIMM device
> >>>>> Could you try to split device into backend/frontend parts,
> >>>>> like it's done with pc-dimm. As I understand it's preferred
> >>>>> way to implement this kind of devices.
> >>>>> Then you could reuse memory backends that we already have
> >>>>> including file backend.
> >>>>
> >>>> I considered it too and Stefan, Paolo got the some idea in
> >>>> V1's review, however:
> >>>>
> >>>> | However, file-based memory used by NVDIMM is special, it divides the file
> >>>> | to two parts, one part is used as PMEM and another part is used to store
> >>>> | NVDIMM's configure data.
> >>>> |
> >>>> | Maybe we can introduce "end-reserved" property to reserve specified size
> >>>> | at the end of the file. Or create a new class type based on
> >>>> | memory-backend-file (named nvdimm-backend-file) class to hide this magic
> >>>> | thing?
> >>> I'd go with separate backend/frontend idea.
> >>>
> >>> Question is if this config area is part backend or frontend?
> >>
> >> Configdata area is used to store nvdimm device's configuration, normally, it's
> >> namespace info.
> >>
> >> Currently, we chosen configdata located at the end of nvdimm's backend-memory
> >> as it's easy to configure / use and configdata is naturally non-volatile and it
> >> is like the layout on physical device.
> >>
> >> However, using two separated backed-memory is okay, for example:
> >> -object memory-backend-file,id=mem0,file=/storage/foo
> >> -object memory-backend-file,id=mem1,file=/storage/bar
> >> -device nvdimm,memdev=mem0,configdata=mem1
> >> then configdata is written to a single backend.
> >>
> >> Which one is better for you? :)
> >>
> >>> If we pass-through NVDIMM device do we need to set configdata=true
> >>> and QEMU would skip building config structures and use structures
> >>> that are already present on passed-through device in that place?
> >>>
> >>
> >> The file specified by @file is something like a normal disk, like /dev/sda/,
> >> host process can use whole space on it. If we want to directly pass it to guest,
> >> we can specify 'configdata=false'. If we allow guest to 'partition' (create
> >> namespace on) it then we use 'configdata=true' to reserve some space to store
> >> its partition info (namesapce info).
> > As far as I understand currently linux provides to userspace only one interface
> > which is block device i.e. /dev/sdX and on top of it userspace can put
> > PM/DAX aware filesystem and use files from it. In either cases kernel
> > just provides access to separate namespaces and not to a whole NVDIMM which
> > includes 'labels area'. Hence /dev/sdX is not passed-though NVDIMM,
> > so we could consider it as just a file/storage that could be used by userspace.
> >
> 
> Yes, it is.
> 
> > Lets assume that NVDIMM should always have 'labels area'.
> > In that case I'd always reserve space for it and
> >   * format it (build a new one) if backend doesn't have a
> >     valid labels area dropping configdata parameter along the way
On a second glance, qemu probably shouldn't 'format/build' labels
in this case and only supply non configured NVDIMM to guest.
It probably should be guest responsibility to configure namespaces
on bare NVDIMM. Also it would allow to seamlessly introduce separate labels
backend as described below to cover 'configdata = false' case.

> >   * or if backing-file already has valid labels area I'd just use it.
> 
> Yes.
> 
> >
> > If you need to make labels area readonly you can introduce 'NVDIMM.readonly_labels'
> > option and just use labels backend's without allowing changes writeback.
> > IT would be better to make it another series on top of basic NVDIMM implementation
> > if there is an actual usecase for it.
> 
> I'd prefer the way that discards not only its label data but also the whole nvdimm device,
> that is, open(, RDONLY) + mmap(, MAP_PRIVATE), the idea was raised by Stefan.
Yes, we can do this but it's not related to NVDIMM at all, it's backend's feature
that also could be (re)used by pc-dimm with file backend.
I think it could be done on top.

> 
> The 'configdata = false' in this patchset does not aim at making label readonly, it provides
> a way to make the file in a single partition. For example, you create a image in /dev/pmem0,
> pass it to guest, then the whole file will appear at /dev/pmem0 in guest, and guest can directly
> use the image in that device. Under this case, no file region is reserved and the label data is
> build in memory which can not be updated by guest.
If we will have 'configdata = false' as in this patchset, one will have
to take special care to keep it bug compatible with previous versions
as implementation changes/fixes might affect compatibility.
I think it is simpler and more robust to always have metadata in backend.
That's also would be migration friendly since labels area will stay
consistent with data it describes, regardless of QEMU version when we
do cross version migration.

Later we could introduce a separate option for a separate labels area
to cover above described usecase.

> 
> >
> > PS:
> > Also when you write commit messages, comment and name variables try to use terms from
> > relevant spec and mention specs where you describe data structures from them.
> 
> Parts of the names/definitions were stolen from Kernel NVDIMM driver, i will update it to
> let them reflect the specs. Thanks, Igor.
> 
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 48b5762..67fc3a8 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -49,3 +49,4 @@  CONFIG_MEM_HOTPLUG=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
+CONFIG_NVDIMM=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 4962ed7..dfcde36 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,3 +50,4 @@  CONFIG_MEM_HOTPLUG=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
+CONFIG_NVDIMM=y
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 73afa41..1e25d3f 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -30,7 +30,7 @@  devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
-devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem/
+devices-dirs-y += mem/
 devices-dirs-y += core/
 common-obj-y += $(devices-dirs-y)
 obj-y += $(devices-dirs-y)
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index b000fb4..4df7482 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1,2 @@ 
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm/pc-nvdimm.o
diff --git a/hw/mem/nvdimm/pc-nvdimm.c b/hw/mem/nvdimm/pc-nvdimm.c
new file mode 100644
index 0000000..a53d235
--- /dev/null
+++ b/hw/mem/nvdimm/pc-nvdimm.c
@@ -0,0 +1,99 @@ 
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "hw/mem/pc-nvdimm.h"
+
+static char *get_file(Object *obj, Error **errp)
+{
+    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+    return g_strdup(nvdimm->file);
+}
+
+static void set_file(Object *obj, const char *str, Error **errp)
+{
+    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+    if (nvdimm->file) {
+        g_free(nvdimm->file);
+    }
+
+    nvdimm->file = g_strdup(str);
+}
+
+static bool has_configdata(Object *obj, Error **errp)
+{
+    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+    return nvdimm->configdata;
+}
+
+static void set_configdata(Object *obj, bool value, Error **errp)
+{
+    PCNVDIMMDevice *nvdimm = PC_NVDIMM(obj);
+
+    nvdimm->configdata = value;
+}
+
+static void pc_nvdimm_init(Object *obj)
+{
+    object_property_add_str(obj, "file", get_file, set_file, NULL);
+    object_property_add_bool(obj, "configdata", has_configdata,
+                             set_configdata, NULL);
+}
+
+static void pc_nvdimm_realize(DeviceState *dev, Error **errp)
+{
+    PCNVDIMMDevice *nvdimm = PC_NVDIMM(dev);
+
+    if (!nvdimm->file) {
+        error_setg(errp, "file property is not set");
+    }
+}
+
+static void pc_nvdimm_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    /* nvdimm hotplug has not been supported yet. */
+    dc->hotpluggable = false;
+
+    dc->realize = pc_nvdimm_realize;
+    dc->desc = "NVDIMM memory module";
+}
+
+static TypeInfo pc_nvdimm_info = {
+    .name          = TYPE_PC_NVDIMM,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(PCNVDIMMDevice),
+    .instance_init = pc_nvdimm_init,
+    .class_init    = pc_nvdimm_class_init,
+};
+
+static void pc_nvdimm_register_types(void)
+{
+    type_register_static(&pc_nvdimm_info);
+}
+
+type_init(pc_nvdimm_register_types)
diff --git a/include/hw/mem/pc-nvdimm.h b/include/hw/mem/pc-nvdimm.h
new file mode 100644
index 0000000..51152b8
--- /dev/null
+++ b/include/hw/mem/pc-nvdimm.h
@@ -0,0 +1,31 @@ 
+/*
+ * NVDIMM (A Non-Volatile Dual In-line Memory Module) Virtualization Implement
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef __PC_NVDIMM_H
+#define __PC_NVDIMM_H
+
+#include "hw/qdev.h"
+
+typedef struct PCNVDIMMDevice {
+    /* private */
+    DeviceState parent_obj;
+
+    char *file;
+    bool configdata;
+} PCNVDIMMDevice;
+
+#define TYPE_PC_NVDIMM "pc-nvdimm"
+
+#define PC_NVDIMM(obj) \
+    OBJECT_CHECK(PCNVDIMMDevice, (obj), TYPE_PC_NVDIMM)
+
+#endif