diff mbox

[RFC,v3,2/5] ndctl: monitor: add ndctl create-monitor command

Message ID 20180209080225.5137-3-qi.fuli@jp.fujitsu.com (mailing list archive)
State New, archived
Headers show

Commit Message

QI Fuli Feb. 9, 2018, 8:02 a.m. UTC
This patch is used to add $ndctl create-monitor command, by which users can
create a new monitor. Users can select the DIMMS to be monitored by using
[--dimm] [--bus] [--region] [--namespace] options. The notifications can
be outputed to a special file or syslog by using [--output] option, the
special file will be placed under /var/log/ndctl. A name is also required for
a monitor,so users can destroy the monitor by the name. When a monitor is
created successfully, a file with same name will be created under
/var/ndctl/monitor.
Example: 
#ndctl create-monitor --monitor m_nmem1 --dimm nmem1 --output m_nmem.log

Signed-off-by: QI Fuli <qi.fuli@jp.fujitsu.com>
---
 builtin.h         |   1 +
 configure.ac      |   3 +
 ndctl/Makefile.am |   3 +-
 ndctl/monitor.c   | 342 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 ndctl/ndctl.c     |   1 +
 5 files changed, 349 insertions(+), 1 deletion(-)
 create mode 100644 ndctl/monitor.c

Comments

Dan Williams Feb. 11, 2018, 8:48 p.m. UTC | #1
On Fri, Feb 9, 2018 at 12:02 AM, QI Fuli <qi.fuli@jp.fujitsu.com> wrote:
> This patch is used to add $ndctl create-monitor command, by which users can
> create a new monitor. Users can select the DIMMS to be monitored by using
> [--dimm] [--bus] [--region] [--namespace] options. The notifications can
> be outputed to a special file or syslog by using [--output] option, the
> special file will be placed under /var/log/ndctl. A name is also required for
> a monitor,so users can destroy the monitor by the name. When a monitor is
> created successfully, a file with same name will be created under
> /var/ndctl/monitor.
> Example:
> #ndctl create-monitor --monitor m_nmem1 --dimm nmem1 --output m_nmem.log

Hi Qi,

This is getting closer to where I want to see this go, but still some
architecture details to incorporate. I mentioned on the cover letter
that systemd can handle starting, stopping, and show the status of the
monitor. The other detail to incorporate is that monitor events can
come DIMMs, but also namespaces, regions, and the bus.

The event list I have collected to date is:

dimm-spares-remaining
dimm-media-temperature
dimm-controller-temperature
dimm-health-state
dimm-unclean-shutdown
dimm-detected
namespace-media-error
namespace-detected
region-media-error
region-detected
bus-media-error
bus-address-range-scrub-complete
bus-detected

...and I think all of those should be separate options, probably
something like the following, but I'd Vishal to comment if this scheme
can be handled with the bash tab-completion implementation:

   ndctl monitor --dimm-events=spares-remaining,media-temperature
--namespace-events=all --regions-events --bus=ACPI.NFIT

...where an empty --<object>-events option is equivalent to
--<object>-events=all. Also, similar to "ndctl list" specifying
specific buses, namespaces, etc causes the monitor to filter the
objects based on those properties.

Since "ndctl list" already has this filtering implemented I'd like to
see it refactored and shared between the 2 implementations rather than
duplicated as is done in this patch. In other words rework cmd_list()
into a generic nvdimm object walking routine with callback functions
to 'list' or 'monitor' a given object that matches the filter.

Let me know if the above makes sense. I'm thinking the 'ndctl list'
refactoring might be something I need to handle.
Verma, Vishal L Feb. 13, 2018, 12:54 a.m. UTC | #2
On Sun, 2018-02-11 at 12:48 -0800, Dan Williams wrote:
> On Fri, Feb 9, 2018 at 12:02 AM, QI Fuli <qi.fuli@jp.fujitsu.com>
> wrote:
> > This patch is used to add $ndctl create-monitor command, by which
> > users can
> > create a new monitor. Users can select the DIMMS to be monitored by
> > using
> > [--dimm] [--bus] [--region] [--namespace] options. The
> > notifications can
> > be outputed to a special file or syslog by using [--output] option,
> > the
> > special file will be placed under /var/log/ndctl. A name is also
> > required for
> > a monitor,so users can destroy the monitor by the name. When a
> > monitor is
> > created successfully, a file with same name will be created under
> > /var/ndctl/monitor.
> > Example:
> > #ndctl create-monitor --monitor m_nmem1 --dimm nmem1 --output
> > m_nmem.log
> 
> Hi Qi,
> 
> This is getting closer to where I want to see this go, but still some
> architecture details to incorporate. I mentioned on the cover letter
> that systemd can handle starting, stopping, and show the status of
> the
> monitor. The other detail to incorporate is that monitor events can
> come DIMMs, but also namespaces, regions, and the bus.
> 
> The event list I have collected to date is:
> 
> dimm-spares-remaining
> dimm-media-temperature
> dimm-controller-temperature
> dimm-health-state
> dimm-unclean-shutdown
> dimm-detected
> namespace-media-error
> namespace-detected
> region-media-error
> region-detected
> bus-media-error
> bus-address-range-scrub-complete
> bus-detected
> 
> ...and I think all of those should be separate options, probably
> something like the following, but I'd Vishal to comment if this
> scheme
> can be handled with the bash tab-completion implementation:
> 
>    ndctl monitor --dimm-events=spares-remaining,media-temperature
> --namespace-events=all --regions-events --bus=ACPI.NFIT

Yes I think we should be able to extend bash completion for a comma
separated list of arguments.

> 
> ...where an empty --<object>-events option is equivalent to
> --<object>-events=all. Also, similar to "ndctl list" specifying
> specific buses, namespaces, etc causes the monitor to filter the
> objects based on those properties.
> 
> Since "ndctl list" already has this filtering implemented I'd like to
> see it refactored and shared between the 2 implementations rather
> than
> duplicated as is done in this patch. In other words rework cmd_list()
> into a generic nvdimm object walking routine with callback functions
> to 'list' or 'monitor' a given object that matches the filter.
> 
> Let me know if the above makes sense. I'm thinking the 'ndctl list'
> refactoring might be something I need to handle.
Gotou, Yasunori/五島 康文 Feb. 13, 2018, 9:58 a.m. UTC | #3
Hi,

> On Fri, Feb 9, 2018 at 12:02 AM, QI Fuli <qi.fuli@jp.fujitsu.com> wrote:
> > This patch is used to add $ndctl create-monitor command, by which users can
> > create a new monitor. Users can select the DIMMS to be monitored by using
> > [--dimm] [--bus] [--region] [--namespace] options. The notifications can
> > be outputed to a special file or syslog by using [--output] option, the
> > special file will be placed under /var/log/ndctl. A name is also required for
> > a monitor,so users can destroy the monitor by the name. When a monitor is
> > created successfully, a file with same name will be created under
> > /var/ndctl/monitor.
> > Example:
> > #ndctl create-monitor --monitor m_nmem1 --dimm nmem1 --output m_nmem.log
> 
> Hi Qi,
> 
> This is getting closer to where I want to see this go, but still some
> architecture details to incorporate. I mentioned on the cover letter
> that systemd can handle starting, stopping, and show the status of the
> monitor. The other detail to incorporate is that monitor events can
> come DIMMs, but also namespaces, regions, and the bus.
> 
> The event list I have collected to date is:
> 
> dimm-spares-remaining
> dimm-media-temperature
> dimm-controller-temperature
> dimm-health-state
> dimm-unclean-shutdown
> dimm-detected
> namespace-media-error
> namespace-detected
> region-media-error
> region-detected
> bus-media-error
> bus-address-range-scrub-complete
> bus-detected
> 
> ...and I think all of those should be separate options, probably
> something like the following, but I'd Vishal to comment if this scheme
> can be handled with the bash tab-completion implementation:
> 
>    ndctl monitor --dimm-events=spares-remaining,media-temperature
> --namespace-events=all --regions-events --bus=ACPI.NFIT
> 
> ...where an empty --<object>-events option is equivalent to
> --<object>-events=all. Also, similar to "ndctl list" specifying
> specific buses, namespaces, etc causes the monitor to filter the
> objects based on those properties.

Hmmmm....

Currently, I'm confusing what features/options are required for nvdimm daemon.
For example, what is use-case of "--bus=ACPI.NFIT"? 

For normal administorator of a server, what he/she's interest is 
"need to replace nvdimm module or not", and "need to backup/restore
on the nvdimm module or not". 

For normal programs, they just use device name or directory/filename of
the filesystem on the nvdimm.

To backup thier data, he/she need to solve relationship between
nvdimm modules and device name (/dev/pmem* or /dev/dax*).

So, IMHO, I suppose "namespace(device name) specifying (or all namespace)"
is enough the following events which requires replace the nvdimm module.

 - spare-remaining
 - helth-state
 - media-error

And I'm not sure what is use-case of specifying region, bus, and dimm
on these events.


In addition, could you tell me what administrator/program can do
on the following events? What nvdimm daemon should do on each event?

  - media-temperature
  - controller-temperature
  - address-range-scrub-complete
  - unclean-shutdown


Thanks,

> 
> Since "ndctl list" already has this filtering implemented I'd like to
> see it refactored and shared between the 2 implementations rather than
> duplicated as is done in this patch. In other words rework cmd_list()
> into a generic nvdimm object walking routine with callback functions
> to 'list' or 'monitor' a given object that matches the filter.
> 
> Let me know if the above makes sense. I'm thinking the 'ndctl list'
> refactoring might be something I need to handle.
> _______________________________________________
> Linux-nvdimm mailing list
> Linux-nvdimm@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
Gotou, Yasunori/五島 康文 Feb. 13, 2018, 10:12 a.m. UTC | #4
> Hi,
> 
> > On Fri, Feb 9, 2018 at 12:02 AM, QI Fuli <qi.fuli@jp.fujitsu.com> wrote:
> > > This patch is used to add $ndctl create-monitor command, by which users can
> > > create a new monitor. Users can select the DIMMS to be monitored by using
> > > [--dimm] [--bus] [--region] [--namespace] options. The notifications can
> > > be outputed to a special file or syslog by using [--output] option, the
> > > special file will be placed under /var/log/ndctl. A name is also required for
> > > a monitor,so users can destroy the monitor by the name. When a monitor is
> > > created successfully, a file with same name will be created under
> > > /var/ndctl/monitor.
> > > Example:
> > > #ndctl create-monitor --monitor m_nmem1 --dimm nmem1 --output m_nmem.log
> > 
> > Hi Qi,
> > 
> > This is getting closer to where I want to see this go, but still some
> > architecture details to incorporate. I mentioned on the cover letter
> > that systemd can handle starting, stopping, and show the status of the
> > monitor. The other detail to incorporate is that monitor events can
> > come DIMMs, but also namespaces, regions, and the bus.
> > 
> > The event list I have collected to date is:
> > 
> > dimm-spares-remaining
> > dimm-media-temperature
> > dimm-controller-temperature
> > dimm-health-state
> > dimm-unclean-shutdown
> > dimm-detected
> > namespace-media-error
> > namespace-detected
> > region-media-error
> > region-detected
> > bus-media-error
> > bus-address-range-scrub-complete
> > bus-detected
> > 
> > ...and I think all of those should be separate options, probably
> > something like the following, but I'd Vishal to comment if this scheme
> > can be handled with the bash tab-completion implementation:
> > 
> >    ndctl monitor --dimm-events=spares-remaining,media-temperature
> > --namespace-events=all --regions-events --bus=ACPI.NFIT
> > 
> > ...where an empty --<object>-events option is equivalent to
> > --<object>-events=all. Also, similar to "ndctl list" specifying
> > specific buses, namespaces, etc causes the monitor to filter the
> > objects based on those properties.
> 
> Hmmmm....
> 
> Currently, I'm confusing what features/options are required for nvdimm daemon.
> For example, what is use-case of "--bus=ACPI.NFIT"? 
> 
> For normal administorator of a server, what he/she's interest is 
> "need to replace nvdimm module or not", and "need to backup/restore
> on the nvdimm module or not". 
> 
> For normal programs, they just use device name or directory/filename of
> the filesystem on the nvdimm.
> 
> To backup thier data, he/she need to solve relationship between
> nvdimm modules and device name (/dev/pmem* or /dev/dax*).
> 
> So, IMHO, I suppose "namespace(device name) specifying (or all namespace)"
> is enough the following events which requires replace the nvdimm module.
> 
>  - spare-remaining
>  - helth-state
>  - media-error
> 
> And I'm not sure what is use-case of specifying region, bus, and dimm
> on these events.
> 
> 
> In addition, could you tell me what administrator/program can do
> on the following events? What nvdimm daemon should do on each event?
> 
>   - media-temperature
>   - controller-temperature
>   - address-range-scrub-complete
>   - unclean-shutdown
> 

I would like to ask one more thing.

What should nvdimm daemon do on detected events of bus/region/namespace/dimm ?
IIRC, udev handles such hotplug events. 
What is relationship/roles between nvdimm daemon and udev?

Bye,

> 
> Thanks,
> 
> > 
> > Since "ndctl list" already has this filtering implemented I'd like to
> > see it refactored and shared between the 2 implementations rather than
> > duplicated as is done in this patch. In other words rework cmd_list()
> > into a generic nvdimm object walking routine with callback functions
> > to 'list' or 'monitor' a given object that matches the filter.
> > 
> > Let me know if the above makes sense. I'm thinking the 'ndctl list'
> > refactoring might be something I need to handle.
> > _______________________________________________
> > Linux-nvdimm mailing list
> > Linux-nvdimm@lists.01.org
> > https://lists.01.org/mailman/listinfo/linux-nvdimm
>
QI Fuli Feb. 15, 2018, 5:51 a.m. UTC | #5
>> Hi Qi,
>>
>> This is getting closer to where I want to see this go, but still some
>> architecture details to incorporate. I mentioned on the cover letter
>> that systemd can handle starting, stopping, and show the status of
>> the
>> monitor. The other detail to incorporate is that monitor events can
>> come DIMMs, but also namespaces, regions, and the bus.
>>
>> The event list I have collected to date is:
>>
>> dimm-spares-remaining
>> dimm-media-temperature
>> dimm-controller-temperature
>> dimm-health-state
>> dimm-unclean-shutdown
>> dimm-detected
>> namespace-media-error
>> namespace-detected
>> region-media-error
>> region-detected
>> bus-media-error
>> bus-address-range-scrub-complete
>> bus-detected
By referring to dimm_listen() in smart-listen.c, I understand how to 
monitor events
come DIMMs. I have checked the ndctl, but I could not find how to 
monitor events
come namespaces, region and bus. Could you please mention more?
>> ...and I think all of those should be separate options, probably
>> something like the following, but I'd Vishal to comment if this
>> scheme
>> can be handled with the bash tab-completion implementation:
>>
>>     ndctl monitor --dimm-events=spares-remaining,media-temperature
>> --namespace-events=all --regions-events --bus=ACPI.NFIT
> Yes I think we should be able to extend bash completion for a comma
> separated list of arguments.
>
>> ...where an empty --<object>-events option is equivalent to
>> --<object>-events=all. Also, similar to "ndctl list" specifying
>> specific buses, namespaces, etc causes the monitor to filter the
>> objects based on those properties.
>>
>> Since "ndctl list" already has this filtering implemented I'd like to
>> see it refactored and shared between the 2 implementations rather
>> than
>> duplicated as is done in this patch. In other words rework cmd_list()
>> into a generic nvdimm object walking routine with callback functions
>> to 'list' or 'monitor' a given object that matches the filter.
>>
>> Let me know if the above makes sense. I'm thinking the 'ndctl list'
>> refactoring might be something I need to handle.
Yes, I agree with refactoring ndctl_list and add a structure shared between
ndctl list and ndctl monitor.
But I prefer to use tokens rather than callback functions.

I also think the option should support multiple parameters likes:
  ndclt monitor --dimm nmem1,nmem2 --region region1,region2

The parameters from the same option can be stored in a list_node.

#define filter_node(field) \
struct filter_##field##_node { \
         const char *name; \
         struct list_node list; \
};

filter_node(bus);
filter_node(dimm);
filter_node(region);
filter_node(namespace);

struct ndctl_filter {
         struct list_head filter_bus;
         struct list_head filter_dimm;
         struct list_head filter_region;
         struct list_head filter_namespace;
} nf;

Make the tokens like follows:

#define list_util_field_filter(field) \
list_for_each(&nf.filter_##field, filter_##field##_node, list) { \
         if (util_##field##_filter(field, filter_##field##_node->name)) { \
                 flag = true; \
                 break; \
         } \
}

#define list_util_bus_filter_by_field(field) \
list_for_each(&nf.filter_##field, filter_##field##_node, list) { \
         if (util_bus_filter_by_##field(bus, 
filter_##field##_node->name)) { \
                 flag = true; \
                 break; \
         } \
}

#define list_util_dimm_filter_by_field(field) \
list_for_each(&nf.filter_##field, filter_##field##_node, list) { \
         if (util_dimm_filter_by_##field(dimm, 
filter_##field##_node->name)) { \
                 flag = true; \
                 break; \
         } \
}

Here is how to use the tokens:

ndctl_bus_foreach(ctx, bus) {
         struct filter_bus_node *filter_bus_node;
         list_util_field_filter(bus);
         if (!flag) {
                 struct filter_region_node *filter_region_node;
                 list_util_bus_filter_by_field(region);
         }
         if (!flag) {
                 struct filter_namespace_node *filter_namespace_node;
                 list_util_bus_filter_by_field(namespace);
         }
         if (!flag)
                 continue;
         ndctl_dimm_foreach(bus, dimm) {
         flag = false;
         struct filter_dimm_node *filter_dimm_node;
                 list_util_field_filter(dimm);
                 if (!flag) {
                         struct filter_region_node *filter_region_node;
                         list_util_dimm_filter_by_field(region);
                 }
                 if (!flag) {
                         struct filter_namespace_node 
*filter_namespace_node;
                         list_util_dimm_filter_by_field(namespace);
                 }
                 if (!flag)
                         continue;
     }
}

To make sure the util_ndctl_filter() goes right, I would like you give 
some comments?
Dan Williams Feb. 17, 2018, 12:54 a.m. UTC | #6
On Tue, Feb 13, 2018 at 1:58 AM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
> Hi,
>
>> On Fri, Feb 9, 2018 at 12:02 AM, QI Fuli <qi.fuli@jp.fujitsu.com> wrote:
>> > This patch is used to add $ndctl create-monitor command, by which users can
>> > create a new monitor. Users can select the DIMMS to be monitored by using
>> > [--dimm] [--bus] [--region] [--namespace] options. The notifications can
>> > be outputed to a special file or syslog by using [--output] option, the
>> > special file will be placed under /var/log/ndctl. A name is also required for
>> > a monitor,so users can destroy the monitor by the name. When a monitor is
>> > created successfully, a file with same name will be created under
>> > /var/ndctl/monitor.
>> > Example:
>> > #ndctl create-monitor --monitor m_nmem1 --dimm nmem1 --output m_nmem.log
>>
>> Hi Qi,
>>
>> This is getting closer to where I want to see this go, but still some
>> architecture details to incorporate. I mentioned on the cover letter
>> that systemd can handle starting, stopping, and show the status of the
>> monitor. The other detail to incorporate is that monitor events can
>> come DIMMs, but also namespaces, regions, and the bus.
>>
>> The event list I have collected to date is:
>>
>> dimm-spares-remaining
>> dimm-media-temperature
>> dimm-controller-temperature
>> dimm-health-state
>> dimm-unclean-shutdown
>> dimm-detected
>> namespace-media-error
>> namespace-detected
>> region-media-error
>> region-detected
>> bus-media-error
>> bus-address-range-scrub-complete
>> bus-detected
>>
>> ...and I think all of those should be separate options, probably
>> something like the following, but I'd Vishal to comment if this scheme
>> can be handled with the bash tab-completion implementation:
>>
>>    ndctl monitor --dimm-events=spares-remaining,media-temperature
>> --namespace-events=all --regions-events --bus=ACPI.NFIT
>>
>> ...where an empty --<object>-events option is equivalent to
>> --<object>-events=all. Also, similar to "ndctl list" specifying
>> specific buses, namespaces, etc causes the monitor to filter the
>> objects based on those properties.
>
> Hmmmm....
>
> Currently, I'm confusing what features/options are required for nvdimm daemon.
> For example, what is use-case of "--bus=ACPI.NFIT"?

Other platforms may support different bus types, there are also
proposals like this one for custom NVDIMM buses [1]. The other use
case is allowing the user to monitor for any media error on the bus,
or the completion of ARS.

> For normal administorator of a server, what he/she's interest is
> "need to replace nvdimm module or not", and "need to backup/restore
> on the nvdimm module or not".

I think that's only part of it. Data center operations want to know
more than just when it is time to replace a module, they want to
collect almost any data that the operating system can provide about
the platform.

> For normal programs, they just use device name or directory/filename of
> the filesystem on the nvdimm.
>
> To backup thier data, he/she need to solve relationship between
> nvdimm modules and device name (/dev/pmem* or /dev/dax*).
>
> So, IMHO, I suppose "namespace(device name) specifying (or all namespace)"
> is enough the following events which requires replace the nvdimm module.
>
>  - spare-remaining
>  - helth-state
>  - media-error
>
> And I'm not sure what is use-case of specifying region, bus, and dimm
> on these events.

The reason for supporting those other event sources in my mind is
having a unified interface for tracking topology, health/status, and
error events.

>
> In addition, could you tell me what administrator/program can do
> on the following events? What nvdimm daemon should do on each event?
>
>   - media-temperature
>   - controller-temperature
>   - address-range-scrub-complete
>   - unclean-shutdown

Media temperature and controller temperature alarms can signal to data
center operations that the server is getting too hot, and might need
remediation, perhaps a specific fan has failed and replacing that fan
becomes a high priority when these alarms start firing.

Address-range-scrub complete might be a signal that the server may get
a boost in throughput since the overhead of the background operation
is now complete. ARS may continue to run long after the server has
booted, so the end of that event may be important to server loading
decisions.

Unclean shutdown notification allows events that occurred at the last
shutdown to be recorded at the next boot. Otherwise an operator would
need to write a separate tool to go retrieve this information. Having
it all in one place reduces the number of tools / data-sources that
operations infrastructure needs to consider.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-January/013926.html
Dan Williams Feb. 17, 2018, 1 a.m. UTC | #7
On Tue, Feb 13, 2018 at 2:12 AM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
>> Hi,
[..]
> I would like to ask one more thing.
>
> What should nvdimm daemon do on detected events of bus/region/namespace/dimm ?
> IIRC, udev handles such hotplug events.
> What is relationship/roles between nvdimm daemon and udev?

UDEV handles the hotplug events, but I imagine device topology changes
are of interest to whatever agent is going to be consuming monitor
events, or monitoring changes that an administrator makes to a
machine. My inspiration for including these events in the monitor
comes from the monitor functionality in mdadm. It has events like:

           DeviceDisappeared
           RebuildStarted
           RebuildNN
           RebuildFinished
           Fail
           FailSpare
           SpareActive
           NewArray
           DegradedArray
           MoveSpare
           SparesMissing

To be clear, I'm not suggesting that all of these events need to be
supported in the first implementation of the monitor, but I want the
architecture and the user interface to be ready to support events from
any device in the topology.
Dan Williams Feb. 17, 2018, 1:23 a.m. UTC | #8
On Wed, Feb 14, 2018 at 9:51 PM, Qi, Fuli <qi.fuli@jp.fujitsu.com> wrote:
>
>>> Hi Qi,
>>>
>>> This is getting closer to where I want to see this go, but still some
>>> architecture details to incorporate. I mentioned on the cover letter
>>> that systemd can handle starting, stopping, and show the status of
>>> the
>>> monitor. The other detail to incorporate is that monitor events can
>>> come DIMMs, but also namespaces, regions, and the bus.
>>>
>>> The event list I have collected to date is:
>>>
>>> dimm-spares-remaining
>>> dimm-media-temperature
>>> dimm-controller-temperature
>>> dimm-health-state
>>> dimm-unclean-shutdown
>>> dimm-detected
>>> namespace-media-error
>>> namespace-detected
>>> region-media-error
>>> region-detected
>>> bus-media-error
>>> bus-address-range-scrub-complete
>>> bus-detected
>
> By referring to dimm_listen() in smart-listen.c, I understand how to monitor
> events
> come DIMMs. I have checked the ndctl, but I could not find how to monitor
> events
> come namespaces, region and bus. Could you please mention more?
>>>
>>> ...and I think all of those should be separate options, probably
>>> something like the following, but I'd Vishal to comment if this
>>> scheme
>>> can be handled with the bash tab-completion implementation:
>>>
>>>     ndctl monitor --dimm-events=spares-remaining,media-temperature
>>> --namespace-events=all --regions-events --bus=ACPI.NFIT
>>
>> Yes I think we should be able to extend bash completion for a comma
>> separated list of arguments.
>>
>>> ...where an empty --<object>-events option is equivalent to
>>> --<object>-events=all. Also, similar to "ndctl list" specifying
>>> specific buses, namespaces, etc causes the monitor to filter the
>>> objects based on those properties.
>>>
>>> Since "ndctl list" already has this filtering implemented I'd like to
>>> see it refactored and shared between the 2 implementations rather
>>> than
>>> duplicated as is done in this patch. In other words rework cmd_list()
>>> into a generic nvdimm object walking routine with callback functions
>>> to 'list' or 'monitor' a given object that matches the filter.
>>>
>>> Let me know if the above makes sense. I'm thinking the 'ndctl list'
>>> refactoring might be something I need to handle.
>
> Yes, I agree with refactoring ndctl_list and add a structure shared between
> ndctl list and ndctl monitor.
> But I prefer to use tokens rather than callback functions.
>
> I also think the option should support multiple parameters likes:
>  ndclt monitor --dimm nmem1,nmem2 --region region1,region2
>
> The parameters from the same option can be stored in a list_node.
>
> #define filter_node(field) \
> struct filter_##field##_node { \
>         const char *name; \
>         struct list_node list; \
> };
>
> filter_node(bus);
> filter_node(dimm);
> filter_node(region);
> filter_node(namespace);
>
> struct ndctl_filter {
>         struct list_head filter_bus;
>         struct list_head filter_dimm;
>         struct list_head filter_region;
>         struct list_head filter_namespace;
> } nf;
>
> Make the tokens like follows:
>
> #define list_util_field_filter(field) \
> list_for_each(&nf.filter_##field, filter_##field##_node, list) { \
>         if (util_##field##_filter(field, filter_##field##_node->name)) { \
>                 flag = true; \
>                 break; \
>         } \
> }
>
> #define list_util_bus_filter_by_field(field) \
> list_for_each(&nf.filter_##field, filter_##field##_node, list) { \
>         if (util_bus_filter_by_##field(bus, filter_##field##_node->name)) {
> \
>                 flag = true; \
>                 break; \
>         } \
> }
>
> #define list_util_dimm_filter_by_field(field) \
> list_for_each(&nf.filter_##field, filter_##field##_node, list) { \
>         if (util_dimm_filter_by_##field(dimm, filter_##field##_node->name))
> { \
>                 flag = true; \
>                 break; \
>         } \
> }
>
> Here is how to use the tokens:
>
> ndctl_bus_foreach(ctx, bus) {
>         struct filter_bus_node *filter_bus_node;
>         list_util_field_filter(bus);
>         if (!flag) {
>                 struct filter_region_node *filter_region_node;
>                 list_util_bus_filter_by_field(region);
>         }
>         if (!flag) {
>                 struct filter_namespace_node *filter_namespace_node;
>                 list_util_bus_filter_by_field(namespace);
>         }
>         if (!flag)
>                 continue;
>         ndctl_dimm_foreach(bus, dimm) {
>         flag = false;
>         struct filter_dimm_node *filter_dimm_node;
>                 list_util_field_filter(dimm);
>                 if (!flag) {
>                         struct filter_region_node *filter_region_node;
>                         list_util_dimm_filter_by_field(region);
>                 }
>                 if (!flag) {
>                         struct filter_namespace_node *filter_namespace_node;
>                         list_util_dimm_filter_by_field(namespace);
>                 }
>                 if (!flag)
>                         continue;
>     }
> }
>
> To make sure the util_ndctl_filter() goes right, I would like you give some
> comments?

I think the multi-layer macros makes the code difficult to read, and
I'm not seeing how it handles the case where we're building up a
combined json record representing the entire bus, i.e. "ndctl list
-BRND". I'd rather re-write "ndctl list" in terms of this filter
structure:

struct ndctl_filter_ops {
        int (*match_bus)(struct ndctl_bus *bus, void **ctx, unsigned
long flags);
        int (*match_dimm)(struct ndctl_bus *dimm, void *ctx, unsigned
long flags);
        int (*match_region)(struct ndctl_bus *region, void **ctx,
unsigned long flags);
        int (*match_namespace)(struct ndctl_bus *namespace, void *ctx,
unsigned long flags);
};

...where the match routine is only populated if that device is
selected and it passes filter specific context via the "ctx" variable.
In the case of "ndctl list" that might be the "jbus" container json
object being returned from ->match_bus() and then passed to
->match_dimm() to hold "jdimm" instances. Note that ->match_dimm() and
->match_namespace() only take context as input because those objects
don't have any children.
Gotou, Yasunori/五島 康文 Feb. 19, 2018, 2:36 a.m. UTC | #9
> On Tue, Feb 13, 2018 at 2:12 AM, Yasunori Goto <y-goto@jp.fujitsu.com> wrote:
> >> Hi,
> [..]
> > I would like to ask one more thing.
> >
> > What should nvdimm daemon do on detected events of bus/region/namespace/dimm ?
> > IIRC, udev handles such hotplug events.
> > What is relationship/roles between nvdimm daemon and udev?
> 
> UDEV handles the hotplug events, but I imagine device topology changes
> are of interest to whatever agent is going to be consuming monitor
> events, or monitoring changes that an administrator makes to a
> machine. My inspiration for including these events in the monitor
> comes from the monitor functionality in mdadm. It has events like:
> 
>            DeviceDisappeared
>            RebuildStarted
>            RebuildNN
>            RebuildFinished
>            Fail
>            FailSpare
>            SpareActive
>            NewArray
>            DegradedArray
>            MoveSpare
>            SparesMissing
> 
> To be clear, I'm not suggesting that all of these events need to be
> supported in the first implementation of the monitor, but I want the
> architecture and the user interface to be ready to support events from
> any device in the topology.
> 

Ok, I got it.
Thank you for your clarification.

Bye,
---
Yasunori Goto
diff mbox

Patch

diff --git a/builtin.h b/builtin.h
index 5e1b7ef..850f6a8 100644
--- a/builtin.h
+++ b/builtin.h
@@ -36,6 +36,7 @@  int cmd_write_labels(int argc, const char **argv, void *ctx);
 int cmd_init_labels(int argc, const char **argv, void *ctx);
 int cmd_check_labels(int argc, const char **argv, void *ctx);
 int cmd_inject_error(int argc, const char **argv, void *ctx);
+int cmd_create_monitor(int argc, const char **argv, void *ctx);
 int cmd_list(int argc, const char **argv, void *ctx);
 #ifdef ENABLE_TEST
 int cmd_test(int argc, const char **argv, void *ctx);
diff --git a/configure.ac b/configure.ac
index 70ba360..e859e04 100644
--- a/configure.ac
+++ b/configure.ac
@@ -160,6 +160,9 @@  AC_CONFIG_FILES([
         Documentation/daxctl/Makefile
 ])
 
+AC_CONFIG_COMMANDS([monitorlogdir], [$MKDIR_P /var/log/ndctl])
+AC_CONFIG_COMMANDS([monitorprocdir], [$MKDIR_P /var/ndctl/monitor])
+
 AC_OUTPUT
 AC_MSG_RESULT([
         $PACKAGE $VERSION
diff --git a/ndctl/Makefile.am b/ndctl/Makefile.am
index 6677607..d9a484d 100644
--- a/ndctl/Makefile.am
+++ b/ndctl/Makefile.am
@@ -13,7 +13,8 @@  ndctl_SOURCES = ndctl.c \
 		test.c \
 		../util/json.c \
 		util/json-smart.c \
-		inject-error.c
+		inject-error.c \
+		monitor.c
 
 if ENABLE_DESTRUCTIVE
 ndctl_SOURCES += ../test/blk_namespaces.c \
diff --git a/ndctl/monitor.c b/ndctl/monitor.c
new file mode 100644
index 0000000..cf1cd6e
--- /dev/null
+++ b/ndctl/monitor.c
@@ -0,0 +1,342 @@ 
+/*
+ * Copyright (c) 2018, FUJITSU LIMITED. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU Lesser General Public License,
+ * version 2.1, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT ANY
+ * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for
+ * more details.
+ */
+#include <stdio.h>
+#include <json-c/json.h>
+#include <signal.h>
+#include <dirent.h>
+#include <util/parse-options.h>
+#include <util/log.h>
+#include <util/filter.h>
+#include <util/json.h>
+#include <ndctl/lib/private.h>
+#include <ndctl/libndctl.h>
+#include <sys/stat.h>
+#define NUM_MAX_DIMM 1024
+#define BUF_SIZE 4096
+
+struct monitor_dimm {
+	struct ndctl_dimm *dimm;
+	const char *devname;
+	int health_eventfd;
+};
+
+static struct parameter {
+	const char *bus;
+	const char *region;
+	const char *dimm;
+	const char *namespace;
+	const char *event;
+	const char *output;
+	const char *monitor;
+	bool all;
+} param;
+
+static const char *proc_path = "/var/ndctl/monitor/";
+
+static char *get_full_path_filename(const char *path, const char *name)
+{
+	char *filename;
+	int len = strlen(path) + strlen (name) +1;
+	filename = (char *) malloc(len);
+	if (!filename)
+		return NULL;
+	filename[0] = '\0';
+	strcpy(filename, path);
+	strcat(filename, name);
+	return filename;
+}
+
+static void log_syslog(struct ndctl_ctx *ctx, int priority, const char *file,
+	int line, const char *fn, const char *format, va_list args)
+{
+	char *buf;
+	buf = (char *)malloc(BUF_SIZE);
+	if (!buf) {
+		syslog(LOG_ERR, "could not get memory for log_syslog\n");
+		exit(EXIT_FAILURE);
+	}
+	vsnprintf(buf, BUF_SIZE, format, args);
+	syslog(priority, "%s", buf);
+	free(buf);
+}
+
+static void log_output(struct ndctl_ctx *ctx, int priority, const char *file,
+	int line, const char *fn, const char *format, va_list args)
+{
+	FILE *f;
+	char *filename, *buf;
+	const char *log_path = "/var/log/ndctl/";
+	filename = get_full_path_filename(log_path, param.output);
+
+	f = fopen(filename, "a+");
+	if (!f) {
+		syslog(LOG_ERR, "open %s failed\n", filename);
+		exit(EXIT_FAILURE);
+	}
+
+	buf = (char *)malloc(BUF_SIZE);
+	if (!buf) {
+		syslog(LOG_ERR, "could not get memory for log_output\n");
+		exit(EXIT_FAILURE);
+	}
+	vsnprintf(buf, BUF_SIZE, format, args);
+	fprintf(f, "%s\n", buf);
+	free(buf);
+	fclose(f);
+}
+
+static int notify_json_msg(struct ndctl_ctx *ctx, struct monitor_dimm *m_dimm)
+{
+	time_t c_time;
+	char date[32];
+	struct json_object *jmsg, *jdatetime, *jpid, *jdimm, *jhealth;
+
+	jmsg = json_object_new_object();
+
+	c_time = time(NULL);
+	strftime(date, sizeof(date), "%Y-%m-%d %H:%M:%S", localtime(&c_time));
+	jdatetime = json_object_new_string(date);
+	json_object_object_add(jmsg, "datetime", jdatetime);
+
+	jpid = json_object_new_int((int)getpid());
+	json_object_object_add(jmsg, "pid", jpid);
+
+	jdimm = util_dimm_to_json(m_dimm->dimm, 0);
+	json_object_object_add(jmsg, "dimm", jdimm);
+
+	jhealth = util_dimm_health_to_json(m_dimm->dimm);
+	if (jhealth)
+		json_object_object_add(jdimm, "health", jhealth);
+
+	notice(ctx, "%s",
+		json_object_to_json_string_ext(jmsg, JSON_C_TO_STRING_PLAIN));
+	return 0;
+}
+
+#define add_param_to_json(field) \
+if (param.field) { \
+	j##field = json_object_new_string(param.field); \
+	json_object_object_add(jmonitors, #field, j##field); \
+}
+
+static int create_monitor_proc(struct ndctl_ctx *ctx)
+{
+	time_t c_time;
+	char date[32];
+	char *filename;
+	struct json_object *jmonitors, *jmonitor, *jpid, *jcreate, *jbus,
+		*jdimm, *jregion, *jnamespace, *joutput, *jevent;
+	jmonitors = json_object_new_object();
+	add_param_to_json(monitor);
+	c_time = time(NULL);
+	strftime(date, sizeof(date), "%Y-%m-%d %H:%M:%S", localtime(&c_time));
+	jcreate = json_object_new_string(date);
+	json_object_object_add(jmonitors, "create", jcreate);
+	jpid = json_object_new_int((int)getpid());
+	json_object_object_add(jmonitors, "pid", jpid);
+	add_param_to_json(dimm);
+	add_param_to_json(bus);
+	add_param_to_json(region);
+	add_param_to_json(namespace);
+	add_param_to_json(event);
+	add_param_to_json(output);
+
+	filename = get_full_path_filename(proc_path, param.monitor);
+	return json_object_to_file(filename, jmonitors);
+}
+
+static int destroy_monitor_proc(void)
+{
+	char *filename;
+	filename = get_full_path_filename(proc_path, param.monitor);
+	return remove(filename);
+}
+
+static int set_monitor_dimm(struct ndctl_ctx *ctx, fd_set *fds, int *maxfd,
+				struct monitor_dimm *m_dimm)
+{
+	struct ndctl_bus *bus;
+	int fd, num_dimm = 0;
+	char buf[BUF_SIZE];
+
+	ndctl_bus_foreach(ctx, bus) {
+		struct ndctl_dimm *dimm;
+		if(!util_bus_filter(bus, param.bus)
+			|| !util_bus_filter_by_dimm(bus, param.dimm)
+			|| !util_bus_filter_by_region(bus, param.region)
+			|| !util_bus_filter_by_namespace(bus, param.namespace))
+			continue;
+		ndctl_dimm_foreach(bus, dimm) {
+			if(!util_dimm_filter(dimm, param.dimm)
+				|| !util_dimm_filter_by_region(dimm,
+						param.region)
+				|| !util_dimm_filter_by_namespace(dimm,
+						param.namespace))
+				continue;
+			if (!ndctl_dimm_is_cmd_supported(dimm,
+					ND_CMD_SMART_THRESHOLD))
+				continue;
+			m_dimm[num_dimm].dimm = dimm;
+			m_dimm[num_dimm].devname = ndctl_dimm_get_devname(dimm);
+			fd = ndctl_dimm_get_health_eventfd(dimm);
+			pread(fd, buf, sizeof(buf), 0);
+			m_dimm[num_dimm].health_eventfd = fd;
+
+			if (fds)
+				FD_SET(fd, fds);
+			if (maxfd) {
+				if (*maxfd < fd)
+					*maxfd = fd;
+			}
+			num_dimm++;
+		}
+	}
+	return num_dimm;
+}
+
+static bool check_monitor_exist(void)
+{
+	FILE *f;
+	char *filename;
+	filename = get_full_path_filename(proc_path, param.monitor);
+	f = fopen(filename, "r");
+	if (!f)
+		return false;
+	fclose(f);
+	return true;
+}
+static int monitor_dimm_event(struct ndctl_ctx *ctx)
+{
+	int rc, maxfd, num_dimm;
+	struct monitor_dimm *m_dimm;
+	char buf[BUF_SIZE];
+	m_dimm = calloc(NUM_MAX_DIMM, sizeof(struct monitor_dimm));
+	if (!m_dimm) {
+		error("monitor_dimm memory space cannot be allocated\n");
+		goto out;
+	}
+
+	fd_set fds;
+	FD_ZERO(&fds);
+
+	num_dimm = set_monitor_dimm(ctx, &fds, &maxfd, m_dimm);
+	if (num_dimm == 0) {
+		error("no monitor dimms can be found\n");
+		goto out;
+	}
+
+
+	if (daemon(0, 0) != 0) {
+		err(ctx, "daemon start failed\n");
+		goto out;
+	}
+
+	printf("ndctl create-monitor %s started\n", param.monitor);
+	if (create_monitor_proc(ctx) != 0) {
+		err(ctx, "daemon start failed\n");
+		goto out;
+	}
+
+	while(1){
+		rc = select(maxfd + 1, NULL, NULL, &fds, NULL);
+		if (rc < 1) {
+			if (rc == 0)
+				err(ctx, "select unexpected timeout\n");
+			else
+				err(ctx, "select %s\n", strerror(errno));
+			goto out_clean;
+		}
+		for (int i = 0; i < num_dimm; i++) {
+			if (!FD_ISSET(m_dimm[i].health_eventfd, &fds)){
+				FD_SET(m_dimm[i].health_eventfd, &fds);
+				continue;
+			}
+			if (notify_json_msg(ctx, &m_dimm[i]) != 0)
+				goto out_clean;
+			pread(m_dimm[i].health_eventfd, buf, sizeof(buf), 0);
+		}
+	}
+	free(m_dimm);
+	return 0;
+out:
+	printf("ndctl create-monitor %s failed\n", param.monitor);
+	return 1;
+
+out_clean:
+	err(ctx, "ndctl monitor %s stoped\n", param.monitor);
+	if (destroy_monitor_proc() != 0)
+		err(ctx, "failed to clean temp file");
+	return 1;
+
+}
+
+int cmd_create_monitor(int argc, const char **argv, void *ctx)
+{
+	const struct option options[] = {
+		OPT_STRING('b', "bus", &param.bus, "bus-id", "filter by bus"),
+		OPT_STRING('r', "region", &param.region, "region-id",
+				"filter by region"),
+		OPT_STRING('d', "dimm", &param.dimm, "dimm-id",
+				"filter by dimm"),
+		OPT_STRING('n', "namespace", &param.namespace,
+				"namespace-id", "filter by namespace id"),
+		OPT_STRING('e', "event", &param.event, "event-id",
+				"filter by event"),
+		OPT_STRING('o', "output", &param.output, "output file name",
+				"monitor output file name"),
+		OPT_STRING('m', "monitor", &param.monitor, "monitor name",
+				"monitor name")
+	};
+	const char * const u[] = {
+		"ndctl create-monitor <monitor> <output> [<options>]",
+		NULL
+	};
+	argc = parse_options(argc, argv, options, u, 0);
+	for (int i = 0; i < argc; i++) {
+		error("unknown parameter \"%s\"\n", argv[i]);
+		goto out;
+	}
+	if (!param.monitor) {
+		error("monitor name --monitor is required\n");
+		goto out;
+	}
+	if (check_monitor_exist()) {
+		error("monitor %s is exist\n", param.monitor);
+		goto out;
+	}
+	if (!param.output) {
+		error("output file name --output is required\n");
+		goto out;
+	}
+
+	ndctl_set_log_priority((struct ndctl_ctx*)ctx, LOG_NOTICE);
+
+	if (strcmp(param.output, "syslog") == 0)
+		ndctl_set_log_fn((struct ndctl_ctx*)ctx, log_syslog);
+	else
+		ndctl_set_log_fn((struct ndctl_ctx*)ctx, log_output);
+
+	if (monitor_dimm_event((struct ndctl_ctx*)ctx) != 0)
+		goto out;
+
+	char *filename;
+	filename = get_full_path_filename(proc_path, param.monitor);
+	if (remove(filename) != 0) {
+		err((struct ndctl_ctx*)ctx, "failed to delete %s\n", filename);
+		goto out;
+	}
+	return 0;
+
+out:
+	return 1;
+}
diff --git a/ndctl/ndctl.c b/ndctl/ndctl.c
index 0f748e1..6c63d79 100644
--- a/ndctl/ndctl.c
+++ b/ndctl/ndctl.c
@@ -84,6 +84,7 @@  static struct cmd_struct commands[] = {
 	{ "init-labels", cmd_init_labels },
 	{ "check-labels", cmd_check_labels },
 	{ "inject-error", cmd_inject_error },
+	{ "create-monitor", cmd_create_monitor },
 	{ "list", cmd_list },
 	{ "help", cmd_help },
 	#ifdef ENABLE_TEST