diff mbox series

[v4,3/7] regulator: IRQ based event/error notification helpers

Message ID 2b87b4637fde2225006cc122bc855efca0dcd7f1.1617692184.git.matti.vaittinen@fi.rohmeurope.com (mailing list archive)
State Superseded
Delegated to: Geert Uytterhoeven
Headers show
Series Extend regulator notification support | expand

Commit Message

Vaittinen, Matti April 6, 2021, 7:13 a.m. UTC
Provide helper function for IC's implementing regulator notifications
when an IRQ fires. The helper also works for IRQs which can not be acked.
Helper can be set to disable the IRQ at handler and then re-enabling it
on delayed work later. The helper also adds regulator_get_error_flags()
errors in cache for the duration of IRQ disabling.

Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
---
Changes since v3:
 - Comment block styling
 - Added prints to point the potential HW failure before BUG()
 - Corrected typo from kerneldoc
 - added missing newlines

 drivers/regulator/Makefile       |   2 +-
 drivers/regulator/core.c         |  24 +-
 drivers/regulator/irq_helpers.c  | 432 +++++++++++++++++++++++++++++++
 include/linux/regulator/driver.h | 135 ++++++++++
 4 files changed, 589 insertions(+), 4 deletions(-)
 create mode 100644 drivers/regulator/irq_helpers.c

Comments

Vaittinen, Matti April 7, 2021, 5:02 a.m. UTC | #1
Morning Andy,

Thanks for the review! By the way, is it me or did your mail-client
spill this out using HTML?

On Wed, 2021-04-07 at 01:44 +0300, Andy Shevchenko wrote:
> On Tuesday, April 6, 2021, Matti Vaittinen <
> matti.vaittinen@fi.rohmeurope.com> wrote:

> > +static void die_loudly(const char *msg)
> > +{
> > +       pr_emerg(msg);
> 
> Oh là là, besides build bot complaints, this has serious security
> implications. Never do like this.
 
I'm not even trying to claim that was correct. And I did send a fixup -
sorry for this. I don't intend to do this again.

Now, when this is said - If you have a minute, please educate me.
Assuming we know all the callers and that all the callers use this as

die_loudly("foobarfoo\n");
- what is the exploit mechanism?

> > +       BUG();
> > +}
> > +


> > +/**
> > + * regulator_irq_helper - register IRQ based regulator event/error
> > notifier
> > + *
> > + * @dev:               device to which lifetime the helper's
> > lifetime is
> > + *                     bound.
> > + * @d:                 IRQ helper descriptor.
> > + * @irq:               IRQ used to inform events/errors to be
> > notified.
> > + * @irq_flags:         Extra IRQ flags to be OR's with the default
> > IRQF_ONESHOT
> > + *                     when requesting the (threaded) irq.
> > + * @common_errs:       Errors which can be flagged by this IRQ for
> > all rdevs.
> > + *                     When IRQ is re-enabled these errors will be
> > cleared
> > + *                     from all associated regulators
> > + * @per_rdev_errs:     Optional error flag array describing errors
> > specific
> > + *                     for only some of the regulators. These
> > errors will be
> > + *                     or'ed with common erros. If this is given
> > the array
> > + *                     should contain rdev_amount flags. Can be
> > set to NULL
> > + *                     if there is no regulator specific error
> > flags for this
> > + *                     IRQ.
> > + * @rdev:              Array of regulators associated with this
> > IRQ.
> > + * @rdev_amount:       Amount of regulators associated wit this
> > IRQ.
> > + */
> > +void *regulator_irq_helper(struct device *dev,
> > +                           const struct regulator_irq_desc *d, int
> > irq,
> > +                           int irq_flags, int common_errs, int
> > *per_rdev_errs,
> > +                           struct regulator_dev **rdev, int
> > rdev_amount)
> > +{
> > +       struct regulator_irq *h;
> > +       int ret;
> > +
> > +       if (!rdev_amount || !d || !d->map_event || !d->name)
> > +               return ERR_PTR(-EINVAL);
> > +
> > +       if (irq <= 0) {
> > +               dev_err(dev, "No IRQ\n");
> > +               return ERR_PTR(-EINVAL);
> 
> Why shadowing error code? Negative IRQ is anything but “no IRQ”.

This was a good point. The irq is passed here as parameter. From this
function's perspective the negative irq is invalid parameter - we don't
know how the caller has obtained it. Print could show the value
contained in irq though.

Now that you pointed this out I am unsure if this check is needed here.
If we check it, then I still think we should report -EINVAL for invalid
parameter. Other option is to just call the request_threaded_irq() -
log the IRQ request failure and return what request_threaded_irq()
returns. Do you think that would make sense?

> > +
> > +/**
> > + * regulator_irq_helper_cancel - drop IRQ based regulator
> > event/error notifier
> > + *
> > + * @handle:            Pointer to handle returned by a successful
> > call to
> > + *                     regulator_irq_helper(). Will be NULLed upon
> > return.
> > + *
> > + * The associated IRQ is released and work is cancelled when the
> > function
> > + * returns.
> > + */
> > +void regulator_irq_helper_cancel(void **handle)
> > +{
> > +       if (handle && *handle) {
> 
> Can handle ever be NULL here ? (Yes, I understand that you export
> this)

To tell the truth - I am not sure. I *guess* that if we allow this to
be NULL, then one *could* implement a driver for IC where IRQs are
optional, in a way that when IRQs are supported the pointer to handle
is valid, when IRQs aren't supported the pointer is NULL. (Why) do you
think we should skip the check?

>  
> > +               struct regulator_irq *h = *handle;
> > +
> > +               free_irq(h->irq, h);
> > +               if (h->desc.irq_off_ms)
> > +                       cancel_delayed_work_sync(&h->isr_work);
> > +
> > +               h = NULL;
> > +       }
> > +}
> > +EXPORT_SYMBOL_GPL(regulator_irq_helper_cancel);
> > +
> > +static void regulator_irq_helper_drop(struct device *dev, void
> > *res)
> > +{
> > +       regulator_irq_helper_cancel(res);
> > +}
> > +
> > +void *devm_regulator_irq_helper(struct device *dev,
> > +                                const struct regulator_irq_desc
> > *d, int irq,
> > +                                int irq_flags, int common_errs,
> > +                                int *per_rdev_errs,
> > +                                struct regulator_dev **rdev, int
> > rdev_amount)
> > +{
> > +       void **ptr;
> > +
> > +       ptr = devres_alloc(regulator_irq_helper_drop, sizeof(*ptr),
> > GFP_KERNEL);
> > +       if (!ptr)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       *ptr = regulator_irq_helper(dev, d, irq, irq_flags,
> > common_errs,
> > +                                   per_rdev_errs, rdev,
> > rdev_amount);
> > +
> > +       if (IS_ERR(*ptr))
> > +               devres_free(ptr);
> > +       else
> > +               devres_add(dev, ptr);
> > +
> > +       return *ptr;
> 
> Why not to use devm_add_action{_or_reset}()?

I just followed the same approach that has been used in other regulator
functions. (drivers/regulator/devres.c)
OTOH, the devm_add_action makes this little bit simpler so I'll convert
to use it.

Mark, do you have a reason of not using devm_add_action() in devres.c?
Should devm_add_action() be used in some other functions there? And
should this be moved to devres.c?

Best Regards
	Matti Vaittinen
Andy Shevchenko April 7, 2021, 9:10 a.m. UTC | #2
On Wed, Apr 7, 2021 at 8:02 AM Matti Vaittinen
<matti.vaittinen@fi.rohmeurope.com> wrote:
>
> Morning Andy,
>
> Thanks for the review! By the way, is it me or did your mail-client
> spill this out using HTML?

It's Gmail from my mobile phone, sorry for that. We have to blame
Google that they don't think through.

> On Wed, 2021-04-07 at 01:44 +0300, Andy Shevchenko wrote:
> > On Tuesday, April 6, 2021, Matti Vaittinen <
> > matti.vaittinen@fi.rohmeurope.com> wrote:

...

> > > +       pr_emerg(msg);
> >
> > Oh là là, besides build bot complaints, this has serious security
> > implications. Never do like this.
>
> I'm not even trying to claim that was correct. And I did send a fixup -
> sorry for this. I don't intend to do this again.
>
> Now, when this is said - If you have a minute, please educate me.
> Assuming we know all the callers and that all the callers use this as
>
> die_loudly("foobarfoo\n");
> - what is the exploit mechanism?

Not a security guy, but my understanding is that this code may be used
as a gadget in ROP technique of attacks.
In that case msg can be anything. On top of that, somebody may
mistakenly (inadvertently) put the code that allows user controller
input to go to this path.

And last but not least, that some newbies might copy'n'paste bad
examples where they will expose security breach.

With the modern world of Spectre, rowhammer, and other side channel
attacks I may believe that one may exhaust the regulator for getting
advantage on an attack vector.

But again, not a security guy here.

> > > +       BUG();
> > > +}
> > > +

...

> > > errors will be
> > > + *                     or'ed with common erros. If this is given

errors ?

...

> > > +       if (irq <= 0) {
> > > +               dev_err(dev, "No IRQ\n");
> > > +               return ERR_PTR(-EINVAL);
> >
> > Why shadowing error code? Negative IRQ is anything but “no IRQ”.
>
> This was a good point. The irq is passed here as parameter. From this
> function's perspective the negative irq is invalid parameter - we don't
> know how the caller has obtained it. Print could show the value
> contained in irq though.

> Now that you pointed this out I am unsure if this check is needed here.
> If we check it, then I still think we should report -EINVAL for invalid
> parameter. Other option is to just call the request_threaded_irq() -
> log the IRQ request failure and return what request_threaded_irq()
> returns. Do you think that would make sense?

Why is the parameter signed type then?
Shouldn't the caller take care of it?

Otherwise, what is the difference between passing negative IRQ to
request_irq() call?
As you said, you shouldn't make assumptions about what caller meant by this.

So, I would simply drop the check (from easiness of the code perspective).

...

> > > +void regulator_irq_helper_cancel(void **handle)
> > > +{
> > > +       if (handle && *handle) {
> >
> > Can handle ever be NULL here ? (Yes, I understand that you export
> > this)
>
> To tell the truth - I am not sure. I *guess* that if we allow this to
> be NULL, then one *could* implement a driver for IC where IRQs are
> optional, in a way that when IRQs are supported the pointer to handle
> is valid, when IRQs aren't supported the pointer is NULL. (Why) do you
> think we should skip the check?

Just my guts feeling. I don't remember that I ever saw checks like
this for indirect pointers.
Of course it doesn't mean there are no such checks present or may be present.

...

> > Why not to use devm_add_action{_or_reset}()?
>
> I just followed the same approach that has been used in other regulator
> functions. (drivers/regulator/devres.c)
> OTOH, the devm_add_action makes this little bit simpler so I'll convert
> to use it.
>
> Mark, do you have a reason of not using devm_add_action() in devres.c?
> Should devm_add_action() be used in some other functions there? And
> should this be moved to devres.c?

I think the reason for this is as simple as a historical one, i.e.
there was no such API that time.
Vaittinen, Matti April 7, 2021, 9:49 a.m. UTC | #3
Hello Andy,

On Wed, 2021-04-07 at 12:10 +0300, Andy Shevchenko wrote:
> On Wed, Apr 7, 2021 at 8:02 AM Matti Vaittinen
> <matti.vaittinen@fi.rohmeurope.com> wrote:
> > On Wed, 2021-04-07 at 01:44 +0300, Andy Shevchenko wrote:
> > > On Tuesday, April 6, 2021, Matti Vaittinen <
> > > matti.vaittinen@fi.rohmeurope.com> wrote:
> 
> ...
> 
> > > > +       pr_emerg(msg);
> > > 
> > > Oh là là, besides build bot complaints, this has serious security
> > > implications. Never do like this.
> > 
> > I'm not even trying to claim that was correct. And I did send a
> > fixup -
> > sorry for this. I don't intend to do this again.
> > 
> > Now, when this is said - If you have a minute, please educate me.
> > Assuming we know all the callers and that all the callers use this
> > as
> > 
> > die_loudly("foobarfoo\n");
> > - what is the exploit mechanism?
> 
> Not a security guy, but my understanding is that this code may be
> used
> as a gadget in ROP technique of attacks.

Thanks Andy. It'd be interesting to learn more details as I am not a
security expert either :)

> In that case msg can be anything. On top of that, somebody may
> mistakenly (inadvertently) put the code that allows user controller
> input to go to this path.

Yes. This is a good reason to not to do this - but I was interested in
knowing if there is a potential risk even if:

> > all the callers use this
> > as
> > 
> > die_loudly("foobarfoo\n");


> And last but not least, that some newbies might copy'n'paste bad
> examples where they will expose security breach.

Yes yes. As I said, I am not trying to say it is Ok. I was just
wondering what are the risks if users of the print function were known.

> With the modern world of Spectre, rowhammer, and other side channel
> attacks I may believe that one may exhaust the regulator for getting
> advantage on an attack vector.
> 
> But again, not a security guy here.

Thanks anyways :)

> 
> > > > +       BUG();
> > > > +}
> > > > +
> 
> ...
> 
> > > > errors will be
> > > > + *                     or'ed with common erros. If this is
> > > > given
> 
> errors ?

Thanks. I didn't first spot the typo even though you pointed it to me.
Luckily my evolution has occasional problems when communicating with
the mail server. I had enough time to hit the cancel before sending out
a message where I wondered how I should clarify this :]

> ...
> 
> > > > +       if (irq <= 0) {
> > > > +               dev_err(dev, "No IRQ\n");
> > > > +               return ERR_PTR(-EINVAL);
> > > 
> > > Why shadowing error code? Negative IRQ is anything but “no IRQ”.
> > 
> > This was a good point. The irq is passed here as parameter. From
> > this
> > function's perspective the negative irq is invalid parameter - we
> > don't
> > know how the caller has obtained it. Print could show the value
> > contained in irq though.
> > Now that you pointed this out I am unsure if this check is needed
> > here.
> > If we check it, then I still think we should report -EINVAL for
> > invalid
> > parameter. Other option is to just call the request_threaded_irq()
> > -
> > log the IRQ request failure and return what request_threaded_irq()
> > returns. Do you think that would make sense?
> 
> Why is the parameter signed type then?
> Shouldn't the caller take care of it?
> 
> Otherwise, what is the difference between passing negative IRQ to
> request_irq() call?
> As you said, you shouldn't make assumptions about what caller meant
> by this.
> 
> So, I would simply drop the check (from easiness of the code
> perspective).

Yep. I was going to drop the check. Good point. Thanks.
I'll send v6 shortly to address the issues you spotted Andy. Thanks.

> 
> ...
> 
> > > > +void regulator_irq_helper_cancel(void **handle)
> > > > +{
> > > > +       if (handle && *handle) {
> > > 
> > > Can handle ever be NULL here ? (Yes, I understand that you export
> > > this)
> > 
> > To tell the truth - I am not sure. I *guess* that if we allow this
> > to
> > be NULL, then one *could* implement a driver for IC where IRQs are
> > optional, in a way that when IRQs are supported the pointer to
> > handle
> > is valid, when IRQs aren't supported the pointer is NULL. (Why) do
> > you
> > think we should skip the check?
> 
> Just my guts feeling. I don't remember that I ever saw checks like
> this for indirect pointers.
> Of course it doesn't mean there are no such checks present or may be
> present.

I think I'll keep the check unless there is some reason why it should
be omitted.

> > > Why not to use devm_add_action{_or_reset}()?
> > 
> > I just followed the same approach that has been used in other
> > regulator
> > functions. (drivers/regulator/devres.c)
> > OTOH, the devm_add_action makes this little bit simpler so I'll
> > convert
> > to use it.
> > 
> > Mark, do you have a reason of not using devm_add_action() in
> > devres.c?
> > Should devm_add_action() be used in some other functions there? And
> > should this be moved to devres.c?
> 
> I think the reason for this is as simple as a historical one, i.e.
> there was no such API that time.

Right. This is probably the reason why they were written as they are. I
was just wondering if Mark had a reason to keep them that way - or if
he would appreciate it if one converted them to use the
devm_add_action() family of functions.

Best Regards
  Matti.
Andy Shevchenko April 7, 2021, 12:50 p.m. UTC | #4
On Wed, Apr 7, 2021 at 12:49 PM Vaittinen, Matti
<Matti.Vaittinen@fi.rohmeurope.com> wrote:
> On Wed, 2021-04-07 at 12:10 +0300, Andy Shevchenko wrote:
> > On Wed, Apr 7, 2021 at 8:02 AM Matti Vaittinen
> > <matti.vaittinen@fi.rohmeurope.com> wrote:
> > > On Wed, 2021-04-07 at 01:44 +0300, Andy Shevchenko wrote:
> > > > On Tuesday, April 6, 2021, Matti Vaittinen <
> > > > matti.vaittinen@fi.rohmeurope.com> wrote:

Kees, there are two non-security guys discussing potential security
matters. Perhaps you may shed a light on this and tell which of our
stuff is risky and which is not and your recommendations on it.

> > > > > +       pr_emerg(msg);
> > > >
> > > > Oh là là, besides build bot complaints, this has serious security
> > > > implications. Never do like this.
> > >
> > > I'm not even trying to claim that was correct. And I did send a
> > > fixup -
> > > sorry for this. I don't intend to do this again.
> > >
> > > Now, when this is said - If you have a minute, please educate me.
> > > Assuming we know all the callers and that all the callers use this
> > > as
> > >
> > > die_loudly("foobarfoo\n");
> > > - what is the exploit mechanism?
> >
> > Not a security guy, but my understanding is that this code may be
> > used
> > as a gadget in ROP technique of attacks.
>
> Thanks Andy. It'd be interesting to learn more details as I am not a
> security expert either :)
>
> > In that case msg can be anything. On top of that, somebody may
> > mistakenly (inadvertently) put the code that allows user controller
> > input to go to this path.
>
> Yes. This is a good reason to not to do this - but I was interested in
> knowing if there is a potential risk even if:
>
> > > all the callers use this
> > > as
> > >
> > > die_loudly("foobarfoo\n");

I don't see direct issues, only indirect ones, for example, if by some
reason the memory of this message appears writable. So, whoever
controls the format string of printf() controls a lot. That's why it's
preferable to spell out exact intentions in the explicit format
string.

> > And last but not least, that some newbies might copy'n'paste bad
> > examples where they will expose security breach.
>
> Yes yes. As I said, I am not trying to say it is Ok. I was just
> wondering what are the risks if users of the print function were known.
>
> > With the modern world of Spectre, rowhammer, and other side channel
> > attacks I may believe that one may exhaust the regulator for getting
> > advantage on an attack vector.
> >
> > But again, not a security guy here.
>
> Thanks anyways :)

> > > > > +       BUG();
> > > > > +}
Kees Cook April 9, 2021, 3:20 a.m. UTC | #5
On Wed, Apr 07, 2021 at 03:50:15PM +0300, Andy Shevchenko wrote:
> On Wed, Apr 7, 2021 at 12:49 PM Vaittinen, Matti
> <Matti.Vaittinen@fi.rohmeurope.com> wrote:
> > On Wed, 2021-04-07 at 12:10 +0300, Andy Shevchenko wrote:
> > > On Wed, Apr 7, 2021 at 8:02 AM Matti Vaittinen
> > > <matti.vaittinen@fi.rohmeurope.com> wrote:
> > > > On Wed, 2021-04-07 at 01:44 +0300, Andy Shevchenko wrote:
> > > > > On Tuesday, April 6, 2021, Matti Vaittinen <
> > > > > matti.vaittinen@fi.rohmeurope.com> wrote:
> 
> Kees, there are two non-security guys discussing potential security
> matters. Perhaps you may shed a light on this and tell which of our
> stuff is risky and which is not and your recommendations on it.

Hi!

> > > > > > +       pr_emerg(msg);
> > > > >
> > > > > Oh là là, besides build bot complaints, this has serious security
> > > > > implications. Never do like this.
> > > >
> > > > I'm not even trying to claim that was correct. And I did send a
> > > > fixup -
> > > > sorry for this. I don't intend to do this again.
> > > >
> > > > Now, when this is said - If you have a minute, please educate me.
> > > > Assuming we know all the callers and that all the callers use this
> > > > as
> > > >
> > > > die_loudly("foobarfoo\n");
> > > > - what is the exploit mechanism?

I may not be following the thread exactly, here, but normally the issue
is just one of robustness and code maintainability. You can't be sure all
future callers will always pass in a const string, so better to always do:

	pr_whatever("%s\n", string_var);

> > > Not a security guy, but my understanding is that this code may be
> > > used
> > > as a gadget in ROP technique of attacks.

The primary concern is with giving an attacker control over a format
string (which can be used to expose kernel memory). It used to be much
more serious when the kernel still implemented %n, which would turn such
things into a potential memory _overwrite_. We removed %n a long time
ago now. :)

> > Thanks Andy. It'd be interesting to learn more details as I am not a
> > security expert either :)
> >
> > > In that case msg can be anything. On top of that, somebody may
> > > mistakenly (inadvertently) put the code that allows user controller
> > > input to go to this path.
> >
> > Yes. This is a good reason to not to do this - but I was interested in
> > knowing if there is a potential risk even if:
> >
> > > > all the callers use this
> > > > as
> > > >
> > > > die_loudly("foobarfoo\n");
> 
> I don't see direct issues, only indirect ones, for example, if by some
> reason the memory of this message appears writable. So, whoever
> controls the format string of printf() controls a lot. That's why it's
> preferable to spell out exact intentions in the explicit format
> string.

Right.

> > > > > > +       BUG();
> > > > > > +}

This, though, are you sure you want to use BUG()? Linus gets upset about
such things:
https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on
Vaittinen, Matti April 9, 2021, 7:08 a.m. UTC | #6
On Thu, 2021-04-08 at 20:20 -0700, Kees Cook wrote:
> On Wed, Apr 07, 2021 at 03:50:15PM +0300, Andy Shevchenko wrote:
> > On Wed, Apr 7, 2021 at 12:49 PM Vaittinen, Matti
> > <Matti.Vaittinen@fi.rohmeurope.com> wrote:
> > > On Wed, 2021-04-07 at 12:10 +0300, Andy Shevchenko wrote:
> > > > On Wed, Apr 7, 2021 at 8:02 AM Matti Vaittinen
> > > > <matti.vaittinen@fi.rohmeurope.com> wrote:
> > > > > On Wed, 2021-04-07 at 01:44 +0300, Andy Shevchenko wrote:
> > > > > > On Tuesday, April 6, 2021, Matti Vaittinen <
> > > > > > matti.vaittinen@fi.rohmeurope.com> wrote:
> > > > > > > +       BUG();
> > > > > > > +}
> 
> This, though, are you sure you want to use BUG()? Linus gets upset
> about
> such things:
> https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on
> 

I see. I am unsure of what would be the best action in the regulator
case we are handling here. To give the context, we assume here a
situation where power has gone out of regulation and the hardware is
probably failing. First countermeasure to protect what is left of HW is
to shut-down the failing regulator. BUG() was called here as a last
resort if shutting the power via regulator interface was not
implemented or working.

Eg, we try to take what ever last measure we can to minimize the HW
damage - and BUG() was used for this in the qcom driver where I stole
the idea. Judging the comment related to BUG() in asm-generic/bug.h

/*
 * Don't use BUG() or BUG_ON() unless there's really no way out; one
 
* example might be detecting data structure corruption in the middle
 *
of an operation that can't be backed out of.  If the (sub)system
 * can
somehow continue operating, perhaps with reduced functionality,
 * it's
probably not BUG-worthy.
 *
 * If you're tempted to BUG(), think
again:  is completely giving up
 * really the *only* solution?  There
are usually better options, where
 * users don't need to reboot ASAP and
can mostly shut down cleanly.
 */
https://elixir.bootlin.com/linux/v5.12-rc6/source/include/asm-generic/bug.h#L55

this really might be valid use-case.

To me the real question is what happens after the BUG() - and if there
is any generic handling or if it is platform/board specific? Does it
actually have any chance to save the HW?

Mark already pointed that we might need to figure a way to punt a
"failing event" to the user-space to initiate better "safety shutdown".
Such event does not currently exist so I think the main use-case here
is to do logging and potentially prevent enabling any further actions
in the failing HW.

So - any better suggestions?

Best Regards
	Matti Vaittinen
Vaittinen, Matti April 12, 2021, 12:24 p.m. UTC | #7
On Fri, 2021-04-09 at 10:08 +0300, Matti Vaittinen wrote:
> On Thu, 2021-04-08 at 20:20 -0700, Kees Cook wrote:
> > On Wed, Apr 07, 2021 at 03:50:15PM +0300, Andy Shevchenko wrote:
> > > On Wed, Apr 7, 2021 at 12:49 PM Vaittinen, Matti
> > > <Matti.Vaittinen@fi.rohmeurope.com> wrote:
> > > > On Wed, 2021-04-07 at 12:10 +0300, Andy Shevchenko wrote:
> > > > > On Wed, Apr 7, 2021 at 8:02 AM Matti Vaittinen
> > > > > <matti.vaittinen@fi.rohmeurope.com> wrote:
> > > > > > On Wed, 2021-04-07 at 01:44 +0300, Andy Shevchenko wrote:
> > > > > > > On Tuesday, April 6, 2021, Matti Vaittinen <
> > > > > > > matti.vaittinen@fi.rohmeurope.com> wrote:
> > > > > > > > +       BUG();
> > > > > > > > +}
> > 
> > This, though, are you sure you want to use BUG()? Linus gets upset
> > about
> > such things:
> > https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on
> > 
> 
> I see. I am unsure of what would be the best action in the regulator
> case we are handling here. To give the context, we assume here a
> situation where power has gone out of regulation and the hardware is
> probably failing. First countermeasure to protect what is left of HW
> is
> to shut-down the failing regulator. BUG() was called here as a last
> resort if shutting the power via regulator interface was not
> implemented or working.
> 
> Eg, we try to take what ever last measure we can to minimize the HW
> damage - and BUG() was used for this in the qcom driver where I stole
> the idea. Judging the comment related to BUG() in asm-generic/bug.h
> 
> /*
>  * Don't use BUG() or BUG_ON() unless there's really no way out; one
>  
> * example might be detecting data structure corruption in the middle
>  *
> of an operation that can't be backed out of.  If the (sub)system
>  * can
> somehow continue operating, perhaps with reduced functionality,
>  * it's
> probably not BUG-worthy.
>  *
>  * If you're tempted to BUG(), think
> again:  is completely giving up
>  * really the *only* solution?  There
> are usually better options, where
>  * users don't need to reboot ASAP and
> can mostly shut down cleanly.
>  */
> https://elixir.bootlin.com/linux/v5.12-rc6/source/include/asm-generic/bug.h#L55
> 
> this really might be valid use-case.
> 
> To me the real question is what happens after the BUG() - and if
> there
> is any generic handling or if it is platform/board specific? Does it
> actually have any chance to save the HW?
> 
> Mark already pointed that we might need to figure a way to punt a
> "failing event" to the user-space to initiate better "safety
> shutdown".
> Such event does not currently exist so I think the main use-case here
> is to do logging and potentially prevent enabling any further actions
> in the failing HW.
> 
> So - any better suggestions?
> 

Maybe we should take same approach as is taken in thermal_core? Quoting
the thermal documentation:

"On an event of critical trip temperature crossing. Thermal
framework             
allows the system to shutdown gracefully by calling
orderly_poweroff().          
In the event of a failure of orderly_poweroff() to shut down the
system          
we are in danger of keeping the system alive at undesirably
high                 
temperatures. To mitigate this high risk scenario we program a
work              
queue to fire after a pre-determined number of seconds to
start                  
an emergency shutdown of the device using the
kernel_power_off()                 
function. In case kernel_power_off() fails then
finally                          
emergency_restart() is called in the worst case."

Maybe this 'hardware protection, in-kernel, emergency HW saving
shutdown' - logic, should be pulled out of thermal_core.c (or at least
exported) for (other parts like) the regulators to use?

I don't like the idea relying in the user-space to be in shape it can
handle the situation. I may be mistaken but I think a quick action
might be required. Hence the in-kernel handling does not sound so bad
to me.

I am open to all education and suggestions. Meanwhile I am planning to
just convert the BUG() to WARN(). I don't claim I know how BUG() is
implemented on each platform - but my understanding is that it does not
guarantee any power to be cut but just halts the calling process(?). I
guess this does not guarantee what happens next - maybe it even keeps
the power enabled and end up just deadlocking the system by reserved
locks? I think thermal guys have been pondering this scenario for
severe temperature protection shutdown so I would like to hear your
opinions.


Best Regards
Matti Vaittinen
Mark Brown April 12, 2021, 1:09 p.m. UTC | #8
On Mon, Apr 12, 2021 at 03:24:16PM +0300, Matti Vaittinen wrote:

> Maybe this 'hardware protection, in-kernel, emergency HW saving
> shutdown' - logic, should be pulled out of thermal_core.c (or at least
> exported) for (other parts like) the regulators to use?

That sounds sensible.
diff mbox series

Patch

diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
index 44d2f8bf4b74..e25f1c2d6c9b 100644
--- a/drivers/regulator/Makefile
+++ b/drivers/regulator/Makefile
@@ -4,7 +4,7 @@ 
 #
 
 
-obj-$(CONFIG_REGULATOR) += core.o dummy.o fixed-helper.o helpers.o devres.o
+obj-$(CONFIG_REGULATOR) += core.o dummy.o fixed-helper.o helpers.o devres.o irq_helpers.o
 obj-$(CONFIG_OF) += of_regulator.o
 obj-$(CONFIG_REGULATOR_FIXED_VOLTAGE) += fixed.o
 obj-$(CONFIG_REGULATOR_VIRTUAL_CONSUMER) += virtual.o
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 16114aea099a..fabc83288e1b 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -4388,20 +4388,37 @@  unsigned int regulator_get_mode(struct regulator *regulator)
 }
 EXPORT_SYMBOL_GPL(regulator_get_mode);
 
+static int rdev_get_cached_err_flags(struct regulator_dev *rdev)
+{
+	int ret = 0;
+
+	if (rdev->use_cached_err) {
+		spin_lock(&rdev->err_lock);
+		ret = rdev->cached_err;
+		spin_unlock(&rdev->err_lock);
+	}
+	return ret;
+}
+
 static int _regulator_get_error_flags(struct regulator_dev *rdev,
 					unsigned int *flags)
 {
-	int ret;
+	int ret, tmpret;
 
 	regulator_lock(rdev);
 
+	ret = rdev_get_cached_err_flags(rdev);
+
 	/* sanity check */
-	if (!rdev->desc->ops->get_error_flags) {
+	if (rdev->desc->ops->get_error_flags) {
+		tmpret = rdev->desc->ops->get_error_flags(rdev, flags);
+		if (tmpret > 0)
+			ret |= tmpret;
+	} else if (!rdev->use_cached_err) {
 		ret = -EINVAL;
 		goto out;
 	}
 
-	ret = rdev->desc->ops->get_error_flags(rdev, flags);
 out:
 	regulator_unlock(rdev);
 	return ret;
@@ -5236,6 +5253,7 @@  regulator_register(const struct regulator_desc *regulator_desc,
 		goto rinse;
 	}
 	device_initialize(&rdev->dev);
+	spin_lock_init(&rdev->err_lock);
 
 	/*
 	 * Duplicate the config so the driver could override it after
diff --git a/drivers/regulator/irq_helpers.c b/drivers/regulator/irq_helpers.c
new file mode 100644
index 000000000000..cc0c23c23b5f
--- /dev/null
+++ b/drivers/regulator/irq_helpers.c
@@ -0,0 +1,432 @@ 
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright (C) 2021 ROHM Semiconductors
+// regulator IRQ based event notification helpers
+//
+// Logic has been partially adapted from qcom-labibb driver.
+//
+// Author: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/of_irq.h>
+#include <linux/regmap.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/regulator/driver.h>
+
+struct regulator_irq {
+	struct regulator_irq_data rdata;
+	struct regulator_irq_desc desc;
+	int irq;
+	int retry_cnt;
+	struct delayed_work isr_work;
+};
+
+/*
+ * Should only be called from threaded handler to prevent potential deadlock
+ */
+static void rdev_flag_err(struct regulator_dev *rdev, int err)
+{
+	spin_lock(&rdev->err_lock);
+	rdev->cached_err |= err;
+	spin_unlock(&rdev->err_lock);
+}
+
+static void rdev_clear_err(struct regulator_dev *rdev, int err)
+{
+	spin_lock(&rdev->err_lock);
+	rdev->cached_err &= ~err;
+	spin_unlock(&rdev->err_lock);
+}
+
+static void die_loudly(const char *msg)
+{
+	pr_emerg(msg);
+	BUG();
+}
+
+static void regulator_notifier_isr_work(struct work_struct *work)
+{
+	struct regulator_irq *h;
+	struct regulator_irq_desc *d;
+	struct regulator_irq_data *rid;
+	int ret = 0;
+	int tmo, i;
+	int num_rdevs;
+
+	h = container_of(work, struct regulator_irq,
+			    isr_work.work);
+	d = &h->desc;
+	rid = &h->rdata;
+	num_rdevs = rid->num_states;
+
+reread:
+	if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) {
+		if (d->die)
+			ret = d->die(rid);
+		else
+			die_loudly("Regulator HW failure? - no IC recovery\n");
+
+		/*
+		 * If the 'last resort' IC recovery failed we will have
+		 * nothing else left to do...
+		 */
+		if (ret)
+			die_loudly("Regulator HW failure? - IC recovery failed\n");
+
+		/*
+		 * If h->die() was implemented we assume recovery has been
+		 * attempted (probably regulator was shut down)
+		 * and we just enable IRQ and bail-out.
+		 */
+		goto enable_out;
+	}
+	if (d->renable) {
+		ret = d->renable(rid);
+
+		if (ret == REGULATOR_FAILED_RETRY) {
+			/* Driver could not get current status */
+			h->retry_cnt++;
+			if (!d->reread_ms)
+				goto reread;
+
+			tmo = d->reread_ms;
+			goto reschedule;
+		}
+
+		if (ret) {
+			/*
+			 * IC status reading succeeded. update error info
+			 * just in case the renable changed it.
+			 */
+			for (i = 0; i < num_rdevs; i++) {
+				struct regulator_err_state *stat;
+				struct regulator_dev *rdev;
+
+				stat = &rid->states[i];
+				rdev = stat->rdev;
+				rdev_clear_err(rdev, (~stat->errors) &
+						      stat->possible_errs);
+			}
+			h->retry_cnt++;
+			/*
+			 * The IC indicated problem is still ON - no point in
+			 * re-enabling the IRQ. Retry later.
+			 */
+			tmo = d->irq_off_ms;
+			goto reschedule;
+		}
+	}
+
+	/*
+	 * Either IC reported problem cleared or no status checker was provided.
+	 * If problems are gone - good. If not - then the IRQ will fire again
+	 * and we'll have new nice loop. In any case we should clear error flags
+	 * here and re-enable IRQs.
+	 */
+	for (i = 0; i < num_rdevs; i++) {
+		struct regulator_err_state *stat;
+		struct regulator_dev *rdev;
+
+		stat = &rid->states[i];
+		rdev = stat->rdev;
+		rdev_clear_err(rdev, stat->possible_errs);
+	}
+
+	/*
+	 * Things have been seemingly successful => zero retry-counter.
+	 */
+	h->retry_cnt = 0;
+
+enable_out:
+	enable_irq(h->irq);
+
+	return;
+
+reschedule:
+	if (!d->high_prio)
+		mod_delayed_work(system_wq, &h->isr_work,
+				 msecs_to_jiffies(tmo));
+	else
+		mod_delayed_work(system_highpri_wq, &h->isr_work,
+				 msecs_to_jiffies(tmo));
+}
+
+static irqreturn_t regulator_notifier_isr(int irq, void *data)
+{
+	struct regulator_irq *h = data;
+	struct regulator_irq_desc *d;
+	struct regulator_irq_data *rid;
+	unsigned long rdev_map = 0;
+	int num_rdevs;
+	int ret, i, j;
+
+	d = &h->desc;
+	rid = &h->rdata;
+	num_rdevs = rid->num_states;
+
+	if (d->fatal_cnt)
+		h->retry_cnt++;
+
+	/*
+	 * we spare few cycles by not clearing statuses prior this call.
+	 * The IC driver must initialize the status buffers for rdevs
+	 * which it indicates having active events via rdev_map.
+	 *
+	 * Maybe we should just to be on a safer side(?)
+	 */
+	ret = d->map_event(irq, rid, &rdev_map);
+
+	/*
+	 * If status reading fails (which is unlikely) we don't ack/disable
+	 * IRQ but just increase fail count and retry when IRQ fires again.
+	 * If retry_count exceeds given safety limit we call IC specific die
+	 * handler which can try disabling regulator(s).
+	 *
+	 * If no die handler is given we will just bug() as a last resort.
+	 *
+	 * We could try disabling all associated rdevs - but we might shoot
+	 * ourself in the head and leave problematic regulator enabled. So
+	 * if IC has no die-handler populated we just assume the regulator
+	 * can't be disabled.
+	 */
+	if (unlikely(ret == REGULATOR_FAILED_RETRY))
+		goto fail_out;
+
+	h->retry_cnt = 0;
+	/*
+	 * Let's not disable IRQ if there was no status bits for us. We'd
+	 * better leave spurious IRQ handling to genirq
+	 */
+	if (ret || !rdev_map)
+		return IRQ_NONE;
+
+	/*
+	 * Some events are bogus if regulator is disabled. Skip such events
+	 * if all relevant regulators are disabled
+	 */
+	if (d->skip_off) {
+		int skip = 1;
+
+		for_each_set_bit(i, &rdev_map, num_rdevs) {
+			struct regulator_dev *rdev;
+			const struct regulator_ops *ops;
+
+			rdev = rid->states[i].rdev;
+			ops = rdev->desc->ops;
+
+			/*
+			 * If any of the flagged regulators is enabled we do
+			 * handle this
+			 */
+			if (ops->is_enabled(rdev)) {
+				skip = 0;
+				break;
+			}
+		}
+		if (skip)
+			return IRQ_NONE;
+	}
+
+	/* Disable IRQ if HW keeps line asserted */
+	if (d->irq_off_ms)
+		disable_irq_nosync(irq);
+
+	/*
+	 * IRQ seems to be for us. Let's fire correct notifiers / store error
+	 * flags
+	 */
+	for_each_set_bit(i, &rdev_map, num_rdevs) {
+		struct regulator_err_state *stat;
+		int len;
+		struct regulator_dev *rdev;
+
+		stat = &rid->states[i];
+		len = sizeof(stat->notifs);
+
+		rdev = stat->rdev;
+		for_each_set_bit(j, &stat->notifs, 8 * len)
+			regulator_notifier_call_chain(rdev, 1 << (j - 1), NULL);
+
+		rdev_flag_err(rdev, stat->errors);
+	}
+
+	if (d->irq_off_ms) {
+		if (!d->high_prio)
+			schedule_delayed_work(&h->isr_work,
+					      msecs_to_jiffies(d->irq_off_ms));
+		else
+			mod_delayed_work(system_highpri_wq,
+					 &h->isr_work,
+					 msecs_to_jiffies(d->irq_off_ms));
+	}
+
+	return IRQ_HANDLED;
+
+fail_out:
+	if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) {
+		if (d->die)
+			ret = d->die(rid);
+
+		/*
+		 * If die() failed or was not implemented just BUG() as last
+		 * attemt to save HW.
+		 */
+		if (ret)
+			die_loudly("Regulator HW failure? - retry count exceeded\n");
+	}
+
+	return IRQ_NONE;
+}
+
+static int init_rdev_state(struct device *dev, struct regulator_irq *h,
+			   struct regulator_dev **rdev, int common_err,
+			   int *rdev_err, int rdev_amount)
+{
+	int i;
+
+	h->rdata.states = devm_kzalloc(dev, sizeof(*h->rdata.states) *
+				       rdev_amount, GFP_KERNEL);
+	if (!h->rdata.states)
+		return -ENOMEM;
+
+	h->rdata.num_states = rdev_amount;
+	h->rdata.data = h->desc.data;
+
+	for (i = 0; i < rdev_amount; i++) {
+		h->rdata.states[i].possible_errs = common_err;
+		if (rdev_err)
+			h->rdata.states[i].possible_errs |= *rdev_err++;
+		h->rdata.states[i].rdev = *rdev++;
+	}
+
+	return 0;
+}
+
+static void init_rdev_errors(struct regulator_irq *h)
+{
+	int i;
+
+	for (i = 0; i < h->rdata.num_states; i++) {
+		if (h->rdata.states[i].possible_errs)
+			/* Can we trust writing this boolean is atomic? */
+			h->rdata.states[i].rdev->use_cached_err = true;
+	}
+}
+
+/**
+ * regulator_irq_helper - register IRQ based regulator event/error notifier
+ *
+ * @dev:		device to which lifetime the helper's lifetime is
+ *			bound.
+ * @d:			IRQ helper descriptor.
+ * @irq:		IRQ used to inform events/errors to be notified.
+ * @irq_flags:		Extra IRQ flags to be OR's with the default IRQF_ONESHOT
+ *			when requesting the (threaded) irq.
+ * @common_errs:	Errors which can be flagged by this IRQ for all rdevs.
+ *			When IRQ is re-enabled these errors will be cleared
+ *			from all associated regulators
+ * @per_rdev_errs:	Optional error flag array describing errors specific
+ *			for only some of the regulators. These errors will be
+ *			or'ed with common erros. If this is given the array
+ *			should contain rdev_amount flags. Can be set to NULL
+ *			if there is no regulator specific error flags for this
+ *			IRQ.
+ * @rdev:		Array of regulators associated with this IRQ.
+ * @rdev_amount:	Amount of regulators associated wit this IRQ.
+ */
+void *regulator_irq_helper(struct device *dev,
+			    const struct regulator_irq_desc *d, int irq,
+			    int irq_flags, int common_errs, int *per_rdev_errs,
+			    struct regulator_dev **rdev, int rdev_amount)
+{
+	struct regulator_irq *h;
+	int ret;
+
+	if (!rdev_amount || !d || !d->map_event || !d->name)
+		return ERR_PTR(-EINVAL);
+
+	if (irq <= 0) {
+		dev_err(dev, "No IRQ\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	h = devm_kzalloc(dev, sizeof(*h), GFP_KERNEL);
+	if (!h)
+		return ERR_PTR(-ENOMEM);
+
+	h->irq = irq;
+	h->desc = *d;
+
+	ret = init_rdev_state(dev, h, rdev, common_errs, per_rdev_errs,
+			      rdev_amount);
+	if (ret)
+		return ERR_PTR(ret);
+
+	init_rdev_errors(h);
+
+	if (h->desc.irq_off_ms)
+		INIT_DELAYED_WORK(&h->isr_work, regulator_notifier_isr_work);
+
+	ret = request_threaded_irq(h->irq, NULL, regulator_notifier_isr,
+				   IRQF_ONESHOT | irq_flags, h->desc.name, h);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return h;
+}
+EXPORT_SYMBOL_GPL(regulator_irq_helper);
+
+/**
+ * regulator_irq_helper_cancel - drop IRQ based regulator event/error notifier
+ *
+ * @handle:		Pointer to handle returned by a successful call to
+ *			regulator_irq_helper(). Will be NULLed upon return.
+ *
+ * The associated IRQ is released and work is cancelled when the function
+ * returns.
+ */
+void regulator_irq_helper_cancel(void **handle)
+{
+	if (handle && *handle) {
+		struct regulator_irq *h = *handle;
+
+		free_irq(h->irq, h);
+		if (h->desc.irq_off_ms)
+			cancel_delayed_work_sync(&h->isr_work);
+
+		h = NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(regulator_irq_helper_cancel);
+
+static void regulator_irq_helper_drop(struct device *dev, void *res)
+{
+	regulator_irq_helper_cancel(res);
+}
+
+void *devm_regulator_irq_helper(struct device *dev,
+				 const struct regulator_irq_desc *d, int irq,
+				 int irq_flags, int common_errs,
+				 int *per_rdev_errs,
+				 struct regulator_dev **rdev, int rdev_amount)
+{
+	void **ptr;
+
+	ptr = devres_alloc(regulator_irq_helper_drop, sizeof(*ptr), GFP_KERNEL);
+	if (!ptr)
+		return ERR_PTR(-ENOMEM);
+
+	*ptr = regulator_irq_helper(dev, d, irq, irq_flags, common_errs,
+				    per_rdev_errs, rdev, rdev_amount);
+
+	if (IS_ERR(*ptr))
+		devres_free(ptr);
+	else
+		devres_add(dev, ptr);
+
+	return *ptr;
+}
+EXPORT_SYMBOL_GPL(devm_regulator_irq_helper);
diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index d7c77ee370f3..03a8eee9fca9 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -409,6 +409,128 @@  struct regulator_config {
 	struct gpio_desc *ena_gpiod;
 };
 
+/**
+ * struct regulator_err_state - regulator error/notification status
+ *
+ * @rdev:		Regulator which status the struct indicates.
+ * @notifs:		Events which have occurred on regulator.
+ * @errors:		Errors which are active on regulator.
+ * @possible_errs:	Errors which can be signaled (by given IRQ).
+ */
+struct regulator_err_state {
+	struct regulator_dev *rdev;
+	unsigned long notifs;
+	unsigned long errors;
+	int possible_errs;
+};
+
+/**
+ * struct regulator_irq_data - regulator error/notification status date
+ *
+ * @states:	Status structs for each of the associated regulators.
+ * @num_states:	Amount of associated regulators.
+ * @data:	Driver data pointer given at regulator_irq_desc.
+ * @opaque:	Value storage for IC driver. Core does not update this. ICs
+ *		may want to store status register value here at map_event and
+ *		compare contents at renable to see if new problems have been
+ *		added to status. If that is the case it may be desirable to
+ *		return REGULATOR_ERROR_CLEARED and not REGULATOR_ERROR_ON to
+ *		allow IRQ fire again and to generate notifications also for
+ *		the new issues.
+ *
+ * This structure is passed to map_event and renable for reporting regulator
+ * status to core.
+ */
+struct regulator_irq_data {
+	struct regulator_err_state *states;
+	int num_states;
+	void *data;
+	long opaque;
+};
+
+/**
+ * struct regulator_irq_desc - notification sender for IRQ based events.
+ *
+ * @name:	The visible name for the IRQ
+ * @fatal_cnt:	If this IRQ is used to signal HW damaging condition it may be
+ *		best to shut-down regulator(s) or reboot the SOC if error
+ *		handling is repeteadly failing. If fatal_cnt is given the IRQ
+ *		handling is aborted if it fails for fatal_cnt times and die()
+ *		callback (if populated) or BUG() is called to try to prevent
+ *		further damage.
+ * @reread_ms:	The time which is waited before attempting to re-read status
+ *		at the worker if IC reading fails. Immediate re-read is done
+ *		if time is not specified.
+ * @irq_off_ms:	The time which IRQ is kept disabled before re-evaluating the
+ *		status for devices which keep IRQ disabled for duration of the
+ *		error. If this is not given the IRQ is left enabled and renable
+ *		is not called.
+ * @skip_off:	If set to true the IRQ handler will attempt to check if any of
+ *		the associated regulators are enabled prior to taking other
+ *		actions. If no regulators are enabled and this is set to true
+ *		a spurious IRQ is assumed and IRQ_NONE is returned.
+ * @high_prio:	Boolean to indicate that high priority WQ should be used.
+ * @data:	Driver private data pointer which will be passed as such to
+ *		the renable, map_event and die callbacks in regulator_irq_data.
+ * @die:	Protection callback. If IC status reading or recovery actions
+ *		fail fatal_cnt times this callback or BUG() is called. This
+ *		callback should implement final protection attempt like
+ *		disabling the regulator. If protection succeeded this may
+ *		return 0. If anything else is returned the core assumes final
+ *		protection failed and calls BUG() as a last resort.
+ * @map_event:	Driver callback to map IRQ status into regulator devices with
+ *		events / errors. NOTE: callback MUST initialize both the
+ *		errors and notifs for all rdevs which it signals having
+ *		active events as core does not clean the map data.
+ *		REGULATOR_FAILED_RETRY can be returned to indicate that the
+ *		status reading from IC failed. If this is repeated for
+ *		fatal_cnt times the core will call die() callback or BUG()
+ *		as a last resort to protect the HW.
+ * @renable:	Optional callback to check status (if HW supports that) before
+ *		re-enabling IRQ. If implemented this should clear the error
+ *		flags so that errors fetched by regulator_get_error_flags()
+ *		are updated. If callback is not impelemted then errors are
+ *		assumed to be cleared and IRQ is re-enabled.
+ *		REGULATOR_FAILED_RETRY can be returned to
+ *		indicate that the status reading from IC failed. If this is
+ *		repeated for 'fatal_cnt' times the core will call die()
+ *		callback or BUG() as a last resort to protect the HW.
+ *		Returning zero indicates that the problem in HW has been solved
+ *		and IRQ will be re-enabled. Returning REGULATOR_ERROR_ON
+ *		indicates the error condition is still active and keeps IRQ
+ *		disabled. Please note that returning REGULATOR_ERROR_ON does
+ *		not retrigger evaluating what events are active or resending
+ *		notifications. If this is needed you probably want to return
+ *		zero and allow IRQ to retrigger causing events to be
+ *		re-evaluated and re-sent.
+ *
+ * This structure is used for registering regulator IRQ notification helper.
+ */
+struct regulator_irq_desc {
+	const char *name;
+	int irq_flags;
+	int fatal_cnt;
+	int reread_ms;
+	int irq_off_ms;
+	bool skip_off;
+	bool high_prio;
+	void *data;
+
+	int (*die)(struct regulator_irq_data *rid);
+	int (*map_event)(int irq, struct regulator_irq_data *rid,
+			  unsigned long *dev_mask);
+	int (*renable)(struct regulator_irq_data *rid);
+};
+
+/*
+ * Return values for regulator IRQ helpers.
+ */
+enum {
+	REGULATOR_ERROR_CLEARED,
+	REGULATOR_FAILED_RETRY,
+	REGULATOR_ERROR_ON,
+};
+
 /*
  * struct coupling_desc
  *
@@ -473,6 +595,9 @@  struct regulator_dev {
 
 	/* time when this regulator was disabled last time */
 	unsigned long last_off_jiffy;
+	int cached_err;
+	bool use_cached_err;
+	spinlock_t err_lock;
 };
 
 struct regulator_dev *
@@ -487,6 +612,16 @@  void devm_regulator_unregister(struct device *dev, struct regulator_dev *rdev);
 
 int regulator_notifier_call_chain(struct regulator_dev *rdev,
 				  unsigned long event, void *data);
+void *devm_regulator_irq_helper(struct device *dev,
+				const struct regulator_irq_desc *d, int irq,
+				int irq_flags, int common_errs,
+				int *per_rdev_errs, struct regulator_dev **rdev,
+				int rdev_amount);
+void *regulator_irq_helper(struct device *dev,
+			   const struct regulator_irq_desc *d, int irq,
+			   int irq_flags, int common_errs, int *per_rdev_errs,
+			   struct regulator_dev **rdev, int rdev_amount);
+void regulator_irq_helper_cancel(void **handle);
 
 void *rdev_get_drvdata(struct regulator_dev *rdev);
 struct device *rdev_get_dev(struct regulator_dev *rdev);