diff mbox series

[2/3] checkkconfigsymbols.py: Fix Kconfig parsing to find 'if' lines

Message ID 20210822192205.43210-3-arielmarcovitch@gmail.com (mailing list archive)
State New, archived
Headers show
Series checkkconfigsymbols.py: Fix various bugs | expand

Commit Message

Ariel Marcovitch Aug. 22, 2021, 7:22 p.m. UTC
When parsing Kconfig files to find symbol definitions and references,
lines after a 'help' line are skipped until a new config definition
starts.

However, it is quite common to define a config and then make some other
configs depend on it by adding an 'if' line. This kind of kconfig
statement usually appears after a config definition which might contain
a 'help' section. The 'if' line is skipped in parse_kconfig_file()
because it is not a config definition.

This means that symbols referenced in this kind of statements are
ignored by this function and thus are not considered undefined
references in case the symbol is not defined.

The REGEX_KCONFIG_STMT regex can't be used because the other types of
statements can't break help lines.

Define a new regex for matching 'if' statements and stop the 'help'
skipping in case it is encountered.

Signed-off-by: Ariel Marcovitch <arielmarcovitch@gmail.com>
---
 scripts/checkkconfigsymbols.py | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Masahiro Yamada Aug. 24, 2021, 1:30 p.m. UTC | #1
On Mon, Aug 23, 2021 at 4:22 AM Ariel Marcovitch
<arielmarcovitch@gmail.com> wrote:
>
> When parsing Kconfig files to find symbol definitions and references,
> lines after a 'help' line are skipped until a new config definition
> starts.
>
> However, it is quite common to define a config and then make some other
> configs depend on it by adding an 'if' line. This kind of kconfig
> statement usually appears after a config definition which might contain
> a 'help' section. The 'if' line is skipped in parse_kconfig_file()
> because it is not a config definition.
>
> This means that symbols referenced in this kind of statements are
> ignored by this function and thus are not considered undefined
> references in case the symbol is not defined.
>
> The REGEX_KCONFIG_STMT regex can't be used because the other types of
> statements can't break help lines.
>
> Define a new regex for matching 'if' statements and stop the 'help'
> skipping in case it is encountered.
>
> Signed-off-by: Ariel Marcovitch <arielmarcovitch@gmail.com>
> ---
>  scripts/checkkconfigsymbols.py | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
> index b9b0f15e5880..875e9a2c14b2 100755
> --- a/scripts/checkkconfigsymbols.py
> +++ b/scripts/checkkconfigsymbols.py
> @@ -26,6 +26,7 @@ EXPR = r"(?:" + OPERATORS + r"|\s|" + SYMBOL + r")+"
>  DEFAULT = r"default\s+.*?(?:if\s.+){,1}"
>  STMT = r"^\s*(?:if|select|imply|depends\s+on|(?:" + DEFAULT + r"))\s+" + EXPR
>  SOURCE_SYMBOL = r"(?:\W|\b)+[D]{,1}CONFIG_(" + SYMBOL + r")"
> +IF_LINE = r"^\s*(?:if)\s+" + EXPR


Why is it enclosed by "(?: )"   ?

"(?:if)"  seems to the same as "if"






>
>  # regex objects
>  REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
> @@ -35,11 +36,11 @@ REGEX_KCONFIG_DEF = re.compile(DEF)
>  REGEX_KCONFIG_EXPR = re.compile(EXPR)
>  REGEX_KCONFIG_STMT = re.compile(STMT)
>  REGEX_KCONFIG_HELP = re.compile(r"^\s+help\s*$")
> +REGEX_KCONFIG_IF_LINE = re.compile(IF_LINE)
>  REGEX_FILTER_SYMBOLS = re.compile(r"[A-Za-z0-9]$")
>  REGEX_NUMERIC = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")
>  REGEX_QUOTES = re.compile("(\"(.*?)\")")
>
> -
>  def parse_options():
>      """The user interface of this module."""
>      usage = "Run this tool to detect Kconfig symbols that are referenced but " \
> @@ -445,6 +446,11 @@ def parse_kconfig_file(kfile):
>          line = line.strip('\n')
>          line = line.split("#")[0]  # ignore comments
>
> +        # 'if EXPR' lines can be after help lines
> +        # The if line itself is handled later
> +        if REGEX_KCONFIG_IF_LINE.match(line):
> +            skip = False
> +


I do not think this is the right fix.
There are similar patterns where
config references are ignored.

For example, FOO and BAR are ignored
in the following cases.

ex1)

choice
          prompt "foo"
          default FOO



ex2)

menu "bar"
           depends on BAR




The help block ends with shallower indentation.




>          if REGEX_KCONFIG_DEF.match(line):
>              symbol_def = REGEX_KCONFIG_DEF.findall(line)
>              defined.append(symbol_def[0])
> --
> 2.25.1
>


--
Best Regards
Masahiro Yamada
Ariel Marcovitch Aug. 29, 2021, 1:17 p.m. UTC | #2
Hello again!

On 24/08/2021 16:30, Masahiro Yamada wrote:
 > On Mon, Aug 23, 2021 at 4:22 AM Ariel Marcovitch
 > <arielmarcovitch@gmail.com> wrote:
 >>
 >> When parsing Kconfig files to find symbol definitions and references,
 >> lines after a 'help' line are skipped until a new config definition
 >> starts.
 >>
 >> However, it is quite common to define a config and then make some other
 >> configs depend on it by adding an 'if' line. This kind of kconfig
 >> statement usually appears after a config definition which might contain
 >> a 'help' section. The 'if' line is skipped in parse_kconfig_file()
 >> because it is not a config definition.
 >>
 >> This means that symbols referenced in this kind of statements are
 >> ignored by this function and thus are not considered undefined
 >> references in case the symbol is not defined.
 >>
 >> The REGEX_KCONFIG_STMT regex can't be used because the other types of
 >> statements can't break help lines.
 >>
 >> Define a new regex for matching 'if' statements and stop the 'help'
 >> skipping in case it is encountered.
 >>
 >> Signed-off-by: Ariel Marcovitch <arielmarcovitch@gmail.com>
 >> ---
 >>  scripts/checkkconfigsymbols.py | 8 +++++++-
 >>  1 file changed, 7 insertions(+), 1 deletion(-)
 >>
 >> diff --git a/scripts/checkkconfigsymbols.py 
b/scripts/checkkconfigsymbols.py
 >> index b9b0f15e5880..875e9a2c14b2 100755
 >> --- a/scripts/checkkconfigsymbols.py
 >> +++ b/scripts/checkkconfigsymbols.py
 >> @@ -26,6 +26,7 @@ EXPR = r"(?:" + OPERATORS + r"|\s|" + SYMBOL + r")+"
 >>  DEFAULT = r"default\s+.*?(?:if\s.+){,1}"
 >>  STMT = r"^\s*(?:if|select|imply|depends\s+on|(?:" + DEFAULT + 
r"))\s+" + EXPR
 >>  SOURCE_SYMBOL = r"(?:\W|\b)+[D]{,1}CONFIG_(" + SYMBOL + r")"
 >> +IF_LINE = r"^\s*(?:if)\s+" + EXPR
 >
 >
 > Why is it enclosed by "(?: )"   ?
 >
 > "(?:if)"  seems to the same as "if"
Oh you are absolutely right.
I just mindlessly copied the STMT regex and removed the other types :)
 >
 >
 >
 >
 >
 >
 >>
 >>  # regex objects
 >>  REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
 >> @@ -35,11 +36,11 @@ REGEX_KCONFIG_DEF = re.compile(DEF)
 >>  REGEX_KCONFIG_EXPR = re.compile(EXPR)
 >>  REGEX_KCONFIG_STMT = re.compile(STMT)
 >>  REGEX_KCONFIG_HELP = re.compile(r"^\s+help\s*$")
 >> +REGEX_KCONFIG_IF_LINE = re.compile(IF_LINE)
 >>  REGEX_FILTER_SYMBOLS = re.compile(r"[A-Za-z0-9]$")
 >>  REGEX_NUMERIC = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")
 >>  REGEX_QUOTES = re.compile("(\"(.*?)\")")
 >>
 >> -
 >>  def parse_options():
 >>      """The user interface of this module."""
 >>      usage = "Run this tool to detect Kconfig symbols that are 
referenced but " \
 >> @@ -445,6 +446,11 @@ def parse_kconfig_file(kfile):
 >>          line = line.strip('\n')
 >>          line = line.split("#")[0]  # ignore comments
 >>
 >> +        # 'if EXPR' lines can be after help lines
 >> +        # The if line itself is handled later
 >> +        if REGEX_KCONFIG_IF_LINE.match(line):
 >> +            skip = False
 >> +
 >
 >
 > I do not think this is the right fix.
 > There are similar patterns where
 > config references are ignored.
 >
 > For example, FOO and BAR are ignored
 > in the following cases.
 >
 > ex1)
 >
 > choice
 >           prompt "foo"
 >           default FOO
 >
 >
 >
 > ex2)
 >
 > menu "bar"
 >            depends on BAR
 >
 >
 >
 >
 > The help block ends with shallower indentation.
So IIUC we need to measure the indentation when we encounter a help
statement and in the next lines look for a line with a different depth
(which is not an empty line because these are allowed).
 >
 >
 >
 >
 >>          if REGEX_KCONFIG_DEF.match(line):
 >>              symbol_def = REGEX_KCONFIG_DEF.findall(line)
 >>              defined.append(symbol_def[0])
 >> --
 >> 2.25.1
 >>
 >
 >
 > --
 > Best Regards
 > Masahiro Yamada

Thanks for your time!

Ariel Marcovitch
Masahiro Yamada Aug. 29, 2021, 11:41 p.m. UTC | #3
On Sun, Aug 29, 2021 at 10:18 PM Ariel Marcovitch
<arielmarcovitch@gmail.com> wrote:
>
> Hello again!
>
> On 24/08/2021 16:30, Masahiro Yamada wrote:
>  > On Mon, Aug 23, 2021 at 4:22 AM Ariel Marcovitch
>  > <arielmarcovitch@gmail.com> wrote:
>  >>
>  >> When parsing Kconfig files to find symbol definitions and references,
>  >> lines after a 'help' line are skipped until a new config definition
>  >> starts.
>  >>
>  >> However, it is quite common to define a config and then make some other
>  >> configs depend on it by adding an 'if' line. This kind of kconfig
>  >> statement usually appears after a config definition which might contain
>  >> a 'help' section. The 'if' line is skipped in parse_kconfig_file()
>  >> because it is not a config definition.
>  >>
>  >> This means that symbols referenced in this kind of statements are
>  >> ignored by this function and thus are not considered undefined
>  >> references in case the symbol is not defined.
>  >>
>  >> The REGEX_KCONFIG_STMT regex can't be used because the other types of
>  >> statements can't break help lines.
>  >>
>  >> Define a new regex for matching 'if' statements and stop the 'help'
>  >> skipping in case it is encountered.
>  >>
>  >> Signed-off-by: Ariel Marcovitch <arielmarcovitch@gmail.com>
>  >> ---
>  >>  scripts/checkkconfigsymbols.py | 8 +++++++-
>  >>  1 file changed, 7 insertions(+), 1 deletion(-)
>  >>
>  >> diff --git a/scripts/checkkconfigsymbols.py
> b/scripts/checkkconfigsymbols.py
>  >> index b9b0f15e5880..875e9a2c14b2 100755
>  >> --- a/scripts/checkkconfigsymbols.py
>  >> +++ b/scripts/checkkconfigsymbols.py
>  >> @@ -26,6 +26,7 @@ EXPR = r"(?:" + OPERATORS + r"|\s|" + SYMBOL + r")+"
>  >>  DEFAULT = r"default\s+.*?(?:if\s.+){,1}"
>  >>  STMT = r"^\s*(?:if|select|imply|depends\s+on|(?:" + DEFAULT +
> r"))\s+" + EXPR
>  >>  SOURCE_SYMBOL = r"(?:\W|\b)+[D]{,1}CONFIG_(" + SYMBOL + r")"
>  >> +IF_LINE = r"^\s*(?:if)\s+" + EXPR
>  >
>  >
>  > Why is it enclosed by "(?: )"   ?
>  >
>  > "(?:if)"  seems to the same as "if"
> Oh you are absolutely right.
> I just mindlessly copied the STMT regex and removed the other types :)
>  >
>  >
>  >
>  >
>  >
>  >
>  >>
>  >>  # regex objects
>  >>  REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
>  >> @@ -35,11 +36,11 @@ REGEX_KCONFIG_DEF = re.compile(DEF)
>  >>  REGEX_KCONFIG_EXPR = re.compile(EXPR)
>  >>  REGEX_KCONFIG_STMT = re.compile(STMT)
>  >>  REGEX_KCONFIG_HELP = re.compile(r"^\s+help\s*$")
>  >> +REGEX_KCONFIG_IF_LINE = re.compile(IF_LINE)
>  >>  REGEX_FILTER_SYMBOLS = re.compile(r"[A-Za-z0-9]$")
>  >>  REGEX_NUMERIC = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")
>  >>  REGEX_QUOTES = re.compile("(\"(.*?)\")")
>  >>
>  >> -
>  >>  def parse_options():
>  >>      """The user interface of this module."""
>  >>      usage = "Run this tool to detect Kconfig symbols that are
> referenced but " \
>  >> @@ -445,6 +446,11 @@ def parse_kconfig_file(kfile):
>  >>          line = line.strip('\n')
>  >>          line = line.split("#")[0]  # ignore comments
>  >>
>  >> +        # 'if EXPR' lines can be after help lines
>  >> +        # The if line itself is handled later
>  >> +        if REGEX_KCONFIG_IF_LINE.match(line):
>  >> +            skip = False
>  >> +
>  >
>  >
>  > I do not think this is the right fix.
>  > There are similar patterns where
>  > config references are ignored.
>  >
>  > For example, FOO and BAR are ignored
>  > in the following cases.
>  >
>  > ex1)
>  >
>  > choice
>  >           prompt "foo"
>  >           default FOO
>  >
>  >
>  >
>  > ex2)
>  >
>  > menu "bar"
>  >            depends on BAR
>  >
>  >
>  >
>  >
>  > The help block ends with shallower indentation.
> So IIUC we need to measure the indentation when we encounter a help
> statement and in the next lines look for a line with a different depth
> (which is not an empty line because these are allowed).



If you want to implement it precisely, yes.


Or, if you want to adopt a simpler
solution, detect the following keywords.

comment
if
menu
choice


This is not precise, but it will work
in most cases.



In the following example, the first 'menu'
is just a comment.
The second 'menu' is a keyword since it has
a shallower indentation.



    help
       blah blah
       menu blah blah
       blah blah
  menu "menu prompt"
     depends on FOO












--
Best Regards
Masahiro Yamada
Ariel Marcovitch Sept. 1, 2021, 3:17 p.m. UTC | #4
Hi again!


On 30/08/2021 2:41, Masahiro Yamada wrote:
 > On Sun, Aug 29, 2021 at 10:18 PM Ariel Marcovitch
 > <arielmarcovitch@gmail.com> wrote:
 >>
 >> Hello again!
 >>
 >> On 24/08/2021 16:30, Masahiro Yamada wrote:
 >>  > On Mon, Aug 23, 2021 at 4:22 AM Ariel Marcovitch
 >>  > <arielmarcovitch@gmail.com> wrote:
 >>  >>
 >>  >> When parsing Kconfig files to find symbol definitions and 
references,
 >>  >> lines after a 'help' line are skipped until a new config definition
 >>  >> starts.
 >>  >>
 >>  >> However, it is quite common to define a config and then make 
some other
 >>  >> configs depend on it by adding an 'if' line. This kind of kconfig
 >>  >> statement usually appears after a config definition which might 
contain
 >>  >> a 'help' section. The 'if' line is skipped in parse_kconfig_file()
 >>  >> because it is not a config definition.
 >>  >>
 >>  >> This means that symbols referenced in this kind of statements are
 >>  >> ignored by this function and thus are not considered undefined
 >>  >> references in case the symbol is not defined.
 >>  >>
 >>  >> The REGEX_KCONFIG_STMT regex can't be used because the other 
types of
 >>  >> statements can't break help lines.
 >>  >>
 >>  >> Define a new regex for matching 'if' statements and stop the 'help'
 >>  >> skipping in case it is encountered.
 >>  >>
 >>  >> Signed-off-by: Ariel Marcovitch <arielmarcovitch@gmail.com>
 >>  >> ---
 >>  >>  scripts/checkkconfigsymbols.py | 8 +++++++-
 >>  >>  1 file changed, 7 insertions(+), 1 deletion(-)
 >>  >>
 >>  >> diff --git a/scripts/checkkconfigsymbols.py
 >> b/scripts/checkkconfigsymbols.py
 >>  >> index b9b0f15e5880..875e9a2c14b2 100755
 >>  >> --- a/scripts/checkkconfigsymbols.py
 >>  >> +++ b/scripts/checkkconfigsymbols.py
 >>  >> @@ -26,6 +26,7 @@ EXPR = r"(?:" + OPERATORS + r"|\s|" + SYMBOL + 
r")+"
 >>  >>  DEFAULT = r"default\s+.*?(?:if\s.+){,1}"
 >>  >>  STMT = r"^\s*(?:if|select|imply|depends\s+on|(?:" + DEFAULT +
 >> r"))\s+" + EXPR
 >>  >>  SOURCE_SYMBOL = r"(?:\W|\b)+[D]{,1}CONFIG_(" + SYMBOL + r")"
 >>  >> +IF_LINE = r"^\s*(?:if)\s+" + EXPR
 >>  >
 >>  >
 >>  > Why is it enclosed by "(?: )"   ?
 >>  >
 >>  > "(?:if)"  seems to the same as "if"
 >> Oh you are absolutely right.
 >> I just mindlessly copied the STMT regex and removed the other types :)
 >>  >
 >>  >
 >>  >
 >>  >
 >>  >
 >>  >
 >>  >>
 >>  >>  # regex objects
 >>  >>  REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
 >>  >> @@ -35,11 +36,11 @@ REGEX_KCONFIG_DEF = re.compile(DEF)
 >>  >>  REGEX_KCONFIG_EXPR = re.compile(EXPR)
 >>  >>  REGEX_KCONFIG_STMT = re.compile(STMT)
 >>  >>  REGEX_KCONFIG_HELP = re.compile(r"^\s+help\s*$")
 >>  >> +REGEX_KCONFIG_IF_LINE = re.compile(IF_LINE)
 >>  >>  REGEX_FILTER_SYMBOLS = re.compile(r"[A-Za-z0-9]$")
 >>  >>  REGEX_NUMERIC = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")
 >>  >>  REGEX_QUOTES = re.compile("(\"(.*?)\")")
 >>  >>
 >>  >> -
 >>  >>  def parse_options():
 >>  >>      """The user interface of this module."""
 >>  >>      usage = "Run this tool to detect Kconfig symbols that are
 >> referenced but " \
 >>  >> @@ -445,6 +446,11 @@ def parse_kconfig_file(kfile):
 >>  >>          line = line.strip('\n')
 >>  >>          line = line.split("#")[0]  # ignore comments
 >>  >>
 >>  >> +        # 'if EXPR' lines can be after help lines
 >>  >> +        # The if line itself is handled later
 >>  >> +        if REGEX_KCONFIG_IF_LINE.match(line):
 >>  >> +            skip = False
 >>  >> +
 >>  >
 >>  >
 >>  > I do not think this is the right fix.
 >>  > There are similar patterns where
 >>  > config references are ignored.
 >>  >
 >>  > For example, FOO and BAR are ignored
 >>  > in the following cases.
 >>  >
 >>  > ex1)
 >>  >
 >>  > choice
 >>  >           prompt "foo"
 >>  >           default FOO
 >>  >
 >>  >
 >>  >
 >>  > ex2)
 >>  >
 >>  > menu "bar"
 >>  >            depends on BAR
 >>  >
 >>  >
 >>  >
 >>  >
 >>  > The help block ends with shallower indentation.
 >> So IIUC we need to measure the indentation when we encounter a help
 >> statement and in the next lines look for a line with a different depth
 >> (which is not an empty line because these are allowed).
 >
 >
 >
 > If you want to implement it precisely, yes.
 >
 >
 > Or, if you want to adopt a simpler
 > solution, detect the following keywords.
 >
 > comment
 > if
 > menu
 > choice

Actually, it seems that all statements are legal in this context.

So we can just use the STMT regex!

It does require reshuffling the logic there a little but this should
do.

 >
 >
 >
 > This is not precise, but it will work
 > in most cases.
 >
 >
 >
 > In the following example, the first 'menu'
 > is just a comment.
 > The second 'menu' is a keyword since it has
 > a shallower indentation.
 >
 >
 >
 >     help
 >        blah blah
 >        menu blah blah
 >        blah blah
 >   menu "menu prompt"
 >      depends on FOO
 >
 >

Yeah this will probably drive the parser crazy (even without my
changes, although my changes might make it worse).

But the indentation solution is kinda nasty.

Do you think the STMT regex check is enough or should I handle the
cases you mentioned with the indentation check?


 >
 >
 >
 >
 >
 >
 >
 >
 >
 >
 > --
 > Best Regards
 > Masahiro Yamada

Thanks again for your time :)
Ariel Marcovitch
diff mbox series

Patch

diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
index b9b0f15e5880..875e9a2c14b2 100755
--- a/scripts/checkkconfigsymbols.py
+++ b/scripts/checkkconfigsymbols.py
@@ -26,6 +26,7 @@  EXPR = r"(?:" + OPERATORS + r"|\s|" + SYMBOL + r")+"
 DEFAULT = r"default\s+.*?(?:if\s.+){,1}"
 STMT = r"^\s*(?:if|select|imply|depends\s+on|(?:" + DEFAULT + r"))\s+" + EXPR
 SOURCE_SYMBOL = r"(?:\W|\b)+[D]{,1}CONFIG_(" + SYMBOL + r")"
+IF_LINE = r"^\s*(?:if)\s+" + EXPR
 
 # regex objects
 REGEX_FILE_KCONFIG = re.compile(r".*Kconfig[\.\w+\-]*$")
@@ -35,11 +36,11 @@  REGEX_KCONFIG_DEF = re.compile(DEF)
 REGEX_KCONFIG_EXPR = re.compile(EXPR)
 REGEX_KCONFIG_STMT = re.compile(STMT)
 REGEX_KCONFIG_HELP = re.compile(r"^\s+help\s*$")
+REGEX_KCONFIG_IF_LINE = re.compile(IF_LINE)
 REGEX_FILTER_SYMBOLS = re.compile(r"[A-Za-z0-9]$")
 REGEX_NUMERIC = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")
 REGEX_QUOTES = re.compile("(\"(.*?)\")")
 
-
 def parse_options():
     """The user interface of this module."""
     usage = "Run this tool to detect Kconfig symbols that are referenced but " \
@@ -445,6 +446,11 @@  def parse_kconfig_file(kfile):
         line = line.strip('\n')
         line = line.split("#")[0]  # ignore comments
 
+        # 'if EXPR' lines can be after help lines
+        # The if line itself is handled later
+        if REGEX_KCONFIG_IF_LINE.match(line):
+            skip = False
+
         if REGEX_KCONFIG_DEF.match(line):
             symbol_def = REGEX_KCONFIG_DEF.findall(line)
             defined.append(symbol_def[0])