Message ID | 20221031074317.377366-1-sw@weilnetz.de (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] Add new build target 'check-spelling' |
On 31/10/2022 08.43, Stefan Weil wrote:
> `make check-spelling` can now be used to get a list of spelling errors.
> It uses the latest version of codespell, a spell checker implemented in Python.
>
> Signed-off-by: Stefan Weil <sw@weilnetz.de>
> ---
>
> This RFC can already be used for manual tests, but still reports false
> positives, mostly because some variable names are interpreted as words.
> These words can either be ignored in the check, or in some cases the code
> might be changed to use different variable names.
>
> The check currently only skips a few directories and files, so for example
> checked out submodules are also checked.
>
> The rule can be extended to allow user provided ignore and skip lists,
> for example by introducing Makefile variables CODESPELL_SKIP=userfile
> or CODESPELL_IGNORE=userfile. A limited check could be implemented by
> providing a base directory CODESPELL_START=basedirectory, for example
> CODESPELL_START=docs.
>
> Regards,
> Stefan
>
>  tests/Makefile.include       | 10 ++++++++++
>  tests/codespell/README.rst   | 18 ++++++++++++++++++
>  tests/codespell/exclude-file |  3 +++
>  tests/codespell/ignore-words | 19 +++++++++++++++++++
>  tests/requirements.txt       |  1 +
>  5 files changed, 51 insertions(+)
>  create mode 100644 tests/codespell/README.rst
>  create mode 100644 tests/codespell/exclude-file
>  create mode 100644 tests/codespell/ignore-words
>
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 9422ddaece..b9daeda932 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -155,6 +155,16 @@ check-acceptance-deprecated-warning:
>
>  check-acceptance: check-acceptance-deprecated-warning | check-avocado
>
> +.PHONY: check-spelling
> +CODESPELL_DIR=tests/codespell
> +check-spelling: check-venv
> +	source $(TESTS_VENV_DIR)/bin/activate && \
> +	cd "$(SRC_PATH)" && \
> +	codespell -s . \
> +		--exclude-file=$(CODESPELL_DIR)/exclude-file \
> +		--ignore-words=$(CODESPELL_DIR)/ignore-words \
> +		--skip="./.git,./bin,./build,./linux-headers,*.patch,nohup.out"

I like the idea, but I think it's unlikely that we can make this work for
the whole source tree any time soon. So maybe it makes more sense to start
with some few directories first (e.g. docs/ ) and then the maintainers can
opt-in by cleaning up their directories first and then by adding their
directories to this target here?

Thomas
On 31.10.22 at 08:52, Thomas Huth wrote:

> On 31/10/2022 08.43, Stefan Weil wrote:
>> `make check-spelling` can now be used to get a list of spelling errors.
>> It uses the latest version of codespell, a spell checker implemented
>> in Python.
>>
>> Signed-off-by: Stefan Weil <sw@weilnetz.de>
>> ---
>>
>> This RFC can already be used for manual tests, but still reports false
>> positives, mostly because some variable names are interpreted as words.
>> These words can either be ignored in the check, or in some cases the
>> code might be changed to use different variable names.
>>
>> The check currently only skips a few directories and files, so for
>> example checked out submodules are also checked.
>>
>> The rule can be extended to allow user provided ignore and skip lists,
>> for example by introducing Makefile variables CODESPELL_SKIP=userfile
>> or CODESPELL_IGNORE=userfile. A limited check could be implemented by
>> providing a base directory CODESPELL_START=basedirectory, for example
>> CODESPELL_START=docs.
>>
>> Regards,
>> Stefan
[...]
>> I like the idea, but I think it's unlikely that we can make this work
>> for the whole source tree any time soon. So maybe it makes more sense
>> to start with some few directories first (e.g. docs/ ) and then the
>> maintainers can opt-in by cleaning up their directories first and
>> then by adding their directories to this target here?
>
> Thomas

Even without implementing CODESPELL_START as described above, the script can
already be used and integrated into CI scripts.

It takes about 60 seconds to check the whole source tree including
submodules on my (slow) virtual machine.

The resulting output has about 20000 lines or 1272 KiB. It can be filtered
for relevant parts of the source tree or used for a summary.

Sample script: grep "^[.]" spellcheck.log | sed s/^..// | sed 's/\/.*//' |
sed s/:.*// | sort | uniq -c

This produces a summary for the top level hierarchy of files and directories:

      3 accel
      1 audio
      1 backends
     77 block
      7 block.c
     20 bsd-user
    386 capstone
     12 chardev
      1 configure
      8 contrib
      6 crypto
     64 disas
     32 docs
     31 dtc
      8 fpu
      1 gdbstub
      1 gdb-xml
      1 .github
    537 hw
      7 inc
    114 include
      1 libdecnumber
     33 linux-user
      1 MAINTAINERS
    150 meson
      6 meson.build
     16 migration
      1 nbd
      5 net
     12 pc-bios
      7 python
      3 qapi
      2 qemu
      5 qemu-options.hx
     22 qga
  14175 roms
     43 scripts
      3 semihosting
     18 slirp
      2 softmmu
     59 subprojects
    504 target
      6 tcg
      3 test.rb
    175 tests
      6 tools
     20 ui
      8 util

It shows that "roms" contributes by far the most typos. Omitting it would
reduce the required time to 22 seconds and the number of typos found (2947
lines in output) very much.

"capstone" (which has no entry in MAINTAINERS), "target" and "hw" also
contribute more than 300 hits each, therefore cc'ing Richard.

Stefan
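The summary pipeline above can also be expressed in Python, which may be easier to reuse inside a CI script. This is a minimal sketch, not part of the patch: the function names are hypothetical, and it assumes codespell hit lines look like `./path:line: word ==> suggestion`, which is what the `grep "^[.]"` step relies on. Each string operation corresponds to one `sed` step.

```python
from collections import Counter
from typing import Iterable


def summarize_by_top_level(log_lines: Iterable[str]) -> Counter:
    r"""Count codespell hits per top-level file or directory.

    Mirrors: grep "^[.]" | sed s/^..// | sed 's/\/.*//' | sed s/:.*// | sort | uniq -c
    """
    counts: Counter = Counter()
    for line in log_lines:
        if not line.startswith("."):     # grep "^[.]": keep only hit lines
            continue
        name = line[2:]                  # sed s/^..//: drop the leading "./"
        name = name.split("/", 1)[0]     # sed 's/\/.*//': cut at the first "/"
        name = name.split(":", 1)[0]     # sed s/:.*//: cut at the first ":"
        counts[name] += 1
    return counts


def format_like_uniq_c(counts: Counter) -> str:
    """Render counts the way `uniq -c` does: count right-aligned in 7 columns."""
    return "\n".join(f"{n:7d} {name}" for name, n in sorted(counts.items()))
```

For example, `print(format_like_uniq_c(summarize_by_top_level(open("spellcheck.log"))))` prints a table in the same shape as the one above. Unlike `sort`, Python's `sorted()` compares code points, so entries such as `.github` and `MAINTAINERS` may appear in different positions than in a locale-sorted listing.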
On 31/10/2022 11.44, Stefan Weil wrote:
> On 31.10.22 at 08:52, Thomas Huth wrote:
>
>> On 31/10/2022 08.43, Stefan Weil wrote:
>>> `make check-spelling` can now be used to get a list of spelling errors.
>>> It uses the latest version of codespell, a spell checker implemented in
>>> Python.
>>>
>>> Signed-off-by: Stefan Weil <sw@weilnetz.de>
>>> ---
>>>
>>> This RFC can already be used for manual tests, but still reports false
>>> positives, mostly because some variable names are interpreted as words.
>>> These words can either be ignored in the check, or in some cases the code
>>> might be changed to use different variable names.
>>>
>>> The check currently only skips a few directories and files, so for example
>>> checked out submodules are also checked.
>>>
>>> The rule can be extended to allow user provided ignore and skip lists,
>>> for example by introducing Makefile variables CODESPELL_SKIP=userfile
>>> or CODESPELL_IGNORE=userfile. A limited check could be implemented by
>>> providing a base directory CODESPELL_START=basedirectory, for example
>>> CODESPELL_START=docs.
>>>
>>> Regards,
>>> Stefan
> [...]
>>> I like the idea, but I think it's unlikely that we can make this work for
>>> the whole source tree any time soon. So maybe it makes more sense to
>>> start with some few directories first (e.g. docs/ ) and then the
>>> maintainers can opt-in by cleaning up their directories first and then by
>>> adding their directories to this target here?
>>
>> Thomas
>
>
> Even without implementing CODESPELL_START as described above, the script can
> already be used and integrated into CI scripts.
>
> It takes about 60 seconds to check the whole source tree including
> submodules on my (slow) virtual machine.
>
> The resulting output has about 20000 lines or 1272 KiB. It can be filtered
> for relevant parts of the source tree or used for a summary.
>
> Sample script: grep "^[.]" spellcheck.log | sed s/^..// | sed 's/\/.*//' |
> sed s/:.*// | sort | uniq -c
>
> This produces a summary for the top level hierarchy of files and directories:
>
>       3 accel
>       1 audio
>       1 backends
>      77 block
>       7 block.c
>      20 bsd-user
>     386 capstone
>      12 chardev
>       1 configure
>       8 contrib
>       6 crypto
>      64 disas
>      32 docs
>      31 dtc
>       8 fpu
>       1 gdbstub
>       1 gdb-xml
>       1 .github
>     537 hw
>       7 inc
>     114 include
>       1 libdecnumber
>      33 linux-user
>       1 MAINTAINERS
>     150 meson
>       6 meson.build
>      16 migration
>       1 nbd
>       5 net
>      12 pc-bios
>       7 python
>       3 qapi
>       2 qemu
>       5 qemu-options.hx
>      22 qga
>   14175 roms
>      43 scripts
>       3 semihosting
>      18 slirp
>       2 softmmu
>      59 subprojects
>     504 target
>       6 tcg
>       3 test.rb
>     175 tests
>       6 tools
>      20 ui
>       8 util
>
> It shows that "roms" contributes by far the most typos. Omitting it would
> reduce the required time to 22 seconds and the number of typos found (2947
> lines in output) very much.

"roms" mostly consists of third-party submodules that we do not have direct
control of. I think this should definitely be omitted.

> "capstone" (which has no entry in MAINTAINERS)

That's likely because it was a submodule that was removed a while ago.
"rm -rf capstone" should solve that issue in your local build tree ;-)
(yes, that's another nuisance of submodules: the checked out files don't go
away when the submodule gets removed)

Thomas
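The point that submodules should not be spell-checked could be automated rather than maintained by hand in the `--skip` list. A minimal sketch, assuming submodule paths are taken from the `path = ...` lines of `.gitmodules` and emitted in the `./`-prefixed form the patch already uses in `--skip`; the function names are hypothetical.

```python
import re
from typing import List

# Matches "path = <dir>" lines inside [submodule "..."] sections of .gitmodules.
_PATH_RE = re.compile(r"^[ \t]*path[ \t]*=[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def submodule_skip_patterns(gitmodules_text: str) -> List[str]:
    """Return a "./<path>" skip pattern for every submodule listed in .gitmodules."""
    return ["./" + m.group(1) for m in _PATH_RE.finditer(gitmodules_text)]


def build_skip_option(base: List[str], gitmodules_text: str) -> str:
    """Merge a base skip list with submodule paths into one --skip=... argument."""
    patterns = list(base)
    for p in submodule_skip_patterns(gitmodules_text):
        if p not in patterns:  # avoid duplicating entries already in the base list
            patterns.append(p)
    return "--skip=" + ",".join(patterns)
```

`git config --file .gitmodules --get-regexp path` lists the same entries if a shell-only variant is preferred. Note that this only covers submodules still listed in `.gitmodules`: a stale checkout such as the removed capstone submodule is no longer listed, so it would still be scanned until the directory is deleted.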
On Mon, Oct 31, 2022 at 11:44:48AM +0100, Stefan Weil via wrote:
> On 31.10.22 at 08:52, Thomas Huth wrote:
>
> > On 31/10/2022 08.43, Stefan Weil wrote:
> > > `make check-spelling` can now be used to get a list of spelling errors.
> > > It uses the latest version of codespell, a spell checker implemented
> > > in Python.
> > >
> > > Signed-off-by: Stefan Weil <sw@weilnetz.de>
> > > ---
> > >
> > > This RFC can already be used for manual tests, but still reports false
> > > positives, mostly because some variable names are interpreted as words.
> > > These words can either be ignored in the check, or in some cases the
> > > code might be changed to use different variable names.
> > >
> > > The check currently only skips a few directories and files, so for
> > > example checked out submodules are also checked.
> > >
> > > The rule can be extended to allow user provided ignore and skip lists,
> > > for example by introducing Makefile variables CODESPELL_SKIP=userfile
> > > or CODESPELL_IGNORE=userfile. A limited check could be implemented by
> > > providing a base directory CODESPELL_START=basedirectory, for example
> > > CODESPELL_START=docs.
> > >
> > > Regards,
> > > Stefan
> [...]
> > > I like the idea, but I think it's unlikely that we can make this
> > > work for the whole source tree any time soon. So maybe it makes more
> > > sense to start with some few directories first (e.g. docs/ ) and
> > > then the maintainers can opt-in by cleaning up their directories
> > > first and then by adding their directories to this target here?
> >
> > Thomas
>
>
> Even without implementing CODESPELL_START as described above, the script can
> already be used and integrated into CI scripts.

To get the most value from CI, we strongly prefer the test to be a clear
pass/fail. We do have some jobs that are marked non-gating, since they
have false failures and need manual inspection of results. The effect is
that those jobs are largely ignored by everyone, so they are not really of
significant benefit.

So I'd agree with Thomas about starting with a config that can get a
clear pass/fail, and expanding from there if we can't get the full tree
clean from the start.

> It shows that "roms" contributes by far the most typos. Omitting it would
> reduce the required time to 22 seconds and the number of typos found (2947
> lines in output) very much.

We should not look at 'roms' at all, since it is just a place for git
submodules and build system integration. No interesting code lives there.

> "capstone" (which has no entry in MAINTAINERS), "target" and "hw" also
> contribute more than 300 hits each, therefore cc'ing Richard.

We should completely ignore 'capstone' and any other git submodule, as
those are 3rd party codebases we don't maintain ourselves.

With regards,
Daniel
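The request for a clear pass/fail result can be combined with the opt-in idea: keep reporting hits everywhere, but fail only for directories whose maintainers have opted in. A minimal sketch, assuming the same `./path:line: word ==> suggestion` hit format; `OPTED_IN` and the function names are hypothetical, with `docs` taken from the suggested starting point. Passing only the opted-in directories to codespell as path arguments would be a faster alternative if the full-tree report is not needed.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical opt-in list: directories whose maintainers have cleaned them up.
OPTED_IN: Tuple[str, ...] = ("docs",)


def hit_path(line: str) -> str:
    """Extract "path" from a codespell hit line "./path:line: word ==> fix"."""
    return line[2:].split(":", 1)[0]


def in_opted_in(path: str, opted_in: Sequence[str]) -> bool:
    """True if path is an opted-in entry or lies below an opted-in directory."""
    return any(path == d or path.startswith(d.rstrip("/") + "/") for d in opted_in)


def gate(log_lines: Iterable[str],
         opted_in: Sequence[str] = OPTED_IN) -> Tuple[int, List[str]]:
    """Return (exit status, blocking hits); status is 1 if any hit is opted in."""
    blocking = [
        line for line in log_lines
        if line.startswith("./") and in_opted_in(hit_path(line), opted_in)
    ]
    return (1 if blocking else 0), blocking
```

A CI job could end with `sys.exit(gate(lines)[0])`, so the job is gating only for cleaned-up directories while the rest of the log remains informational.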
On 31/10/22 08:43, Stefan Weil via wrote:
> `make check-spelling` can now be used to get a list of spelling errors.
> It uses the latest version of codespell, a spell checker implemented in Python.
>
> Signed-off-by: Stefan Weil <sw@weilnetz.de>
> ---
>
> This RFC can already be used for manual tests, but still reports false
> positives, mostly because some variable names are interpreted as words.
> These words can either be ignored in the check, or in some cases the code
> might be changed to use different variable names.
>
> The check currently only skips a few directories and files, so for example
> checked out submodules are also checked.
>
> The rule can be extended to allow user provided ignore and skip lists,
> for example by introducing Makefile variables CODESPELL_SKIP=userfile
> or CODESPELL_IGNORE=userfile. A limited check could be implemented by
> providing a base directory CODESPELL_START=basedirectory, for example
> CODESPELL_START=docs.
>
> Regards,
> Stefan
>
>  tests/Makefile.include       | 10 ++++++++++
>  tests/codespell/README.rst   | 18 ++++++++++++++++++
>  tests/codespell/exclude-file |  3 +++
>  tests/codespell/ignore-words | 19 +++++++++++++++++++
>  tests/requirements.txt       |  1 +
>  5 files changed, 51 insertions(+)
>  create mode 100644 tests/codespell/README.rst
>  create mode 100644 tests/codespell/exclude-file
>  create mode 100644 tests/codespell/ignore-words

Just wondering about this list...

> +++ b/tests/codespell/ignore-words
> @@ -0,0 +1,19 @@
> +buid

What is 'buid'? PPC-specific apparently.

> +busses
> +dout
> +falt
> +fpr
> +hace
> +hax
> +hda
> +nd

Apparently 'NIC info'...

> +ot

Is 'ot' MemOp?

> +pard
> +parm
> +ptd
> +ser
> +som
> +synopsys
> +te

Is that 'target endianness'?

> +toke

Where is 'toke'?

> +ue

Where is 'ue'?
Below I have added some examples of the words which are currently ignored by
codespell.

On 31.10.22 at 16:40, Philippe Mathieu-Daudé wrote:
> On 31/10/22 08:43, Stefan Weil via wrote:
>> `make check-spelling` can now be used to get a list of spelling errors.
>> It uses the latest version of codespell, a spell checker implemented
>> in Python.
>>
>> Signed-off-by: Stefan Weil <sw@weilnetz.de>
>> ---
>>
>> This RFC can already be used for manual tests, but still reports false
>> positives, mostly because some variable names are interpreted as words.
>> These words can either be ignored in the check, or in some cases the
>> code might be changed to use different variable names.
>>
>> The check currently only skips a few directories and files, so for
>> example checked out submodules are also checked.
>>
>> The rule can be extended to allow user provided ignore and skip lists,
>> for example by introducing Makefile variables CODESPELL_SKIP=userfile
>> or CODESPELL_IGNORE=userfile. A limited check could be implemented by
>> providing a base directory CODESPELL_START=basedirectory, for example
>> CODESPELL_START=docs.
>>
>> Regards,
>> Stefan
>>
>>  tests/Makefile.include       | 10 ++++++++++
>>  tests/codespell/README.rst   | 18 ++++++++++++++++++
>>  tests/codespell/exclude-file |  3 +++
>>  tests/codespell/ignore-words | 19 +++++++++++++++++++
>>  tests/requirements.txt       |  1 +
>>  5 files changed, 51 insertions(+)
>>  create mode 100644 tests/codespell/README.rst
>>  create mode 100644 tests/codespell/exclude-file
>>  create mode 100644 tests/codespell/ignore-words
>
> Just wondering about this list...
>
>> +++ b/tests/codespell/ignore-words
>> @@ -0,0 +1,19 @@
>> +buid
>
> What is 'buid'? PPC-specific apparently.

hw/ppc/spapr_pci.c:SpaprPhbState *spapr_pci_find_phb(SpaprMachineState *spapr, uint64_t buid)
include/hw/ppc/xics.h: * We currently only support one BUID which is our interrupt base
[...]

>> +busses
>> +dout
>> +falt
>> +fpr
>> +hace
>> +hax
>> +hda
>> +nd
>
> Apparently 'NIC info'...

hw/arm/aspeed.c: NICInfo *nd = &nd_table[0];
hw/display/macfb.c: NubusDevice *nd = NUBUS_DEVICE(s);
[...]

>> +ot
>
> Is 'ot' MemOp?

target/i386/tcg/decode-new.c.inc:static bool decode_op_size(DisasContext *s, X86OpEntry *e, X86OpSize size, MemOp *ot)
[...]

>> +pard
>> +parm
>> +ptd
>> +ser
>> +som
>> +synopsys
>> +te
>
> Is that 'target endianness'?

accel/tcg/cputlb.c: * @te: pointer to CPUTLBEntry
hw/audio/cs4231a.c:#define TE (1 << 6)
[...]

>> +toke
>
> Where is 'toke'?

This one is no longer needed. It was used in the old capstone code, which I
still had in my local sources.

>> +ue
> Where is 'ue'?

tests/tcg/i386/test-i386-fp-exceptions.c:#define UE (1 << 4)
tests/unit/test-keyval.c: qdict = keyval_parse("val,,ue", "implied", NULL, &err);
[...]

I simply added some examples of "words" which occurred often and which were
reported by codespell as typos. These "typos" occur at least 10 times (list
produced with `grep "^[a-z]" codespell.log | sort -n +1`):

statics      10
regiser      11
usig         11
inh          12
tne          12
overriden    13
inactivate   15
upto         15
hsa          16
useable      17
daa          18
crate        21
endianess    22
olt          22
sring        23
vill         25
keypairs     35
gir          46
sav          47
asign       120
inflight    191

Some of them are real typos; others, like aSign or statics, are variable
names and should be ignored, too.

Stefan
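As the 'toke' case shows, entries in `ignore-words` can outlive the code that needed them. A minimal sketch for flagging candidates to drop, assuming a case-insensitive whole-word regex search is a good enough approximation; codespell's own word splitting may differ, so the result is a list of hints to double-check rather than a verdict. The function name is hypothetical.

```python
import re
from typing import Dict, List


def stale_ignore_words(ignore_words: List[str], files: Dict[str, str]) -> List[str]:
    """Return ignore-words entries that no longer occur as a whole word in any file.

    `files` maps a path to its text. Matching is case-insensitive on regex word
    boundaries, an approximation of how codespell splits source text into words.
    """
    stale = []
    for word in ignore_words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        if not any(pattern.search(text) for text in files.values()):
            stale.append(word)
    return stale
```

With the grep hits quoted above for 'buid' and 'te', and no remaining use of 'toke', the function would report only `["toke"]`. The `files` mapping should be built with the same skip list as the spelling check, otherwise a hit inside a skipped submodule would keep a word alive.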
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 9422ddaece..b9daeda932 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -155,6 +155,16 @@ check-acceptance-deprecated-warning:
 
 check-acceptance: check-acceptance-deprecated-warning | check-avocado
 
+.PHONY: check-spelling
+CODESPELL_DIR=tests/codespell
+check-spelling: check-venv
+	source $(TESTS_VENV_DIR)/bin/activate && \
+	cd "$(SRC_PATH)" && \
+	codespell -s . \
+		--exclude-file=$(CODESPELL_DIR)/exclude-file \
+		--ignore-words=$(CODESPELL_DIR)/ignore-words \
+		--skip="./.git,./bin,./build,./linux-headers,*.patch,nohup.out"
+
 # Consolidated targets
 
 .PHONY: check check-clean get-vm-images
diff --git a/tests/codespell/README.rst b/tests/codespell/README.rst
new file mode 100644
index 0000000000..67e070d631
--- /dev/null
+++ b/tests/codespell/README.rst
@@ -0,0 +1,18 @@
+=============================
+Check spelling with codespell
+=============================
+
+`make check-spelling` can be used to get a list of spelling errors.
+It reports files with spelling errors and a summary of all misspelled words.
+The report is generated by the latest version of codespell, a spell checker
+implemented in Python.
+
+See https://github.com/codespell-project/codespell for more information.
+
+Some file patterns are excluded from the check.
+
+In addition tests/codespell includes several files which are used to
+suppress certain false positives in the codespell report.
+
+exclude-file - complete lines which should be ignored
+ignore-words - list of words which should be ignored
diff --git a/tests/codespell/exclude-file b/tests/codespell/exclude-file
new file mode 100644
index 0000000000..57de81a4eb
--- /dev/null
+++ b/tests/codespell/exclude-file
@@ -0,0 +1,3 @@
+ * VAS controller.
+number generator daemon such as the one found in the vhost-device crate of
+introspection. The latter can conceivably confuse clients, so tread
diff --git a/tests/codespell/ignore-words b/tests/codespell/ignore-words
new file mode 100644
index 0000000000..4d336a2f44
--- /dev/null
+++ b/tests/codespell/ignore-words
@@ -0,0 +1,19 @@
+buid
+busses
+dout
+falt
+fpr
+hace
+hax
+hda
+nd
+ot
+pard
+parm
+ptd
+ser
+som
+synopsys
+te
+toke
+ue
diff --git a/tests/requirements.txt b/tests/requirements.txt
index 0ba561b6bd..dd44e6768f 100644
--- a/tests/requirements.txt
+++ b/tests/requirements.txt
@@ -4,3 +4,4 @@
 # Note that qemu.git/python/ is always implicitly installed.
 avocado-framework==88.1
 pycdlib==1.11.0
+codespell
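The README describes `exclude-file` as a list of complete lines to ignore. The toy model below illustrates why that suits a line like the "vhost-device crate" one, where "crate" is correct but is reported as a typo elsewhere ("crate" appears 21 times in the list of frequent hits in this thread). It is not codespell's implementation: `TOY_DICT` and its crate-to-create mapping are assumptions for illustration, and lines are compared verbatim without their trailing newline.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for codespell's dictionary: misspelling -> suggestion.
TOY_DICT: Dict[str, str] = {"crate": "create"}


def check_lines(lines: Iterable[str], excluded: Set[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, word, suggestion) hits, skipping verbatim-excluded lines."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line in excluded:  # the whole line must match an exclude-file entry
            continue
        for word in re.findall(r"[A-Za-z]+", line):
            suggestion = TOY_DICT.get(word.lower())
            if suggestion is not None:
                hits.append((lineno, word, suggestion))
    return hits
```

Because the exemption is tied to one exact line, the same word on any other line is still reported. That keeps the exception much narrower than adding "crate" to `ignore-words`, which would silence it everywhere.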
`make check-spelling` can now be used to get a list of spelling errors.
It uses the latest version of codespell, a spell checker implemented in Python.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
---

This RFC can already be used for manual tests, but still reports false
positives, mostly because some variable names are interpreted as words.
These words can either be ignored in the check, or in some cases the code
might be changed to use different variable names.

The check currently only skips a few directories and files, so for example
checked out submodules are also checked.

The rule can be extended to allow user provided ignore and skip lists,
for example by introducing Makefile variables CODESPELL_SKIP=userfile
or CODESPELL_IGNORE=userfile. A limited check could be implemented by
providing a base directory CODESPELL_START=basedirectory, for example
CODESPELL_START=docs.

Regards,
Stefan

 tests/Makefile.include       | 10 ++++++++++
 tests/codespell/README.rst   | 18 ++++++++++++++++++
 tests/codespell/exclude-file |  3 +++
 tests/codespell/ignore-words | 19 +++++++++++++++++++
 tests/requirements.txt       |  1 +
 5 files changed, 51 insertions(+)
 create mode 100644 tests/codespell/README.rst
 create mode 100644 tests/codespell/exclude-file
 create mode 100644 tests/codespell/ignore-words
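The proposed `CODESPELL_SKIP`, `CODESPELL_IGNORE` and `CODESPELL_START` variables could be turned into a codespell command line roughly as follows. A minimal sketch, assuming `CODESPELL_SKIP` names a file with one skip pattern per line and that the installed codespell accepts `--ignore-words` more than once (check `codespell --help` for your version); the function names are hypothetical, and the defaults are copied from the `check-spelling` rule.

```python
from typing import Callable, List, Mapping

# Defaults copied from the check-spelling rule in tests/Makefile.include.
CODESPELL_DIR = "tests/codespell"
DEFAULT_SKIP = ["./.git", "./bin", "./build", "./linux-headers", "*.patch", "nohup.out"]


def read_patterns(path: str) -> List[str]:
    """Read one pattern per line, ignoring blank lines (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def codespell_argv(env: Mapping[str, str],
                   read: Callable[[str], List[str]] = read_patterns) -> List[str]:
    """Build the codespell command line from hypothetical CODESPELL_* variables."""
    skip = list(DEFAULT_SKIP)
    if env.get("CODESPELL_SKIP"):
        skip += read(env["CODESPELL_SKIP"])  # user-provided skip patterns
    argv = [
        "codespell", "-s",
        f"--exclude-file={CODESPELL_DIR}/exclude-file",
        f"--ignore-words={CODESPELL_DIR}/ignore-words",
    ]
    if env.get("CODESPELL_IGNORE"):
        argv.append(f"--ignore-words={env['CODESPELL_IGNORE']}")  # user word list
    argv.append("--skip=" + ",".join(skip))
    argv.append(env.get("CODESPELL_START") or ".")  # base directory, default: whole tree
    return argv
```

It could be run from the source directory with `subprocess.run(codespell_argv(os.environ), cwd=src_path)`, where `src_path` is a placeholder for the configured source path. Keeping the skip patterns in a single comma-separated `--skip` value matches what the Makefile rule already does.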