corpus arguments
The corpus switches control the XML corpus of site text that is written out for use by a local search engine.
corpus.article | Prefer the content of <ARTICLE> when gathering corpus text. |
corpus.body | Prefer the content of <BODY> when gathering corpus text. This is the default. |
corpus.main | Prefer the content of <MAIN> when gathering corpus text. |
corpus.output file | Write an XML corpus of the site to file, for use by a local search engine. If none of --corpus.article, --corpus.body, or --corpus.main is specified, the content of <BODY> is used. If more than one is specified, the text collected depends on each page’s content. This switch is incompatible with --shadow.update. |
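
The preference switches in the table above can be pictured with a small sketch. This is not the tool's own code. It assumes one plausible reading of "prefer": try each preferred element in the order given, and fall back to <BODY> when a page has no text in any of them. The names `corpus_text` and `_TextByElement` are hypothetical, and the exact rule the tool applies when several switches are combined is not stated here.

```python
from html.parser import HTMLParser


class _TextByElement(HTMLParser):
    """Collect the text found inside each element of interest (hypothetical helper)."""

    def __init__(self, wanted):
        super().__init__()
        # Nesting depth and collected text for every element we care about.
        self.depth = {name: 0 for name in wanted}
        self.text = {name: [] for name in wanted}

    def handle_starttag(self, tag, attrs):
        if tag in self.depth:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        # Text belongs to every wanted element that is currently open.
        for name, depth in self.depth.items():
            if depth > 0:
                self.text[name].append(data)


def corpus_text(html, preferred=()):
    """Return text from the first preferred element that has any, else from <body>.

    ASSUMPTION: this fallback order is an illustration only, not the
    documented behaviour of --corpus.article, --corpus.body, or --corpus.main.
    """
    order = list(preferred) + ["body"]
    parser = _TextByElement(order)
    parser.feed(html)
    parser.close()
    for name in order:
        # Join chunks with spaces and collapse runs of whitespace (a simplification).
        text = " ".join(" ".join(parser.text[name]).split())
        if text:
            return text
    return ""


page = "<html><body><nav>Menu</nav><main><p>Hello world</p></main></body></html>"
print(corpus_text(page))                       # default: content of <body>
print(corpus_text(page, ["main"]))             # like --corpus.main
print(corpus_text(page, ["article", "main"]))  # no <article>, so <main> is used
```

Under this reading, a page without the preferred element still contributes text through <BODY>, which is one way the text collected could depend on a page's content.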
Dylan Harris
December 2024