riff on name

corpus arguments

The corpus switches control the XML corpus written out for use by a local search engine.

corpus.article Prefer the content of <ARTICLE> when gathering corpus text.
corpus.body Prefer the content of <BODY> when gathering corpus text. This is the default.
corpus.main Prefer the content of <MAIN> when gathering corpus text.
corpus.output file Write an XML corpus of the site to file. This is intended for use by a local search engine. If none of --corpus.article, --corpus.body, or --corpus.main is specified, the content of <BODY> is used. If more than one is specified, the text collected depends on each page’s content. This switch cannot be used with --shadow.update.
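To illustrate how the preference switches might interact, here is a rough Python model of choosing the corpus text for a single page. It is a sketch, not the checker's implementation: the names `ElementText` and `corpus_text` are hypothetical, and the rule used here (the first listed element that the page actually contains wins, falling back to <BODY> when none is present) is an assumption standing in for the unspecified "depends on each page’s content" behaviour.

```python
from html.parser import HTMLParser


class ElementText(HTMLParser):
    """Hypothetical helper: collect the text found inside each tag of interest."""

    def __init__(self, tags):
        super().__init__()
        self.depth = {t: 0 for t in tags}   # nesting depth per tracked tag
        self.chunks = {t: [] for t in tags}  # text fragments per tracked tag
        self.seen = set()                    # tracked tags the page contains

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in self.depth:
            self.depth[tag] += 1
            self.seen.add(tag)

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        # A fragment counts towards every tracked element currently open.
        for tag, depth in self.depth.items():
            if depth > 0:
                self.chunks[tag].append(data)


def _flatten(chunks):
    # Join fragments with spaces and collapse runs of whitespace.
    return " ".join(" ".join(chunks).split())


def corpus_text(html, preferred):
    """Pick corpus text for one page.

    preferred: ordered element names, e.g. ["article", "main"];
    an empty list models the default (<body>).
    ASSUMPTION: first preferred element present wins, else <body>.
    """
    order = list(preferred) or ["body"]
    parser = ElementText(set(order) | {"body"})
    parser.feed(html)
    parser.close()
    for tag in order:
        if tag in parser.seen:
            return _flatten(parser.chunks[tag])
    return _flatten(parser.chunks["body"])


page = "<html><body><nav>Menu</nav><main><p>Hello world</p></main></body></html>"
print(corpus_text(page, []))                   # Menu Hello world
print(corpus_text(page, ["main"]))             # Hello world
print(corpus_text(page, ["article", "main"]))  # Hello world
print(corpus_text(page, ["article"]))          # Menu Hello world
```

The fallback in the last call is part of the assumption: a page without <ARTICLE> still contributes its <BODY> text in this model, which may not match what the real switch does.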

Dylan Harris
December 2024