Site

This panel, part of the Settings Dialogue, lets you provide important details about the website to be analysed.

Default Root Directory

Here you specify the root, the topmost directory, of the site you want to analyse. This directory contains the source HTML, and probably many other files, that make up the website.

Websites normally have at least one domain, and often more than one. You can specify those here. For example, the domains for SSC are ssc.lu and www.ssc.lu.
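
To illustrate why the domain list matters, here is a minimal Python sketch of how a checker might use it to tell links within the site from external ones. It is not SSC's own code: the names SITE_DOMAINS and is_internal are hypothetical, and the sketch ignores schemes such as mailto:.

```python
from urllib.parse import urlsplit

# Hypothetical list of the domains that belong to the site being analysed.
SITE_DOMAINS = {"ssc.lu", "www.ssc.lu"}

def is_internal(url, site_domains=SITE_DOMAINS):
    """Return True if the link stays within the site."""
    host = urlsplit(url).hostname  # already lower-cased by urlsplit
    if host is None:
        # No host part: a relative link such as "/about.html".
        return True
    return host in site_domains

print(is_internal("https://www.ssc.lu/index.html"))
print(is_internal("https://example.org/"))
```

With both ssc.lu and www.ssc.lu listed, a link to either host counts as part of the site rather than as an external link.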

Some websites also have virtual directories, which you can specify in the virtual and validation panel.

Checkboxes

There are a number of options associated with link checking.

Check links

SSC can validate links found on your site. It verifies that a link points to a page that exists, and that the link is good (for example, a link to a valid page that refers to a HIDDEN element is illegal).

Check external links

Select this option if you want to test external site links.

Report example domains

Certain domain names are reserved for use in sample text, so their use on a live web page is an error. For example, your web page should never actually reference example.com.
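
As a rough illustration, the following Python sketch flags links to the example names reserved by RFC 2606 (example.com, example.net, example.org and the .example top-level domain). Whether SSC uses exactly this list is an assumption, and is_example_domain is a hypothetical name.

```python
from urllib.parse import urlsplit

# Second-level names reserved for documentation by RFC 2606.
# Assumption: SSC's own list may differ.
EXAMPLE_DOMAINS = ("example.com", "example.net", "example.org")

def is_example_domain(url):
    """Return True if the URL's host is a reserved example domain."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    # The whole ".example" top-level domain is reserved too.
    if host == "example" or host.endswith(".example"):
        return True
    # Match the reserved name itself or any subdomain of it.
    return any(host == d or host.endswith("." + d) for d in EXAMPLE_DOMAINS)

print(is_example_domain("https://www.example.com/page"))
print(is_example_domain("https://ssc.lu/"))
```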

Report forwards

When SSC checks an external domain, that domain may reply with a forward message, indicating that the page has been forwarded to another place. This is often a temporary measure, for example when a site is being refreshed, but may be a permanent change. You’ll have to use your best judgement to decide which is which.
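
Assuming that a forward corresponds to an HTTP redirect response, the status code gives a first hint about whether the move is meant to be permanent. The sketch below is illustrative only; classify_forward is a hypothetical helper, not part of SSC.

```python
from http import HTTPStatus

# Redirect codes whose standard meaning is a permanent move.
PERMANENT = {HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.PERMANENT_REDIRECT}  # 301, 308
# Redirect codes whose standard meaning is a temporary move.
TEMPORARY = {HTTPStatus.FOUND, HTTPStatus.SEE_OTHER, HTTPStatus.TEMPORARY_REDIRECT}  # 302, 303, 307

def classify_forward(status):
    """Return 'permanent', 'temporary', or None if the status is not a redirect."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return None

print(classify_forward(301))
print(classify_forward(302))
```

Servers do not always use these codes as intended, so the result is a hint rather than a verdict.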

Report local domains

A number of domain names are actually local to your network, by convention. Select this option to have SSC report them.

Report special domains

Certain domain names are special, and should never be used. Again, SSC can report if it finds them on the website.
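
This page does not say which names SSC treats as local or special. As a sketch under that caveat, the Python below classifies hosts by top-level domain using names reserved by RFC 6761 (.localhost, .test, .invalid) and RFC 6762 (.local). The two sets and the function classify_domain are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumed lists; SSC's actual lists are not documented on this page.
LOCAL_TLDS = {"local", "localhost"}   # conventionally resolve inside your own network
SPECIAL_TLDS = {"test", "invalid"}    # reserved, never meant to resolve publicly

def classify_domain(url):
    """Return 'local', 'special' or 'ordinary' for the URL's host."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    tld = host.rsplit(".", 1)[-1]  # last label of the host name
    if tld in LOCAL_TLDS:
        return "local"
    if tld in SPECIAL_TLDS:
        return "special"
    return "ordinary"

print(classify_domain("http://printer.local/status"))
print(classify_domain("http://shop.test/"))
```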

Test an external link once

If you select this option, then SSC will keep track of which links it has checked as it analyses a site, and, if it encounters that link again, it won’t repeat the check.

This can be rather important. If you have a large site, and many pages reference another site many times, as mine does at the bottom of each page with a reference to github.org, then having SSC check that link on each page risks the linked site concluding that you are conducting a denial-of-service attack. If that happens, those link checks are blocked, and SSC thinks the page has a bad link.
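
The idea of remembering earlier checks can be sketched in a few lines of Python. This is not SSC's implementation; the class OnceChecker is hypothetical, and the function passed to it stands in for whatever performs the real network request.

```python
class OnceChecker:
    """Wrap a link-checking function so each URL is checked at most once."""

    def __init__(self, check):
        self._check = check    # function taking a URL and returning True or False
        self._results = {}     # URL -> result of the first check

    def is_good(self, url):
        if url not in self._results:
            # First sighting of this URL: perform the real check.
            self._results[url] = self._check(url)
        # Later sightings reuse the stored result; no new request is made.
        return self._results[url]

# Usage with a stand-in check that records how often it is called.
requests_made = []
checker = OnceChecker(lambda url: requests_made.append(url) or True)
for _ in range(3):
    checker.is_good("https://github.org/")
print(len(requests_made))
```

However many pages carry the same footer link, the remote site sees a single request.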

Check crosslinked IDs

Links between pages on your website might include IDs, which reference particular points on a webpage. SSC can check whether that ID is valid on that page.
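
As an illustration of what such a check involves, the Python sketch below collects every id attribute from the target page and tests whether the link's fragment (the part after #) is among them. IdCollector and fragment_is_valid are hypothetical names, and the sketch assumes the target page's HTML has already been read into a string.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

class IdCollector(HTMLParser):
    """Gather the value of every id attribute in a page."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)

def fragment_is_valid(link, target_html):
    """Return True if the link has no fragment, or its fragment is an id in target_html."""
    fragment = urldefrag(link).fragment
    if not fragment:
        return True  # nothing to cross-check
    collector = IdCollector()
    collector.feed(target_html)
    return fragment in collector.ids

PAGE = '<h2 id="install">Install</h2><p>Run the installer.</p>'
print(fragment_is_valid("guide.html#install", PAGE))
print(fragment_is_valid("guide.html#usage", PAGE))
```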

Test revocation

Links to external websites usually use the https protocol. This depends on the public key infrastructure to secure access to that website, which is achieved by the use of certificates. Sometimes something goes wrong, which means a certificate has to be revoked. Select this option to have SSC check for revocation for https links.

This option can reduce performance.

Output search engine corpus

SSC can output an XML file containing important content from the website, for use in constructing data used by local search engines.

If you decide to do so, you must specify the name of the output file.

You should also specify which element SSC should use to determine the content to export, from the choice of <ARTICLE>, <BODY> and <MAIN>.
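
To show the effect of the element choice, here is a minimal Python sketch that extracts the text inside the chosen element of each page and writes it into an XML document. The XML layout SSC actually produces is not described on this page, so the corpus and page element names, the path attribute, and the functions below are all hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class ElementText(HTMLParser):
    """Collect the text found inside one named element, e.g. 'main'."""

    def __init__(self, element):
        super().__init__()
        self.element = element
        self.depth = 0      # greater than zero while inside the chosen element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.element:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.element and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())

def corpus_xml(pages, element="main"):
    """Build an XML string from a mapping of page path to HTML source."""
    root = ET.Element("corpus")  # hypothetical element name
    for path, html in pages.items():
        parser = ElementText(element)
        parser.feed(html)
        parser.close()
        page = ET.SubElement(root, "page", {"path": path})
        page.text = " ".join(parser.chunks)
    return ET.tostring(root, encoding="unicode")

pages = {"index.html": "<body><nav>Menu</nav><main><h1>Hello</h1><p>World</p></main></body>"}
print(corpus_xml(pages, element="main"))
```

Choosing main here leaves the navigation text out of the corpus, whereas choosing body would include it.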