SANH Documentation

Basic Search

To run a basic search, enter any text string in the search bar, such as sa-an-ah-mi, and press the Enter key or click the magnifying glass icon. Spaces are respected in searches, so multi-word searches such as sa-an-ah-mi GIM are possible.

Advanced Search Options

Strict Characters

By default, "strict character" searches are disabled. This helps when the user is unconcerned with differences such as that between a and á, and it avoids the annoyance of typing characters such as ḫ, which is not functionally distinct from h in Hittite studies. Some characters for which there are no good equivalents (i.e. the single-character ellipsis …, and the two varieties of Glossenkeile 𒑱 , 𒀹) must still be entered exactly. This is made easier by the virtual keyboard below the search bar. The normalization of characters provides the following correspondences:

Plain Character	Normalized Correspondences
h	h, ḫ
s	s, š, ṣ
a	a, á, à
i	i, í, ì
u	u, ú, ù
e	e, é, è
t	t, ṭ
'	', ʾ
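The correspondences in the table can be sketched as a simple character-folding step. This is only an illustration of the idea, not SANH's actual implementation: the names `CORRESPONDENCES`, `fold`, and `loose_match` are hypothetical, and the assumption is that both the query and the corpus text are folded to plain characters before they are compared.

```python
import unicodedata

# Correspondences copied from the table above: plain character -> variants.
CORRESPONDENCES = {
    "h": "ḫ",
    "s": "šṣ",
    "a": "áà",
    "i": "íì",
    "u": "úù",
    "e": "éè",
    "t": "ṭ",
    "'": "ʾ",
}

# Map every variant back to its plain character.
# NFC guards against the literals above having been saved in decomposed form.
FOLD_TABLE = str.maketrans({
    variant: plain
    for plain, variants in CORRESPONDENCES.items()
    for variant in unicodedata.normalize("NFC", variants)
})


def fold(text: str) -> str:
    """Hypothetical helper: reduce text to plain characters."""
    return text.translate(FOLD_TABLE)


def loose_match(query: str, line: str) -> bool:
    """Assumed non-strict comparison: fold both sides, then test containment."""
    return fold(query) in fold(line)


print(fold("ša-an-aḫ-mi"))
print(loose_match("sa-an-ah-mi", "ša-an-aḫ-mi GIM-an"))
```

Folding both sides means a query typed with or without diacritics finds the same lines, which is the behavior described for non-strict searching.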

Warning

When using special characters, you MUST enable strict character searching for accurate results.

If characters with accent marks or other diacritics need to be distinguished, activate the Strict characters toggle. Keep in mind that with this setting active, ḫ must also be entered explicitly. This is made easier by a small menu directly below the search bar, indicated by a keyboard icon, which offers virtual input of some of the characters most specific to Hittite studies. Visually identical characters like š (composed character, i.e. a letter plus a combining mark) are always normalized to š (single Unicode character, the corpus default) in order to return proper results, so that users may compose characters freely.
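The normalization of visually identical characters can be illustrated with Unicode normalization form NFC. Whether SANH uses NFC specifically is an assumption; the sketch only shows how a letter typed with a combining mark can be reduced to the single code point described above. The helper name `compose` is hypothetical.

```python
import unicodedata


def compose(text: str) -> str:
    """Turn base letter + combining mark sequences into single code points."""
    return unicodedata.normalize("NFC", text)


decomposed = "s\u030c"   # 's' followed by COMBINING CARON: two code points
precomposed = "\u0161"   # LATIN SMALL LETTER S WITH CARON: one code point

print(decomposed == precomposed)           # the raw strings differ
print(compose(decomposed) == precomposed)  # after composition they agree
print(len(decomposed), len(compose(decomposed)))
```

Without a step like this, two strings that look the same on screen would fail to match because their underlying code points differ.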

Tip

Regardless of the strict character setting, searches like ka4 are always normalized and return ka₄. Likewise, searching for fractions like 1/2 will always correctly return ½.
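A rough sketch of this kind of query normalization follows. The rule used here (digits that directly follow a letter become subscript sign indices) and the contents of `FRACTIONS` beyond 1/2 are assumptions made for illustration; `normalize_numbers` is a hypothetical name and the site's real rules may differ.

```python
import re

# ASCII digits -> subscript digits, e.g. 4 -> ₄.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

# Only 1/2 is named in the documentation; the other entries are assumed.
FRACTIONS = {"1/2": "½", "1/3": "⅓", "2/3": "⅔"}


def normalize_numbers(query: str) -> str:
    # Replace ASCII fractions first so their digits are left alone afterwards.
    for ascii_form, symbol in FRACTIONS.items():
        query = query.replace(ascii_form, symbol)
    # Digits directly after a letter are treated as sign indices: ka4 -> ka₄.
    return re.sub(
        r"(?<=[^\W\d_])\d+",
        lambda m: m.group().translate(SUBSCRIPTS),
        query,
    )


print(normalize_numbers("ka4"))
print(normalize_numbers("1/2"))
print(normalize_numbers("10 ka4"))
```

Free-standing numbers such as 10 are left untouched because they are not preceded by a letter.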

Case sensitive

This option toggles case sensitivity. It is especially useful for isolating Akkadograms and Sumerograms.

Search Language

The available search languages are:

Akkadian
Hattian
Hittite
Hurrian
Kalašmaic
Luwian
Palaic
Sumerian

Language is not consistently marked in the corpus, so anything not marked is considered Hittite by default. Any combination of search languages may be used. Each word is first checked for word-level language information, and any such information is compared against the selected languages. The language encoding for the line is then examined to see whether the line as a whole is identified as one of the search languages. A result is returned if either the individual word matches one of the search languages OR the line data matches one of the search languages.
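The word-then-line check described above might look roughly like the following. The `Word` and `Line` classes are a hypothetical data model, not the corpus format, and applying the Hittite default at the line level when nothing is marked is an assumption about how the default works.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_LANGUAGE = "Hittite"  # unmarked material counts as Hittite


@dataclass
class Word:
    text: str
    language: Optional[str] = None  # None means no word-level marking


@dataclass
class Line:
    words: list[Word] = field(default_factory=list)
    language: Optional[str] = None  # None means no line-level marking


def matches_language(word: Word, line: Line, selected: set[str]) -> bool:
    # Step 1: word-level marking, if present, may match on its own.
    if word.language is not None and word.language in selected:
        return True
    # Step 2: otherwise fall back to the line, defaulting to Hittite.
    line_language = line.language or DEFAULT_LANGUAGE
    return line_language in selected


luwian_word = Word("za-a-ti", language="Luwian")  # invented example
plain_word = Word("nu")
unmarked_line = Line(words=[luwian_word, plain_word])

print(matches_language(luwian_word, unmarked_line, {"Luwian"}))
print(matches_language(plain_word, unmarked_line, {"Hittite"}))
print(matches_language(plain_word, unmarked_line, {"Luwian"}))
```

Because the two checks are joined by OR, a word marked as one language inside a line marked as another is returned for either selection.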

Regular Expression (RegEx) Search

Ticking this box enables searching using Regular Expression syntax. All RegEx searches require strict character usage and are case sensitive by default (although case sensitivity can be controlled within the RegEx expression itself). Because both behaviors are required, ticking this box disables the strict character and case sensitive options in the Advanced Search Options menu. Likewise, the type of search is restricted to strict transcription; RegEx search is unavailable for lemmata or cuneiform searching.

RegEx syntax is a method of conducting extremely specific searches by controlling nearly every variable of a search string: creating groups of options, controlling for specific sequences, modifying case sensitivity, excluding characters, checking for whitespace, and more. For the specifics of the many options made available by regular expressions, please see the Mozilla documentation at MDN.

In the search box, RegEx searches should be entered in the form /ab+c/g, where the two slashes enclose the search string, here specifying "the character a, followed by one or more b, followed by c." Any flags (such as g for "global") come after the closing slash.

Another sample search would be /(^|\s)da-ḫu-u/g, which searches for the string da-ḫu-u only where it is preceded by a space or a line beginning. In effect, this allows one to search for word-initial instances of the relevant string (barring instances in which it is preceded by a determinative, which could be controlled for by a slightly more complex expression).
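The two sample expressions above can be tried out offline. The sketch below uses Python's `re` module purely for illustration; the search box itself takes the /pattern/flags form, and the test strings are invented rather than taken from the corpus.

```python
import re

# /ab+c/ : the character a, one or more b, then c.
simple = re.compile(r"ab+c")

# /(^|\s)da-ḫu-u/ : the string must follow a line beginning or whitespace.
word_initial = re.compile(r"(^|\s)da-ḫu-u")

# Invented strings for illustration only, not corpus citations.
lines = [
    "da-ḫu-u",        # at the line beginning
    "nu da-ḫu-u",     # after a space
    "an-da-ḫu-u",     # inside a longer hyphenated word
]

hits = [line for line in lines if word_initial.search(line)]
print(hits)

print(bool(simple.fullmatch("abbbc")))  # one or more b is required
print(bool(simple.fullmatch("ac")))     # no b at all
```

Only the first two strings are kept, since in the third the sequence is preceded by a hyphen rather than by a space or the line beginning.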

Warning

Keep in mind that the word-boundary sequence \b will produce unexpected results, because RegEx syntax treats the hyphen (-) used to separate signs as a word-separating symbol. The better sequence to use is generally something like the above example, which captures both spaces and line-initial instances (again with modifications to account for edge cases like determinatives, if needed).
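The pitfall can be reproduced in a few lines. As before, Python's `re` module stands in for the site's RegEx engine, and the test string is invented.

```python
import re

line = "an-da-ḫu-u"  # the target sequence sits in the middle of a word

# \b sees a boundary between "-" and "d", so this matches mid-word.
naive = re.search(r"\bda-ḫu-u", line)

# Requiring a line beginning or whitespace rejects the mid-word position.
safer = re.search(r"(^|\s)da-ḫu-u", line)

print(naive is not None)
print(safer is not None)
```

In JavaScript-style RegEx, the word class behind \b covers only ASCII letters, digits, and the underscore, so characters such as ḫ and š also create word boundaries there, which is a further reason to avoid \b in transliterated text.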

Search Type

There are three options for search type, of which only one may be selected at a time:

Transcription - i.e., the exact contents of the tablet
Bound Transcription - i.e., a more legible format which often takes the form of a lemma as shown in the bolded portions of the tagging hover boxes found in the results screen
Cuneiform Text - actual Unicode cuneiform may be used as search text

Note

Searching with Cuneiform Text respects the ▒ character often found in the corpus, which may appear one or more times to represent gaps or damage in the text.

Linguistic Tags

This second search bar allows for searching linguistic tags in association with a given search text. Linguistic tagging is extremely inconsistent in the corpus, and this feature will ONLY return those words where tagging is available. Furthermore, since some tagging was created by an autotagger and has not yet been validated by the HethPort team, this feature searches ALL possible tags for a given word.

Tags may be entered in several ways. They may be entered singly, as in ADJ, or as a comma-separated list, as in ADJ, VB, NUM, for a logical OR association. If an AND search is required, tags can be entered as D/L.SG using standard morphological dot notation, and these may also be placed in comma-separated lists. ORDER MATTERS: dot-notation tags must exactly match the order of tags found in the corpus. Tagging is NOT case-sensitive.
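One way to read these rules is sketched below. The interpretation is an assumption: commas separate OR alternatives, dots join tags that must appear together and in the same order as in the corpus tagging, and a word matches if any of its possible taggings satisfies any alternative. The function names and the sample taggings are hypothetical.

```python
def parse_tag_query(query: str) -> list[list[str]]:
    """Split 'ADJ, D/L.SG' into OR alternatives of ordered AND parts."""
    return [
        [part.strip().upper() for part in alternative.split(".")]
        for alternative in query.split(",")
        if alternative.strip()
    ]


def contains_in_order(parts: list[str], wanted: list[str]) -> bool:
    """True if `wanted` occurs in `parts` as a contiguous, ordered run."""
    n = len(wanted)
    return any(parts[i:i + n] == wanted for i in range(len(parts) - n + 1))


def word_matches(candidate_taggings: list[str], query: str) -> bool:
    """Check ALL possible taggings of a word against the query."""
    alternatives = parse_tag_query(query)
    for tagging in candidate_taggings:
        parts = tagging.upper().split(".")  # tagging is not case-sensitive
        if any(contains_in_order(parts, alt) for alt in alternatives):
            return True
    return False


print(word_matches(["ADJ.D/L.SG"], "d/l.sg"))    # AND, correct order
print(word_matches(["ADJ.D/L.SG"], "SG.D/L"))    # AND, wrong order
print(word_matches(["ADJ", "NUM"], "VB, NUM"))   # OR across alternatives
```

Checking every candidate tagging mirrors the statement that all possible tags for a word are searched, including unvalidated autotagger output.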

Info

A full list of available tags is found at the HethPort reference page.

CTH No(s).

The third and final search bar allows for filtering by CTH (Catalogue des Textes Hittites) number. As with linguistic tags, these may be entered singly or in comma-separated lists. Only the numbers are required: 625, 241. Input validation is in place to ensure that only valid CTH numbers may be used.