TM Editor Guidelines, Pt 3 (addendums from Esukhia)

Guide to Further Splitting TM Segments

Start by reading 84000’s guidelines for TM editors if you haven’t already. They provide an introduction to creating TM segments for Tibetan. We follow 84000’s lead in standards for TMs, especially in treating cues from the source Tibetan as the key to proper splitting. We add the following addendums to 84000’s standards:

  1. Valid segments need not only be “complete clauses” chosen on the basis of verbs (as in 84000’s standards); instead, we also consider “complete phrases” as valid segments (this will help TM editors make shorter, and thus more useful, segments)

  2. This means you can often make breaks at a variety of conjunctions, prepositions, and punctuation whenever there is a complete phrase on either side. It’s important to note that:

     a. a phrase is a group of words that functions as a complete grammatical unit within a sentence (the Tibetan takes precedence over the English in this evaluation);

     b. this may be a noun phrase or a verb phrase; a prepositional phrase or a conjunctional phrase; or an independent or dependent phrase;

     c. a complete phrase will have more than one word, and more than one POS (part of speech): for example, a noun with its adjectives, a verb with its agent or object, a preposition with a noun phrase, and so on.

  3. Identifying phrases in Tibetan often hinges on frequent conjunctions (words that connect phrases) and prepositions (words that show relationships between phrases); they may or may not be marked by punctuation (a ཤད་). These include:

     a. Postpositional phrases – look for phrases ending in postpositional words like “ནི་”, “ལ་” (or a ལ་དོན་ of any kind), “-ཡིས་” (བྱེད་སྒྲ་), “ནས་”, etc. as possible splits.

     b. Conjunctional phrases – look for phrases ending in words like “དང་”, “ཡང་”, “ན་” as possible splits.

     c. Note that it’s usually not good to split at a “-འི་” / “གི་” / etc. (i.e., at a འབྲེལ་སྒྲ་), as these often connect words within a phrase (rather than phrases themselves).

     d. Also be careful of postpositions and conjunctions that occur within a lexicalized term or set phrase (like the དང་ in “དོན་དང་ལྡན་པ་”, “meaningful”, or the ན་ in “བླ་ན་མེད་པ་”, “unsurpassable”); these also aren’t split points.
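The particle cues above can be sketched as a small Python helper that proposes candidate split points for an editor to review. The particle set and the two set phrases come from the examples in this guide; the name `candidate_segments` and the matching approach are illustrative assumptions, not part of any 84000 or Esukhia tooling. It knows nothing about list-joining དང་s or about set phrases beyond the two listed, so its output is only a starting point.

```python
import re

# Particles named in this guide as possible split points (written without tsheg).
SPLIT_PARTICLES = {"ནི", "ལ", "ནས", "དང", "ཡང", "ན"}
# Lexicalized terms from this guide whose inner particles are not split points.
SET_PHRASES = ["དོན་དང་ལྡན་པ་", "བླ་ན་མེད་པ་"]

# One syllable, plus any trailing tsheg (U+0F0B), shad (U+0F0D) or whitespace.
SYLLABLE = re.compile(r"([^\u0f0b\u0f0d\s]+)([\u0f0b\u0f0d\s]*)")


def candidate_segments(text: str) -> list[str]:
    """Cut text after each split particle that is not inside a set phrase."""
    # Collect the character positions covered by set phrases.
    protected = set()
    for phrase in SET_PHRASES:
        for match in re.finditer(re.escape(phrase), text):
            protected.update(range(match.start(), match.end()))

    segments, start = [], 0
    for match in SYLLABLE.finditer(text):
        if match.group(1) in SPLIT_PARTICLES and match.start() not in protected:
            segments.append(text[start:match.end()])
            start = match.end()
    if start < len(text):
        segments.append(text[start:])
    return segments


# The ན་ inside "unsurpassable" is protected, so the term stays in one piece.
print(candidate_segments("བླ་ན་མེད་པ་"))
# The ན་ before the shad is a candidate, so the text is cut after it.
print(candidate_segments("དེ་ཅིའི་ཕྱིར་ཞེ་ན། འདི་ལྟར་"))
```

Every cut it proposes still needs the human check described above: is there a complete phrase on either side?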

With those basics covered, let’s look at some specific examples.

Splitting at the ནི་སྒྲ་

Using the “topicalizer” as a Splitting Guide

“ནི་” is a postposition that usually marks a noun phrase as the subject of a sentence. Here, we have a long sentence that’s 37 Tibetan syllables, but you’ll see it starts with a postpositional phrase…

Great bodhisattva beings who are not separated from this meditative stability will swiftly attain manifestly perfect buddhahood in unsurpassed and genuinely perfect enlightenment. ཏིང་ངེ་འཛིན་འདི་ཉིད་དང་མ་བྲལ་བའི་བྱང་ཆུབ་སེམས་དཔའ་སེམས་དཔའ་ཆེན་པོ་ནི་མྱུར་དུ་བླ་ན་མེད་པ་ཡང་དག་པར་རྫོགས་པའི་བྱང་ཆུབ་མངོན་པར་རྫོགས་པར་འཚང་རྒྱའོ།།

…so we can easily split it into two phrases, one of 18 and another of 19 syllables:

Great bodhisattva beings who are not separated from this meditative stability ཏིང་ངེ་འཛིན་འདི་ཉིད་དང་མ་བྲལ་བའི་བྱང་ཆུབ་སེམས་དཔའ་སེམས་དཔའ་ཆེན་པོ་ནི་
will swiftly attain manifestly perfect buddhahood in unsurpassed and genuinely perfect enlightenment. མྱུར་དུ་བླ་ན་མེད་པ་ཡང་དག་པར་རྫོགས་པའི་བྱང་ཆུབ་མངོན་པར་རྫོགས་པར་འཚང་རྒྱའོ།།
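Syllable counts like these can be checked mechanically. The sketch below assumes a syllable is any run of characters between tsheg or shad marks; `count_syllables` is a hypothetical helper name, and folio markers such as [146b] would need to be removed before counting.

```python
import re


def count_syllables(text: str) -> int:
    """Count Tibetan syllables as runs of characters between tsheg/shad marks."""
    # Drop spaces and underscores, which tokenized texts may insert mid-syllable.
    compact = re.sub(r"[\s_]+", "", text)
    # Split on tsheg (U+0F0B), non-breaking tsheg (U+0F0C) and shad (U+0F0D).
    return len([s for s in re.split(r"[\u0f0b\u0f0c\u0f0d]+", compact) if s])


first = "ཏིང་ངེ་འཛིན་འདི་ཉིད་དང་མ་བྲལ་བའི་བྱང་ཆུབ་སེམས་དཔའ་སེམས་དཔའ་ཆེན་པོ་ནི་"
second = "མྱུར་དུ་བླ་ན་མེད་པ་ཡང་དག་པར་རྫོགས་པའི་བྱང་ཆུབ་མངོན་པར་རྫོགས་པར་འཚང་རྒྱའོ།།"

print(count_syllables(first), count_syllables(second), count_syllables(first + second))
```

Run on the two segments above, the counts should agree with the 18 and 19 syllables given in the text.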

Splitting at the དང་སྒྲ་

Using Conjunctions & Punctuation as a Splitting Guide

Likewise…

If you ask why, it is because they have followed the principle that all things have the essential nature of non-entity, and therefore they have not appropriated them. དེ་ཅིའི་ཕྱིར་ཞེ་ན། འདི་ལྟར་ཆོས་ཐམས་ཅད་དངོས་པོ་མེད་པའི་ངོ་བོ་ཉིད་ཀྱི་རྗེས་སུ་ཞུགས་པ་དང་། གཟུང་བ་མེད་པའི་ཕྱིར་ཏེ།

…is easily cut into three segments. You can see how the ཤད་s in the Tibetan align with the commas in the English; the Tibetan grammar gives additional clues in its connectives (conjunctions ན་ and དང་). In this case, the English even follows the Tibetan sentence order phrase-by-phrase:

If you ask why, དེ་ཅིའི་ཕྱིར་ཞེ་ན།
it is because they have followed the principle that all things have the essential nature of non-entity, and འདི་ལྟར་ཆོས་ཐམས་ཅད་དངོས་པོ་མེད་པའི་ངོ་བོ་ཉིད་ཀྱི་རྗེས་སུ་ཞུགས་པ་དང་།
therefore they have not appropriated them. གཟུང་བ་མེད་པའི་ཕྱིར་ཏེ།

Be careful of དང་སྒྲ་s that connect list items rather than phrases. You should keep lists intact whenever it is reasonable to do so. Instead, look for དང་s that are used as conjunctions between phrases (above, two verb-final phrases).
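As a first pass, the ཤད་ breaks in a sentence like the one above can be found with a short Python sketch. `split_at_shad` is a hypothetical helper, not an existing tool. Because it cuts at every ཤད་, it would also cut after each item of a list, so lists need to be rejoined by hand afterwards.

```python
import re

# A run of non-shad characters, the shad(s) (U+0F0D) closing it, and trailing spaces.
SHAD_UNIT = re.compile(r"[^\u0f0d]+\u0f0d+\s*")


def split_at_shad(text: str) -> list[str]:
    """Split Tibetan text after each shad, keeping the shad with its unit."""
    units, end = [], 0
    for match in SHAD_UNIT.finditer(text):
        units.append(match.group(0).strip())
        end = match.end()
    # Keep any trailing text that is not closed by a shad.
    tail = text[end:].strip()
    if tail:
        units.append(tail)
    return units


sentence = (
    "དེ་ཅིའི་ཕྱིར་ཞེ་ན། འདི་ལྟར་ཆོས་ཐམས་ཅད་དངོས་པོ་མེད་པའི་ངོ་བོ་ཉིད་ཀྱི་"
    "རྗེས་སུ་ཞུགས་པ་དང་། གཟུང་བ་མེད་པའི་ཕྱིར་ཏེ།"
)
for unit in split_at_shad(sentence):
    print(unit)
```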

Splitting at the ལ་དོན་ / བྱེད་སྒྲ་ / etc.

Using Prepositional Phrases as a Splitting Guide

As the young brahmin Upatiṣya enjoyed solitude, he had gone to live in the forest, བྲམ་ཟེའི་ ཁྱེའུ་ ཉེ་ རྒྱལ་ དབེན་པ་ ལ་ དགའ་བས་ གནས་མལ་ དགོན་པར་ སོང་ སྟེ །

This example sentence is already a reasonable length; but, since splitting it into two phrases is easily possible (as the Tibetan and English phrases have clear correspondences), it’s best to do so:

As the young brahmin Upatiṣya enjoyed solitude, བྲམ་ཟེའི་ ཁྱེའུ་ ཉེ་ རྒྱལ་ དབེན་པ་ ལ་ དགའ་བས་
he had gone to live in the forest, གནས་མལ་ དགོན་པར་ སོང་ སྟེ །

Splitting Out-of-order English

In this example,

The Blessed One was dwelling on the banks of the great Nairañjanā River, together with seven thousand bodhisattvas. Among them were the Noble Avalokiteśvara, Vajrapāṇi, Maitreya, and Mañjuśrī, and all the great śrāvakas like Subhūti, Śāriputra, and Maudgalyāyana. བཅོམ་ལྡན་འདས་ཆུ་བོ་ཆེན་པོ་ཀླུང་ནཻ་རཉྫ་ནཱའི་འགྲམ་ན། འཕགས་པ་སྤྱན་རས་གཟིགས་དབང་ཕྱུག་དང་། ལག་ན་རྡོ་རྗེ་དང་། བྱམས་པ་དང་། འཇམ་དཔལ་ལ་སོགས་པ་བྱང་ཆུབ་སེམས་དཔའ་བདུན་སྟོང་དང་། རབ་འབྱོར་དང་། ཤཱ་རིའི་བུ་དང་། མཽད་གལ་གྱི་བུ་ལ་སོགས་པ་ཉན་ཐོས་ཆེན་པོ་ཐམས་ཅད་དང་ཐབས་ཅིག་ཏུ་བཞུགས་ཏེ།

This is a particularly tricky segment. The final verb phrase “dwelling together with” (ཐབས་ཅིག་ཏུ་བཞུགས་) is split across the first sentence; meanwhile, the “seven thousand bodhisattvas” (བྱང་ཆུབ་སེམས་དཔའ་བདུན་སྟོང་) is in a different order in the English. If you feel a sentence is too complicated for you to split, you may leave it as-is.

On the other hand, the segment-length bonus is there to encourage you to try to make useful segments out of sentences like these. A 67-syllable sentence is not a useful segment, because it is too long to be a probable match for future translations (again, the longer a Tibetan sentence gets, the less likely it is to be useful).

The way to proceed is to make logical, phrase-level breaks in the Tibetan (corresponding to the English whenever possible), and then match the English as best you can. Brackets and ellipses are used to give necessary context (in the English), while we keep the source Tibetan intact:

The Blessed One [was dwelling] on the banks of the great Nairañjanā River, བཅོམ་ལྡན་འདས་ཆུ་བོ་ཆེན་པོ་ཀླུང་ནཻ་རཉྫ་ནཱའི་འགྲམ་ན།
[together with] seven thousand bodhisattvas. Among them were the Noble Avalokiteśvara, Vajrapāṇi, Maitreya, and Mañjuśrī, and འཕགས་པ་སྤྱན་རས་གཟིགས་དབང་ཕྱུག་དང་། ལག་ན་རྡོ་རྗེ་དང་། བྱམས་པ་དང་། འཇམ་དཔལ་ལ་སོགས་པ་བྱང་ཆུབ་སེམས་དཔའ་བདུན་སྟོང་དང་།
all the great śrāvakas like Subhūti, Śāriputra, and Maudgalyāyana. རབ་འབྱོར་དང་། ཤཱ་རིའི་བུ་དང་། མཽད་གལ་གྱི་བུ་ལ་སོགས་པ་ཉན་ཐོས་ཆེན་པོ་ཐམས་ཅད་དང་
[The Blessed One] was dwelling … together with [seven thousand bodhisattvas] ཐབས་ཅིག་ཏུ་བཞུགས་ཏེ།

We propose splitting these kinds of sentences because the benefit (of creating segments of a usable length) outweighs the cost (of partially “incomplete” segments). From the perspective of TM creation, this is simply an unfortunate consequence of Tibetan’s SOV sentence order clashing with English’s SVO.
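Since the source Tibetan must stay intact however the English is bracketed, one check is that the Tibetan segments, joined in order, reproduce the original sentence. The following is a minimal sketch of that check; `tibetan_is_intact` is a hypothetical helper name, and ignoring whitespace in the comparison is an assumption.

```python
import re


def strip_spaces(text: str) -> str:
    """Remove whitespace so that only the Tibetan characters are compared."""
    return re.sub(r"\s+", "", text)


def tibetan_is_intact(source: str, segments: list[str]) -> bool:
    """True if the segments, joined in order, reproduce the source text."""
    return strip_spaces("".join(segments)) == strip_spaces(source)


source = "བཅོམ་ལྡན་འདས་ཆུ་བོ་ཆེན་པོ་ཀླུང་ནཻ་རཉྫ་ནཱའི་འགྲམ་ན། འཕགས་པ་སྤྱན་རས་གཟིགས་དབང་ཕྱུག་དང་། ལག་ན་རྡོ་རྗེ་དང་། བྱམས་པ་དང་། འཇམ་དཔལ་ལ་སོགས་པ་བྱང་ཆུབ་སེམས་དཔའ་བདུན་སྟོང་དང་། རབ་འབྱོར་དང་། ཤཱ་རིའི་བུ་དང་། མཽད་གལ་གྱི་བུ་ལ་སོགས་པ་ཉན་ཐོས་ཆེན་པོ་ཐམས་ཅད་དང་ཐབས་ཅིག་ཏུ་བཞུགས་ཏེ།"

segments = [
    "བཅོམ་ལྡན་འདས་ཆུ་བོ་ཆེན་པོ་ཀླུང་ནཻ་རཉྫ་ནཱའི་འགྲམ་ན།",
    "འཕགས་པ་སྤྱན་རས་གཟིགས་དབང་ཕྱུག་དང་། ལག་ན་རྡོ་རྗེ་དང་། བྱམས་པ་དང་། འཇམ་དཔལ་ལ་སོགས་པ་བྱང་ཆུབ་སེམས་དཔའ་བདུན་སྟོང་དང་།",
    "རབ་འབྱོར་དང་། ཤཱ་རིའི་བུ་དང་། མཽད་གལ་གྱི་བུ་ལ་སོགས་པ་ཉན་ཐོས་ཆེན་པོ་ཐམས་ཅད་དང་",
    "ཐབས་ཅིག་ཏུ་བཞུགས་ཏེ།",
]

# Compare the four segments against the unsplit sentence.
print(tibetan_is_intact(source, segments))
# Dropping the final verb phrase should make the check fail.
print(tibetan_is_intact(source, segments[:-1]))
```

The check covers only the Tibetan side; the bracketed and elided English still has to be reviewed by eye.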

Note that trying to split into even smaller segments here would result in glossary-like segments. This should be avoided:

The Blessed One བཅོམ་ལྡན་འདས་
of the great Nairañjanā River, ཆུ་བོ་ཆེན་པོ་ཀླུང་ནཻ་རཉྫ་ནཱའི་
on the banks འགྲམ་ན།
the Noble Avalokiteśvara འཕགས་པ་སྤྱན་རས་གཟིགས་དབང་ཕྱུག་དང་།
…

In general, remember that a good segment is a phrase of significant length (longer than a glossary or dictionary entry), but shorter than a sentence (unless, of course, it is a simple sentence).

Splitting Long Lists

Sentences with long lists may also have logical splitting points. Here, the sentence opens with a postpositional phrase (ending in མཐུ ས་); the list has a logical breakpoint between a list of deities and a list of humans (starting with kings); and it ends in a short verb phrase:

Blessed One, the bodhisattva mahāsattvas, the great śrāvakas, gods, nāgas, yakṣas, gandharvas, asuras, garuḍas, [146.b] kinnaras, mahoragas, kings, ministers, brahmins, householders, monks, nuns, and male and female lay vow holders have gathered here in great numbers through the strength of the Tathāgata’s majesty and supernatural powers. བཅོམ་ལྡན་འདས་ དེ་བཞིན་ གཤེགས་པ འི་ གཟི་བརྗིད་ དང་ རྫུ་འཕྲུལ་ གྱི་ མཐུ ས་ བྱང་ཆུབ་ སེམས་དཔའ་ སེམས་དཔའ་ ཆེན་པོ་ དང་ །_ ཉན་ཐོས་ ཆེན་པོ་ དང་ །_ ལྷ་ དང་ །_ ཀླུ་ དང་ །_ གནོད་སྦྱིན་ དང་ །_ དྲི་ཟ་ དང་ །_ ལྷ་མ་ ཡིན་ དང་ །_ ནམ་མཁའ་ ལྡིང་ [146b]དང་ །_ མི འམ་ ཅི་ དང་ །_ ལྟོ་འཕྱེ་ ཆེན་པོ་ དང་ །_ རྒྱལ་པོ་ དང་ །_ བློན་པོ་ དང་ །_ བྲམ་ཟེ་ དང་ །_ ཁྱིམ་བདག་ དང་ །_ དགེ་སློང་ དང་ །_ དགེ་སློང་ མ་ དང་ །_ དགེ་བསྙེན་ དང་ །_ དགེ་བསྙེན་མ་ རྣམས་ མང་ དུ་ འདུས་ སོ །_།

We split the Tibetan into its natural breaks, and drag the corresponding English to the appropriate segments (adding the ellipses to the English to mark the fragments as non-sequential):

Blessed One, … through the strength of the Tathāgata’s majesty and supernatural powers. བཅོམ་ལྡན་འདས་ དེ་བཞིན་ གཤེགས་པ འི་ གཟི་བརྗིད་ དང་ རྫུ་འཕྲུལ་ གྱི་ མཐུ ས་
the bodhisattva mahāsattvas, the great śrāvakas, gods, nāgas, yakṣas, gandharvas, asuras, garuḍas, [146.b] kinnaras, mahoragas, བྱང་ཆུབ་ སེམས་དཔའ་ སེམས་དཔའ་ ཆེན་པོ་ དང་ །_ ཉན་ཐོས་ ཆེན་པོ་ དང་ །_ ལྷ་ དང་ །_ ཀླུ་ དང་ །_ གནོད་སྦྱིན་ དང་ །_ དྲི་ཟ་ དང་ །_ ལྷ་མ་ ཡིན་ དང་ །_ ནམ་མཁའ་ ལྡིང་ [146b]དང་ །_ མི འམ་ ཅི་ དང་ །_ ལྟོ་འཕྱེ་ ཆེན་པོ་ དང་ །_
kings, ministers, brahmins, householders, monks, nuns, and male and female lay vow holders རྒྱལ་པོ་ དང་ །_ བློན་པོ་ དང་ །_ བྲམ་ཟེ་ དང་ །_ ཁྱིམ་བདག་ དང་ །_ དགེ་སློང་ དང་ །_ དགེ་སློང་ མ་ དང་ །_ དགེ་བསྙེན་ དང་ །_ དགེ་བསྙེན་མ་ རྣམས་
have gathered here in great numbers མང་ དུ་ འདུས་ སོ །_།