TM Editor Guidelines, Pt 2 (from 84000)

Where to Break (Making Additional Breaks Missed by the Script):

In addition to correcting unwanted breaks made by the script, you should add the breaks it missed, following the guidelines below.

The “$” symbol will be used in all the following examples to show where a manually entered break is needed:

Misidentified Words:

Again, the script will miss some inflected verbs, usually because they are misidentified as nouns or particles. For instance, the script currently misses the imperative form གྱིས་ of the verb བགྱིད་ (“to do, perform”), because it treats it as the instrumental case particle. We haven’t adjusted the script to account for this because the instrumental particle is more common.

མི་མཆོག་ ལ་ ནི་ མྱུར་ དུ་ བལྟ་བ ར་ གྱིས །_།$
Quickly, behold the supreme person!

The break here (“$”) needs to be added manually.

Inflected Verbs Followed by Other Case or Non-case Particles that Are Not Segmented by the Script:

The script does not break after an inflected verb followed by a case particle other than the ablative ནས་/ལས་. A complete thought will less often end on such clauses; however, it does happen in some cases, so you should inspect every inflected verb and add a break wherever the preceding clause can stand on its own.

This will happen frequently with the relational particle (ཀྱི་/གྱི་/གི་/-འི་/ཡི་) and the 2nd/4th/7th case particles (ལ་/སུ་/ན་/-ར་/དུ་/རུ་/ཏུ་), and sometimes, though less frequently, with the instrumental (ཀྱིས་/གྱིས་/གིས་/-ས་/ཡིས་). For example:

སེམས་ ཀྱི་ ངོ་བོ་ སྟོང་ ཡིན་ ལ་$
The essence of mind is empty, but

སེམས་ ཀྱི་ རང་བཞིན་ གསལ་ ཡིན །
The nature of mind is luminous.

བདག་ རྟག་པ་ མེད་པ་ ཡིན་ གྱིས་$
A permanent self is non-existent, but

བདག་ཉིད་དེ་ སེམས་ འཁྲུལ་བར་ སྣང་།
That very self appears to mistaken mind.

ཏིང་འཛིན་ ཞི་བ་ བསྒོམ་པ་ ཐོབ་ འགྱུར་ གྱི །_།$
You will attain the state of concentration, the cultivation of peace, and
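To help with the inspection described above, here is a minimal Python sketch (not part of the 84000 script) that lists candidate break points where a case particle directly follows an inflected verb in a space-tokenized segment. It assumes the caller supplies the set of inflected verbs; `candidate_breaks` and that verb set are hypothetical, since the real script relies on its own tagging. The ablative ནས་/ལས་ is left out because the script already breaks there. Every hit is only a candidate: you still have to confirm that the preceding clause can stand on its own.

```python
# Particle sets from the guidelines, written without the trailing tsheg so that
# "ཀྱི་" (mid-line) and "ཀྱི" (before a shad) compare equal.
RELATIONAL = {"ཀྱི", "གྱི", "གི", "འི", "ཡི"}
LA_DON = {"ལ", "སུ", "ན", "ར", "དུ", "རུ", "ཏུ"}
INSTRUMENTAL = {"ཀྱིས", "གྱིས", "གིས", "ས", "ཡིས"}
CASE_PARTICLES = RELATIONAL | LA_DON | INSTRUMENTAL


def normalize(token: str) -> str:
    """Strip the trailing tsheg (U+0F0B) so token forms can be compared."""
    return token.rstrip("\u0f0b")


def candidate_breaks(tokens: list[str], inflected_verbs: set[str]) -> list[int]:
    """Return indices of case particles that directly follow a known inflected verb.

    A manual break ("$") would go after each returned index. The verb set is
    supplied by the caller (hypothetical); nothing here detects verbs itself.
    """
    verbs = {normalize(v) for v in inflected_verbs}
    hits = []
    for i in range(1, len(tokens)):
        if normalize(tokens[i]) in CASE_PARTICLES and normalize(tokens[i - 1]) in verbs:
            hits.append(i)
    return hits


if __name__ == "__main__":
    tokens = "སེམས་ ཀྱི་ ངོ་བོ་ སྟོང་ ཡིན་ ལ་".split()
    print(candidate_breaks(tokens, {"ཡིན་"}))
```

In the first example above, the relational ཀྱི་ after སེམས་ is not reported because སེམས་ is not in the verb set, while ལ་ after ཡིན་ is.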

Breaks may similarly be applied after other, non-case particles, but in each case be sure that what precedes and what follows the break can each stand on its own. No adverb or modifier should apply across both clauses, nor should the preceding clause be the object or subject of the following one, or vice versa. Some examples:

བཅོམ་ལྡན་འདས་ ཆུ་བོ་ ཆེན་པོ་ ཀླུང་ ནཻ་ རཉྫ་ ནཱའི་ འགྲམ་ ལ་ གཤེགས་ དང་
The Blessed One went to the banks of the great Nairañjanā River, and

དེར་ རྒྱལ་པོ་ དང་ །_ བློན་པོ་ དང་ །_ བྲམ་ཟེ་ དང་ །_ ཁྱིམ་བདག་ ཐམས་ཅད་ ལ་ ཆོས་ བསྟན་ ནོ །
There he taught the dharma to kings, ministers, brahmins, and householders.

ལྷ འི་ བུ་ དག་ གང་ སེམས་ཅན་ ལ་ལ ས་ སངས་རྒྱས་ བཅོམ་ལྡན་འདས་ མྱ་ངན་ ལས་ འདས་ ནས་* ལོ་ བརྒྱ་ སྟོང་ ངམ །_ བསྐལ་པ་ བྱེ་བ་ ལོན་ ཡང་$
Gods, even though one thousand years, an eon, or even ten million eons may have elapsed since the Bhagavān Buddha entered parinirvāṇa,

Note that here the segment ending in “འདས་ ནས་” was merged with what follows, because the following verb “ལོན་” modifies it by locating it in time (hence the English translation “since”).

In general, apply breaks after non-case particles sparingly. These scenarios are often less clear, so it is best to avoid breaking on them if you have any doubts; but if it is reasonably clear that a break belongs there, go ahead and add it. Also, if a segment is getting very long, in the 40+ syllable range, it is worth checking whether any of these additional breaks can validly be applied.
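To spot segments in the 40+ syllable range, a rough syllable count is enough. The following is a minimal Python sketch, not part of the 84000 tooling; `count_syllables` and `is_long` are hypothetical names. It assumes the space-tokenized format used in these examples, where a token without a final tsheg is either a host whose suffixed particle was split off (e.g. “བ ར་”) or a syllable standing before a shad, so removing the space after such a token rejoins the syllable.

```python
import re

# Tibetan consonants, vowel signs and subjoined letters (U+0F40 to U+0FBC).
TIB_LETTER = "[\u0f40-\u0fbc]"


def count_syllables(segment: str) -> int:
    """Approximate the syllable count of a space-tokenized Tibetan segment."""
    # Rejoin split-off suffixes ("བ ར་" -> "བར་"): drop spaces that directly
    # follow a letter, i.e. a token that has no final tsheg.
    joined = re.sub(rf"(?<={TIB_LETTER}) +", "", segment)
    # Syllables are delimited by tsheg/shad (U+0F0B to U+0F0E), "_" and spaces.
    pieces = re.split(r"[\u0f0b-\u0f0e_\s]+", joined)
    # Count only pieces that contain at least one Tibetan letter.
    return sum(1 for piece in pieces if re.search(TIB_LETTER, piece))


def is_long(segment: str, threshold: int = 40) -> bool:
    """Flag segments in the 40+ syllable range for a second look."""
    return count_syllables(segment) >= threshold


if __name__ == "__main__":
    print(count_syllables("མི་མཆོག་ ལ་ ནི་ མྱུར་ དུ་ བལྟ་བ ར་ གྱིས །_།"))
```

The count is deliberately approximate; it only decides which segments deserve a closer look, not where to break them.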

In general, we should not place breaks after nominalized verbs (+ -པ་/བ་/པར་, etc.). However, there are two cases where a break after a nominalized verb makes sense, described below:

Verbs of Quoted Speech and Thought:

Verbs of speech, such as “བཅོམ་ལྡན་འདས་ཀྱིས་བཀའ་སྩལ་པ། The Blessed One said” or “དེས་སྨྲས་པ། They replied,” usually come before or after the quoted speech, and the verb is nominalized (“བཀའ་སྩལ་པ།” or “སྨྲས་པ།”). In these cases we actually should break these phrases off as their own segments, separate from the quoted speech. The same applies to the verb of quoted thought “སྙམ་པ་”. There are three reasons for this: they don’t need to be included in the quoted passage to be understood; they are often quite verbose and omitted in the English translation; and including them in the quoted speech would make those TMs longer and less likely to trigger fuzzy matches.

Therefore these statements of thinking or speech should be sectioned off into their own segments:

འདི་ལྟ་ སྟེ །_ བཅོམ་ལྡན་ ཀྱིས་ བཀའ་ སྩལ་པ །
For the Buddha has declared,

དེ་ ནས་ འཇམ་དཔལ་ གཞོན་ནུར་ གྱུར་ པས་ ལིད་ཙ་བཱི་ དྲི་ མ་ མེད་པར་ གྲགས་པ་ ལ་ སྨྲས་པ །
Thereupon, Mañjuśrī, the crown prince, addressed the Licchavi Vimalakīrti,

If the statement is entirely omitted in the English, as is often the case when such statements become redundant, then leave that segment blank in the English cell:

འཇམ་དཔལ་ གྱིས་སྨྲས་པ །

There are, of course, exceptions to this rule: if a modifier is inserted in the middle of the speech, if the clause of speech is exceedingly short, or if there is some other unexpected but sensible reason, then the verb should be included with the quoted speech:

བདག་ ནི་ དེར་ འགྲོའོ་ སྙམ་ མོ །
he thinks, ‘I shall go there.’

ཚིག་ ཏུ་ སྨྲས་ པས་ འདི་ ང་ ཡི་ དཔང་ ཡིན་ ཏེ །
And said, “This earth is my witness.

ཤེས་ལྡན་ དག་ ཨེ་མ་འདི་ དམག་ མང་པོ་ དང་ ལྡན་ནོ་ ཞེས་ སྨྲས་པས།
He said, “Gentlemen! It is awesome to behold!”
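A small helper can flag segments that end in a nominalized verb of speech or thought, so they can be checked against the rule and its exceptions above. This is a minimal Python sketch, not part of the 84000 script; `ends_with_speech_verb` is a hypothetical name, and the endings list contains only the forms quoted in these guidelines, so it is illustrative rather than exhaustive. A match is a candidate only: the exceptions above still apply.

```python
import re

# Illustrative endings taken from the examples in these guidelines; NOT exhaustive.
SPEECH_VERB_ENDINGS = ("སྨྲས་པ", "བཀའ་སྩལ་པ", "སྙམ་པ")


def compact(segment: str) -> str:
    """Drop spaces, shad (U+0F0D) and '_' plus any final tsheg before comparing."""
    return re.sub(r"[\s\u0f0d_]", "", segment).rstrip("\u0f0b")


def ends_with_speech_verb(segment: str) -> bool:
    """True if the segment ends in one of the listed verbs of speech or thought."""
    return compact(segment).endswith(SPEECH_VERB_ENDINGS)


if __name__ == "__main__":
    print(ends_with_speech_verb("འཇམ་དཔལ་ གྱིས་སྨྲས་པ །"))
```

Note that “སྨྲས་པས་” followed by the quotation, as in the last exception above, is not matched, because the segment does not end in the bare nominalized verb.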

Long Descriptive Passages Using Nominalized Verbs

There are sometimes long passages of text containing a continuous chain of nominalized verbs that describe some place or object, or a sequence of events. Generally we want to avoid breaking on nominalized verbs, but if these passages are particularly long (running for 40+ syllables), then they should be examined to see whether parts of the passage can be separated according to clear themes. This is one exception where we should look to the English translation to help identify themes, as the translator has likely already identified any themes in a run-on phrase and broken them into separate sentences or clauses divided by semicolons. For example:

ས་གཞི་ ལག་མཐིལ་ ལྟར་ མཉམ་ ལ །_ ལྷ འི་ ཡིད་ དུ་ འོང་བ་ ཁ་དོག་ དང་ ལྡན་པ །_ དྲི་ དང་ ལྡན་པ །_
The ground became as smooth as the palm of a hand, divinely pleasing to the mind, colorful, and fragrant.

ལྷ འི་ མེ་ཏོག་ གི་ ཤིང་ དང་ །_ འབྲས་བུ འི་ ཤིང་ དང་ །_ སྤོས་ ཀྱི་ ཤིང་ དང་ །_ རིན་པོ་ཆེ འི་ ཤིང་ དང་ །_ དཔག་བསམ་ གྱི་ ཤིང་ དང་ །_ གོས་ ཀྱི་ ཤིང་ རྣམས་ ཀྱིས་ མཛེས་པ ར་ བྱས་པ །_
It was ornamented with heavenly flower trees, fruit trees, fragrant trees, jewel trees, wish-fulfilling trees, and trees bearing garments.

ལྷ འི་ སེང་གེ འི་ ཁྲི་ དང་ ལྡན་པ །_ རིན་པོ་ཆེ་ དང་ །_ དར་ དང་ །_ མེ་ཏོག་ གི་ ཆུན་པོ་ རབ་ ཏུ་ དཔྱངས་པ །_ ལྷ འི་ དྲིལ་བུ འི་ སྒྲ ས་ བརྒྱན་པ ར་ གྱུར་ ཏེ
It supported heavenly lion thrones with hanging garlands of jewels, cloth, and flowers, and was suffused with the sound of divine bells.

This long descriptive passage can be separated into three themes, “ground,” “trees,” and “throne,” although it is debatable whether the first two should go together. Each segment should have a clearly distinguished theme and sensibly make up a complete thought. You can break up large passages in this manner, but if you have any doubts about whether the segments might be misunderstood on their own, it is better to leave them as one larger segment.

iii Editing English Segmentation:

The following section explains the guidelines for the English segments and how they should be matched to the Tibetan. Note that the script for the 84000 project adds several kinds of references to the English: folio references ([1.b], [2.a], etc.), milestone references ($1, $2, $3, etc.), and note references (#1, #2, #3, etc.). (Other TM projects may wish to use similar notation, as these references may be converted into suitable markup in the finalized .tmx file.)

In general you can ignore these references. The exception is the note references: each one points to the corresponding endnote as it appears in the 84000 Reading Room, and that note should be checked if you suspect there is an error or an alternative source used in the passage (see the instructions below).
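Because the three reference types have fixed shapes, they are easy to pick out programmatically, for example to list the notes to look up or to compare segment text without the markers. This is a minimal Python sketch under the assumption that the markers look exactly like the forms shown above ([1.b], $30, #4); `find_references` and `strip_references` are hypothetical names, not functions of the 84000 script or InterText.

```python
import re

# Folio "[1.b]", milestone "$30", note "#4", as they appear in the English.
REFERENCE = re.compile(r"\[(?P<folio>\d+\.[ab])\]|\$(?P<milestone>\d+)|#(?P<note>\d+)")


def find_references(text: str) -> list[tuple[str, str]]:
    """Return (kind, value) for each reference marker, in order of appearance."""
    return [(m.lastgroup, m.group(m.lastgroup)) for m in REFERENCE.finditer(text)]


def strip_references(text: str) -> str:
    """Remove all markers and collapse the double spaces they leave behind."""
    return re.sub(r"\s{2,}", " ", REFERENCE.sub("", text)).strip()


if __name__ == "__main__":
    sample = "[1.b] $30 He said.#4"
    print(find_references(sample))
    print(strip_references(sample))
```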

Keep in mind that a milestone reference (marked with a “$” sign) should never be placed at the end of a segment; if one appears there, it should instead go at the beginning of the following segment. A note reference (marked with a “#” sign), on the other hand, should always go at the end of the preceding segment if it falls between two segments. For example:

NOT like this:

leads them to the extinction of their suffering in the sphere of remainderless parinirvāṇa.#4 $30

“He liberates them from the eight unwholesome factors and sets each of them on the eightfold noble path.

Like this:

leads them to the extinction of their suffering in the sphere of remainderless parinirvāṇa.#4

$30 “He liberates them from the eight unwholesome factors and sets each of them on the eightfold noble path.
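The placement rule can be applied mechanically to a run of consecutive English segments. The following minimal Python sketch (a hypothetical helper, not an InterText feature) moves a trailing “$N” milestone to the start of the next segment and a leading “#N” note reference to the end of the previous one; it assumes the markers use exactly the “$N” and “#N” forms shown above.

```python
import re

TRAILING_MILESTONE = re.compile(r"\s*(\$\d+)\s*$")  # e.g. "... parinirvāṇa.#4 $30"
LEADING_NOTE = re.compile(r"^\s*(#\d+)\s*")         # e.g. "#4 He liberates ..."


def fix_reference_placement(segments: list[str]) -> list[str]:
    """Return a copy with milestones moved forward and notes moved back.

    Milestones ($N) belong at the start of the following segment; note
    references (#N) belong at the end of the preceding one.
    """
    segs = list(segments)
    for i in range(len(segs) - 1):
        m = TRAILING_MILESTONE.search(segs[i])
        if m:
            segs[i] = segs[i][: m.start()]
            segs[i + 1] = f"{m.group(1)} {segs[i + 1].lstrip()}"
        m = LEADING_NOTE.match(segs[i + 1])
        if m:
            segs[i] += m.group(1)
            segs[i + 1] = segs[i + 1][m.end():]
    return segs


if __name__ == "__main__":
    before = [
        "leads them to the extinction of their suffering in the sphere of remainderless parinirvāṇa.#4 $30",
        "“He liberates them from the eight unwholesome factors and sets each of them on the eightfold noble path.",
    ]
    for line in fix_reference_placement(before):
        print(line)
```

A milestone at the very end of the last segment is left alone, since there is no following segment to move it to.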

Changing the Sentence Order in the English

Since the segments themselves are the foundational units being documented, the order in which they appear is not important. TM editors will therefore often need to reorder the English phrases or sentences; this is perfectly acceptable and in many cases necessary, especially since word order differs so much between Tibetan and English. For example:

An English and Tibetan passage with three phrases A, B, and C could be arranged in the .tmx file as follows:

Tibetan English
A B
B C
C A

As described in the InterText demo, CTRL-X may be used to swap an English segment with the one above it, although the “cross-align” feature needs to be enabled.

For example, consider the following two segments:

བསྐལ་པ་ བྱེ་བ་ ཕྲག་ བརྒྱ་ གྲངས་མེད་ དུ ། _ ། སྲིད་པ འི་ རྒྱ་མཚོ ར་ ངེས་པ ར་ འཁོར་བ་ དང་ ། _ ། ཉོན་མོངས་ དོག་པ ར་ རྟག་ ཏུ་ འཁོར་ ན ། _ །

འཚོ་བ་ མ་རུངས་པ་ ནི་ སྐྲག་ མི་ འགྱུར ། _ །

“Those of unwholesome livelihood have no fear,

Even though they are bound to circle In saṃsāra’s ocean for countless eons, Always ensnared by afflictions.

The English must be matched to the segments as follows:

བསྐལ་པ་ བྱེ་བ་ ཕྲག་ བརྒྱ་ གྲངས་མེད་ དུ ། _ ། སྲིད་པ འི་ རྒྱ་མཚོ ར་ ངེས་པ ར་ འཁོར་བ་ དང་ ། _ ། ཉོན་མོངས་ དོག་པ ར་ རྟག་ ཏུ་ འཁོར་ ན ། _ །
Even though they are bound to circle In saṃsāra’s ocean for countless eons, Always ensnared by afflictions.

འཚོ་བ་ མ་རུངས་པ་ ནི་ སྐྲག་ མི་ འགྱུར ། _ །
“Those of unwholesome livelihood have no fear,

The order in which the English segments appear in InterText does not matter, as long as each one is fully correlated with the Tibetan segment it is matched to.

Separating Compounded English Segments

As mentioned earlier, the English translation sometimes combines two Tibetan segments and intermingles their words, which prevents you from making a clean break. Sometimes it makes sense to bend the rules a little or make a larger segment, as long as the resulting TM isn’t too long and can be clearly understood. However, if this can’t be done cleanly, there is another solution:

  1. From InterText you should copy the full English passage in which the Tibetan segments have been intermingled.
  2. Create a new translation unit in InterText (Edit → “Insert Element” or I), then paste a copy of that full passage into the new alignment box, so that the same passage is matched with each of the Tibetan segments. Add square brackets around any English words that are not found in the matched Tibetan segment.

For example, consider the following four Tibetan segments, which need to be linked to the English passage:

ཡོངས་སུ་ མྱ་ངན་ ལས་ འདའ་བ་ ཡང་ སྟོན །
སྐྱེ་བ་ ཡང་ སྟོན །
འཁོར་ལོ ས་ སྒྱུར་བ འི་ རྒྱལ་པོ་ ཡང་ སྟོན །
རྩེ་བ་ དང་ ། དགའ་བ་ དང་ ། བུད་མེད་ ཀྱི་ བཞད་གད་ དང་ ། ཀུ་རེ་ དང་ ། དྲི་ དང་ ། ཕྲེང་བ་ དང་ ། དགའ་ ཞིང་ རྩེ་བ་ ཡང་ སྟོན ། །

He manifests as being in the state of parinirvāṇa, as being born, as a universal monarch, and also as someone joyful who is entertained by amusements and pleasures such as women’s laughter, play, perfume, and garlands.

Since each of these Tibetan segments is a complete clause with the final inflected verb “སྟོན་”, the English should be reduplicated and bracketed when matched to each Tibetan segment, as follows:

ཡོངས་སུ་ མྱ་ངན་ ལས་ འདའ་བ་ ཡང་ སྟོན །_
He manifests as being in the state of parinirvāṇa,

སྐྱེ་བ་ ཡང་ སྟོན །_
He manifests [as being in the state of parinirvāṇa,] as being born,

འཁོར་ལོ ས་ སྒྱུར་བ འི་ རྒྱལ་པོ་ ཡང་ སྟོན །_
He manifests [as being in the state of parinirvāṇa, as being born,] as a universal monarch,

རྩེ་བ་ དང་ །_ དགའ་བ་ དང་ །_ བུད་མེད་ ཀྱི་ བཞད་གད་ དང་ །_ ཀུ་རེ་ དང་ །_ དྲི་ དང་ །_ ཕྲེང་བ་ དང་ །_ དགའ་ ཞིང་ རྩེ་བ་ ཡང་ སྟོན །_།
He manifests [as being in the state of parinirvāṇa, as being born, as a universal monarch, and also] as someone joyful who is entertained by amusements and pleasures such as women’s laughter, play, perfume, and garlands.

You can see this process demonstrated in the InterText tutorial from minute 17:18.

Note that the first three segments do not require the full English passage; it is sufficient to end the English once all the Tibetan words have been matched. The last segment, however, requires the full passage in order to state the verb “manifests” (“སྟོན་”), as the Tibetan does.

With this format, the next translator using the TMs will be able to tell that the bracketed text is not contained in that Tibetan segment but in another one. The additional text will not cause confusion and may even provide useful context.
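The bracketing pattern above is regular enough to generate from one copy of the English passage, which avoids retyping (and accidentally retranslating) it. This is a minimal Python sketch, not an InterText feature; `bracketed_segments`, `SHARED` and the chunk tagging are hypothetical. It assumes you have split the English passage into chunks and tagged each with the index of the Tibetan segment it translates, `SHARED` for text every segment needs (here “He manifests”), or `None` for text no single segment owns (here “and also”). Each segment's English stops at the last chunk it owns, and consecutive chunks belonging to other segments are wrapped in one pair of brackets.

```python
SHARED = "shared"  # text every segment needs, e.g. "He manifests" for སྟོན་


def bracketed_segments(chunks: list[tuple[str, object]], n_segments: int) -> list[str]:
    """Build one English string per Tibetan segment from a shared passage.

    chunks: (text, owner) pairs in English order; owner is a segment index,
    SHARED, or None. Every segment index must own at least one chunk.
    """
    results = []
    for k in range(n_segments):
        # The English for segment k ends at the last chunk that segment k owns.
        last = max(i for i, (_, owner) in enumerate(chunks) if owner == k)
        parts, pending = [], []
        for text, owner in chunks[: last + 1]:
            if owner in (k, SHARED):
                if pending:  # close the run of text that belongs elsewhere
                    parts.append("[" + " ".join(pending) + "]")
                    pending = []
                parts.append(text)
            else:
                pending.append(text)
        results.append(" ".join(parts))
    return results


EXAMPLE_CHUNKS = [
    ("He manifests", SHARED),
    ("as being in the state of parinirvāṇa,", 0),
    ("as being born,", 1),
    ("as a universal monarch,", 2),
    ("and also", None),
    ("as someone joyful who is entertained by amusements and pleasures "
     "such as women’s laughter, play, perfume, and garlands.", 3),
]

if __name__ == "__main__":
    for line in bracketed_segments(EXAMPLE_CHUNKS, 4):
        print(line)
```

With this tagging, the four outputs reproduce the four English cells shown above. Deciding how to tag each chunk remains an editorial judgment.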

As a side note, you will often notice that the English translation exchanges pronouns for proper names and vice versa, states the subject when it is only implied in the Tibetan, or omits the subject when it can be implied in the English. You need not worry about such cases, as they are common and easily understood from the context of the text. There is no need to bracket or alter the English in such cases. The technique described above is really only necessary when you need, in a sense, to pass through one segment to reach another in the English.

Under no circumstances should you retranslate the English. For this technique, you should only copy and paste the full English passage and bracket any text that is not included in the linked Tibetan. If you believe there are actual errors in the translation, please follow the guidelines for this case in the next topic below.

Punctuation:

Include all of the English punctuation and quote marks as is. They should be fit into the segments as is sensible, and it is no problem if an opening quote mark is contained in a segment without a matching closing quote mark.

Conjunctions “and” and “but” translated for “ནས་” “ལས་” and similar particles:

Because the conjunctive particles that connect Tibetan clauses come immediately after the final inflected verb, and because we are segmenting from the perspective of Tibetan grammar, when such a particle is translated in English as a conjunction such as “and”, “but”, or “then”, the conjunction should stay where it is in the English, even if it seems to hang at the end of a clause from the English perspective:

ངན་འགྲོ་ ཐམས་ཅད་ ལས་ ཡོངས་སུ་ ཐར་བ ར་ མཛད་ ནས་
He liberated them from all the negative migrations, and

སེམས་ ཀྱི་ ངོ་བོ་ སྟོང་ ཡིན་ ལ་
The essence of mind is empty, but

Verses

The formatting of lines into verses should not influence your choices for segmenting the Tibetan, and the script does not take it into account either. So we aren’t directly using the structure of the verses as a cue for segmentation, although you may well find that verse lines tend to line up naturally with the clauses and segments.

Note this is contrary to the older version of the TM guidelines.

Words or Phrases Omitted or Added within the English Translation:

We would like to keep a comprehensive and complete bitext of the translation in our record. Therefore all the text for both the Tibetan and the English needs to be complete in the .xml alignment file. As mentioned before, for phrases that are omitted in the English, it is permissible to leave empty entries in the Tibetan or English, e.g.:

འཇམ་དཔལ་ གྱིས་སྨྲས་པ །
[Here the cell in the English column should be intentionally left blank]

Presumably the English translator left out “Mañjuśrī said,” because they deemed it redundant or unnecessary in the English reading.

Generally, added phrases or elaborations found in the English should be included if the elaboration corresponds to a matching Tibetan segment. However, there may be rare cases where there is a heading or something similar that has no matching Tibetan, and in such a case the English may be matched to an empty Tibetan entry.

Although all these segments with empty entries will be preserved in the record of .xml alignment, by default they will be removed by InterText in the exported .tmx record, which shouldn’t be a problem since these empty TMs aren’t actually useful for translators working on CAT platforms.
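To illustrate what the exported record contains, here is a minimal Python sketch that builds a TMX 1.4 document from aligned (Tibetan, English) pairs and skips any pair with an empty side, mirroring the behavior described above. It is not InterText's exporter: `build_tmx` is a hypothetical function, the header values (including the tool name "tm-sketch") are placeholders, and "bo"/"en" are assumed language codes.

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, src_lang="bo", tgt_lang="en"):
    """Return a <tmx> element holding one <tu> per pair with both sides filled."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tm-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for bo, en in pairs:
        if not bo.strip() or not en.strip():
            continue  # empty entries stay in the .xml alignment record only
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, bo), (tgt_lang, en)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


if __name__ == "__main__":
    pairs = [
        ("འཇམ་དཔལ་ གྱིས་སྨྲས་པ །", ""),  # English intentionally left blank
        ("སེམས་ ཀྱི་ ངོ་བོ་ སྟོང་ ཡིན་ ལ་", "The essence of mind is empty, but"),
    ]
    print(ET.tostring(build_tmx(pairs), encoding="unicode"))
```

Only the second pair produces a translation unit; the blank-English segment remains in the alignment but adds nothing to a CAT tool.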

iv Flagging Problematic Segments

In addition to segmenting the TMs you should also be checking for and flagging any potentially problematic segments. This includes erroneous translations or translations that were made from an alternate source. The flagging system we use is as follows:

Marking Errors:

While editing the TM segments, you may occasionally come across apparent translation errors in the published texts. 84000 publications go through a rigorous editorial review, but occasionally some translation errors remain. Since you are closely reviewing the Tibetan-English correlation, you are well placed to notice errors that may have slipped through into the final publication.

If you believe you have found such an error, record it in the shared Google Sheet I have set up for this purpose; at some point the editorial committee can review the entries and decide whether the publication needs to be updated.

My hope is that the translations are well polished and this won’t be necessary, but if you do believe you have found an error please first check:

  1. Is this feasibly a stylistic choice made by the translator, in which the English translation doesn’t follow the literal wording or grammar found in the Tibetan source but does represent a sensible understanding of the meaning?
  2. Is there an endnote for this passage in the Reading Room that explains an alternate source or other reasoning for omitting or changing a passage? In general, you should keep the notes section of the published text open in the Reading Room so you can consult the endnotes if something seems off. All the notes are marked in the English with a “#” sign, so you can easily find the corresponding note on the Reading Room page.
  3. Is this a redundant phrase, such as a statement of speech (e.g., “The Blessed One said,”), that could sensibly be omitted from the English for better readability, since it is obvious from context?

If the answer to all three questions is no, this is likely an error that should be addressed. First enter it into the revision sheet, then type a single % character at the beginning of the English segment. This creates the flag indicating that there is an error to be reviewed:

%It dispels all obscurity and drkness,
མུན་ནག་ ཐིབས་པོ་ ཐམས་ཅད་ ཀྱང་ རྣམ་པར་ སེལ །_

“darkness” is misspelled. This flag can be used for common typos as well. Please don’t correct the typo, because it will also need to be corrected in the publication, and we need to be able to see it in order to do so.

Note that the editorial review process is slow, so it may take a while before the passage is corrected in the publication, but you will be making a beneficial contribution to the final publication of the Buddhavacana.

TM editors will be provided a link to the corrections spreadsheet.

Alternative Sources:

Sometimes a sentence or phrase will have been translated from a source other than the Tibetan, such as Sanskrit, Chinese, or a Kangyur edition other than the Derge. Such instances should be declared in the translator’s introduction and notes. Use the endnote references (“#”) to check the notes in the Reading Room for any declared alternate sources, and if you find one, add a ! character to the beginning of the Tibetan segment so that it will be reviewed when the text is finalized. Even though the TM is matched to the Derge eKangyur, it may still be a useful TM, but it definitely needs to be flagged if it is translated from an alternate source. Match the segment as best you can; it is fine if not all the words match, as long as it is flagged. For example:

!བག་མེད་པ་ ཡི་ དབང་ དུ་མ་ འགྲོ་ ཞིག _།
Do not fall under the power of delusion!#3

Here note 3 in the Reading Room says: “Translated from the Cone edition, གཏི་མུག་. Derge reads བག་མེད་པ་, ‘carelessness.’”

The Tibetan is flagged with a ! character.
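Once flags are in place, they can be gathered for the revision sheet and the final review. This minimal Python sketch (a hypothetical helper, not part of the 84000 workflow) scans aligned (Tibetan, English) pairs, reports English segments starting with “%” as errors and Tibetan segments starting with “!” as alternate sources, and lists any “#N” note references in the English so the corresponding endnotes can be checked.

```python
import re

NOTE_REF = re.compile(r"#(\d+)")


def review_items(pairs):
    """Return (index, kind, note_numbers) for every flagged pair.

    kind is "error" for an English segment starting with "%" and
    "alt-source" for a Tibetan segment starting with "!".
    """
    items = []
    for i, (bo, en) in enumerate(pairs):
        notes = [int(n) for n in NOTE_REF.findall(en)]
        if en.startswith("%"):
            items.append((i, "error", notes))
        if bo.startswith("!"):
            items.append((i, "alt-source", notes))
    return items


if __name__ == "__main__":
    pairs = [
        ("མུན་ནག་ ཐིབས་པོ་ ཐམས་ཅད་ ཀྱང་ རྣམ་པར་ སེལ །_", "%It dispels all obscurity and drkness,"),
        ("!བག་མེད་པ་ ཡི་ དབང་ དུ་མ་ འགྲོ་ ཞིག _།", "Do not fall under the power of delusion!#3"),
    ]
    for item in review_items(pairs):
        print(item)
```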

Note that 84000 publications use the Vienna sigla conventions for referring to the different versions of the Kangyur, in which the Derge is indicated by a “D”.

v Judgment Calls

This concludes our guidelines for creating TMs. They are this precise because following a standard optimizes the TMs’ value and usefulness in CAT platforms. The pybo-catscript makes it convenient for translators to pre-segment the text efficiently, so that they get the most out of the TM resources.

Following these standards will make the TMs fairly consistent, but we also realize that a degree of subjectivity is inevitable, so it is fine to use your own judgment and common sense. The TM should essentially be the fundamental unit or block that translators use to process their work, so it helps to imagine how you would reference the TMs in your own work. If you want more samples of how to edit the segments, please see the following examples from Toh 186, The Teaching on the Extraordinary Transformation that is the Miracle of Attaining the Buddha’s Powers, in this Google Sheet:

We would also like to hear from you if you have any feedback or suggestions on how you might standardize TMs or set up the script in your own work. If so, please contact Celso at celso.wilkinson@gmail.com.

vi Cheat Sheet

A summary of these segmentation standards is outlined in the cheat sheet here.