Reranked Translation Suggestions

#translation #ai

In machine translation, a translation needs to be both accurate and appropriate to its context. One way to address this is to use the semantic similarity between source verses to suggest a disambiguated rendering when a term has several possible renderings (see the example below).

This idea, which I will refer to as “Reranked Translation Suggestions”, can be particularly effective when dealing with semantic units, but it can also be implemented with words only.

A Practical Example

Consider two translated units and a third that is only partially translated:

“Jesus walked on the water” -> “Jesus walked on the big blue wet thing”
“Jesus said, give me some water to drink” -> “Jesus said, give me some drinking stuff”
“Peter walked on the water” -> “Peter walked on the ?”

Let’s imagine that in the target language, the source word ‘water’ is translated as ‘big blue wet thing’ in the first verse and as ‘drinking stuff’ in the second verse.

In this scenario, a dictionary is updated to note that there are two possible renderings of the ‘water’ entity (this is a semantic unit that is not a process, not a trait, etc.—see below) in the source, and it notes the verse context for each rendering.
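The dictionary update described above could be sketched roughly as follows. This is only an illustration: the names `Rendering`, `project_dictionary`, and `record_rendering` are hypothetical and not taken from any existing tool, and a real project dictionary would probably store more than the raw verse text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendering:
    target: str          # rendering used in the target language
    source_context: str  # source verse in which that rendering was used


# Hypothetical project dictionary: source term -> renderings seen so far.
project_dictionary = defaultdict(list)


def record_rendering(term, target, source_context):
    """Note a rendering of `term` together with the verse it came from."""
    entry = Rendering(target, source_context)
    if entry not in project_dictionary[term]:
        project_dictionary[term].append(entry)


record_rendering("water", "big blue wet thing", "Jesus walked on the water")
record_rendering("water", "drinking stuff", "Jesus said, give me some water to drink")

for rendering in project_dictionary["water"]:
    print(rendering.target, "<-", rendering.source_context)
```

Keeping the source verse next to each rendering is what makes the later similarity comparison possible.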

Later, when a rendering for ‘water’ is needed, the method checks the new source verse and determines whether the new verse is most similar to the verse that uses ‘big blue wet thing’ or the verse that uses ‘drinking stuff’. The most likely option from the project dictionary is then suggested accordingly.

Put even more simply, the method uses semantic similarity between the original verses to determine which of the two renderings for ‘water’ is most likely to be the correct rendering in the current verse.

“Peter walked on the water” is a new verse that has not yet been translated. The method suggests both known renderings for ‘water’, ranked by how similar each rendering’s source verse is to the new verse, and the translator then chooses the most appropriate one. Here the new verse is much closer to “Jesus walked on the water” than to the verse about drinking, so ‘big blue wet thing’ is ranked first, and the translator would likely choose it.
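A minimal sketch of the reranking step, under a loud assumption: it uses cosine similarity over word counts as a crude stand-in for real semantic similarity, which in practice would more likely come from a sentence-embedding model. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_renderings(new_verse, candidates):
    """Sort (rendering, source_verse) pairs by similarity to new_verse, best first."""
    query = bag_of_words(new_verse)
    scored = [
        (cosine_similarity(query, bag_of_words(verse)), rendering)
        for rendering, verse in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rendering for _, rendering in scored]


candidates = [
    ("big blue wet thing", "Jesus walked on the water"),
    ("drinking stuff", "Jesus said, give me some water to drink"),
]
suggestions = rerank_renderings("Peter walked on the water", candidates)
print(suggestions)  # ['big blue wet thing', 'drinking stuff']
```

Word overlap happens to be enough for this toy case, but it would not recognize paraphrases, which is why an embedding-based similarity is the more plausible choice for real projects.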

By comparing the new source verse with the verses in which each rendering was previously used, this method can make translation suggestions more accurate and more appropriate to their context.

In short, translation suggestions can be “reranked” based on semantic similarity between the source verses.

Special Note on Semantic Units

Semantic units are often deeply nested. For example, “Jesus” is an entity, “Christ” is an entity, and “Jesus Christ” is an entity as well. When working with glosses for semantic units, each gloss should be assumed to refer to the largest entity among the nested entities, and to all of its children, until a separate case shows only partial overlap between the source and target. Previous glosses apply to these semantic units, further refining the translation suggestions.

For example, assume I have a semantic dictionary in which I know that “Jesus” has gloss A and “Christ” has gloss B. In that case, “Jesus Christ” cannot simply be assigned either gloss A or gloss B. By contrast, if “automobile” has gloss C, then the higher-level entity “the automobile” might be assigned gloss C as well, since there is no inherent conflict among its constituent children (assuming ‘the’ is simply a grammatical marker that shouldn’t always be glossed).
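One possible reading of this rule, sketched in code: a parent unit inherits a gloss only when its content-bearing children agree on exactly one gloss. The function `inherited_gloss` and the set of grammatical markers are assumptions made for illustration, not an established algorithm.

```python
def inherited_gloss(children, glosses, grammatical_markers=frozenset({"the"})):
    """Return the single gloss a parent unit can inherit from its children.

    Returns None when the children's glosses conflict or any gloss is missing.
    """
    # Ignore children assumed to be purely grammatical markers.
    content = [child for child in children if child not in grammatical_markers]
    child_glosses = {glosses.get(child) for child in content}
    if len(child_glosses) == 1 and None not in child_glosses:
        return child_glosses.pop()
    return None


glosses = {"Jesus": "A", "Christ": "B", "automobile": "C"}

print(inherited_gloss(["Jesus", "Christ"], glosses))    # None: A and B conflict
print(inherited_gloss(["the", "automobile"], glosses))  # C
```

When the function returns `None`, the parent unit would need its own gloss rather than one borrowed from a child.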

This seems a bit abstract at the moment, so I’ll give a more concrete example in the future. See also Multi-Agent Translation Simulations for another approach to improving translation accuracy.
