Linguistic Anomaly Detection (LAD) is a technique that applies similarity scoring (compression-based, embedding-based, or traditional NLP-based) to source-source and target-target pairs: it compares how similar two source texts are with how similar their corresponding target texts are. Rather than scoring against an external baseline, however, its essential feature is that it is relative to the current state of the translation project, so only the verses a human has already translated or edited serve as the gold standard. Here's an idea for how LAD might work in a translation project:
1. Start with a source1-target1 pair that has been manually vetted and is considered the gold standard.
2. Take a source2-target2 pair that has not been vetted yet.
3. Measure the similarity between source1 and source2, and likewise between target1 and target2. This could involve comparing compression results or other similarity scores between the sources and between the targets, doing TF-IDF intersection scoring against your current project, or some other method.
4. If the similarity between source1 and source2 is 0.8, then the similarity between target1 and target2 should be within a standard deviation of 0.8. (This is my suspicion, at least.)
5. If it is not, treat the unvetted rendering (target2) as a possible anomaly. This gives a quantitative measure of the translation's internal consistency, and by implication it should say something about its overall quality.

There's a lot of room for variation here, but the basic idea is to use a partially complete translation to evaluate the consistency of the rest of it. This is very different from the traditional BLEU score, which compares a candidate translation against one or more reference translations (usually human translations). Traditional approaches require you to have both a reference and a hypothesis (i.e., two separate versions) of the specific text you are translating. This approach requires only a single target rendering, because it compares the translation to itself. In my view, that makes it a more useful metric for translation quality assessment in the first place, since it doesn't necessarily reward word-for-word equivalence, which can be deceptive as a quantitative metric.
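The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation. It assumes compression-based similarity (1 minus the normalized compression distance, computed with the standard-library zlib module) as the scoring method. It also reads "within a standard deviation" in one particular way: among the vetted pairs only, compute the gap between target-target and source-source similarity, then count a comparison involving the unvetted pair as anomalous when its gap lies more than k standard deviations from the mean vetted gap. The names ncd_similarity and lad_check and the parameter k are hypothetical, and the verse pairs in the usage block are toy data.

```python
import itertools
import statistics
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd_similarity(a: str, b: str) -> float:
    """Similarity as 1 - normalized compression distance (higher = more alike).

    NCD(a, b) = (C(ab) - min(C(a), C(b))) / max(C(a), C(b)), where C is the
    compressed size. Noisy on very short strings; good enough for a sketch.
    """
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + "\n" + b)
    return 1.0 - (cab - min(ca, cb)) / max(ca, cb)


def lad_check(gold, source, target, k=1.0, sim=ncd_similarity):
    """Fraction of vetted pairs against which (source, target) looks anomalous.

    gold: list of (source, target) pairs already vetted by a human.
    k:    standard deviations of slack to allow (hypothetical knob).
    Returns a value in [0, 1]; higher means less consistent with the vetted set.
    """
    if len(gold) < 3:
        raise ValueError("need at least three vetted pairs to estimate a spread")

    # Baseline from vetted pairs only: how far target-target similarity
    # typically drifts from source-source similarity.
    gold_gaps = [
        sim(t_i, t_j) - sim(s_i, s_j)
        for (s_i, t_i), (s_j, t_j) in itertools.combinations(gold, 2)
    ]
    # Assumption: center on the mean vetted gap (mu) rather than 0, so a
    # systematic difference between the two languages is not itself flagged.
    mu = statistics.fmean(gold_gaps)
    sigma = statistics.pstdev(gold_gaps)

    # Same gap for the unvetted pair, compared against each vetted pair.
    cand_gaps = [sim(t, target) - sim(s, source) for s, t in gold]
    outliers = sum(abs(gap - mu) > k * sigma for gap in cand_gaps)
    return outliers / len(cand_gaps)


if __name__ == "__main__":
    # Toy English -> Spanish verse pairs, for illustration only.
    vetted = [
        ("In the beginning God created the heavens and the earth.",
         "En el principio creó Dios los cielos y la tierra."),
        ("And God said, Let there be light: and there was light.",
         "Y dijo Dios: Sea la luz; y fue la luz."),
        ("And God saw the light, that it was good.",
         "Y vio Dios que la luz era buena."),
    ]
    score = lad_check(
        vetted,
        "And God called the light Day, and the darkness he called Night.",
        "Y llamó Dios a la luz Día, y a las tinieblas llamó Noche.",
    )
    print(f"anomaly score: {score:.2f}")
```

Swapping in an embedding-based or TF-IDF similarity only means passing a different function as sim; the rest of the check stays the same. The minimum of three vetted pairs is an arbitrary floor chosen so the spread is estimated from more than a single gap.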