Evaluation and validation of machine translation
Platform for the evaluation and validation of machine translation using automatic metrics and standardized human evaluation frameworks.
The quality of a machine translation system depends not only on the model used, but on how it performs in real contexts, specific languages and specialized domains.
Vera was created to address this: developed by imaxin, it is a platform for comparing models, validating results and monitoring the evolution of machine translation systems continuously and reliably.
The model defines the translation
Neural machine translation (NMT) systems and large language models (LLMs) are opaque, and automatic metrics alone are insufficient: models with similar scores can perform very differently in production. Furthermore, constant updates can introduce degradations that go unnoticed. Without solid evaluation, these quality losses cannot be detected in time.
Research focused on real decisions
Vera integrates automatic evaluation, human evaluation and comparative analysis of results into a single web environment, facilitating processes that are usually carried out using separate tools and fragmented workflows.
The platform is used for scientific research and validation, as well as for continuous quality control of our own machine translation systems and platforms such as Opentrad.
- Licensing for universities and research centres
- Continuous evaluation of translation models and engines
- Objective comparison between systems and versions
- Quality control before production deployments
Objective comparison using metrics
Vera incorporates automatic evaluation metrics to measure precision, similarity and translation quality between different systems.
The platform allows performance comparison across different models, calculating statistical differences and detecting variations between models in a simple and visual way.
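As an illustration of what "calculating statistical differences" between two systems can involve, here is a minimal sketch of paired bootstrap resampling over segment-level scores. This is a common technique in MT evaluation, not a description of how Vera computes significance; the function name `paired_bootstrap` and its parameters are hypothetical.

```python
import random
from typing import Sequence


def paired_bootstrap(scores_a: Sequence[float], scores_b: Sequence[float],
                     n_resamples: int = 1000, seed: int = 0) -> float:
    """Return the fraction of resamples in which system A's mean beats system B's.

    Both lists hold segment-level scores for the SAME test segments, in the
    same order (hypothetical input format, assumed for this sketch).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Draw n segment indices with replacement and score both systems on them.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples
```

A value close to 1.0 suggests that system A is consistently better on this test set, while a value near 0.5 suggests the observed difference may be noise.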
Quality requires people
Although automatic metrics are essential for scaling processes, human evaluation remains the most reliable way to analyse the real quality of a translation.
Vera integrates the Multidimensional Quality Metrics (MQM) Core framework within the workflow itself, enabling collaborative annotation, error review and analysis of the correlation between human and automatic evaluation from a single platform.
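The correlation between human and automatic evaluation can be illustrated with a Pearson coefficient over per-segment scores. The sketch below is an assumption about one reasonable way to compute it, not Vera's actual method; the function name `pearson` and the input format are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

If the human scores are error penalties (higher means worse) and the automatic metric rewards quality (higher means better), a strong negative coefficient indicates good agreement.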
Evaluation dataset sample
The percentage of the corpus to evaluate can be selected along with different sampling strategies adapted to the dataset.
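The page does not name the sampling strategies available, so the following sketch assumes two common ones: simple random sampling and sampling stratified by a key such as domain. The function `sample_corpus` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_corpus(segments: Sequence[T], percentage: float, seed: int = 0,
                  stratify_by: Optional[Callable[[T], Hashable]] = None) -> List[T]:
    """Select roughly `percentage` percent of the segments for evaluation.

    Without `stratify_by`, a simple random sample is drawn. With it, the same
    percentage is drawn from each group so small groups stay represented.
    """
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in the range (0, 100]")
    rng = random.Random(seed)
    if stratify_by is None:
        groups = [list(segments)]
    else:
        buckets = defaultdict(list)
        for seg in segments:
            buckets[stratify_by(seg)].append(seg)
        groups = list(buckets.values())
    sample: List[T] = []
    for group in groups:
        # Take at least one segment per non-empty group, never more than it holds.
        k = min(len(group), max(1, round(len(group) * percentage / 100)))
        sample.extend(rng.sample(group, k))
    return sample
```

For example, `sample_corpus(corpus, 10, stratify_by=lambda s: s["domain"])` would draw about 10% of the segments from each domain, assuming each segment is a dictionary with a `"domain"` key.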
Multi-user evaluation
Vera enables collaborative evaluation of models among different users and incorporates the calculation of Inter-Annotator Agreement (IAA) to measure consistency between annotations.
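The page does not say which agreement statistic is used. As one illustrative option, Cohen's kappa measures agreement between two annotators beyond what chance would produce. The sketch below assumes each annotator assigns one label per segment; the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same segments in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of segments where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)
```

A kappa of 1.0 means perfect agreement and 0.0 means agreement no better than chance. With more than two annotators, a statistic such as Fleiss' kappa or Krippendorff's alpha would be needed instead.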
Detailed error classification
Segments can be filtered by text, error type, severity or review status, making it easier to explore the evaluation corpus.
Creation and refinement of references
Evaluators can create or edit references in three ways: starting from scratch, modifying existing references or using the output of the best-scoring model as a base.
Applied research for real problems
Vera is developed in collaboration with universities and international research centres, combining scientific rigour and practical application in production environments.
The platform has been presented at specialised forums for machine translation evaluation and is part of an industrial research line aimed at improving the quality and reliability of linguistic AI systems.
Key features
Trust is also evaluated
At imaxin we believe that artificial intelligence must be measurable, analysable and validated in a transparent way.
That is why Vera is not just an evaluation tool: it is the foundation that allows us to build more reliable and accurate machine translation systems, adapted to each linguistic context.
It is not enough to translate. You need to know how good the translation is.
Do you work in evaluation of machine translation?
If you are part of a university, research group or institution interested in the evaluation and validation of MT models, we can help you integrate Vera into your analysis and experimentation processes.