A Platform for Automatic and Human Evaluation of MT

MT models evolve quickly.
Automatic scores alone do not explain production behaviour.

Vera combines automatic and human evaluation to assess machine translation quality. Automatic metrics provide speed and consistency, while human review improves error detection. Together, they reduce uncertainty in model selection, help detect regressions, and support traceable decision-making.

The model defines the translation

Neural machine translation (NMT) systems and large language models (LLMs) are opaque, so automatic metrics alone are insufficient: models with similar scores can perform very differently in production. Frequent updates can also introduce degradations that go unnoticed. Without solid evaluation, these quality losses cannot be detected in time.

Vera workflow

Vera integrates automatic metrics, MQM-based human evaluation and comparative analysis into a single workflow. This unified approach combines the speed and reproducibility of automatic assessment with the linguistic precision of human review, enabling reliable comparison of translation systems and better detection of quality improvements or degradations.

Dataset Upload

Automatic Metrics

MQM Human Evaluation

Model Comparison

Best Decision

Research focused on real decisions

Vera brings automatic evaluation, human evaluation and comparative analysis together in a single web environment, streamlining processes that usually require separate tools and fragmented workflows.

The platform is used both for scientific research and validation and for the continuous quality control of our own machine translation systems, including platforms such as Opentrad.

  • Licensing for universities and research centres
  • Continuous evaluation of translation models and engines
  • Objective comparison between systems and versions
  • Quality control before production deployments

Key features

Automatic Evaluation

Human Evaluation

Analysis

Automatic Metrics

Provides fast and consistent machine translation assessment through automatic metrics, enabling large-scale comparison of models, versions and configurations.

MQM Annotation

Enables detailed human evaluation using the MQM framework to identify and classify translation errors, explaining model strengths and weaknesses.
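As an illustration of how MQM annotations can be turned into a comparable number, the sketch below computes a severity-weighted penalty per 100 words. The weights, the `Annotation` fields and the function name are assumptions made for this example; MQM scoring models are configurable, and nothing here describes how Vera computes its scores.

```python
from dataclasses import dataclass

# Assumed severity weights for illustration only; real MQM scoring
# models let each project choose its own values.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}


@dataclass(frozen=True)
class Annotation:
    """One human-marked error (hypothetical structure)."""
    segment_id: int
    category: str   # e.g. "accuracy/mistranslation"
    severity: str   # must be a key of SEVERITY_WEIGHTS


def mqm_penalty_per_100_words(annotations, word_count):
    """Sum the severity weights and normalise by text length."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total = sum(SEVERITY_WEIGHTS[a.severity] for a in annotations)
    return 100.0 * total / word_count


if __name__ == "__main__":
    errors = [
        Annotation(1, "accuracy/mistranslation", "major"),
        Annotation(2, "fluency/punctuation", "minor"),
    ]
    print(mqm_penalty_per_100_words(errors, word_count=200))
```

A lower penalty means fewer or less severe errors, so two systems annotated on the same sample can be compared directly.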

Correlations

Analyses the relationship between automatic scores and human judgements to understand when metrics reflect real translation quality.
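To make the idea concrete, here is a minimal sketch of two common correlation measures between a metric's scores and human scores for the same segments: Pearson (linear agreement) and Spearman (agreement in ranking). The function names are illustrative and the choice of these two coefficients is an assumption; the page does not state which measures Vera reports.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A metric with high Spearman correlation orders segments much as human reviewers do, even when its absolute values are on a different scale.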

Statistical Tests

Applies statistical analysis to determine whether differences between models are significant and support reliable decisions.
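One widely used test for comparing two MT systems is paired bootstrap resampling. The sketch below is a generic version under stated assumptions: it takes per-segment scores for both systems on the same test set, assumes the system score is the mean of the segment scores, and uses hypothetical names and defaults. It is not a description of the tests Vera implements.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resampled test sets on which A beats B.

    scores_a and scores_b are per-segment scores for the same segments,
    in the same order (higher is better).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    deltas = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(deltas)
    wins = 0
    for _ in range(n_resamples):
        # Draw n segments with replacement and compare the summed scores.
        resample = rng.choices(deltas, k=n)
        if sum(resample) > 0:
            wins += 1
    return wins / n_resamples
```

A value close to 1.0 suggests that system A's advantage is stable across resampled test sets; a value near 0.5 suggests the observed difference could easily be noise.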

Sampling

Allows the creation of representative evaluation samples, reducing review effort while maintaining reliable quality analysis.
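One simple way to obtain a representative sample is stratified sampling: group segments by an attribute such as domain, then draw the same fraction from each group. The sketch below assumes segments are dictionaries with a `domain` field, which is a hypothetical data layout chosen for illustration, and does not describe Vera's own sampling options.

```python
import random
from collections import defaultdict


def stratified_sample(segments, key, fraction, seed=0):
    """Draw roughly `fraction` of the segments from every stratum.

    `key` maps a segment to its stratum (e.g. its domain). Each stratum
    contributes at least one segment so that small groups stay visible.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for segment in segments:
        strata[key(segment)].append(segment)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    data = [{"id": i, "domain": "legal"} for i in range(10)]
    data += [{"id": i, "domain": "medical"} for i in range(10, 30)]
    picked = stratified_sample(data, key=lambda s: s["domain"], fraction=0.2)
    print(len(picked))
```

Keeping the proportions of each group in the sample means reviewers annotate far fewer segments while the results still reflect the full dataset's mix.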

Error Filtering

Provides advanced filtering to explore errors by type, severity, model or criteria, helping identify improvement areas.

User-Friendly Interface

Offers a web-based environment to configure evaluations, manage projects and analyse results without fragmented workflows.

Reference Creation

Supports the creation and refinement of reference translations for more accurate evaluation in specific domains and scenarios.

Reporting

Generates structured reports and visual summaries to communicate results and support model selection, improvement and deployment decisions.

Additional features

Objective comparison using metrics

Vera incorporates automatic evaluation metrics that measure the precision, similarity and overall translation quality of different systems.

The platform compares performance across models, tests whether the differences between them are statistically significant, and presents the results in a simple, visual way.

Trust is also evaluated

At imaxin, we believe that artificial intelligence must be measurable, analysable and validated in a transparent way.

That is why Vera is not just an evaluation tool: it is the foundation that allows us to build more reliable, accurate machine translation systems adapted to each linguistic context.

It is not enough to translate. You need to know how good the translation is.

 

Applied research for real problems

Vera is developed in collaboration with universities and international research centres, combining scientific rigour and practical application in production environments.

The platform has been presented at specialised forums for machine translation evaluation and is part of an industrial research line aimed at improving the quality and reliability of linguistic AI systems.

 

Discover how Vera can help your organisation

Get a demo or a free trial

Book a demo to see how Vera can help you make the best decision.