A Platform for Automatic and Human Evaluation of MT

MT models evolve quickly.
Automatic scores alone do not explain production behaviour.

Vera combines automatic and human evaluation to assess machine translation quality. Automatic metrics provide speed and consistency, while human review improves error detection. Together, they reduce uncertainty in model selection, help detect regressions, and support traceable decision-making.

Request a demo

The model defines the translation

The opacity of neural machine translation (NMT) systems and large language models (LLMs) makes automatic metrics insufficient on their own, as models with similar scores can perform very differently in production. Furthermore, constant model updates can introduce degradations that go unnoticed. Without rigorous evaluation, these quality losses cannot be detected in time.

Vera workflow

Vera integrates automatic metrics, MQM-based human evaluation and comparative analysis into a single workflow. This unified approach combines the speed and reproducibility of automatic assessment with the linguistic precision of human review, enabling reliable comparison of translation systems and better detection of quality improvements or degradations.

Dataset Upload

Automatic Metrics

MQM Human Evaluation

Model Comparison

Best Decision

Research focused on real decisions

Vera integrates automatic evaluation, human evaluation and comparative analysis into a single web environment, facilitating processes that are usually carried out using separate tools and fragmented workflows.

The platform is used for scientific research and validation, and for continuous quality control of our own machine translation systems, including platforms such as Opentrad.

Licensing for universities and research centres
Continuous evaluation of translation models and engines
Objective comparison between systems and versions
Quality control before production deployments

Key features

Automatic Evaluation

Human Evaluation

Analysis

Automatic Metrics

Provides fast and consistent machine translation assessment through automatic metrics, enabling large-scale comparison of models, versions and configurations.

MQM Annotation

Enables detailed human evaluation using the MQM framework to identify and classify translation errors, explaining model strengths and weaknesses.

Correlations

Analyses the relationship between automatic scores and human judgements to understand when metrics reflect real translation quality.
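As an illustration of this kind of analysis, the sketch below computes Pearson and Spearman correlations between per-segment metric scores and human scores. It is a minimal example, not Vera's implementation: the function names and the sample scores are assumptions made up for this illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-segment scores (illustrative values only).
metric_scores = [0.62, 0.71, 0.55, 0.80, 0.47]
human_scores = [78, 85, 70, 88, 66]
r = pearson(metric_scores, human_scores)
rho = spearman(metric_scores, human_scores)
```

A high rank correlation suggests the metric orders segments much as human reviewers do. A low one is a signal that the metric should not be trusted alone for that domain or language pair.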

Statistical Tests

Applies statistical analysis to determine whether differences between models are significant and support reliable decisions.
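The page does not say which tests Vera applies. One test commonly used in MT evaluation is paired bootstrap resampling, sketched below under that assumption. The function name, parameters and sample scores are hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Approximate one-sided p-value for "system A is better than system B".

    Both lists hold per-segment scores for the same segments, in the same
    order, where higher is better. The result is the fraction of bootstrap
    resamples in which A's total score does not exceed B's.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equally long score lists")
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(n_resamples):
        # Resample segments with replacement, keeping A/B scores paired.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(resample) <= 0:
            not_better += 1
    return not_better / n_resamples


# Hypothetical per-segment scores for two system versions.
version_1 = [0.61, 0.70, 0.58, 0.74, 0.66, 0.69]
version_2 = [0.63, 0.68, 0.61, 0.79, 0.70, 0.69]
p_value = paired_bootstrap(version_2, version_1)
```

A small value (for example below 0.05) suggests that the improvement of the second version is unlikely to be an artefact of the particular test segments.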

Sampling

Allows the creation of representative evaluation samples, reducing review effort while maintaining reliable quality analysis.
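One simple way to build such a sample is proportional stratified sampling, where each group of segments (for example, each domain) contributes the same fraction of its segments. The sketch below assumes that approach. The page does not describe Vera's sampling method, and the `stratified_sample` function and the `domain` field are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(segments, key, fraction, seed=0):
    """Draw the same fraction of segments from every group defined by `key`.

    Each group contributes at least one segment, so small groups are
    never dropped from the sample.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for segment in segments:
        groups[key(segment)].append(segment)
    sample = []
    for group_name in sorted(groups):  # sorted for a deterministic order
        members = groups[group_name]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical dataset: 10 legal segments and 4 medical segments.
dataset = [{"id": i, "domain": "legal"} for i in range(10)]
dataset += [{"id": 10 + i, "domain": "medical"} for i in range(4)]
review_set = stratified_sample(dataset, key=lambda s: s["domain"], fraction=0.5)
```

Here reviewers would annotate half of the segments, while the proportion of legal to medical content in the sample stays the same as in the full dataset.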

Error Filtering

Provides advanced filtering to explore errors by type, severity, model or criteria, helping identify improvement areas.

User-Friendly Interface

Offers a web-based environment to configure evaluations, manage projects and analyse results without fragmented workflows.

Reference Creation

Supports the creation and refinement of reference translations for more accurate evaluation in specific domains and scenarios.

Reporting

Generates structured reports and visual summaries to communicate results and support model selection, improvement and deployment decisions.

Additional features

Multi-user evaluation

Vera lets multiple users evaluate models collaboratively and calculates Inter-Annotator Agreement (IAA) to measure how consistent their annotations are.
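Agreement between annotators can be quantified with several coefficients, and the page does not say which one Vera uses. As an illustration, the sketch below computes Cohen's kappa for two annotators who labelled the same segments. The function name and the severity labels are assumptions for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty, equally long label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical severity labels assigned by two annotators to six segments.
annotator_1 = ["minor", "major", "none", "minor", "none", "major"]
annotator_2 = ["minor", "major", "none", "major", "none", "major"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A kappa of 1 means perfect agreement and a kappa of 0 means agreement no better than chance. Low values usually indicate that the annotation guidelines need to be clarified before the results are used for decisions.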

Multilingual

Vera is language-independent and imposes no restrictions on language pairs, so any language combination can be evaluated and included in research.

Continuous improvement

The platform is under continuous development, with new features regularly added to improve the system and extend its capabilities.

Objective comparison using metrics

Vera incorporates automatic evaluation metrics that measure the precision, similarity and overall quality of translations produced by different systems.

The platform compares performance across models, tests whether the differences between them are statistically significant, and presents the variations in a simple, visual way.

Trust is also evaluated

At imaxin, we believe that artificial intelligence must be measurable, analysable and validated in a transparent way.

That is why Vera is not just an evaluation tool: it is the foundation that allows us to build more reliable and accurate machine translation systems, adapted to each linguistic context.

It is not enough to translate. You need to know how good the translation is.

Applied research for real problems

Vera is developed in collaboration with universities and international research centres, combining scientific rigour and practical application in production environments.

The platform has been presented at specialised forums for machine translation evaluation and is part of an industrial research line aimed at improving the quality and reliability of linguistic AI systems.

Discover how Vera can help your organisation

Get a demo or a free trial

Book a demo to see how Vera can help you make the best decision.

Vera: evaluation, comparison and improvement of machine translation models.
