A Platform for Automatic and Human Evaluation of MT
MT models evolve quickly.
Automatic scores alone do not explain production behaviour.
Vera combines automatic and human evaluation to assess machine translation quality. Automatic metrics provide speed and consistency, while human review improves error detection. Together, they reduce uncertainty in model selection, help detect regressions, and support traceable decision-making.
The model defines the translation
The opacity of neural machine translation (NMT) systems and large language models (LLMs) makes automatic metrics insufficient on their own, as models with similar scores can perform very differently in production. Furthermore, constant model updates can introduce degradations that go unnoticed. Without rigorous evaluation, these quality losses cannot be detected in time.
Vera workflow
Vera integrates automatic metrics, MQM-based human evaluation and comparative analysis into a single workflow. This unified approach combines the speed and reproducibility of automatic assessment with the linguistic precision of human review, enabling reliable comparison of translation systems and better detection of quality improvements or degradations.
Dataset Upload → Automatic Metrics → MQM Human Evaluation → Model Comparison → Best Decision
Research focused on real decisions
Vera brings automatic evaluation, human evaluation and comparative analysis together in a single web environment, simplifying processes that are usually spread across separate tools and fragmented workflows.
The platform is used for scientific research and validation, as well as for continuous quality control of our own machine translation systems, including platforms such as Opentrad.
- Licensing for universities and research centres
- Continuous evaluation of translation models and engines
- Objective comparison between systems and versions
- Quality control before production deployments
Key features
Automatic Evaluation
Human Evaluation
Analysis
Automatic Metrics
Provides fast and consistent machine translation assessment through automatic metrics, enabling large-scale comparison of models, versions and configurations.
MQM Annotation
Enables detailed human evaluation using the MQM framework to identify and classify translation errors, explaining model strengths and weaknesses.
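As an illustration of how MQM annotations can be turned into a number, the following minimal sketch computes a severity-weighted penalty per 100 words and a per-category breakdown. The severity weights, the `MqmError` type and the function names are assumptions made for this example; they are not taken from Vera, and real MQM scoring models vary between projects.

```python
from dataclasses import dataclass

# Assumed severity weights for illustration only; projects define their own.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}


@dataclass(frozen=True)
class MqmError:
    """One annotated error (hypothetical structure)."""
    category: str  # e.g. "accuracy/mistranslation"
    severity: str  # must be a key of SEVERITY_WEIGHTS


def mqm_penalty_per_100_words(errors, word_count, weights=SEVERITY_WEIGHTS):
    """Sum the severity weights and normalise by text length (lower is better)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total = sum(weights[e.severity] for e in errors)
    return 100.0 * total / word_count


def penalty_by_category(errors, weights=SEVERITY_WEIGHTS):
    """Aggregate the weighted penalty per error category."""
    totals = {}
    for e in errors:
        totals[e.category] = totals.get(e.category, 0.0) + weights[e.severity]
    return totals


annotations = [
    MqmError("accuracy/mistranslation", "major"),
    MqmError("fluency/grammar", "minor"),
    MqmError("fluency/grammar", "minor"),
]
score = mqm_penalty_per_100_words(annotations, word_count=200)
breakdown = penalty_by_category(annotations)
```

The per-category breakdown is what makes the result explanatory rather than just a score: it shows whether a model loses points mainly on accuracy or on fluency.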
Correlations
Analyses the relationship between automatic scores and human judgements to understand when metrics reflect real translation quality.
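One common way to quantify this relationship is to compute Pearson and Spearman correlations between segment-level metric scores and human scores. The sketch below uses only the Python standard library; the function names and the sample values are invented for illustration and do not describe Vera's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented sample data: metric scores (higher is better) and
# human MQM penalties (lower is better) for five segments.
metric_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
human_penalties = [4.0, 2.5, 6.0, 1.0, 3.0]

r = pearson(metric_scores, human_penalties)
rho = spearman(metric_scores, human_penalties)
```

In this toy data the metric ranks the segments in exactly the reverse order of the human penalties, so the Spearman correlation is -1; a strongly negative value is the desired outcome when the human score is a penalty.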
Statistical Tests
Applies statistical analysis to determine whether differences between models are significant and support reliable decisions.
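The source does not say which tests Vera applies. As one widely used option in MT evaluation, the sketch below shows paired bootstrap resampling over segment-level scores; the function name, parameters and sample scores are assumptions made for this example.

```python
import random


def paired_bootstrap_win_rate(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A outscores system B.

    Both lists must hold scores for the same segments in the same order.
    A win rate close to 1.0 suggests A is reliably better on this test set.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample segment indices with replacement, keeping pairs aligned.
        indices = [rng.randrange(n) for _ in range(n)]
        diff = sum(scores_a[i] - scores_b[i] for i in indices)
        if diff > 0:
            wins += 1
    return wins / n_resamples


# Invented segment-level scores for two systems on the same four segments.
system_a = [0.7, 0.8, 0.6, 0.9]
system_b = [0.6, 0.7, 0.5, 0.8]
win_rate = paired_bootstrap_win_rate(system_a, system_b)
```

Resampling paired segments, rather than each system independently, controls for segment difficulty, which is why this approach suits comparisons of two systems on a shared test set.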
Sampling
Allows the creation of representative evaluation samples, reducing review effort while maintaining reliable quality analysis.
Error Filtering
Provides advanced filtering to explore errors by type, severity, model or criteria, helping identify improvement areas.
User-Friendly Interface
Offers a web-based environment to configure evaluations, manage projects and analyse results without fragmented workflows.
Reference Creation
Supports the creation and refinement of reference translations for more accurate evaluation in specific domains and scenarios.
Reporting
Generates structured reports and visual summaries to communicate results and support model selection, improvement and deployment decisions.
Additional features
Objective comparison using metrics
Vera incorporates automatic evaluation metrics that measure the precision, similarity and overall quality of translations produced by different systems.
The platform compares performance across models, tests whether the differences between them are statistically significant, and presents the variations in a simple, visual way.
Trust is also evaluated
At imaxin, we believe that artificial intelligence must be measurable, analysable and transparently validated.
That is why Vera is not just an evaluation tool: it is the foundation that allows us to build more reliable, accurate machine translation systems adapted to each linguistic context.
It is not enough to translate. You need to know how good the translation is.
Applied research for real problems
Vera is developed in collaboration with universities and international research centres, combining scientific rigour and practical application in production environments.
The platform has been presented at specialised forums for machine translation evaluation and is part of an industrial research line aimed at improving the quality and reliability of linguistic AI systems.
Discover how Vera can help your organisation
Get a demo or a free trial
Book a demo to see how Vera can help you make the best decision.