Curritext: artificial intelligence for fairer and more transparent selection processes

imaxin was awarded two projects by the Barcelona Supercomputing Center (BSC) within the framework of the AINA Challenge, an initiative aimed at promoting the use of Catalan in the field of artificial intelligence and natural language processing.

In a previous post we presented PlatAina, a platform for machine translation and language model evaluation. You can read the article here.

On this occasion, we want to share the details of Curritext, an initiative that applies language technologies to promote equity and transparency in personnel selection processes.


An intelligent system to ensure equity and privacy

Curritext was created with a clear objective: to make selection processes fairer, more efficient and more respectful of privacy through an intelligent system that automatically anonymises CVs in Catalan. The solution is designed for environments with a medium or high volume of applications, where handling CVs manually is costly in time and resources. In these contexts, human resources teams often need anonymised versions for internal review phases. Curritext automates this step, protecting personal data and promoting more equitable evaluations that are less exposed to bias based on gender, age or appearance.
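As a rough illustration of what automatic anonymisation involves, the sketch below replaces detected entity spans with placeholders. It is only a minimal example under stated assumptions: the `Entity` structure, the labels and the `anonymise` function are hypothetical and are not taken from Curritext, and the spans are located by hand here, whereas in a real system they would come from a NER model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A span detected by a NER model (hypothetical structure)."""
    start: int   # index of the first character of the span
    end: int     # index one past the last character of the span
    label: str   # illustrative labels such as "PER", "EMAIL", "LOC"


def anonymise(text: str, entities: list[Entity]) -> str:
    """Replace each entity span with a bracketed placeholder such as [PER]."""
    result = text
    # Process spans from the end of the text so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        result = result[:ent.start] + f"[{ent.label}]" + result[ent.end:]
    return result


cv = "Nom: Maria Puig. Correu: maria@example.cat. Ciutat: Girona."

# Spans are looked up by hand for the demo; a NER model would supply them.
sensitive = [("Maria Puig", "PER"), ("maria@example.cat", "EMAIL"), ("Girona", "LOC")]
found = [Entity(cv.index(value), cv.index(value) + len(value), label)
         for value, label in sensitive]

print(anonymise(cv, found))
```

Replacing spans from right to left keeps the character offsets of the remaining entities valid, which avoids recomputing positions after each substitution.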

Beyond its direct application in selection processes, Curritext has a second strategic objective: to provide the BSC with a platform on which it can test, compare and validate different named entity recognition (NER) models. The system therefore not only anonymises CVs, but also enables systematic review of model behaviour and contributes to the continuous improvement of the models through the analysis of results.

The system also incorporates normalisation and homogenisation functions that generate CVs with a coherent, comparable structure, making them easier for selection teams to review, analyse and process.


Architecture and self-hosting

Curritext is based on a microservices architecture, where each component fulfils a specific function and communicates through APIs. The system is composed of the following main elements:

  • API Gateway: single entry point for all external requests, centralising management and security.
  • AnonymizerEngine: microservice responsible for executing anonymisation processes.
  • NEREngine: microservice dedicated to tagging and classifying entities in the original document.
  • Persistence system: Amazon S3-compatible object storage, used for the management and custody of documents.

This modular architecture allows each component to be scaled independently, optimising performance and facilitating maintenance and system evolution.
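The flow between these components can be sketched in a single process. This is only an illustration under stated assumptions: the function and class names below are hypothetical stand-ins, the real components are separate microservices that communicate over APIs, and the storage key layout (`original/`, `anonymised/`) is invented for the example.

```python
Span = tuple[int, int, str]  # (start, end, label)


def ner_engine(text: str) -> list[Span]:
    """Stand-in for NEREngine: tags one fixed name so the sketch is runnable."""
    name = "Joan Serra"
    start = text.find(name)
    return [(start, start + len(name), "PER")] if start >= 0 else []


def anonymizer_engine(text: str, spans: list[Span]) -> str:
    """Stand-in for AnonymizerEngine: masks the spans it receives."""
    # Replace from the last span to the first so offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


class ObjectStore:
    """Stand-in for the S3-compatible persistence system (a dict here)."""

    def __init__(self) -> None:
        self._objects: dict[str, str] = {}

    def put(self, key: str, body: str) -> None:
        self._objects[key] = body

    def get(self, key: str) -> str:
        return self._objects[key]


class ApiGateway:
    """Single entry point that coordinates the other components."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store

    def anonymise(self, doc_id: str, text: str) -> str:
        self.store.put(f"original/{doc_id}", text)      # assumed key layout
        spans = ner_engine(text)                        # tag entities
        result = anonymizer_engine(text, spans)         # mask them
        self.store.put(f"anonymised/{doc_id}", result)  # assumed key layout
        return result


store = ObjectStore()
gateway = ApiGateway(store)
print(gateway.anonymise("cv-1", "Candidat: Joan Serra, Lleida"))
```

Because the gateway only depends on the inputs and outputs of each step, any stand-in could be swapped for a network call to an independently scaled service without changing the overall flow.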


Self-hosted models: control and efficiency

One of the main advantages of Curritext is that all NER models run on its own infrastructure, without depending on external services. This provides key benefits:

  • Privacy and data control: documents never leave the environment, which helps meet strict security and confidentiality requirements in information processing.
  • Resource optimisation: the system dynamically adjusts which models are launched according to actual demand, avoiding over-provisioning and reducing operational costs.
  • Flexibility and technological independence: new models can be integrated, and existing ones updated, without depending on external providers or making structural changes to the platform, which supports its continuous evolution and technological autonomy.
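One way to picture demand-driven model launching is a pool that loads a model on its first request and releases it after a period without use. The sketch below is an assumption about how such behaviour could be implemented, not a description of Curritext's internals: `ModelPool`, the idle timeout and the model name are all hypothetical.

```python
import time
from typing import Callable

Model = Callable[[str], list]  # placeholder type for a loaded NER model


class ModelPool:
    """Loads models on first use and releases them after an idle period."""

    def __init__(self, loaders: dict[str, Callable[[], Model]],
                 idle_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loaders = loaders
        self._idle_seconds = idle_seconds
        self._clock = clock  # injectable so the demo can simulate time
        self._loaded: dict[str, Model] = {}
        self._last_used: dict[str, float] = {}

    def get(self, name: str) -> Model:
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()  # launch on demand
        self._last_used[name] = self._clock()
        return self._loaded[name]

    def evict_idle(self) -> list[str]:
        """Unload every model that has been unused for idle_seconds or more."""
        now = self._clock()
        idle = [name for name in self._loaded
                if now - self._last_used[name] >= self._idle_seconds]
        for name in idle:
            del self._loaded[name]
            del self._last_used[name]
        return idle

    def loaded(self) -> list[str]:
        return sorted(self._loaded)


# Demo with a simulated clock and a dummy model that finds no entities.
now = [0.0]
pool = ModelPool({"model-a": lambda: (lambda text: [])},
                 idle_seconds=60.0, clock=lambda: now[0])
pool.get("model-a")       # first request loads the model
now[0] = 120.0            # two minutes pass with no requests
print(pool.evict_idle())  # the idle model is released
```

Adding or replacing a model in this sketch only means registering another loader, which mirrors the flexibility point above.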


API First approach: facilitating integration

Curritext was designed following the API First approach, which offers clear advantages in terms of system integration and evolution:

  • Faster, simpler integration with other systems.
  • Clear documentation and consistent endpoints, which improve the experience during integration.
  • API versioning and contracts, which protect existing integrations against unexpected changes.
  • New functionality exposed through the API from the outset, so it is available from the very first moment.
  • Interoperability with proprietary systems, automations and external applications, which extends the value of the service.
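To make the versioning point concrete, the sketch below builds (but does not send) a request to a versioned endpoint. Everything specific in it is a placeholder: the host, the `/v1/anonymise` path, the JSON fields and the bearer token are assumptions for illustration, not Curritext's published API.

```python
import json
from urllib.request import Request

BASE_URL = "https://curritext.example/api"  # placeholder host, not a real endpoint
API_VERSION = "v1"                          # hypothetical version prefix


def build_anonymise_request(cv_text: str, token: str) -> Request:
    """Build a POST request for a hypothetical versioned anonymisation endpoint."""
    payload = json.dumps({"text": cv_text, "language": "ca"}).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{API_VERSION}/anonymise",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_anonymise_request("Nom: Maria Puig", token="demo-token")
print(req.get_method(), req.full_url)
```

Keeping the version in the path means a client pinned to `v1` would keep working if a later version changed the contract, which is the protection that versioning is meant to give.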


Model evaluation and validation

During the project, a beta testing report will be produced containing technical documentation, integration examples and a comparative evaluation of the different neural NER models, so that the best-performing model can be selected. The evaluations will be run on test sets created at imaxin from synthetic CVs, specifically designed to measure model behaviour in a controlled environment. All of this will help strengthen the technological foundations of the Aina ecosystem and promote the real adoption of Catalan in applied artificial intelligence environments.
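A comparative evaluation of this kind can be sketched as follows. The article does not say which metric will be used, so the example assumes exact-match, span-level precision, recall and F1, a common choice for NER. The model names, spans and gold annotations are invented purely to make the example runnable and are not project results.

```python
Span = tuple[int, int, str]  # (start, end, label)


def prf1(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Gold annotations for one synthetic CV (made-up offsets and labels).
gold = {(0, 10, "PER"), (20, 37, "EMAIL"), (45, 51, "LOC"), (60, 69, "PHONE")}

# Hypothetical outputs from two candidate models on the same CV.
predictions = {
    "model-a": {(0, 10, "PER"), (20, 37, "EMAIL"), (45, 51, "LOC")},
    "model-b": {(0, 10, "PER"), (20, 37, "EMAIL"), (45, 51, "ORG"),
                (60, 69, "PHONE"), (70, 75, "LOC")},
}

scores = {name: prf1(gold, spans) for name, spans in predictions.items()}
best = max(scores, key=lambda name: scores[name][2])  # rank by F1

for name, (p, r, f) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("best:", best)
```

In an anonymisation setting a missed entity leaks personal data, so a real evaluation might weight recall more heavily than this plain F1 ranking does.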


Commitment to co-official languages and responsible innovation

The development of Curritext is part of our ongoing commitment to incorporating co-official languages into advanced technological solutions. At imaxin we understand that linguistic inclusion is not only a cultural value, but also a key factor for innovation and competitiveness in the digital sphere.

Working in Catalan, and in other minority languages, involves addressing specific technical challenges, but it also creates opportunities to build more representative and accessible technologies. This project reinforces our position as a company specialised in language processing and demonstrates that it is possible to develop useful, ethical artificial intelligence tools that are aligned with the linguistic diversity of the territory.

