
To evaluate a trained model on an OpenBioLink dataset, create a subclass of Evaluator that initializes the Evaluator base class and implements the score_batch() function.


The Evaluator class uses an instance of DataLoader, which downloads the specified version of the OpenBioLink dataset. The DataLoader also maps the string labels of entities and relations to integer identifiers, either by reading dictionary files whose paths are passed to it or, if no such files are provided, by creating a mapping from scratch.
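
To make the "mapping from scratch" idea concrete, here is a minimal sketch of how string labels could be turned into integer identifiers. It is not the DataLoader implementation: the helper name build_label_mapping, the relation labels REL_A and REL_B, and the choice to assign ids in order of first appearance are all assumptions made for illustration.

```python
def build_label_mapping(labels):
    """Assign consecutive integer ids to labels in order of first appearance."""
    mapping = {}
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
    return mapping


# Toy triples; the relation labels REL_A and REL_B are hypothetical
triples = [
    ("DOID:14320", "REL_A", "PUBCHEM.COMPOUND:10240"),
    ("DOID:594", "REL_A", "PUBCHEM.COMPOUND:10240"),
    ("DOID:14320", "REL_B", "PUBCHEM.COMPOUND:4585"),
]

# Entities are collected from both the head and the tail position
entity_to_id = build_label_mapping(e for h, _, t in triples for e in (h, t))
relation_to_id = build_label_mapping(r for _, r, _ in triples)

# Each triple becomes a row of three integers (head, relation, tail)
encoded = [(entity_to_id[h], relation_to_id[r], entity_to_id[t]) for h, r, t in triples]
print(encoded)
```

If a model was trained with its own dictionary files, those files should be passed to the DataLoader instead, so that the ids used during evaluation agree with the ids used during training.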

Implementing score_batch()

The function score_batch() is used by Evaluator to retrieve the scores of a set of test triples. It is called with a tensor containing a batch of test triples and should return a tuple (pos_score_head, neg_score_head, pos_score_tail, neg_score_tail).

pos_score_head and pos_score_tail should contain the positive scores for the head and tail of every triple in the batch, respectively. Both tensors should therefore have shape (batch_size,), where pos_score_head[i] is the score for predicting the head of triple batch[i,:] and pos_score_tail[i] is the score for predicting the tail of triple batch[i,:]. (Note: in many embedding models, pos_score_head == pos_score_tail.)

neg_score_head and neg_score_tail should contain the scores of the corrupted versions of the triples in the batch. OpenBioLink is evaluated by corrupting the head or tail of a triple with every entity in the dataset. Consider \(\mathcal{K} = (\mathcal{E},\mathcal{R},\mathcal{T})\) where \(\mathcal{E}\) is the set of unique entities, \(\mathcal{R}\) is the set of unique relations, and \(\mathcal{T}\) is the set of triples in the dataset. Then the corrupted tail triples for a positive triple \((h, r, t)\) are \((h, r, e)\) for \(e \in \mathcal{E}\) and the corrupted head triples are \((e, r, t)\) for \(e \in \mathcal{E}\). A 1-d tensor containing all entities can be retrieved from self.entities on the Evaluator; its length is also stored in self.num_neg. neg_score_head and neg_score_tail should therefore have shape (batch_size, num_neg), where neg_score_head[i,j] is the score of triple batch[i] with its head replaced by entity j, and neg_score_tail[i,j] is the score of triple batch[i] with its tail replaced by entity j.
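
The shapes described above can be checked with a small standalone sketch. It does not use OpenBioLink: the function toy_score_batch, the random embeddings, and the TransE-style score (negative distance, so that higher means more plausible) are assumptions chosen only to show which tensor goes where.

```python
import torch

torch.manual_seed(0)
num_entities, num_relations, dim = 5, 2, 4
# Hypothetical random embeddings standing in for a trained model
entity_emb = torch.randn(num_entities, dim)
relation_emb = torch.randn(num_relations, dim)


def toy_score_batch(batch):
    """batch: LongTensor of shape (batch_size, 3) with columns (head, relation, tail)."""
    h = entity_emb[batch[:, 0]]    # (batch_size, dim)
    r = relation_emb[batch[:, 1]]  # (batch_size, dim)
    t = entity_emb[batch[:, 2]]    # (batch_size, dim)

    # Score of the uncorrupted triples, shape (batch_size,)
    pos = -(h + r - t).norm(dim=-1)

    # Replace the head of every triple with every entity, shape (batch_size, num_entities)
    neg_head = -(entity_emb.unsqueeze(0) + (r - t).unsqueeze(1)).norm(dim=-1)
    # Replace the tail of every triple with every entity, shape (batch_size, num_entities)
    neg_tail = -((h + r).unsqueeze(1) - entity_emb.unsqueeze(0)).norm(dim=-1)

    # Head and tail share the same positive score in this toy model
    return pos, neg_head, pos, neg_tail


batch = torch.tensor([[0, 1, 2], [3, 0, 4]])
pos_head, neg_head, pos_tail, neg_tail = toy_score_batch(batch)
print(pos_head.shape, neg_head.shape, pos_tail.shape, neg_tail.shape)
```

Note that column j of the negative score tensors is indexed by entity id, so neg_head[i, batch[i, 0]] is the score of the uncorrupted triple itself.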

The following two examples show how to evaluate a model trained with DGL-KE and a rule-based approach called SAFRAN.

Calling evaluate()

After creating a custom subclass of Evaluator, evaluate your approach by calling evaluate() with the desired batch_size and the number of processors to use for evaluation.

The filtering argument should be set to False only if you are evaluating a filtered top-k file (see the SAFRAN example).
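
As background on why filtering matters, the sketch below shows the conventional meaning of a "filtered" rank in link prediction: corrupting entities that form another known true triple are ignored when ranking the positive. The helper rank_of_positive is hypothetical, and whether the Evaluator computes ranks in exactly this way (for example, how it breaks ties) is an assumption.

```python
def rank_of_positive(pos_score, neg_scores, known_true=frozenset()):
    """Return the 1-based rank of the positive triple.

    neg_scores[j] is the score of the triple corrupted with entity j.
    Entities in known_true are skipped (the 'filtered' setting).
    """
    better = sum(
        1
        for entity, score in enumerate(neg_scores)
        if entity not in known_true and score > pos_score
    )
    return better + 1


# Entity 3 is the true tail and scores 0.5; entity 0 forms another known true triple
neg_scores = [0.9, 0.2, 0.8, 0.5, 0.1]
raw_rank = rank_of_positive(0.5, neg_scores)
filtered_rank = rank_of_positive(0.5, neg_scores, known_true={0, 3})
print(raw_rank, filtered_rank)
```

A top-k file that was already filtered by the tool that produced it contains no such known true entities among its negatives, which is why a second filtering pass is unnecessary in that case.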

Example DGL-KE

"""Import dependencies, such as the DGL-KE ScoreInfer class and the OpenBioLink DataLoader and Evaluator."""
import torch
import os
import numpy as np

from openbiolink.evaluation.dataLoader import DataLoader
from openbiolink.evaluation.evaluation import Evaluator

from dglke.models.infer import ScoreInfer
from dglke.utils import load_model_config

"""As we do not create a DGLGraph object, DGL-KE needs an auxiliary class that stores the embeddings of the positive edges"""
class FakeEdge(object):
    def __init__(self, head_emb, rel_emb, tail_emb):
        self._hobj = {}
        self._robj = {}
        self._tobj = {}
        self._hobj['emb'] = head_emb
        self._robj['emb'] = rel_emb
        self._tobj['emb'] = tail_emb

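    # DGL-KE score functions read edges.src['emb'], edges.dst['emb'] and edges.data['emb'],
    # so the accessors are exposed as properties
    @property
    def src(self):
        return self._hobj

    @property
    def dst(self):
        return self._tobj

    @property
    def data(self):
        return self._robj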

class DglkeEvaluator(Evaluator):

    def __init__(self, dataset_name, model_path, entity_to_id_path, relation_to_id_path):
        dl = DataLoader(dataset_name, entity_to_id_path=entity_to_id_path, relation_to_id_path=relation_to_id_path)
        # Initialize the Evaluator base class with the data loader
        super().__init__(dl)

        config = load_model_config(os.path.join(model_path, 'config.json'))
        model = ScoreInfer(-1, config, model_path)
        # Load the trained model from model_path before accessing model.model
        model.load_model()
        self.model = model.model

        self.entity_emb = self.model.entity_emb(self.entities.long())
        self.relation_emb = self.model.relation_emb(self.relations.long())

    def score_batch(self, batch):
        head_neg_score = self.model.score_func.create_neg(True)
        tail_neg_score = self.model.score_func.create_neg(False)
        head_neg_prepare = self.model.score_func.create_neg_prepare(True)
        tail_neg_prepare = self.model.score_func.create_neg_prepare(False)

        pos_head_emb = self.entity_emb[batch[:, 0], :]
        pos_tail_emb = self.entity_emb[batch[:, 2], :]
        pos_rel = batch[:, 1].long()
        pos_rel_emb = self.model.relation_emb(pos_rel)

        edata = FakeEdge(pos_head_emb, pos_rel_emb, pos_tail_emb)
        pos_score = self.model.score_func.edge_func(edata)['score']

        neg_head, tail = head_neg_prepare(pos_rel, 1, self.entity_emb, pos_tail_emb, -1, False)
        neg_scores_head = head_neg_score(neg_head, pos_rel_emb, tail,
                                         1, len(batch), self.num_neg)

        head, neg_tail = tail_neg_prepare(pos_rel, 1, pos_head_emb, self.entity_emb, -1, False)
        neg_scores_tail = tail_neg_score(head, pos_rel_emb, neg_tail,
                                         1, len(batch), self.num_neg)

        return pos_score, neg_scores_head.squeeze(0), pos_score, neg_scores_tail.squeeze(0)

if __name__ == "__main__":

    model_path = r"G:\ckpts\TransE_l2_FB15k_0"
    entity_to_id_path = r"G:\triples\entities.tsv"
    relation_to_id_path = r"G:\triples\relations.tsv"

    evaluator = DglkeEvaluator("HQ_DIR", model_path, entity_to_id_path, relation_to_id_path)
    result = evaluator.evaluate(100, -1)

Example SAFRAN

SAFRAN is a rule-based approach that writes its filtered top-k predictions to a text file of the following form:

Heads: DOID:14320    0.9824  DOID:4964       0.9713  DOID:594        0.7095  DOID:10763      0.6424  DOID:8986       0.6423  DOID:1596       0.5923  DOID:10825      0.4874  DOID:1825       0.3771  DOID:750        0.3608  DOID:1470       0.3416
Tails: PUBCHEM.COMPOUND:10240        0.6357  PUBCHEM.COMPOUND:122282 0.5567  PUBCHEM.COMPOUND:4585   0.4798  PUBCHEM.COMPOUND:2160   0.4310  PUBCHEM.COMPOUND:3696   0.3965  PUBCHEM.COMPOUND:2726   0.2493  PUBCHEM.COMPOUND:3559   0.2251  PUBCHEM.COMPOUND:2995   0.2008  PUBCHEM.COMPOUND:2520   0.0172  PUBCHEM.COMPOUND:3386   0.0142
Heads: DOID:10933    0.9822  DOID:594        0.9551  DOID:14320      0.8485  DOID:11257      0.7170  DOID:2055       0.4382  DOID:2030       0.4334  DOID:0060891    0.3585  DOID:0060895    0.2242  DOID:9970       0.1990  DOID:0060896    0.0977
Tails: PUBCHEM.COMPOUND:10240        0.8967  PUBCHEM.COMPOUND:4585   0.7613  PUBCHEM.COMPOUND:2160   0.7521  PUBCHEM.COMPOUND:3696   0.7000  PUBCHEM.COMPOUND:2726   0.6095  PUBCHEM.COMPOUND:3559   0.5914  PUBCHEM.COMPOUND:2995   0.4491  PUBCHEM.COMPOUND:2520   0.4050  PUBCHEM.COMPOUND:3386   0.3957  PUBCHEM.COMPOUND:5002   0.1693
Heads: DOID:240      0.8082  DOID:13603      0.7477  DOID:12030      0.7133  DOID:4353       0.7011  DOID:2089       0.6067  DOID:13141      0.5286  DOID:9741       0.3540  DOID:10808      0.3119  DOID:14320      0.2678  DOID:4964       0.0284
Tails: PUBCHEM.COMPOUND:2712 0.9847  PUBCHEM.COMPOUND:10240  0.7101  PUBCHEM.COMPOUND:4585   0.6751  PUBCHEM.COMPOUND:2160   0.6031  PUBCHEM.COMPOUND:3696   0.5430
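
Each prediction line in the excerpt above alternates between an entity label and a confidence value. A minimal sketch of parsing one such line follows; the helper name parse_prediction_line is hypothetical, and splitting on any whitespace assumes that entity labels never contain spaces or tabs.

```python
def parse_prediction_line(line):
    """Split 'Heads: <entity> <confidence> ...' into (prefix, [(entity, confidence), ...])."""
    # Only the first colon separates the prefix; entity labels contain colons too
    prefix, _, rest = line.partition(":")
    tokens = rest.split()  # any run of tabs or spaces is a separator
    pairs = [(tokens[k], float(tokens[k + 1])) for k in range(0, len(tokens) - 1, 2)]
    return prefix, pairs


prefix, pairs = parse_prediction_line("Heads: DOID:14320    0.9824  DOID:4964       0.9713")
print(prefix, pairs)

# A line without predictions yields an empty list
print(parse_prediction_line("Tails:"))
```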

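To evaluate such a filtered top-k file, a custom class is needed that reads the file on initialization and implements the score_batch() function of the Evaluator. Because the file already contains filtered top-k predictions, scoring all negative entities and filtering them can be skipped. The class below reads the file in groups of three lines: a test triple (head, relation, and tail separated by spaces), a Heads: line, and a Tails: line.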

import torch
from openbiolink.evaluation.dataLoader import DataLoader
from openbiolink.evaluation.evaluation import Evaluator

class SafranEvaluator(Evaluator):

    def __init__(self, dataset_name, evaluation_file_path):
        dl = DataLoader(dataset_name)
        # Initialize the Evaluator base class with the data loader
        super().__init__(dl)

        with open(evaluation_file_path) as infile:
            content = infile.readlines()
        content = [x.strip() for x in content]

        self.predictions = dict()

        for i in range(0, len(content), 3):
            head, rel, tail = content[i].split(" ")

            head = self.dl.entity_to_id[head]
            rel = self.dl.relation_to_id[rel]
            tail = self.dl.entity_to_id[tail]

            pos_head = 0.0
            neg_head = []
            head_predictions = content[i+1]
            # Parse only if the line holds predictions; an empty line is just "Heads:" after strip()
            if head_predictions != "Heads:":
                head_predictions = head_predictions[len("Heads: "):].split("\t")
                for j in range(0, len(head_predictions), 2):
                    head_prediction = self.dl.entity_to_id[head_predictions[j]]
                    confidence = float(head_predictions[j+1])
                    if head == head_prediction:
                        # Correct prediction
                        pos_head = confidence
                    else:
                        # False prediction
                        neg_head.append((head_prediction, confidence))

            pos_tail = 0.0
            neg_tail = []
            tail_predictions = content[i+2]
            # Parse only if the line holds predictions; an empty line is just "Tails:" after strip()
            if tail_predictions != "Tails:":
                tail_predictions = tail_predictions[len("Tails: "):].split("\t")
                for j in range(0, len(tail_predictions), 2):
                    tail_prediction = self.dl.entity_to_id[tail_predictions[j]]
                    confidence = float(tail_predictions[j+1])
                    if tail == tail_prediction:
                        # Correct prediction
                        pos_tail = confidence
                    else:
                        # False prediction
                        neg_tail.append((tail_prediction, confidence))
            self.predictions[f"{str(head)};{str(rel)};{str(tail)}"] = (pos_head, neg_head, pos_tail, neg_tail)

    def score_batch(self, batch):
        # Size every tensor by the actual batch length; the last batch may be smaller
        pos_score_head = torch.zeros((len(batch),), dtype=torch.float)
        neg_score_head = torch.zeros((len(batch), self.num_neg), dtype=torch.float)
        pos_score_tail = torch.zeros((len(batch),), dtype=torch.float)
        neg_score_tail = torch.zeros((len(batch), self.num_neg), dtype=torch.float)

        for i in range(batch.shape[0]):
            head, rel, tail = batch[i,:]
            key = f"{str(head.item())};{str(rel.item())};{str(tail.item())}"

            if key in self.predictions:
                (pos_head, neg_heads, pos_tail, neg_tails) = self.predictions[key]
                pos_score_head[i] = pos_head
                for neg_head, confidence in neg_heads:
                    neg_score_head[i, neg_head] = confidence

                pos_score_tail[i] = pos_tail
                for neg_tail, confidence in neg_tails:
                    neg_score_tail[i, neg_tail] = confidence

        return pos_score_head, neg_score_head, pos_score_tail, neg_score_tail

if __name__ == "__main__":

    evaluation_file_path = r"G:\prediction.txt"

    evaluator = SafranEvaluator("HQ_DIR", evaluation_file_path)
    result = evaluator.evaluate(100, 1, filtering=False)