DataLoader

class openbiolink.evaluation.dataLoader.DataLoader(root='dataset', name='HQ_DIR', entity_to_id_path=None, relation_to_id_path=None)[source]
Parameters

root (str) – Path-like string pointing to the directory in which dataset files should be stored

filter_scores(batch, scores, filter_col, filter_val=nan)[source]

Filters the scores of known true positives.

Parameters
  • batch – Batch of triples. Shape (batch_size,3)

  • scores – Scores for the batch. Shape (batch_size,num_entities)

  • filter_col – Index of the triple column to filter on

  • filter_val – Value assigned to filtered scores, default NaN

Returns

filtered_scores: torch.tensor where the value at [i,j] is the score of the triple (j, batch[i][1], batch[i][2]). Shape (batch_size, num_entities)
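The filtering idea can be illustrated with a plain-Python sketch. This is not the library's implementation: filter_scores_sketch and known_triples are hypothetical names, nested lists stand in for torch tensors, the column order (head, relation, tail) is an assumption, and keeping the score of the query triple itself is also an assumption.

```python
import math


def filter_scores_sketch(batch, scores, known_triples, filter_col,
                         filter_val=float("nan")):
    """Return a copy of scores with known true candidates set to filter_val."""
    filtered = [row[:] for row in scores]  # do not modify the input scores
    for i, triple in enumerate(batch):
        for j in range(len(filtered[i])):
            if j == triple[filter_col]:
                continue  # assumption: the query triple keeps its own score
            candidate = list(triple)
            candidate[filter_col] = j  # candidate j replaces the filtered column
            if tuple(candidate) in known_triples:
                filtered[i][j] = filter_val
    return filtered


# Toy data with 4 entities; triples assumed to be (head, relation, tail).
known = {(0, 0, 1), (2, 0, 1), (3, 0, 1)}
batch = [(0, 0, 1)]
scores = [[0.9, 0.1, 0.8, 0.3]]

# Filter on column 0: candidates 2 and 3 form known true triples.
filtered = filter_scores_sketch(batch, scores, known, filter_col=0)
```

With NaN as the fill value, the masked candidates can be skipped when ranking, so other true triples do not lower the rank of the triple under evaluation.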

get_test_batches(batch_size=100)[source]

Splits the test set into batches of fixed size.

Parameters
  • batch_size – Size of a batch

Return type

(int, Iterable[Tensor])

Returns

number of batches, iterable of batches
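A minimal sketch of the documented return shape (number of batches, iterable of batches) follows. It uses plain Python lists instead of tensors, get_batches_sketch is a hypothetical helper, and the smaller final batch is an assumption about how a remainder is handled.

```python
import math


def get_batches_sketch(triples, batch_size=100):
    """Split triples into consecutive batches of at most batch_size items."""
    num_batches = math.ceil(len(triples) / batch_size)
    # Generator, so batches are produced lazily while iterating.
    batches = (
        triples[i * batch_size:(i + 1) * batch_size]
        for i in range(num_batches)
    )
    return num_batches, batches


# 250 toy triples split into batches of 100.
triples = [(i, 0, i + 1) for i in range(250)]
num_batches, batches = get_batches_sketch(triples, batch_size=100)
batch_lengths = [len(b) for b in batches]
```

Returning the batch count alongside the iterable lets a caller size a progress bar without consuming the batches first.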

property num_entities: int

Number of entities in the dataset

Return type

int

property num_relations: int

Number of relations in the dataset

Return type

int

property testing: Tensor

Set of test triples. Shape (num_test, 3)

Return type

Tensor

property training: Tensor

Set of training triples. Shape (num_train, 3)

Return type

Tensor

property validation: Tensor

Set of validation triples. Shape (num_val, 3)

Return type

Tensor