libact.base package

Submodules

libact.base.dataset module

The dataset class used in this package. A dataset consists of the data used for training, represented as a list of (feature, label) tuples. It may be exported in different formats for use with other libraries.

class libact.base.dataset.Dataset(X=None, y=None)

Bases: object

libact dataset object

Parameters:
  • X ({array-like}, shape = (n_samples, n_features)) – Features of the sample set.
  • y (list of {int, None}, shape = (n_samples)) – The ground truth (label) for each corresponding sample. Unlabeled samples should be given the label None.
data

list, shape = (n_samples) – List of all (feature, label) tuples.
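The layout described above can be illustrated in plain Python. This is a minimal sketch, not libact code: it assumes NumPy is available, uses made-up data, and treats each tuple's list position as its entry_id, which is how the entry ids used by the methods below appear to work.

```python
import numpy as np

# Hypothetical stand-in data: 4 samples with 2 features each.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
# Ground truth labels; None marks an unlabeled sample.
y = [0, 1, None, None]

# Mirror of the `data` attribute: one (feature, label) tuple per sample.
# Assumption: the list index serves as the entry_id.
data = list(zip(X, y))

n_labeled = sum(1 for _, label in data if label is not None)
n_unlabeled = len(data) - n_labeled
print(n_labeled, n_unlabeled)  # 2 2
```

With libact installed, the corresponding object would be built with Dataset(X, y), using the constructor signature documented above.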

append(feature, label=None)

Add a (feature, label) entry into the dataset. A None label indicates an unlabeled entry.

Parameters:
  • feature ({array-like}, shape = (n_features)) – Feature vector of the sample to append to the dataset.
  • label ({int, None}) – Label of the sample to append to dataset. None if unlabeled.
Returns:

entry_id – entry_id of the appended sample.

Return type:

int
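A minimal sketch of how append() could hand back an entry_id. The class MiniDataset is hypothetical, and the rule "entry_id equals the position in the list" is an assumption used for illustration, not a statement about libact's implementation.

```python
class MiniDataset:
    """Hypothetical stand-in that mimics the documented append() contract."""

    def __init__(self):
        self.data = []  # list of (feature, label) tuples

    def append(self, feature, label=None):
        # Assumption: the new entry's id is its position in the list.
        self.data.append((feature, label))
        return len(self.data) - 1


ds = MiniDataset()
first = ds.append([0.2, 0.4], 1)   # labeled entry
second = ds.append([0.9, 0.1])     # unlabeled entry (label defaults to None)
print(first, second)  # 0 1
```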

format_sklearn()

Returns dataset in (X, y) format for use in scikit-learn. Unlabeled entries are ignored.

Returns:
  • X (numpy array, shape = (n_samples, n_features)) – Sample feature set.
  • y (numpy array, shape = (n_samples)) – Sample labels.
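The conversion performed by format_sklearn() amounts to dropping unlabeled entries and stacking the rest into arrays. The sketch below reproduces that behaviour with NumPy on hypothetical data; format_sklearn_sketch is an illustrative name, not part of libact.

```python
import numpy as np

# Hypothetical (feature, label) entries; None marks unlabeled samples.
data = [([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], None)]


def format_sklearn_sketch(data):
    """Keep only labeled entries and stack them into (X, y) arrays.

    Assumes at least one labeled entry; zip(*[]) would fail otherwise.
    """
    labeled = [(f, l) for f, l in data if l is not None]
    X, y = zip(*labeled)
    return np.array(X), np.array(y)


X, y = format_sklearn_sketch(data)
print(X.shape, y)  # (2, 2) [0 1]
```

The resulting X and y have the shapes scikit-learn estimators expect for fit(X, y).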
get_entries()

Return the list of all (feature, ground truth label) tuples.

Returns:data – List of all (feature, label) tuples.
Return type:list, shape = (n_samples)
get_labeled_entries()

Return the list of labeled features and their labels.

Returns:labeled_entries – Labeled entries
Return type:list of (feature, label) tuples
get_num_of_labels()

Number of distinct labels in this object.

Returns:n_labels
Return type:int
get_unlabeled_entries()

Return the list of unlabeled features, along with their entry_ids.

Returns:unlabeled_entries – Unlabeled entries
Return type:list of (entry_id, feature) tuples
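The difference between the tuple shapes returned by get_labeled_entries() and get_unlabeled_entries() can be seen in a few lines of plain Python. This sketch assumes entry ids are list positions and that None is not counted as a label by get_num_of_labels(); both are assumptions, not documented guarantees.

```python
data = [([0.0, 1.0], 0), ([1.0, 0.0], None), ([1.0, 1.0], None)]

# Labeled entries: (feature, label) tuples.
labeled_entries = [(f, l) for f, l in data if l is not None]

# Unlabeled entries: (entry_id, feature) tuples. Assumption: entry_id is the
# list index, so a query strategy can later refer back to the exact entry.
unlabeled_entries = [(i, f) for i, (f, l) in enumerate(data) if l is None]

# Distinct labels among labeled entries (None excluded, an assumption).
n_labels = len({l for _, l in labeled_entries})

print(unlabeled_entries)  # [(1, [1.0, 0.0]), (2, [1.0, 1.0])]
```

Returning the entry_id alongside each unlabeled feature is what lets a query strategy answer with an index that update() can later use.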
labeled_uniform_sample(sample_size, replace=True)

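Return a Dataset object containing labeled data only, resampled uniformly to the given sample size. The parameter replace decides whether sampling is done with replacement.

Parameters:
  • sample_size (int) – Number of labeled entries to draw.
  • replace (bool) – If True (the default), sample with replacement; otherwise, sample without replacement.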
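The two sampling modes can be sketched with Python's standard random module. This works on a plain list of labeled (feature, label) tuples rather than a libact Dataset, and uniform_sample_sketch is a hypothetical helper. With replace=True it behaves like a bootstrap draw.

```python
import random

labeled_entries = [([0.0], 0), ([1.0], 1), ([2.0], 1)]


def uniform_sample_sketch(entries, sample_size, replace=True, rng=random):
    """Draw sample_size entries uniformly from the labeled entries."""
    if replace:
        # Bootstrap-style: the same entry may be drawn more than once.
        return [rng.choice(entries) for _ in range(sample_size)]
    # Without replacement: each entry at most once, so sample_size must not
    # exceed len(entries) (random.sample raises ValueError otherwise).
    return rng.sample(entries, sample_size)


rng = random.Random(0)  # seeded for reproducibility
boot = uniform_sample_sketch(labeled_entries, 5, replace=True, rng=rng)
subset = uniform_sample_sketch(labeled_entries, 3, replace=False, rng=rng)
```

Sampling with replacement allows sample_size to exceed the number of labeled entries, which is useful when training bagged ensembles on resampled copies of the labeled pool.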
len_labeled()

Number of labeled data entries in this object.

Returns:n_samples
Return type:int
len_unlabeled()

Number of unlabeled data entries in this object.

Returns:n_samples
Return type:int
on_update(callback)

Add a callback function to be called when the dataset is updated.

Parameters:callback (callable) – The function to be called when the dataset is updated.
update(entry_id, new_label)

Update the entry identified by entry_id with the given label.

Parameters:
  • entry_id (int) – Entry id of the sample to update.
  • new_label ({int, None}) – New label of the sample to be updated.
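The interaction between update() and on_update() is an observer pattern. The sketch below uses a hypothetical CallbackDataset class. It assumes each callback is invoked as callback(entry_id, new_label), which matches the signature of QueryStrategy.update in the interfaces module, but this calling convention is an assumption here.

```python
class CallbackDataset:
    """Hypothetical stand-in showing update() firing on_update() callbacks."""

    def __init__(self, data):
        self.data = list(data)  # (feature, label) tuples
        self._callbacks = []

    def on_update(self, callback):
        self._callbacks.append(callback)

    def update(self, entry_id, new_label):
        feature, _ = self.data[entry_id]
        self.data[entry_id] = (feature, new_label)
        # Assumption: listeners receive (entry_id, new_label).
        for callback in self._callbacks:
            callback(entry_id, new_label)


seen = []
ds = CallbackDataset([([0.0, 1.0], None), ([1.0, 0.0], None)])
ds.on_update(lambda entry_id, label: seen.append((entry_id, label)))
ds.update(1, 0)
print(ds.data[1], seen)  # ([1.0, 0.0], 0) [(1, 0)]
```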
libact.base.dataset.import_libsvm_sparse(filename)

Import a dataset file in libsvm sparse format.
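For reference, each line of a libsvm sparse file has the form "label index:value index:value ...", with 1-based feature indices and omitted features equal to zero. The parser below is a minimal sketch of that text format producing dense NumPy arrays. It is not libact's implementation; parse_libsvm_sketch is a hypothetical name, and extensions such as qid fields are not handled.

```python
import io

import numpy as np


def parse_libsvm_sketch(lines):
    """Parse libsvm sparse lines into a dense X and a label vector y."""
    labels, rows, max_index = [], [], 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        labels.append(float(parts[0]))
        row = {}
        for token in parts[1:]:
            index, value = token.split(":")
            row[int(index)] = float(value)
            max_index = max(max_index, int(index))
        rows.append(row)
    X = np.zeros((len(rows), max_index))
    for i, row in enumerate(rows):
        for index, value in row.items():
            X[i, index - 1] = value  # libsvm feature indices start at 1
    return X, np.array(labels)


text = "1 1:0.5 3:2.0\n-1 2:1.5\n"
X, y = parse_libsvm_sketch(io.StringIO(text))
```

For real workloads, scikit-learn's sklearn.datasets.load_svmlight_file reads the same format into a SciPy sparse matrix, which avoids materializing the dense array.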

libact.base.dataset.import_scipy_mat(filename)

libact.base.interfaces module

Base interfaces for use in the package. The package works according to the interfaces defined below.

class libact.base.interfaces.ContinuousModel

Bases: libact.base.interfaces.Model

Classification Model with intermediate continuous output

A continuous classification model is able to output a real-valued vector for each feature vector provided.

predict_real(feature, *args, **kwargs)

Predict confidence scores for samples.

Returns the confidence score for each (sample, class) combination.

The larger the value for entry (sample=x, class=k) is, the more confident the model is about the sample x belonging to the class k.

Taking logistic regression as an example, the return value is proportional to the signed distance of the sample to the hyperplane.

Parameters:feature (array-like, shape (n_samples, n_features)) – The samples whose confidence scores are to be predicted.
Returns:X – Each entry is the confidence score for a (sample, class) combination.
Return type:array-like, shape (n_samples, n_classes)
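For a linear model, the confidence score for class k is w_k · x + b_k, which equals the signed distance to the class-k hyperplane scaled by the norm of w_k. The sketch below uses hypothetical weights and a stand-alone function rather than a libact model. It shows the (n_samples, n_classes) output shape and that taking the argmax of the scores gives the predicted class.

```python
import numpy as np

# Hypothetical linear model: one weight vector and bias per class.
W = np.array([[1.0, -1.0],    # class 0
              [-1.0, 1.0]])   # class 1
b = np.array([0.0, 0.5])


def predict_real_sketch(feature):
    """Confidence scores w_k . x + b_k, shape (n_samples, n_classes)."""
    return np.asarray(feature) @ W.T + b


X = np.array([[2.0, 0.0], [0.0, 2.0]])
scores = predict_real_sketch(X)   # [[ 2.0, -1.5], [-2.0,  2.5]]
pred = scores.argmax(axis=1)      # most confident class per sample
```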
class libact.base.interfaces.Labeler

Bases: object

Label the queries made by QueryStrategies

Assign labels to the samples queried by QueryStrategies.

label(feature)

Return the class labels for the input feature array.

Parameters:feature (array-like, shape (n_features,)) – The feature vector whose label is to be queried.
Returns:label – The class label of the queried feature.
Return type:int
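In simulated experiments, a common Labeler is an oracle that answers from a fully labeled reference set. The sketch below mirrors the interface with Python's abc module. LabelerSketch and LookupLabeler are hypothetical names, and the exact-match lookup is one illustrative choice.

```python
from abc import ABC, abstractmethod

import numpy as np


class LabelerSketch(ABC):
    """Hypothetical mirror of the Labeler interface."""

    @abstractmethod
    def label(self, feature):
        """Return the class label (int) for a single feature vector."""


class LookupLabeler(LabelerSketch):
    """Simulated oracle that answers from a fully labeled reference set."""

    def __init__(self, X, y):
        self.X = np.asarray(X)
        self.y = list(y)

    def label(self, feature):
        # Find the reference row equal to the queried feature vector.
        # Raises IndexError if the feature is not in the reference set.
        matches = np.all(self.X == np.asarray(feature), axis=1)
        return self.y[int(np.flatnonzero(matches)[0])]


oracle = LookupLabeler([[0.0, 1.0], [1.0, 0.0]], [0, 1])
print(oracle.label([1.0, 0.0]))  # 1
```

In a real deployment, label() would instead prompt a human annotator. The interface only fixes the input (one feature vector) and the output (one class label).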
class libact.base.interfaces.Model

Bases: object

Classification Model

A Model returns a class-predicting function for future samples after being trained on a training dataset.

predict(feature, *args, **kwargs)

Predict the class labels for the input samples

Parameters:feature (array-like, shape (n_samples, n_features)) – The unlabeled samples whose labels are to be predicted.
Returns:y_pred – The class labels for samples in the feature array.
Return type:array-like, shape (n_samples,)
score(testing_dataset, *args, **kwargs)

Return the mean accuracy on the test dataset

Parameters:testing_dataset (Dataset object) – The testing dataset used to measure the performance of the trained model.
Returns:score – Mean accuracy of self.predict(X) wrt. y.
Return type:float
train(dataset, *args, **kwargs)

Train a model according to the given training dataset.

Parameters:dataset (Dataset object) – The training dataset the model is to be trained on.
Returns:self – Returns self.
Return type:object
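The train/predict/score contract can be illustrated with a toy majority-class model. This is a sketch under simplifying assumptions: a plain list of (feature, label) tuples stands in for a Dataset object, ModelSketch and MajorityClassModel are hypothetical names, and score() is implemented in the base class here only for brevity (the interface above specifies its contract, not its placement).

```python
from abc import ABC, abstractmethod
from collections import Counter

import numpy as np


class ModelSketch(ABC):
    """Hypothetical mirror of the Model interface (train / predict / score)."""

    @abstractmethod
    def train(self, dataset, *args, **kwargs): ...

    @abstractmethod
    def predict(self, feature, *args, **kwargs): ...

    def score(self, testing_dataset, *args, **kwargs):
        # Mean accuracy of predict(X) against y on the labeled test entries.
        X, y = zip(*[(f, l) for f, l in testing_dataset if l is not None])
        return float(np.mean(self.predict(np.array(X)) == np.array(y)))


class MajorityClassModel(ModelSketch):
    """Toy model that always predicts the most frequent training label."""

    def train(self, dataset, *args, **kwargs):
        labels = [l for _, l in dataset if l is not None]
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self  # train() returns self, as documented

    def predict(self, feature, *args, **kwargs):
        return np.full(len(feature), self.majority_)


train_data = [([0.0], 1), ([1.0], 1), ([2.0], 0), ([3.0], None)]
test_data = [([0.5], 1), ([2.5], 0)]
model = MajorityClassModel().train(train_data)
print(model.score(test_data))  # 0.5
```

Because train() returns self, training and scoring can be chained in one expression, as in scikit-learn's fit().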
class libact.base.interfaces.MultilabelModel

Bases: libact.base.interfaces.Model

Multilabel Classification Model

A Model returns a multilabel-predicting function for future samples after being trained on a training dataset.
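In multilabel classification, each sample may carry any subset of the labels. A common output format is a binary indicator matrix of shape (n_samples, n_labels), but this reference does not specify the format MultilabelModel uses, so treat it as an assumption. The sketch below thresholds hypothetical per-label scores, as one independent binary classifier per label would produce.

```python
import numpy as np

# Hypothetical per-label scores for 3 samples and 4 labels (e.g. from one
# independent binary classifier per label).
scores = np.array([[0.9, 0.2, 0.7, 0.1],
                   [0.1, 0.8, 0.4, 0.6],
                   [0.3, 0.3, 0.2, 0.1]])


def predict_multilabel_sketch(scores, threshold=0.5):
    """Binary indicator matrix: entry (i, j) is 1 if label j applies to sample i."""
    return (scores >= threshold).astype(int)


Y_pred = predict_multilabel_sketch(scores)
```

Note that a sample may receive several labels (first row) or none at all (last row), which is the key difference from single-label predict().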

class libact.base.interfaces.ProbabilisticModel

Bases: libact.base.interfaces.ContinuousModel

Classification Model with probability output

A probabilistic classification model is able to output a real-valued vector for each feature vector provided.

predict_proba(feature, *args, **kwargs)

Predict probability estimate for samples.

Parameters:feature (array-like, shape (n_samples, n_features)) – The samples whose probability estimates are to be predicted.
Returns:X – Each entry is the probability estimate for the corresponding (sample, class) combination.
Return type:array-like, shape (n_samples, n_classes)
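One common way to turn real-valued class scores into probability estimates is the softmax, which generalizes the logistic sigmoid used by binary logistic regression. The sketch below is an illustration under that assumption. The interface does not require any particular mapping, and softmax_proba_sketch is a hypothetical name.

```python
import numpy as np


def softmax_proba_sketch(scores):
    """Map real-valued scores (n_samples, n_classes) to row-wise probabilities."""
    scores = np.asarray(scores, dtype=float)
    # Subtract each row's maximum for numerical stability; this does not
    # change the result because softmax is shift-invariant.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


proba = softmax_proba_sketch([[2.0, -1.5], [0.0, 0.0]])
```

Because softmax is monotonic within each row, the most probable class is also the class with the highest confidence score, so predict_proba and predict_real agree on the argmax.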
predict_real(feature, *args, **kwargs)

Predict confidence scores for samples.

Returns the confidence score for each (sample, class) combination.

The larger the value for entry (sample=x, class=k) is, the more confident the model is about the sample x belonging to the class k.

Taking logistic regression as an example, the return value is proportional to the signed distance of the sample to the hyperplane.

Parameters:feature (array-like, shape (n_samples, n_features)) – The samples whose confidence scores are to be predicted.
Returns:X – Each entry is the confidence score for a (sample, class) combination.
Return type:array-like, shape (n_samples, n_classes)
class libact.base.interfaces.QueryStrategy(dataset, **kwargs)

Bases: object

Pool-based query strategy

A QueryStrategy advises on which unlabeled sample should be queried next, given a pool of labeled and unlabeled data.

dataset

The Dataset object that is associated with this QueryStrategy.

make_query()

Return the index of the sample to be queried and labeled. Read-only.

It does not modify the internal state.

Returns:ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:int
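As an illustration of what make_query() might compute, the sketch below implements least-confidence uncertainty sampling. It picks the unlabeled entry whose most likely class has the lowest predicted probability. This is one well-known strategy chosen for illustration. The function least_confident_query and the toy model toy_proba are hypothetical, and the function only reads model output, in keeping with the read-only contract.

```python
import numpy as np


def least_confident_query(unlabeled_entries, predict_proba):
    """Return the entry_id whose top predicted class probability is lowest.

    unlabeled_entries: list of (entry_id, feature) tuples.
    predict_proba: callable mapping (n_samples, n_features) -> (n_samples, n_classes).
    """
    entry_ids, features = zip(*unlabeled_entries)
    proba = predict_proba(np.array(features))
    # Read-only: we only inspect the model output, nothing is modified.
    return entry_ids[int(np.argmin(proba.max(axis=1)))]


def toy_proba(X):
    # Hypothetical model: P(class 1) grows linearly with the single feature.
    p1 = np.clip(X[:, 0], 0.0, 1.0)
    return np.column_stack([1.0 - p1, p1])


pool = [(3, [0.05]), (7, [0.45]), (9, [0.9])]
print(least_confident_query(pool, toy_proba))  # 7
```

Entry 7 is chosen because its top-class probability (0.55) is closest to a coin flip, so its label is expected to be the most informative.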
update(entry_id, label)

Update the internal state of the QueryStrategy after each queried sample is labeled.

Parameters:
  • entry_id (int) – The index of the newly labeled sample.
  • label (float) – The label of the queried sample.
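Putting make_query() and update() together gives the usual pool-based active learning loop: query, label, record, update. The sketch below is self-contained and uses hypothetical names (RandomQuerySketch, truth). A list of (feature, label) tuples stands in for the Dataset, and a list lookup stands in for the Labeler. Here update() is called explicitly. This reference does not say whether an implementation instead wires it through Dataset.on_update.

```python
import random


class RandomQuerySketch:
    """Hypothetical query strategy: picks a random unlabeled entry_id."""

    def __init__(self, data, seed=0):
        self.data = data  # list of (feature, label) tuples, shared with the loop
        self.rng = random.Random(seed)
        self.n_updates = 0

    def make_query(self):
        unlabeled = [i for i, (_, l) in enumerate(self.data) if l is None]
        return self.rng.choice(unlabeled)

    def update(self, entry_id, label):
        # A real strategy might retrain a model here; we just count calls.
        self.n_updates += 1


truth = [0, 1, 1, 0, 1]                        # hidden ground truth (oracle)
data = [([float(i)], None) for i in range(5)]  # start fully unlabeled
qs = RandomQuerySketch(data)

for _ in range(3):                             # query budget of 3 labels
    ask_id = qs.make_query()                   # 1. choose a sample
    label = truth[ask_id]                      # 2. the oracle labels it
    data[ask_id] = (data[ask_id][0], label)    # 3. record the label
    qs.update(ask_id, label)                   # 4. let the strategy refresh

print(sum(l is not None for _, l in data))  # 3
```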

Module contents