libact.base package¶

Submodules¶

libact.base.dataset module¶

The dataset class used in this package. A dataset consists of the data used for training, represented by a list of (feature, label) tuples. It may be exported in different formats for use with other libraries.

class libact.base.dataset.Dataset(X=None, y=None)¶

Bases: object

libact dataset object

Parameters:	X ({array-like}, shape = (n_samples, n_features)) – Features of the sample set.
	y (list of {int, None}, shape = (n_samples)) – The ground truth (label) for each corresponding sample. Unlabeled samples should be given the label None.

data¶: list, shape = (n_samples) – List of all (feature, label) tuples.

append(feature, label=None)¶

Add a (feature, label) entry into the dataset. A None label indicates an unlabeled entry.

Parameters:	feature ({array-like}, shape = (n_features)) – Feature of the sample to append to the dataset.
	label ({int, None}) – Label of the sample to append to the dataset. None if unlabeled.
Returns:	entry_id – entry_id for the appended sample.
Return type:	{int}

format_sklearn()¶

Returns dataset in (X, y) format for use in scikit-learn. Unlabeled entries are ignored.

Returns:	X (numpy array, shape = (n_samples, n_features)) – Sample feature set.
	y (numpy array, shape = (n_samples)) – Sample labels.
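The behavior described above can be sketched in plain Python. This is only an illustration, not libact's implementation: `to_xy` is a hypothetical helper, and it returns lists where the real method is documented to return numpy arrays.

```python
def to_xy(entries):
    """Split (feature, label) tuples into parallel X and y lists,
    skipping unlabeled entries (those whose label is None)."""
    labeled = [(x, y) for x, y in entries if y is not None]
    X = [x for x, _ in labeled]
    y = [label for _, label in labeled]
    return X, y


# Three entries, the second one is unlabeled and is therefore dropped.
entries = [([0.0, 1.0], 1), ([1.0, 0.0], None), ([1.0, 1.0], 0)]
X, y = to_xy(entries)
```

Because unlabeled entries are dropped, the returned X and y can be shorter than the dataset itself.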

get_entries()¶

Return the list of all (feature, ground truth label) tuples.

Returns:	data – List of all (feature, label) tuples.
Return type:	list, shape = (n_samples)

get_labeled_entries()¶

Returns the list of labeled features together with their labels.

Returns:	labeled_entries – Labeled entries
Return type:	list of (feature, label) tuples

get_num_of_labels()¶

Number of distinct labels in this object.

Returns:	n_labels
Return type:	int

get_unlabeled_entries()¶

Returns the list of unlabeled features, along with their entry_ids.

Returns:	unlabeled_entries – Unlabeled entries
Return type:	list of (entry_id, feature) tuples
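The two return shapes differ: labeled entries come back as (feature, label) tuples, while unlabeled entries come back as (entry_id, feature) tuples. A small stand-alone sketch of that difference follows. It assumes that an entry_id is simply the position of the entry in the data list, which this page does not state explicitly.

```python
# Stand-in for Dataset.data: a list of (feature, label) tuples.
data = [([0.0, 1.0], 1), ([1.0, 0.0], None), ([1.0, 1.0], 0), ([0.0, 0.0], None)]

# (feature, label) tuples for entries that already have a label.
labeled_entries = [(x, y) for x, y in data if y is not None]

# (entry_id, feature) tuples for entries still waiting for a label.
# Assumption: entry_id is the index of the entry in the list.
unlabeled_entries = [(i, x) for i, (x, y) in enumerate(data) if y is None]
```

Keeping the entry_id alongside each unlabeled feature is what lets a caller later pass that id to update().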

labeled_uniform_sample(sample_size, replace=True)¶

Returns a Dataset object containing labeled data only, sampled uniformly with the given sample size. The replace parameter determines whether sampling is done with replacement.

Parameters:	sample_size –
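The difference between the two sampling modes can be illustrated with the standard library. This is a sketch under assumptions, not libact's code: `uniform_sample` and its `seed` argument are hypothetical, and it returns a plain list where the real method returns a Dataset object.

```python
import random


def uniform_sample(labeled_entries, sample_size, replace=True, seed=None):
    """Draw sample_size labeled entries uniformly at random."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same entry may be drawn several times,
        # so sample_size may exceed the number of labeled entries.
        return rng.choices(labeled_entries, k=sample_size)
    # Without replacement: every drawn entry is distinct, so sample_size
    # must not exceed the number of labeled entries.
    return rng.sample(labeled_entries, sample_size)


labeled = [([0.0], 0), ([1.0], 1), ([2.0], 1)]
with_replacement = uniform_sample(labeled, 5, replace=True, seed=0)
without_replacement = uniform_sample(labeled, 2, replace=False, seed=0)
```

Sampling with replacement is the usual choice for bootstrap-style resampling, where the sample is often as large as the labeled pool itself.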

len_labeled()¶

Number of labeled data entries in this object.

Returns:	n_samples
Return type:	int

len_unlabeled()¶

Number of unlabeled data entries in this object.

Returns:	n_samples
Return type:	int

on_update(callback)¶

Add a callback function to be called when the dataset is updated.

Parameters:	callback (callable) – The function to be called when dataset is updated.

update(entry_id, new_label)¶

Updates the entry identified by entry_id with the given label.

Parameters:	entry_id (int) – Entry id of the sample to update.
	new_label ({int, None}) – New label of the sample to be updated.
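How append, on_update, and update fit together can be sketched with a small stand-in class. `ToyDataset` is hypothetical and only mirrors the behavior documented here. Two details are assumptions: that entry_id is the list index, and that callbacks receive (entry_id, new_label).

```python
class ToyDataset:
    """Stand-in mirroring the documented append/on_update/update behavior."""

    def __init__(self):
        self.data = []        # list of (feature, label) tuples
        self._callbacks = []  # functions to notify after each update

    def append(self, feature, label=None):
        self.data.append((feature, label))
        return len(self.data) - 1  # assumed: entry_id is the list index

    def on_update(self, callback):
        self._callbacks.append(callback)

    def update(self, entry_id, new_label):
        feature, _ = self.data[entry_id]
        self.data[entry_id] = (feature, new_label)
        for callback in self._callbacks:
            callback(entry_id, new_label)  # assumed callback arguments


seen = []
ds = ToyDataset()
eid = ds.append([0.5, 0.5])  # unlabeled entry
ds.on_update(lambda entry_id, label: seen.append((entry_id, label)))
ds.update(eid, 1)            # label arrives, registered callbacks fire
```

This observer-style hook is what would allow an object that depends on the dataset to react whenever a new label is recorded.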

libact.base.dataset.import_libsvm_sparse(filename)¶: Imports dataset file in libsvm sparse format

libact.base.dataset.import_scipy_mat(filename)¶

libact.base.interfaces module¶

Base interfaces for use in the package. The package works according to the interfaces defined below.

class libact.base.interfaces.ContinuousModel¶

Bases: libact.base.interfaces.Model

Classification Model with intermediate continuous output

A continuous classification model is able to output a real-valued vector for each feature vector provided.

predict_real(feature, *args, **kwargs)¶

Predict confidence scores for samples.

Returns the confidence score for each (sample, class) combination.

The larger the value for entry (sample=x, class=k) is, the more confident the model is about the sample x belonging to the class k.

Taking logistic regression as an example, the return value is the signed distance of the sample to the hyperplane.

Parameters:	feature (array-like, shape (n_samples, n_features)) – The samples whose confidence scores are to be predicted.
Returns:	X – Each entry is the confidence scores per (sample, class) combination.
Return type:	array-like, shape (n_samples, n_classes)
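The (n_samples, n_classes) score layout can be illustrated with a tiny linear scorer. This is a sketch only: `linear_scores` and `most_confident_class` are hypothetical helpers, and the weights are made-up numbers rather than a trained model.

```python
def linear_scores(features, weights, biases):
    """Return one confidence score per (sample, class) pair.

    Each class k has its own weight vector and bias, and the score is
    the linear function w_k . x + b_k."""
    return [
        [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
         for w, b in zip(weights, biases)]
        for x in features
    ]


def most_confident_class(scores):
    """Pick, for each sample, the class index with the largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


weights = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per class
biases = [0.0, 0.0]
scores = linear_scores([[2.0, 1.0], [0.5, 3.0]], weights, biases)
predicted = most_confident_class(scores)
```

Row i of `scores` holds the confidence values of sample i, so taking the largest entry in each row recovers a class prediction.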

class libact.base.interfaces.Labeler¶

Bases: object

Label the queries made by QueryStrategies

Assign labels to the samples queried by QueryStrategies.

label(feature)¶

Return the class label for the input feature vector.

Parameters:	feature (array-like, shape (n_features,)) – The feature vector whose label is to be queried.
Returns:	label – The class label of the queried feature.
Return type:	int
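A labeler can be as simple as an oracle that looks up known ground truth, which is handy for simulated experiments. The sketch below is a stand-alone illustration: `LookupLabeler` is a hypothetical class that follows the label(feature) signature above without subclassing the libact base class.

```python
class LookupLabeler:
    """Answers label queries from a table of known ground-truth labels."""

    def __init__(self, known):
        # known: iterable of (feature, label) pairs.
        # Features are converted to tuples so they can serve as dict keys.
        self._labels = {tuple(feature): label for feature, label in known}

    def label(self, feature):
        # Raises KeyError if the feature was never registered.
        return self._labels[tuple(feature)]


oracle = LookupLabeler([([0.0, 1.0], 1), ([1.0, 0.0], 0)])
answer = oracle.label([1.0, 0.0])
```

In an interactive setting, the same method could instead prompt a human annotator and return the answer.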

class libact.base.interfaces.Model¶

Bases: object

Classification Model

A Model returns a class-predicting function for future samples after being trained on a training dataset.

predict(feature, *args, **kwargs)¶

Predict the class labels for the input samples

Parameters:	feature (array-like, shape (n_samples, n_features)) – The unlabeled samples whose labels are to be predicted.
Returns:	y_pred – The class labels for samples in the feature array.
Return type:	array-like, shape (n_samples,)

score(testing_dataset, *args, **kwargs)¶

Return the mean accuracy on the test dataset

Parameters:	testing_dataset (Dataset object) – The testing dataset used to measure the performance of the trained model.
Returns:	score – Mean accuracy of self.predict(X) with respect to y.
Return type:	float

train(dataset, *args, **kwargs)¶

Train a model according to the given training dataset.

Parameters:	dataset (Dataset object) – The training dataset the model is to be trained on.
Returns:	self – Returns self.
Return type:	object
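The train/predict/score contract can be sketched with a deliberately trivial model that always predicts the most frequent training label. Both classes below are hypothetical stand-ins written for this illustration. `ListDataset` only provides the format_sklearn() method that the sketch needs, and neither class subclasses anything from libact.

```python
from collections import Counter


class ListDataset:
    """Minimal dataset stand-in exposing format_sklearn()."""

    def __init__(self, X, y):
        self.data = list(zip(X, y))

    def format_sklearn(self):
        labeled = [(x, y) for x, y in self.data if y is not None]
        return [x for x, _ in labeled], [y for _, y in labeled]


class MajorityClassModel:
    """Predicts the most frequent label seen during training."""

    def __init__(self):
        self._majority = None

    def train(self, dataset, *args, **kwargs):
        _, y = dataset.format_sklearn()
        self._majority = Counter(y).most_common(1)[0][0]
        return self  # returning self allows chaining

    def predict(self, feature, *args, **kwargs):
        return [self._majority for _ in feature]

    def score(self, testing_dataset, *args, **kwargs):
        X, y = testing_dataset.format_sklearn()
        predictions = self.predict(X)
        # Mean accuracy: fraction of predictions equal to the true label.
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


train_set = ListDataset([[0.0], [1.0], [2.0], [3.0]], [1, 1, 0, None])
test_set = ListDataset([[0.0], [1.0], [2.0], [3.0]], [1, 0, 1, 1])
model = MajorityClassModel().train(train_set)
predictions = model.predict([[5.0], [6.0]])
accuracy = model.score(test_set)
```

Such a baseline is mainly useful as a reference point when comparing real models.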

class libact.base.interfaces.MultilabelModel¶

Bases: libact.base.interfaces.Model

Multilabel Classification Model

A Model returns a multilabel-predicting function for future samples after being trained on a training dataset.

class libact.base.interfaces.ProbabilisticModel¶

Bases: libact.base.interfaces.ContinuousModel

Classification Model with probability output

A probabilistic classification model is able to output a vector of class probability estimates for each feature vector provided.

predict_proba(feature, *args, **kwargs)¶

Predict probability estimate for samples.

Parameters:	feature (array-like, shape (n_samples, n_features)) – The samples whose probability estimates are to be predicted.
Returns:	X – Each entry is the probability estimate for each class.
Return type:	array-like, shape (n_samples, n_classes)
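One common way to relate real-valued confidence scores to probability estimates is the softmax function, sketched below. This is an illustration only: the page does not say that any libact model uses softmax, and `softmax_rows` is a hypothetical helper.

```python
import math


def softmax_rows(scores):
    """Turn each row of real-valued scores into probabilities summing to 1."""
    probabilities = []
    for row in scores:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - shift) for s in row]
        total = sum(exps)
        probabilities.append([e / total for e in exps])
    return probabilities


# Two samples, two classes: shape (n_samples, n_classes).
proba = softmax_rows([[2.0, 1.0], [0.0, 0.0]])
```

Each output row has the same ordering of classes as the input row, and a larger score always maps to a larger probability.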

predict_real(feature, *args, **kwargs)¶

Predict confidence scores for samples.

Returns the confidence score for each (sample, class) combination.

The larger the value for entry (sample=x, class=k) is, the more confident the model is about the sample x belonging to the class k.

Taking logistic regression as an example, the return value is the signed distance of the sample to the hyperplane.

Parameters:	feature (array-like, shape (n_samples, n_features)) – The samples whose confidence scores are to be predicted.
Returns:	X – Each entry is the confidence scores per (sample, class) combination.
Return type:	array-like, shape (n_samples, n_classes)

class libact.base.interfaces.QueryStrategy(dataset, **kwargs)¶

Bases: object

Pool-based query strategy

A QueryStrategy advises which unlabeled data should be queried next, given a pool of labeled and unlabeled data.

dataset¶: The Dataset object that is associated with this QueryStrategy.

make_query()¶

Return the index of the sample to be queried and labeled. Read-only.

This method does not modify the internal state.

Returns:	ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:	int

update(entry_id, label)¶

Update the internal state of the QueryStrategy after each queried sample has been labeled.

Parameters:	entry_id (int) – The index of the newly labeled sample.
	label (float) – The label of the queried sample.
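The interplay of make_query and update in a pool-based loop can be sketched with stand-alone classes. Everything below is hypothetical and written for illustration: `ToyDataset` mimics only the dataset methods used here, `RandomQuery` picks an unlabeled entry uniformly at random, and the `truth` dictionary plays the role of a labeler. Registering the strategy's update method through on_update is an assumption about how the two objects are wired together.

```python
import random


class ToyDataset:
    """Stand-in exposing only the dataset methods this sketch needs."""

    def __init__(self, X, y):
        self.data = list(zip(X, y))
        self._callbacks = []

    def on_update(self, callback):
        self._callbacks.append(callback)

    def update(self, entry_id, new_label):
        self.data[entry_id] = (self.data[entry_id][0], new_label)
        for callback in self._callbacks:
            callback(entry_id, new_label)

    def get_unlabeled_entries(self):
        return [(i, x) for i, (x, y) in enumerate(self.data) if y is None]

    def len_unlabeled(self):
        return len(self.get_unlabeled_entries())


class RandomQuery:
    """Query strategy that asks for a uniformly random unlabeled entry."""

    def __init__(self, dataset, seed=None):
        self.dataset = dataset
        self._rng = random.Random(seed)
        dataset.on_update(self.update)  # assumed wiring

    def make_query(self):
        # Only reads the pool of unlabeled entries.
        entry_id, _ = self._rng.choice(self.dataset.get_unlabeled_entries())
        return entry_id

    def update(self, entry_id, label):
        pass  # random sampling keeps no per-label state


truth = {0: 1, 1: 0, 2: 1}  # hypothetical ground truth used as the oracle
ds = ToyDataset([[0.0], [1.0], [2.0]], [1, None, None])
qs = RandomQuery(ds, seed=0)
queried = []
while ds.len_unlabeled() > 0:
    ask_id = qs.make_query()          # 1. strategy picks an entry
    ds.update(ask_id, truth[ask_id])  # 2. oracle labels it, dataset notifies
    queried.append(ask_id)
```

A more informed strategy would use update to refresh whatever internal model or statistics guide its next query.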
