Develop with Libact¶
To develop active learning applications with the libact framework, you may implement your own oracle, active learning algorithm, and machine learning algorithms.
Write your own models¶
To implement your own model, your model class should inherit from either libact.base.interfaces.Model or libact.base.interfaces.ContinuousModel. A regular model must implement three methods: train(), predict(), and score(). A learning model that supports continuous output should inherit from ContinuousModel and additionally implement predict_real().
train¶
The train method takes a Dataset object, which may include both labeled and unlabeled data. For supervised learning models, the labeled data can be retrieved like this:
X, y = zip(*dataset.get_labeled_entries())
Here dataset is the Dataset object passed to train, X holds the samples (shape=(n_samples, n_features)), and y holds the labels (shape=(n_samples,)).
Train your model in this method, as you would in the fit method of a scikit-learn model.
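The unpacking idiom above can be tried without libact installed. The following is a minimal sketch: ToyDataset is a hypothetical stand-in, not libact's Dataset, and it assumes that get_labeled_entries() yields (feature, label) pairs, as the zip(*...) call implies.

```python
class ToyDataset:
    """Hypothetical stand-in for a libact Dataset (illustration only)."""

    def __init__(self, features, labels):
        # A label of None marks an unlabeled sample.
        self.data = list(zip(features, labels))

    def get_labeled_entries(self):
        # Assumption: returns (feature, label) pairs for labeled samples only.
        return [(x, y) for x, y in self.data if y is not None]


dataset = ToyDataset(
    features=[[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],
    labels=[0, 1, None],
)

# zip(*pairs) transposes a list of pairs into two parallel tuples.
X, y = zip(*dataset.get_labeled_entries())
print(X)  # features of the labeled samples
print(y)  # their labels
```

The unlabeled third sample is skipped, so X and y stay aligned and can be handed to a fit-style routine directly.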
predict¶
This method should work like the predict method of a scikit-learn model. It takes the features of each sample and outputs the predicted label for each of these samples.
score¶
This method should calculate the accuracy on a given dataset’s labeled data.
predict_real¶
Implement this method for models that can generate continuous predictions (for example, the distance to the decision boundary).
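To make the four methods concrete, here is a toy nearest-centroid classifier written in plain Python. It is only a sketch of the method contract described above: NearestCentroidModel is a hypothetical class, it does not subclass libact's ContinuousModel, and its train() and score() take plain lists of (feature, label) pairs rather than Dataset objects.

```python
import math


class NearestCentroidModel:
    """Toy model exposing train/predict/score/predict_real (illustration only)."""

    def __init__(self):
        self.centroids = {}  # label -> mean feature vector

    def train(self, labeled_entries):
        # Group features by label, then average each group column-wise.
        grouped = {}
        for x, y in labeled_entries:
            grouped.setdefault(y, []).append(x)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }

    def predict_real(self, features):
        # One row per sample, one column per label (sorted by label).
        # Values are negated distances, so larger means "closer to that class".
        labels = sorted(self.centroids)
        return [
            [-math.dist(x, self.centroids[label]) for label in labels]
            for x in features
        ]

    def predict(self, features):
        # The predicted label is the column with the largest continuous value.
        labels = sorted(self.centroids)
        return [
            labels[max(range(len(labels)), key=row.__getitem__)]
            for row in self.predict_real(features)
        ]

    def score(self, labeled_entries):
        # Accuracy on the given labeled entries.
        X, y = zip(*labeled_entries)
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


train_entries = [([0.0, 0.0], 0), ([0.2, 0.0], 0),
                 ([1.0, 1.0], 1), ([0.8, 1.0], 1)]
model = NearestCentroidModel()
model.train(train_entries)
print(model.predict([[0.1, 0.1], [0.9, 0.9]]))
print(model.score(train_entries))
```

Deriving predict() from predict_real() keeps the discrete and continuous outputs consistent, which is one reasonable design choice for a model that offers both.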
Examples¶
Take a look at libact.models.svm.SVM, which serves as an interface to scikit-learn’s SVC model. The train method is connected to scikit-learn’s fit method, and predict is connected to scikit-learn’s predict. The predict_real method returns the decision value for each label.
import logging

import numpy as np
import sklearn.svm
from sklearn.multiclass import OneVsRestClassifier

from libact.base.interfaces import ContinuousModel

LOGGER = logging.getLogger(__name__)


class SVM(ContinuousModel):
    """C-Support Vector Machine Classifier

    When decision_function_shape == 'ovr', we use OneVsRestClassifier(SVC) from
    sklearn.multiclass instead of the output from SVC directly since it is not
    exactly the implementation of One Vs Rest.

    References
    ----------
    http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
    """

    def __init__(self, *args, **kwargs):
        self.model = sklearn.svm.SVC(*args, **kwargs)
        # Always record the shape so predict_real can check it later.
        self.decision_function_shape = self.model.decision_function_shape
        if self.decision_function_shape == 'ovr':
            # sklearn's ovr isn't real ovr
            self.model = OneVsRestClassifier(self.model)

    def train(self, dataset, *args, **kwargs):
        return self.model.fit(*(dataset.format_sklearn() + args), **kwargs)

    def predict(self, feature, *args, **kwargs):
        return self.model.predict(feature, *args, **kwargs)

    def score(self, testing_dataset, *args, **kwargs):
        return self.model.score(*(testing_dataset.format_sklearn() + args),
                                **kwargs)

    def predict_real(self, feature, *args, **kwargs):
        dvalue = self.model.decision_function(feature, *args, **kwargs)
        if len(np.shape(dvalue)) == 1:  # n_classes == 2
            # Expand the single decision value into one column per class.
            return np.vstack((-dvalue, dvalue)).T
        else:
            if self.decision_function_shape != 'ovr':
                LOGGER.warning("SVM model support only 'ovr' for multiclass "
                               "predict_real.")
            return dvalue
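In the binary case, predict_real receives a single decision value per sample and expands it into two columns, presumably so that callers can always index one column per class. The sketch below reproduces that expansion with plain lists; expand_binary_decision_values is a hypothetical helper name, not part of libact.

```python
def expand_binary_decision_values(dvalue):
    """Turn 1-D binary decision values into one column per class.

    Mirrors np.vstack((-dvalue, dvalue)).T using plain lists: column 0
    scores the negative class and column 1 the positive class.
    """
    return [[-d, d] for d in dvalue]


rows = expand_binary_decision_values([0.8, -0.3, 1.5])
for row in rows:
    print(row)
```

Each row sums to zero, so the sign of the second column alone still determines the predicted class.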
Implement your active learning algorithm¶
You may implement your own active learning algorithm as a QueryStrategy class. Your class should inherit from libact.base.interfaces.QueryStrategy and call the following in its __init__ method:
super(YourClassName, self).__init__(*args, **kwargs)
This associates the given dataset with your query strategy and registers the update method as a callback function on that dataset.
Use the update() method if the active learning algorithm needs to change its internal state after the dataset is updated with a newly retrieved label. Take ALBL’s update() method as an example:
@inherit_docstring_from(QueryStrategy)
def update(self, entry_id, label):
    # Calculate the next query after updating the question asked with an
    # answer.
    ask_idx = self.unlabeled_invert_id_idx[entry_id]
    self.W.append(1. / self.query_dist[ask_idx])
    self.queried_hist_.append(entry_id)
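The callback registration described above can be sketched in a few lines. All three classes below are hypothetical stand-ins written for illustration; they assume only that the dataset keeps a list of callbacks and invokes each one with (entry_id, label) whenever a label is added.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset that notifies listeners."""

    def __init__(self, labels):
        self.labels = list(labels)  # None marks an unlabeled entry
        self._callbacks = []

    def on_update(self, callback):
        self._callbacks.append(callback)

    def update(self, entry_id, label):
        self.labels[entry_id] = label
        for callback in self._callbacks:
            callback(entry_id, label)


class ToyQueryStrategy:
    """Stand-in base class: stores the dataset and registers update()."""

    def __init__(self, dataset):
        self.dataset = dataset
        dataset.on_update(self.update)

    def update(self, entry_id, label):
        pass  # default: no internal state to maintain


class HistoryStrategy(ToyQueryStrategy):
    """Example strategy that records which entries have been labeled."""

    def __init__(self, dataset):
        super().__init__(dataset)
        self.queried_hist_ = []

    def update(self, entry_id, label):
        # Runs automatically whenever the dataset receives a new label.
        self.queried_hist_.append(entry_id)


dataset = ToyDataset([0, None, None])
strategy = HistoryStrategy(dataset)
dataset.update(2, 1)
dataset.update(1, 0)
print(strategy.queried_hist_)
```

Because the base class registers self.update, a subclass only needs to override update(); it never has to call the registration code itself.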
make_query() is another method that needs to be implemented. It calculates which sample to query and outputs the entry id of that sample. Take the uncertainty sampling algorithm as an example:
def make_query(self, return_score=False):
    """Return the index of the sample to be queried and labeled and
    selection score of each sample. Read-only.

    No modification to the internal states.

    Returns
    -------
    ask_id : int
        The index of the next unlabeled sample to be queried and labeled.

    score : list of (index, score) tuple
        Selection score of unlabeled entries, the larger the better.
    """
    dataset = self.dataset
    self.model.train(dataset)

    unlabeled_entry_ids, X_pool = zip(*dataset.get_unlabeled_entries())

    if isinstance(self.model, ProbabilisticModel):
        dvalue = self.model.predict_proba(X_pool)
    elif isinstance(self.model, ContinuousModel):
        dvalue = self.model.predict_real(X_pool)

    if self.method == 'lc':  # least confident
        score = -np.max(dvalue, axis=1)

    elif self.method == 'sm':  # smallest margin
        if np.shape(dvalue)[1] > 2:
            # Find 2 largest decision values
            dvalue = -(np.partition(-dvalue, 2, axis=1)[:, :2])
        score = -np.abs(dvalue[:, 0] - dvalue[:, 1])

    elif self.method == 'entropy':
        score = np.sum(-dvalue * np.log(dvalue), axis=1)

    ask_id = np.argmax(score)

    if return_score:
        return unlabeled_entry_ids[ask_id], \
               list(zip(unlabeled_entry_ids, score))
    else:
        return unlabeled_entry_ids[ask_id]
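Uncertainty sampling queries the sample that the model is least certain about. With the 'lc' (least confident) method shown above, that is the sample whose largest decision value (the output of predict_real() of a ContinuousModel, or of predict_proba() of a ProbabilisticModel) is the lowest.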
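The three scoring rules in the excerpt ('lc', 'sm', and 'entropy') can be compared on a small table of class probabilities. This is a plain-Python sketch rather than libact's implementation: uncertainty_scores is a hypothetical helper, and unlike the excerpt it skips zero probabilities when computing entropy so that log(0) is never evaluated.

```python
import math


def uncertainty_scores(probabilities, method="lc"):
    """Score each row of class probabilities; larger means more uncertain."""
    scores = []
    for row in probabilities:
        if method == "lc":  # least confident
            scores.append(-max(row))
        elif method == "sm":  # smallest margin between the top two classes
            top, second = sorted(row, reverse=True)[:2]
            scores.append(-(top - second))
        elif method == "entropy":
            scores.append(-sum(p * math.log(p) for p in row if p > 0))
        else:
            raise ValueError("unknown method: %r" % method)
    return scores


pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # nearly uniform, so highly uncertain
    [0.60, 0.30, 0.10],
]

chosen = {}
for method in ("lc", "sm", "entropy"):
    scores = uncertainty_scores(pool, method)
    # argmax over the scores picks the sample to query
    chosen[method] = max(range(len(scores)), key=scores.__getitem__)
    print(method, chosen[method])
```

On this pool all three rules select the second row, but they can disagree when the top two classes are close while the remaining mass is spread differently.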
Write your Oracle¶
Different use cases require different ways of retrieving the label for an unlabeled sample, so you may want to implement your own oracle for each situation. To implement a Labeler class, inherit from libact.base.interfaces.Labeler and implement the label() function, which defines how to retrieve the label of a given sample (feature).
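As a minimal sketch of a custom oracle, the labeler below answers queries with a user-supplied rule and counts how many labels were requested. ToyLabeler stands in for libact.base.interfaces.Labeler so the example runs without libact, and RuleLabeler and its n_queries attribute are hypothetical names chosen for illustration.

```python
class ToyLabeler:
    """Stand-in for libact.base.interfaces.Labeler (illustration only)."""

    def label(self, feature):
        raise NotImplementedError


class RuleLabeler(ToyLabeler):
    """Hypothetical labeler that derives the label from a callable rule."""

    def __init__(self, rule):
        self.rule = rule
        self.n_queries = 0  # how many labels have been requested so far

    def label(self, feature):
        self.n_queries += 1
        return self.rule(feature)


# Assumed rule for the demo: class 1 when the feature values sum above 1.0.
labeler = RuleLabeler(lambda x: int(sum(x) > 1.0))
first = labeler.label([0.2, 0.3])
second = labeler.label([0.9, 0.8])
print(first, second, labeler.n_queries)
```

Tracking the query count inside the labeler is one simple way to enforce or report a labeling budget during an experiment.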
Examples¶
We have provided two example labelers: libact.labelers.IdealLabeler and libact.labelers.InteractiveLabeler.
IdealLabeler is usually used for testing the performance of an active learning algorithm. You give it a fully labeled dataset, simulating an oracle that knows the true label of every sample. Its label() method simply searches the fully labeled dataset for the given feature and returns the corresponding label.
class IdealLabeler(Labeler):
    """
    Provide the errorless/noiseless label to any feature vectors being queried.

    Parameters
    ----------
    dataset: Dataset object
        Dataset object with the ground-truth label for each sample.
    """

    def __init__(self, dataset, **kwargs):
        X, y = zip(*dataset.get_entries())
        # make sure the input dataset is fully labeled
        assert (np.array(y) != np.array(None)).all()
        self.X = X
        self.y = y

    @inherit_docstring_from(Labeler)
    def label(self, feature):
        return self.y[np.where([np.array_equal(x, feature)
                                for x in self.X])[0][0]]
InteractiveLabeler can be used when you want to display each feature as an image and let a human act as the oracle, labeling the images interactively. Its label() method shows the feature as an image using matplotlib.pyplot.imshow() and receives input through the command line interface:
class InteractiveLabeler(Labeler):
    """Interactive Labeler

    InteractiveLabeler is a Labeler object that shows the feature through image
    using matplotlib and lets human label each feature through command line
    interface.

    Parameters
    ----------
    label_name: list
        Let the label space be from 0 to len(label_name)-1, this list
        corresponds to each label's name.
    """

    def __init__(self, **kwargs):
        self.label_name = kwargs.pop('label_name', None)

    @inherit_docstring_from(Labeler)
    def label(self, feature):
        plt.imshow(feature, cmap=plt.cm.gray_r, interpolation='nearest')
        plt.draw()

        banner = "Enter the associated label with the image: "

        if self.label_name is not None:
            banner += str(self.label_name) + ' '

        lbl = input(banner)

        while (self.label_name is not None) and (lbl not in self.label_name):
            print('Invalid label, please re-enter the associated label.')
            lbl = input(banner)

        return self.label_name.index(lbl)
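The prompt-and-validate loop in label() is easier to test when the input and output functions are passed in. The sketch below isolates that loop: ask_label and its read and write parameters are hypothetical and not part of libact, and it assumes label_name is always provided.

```python
def ask_label(label_name, read=input, write=print):
    """Prompt until the reply is one of label_name, then return its index."""
    banner = ("Enter the associated label with the image: "
              + str(label_name) + " ")
    lbl = read(banner)
    while lbl not in label_name:
        write("Invalid label, please re-enter the associated label.")
        lbl = read(banner)
    return label_name.index(lbl)


# Scripted replies stand in for a human typing at the terminal.
replies = iter(["bird", "dog"])
messages = []
index = ask_label(["cat", "dog"],
                  read=lambda prompt: next(replies),
                  write=messages.append)
print(index)
```

With the defaults left in place the function reads from the terminal exactly as the loop above does; the injected functions only replace the terminal during automated runs.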