Develop with Libact

To develop active learning applications with the libact framework, you can implement your own oracle, active learning algorithm, and machine learning models.

Write your own models

To implement your own model, your model class should inherit from either libact.base.interfaces.Model or libact.base.interfaces.ContinuousModel. A regular Model must implement three methods: train(), predict(), and score(). A model that supports continuous output should inherit from ContinuousModel and additionally implement predict_real().

train

The train method takes a Dataset object, which may include both labeled and unlabeled data. For supervised learning models, the labeled data can be retrieved like this:

X, y = zip(*dataset.get_labeled_entries())

Here X holds the samples (shape=(n_samples, n_features)) and y holds the labels (shape=(n_samples,)).
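As a minimal illustration of this unpacking idiom, the sketch below uses a hand-written list as a stand-in for the return value of get_labeled_entries(). It assumes, as described above, that each entry is a (feature, label) pair; the sample values are made up.

```python
# Hypothetical stand-in for dataset.get_labeled_entries():
# a list of (feature, label) pairs.
labeled_entries = [
    ([0.0, 1.0], 0),
    ([1.0, 0.5], 1),
    ([0.2, 0.9], 0),
]

# zip(*pairs) transposes the list of pairs into two parallel tuples.
X, y = zip(*labeled_entries)

print(len(X), len(X[0]))  # n_samples, n_features
print(y)
```

Both X and y come back as tuples, so convert them (for example with numpy.array) if your model needs array input.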

Train your model in this method, much like the fit method of a scikit-learn model.

predict

This method should work like the predict method of a scikit-learn model: it takes the features of each sample and outputs the predicted label for each of them.

score

This method should calculate the accuracy on a given dataset’s labeled data.

predict_real

This method is for models that can generate continuous predictions (for example, the distance to the decision boundary).
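The four methods can be sketched together in a small, self-contained example. This is only an illustration under stated assumptions: ToyDataset is a hypothetical stand-in for libact's Dataset, NearestCentroidModel is a made-up model, and in real use the model class would inherit from libact.base.interfaces.ContinuousModel rather than being a plain class.

```python
class ToyDataset:
    """Hypothetical stand-in for libact's Dataset (not the real class)."""

    def __init__(self, entries):
        # entries: list of (feature, label); label is None when unlabeled.
        self.entries = list(entries)

    def get_labeled_entries(self):
        return [(x, y) for x, y in self.entries if y is not None]


class NearestCentroidModel:
    """Toy model; with libact it would subclass ContinuousModel."""

    def train(self, dataset):
        X, y = zip(*dataset.get_labeled_entries())
        grouped = {}
        for x, label in zip(X, y):
            grouped.setdefault(label, []).append(x)
        self.labels_ = sorted(grouped)
        # One centroid per label: the per-dimension mean of its samples.
        self.centroids_ = [
            [sum(col) / len(col) for col in zip(*grouped[label])]
            for label in self.labels_
        ]
        return self

    def predict_real(self, feature):
        # One row per sample, one column per label; a larger value
        # (smaller squared distance to the centroid) means more confident.
        return [
            [-sum((a - b) ** 2 for a, b in zip(x, c))
             for c in self.centroids_]
            for x in feature
        ]

    def predict(self, feature):
        return [
            self.labels_[row.index(max(row))]
            for row in self.predict_real(feature)
        ]

    def score(self, testing_dataset):
        X, y = zip(*testing_dataset.get_labeled_entries())
        hits = sum(p == t for p, t in zip(self.predict(X), y))
        return hits / len(y)


dataset = ToyDataset([
    ([0.0, 0.0], 0), ([0.0, 1.0], 0),
    ([5.0, 5.0], 1), ([6.0, 5.0], 1),
    ([3.0, 3.0], None),  # unlabeled entry, ignored by train()
])
model = NearestCentroidModel().train(dataset)
predictions = model.predict([[0.2, 0.3], [5.5, 5.0]])
accuracy = model.score(dataset)
```

Defining predict in terms of predict_real keeps the two outputs consistent: the predicted label is always the one with the largest continuous value.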

Examples

Take a look at libact.models.svm.SVM, which serves as an interface to scikit-learn’s SVC model. Its train method calls scikit-learn’s fit method, and its predict method calls scikit-learn’s predict. Its predict_real method returns the decision value for each label.

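import logging

import numpy as np
import sklearn.svm
from sklearn.multiclass import OneVsRestClassifier

from libact.base.interfaces import ContinuousModel

LOGGER = logging.getLogger(__name__)


class SVM(ContinuousModel):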

    """C-Support Vector Machine Classifier

    When decision_function_shape == 'ovr', we use OneVsRestClassifier(SVC) from
    sklearn.multiclass instead of the output from SVC directly since it is not
    exactly the implementation of One Vs Rest.

    References
    ----------
    http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
    """

    def __init__(self, *args, **kwargs):
        self.model = sklearn.svm.SVC(*args, **kwargs)
        if self.model.decision_function_shape == 'ovr':
            self.decision_function_shape = 'ovr'
            # sklearn's ovr isn't real ovr
            self.model = OneVsRestClassifier(self.model)

    def train(self, dataset, *args, **kwargs):
        return self.model.fit(*(dataset.format_sklearn() + args), **kwargs)

    def predict(self, feature, *args, **kwargs):
        return self.model.predict(feature, *args, **kwargs)

    def score(self, testing_dataset, *args, **kwargs):
        return self.model.score(*(testing_dataset.format_sklearn() + args),
                                **kwargs)

    def predict_real(self, feature, *args, **kwargs):
        dvalue = self.model.decision_function(feature, *args, **kwargs)
        if len(np.shape(dvalue)) == 1:  # n_classes == 2
            return np.vstack((-dvalue, dvalue)).T
        else:
            if self.decision_function_shape != 'ovr':
                LOGGER.warning("SVM model supports only 'ovr' for multiclass "
                               "predict_real.")
            return dvalue
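To see what the binary branch of predict_real produces, here is a plain-Python sketch of the same reshaping without NumPy. The decision values are made up. The assumption, which follows scikit-learn's convention for binary decision_function output, is that a positive value favors the second class.

```python
# Made-up decision values for three samples in a binary problem.
dvalue = [-1.5, 0.2, 2.0]

# Equivalent of np.vstack((-dvalue, dvalue)).T: one row per sample,
# one column per label, so column 0 is the negated decision value.
per_label = [[-d, d] for d in dvalue]

# The column with the larger value is the predicted label index.
predicted = [row.index(max(row)) for row in per_label]
```

Returning one column per label in both the binary and the multiclass case lets a query strategy treat the output uniformly.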

Implement your active learning algorithm

You can implement your own active learning algorithm as a QueryStrategy class. Your class should inherit from libact.base.interfaces.QueryStrategy and call the following in its __init__ method:

super(YourClassName, self).__init__(*args, **kwargs)

This associates the given dataset with your query strategy and registers the strategy’s update method on that dataset as a callback function.

The update() method should be used if the active learning algorithm needs to change its internal state after the dataset is updated with a newly retrieved label. Take ALBL’s update() method as an example:

    @inherit_docstring_from(QueryStrategy)
    def update(self, entry_id, label):
        # Calculate the next query after updating the question asked with an
        # answer.
        ask_idx = self.unlabeled_invert_id_idx[entry_id]
        self.W.append(1. / self.query_dist[ask_idx])
        self.queried_hist_.append(entry_id)
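The callback registration described above can be sketched with stand-in classes. Everything here is hypothetical and only mimics the described behavior: ToyDataset and ToyQueryStrategy are not libact classes, and the method names on_update and update on the stand-in dataset are assumptions chosen for illustration.

```python
class ToyDataset:
    """Hypothetical stand-in for libact's Dataset (not the real class)."""

    def __init__(self, labels):
        self.labels = list(labels)  # None marks an unlabeled entry
        self._callbacks = []

    def on_update(self, callback):
        self._callbacks.append(callback)

    def update(self, entry_id, label):
        self.labels[entry_id] = label
        # Notify every registered query strategy about the new label.
        for callback in self._callbacks:
            callback(entry_id, label)


class ToyQueryStrategy:
    """Stand-in for what QueryStrategy.__init__ is described as doing."""

    def __init__(self, dataset):
        self.dataset = dataset
        dataset.on_update(self.update)


class HistoryStrategy(ToyQueryStrategy):
    def __init__(self, *args, **kwargs):
        super(HistoryStrategy, self).__init__(*args, **kwargs)
        self.queried_hist_ = []

    def update(self, entry_id, label):
        # Internal state changes only when the dataset reports a new label.
        self.queried_hist_.append(entry_id)


dataset = ToyDataset([0, None, 1, None])
strategy = HistoryStrategy(dataset)
dataset.update(3, 1)
dataset.update(1, 0)
```

Because the strategy is notified by the dataset, the code that collects labels never has to call the strategy's update() directly.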

make_query() is another method that needs to be implemented. It decides which sample to query and returns the entry id of that sample. Take the uncertainty sampling algorithm as an example:

    def make_query(self, return_score=False):
        """Return the index of the sample to be queried and labeled and
        selection score of each sample. Read-only.

        No modification to the internal states.

        Returns
        -------
        ask_id : int
            The index of the next unlabeled sample to be queried and labeled.

        score : list of (index, score) tuple
            Selection score of unlabeled entries, the larger the better.

        """
        dataset = self.dataset
        self.model.train(dataset)

        unlabeled_entry_ids, X_pool = zip(*dataset.get_unlabeled_entries())

        if isinstance(self.model, ProbabilisticModel):
            dvalue = self.model.predict_proba(X_pool)
        elif isinstance(self.model, ContinuousModel):
            dvalue = self.model.predict_real(X_pool)

        if self.method == 'lc':  # least confident
            score = -np.max(dvalue, axis=1)

        elif self.method == 'sm':  # smallest margin
            if np.shape(dvalue)[1] > 2:
                # Find 2 largest decision values
                dvalue = -(np.partition(-dvalue, 2, axis=1)[:, :2])
            score = -np.abs(dvalue[:, 0] - dvalue[:, 1])

        elif self.method == 'entropy':
            score = np.sum(-dvalue * np.log(dvalue), axis=1)

        ask_id = np.argmax(score)

        if return_score:
            return unlabeled_entry_ids[ask_id], \
                   list(zip(unlabeled_entry_ids, score))
        else:
            return unlabeled_entry_ids[ask_id]

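Uncertainty sampling queries the sample the model is least certain about. With the 'lc' method, for example, that is the sample whose largest decision value (the output of predict_proba() for a ProbabilisticModel or predict_real() for a ContinuousModel) is the lowest.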
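The three scoring rules used in make_query() can be worked through by hand on a tiny pool. The sketch below is a plain-Python restatement, not the library code: the probabilities are made up, and the helper names uncertainty_score and make_query are hypothetical.

```python
import math

# Made-up class probabilities for three unlabeled entries, keyed by entry id.
pool = {
    10: [0.90, 0.05, 0.05],
    11: [0.40, 0.35, 0.25],
    12: [0.50, 0.48, 0.02],
}


def uncertainty_score(probs, method):
    """Return a score where larger means more uncertain."""
    if method == 'lc':  # least confident: low top probability
        return -max(probs)
    if method == 'sm':  # smallest margin between the two top classes
        top, second = sorted(probs, reverse=True)[:2]
        return -abs(top - second)
    if method == 'entropy':
        return sum(-p * math.log(p) for p in probs)
    raise ValueError('unknown method: %r' % method)


def make_query(pool, method):
    # Pick the entry id with the largest uncertainty score.
    return max(pool, key=lambda eid: uncertainty_score(pool[eid], method))


choices = {m: make_query(pool, m) for m in ('lc', 'sm', 'entropy')}
print(choices)
```

On this pool, 'lc' and 'entropy' select entry 11, whose probabilities are the most spread out, while 'sm' selects entry 12, whose two top classes are nearly tied. The methods can therefore disagree on the same data.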

Write your Oracle

Different use cases require different ways of retrieving the label for an unlabeled sample, so you may want to implement your own oracle for each situation. To implement a Labeler class, inherit from libact.base.interfaces.Labeler and implement the label() method, which defines how to retrieve the label of a given sample (feature).
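As one illustration of a custom oracle, the sketch below simulates an annotator who sometimes answers incorrectly. NoisyLabeler and its noise and seed parameters are hypothetical and not part of libact; in real use the class would inherit from libact.base.interfaces.Labeler.

```python
import random


class NoisyLabeler:
    """Toy oracle; with libact it would subclass Labeler."""

    def __init__(self, features, labels, noise=0.1, seed=0):
        self.features = [list(x) for x in features]
        self.labels = list(labels)
        self.label_space = sorted(set(labels))
        self.noise = noise
        self.rng = random.Random(seed)  # seeded for repeatable runs

    def label(self, feature):
        true_label = self.labels[self.features.index(list(feature))]
        if self.rng.random() < self.noise:
            # Simulate a mistake: answer with any label except the true one.
            wrong = [l for l in self.label_space if l != true_label]
            return self.rng.choice(wrong)
        return true_label


features = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
labels = [0, 1, 2]
careful = NoisyLabeler(features, labels, noise=0.0)
careless = NoisyLabeler(features, labels, noise=1.0)
```

With noise=0.0 the labeler always returns the true label, and with noise=1.0 it never does, which makes the two extremes easy to check. The sketch assumes at least two distinct labels, since a wrong answer has to exist.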

Examples

We have provided two example labelers: libact.labelers.IdealLabeler and libact.labelers.InteractiveLabeler.

IdealLabeler is usually used for testing the performance of an active learning algorithm. You give it a fully labeled dataset, simulating an oracle that knows the true label of every sample. Its label() method simply searches for the given feature in the fully labeled dataset and returns the corresponding label.

class IdealLabeler(Labeler):

    """
    Provide the errorless/noiseless label to any feature vectors being queried.

    Parameters
    ----------
    dataset: Dataset object
        Dataset object with the ground-truth label for each sample.

    """

    def __init__(self, dataset, **kwargs):
        X, y = zip(*dataset.get_entries())
        # make sure the input dataset is fully labeled
        assert (np.array(y) != np.array(None)).all()
        self.X = X
        self.y = y

    @inherit_docstring_from(Labeler)
    def label(self, feature):
        return self.y[np.where([np.array_equal(x, feature)
                                for x in self.X])[0][0]]

InteractiveLabeler can be used when you want to display each feature as an image and let a human act as the oracle, labeling the images interactively. Its label() method shows the feature as an image using matplotlib.pyplot.imshow() and receives the label through the command line interface:

class InteractiveLabeler(Labeler):

    """Interactive Labeler

    InteractiveLabeler is a Labeler object that shows the feature through image
    using matplotlib and lets human label each feature through command line
    interface.

    Parameters
    ----------
    label_name: list
        Let the label space be from 0 to len(label_name)-1, this list
        corresponds to each label's name.

    """

    def __init__(self, **kwargs):
        self.label_name = kwargs.pop('label_name', None)

    @inherit_docstring_from(Labeler)
    def label(self, feature):
        plt.imshow(feature, cmap=plt.cm.gray_r, interpolation='nearest')
        plt.draw()

        banner = "Enter the associated label with the image: "

        if self.label_name is not None:
            banner += str(self.label_name) + ' '

        lbl = input(banner)

        while (self.label_name is not None) and (lbl not in self.label_name):
            print('Invalid label, please re-enter the associated label.')
            lbl = input(banner)

        return self.label_name.index(lbl)
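The prompt-and-validate loop can be tried without matplotlib or a human at the keyboard. The sketch below is a hypothetical variant, not the library class: ConsoleLabeler and its ask parameter are assumptions made for illustration, and unlike the code above it returns the raw answer when no label names are given.

```python
class ConsoleLabeler:
    """Toy labeler; with libact it would subclass Labeler."""

    def __init__(self, label_name=None, ask=input):
        self.label_name = label_name
        self.ask = ask  # injectable so the prompt can be scripted in tests

    def label(self, feature):
        banner = 'Enter the label for %r: ' % (feature,)
        if self.label_name is None:
            # No label names given: return the raw answer as typed.
            return self.ask(banner)
        banner += str(self.label_name) + ' '
        answer = self.ask(banner)
        while answer not in self.label_name:
            print('Invalid label, please re-enter the associated label.')
            answer = self.ask(banner)
        return self.label_name.index(answer)


# Scripted answers instead of a human: one typo, then a valid name.
answers = iter(['dgo', 'dog'])
labeler = ConsoleLabeler(label_name=['cat', 'dog'],
                         ask=lambda prompt: next(answers))
result = labeler.label([0.1, 0.2])
```

Passing the prompt function in as an argument keeps the validation logic testable; the default of input preserves the interactive behavior.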