libact.query_strategies.multiclass package

Submodules

libact.query_strategies.multiclass.active_learning_with_cost_embedding module

Active Learning with Cost Embedding (ALCE)

class libact.query_strategies.multiclass.active_learning_with_cost_embedding.ActiveLearningWithCostEmbedding(dataset, cost_matrix, base_regressor, embed_dim=None, mds_params={}, nn_params={}, random_state=None)

Bases: libact.base.interfaces.QueryStrategy

Active Learning with Cost Embedding (ALCE)

A cost-sensitive multi-class algorithm. It assumes that each class has at least one sample in the labeled pool.

Parameters:
nn_

sklearn.neighbors.NearestNeighbors object instance

Examples

Here is an example of declaring an ActiveLearningWithCostEmbedding query_strategy object:

import numpy as np
from sklearn.svm import SVR

from libact.query_strategies.multiclass import ActiveLearningWithCostEmbedding as ALCE

# Assumes `dataset` is an existing libact Dataset and `n_classes` is its number of classes.
cost_matrix = 2000. * np.random.rand(n_classes, n_classes)
qs3 = ALCE(dataset, cost_matrix, SVR())
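The example above draws a random cost matrix. As a minimal sketch of a more structured alternative, the block below builds an absolute-difference cost matrix with NumPy only. The class count of 4 and the reading of entry [i, j] as the cost of predicting class j when the true class is i are assumptions made for illustration, not something this page specifies.

```python
import numpy as np

# Assumed number of classes, for illustration only.
n_classes = 4
classes = np.arange(n_classes)

# Entry [i, j] grows with the distance between class i and class j,
# so the diagonal (a correct prediction) costs nothing.
cost_matrix = np.abs(classes[:, None] - classes[None, :]).astype(float)
```

A matrix built this way has the same (n_classes, n_classes) shape as the random one in the example.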

References

[1] Kuan-Hao Huang and Hsuan-Tien Lin. “A Novel Uncertainty Sampling Algorithm for Cost-sensitive Multiclass Active Learning.” In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2016.
make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns: ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type: int

libact.query_strategies.uncertainty_sampling module

Uncertainty Sampling

This module contains a class that implements two of the most well-known uncertainty sampling query strategies: the least confidence method and the smallest margin method (margin sampling).
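The two criteria can be sketched in plain Python as follows. This is an illustration of the textbook definitions, not the module's actual code; the function names and the toy probability rows are hypothetical.

```python
def least_confidence(probs):
    """Score is 1 minus the top class probability; larger means less confident."""
    return 1.0 - max(probs)


def smallest_margin(probs):
    """Score is the negated gap between the two most likely classes."""
    top, second = sorted(probs, reverse=True)[:2]
    return -(top - second)


def pick_most_uncertain(prob_rows, score_fn):
    """Return the position of the row with the largest uncertainty score."""
    scores = [score_fn(p) for p in prob_rows]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy predicted class probabilities for three unlabeled samples.
pool = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
]
lc_choice = pick_most_uncertain(pool, least_confidence)
sm_choice = pick_most_uncertain(pool, smallest_margin)
```

On this toy pool the two criteria disagree: least confidence favors the sample whose top probability is lowest, while margin sampling favors the sample whose two leading classes are nearly tied.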

class libact.query_strategies.uncertainty_sampling.UncertaintySampling(*args, **kwargs)

Bases: libact.base.interfaces.QueryStrategy

Uncertainty Sampling

This class implements the uncertainty sampling active learning algorithm [1].

Parameters:
model

libact.base.interfaces.ContinuousModel or libact.base.interfaces.ProbabilisticModel object instance – The model trained in the last query.

Examples

Here is an example of declaring an UncertaintySampling query_strategy object:

from libact.query_strategies import UncertaintySampling
from libact.models import LogisticRegression

qs = UncertaintySampling(
         dataset, # Dataset object
         model=LogisticRegression(C=0.1)
     )

Note that the model given in the model parameter must be a ContinuousModel, which supports the predict_real method.

References

[1] Settles, Burr. “Active learning literature survey.” University of Wisconsin, Madison 52.55-66 (2010): 11.
make_query(return_score=False)

Return the index of the sample to be queried and labeled and, if return_score is True, the selection score of each sample. Read-only.

No modification to the internal states.

Returns:
  • ask_id (int) – The index of the next unlabeled sample to be queried and labeled.
  • score (list of (index, score) tuple) – Selection score of unlabeled entries, the larger the better.
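As a small sketch of how the score list might be consumed, the block below ranks (index, score) tuples from best to worst. The values are made up, and unpacking the result as an (ask_id, scores) pair is an assumption about how the two return values are delivered.

```python
# Hypothetical result of make_query(return_score=True): values are invented.
ask_id, scores = 7, [(3, 0.12), (7, 0.58), (9, 0.31)]

# Larger scores are better, so sort in descending order of score.
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)

# Indices of the two highest-scoring unlabeled entries.
top_two_ids = [index for index, _ in ranked[:2]]
```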

libact.query_strategies.multiclass.expected_error_reduction module

Expected Error Reduction

class libact.query_strategies.multiclass.expected_error_reduction.EER(dataset, model=None, loss='log', random_state=None)

Bases: libact.base.interfaces.QueryStrategy

Expected Error Reduction (EER)

This class implements the EER active learning algorithm [1].

Parameters:
  • model (libact.base.interfaces.ProbabilisticModel object instance) – The base model used for training.
  • loss ({'01', 'log'}, optional (default='log')) – The loss function whose expected value is to be reduced.
  • random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If int or None, random_state is passed as a parameter to generate an np.random.RandomState instance. If it is an np.random.RandomState instance, random_state is used as the random number generator.
model

libact.base.interfaces.ProbabilisticModel object instance – The model trained in the last query.
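To make the two loss options concrete, here is a minimal sketch of the expected 0/1 loss and expected log loss as they are usually defined for expected error reduction, computed from predicted class probabilities. It is not the class's internal code: the function names are hypothetical, and the retraining step is replaced by hand-written probability tables.

```python
import math


def expected_01_loss(prob_rows):
    """Sum over the pool of 1 minus the most likely class probability."""
    return sum(1.0 - max(p) for p in prob_rows)


def expected_log_loss(prob_rows):
    """Sum over the pool of the entropy of each predicted distribution."""
    return -sum(q * math.log(q) for p in prob_rows for q in p if q > 0.0)


def expected_error(candidate_probs, pool_probs_per_label, loss_fn):
    """Weight the loss after each hypothetical label by that label's probability."""
    return sum(
        p_y * loss_fn(pool_probs_per_label[y])
        for y, p_y in enumerate(candidate_probs)
    )


# Current belief about the candidate's label (two classes).
candidate_probs = [0.5, 0.5]
# Stand-in for "retrain with label y, then predict on the rest of the pool".
pool_probs_per_label = [
    [[0.9, 0.1], [0.8, 0.2]],  # predictions if the candidate were labeled 0
    [[0.6, 0.4], [0.7, 0.3]],  # predictions if the candidate were labeled 1
]
err_01 = expected_error(candidate_probs, pool_probs_per_label, expected_01_loss)
err_log = expected_error(candidate_probs, pool_probs_per_label, expected_log_loss)
```

In this scheme the candidate with the lowest expected error would be the one queried.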

Examples

Here is an example of declaring an EER query_strategy object:

from libact.query_strategies.multiclass import EER
from libact.models import LogisticRegression

qs = EER(dataset, model=LogisticRegression(C=0.1))

Note that the model given in the model parameter must be a ProbabilisticModel, which supports the predict_proba method.

References

[1] Settles, Burr. “Active learning literature survey.” University of Wisconsin, Madison 52.55-66 (2010): 11.
make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns: ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type: int

libact.query_strategies.multiclass.hierarchical_sampling module

Hierarchical Sampling for Active Learning (HS)

This module contains a class that implements Hierarchical Sampling for Active Learning (HS).

class libact.query_strategies.multiclass.hierarchical_sampling.HierarchicalSampling(dataset, classes, active_selecting=True, subsample_qs=None, random_state=None)

Bases: libact.base.interfaces.QueryStrategy

Hierarchical Sampling for Active Learning (HS)

HS is an active learning scheme that exploits cluster structure in data. The original C++ implementation by the authors can be found at: http://www.cs.columbia.edu/~djhsu/code/HS.tar.gz

Parameters:
  • classes (list) – List of distinct classes in data.
  • active_selecting ({True, False}, optional (default=True)) – False (random selecting): the sample weight of a pruning is its number of unseen leaves. True (active selecting): the sample weight of a pruning is its weighted error bound.
  • subsample_qs ({libact.base.interfaces.query_strategies, None}, optional (default=None)) – Subsample query strategy used to sample a node in the selected pruning. RandomSampling is used if None.
  • random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If int or None, random_state is passed as a parameter to generate an np.random.RandomState instance. If it is an np.random.RandomState instance, random_state is used as the random number generator.
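The random_state convention described in the last parameter can be sketched as below. The helper name as_random_state is hypothetical and is only meant to illustrate the behavior stated above, not the library's own utility.

```python
import numpy as np


def as_random_state(random_state=None):
    """Return a RandomState: reuse an existing instance, else seed a new one."""
    if isinstance(random_state, np.random.RandomState):
        return random_state
    # np.random.RandomState accepts an int seed or None.
    return np.random.RandomState(random_state)


shared = np.random.RandomState(0)
reused = as_random_state(shared)           # same object is handed back
first_draw = as_random_state(42).rand()    # seeded from an int
second_draw = as_random_state(42).rand()   # same seed, same first draw
```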
m

int – number of nodes

classes

list – List of distinct classes in data.

n

int – number of leaf nodes

num_class

int – number of classes

parent

np.array instance, shape = (m) – parent indices

left_child

np.array instance, shape = (m) – left child indices

right_child

np.array instance, shape = (m) – right child indices

size

np.array instance, shape = (m) – number of leaves in subtree

depth

np.array instance, shape = (m) – maximum depth in subtree

count

np.array instance, shape = (m, num_class) – node class label counts

total

np.array instance, shape = (m) – total node class labels seen (total[i] = Sum_j count[i][j])

lower_bound

np.array instance, shape = (m, num_class) – lower bounds on true node class label counts

upper_bound

np.array instance, shape = (m, num_class) – upper bounds on true node class label counts

admissible

np.array instance, shape = (m, num_class) – flag indicating if (node,label) is admissible

best_label

np.array instance, shape = (m) – best admissible label

random_states_

np.random.RandomState instance – The random number generator in use.
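The relationships between several of these arrays can be illustrated on a toy tree with n = 3 leaves and m = 5 nodes. This is only a sketch: the use of -1 for "no child" or "no parent" and the ordering of children before their parents are assumptions made for the example, not documented behavior.

```python
# Toy binary tree: nodes 0-2 are leaves, node 3 joins 0 and 1, node 4 is the root.
left_child = [-1, -1, -1, 0, 3]
right_child = [-1, -1, -1, 1, 2]
m = len(left_child)

# parent[i] is derived from the child arrays; the root keeps -1.
parent = [-1] * m
for node in range(m):
    if left_child[node] != -1:
        parent[left_child[node]] = node
        parent[right_child[node]] = node

# size[i] is the number of leaves in the subtree rooted at i.
# Children have smaller indices than parents here, so one pass suffices.
size = [0] * m
for node in range(m):
    if left_child[node] == -1:
        size[node] = 1
    else:
        size[node] = size[left_child[node]] + size[right_child[node]]

# count[i][j] is the number of labels of class j seen under node i (num_class = 2).
count = [[1, 0], [0, 1], [0, 0], [1, 1], [1, 1]]
# total[i] = Sum_j count[i][j], as stated in the attribute description.
total = [sum(row) for row in count]
```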

Examples

Here is an example of declaring a HierarchicalSampling query_strategy object:

from libact.models import SVM
from libact.query_strategies import UncertaintySampling
from libact.query_strategies.multiclass import HierarchicalSampling

sub_qs = UncertaintySampling(
    dataset, method='sm', model=SVM(decision_function_shape='ovr'))

qs = HierarchicalSampling(
         dataset, # Dataset object
         dataset.get_num_of_labels(),
         active_selecting=True,
         subsample_qs=sub_qs
     )

References

[1] Sanjoy Dasgupta and Daniel Hsu. “Hierarchical sampling for active learning.” ICML 2008.
make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns: ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type: int
report_all_label()

Return the best label of every sample.

Returns: labels – The best label of all samples.
Return type: list of object, shape=(m)
report_entry_label(entry_id)

Return the best label of the asked entry.

Parameters: entry_id (int) – The index of the sample to ask.
Returns: label – The best label of the given sample.
Return type: object
update(entry_id, label)

Update the internal states of the QueryStrategy after each queried sample is labeled.

Parameters:
  • entry_id (int) – The index of the newly labeled sample.
  • label (float) – The label of the queried sample.
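The division of labor between make_query (read-only) and update (state-changing) can be sketched with a toy stand-in. ToyStrategy and the oracle list are hypothetical and exist only to mirror the two method names; they are not part of libact, and the selection rule is deliberately trivial.

```python
class ToyStrategy:
    """Hypothetical stand-in that mimics the make_query/update method names."""

    def __init__(self, n_samples):
        self.n_samples = n_samples
        self.labels = {}  # entry_id -> label, filled in by update()

    def make_query(self):
        # Read-only: pick the first unlabeled index without changing state.
        for entry_id in range(self.n_samples):
            if entry_id not in self.labels:
                return entry_id
        raise ValueError("no unlabeled samples left")

    def update(self, entry_id, label):
        # State change happens here, once the label is known.
        self.labels[entry_id] = label


oracle = ["a", "b", "a", "c"]  # made-up ground-truth labels
qs = ToyStrategy(len(oracle))
asked = []
for _ in range(3):
    ask_id = qs.make_query()
    asked.append(ask_id)
    qs.update(ask_id, oracle[ask_id])
```

Because make_query does not modify internal state, calling it twice without an intervening update returns the same index in this sketch.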

Module contents