libact.query_strategies.multiclass package¶
Submodules¶
libact.query_strategies.multiclass.active_learning_with_cost_embedding module¶
Active Learning with Cost Embedding (ALCE)
- class libact.query_strategies.multiclass.active_learning_with_cost_embedding.ActiveLearningWithCostEmbedding(dataset, cost_matrix, base_regressor, embed_dim=None, mds_params={}, nn_params={}, random_state=None)¶
Bases: libact.base.interfaces.QueryStrategy
Active Learning with Cost Embedding (ALCE)
A cost-sensitive multi-class active learning algorithm. It assumes each class has at least one sample in the labeled pool.
Parameters: - cost_matrix (array-like, shape=(n_classes, n_classes)) – The ith row, jth column represents the cost of the ground truth being ith class and prediction as jth class.
- mds_params (dict, optional) – Parameters passed to sklearn.manifold.MDS; see http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html
- nn_params (dict, optional) – Parameters passed to sklearn.neighbors.NearestNeighbors; see http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html
- embed_dim (int, optional (default=None)) – If None, embed_dim is set to n_classes.
- base_regressor (sklearn regressor) –
- random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If int or None, random_state is used to seed a new np.random.RandomState instance; if an np.random.RandomState instance is given, it is used as the random number generator.
- nn_¶ sklearn.neighbors.NearestNeighbors object instance
Examples
Here is an example of declaring an ActiveLearningWithCostEmbedding query_strategy object:
import numpy as np
from sklearn.svm import SVR

from libact.query_strategies.multiclass import ActiveLearningWithCostEmbedding as ALCE

cost_matrix = 2000. * np.random.rand(n_classes, n_classes)
qs3 = ALCE(dataset, cost_matrix, SVR())
References
[1] Kuan-Hao Huang and Hsuan-Tien Lin. “A Novel Uncertainty Sampling Algorithm for Cost-Sensitive Multiclass Active Learning”, In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2016
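The cost matrix in the example above is random; in practice it should encode the application's true misclassification costs. A minimal sketch (plain NumPy, independent of libact) of one common choice for ordinal classes, the absolute-difference cost, following the row-is-ground-truth, column-is-prediction convention stated above:

```python
import numpy as np

def ordinal_cost_matrix(n_classes):
    """cost[i, j] = |i - j|: the cost of predicting class j when the
    ground truth is class i. The diagonal is zero, so correct
    predictions cost nothing."""
    idx = np.arange(n_classes)
    return np.abs(idx[:, None] - idx[None, :]).astype(float)

cost_matrix = ordinal_cost_matrix(4)
# Mistaking class 0 for class 3 costs 3; adjacent mistakes cost 1.
```

Any matrix with non-negative entries and a zero diagonal works; the choice only matters insofar as it reflects which mistakes are expensive.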
libact.query_strategies.uncertainty_sampling module¶
Uncertainty Sampling
This module contains a class that implements three well-known uncertainty sampling query strategies: the least confidence method, the smallest margin method (margin sampling), and the entropy method.
- class libact.query_strategies.uncertainty_sampling.UncertaintySampling(*args, **kwargs)¶
Bases: libact.base.interfaces.QueryStrategy
Uncertainty Sampling
This class implements Uncertainty Sampling active learning algorithm [1].
Parameters: - model (libact.base.interfaces.ContinuousModel or libact.base.interfaces.ProbabilisticModel object instance) – The base model used for training.
- method ({'lc', 'sm', 'entropy'}, optional (default='lc')) – least confidence (lc): queries the instance whose posterior probability of being positive is nearest 0.5 (for binary classification); smallest margin (sm): queries the instance whose posterior probability gap between the most and the second most probable labels is minimal; entropy: queries the instance whose posterior label distribution has the highest entropy, and requires a libact.base.interfaces.ProbabilisticModel to be passed in as the model parameter.
- model¶ libact.base.interfaces.ContinuousModel or libact.base.interfaces.ProbabilisticModel object instance – The model trained in the last query.
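The three methods above can be sketched with plain NumPy, independent of libact's API. Each score is oriented so that a larger value means a more uncertain, and therefore more query-worthy, sample:

```python
import numpy as np

proba = np.array([[0.10, 0.80, 0.10],   # confident sample
                  [0.40, 0.35, 0.25],   # ambiguous sample
                  [0.50, 0.30, 0.20]])

# Least confidence (lc): one minus the highest posterior.
lc = 1.0 - proba.max(axis=1)

# Smallest margin (sm): negated gap between the two most probable
# labels, so a smaller gap yields a larger score.
sorted_p = np.sort(proba, axis=1)
sm = -(sorted_p[:, -1] - sorted_p[:, -2])

# Entropy: Shannon entropy of the posterior distribution.
entropy = -(proba * np.log(proba)).sum(axis=1)

# All three measures rank the ambiguous sample (index 1) highest here.
```

Note that lc and sm need only a ranking of posteriors (a ContinuousModel's predict_real suffices), while entropy needs calibrated probabilities, which is why it requires a ProbabilisticModel.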
Examples
Here is an example of declaring a UncertaintySampling query_strategy object:
from libact.query_strategies import UncertaintySampling
from libact.models import LogisticRegression

qs = UncertaintySampling(
    dataset,  # Dataset object
    model=LogisticRegression(C=0.1),
)
Note that the model given in the model parameter must be a ContinuousModel which supports the predict_real method.
References
[1] Settles, Burr. “Active learning literature survey.” University of Wisconsin, Madison 52.55-66 (2010): 11.
- make_query(return_score=False)¶ Return the index of the sample to be queried and labeled, and the selection score of each sample. Read-only.
No modification to the internal states.
Returns: - ask_id (int) – The index of the next unlabeled sample to be queried and labeled.
- score (list of (index, score) tuple) – Selection scores of unlabeled entries; the larger the better.
libact.query_strategies.multiclass.expected_error_reduction module¶
Expected Error Reduction
- class libact.query_strategies.multiclass.expected_error_reduction.EER(dataset, model=None, loss='log', random_state=None)¶
Bases: libact.base.interfaces.QueryStrategy
Expected Error Reduction (EER)
This class implements EER active learning algorithm [1].
Parameters: - model (libact.base.interfaces.ProbabilisticModel object instance) – The base model used for training.
- loss ({'01', 'log'}, optional (default='log')) – The loss function expected to be reduced.
- random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If int or None, random_state is used to seed a new np.random.RandomState instance; if an np.random.RandomState instance is given, it is used as the random number generator.
- model¶ libact.base.interfaces.ProbabilisticModel object instance – The model trained in the last query.
Examples
Here is an example of declaring an EER query_strategy object:
from libact.query_strategies import EER
from libact.models import LogisticRegression

qs = EER(dataset, model=LogisticRegression(C=0.1))
Note that the model given in the model parameter must be a ProbabilisticModel which supports the predict_proba method.
References
[1] Settles, Burr. “Active learning literature survey.” University of Wisconsin, Madison 52.55-66 (2010): 11.
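The idea can be sketched with plain scikit-learn, independent of libact's API (the function and variable names here are illustrative, not libact's): for each unlabeled candidate, hypothetically assign each possible label, retrain, and measure the loss over the rest of the pool, weighted by the current model's posterior for that label; query the candidate that minimizes this expectation. The posterior-entropy loss below is one common proxy for the 'log' loss and is an assumption of this sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def eer_query(X_lab, y_lab, X_pool):
    """Index of the pool sample whose hypothetical labeling minimizes
    the expected log loss (posterior entropy) over the remaining pool."""
    base = LogisticRegression().fit(X_lab, y_lab)
    posterior = base.predict_proba(X_pool)  # current P(y | x) per candidate
    expected_loss = np.zeros(len(X_pool))
    for i in range(len(X_pool)):
        rest = np.delete(np.arange(len(X_pool)), i)
        for k, label in enumerate(base.classes_):
            # Retrain as if candidate i were labeled `label`.
            clf = LogisticRegression().fit(
                np.vstack([X_lab, X_pool[i:i + 1]]),
                np.append(y_lab, label))
            p = clf.predict_proba(X_pool[rest])
            # Entropy of the retrained posteriors on the rest of the
            # pool, weighted by the current posterior of this label.
            loss = -(p * np.log(np.clip(p, 1e-12, None))).sum()
            expected_loss[i] += posterior[i, k] * loss
    return int(np.argmin(expected_loss))
```

The double loop (candidates × labels), each iteration retraining the model, is also why EER is far more expensive per query than uncertainty sampling.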
libact.query_strategies.multiclass.hierarchical_sampling module¶
Hierarchical Sampling for Active Learning (HS)
This module contains a class that implements Hierarchical Sampling for Active Learning (HS).
- class libact.query_strategies.multiclass.hierarchical_sampling.HierarchicalSampling(dataset, classes, active_selecting=True, subsample_qs=None, random_state=None)¶
Bases: libact.base.interfaces.QueryStrategy
Hierarchical Sampling for Active Learning (HS)
HS is an active learning scheme that exploits cluster structure in data. The original C++ implementation by the authors can be found at: http://www.cs.columbia.edu/~djhsu/code/HS.tar.gz
Parameters: - classes (list) – List of distinct classes in data.
- active_selecting ({True, False}, optional (default=True)) – False (random selecting): the sample weight of a pruning is its number of unseen leaves. True (active selecting): the sample weight of a pruning is its weighted error bound.
- subsample_qs ({libact.base.interfaces.QueryStrategy, None}, optional (default=None)) – Subsample query strategy used to sample a node in the selected pruning. RandomSampling is used if None.
- random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If int or None, random_state is used to seed a new np.random.RandomState instance; if an np.random.RandomState instance is given, it is used as the random number generator.
- m¶ int – number of nodes
- classes¶ list – List of distinct classes in data.
- n¶ int – number of leaf nodes
- num_class¶ int – number of classes
- parent¶ np.array instance, shape=(m) – parent indices
- left_child¶ np.array instance, shape=(m) – left child indices
- right_child¶ np.array instance, shape=(m) – right child indices
- size¶ np.array instance, shape=(m) – number of leaves in subtree
- depth¶ np.array instance, shape=(m) – maximum depth in subtree
- count¶ np.array instance, shape=(m, num_class) – node class label counts
- total¶ np.array instance, shape=(m) – total node class labels seen (total[i] = sum_j count[i][j])
- lower_bound¶ np.array instance, shape=(m, num_class) – lower bounds on true node class label counts
- upper_bound¶ np.array instance, shape=(m, num_class) – upper bounds on true node class label counts
- admissible¶ np.array instance, shape=(m, num_class) – flag indicating whether (node, label) is admissible
- best_label¶ np.array instance, shape=(m) – best admissible label
- random_states_¶ np.random.RandomState instance – The random number generator used.
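The count/total/bound bookkeeping above can be sketched with plain NumPy, independent of libact. The deviation term below (a simple 1/sqrt(total) pad on the observed label fractions) is an illustrative assumption, not necessarily the exact bound HS uses:

```python
import numpy as np

def label_count_bounds(count, size, delta=1.0):
    """Per-node lower/upper bounds on the true class-label counts.

    count : (m, num_class) array of labels observed so far per node
    size  : (m,) number of leaves (points) in each node's subtree
    The observed fractions are scaled to the full subtree and padded
    with a delta/sqrt(total) deviation term (illustrative choice).
    """
    total = count.sum(axis=1)                      # labels seen per node
    frac = count / np.maximum(total, 1)[:, None]   # observed fractions
    dev = delta / np.sqrt(np.maximum(total, 1))    # deviation term
    lower = size[:, None] * np.clip(frac - dev[:, None], 0.0, 1.0)
    upper = size[:, None] * np.clip(frac + dev[:, None], 0.0, 1.0)
    return lower, upper

count = np.array([[3., 1.], [0., 2.]])   # labels seen in 2 nodes
size = np.array([10., 4.])               # leaves under each node
lo, hi = label_count_bounds(count, size)
# For every (node, label), lo <= scaled observed count <= hi.
```

Bounds of this shape are what make a (node, label) pair admissible or not: a label can be ruled in or out for a whole subtree before every leaf has been queried.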
Examples
Here is an example of declaring a HierarchicalSampling query_strategy object:
from libact.query_strategies import UncertaintySampling
from libact.query_strategies.multiclass import HierarchicalSampling
from libact.models import SVM

sub_qs = UncertaintySampling(
    dataset, method='sm', model=SVM(decision_function_shape='ovr'))

qs = HierarchicalSampling(
    dataset,  # Dataset object
    dataset.get_num_of_labels(),
    active_selecting=True,
    subsample_qs=sub_qs
)
References
[1] Sanjoy Dasgupta and Daniel Hsu. “Hierarchical sampling for active learning.” ICML 2008.
- make_query()¶ Return the index of the sample to be queried and labeled. Read-only.
No modification to the internal states.
Returns: ask_id – The index of the next unlabeled sample to be queried and labeled. Return type: int
- report_all_label()¶ Return the best label of all samples. Read-only.
Returns: labels – The best label of all samples. Return type: list of object, shape=(m)