libact.query_strategies.multiclass package¶
Submodules¶
libact.query_strategies.multiclass.active_learning_with_cost_embedding module¶
Active Learning with Cost Embedding (ALCE)
- class libact.query_strategies.multiclass.active_learning_with_cost_embedding.ActiveLearningWithCostEmbedding(dataset, cost_matrix, base_regressor, embed_dim=None, mds_params={}, nn_params={}, random_state=None)¶
Bases: libact.base.interfaces.QueryStrategy
Active Learning with Cost Embedding (ALCE)
A cost-sensitive multi-class active learning algorithm. It assumes each class has at least one sample in the labeled pool.
Parameters: - cost_matrix (array-like, shape=(n_classes, n_classes)) – The ith row, jth column represents the cost of the ground truth being ith class and prediction as jth class.
- mds_params (dict, optional) – Parameters passed to sklearn.manifold.MDS; see http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html
- nn_params (dict, optional) – Parameters passed to sklearn.neighbors.NearestNeighbors; see http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html
- embed_dim (int, optional (default=None)) – If None, embed_dim is set to n_classes.
- base_regressor (sklearn regressor) –
- random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If int or None, random_state is used to seed a new np.random.RandomState instance; if an np.random.RandomState instance is given, it is used as the random number generator.
- nn_¶ sklearn.neighbors.NearestNeighbors object instance
Examples
Here is an example of declaring an ActiveLearningWithCostEmbedding query_strategy object:
import numpy as np
from sklearn.svm import SVR

from libact.query_strategies.multiclass import ActiveLearningWithCostEmbedding as ALCE

cost_matrix = 2000. * np.random.rand(n_classes, n_classes)
qs3 = ALCE(dataset, cost_matrix, SVR())
References
[1] Kuan-Hao Huang and Hsuan-Tien Lin. “A Novel Uncertainty Sampling Algorithm for Cost-Sensitive Multiclass Active Learning”, In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2016
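The cost matrix in the example above is random; in practice it should encode the application's true misclassification costs. A minimal sketch (plain NumPy, independent of libact) of one common choice for ordinal classes, the absolute-difference cost, following the row-is-ground-truth, column-is-prediction convention stated above:

```python
import numpy as np

def ordinal_cost_matrix(n_classes):
    """cost[i, j] = |i - j|: the cost of predicting class j when the
    ground truth is class i. The diagonal is zero, so correct
    predictions cost nothing."""
    idx = np.arange(n_classes)
    return np.abs(idx[:, None] - idx[None, :]).astype(float)

cost_matrix = ordinal_cost_matrix(4)
# Mistaking class 0 for class 3 costs 3; adjacent mistakes cost 1.
```

Any matrix with non-negative entries and a zero diagonal works; the choice only matters insofar as it reflects which mistakes are expensive.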
libact.query_strategies.uncertainty_sampling module¶
Uncertainty Sampling
This module contains a class that implements three well-known uncertainty sampling query strategies: the least confidence method, the smallest margin method (margin sampling), and the entropy method.
- class libact.query_strategies.uncertainty_sampling.UncertaintySampling(*args, **kwargs)¶
Bases: libact.base.interfaces.QueryStrategy
Uncertainty Sampling
This class implements Uncertainty Sampling active learning algorithm [1].
Parameters: - model (libact.base.interfaces.ContinuousModel or libact.base.interfaces.ProbabilisticModel object instance) – The base model used for training.
- method ({'lc', 'sm', 'entropy'}, optional (default='lc')) – least confidence (lc): queries the instance whose posterior probability of being positive is nearest 0.5 (for binary classification); smallest margin (sm): queries the instance whose posterior probability gap between the most and the second most probable labels is minimal; entropy: queries the instance whose posterior label distribution has the highest entropy, and requires a libact.base.interfaces.ProbabilisticModel to be passed in as the model parameter.
- model¶ libact.base.interfaces.ContinuousModel or libact.base.interfaces.ProbabilisticModel object instance – The model trained in the last query.
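The three methods above can be sketched with plain NumPy, independent of libact's API. Each score is oriented so that a larger value means a more uncertain, and therefore more query-worthy, sample:

```python
import numpy as np

proba = np.array([[0.10, 0.80, 0.10],   # confident sample
                  [0.40, 0.35, 0.25],   # ambiguous sample
                  [0.50, 0.30, 0.20]])

# Least confidence (lc): one minus the highest posterior.
lc = 1.0 - proba.max(axis=1)

# Smallest margin (sm): negated gap between the two most probable
# labels, so a smaller gap yields a larger score.
sorted_p = np.sort(proba, axis=1)
sm = -(sorted_p[:, -1] - sorted_p[:, -2])

# Entropy: Shannon entropy of the posterior distribution.
entropy = -(proba * np.log(proba)).sum(axis=1)

# All three measures rank the ambiguous sample (index 1) highest here.
```

Note that lc and sm need only a ranking of posteriors (a ContinuousModel's predict_real suffices), while entropy needs calibrated probabilities, which is why it requires a ProbabilisticModel.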
Examples
Here is an example of declaring a UncertaintySampling query_strategy object:
from libact.query_strategies import UncertaintySampling
from libact.models import LogisticRegression

qs = UncertaintySampling(
    dataset,  # Dataset object
    model=LogisticRegression(C=0.1),
)
Note that the model given in the model parameter must be a ContinuousModel which supports the predict_real method.
References
[1] Settles, Burr. “Active learning literature survey.” University of Wisconsin, Madison 52.55-66 (2010): 11.
- make_query(return_score=False)¶ Return the index of the sample to be queried and labeled, and the selection score of each sample. Read-only.
No modification to the internal states.
Returns: - ask_id (int) – The index of the next unlabeled sample to be queried and labeled.
- score (list of (index, score) tuple) – Selection scores of unlabeled entries; the larger the better.
libact.query_strategies.multiclass.expected_error_reduction module¶
Expected Error Reduction
- class libact.query_strategies.multiclass.expected_error_reduction.EER(dataset, model=None, loss='log', random_state=None)¶
Bases: libact.base.interfaces.QueryStrategy
Expected Error Reduction (EER)
This class implements EER active learning algorithm [1].
Parameters: - model (libact.base.interfaces.ProbabilisticModel object instance) – The base model used for training.
- loss ({'01', 'log'}, optional (default='log')) – The loss function expected to be reduced.
- random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If int or None, random_state is used to seed a new np.random.RandomState instance; if an np.random.RandomState instance is given, it is used as the random number generator.
- model¶ libact.base.interfaces.ProbabilisticModel object instance – The model trained in the last query.
Examples
Here is an example of declaring an EER query_strategy object:
from libact.query_strategies import EER
from libact.models import LogisticRegression

qs = EER(dataset, model=LogisticRegression(C=0.1))
Note that the model given in the model parameter must be a ProbabilisticModel which supports the predict_proba method.
References
[1] Settles, Burr. “Active learning literature survey.” University of Wisconsin, Madison 52.55-66 (2010): 11.
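The idea can be sketched with plain scikit-learn, independent of libact's API (the function and variable names here are illustrative, not libact's): for each unlabeled candidate, hypothetically assign each possible label, retrain, and measure the loss over the rest of the pool, weighted by the current model's posterior for that label; query the candidate that minimizes this expectation. The posterior-entropy loss below is one common proxy for the 'log' loss and is an assumption of this sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def eer_query(X_lab, y_lab, X_pool):
    """Index of the pool sample whose hypothetical labeling minimizes
    the expected log loss (posterior entropy) over the remaining pool."""
    base = LogisticRegression().fit(X_lab, y_lab)
    posterior = base.predict_proba(X_pool)  # current P(y | x) per candidate
    expected_loss = np.zeros(len(X_pool))
    for i in range(len(X_pool)):
        rest = np.delete(np.arange(len(X_pool)), i)
        for k, label in enumerate(base.classes_):
            # Retrain as if candidate i were labeled `label`.
            clf = LogisticRegression().fit(
                np.vstack([X_lab, X_pool[i:i + 1]]),
                np.append(y_lab, label))
            p = clf.predict_proba(X_pool[rest])
            # Entropy of the retrained posteriors on the rest of the
            # pool, weighted by the current posterior of this label.
            loss = -(p * np.log(np.clip(p, 1e-12, None))).sum()
            expected_loss[i] += posterior[i, k] * loss
    return int(np.argmin(expected_loss))
```

The double loop (candidates × labels), each iteration retraining the model, is also why EER is far more expensive per query than uncertainty sampling.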
libact.query_strategies.multiclass.hierarchical_sampling module¶
Hierarchical Sampling for Active Learning (HS)
This module contains a class that implements Hierarchical Sampling for Active Learning (HS).
- class libact.query_strategies.multiclass.hierarchical_sampling.HierarchicalSampling(dataset, classes, active_selecting=True, subsample_qs=None, random_state=None)¶
Bases: libact.base.interfaces.QueryStrategy
Hierarchical Sampling for Active Learning (HS)
HS is an active learning scheme that exploits cluster structure in data. The original C++ implementation by the authors can be found at: http://www.cs.columbia.edu/~djhsu/code/HS.tar.gz
Parameters: - classes (list) – List of distinct classes in data.
- active_selecting ({True, False}, optional (default=True)) – False (random selecting): the sample weight of a pruning is its number of unseen leaves. True (active selecting): the sample weight of a pruning is its weighted error bound.
- subsample_qs ({libact.base.interfaces.QueryStrategy, None}, optional (default=None)) – Subsample query strategy used to sample a node in the selected pruning. RandomSampling is used if None.
- random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If int or None, random_state is used to seed a new np.random.RandomState instance; if an np.random.RandomState instance is given, it is used as the random number generator.
- m¶ int – number of nodes
- classes¶ list – List of distinct classes in data.
- n¶ int – number of leaf nodes
- num_class¶ int – number of classes
- parent¶ np.array instance, shape=(m) – parent indices
- left_child¶ np.array instance, shape=(m) – left child indices
- right_child¶ np.array instance, shape=(m) – right child indices
- size¶ np.array instance, shape=(m) – number of leaves in subtree
- depth¶ np.array instance, shape=(m) – maximum depth in subtree
- count¶ np.array instance, shape=(m, num_class) – node class label counts
- total¶ np.array instance, shape=(m) – total node class labels seen (total[i] = sum_j count[i][j])
- lower_bound¶ np.array instance, shape=(m, num_class) – lower bounds on true node class label counts
- upper_bound¶ np.array instance, shape=(m, num_class) – upper bounds on true node class label counts
- admissible¶ np.array instance, shape=(m, num_class) – flag indicating whether (node, label) is admissible
- best_label¶ np.array instance, shape=(m) – best admissible label
- random_states_¶ np.random.RandomState instance – The random number generator used.
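The count/total/bound bookkeeping above can be sketched with plain NumPy, independent of libact. The deviation term below (a simple 1/sqrt(total) pad on the observed label fractions) is an illustrative assumption, not necessarily the exact bound HS uses:

```python
import numpy as np

def label_count_bounds(count, size, delta=1.0):
    """Per-node lower/upper bounds on the true class-label counts.

    count : (m, num_class) array of labels observed so far per node
    size  : (m,) number of leaves (points) in each node's subtree
    The observed fractions are scaled to the full subtree and padded
    with a delta/sqrt(total) deviation term (illustrative choice).
    """
    total = count.sum(axis=1)                      # labels seen per node
    frac = count / np.maximum(total, 1)[:, None]   # observed fractions
    dev = delta / np.sqrt(np.maximum(total, 1))    # deviation term
    lower = size[:, None] * np.clip(frac - dev[:, None], 0.0, 1.0)
    upper = size[:, None] * np.clip(frac + dev[:, None], 0.0, 1.0)
    return lower, upper

count = np.array([[3., 1.], [0., 2.]])   # labels seen in 2 nodes
size = np.array([10., 4.])               # leaves under each node
lo, hi = label_count_bounds(count, size)
# For every (node, label), lo <= scaled observed count <= hi.
```

Bounds of this shape are what make a (node, label) pair admissible or not: a label can be ruled in or out for a whole subtree before every leaf has been queried.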
Examples
Here is an example of declaring a HierarchicalSampling query_strategy object:
from libact.query_strategies import UncertaintySampling
from libact.query_strategies.multiclass import HierarchicalSampling
from libact.models import SVM

sub_qs = UncertaintySampling(
    dataset, method='sm', model=SVM(decision_function_shape='ovr'))

qs = HierarchicalSampling(
    dataset,  # Dataset object
    dataset.get_num_of_labels(),
    active_selecting=True,
    subsample_qs=sub_qs
)
References
[1] Sanjoy Dasgupta and Daniel Hsu. “Hierarchical sampling for active learning.” ICML 2008.
- make_query()¶ Return the index of the sample to be queried and labeled. Read-only.
No modification to the internal states.
Returns: ask_id – The index of the next unlabeled sample to be queried and labeled. Return type: int
- report_all_label()¶ Return the best label of all samples. Read-only.
Returns: labels – The best label of all samples. Return type: list of object, shape=(m)