libact.query_strategies package

Submodules

libact.query_strategies.active_learning_by_learning module

Active learning by learning (ALBL)

This module includes two classes. ActiveLearningByLearning implements the main ALBL algorithm, and Exp4P implements the multi-armed bandit algorithm that ALBL uses.

class libact.query_strategies.active_learning_by_learning.ActiveLearningByLearning(*args, **kwargs)

Bases: libact.base.interfaces.QueryStrategy

Active Learning By Learning (ALBL) query strategy.

ALBL is an active learning algorithm that adaptively chooses among existing query strategies to decide which data to query. It uses Exp4.P, a multi-armed bandit algorithm, to make this decision adaptively. For more details on ALBL, see the work listed in the References section.

Parameters:

T (integer) – Query budget, the maximal number of queries to be made.
query_strategies (list of libact.query_strategies object instances) – The active learning algorithms used in ALBL, which also serve as the arms in the multi-armed bandit algorithm Exp4.P. Note that these query strategies must share the same dataset instance as the ActiveLearningByLearning instance.
delta (float, optional (default=0.1)) – Parameter for Exp4.P.
uniform_sampler ({True, False}, optional (default=True)) – Whether to include uniform random sampling as one of the arms.
pmin (float, \(0 < pmin < \frac{1}{len(query\_strategies)}\), optional (default=\(\frac{\sqrt{\log{N}}}{KT}\))) – Parameter for Exp4.P. The minimal probability for random selection of the arms (i.e., the underlying active learning algorithms). N = K = number of query_strategies, and T is the query budget.
model (libact.models object instance) – The learning model used for the task.
random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If an int or None, random_state is passed as the seed to create an np.random.RandomState instance. If an np.random.RandomState instance, it is used directly as the random number generator.
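
As a rough illustration of the pmin parameter, the sketch below evaluates the default value exactly as the formula is written in the parameter list and checks the documented range. The helper names default_pmin and check_pmin are hypothetical and not part of libact; the library's own computation may differ in detail.

```python
import math

def default_pmin(n_strategies, budget):
    """Default pmin as written in the docs: sqrt(log N) / (K * T), with N = K."""
    if n_strategies < 1 or budget < 1:
        raise ValueError("n_strategies and budget must be positive")
    return math.sqrt(math.log(n_strategies)) / (n_strategies * budget)

def check_pmin(pmin, n_strategies):
    """True if pmin lies in the documented open interval (0, 1/K)."""
    return 0.0 < pmin < 1.0 / n_strategies

# Three query strategies and a budget of 100 queries (illustrative values).
pmin = default_pmin(n_strategies=3, budget=100)
print(pmin, check_pmin(pmin, 3))
```

A value outside the open interval, such as 0.5 with three strategies, fails the check because the floor probabilities of all arms would sum to more than 1.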

query_strategies_: list of libact.query_strategies object instances – The active learning algorithm instances.

exp4p_: instance of Exp4P object – The multi-armed bandit instance.

queried_hist_: list of integer – A list of the entry_ids of the dataset entries queried so far.

random_states_: np.random.RandomState instance – The random number generator in use.

Examples

Here is an example of how to declare an ActiveLearningByLearning query_strategy object:

from libact.query_strategies import ActiveLearningByLearning
from libact.query_strategies import HintSVM
from libact.query_strategies import UncertaintySampling
from libact.models import LogisticRegression

qs = ActiveLearningByLearning(
     dataset, # Dataset object
     T=100, # qs.make_query can be called at most 100 times
     query_strategies=[
         UncertaintySampling(dataset, model=LogisticRegression(C=1.)),
         UncertaintySampling(dataset, model=LogisticRegression(C=.01)),
         HintSVM(dataset)
         ],
     model=LogisticRegression()
 )

The query_strategies parameter is a list of libact.query_strategies object instances, each of which must be associated with the same Dataset instance. ALBL combines the results of these query strategies and generates its own suggestion of which sample to query. ALBL adaptively learns from each decision it makes, using the supervised learning model given in the model parameter to evaluate its IW-ACC.

References

[1]	Wei-Ning Hsu, and Hsuan-Tien Lin. “Active Learning by Learning.” Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.

calc_query(): Calculate the sampling query distribution.

calc_reward_fn(): Calculate the reward value.

make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns:	ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:	int

update(entry_id, label)

Update the internal states of the QueryStrategy after each queried sample is labeled.

Parameters:

entry_id (int) – The index of the newly labeled sample.
label (float) – The label of the queried sample.
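
The make_query / update contract described above can be sketched with toy stand-ins. ToyPool and ToyStrategy below are hypothetical classes written only for illustration; they are not libact's Dataset or QueryStrategy, and update is called by hand here as an assumption about how the two methods interact.

```python
import random

class ToyPool:
    """Hypothetical stand-in for a dataset: entry ids mapped to labels (None = unlabeled)."""
    def __init__(self, n_entries):
        self.labels = {i: None for i in range(n_entries)}

    def unlabeled_ids(self):
        return [i for i, y in self.labels.items() if y is None]

class ToyStrategy:
    """Hypothetical strategy following the make_query / update contract."""
    def __init__(self, pool, seed=None):
        self.pool = pool
        self.rng = random.Random(seed)
        self.queried_hist = []  # entry ids labeled so far

    def make_query(self):
        # Only choose an entry to ask about; the labeled history is left untouched.
        return self.rng.choice(self.pool.unlabeled_ids())

    def update(self, entry_id, label):
        # Called once the queried entry has been labeled.
        self.queried_hist.append(entry_id)

pool = ToyPool(5)
qs = ToyStrategy(pool, seed=0)
for _ in range(3):
    ask_id = qs.make_query()
    label = ask_id % 2           # stand-in for a human labeler
    pool.labels[ask_id] = label  # record the label in the pool
    qs.update(ask_id, label)     # let the strategy react to the new label

print(qs.queried_hist)
```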

class libact.query_strategies.active_learning_by_learning.Exp4P(*args, **kwargs)

Bases: object

A multi-armed bandit algorithm Exp4.P.

For the Exp4.P instance used in ALBL, the number of arms (actions) and the number of experts both equal the number of active learning algorithms in use. The arms (actions) are the active learning algorithms, which are passed in through the ‘query_strategies’ parameter. The experts need no separate input: the advice of the kth expert is always e_k, where e_k is the kth column of the identity matrix.
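
When every expert's advice is a column of the identity matrix, the Exp4.P mixture over arms reduces to the normalized expert weights plus a floor of pmin per arm. The sketch below shows that reduction under this assumption; arm_probabilities is a hypothetical helper, not a libact function, and the weight update itself is omitted.

```python
def arm_probabilities(weights, pmin):
    """Mix normalized expert weights with a floor of pmin per arm.

    With identity advice (expert k always recommends arm k), the mixture
    reduces to p_k = (1 - K * pmin) * w_k / sum(w) + pmin.
    """
    k = len(weights)
    if not 0.0 < pmin < 1.0 / k:
        raise ValueError("pmin must lie in (0, 1/K)")
    total = sum(weights)
    return [(1.0 - k * pmin) * w / total + pmin for w in weights]

# Three arms; the second expert currently carries twice the weight of the others.
probs = arm_probabilities([1.0, 2.0, 1.0], pmin=0.05)
print(probs)
```

Every arm keeps at least probability pmin, which is why pmin must stay below 1/K for the result to remain a valid distribution.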

Parameters:

query_strategies (QueryStrategy instances) – The active learning algorithms to use; they are equivalent to the actions or arms in the original Exp4.P.
unlabeled_invert_id_idx (dict) – A lookup table mapping each entry_id to its index in the unlabeled data.
delta (float, >0, optional (default=0.1)) – A parameter of Exp4.P.
pmin (float, 0<pmin<1/len(query_strategies), optional (default=\(\frac{\sqrt{\log{N}}}{KT}\))) – The minimal probability for random selection of the arms (aka the unlabeled data). N = K = number of query_strategies, and T is the maximum number of rounds.
T (int, optional (default=100)) – The maximum number of rounds.
uniform_sampler ({True, False}, optional (default=True)) – Whether to include a uniform random sampler as one of the underlying active learning algorithms.

t: int – The current round this instance is at.

N: int – The number of arms (actions) in this Exp4.P instance.

query_models_: list of libact.query_strategies object instances – The underlying active learning algorithm instances.

References

[1]	Beygelzimer, Alina, et al. “Contextual bandit algorithms with supervised learning guarantees.” In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

exp4p()

The generator which implements the main part of Exp4.P.

Parameters:

reward (float) – The reward value calculated from ALBL.
ask_id (integer) – The entry_id of the sample point ALBL asked.
lbl (integer) – The answer received from asking the entry_id ask_id.

Yields: q (array-like, shape = [K]) – The query vector, which tells ALBL which distribution to use when sampling from the unlabeled pool.

next(reward, ask_id, lbl): Takes the label and the reward value of the last question and returns the next question to ask.
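
The generator-plus-next pattern described above relies on Python's generator send() mechanism: the generator yields a query vector, pauses, and resumes when feedback is sent back in. The sketch below shows only that control flow. bandit_rounds is a hypothetical function, the feedback tuple is simplified to (reward, arm), and the weight update is a toy rule rather than the Exp4.P update.

```python
def bandit_rounds(n_arms, max_rounds):
    """Hypothetical generator: yields a query vector, receives feedback via send()."""
    weights = [1.0] * n_arms
    for _ in range(max_rounds):
        total = sum(weights)
        query = [w / total for w in weights]
        # Execution pauses here until the caller sends (reward, arm) back.
        reward, arm = yield query
        weights[arm] += reward  # toy update, not the Exp4.P rule

gen = bandit_rounds(n_arms=2, max_rounds=3)
first = next(gen)            # prime the generator to obtain the first query vector
second = gen.send((1.0, 0))  # report a reward for arm 0, receive the next vector
print(first, second)
```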

libact.query_strategies.hintsvm module

Hinted Support Vector Machine

This module contains a class that implements Hinted Support Vector Machine, an active learning algorithm.

Standalone hintsvm can be retrieved from https://github.com/yangarbiter/hintsvm

class libact.query_strategies.hintsvm.HintSVM(*args, **kwargs)

Bases: libact.base.interfaces.QueryStrategy

Hinted Support Vector Machine

Hinted Support Vector Machine is an active learning algorithm within the hinted sampling framework with an extended support vector machine.

Parameters:

Cl (float, >0, optional (default=0.1)) – The weight of the classification error on labeled pool.
Ch (float, >0, optional (default=0.1)) – The weight of the hint error on hint pool.
p (float, >0 and <=1, optional (default=.5)) – The probability of selecting an instance from the unlabeled pool into the hint pool.
random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If an int or None, random_state is passed as the seed to create an np.random.RandomState instance. If an np.random.RandomState instance, it is used directly as the random number generator.
kernel ({'linear', 'poly', 'rbf', 'sigmoid'}, optional (default='linear')) – The kernel function. linear: u'*v; poly: (gamma*u'*v + coef0)^degree; rbf: exp(-gamma*|u-v|^2); sigmoid: tanh(gamma*u'*v + coef0).
degree (int, optional (default=3)) – Parameter for kernel function.
gamma (float, optional (default=0.1)) – Parameter for kernel function.
coef0 (float, optional (default=0.)) – Parameter for kernel function.
tol (float, optional (default=1e-3)) – Tolerance of termination criterion.
shrinking ({0, 1}, optional (default=1)) – Whether to use the shrinking heuristics.
cache_size (float, optional (default=100.)) – Set cache memory size in MB.
verbose (int, optional (default=0)) – Set verbosity level for hintsvm solver.
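
The four kernel formulas listed for the kernel parameter can be written out directly. The sketch below is a plain-Python illustration of those formulas with the documented default values for gamma, coef0, and degree; the kernel and dot helpers are hypothetical and are not how the hintsvm solver evaluates kernels internally.

```python
import math

def dot(u, v):
    """Inner product u'*v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def kernel(u, v, kind="linear", gamma=0.1, coef0=0.0, degree=3):
    """Evaluate one of the four documented kernel formulas on two vectors."""
    if kind == "linear":
        return dot(u, v)
    if kind == "poly":
        return (gamma * dot(u, v) + coef0) ** degree
    if kind == "rbf":
        sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-gamma * sq_dist)
    if kind == "sigmoid":
        return math.tanh(gamma * dot(u, v) + coef0)
    raise ValueError("unknown kernel: %r" % kind)

u, v = [1.0, 2.0], [3.0, 4.0]
for kind in ("linear", "poly", "rbf", "sigmoid"):
    print(kind, kernel(u, v, kind))
```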

random_states_: np.random.RandomState instance – The random number generator in use.

Examples

Here is an example of declaring a HintSVM query_strategy object:

from libact.query_strategies import HintSVM

qs = HintSVM(
     dataset, # Dataset object
     Cl=0.01,
     p=0.8,
     )

References

[1]	Li, Chun-Liang, Chun-Sung Ferng, and Hsuan-Tien Lin. “Active Learning with Hinted Support Vector Machine.” ACML. 2012.

[2]	Chun-Liang Li, Chun-Sung Ferng, and Hsuan-Tien Lin. Active learning using hint information. Neural Computation, 27(8):1738–1765, August 2015.

make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns:	ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:	int

libact.query_strategies.query_by_committee module

Query by committee

This module contains a class that implements the query by committee active learning algorithm.

class libact.query_strategies.query_by_committee.QueryByCommittee(*args, **kwargs)

Bases: libact.base.interfaces.QueryStrategy

Query by committee

Parameters:

models (list of libact.models instances or str) – This parameter accepts a list of initialized libact Model instances, or class names of libact Model classes to determine the models to be included in the committee to vote for each unlabeled instance.
disagreement (['vote', 'kl_divergence'], optional (default='vote')) – Sets the method for measuring disagreement between models. ‘vote’ represents vote entropy. ‘kl_divergence’ requires the models to be ProbabilisticModel instances.
random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If an int or None, random_state is passed as the seed to create an np.random.RandomState instance. If an np.random.RandomState instance, it is used directly as the random number generator.

students: list, shape = (len(models)) – A list of the model instances used in this algorithm.

random_states_: np.random.RandomState instance – The random number generator in use.
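
To make the ‘vote’ disagreement measure concrete, the sketch below computes the textbook vote entropy of a committee's predictions and picks the most contested instance. The vote_entropy helper and the sample votes are hypothetical; the library's exact normalization may differ from this definition.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's label votes for one unlabeled instance."""
    n_members = len(votes)
    entropy = 0.0
    for count in Counter(votes).values():
        share = count / n_members
        entropy -= share * math.log(share)
    return entropy

# Votes of a four-member committee on three unlabeled instances (made-up data).
committee_votes = {
    "x0": [1, 1, 1, 1],  # full agreement
    "x1": [0, 1, 0, 1],  # evenly split
    "x2": [0, 1, 1, 1],  # mild disagreement
}
ask_id = max(committee_votes, key=lambda k: vote_entropy(committee_votes[k]))
print(ask_id)
```

The evenly split instance has the highest entropy, so it is the one a committee-based strategy would ask about first under this measure.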

Examples

Here is an example of declaring a QueryByCommittee query_strategy object:

from libact.query_strategies import QueryByCommittee
from libact.models import LogisticRegression

qs = QueryByCommittee(
         dataset, # Dataset object
         models=[
             LogisticRegression(C=1.0),
             LogisticRegression(C=0.1),
         ],
     )

References

[1]	Seung, H. Sebastian, Manfred Opper, and Haim Sompolinsky. “Query by committee.” Proceedings of the fifth annual workshop on Computational learning theory. ACM, 1992.

make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns:	ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:	int

teach_students(): Train each model (student) with the labeled data using bootstrap aggregating (bagging).
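
Bagging gives each committee member its own training set, drawn with replacement from the labeled data. The sketch below shows only that resampling step; bootstrap_sample and the toy labeled entries are hypothetical, and a real implementation would also need to handle a resample that happens to miss one of the classes.

```python
import random

def bootstrap_sample(labeled_entries, rng):
    """Draw len(labeled_entries) entries uniformly with replacement."""
    n = len(labeled_entries)
    return [labeled_entries[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)
# (features, label) pairs standing in for the labeled pool.
labeled = [([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1), ([0.0, 0.0], 0)]
# One resampled training set per committee member (three members assumed here).
training_sets = [bootstrap_sample(labeled, rng) for _ in range(3)]
for sample in training_sets:
    print([label for _, label in sample])
```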

update(entry_id, label)

Update the internal states of the QueryStrategy after each queried sample is labeled.

Parameters:

entry_id (int) – The index of the newly labeled sample.
label (float) – The label of the queried sample.

libact.query_strategies.quire module

Active Learning by QUerying Informative and Representative Examples (QUIRE)

This module contains a class that implements an active learning algorithm (query strategy): QUIRE

class libact.query_strategies.quire.QUIRE(*args, **kwargs)

Bases: libact.base.interfaces.QueryStrategy

Querying Informative and Representative Examples (QUIRE)

Queries the most informative and representative examples, where the two criteria are measured and combined using a min-max approach.

Parameters:

lambda (float, optional (default=1.0)) – A regularization parameter used in the regularized learning framework.
kernel ({'linear', 'poly', 'rbf', callable}, optional (default='rbf')) – Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, or a callable. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples).
degree (int, optional (default=3)) – Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.
gamma (float, optional (default=1.)) – Kernel coefficient for ‘rbf’, ‘poly’.
coef0 (float, optional (default=1.)) – Independent term in kernel function. It is only significant in ‘poly’.

Examples

Here is an example of declaring a QUIRE query_strategy object:

from libact.query_strategies import QUIRE

qs = QUIRE(
         dataset, # Dataset object
     )

References

[1]	S.-J. Huang, R. Jin, and Z.-H. Zhou. Active learning by querying informative and representative examples.

make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns:	ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:	int

update(entry_id, label)

Update the internal states of the QueryStrategy after each queried sample is labeled.

Parameters:

entry_id (int) – The index of the newly labeled sample.
label (float) – The label of the queried sample.

libact.query_strategies.random_sampling module

Random Sampling

class libact.query_strategies.random_sampling.RandomSampling(dataset, **kwargs)

Bases: libact.base.interfaces.QueryStrategy

Random sampling

This class implements the random query strategy. A random entry from the unlabeled pool is returned for each query.

Parameters:	random_state ({int, np.random.RandomState instance, None}, optional (default=None)) – If an int or None, random_state is passed as the seed to create an np.random.RandomState instance. If an np.random.RandomState instance, it is used directly as the random number generator.

random_states_: np.random.RandomState instance – The random number generator in use.
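
The seeding convention and the random query itself can be illustrated with the standard library. In the sketch below, random.Random stands in for np.random.RandomState as an assumption made to keep the example dependency-free; seed_rng and random_query are hypothetical helpers, not libact functions.

```python
import random

def seed_rng(random_state=None):
    """Return a random.Random: reuse an existing instance, otherwise seed a new one."""
    if isinstance(random_state, random.Random):
        return random_state
    return random.Random(random_state)

def random_query(unlabeled_ids, rng):
    """Pick one entry id uniformly at random from the unlabeled pool."""
    return rng.choice(unlabeled_ids)

unlabeled = [2, 5, 7, 11]
a = random_query(unlabeled, seed_rng(42))
b = random_query(unlabeled, seed_rng(42))
print(a, b)  # the same seed gives the same first query
```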

Examples

Here is an example of declaring a RandomSampling query_strategy object:

from libact.query_strategies import RandomSampling

qs = RandomSampling(
         dataset, # Dataset object
     )

make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns:	ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:	int

libact.query_strategies.uncertainty_sampling module

Uncertainty Sampling

This module contains a class that implements two of the most well-known uncertainty sampling query strategies: the least confidence method and the smallest margin method (margin sampling).

class libact.query_strategies.uncertainty_sampling.UncertaintySampling(*args, **kwargs)

Bases: libact.base.interfaces.QueryStrategy

Uncertainty Sampling

This class implements the uncertainty sampling active learning algorithm [1].

Parameters:

model (libact.base.interfaces.ContinuousModel or libact.base.interfaces.ProbabilisticModel object instance) – The base model used for training.
method ({'lc', 'sm', 'entropy'}, optional (default='lc')) – The uncertainty measure. Least confidence (lc) queries the instance whose posterior probability of being positive is nearest 0.5 (for binary classification); smallest margin (sm) queries the instance with the smallest posterior probability gap between the most and the second most probable labels; entropy requires a libact.base.interfaces.ProbabilisticModel to be passed in as the model parameter.

model: libact.base.interfaces.ContinuousModel or libact.base.interfaces.ProbabilisticModel object instance – The model trained in the last query.
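
The three uncertainty measures can be compared on a small pool of predicted class probabilities. The sketch below uses the textbook definitions of least confidence, smallest margin, and entropy; the function names and the probability values are hypothetical, and the scores libact computes from a model's outputs may be scaled differently.

```python
import math

def least_confidence(probs):
    """1 minus the top class probability; larger means more uncertain."""
    return 1.0 - max(probs)

def smallest_margin(probs):
    """Gap between the two most probable labels; smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy of the class distribution; larger means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Predicted class probabilities for three unlabeled entries (made-up values).
pool = {0: [0.9, 0.05, 0.05], 1: [0.4, 0.35, 0.25], 2: [0.5, 0.5, 0.0]}
ask_lc = max(pool, key=lambda i: least_confidence(pool[i]))
ask_sm = min(pool, key=lambda i: smallest_margin(pool[i]))
ask_en = max(pool, key=lambda i: entropy(pool[i]))
print(ask_lc, ask_sm, ask_en)
```

The measures need not agree: here the margin criterion prefers the entry whose top two labels are tied, while least confidence and entropy prefer the entry whose probability mass is spread over all three labels.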

Examples

Here is an example of declaring an UncertaintySampling query_strategy object:

from libact.query_strategies import UncertaintySampling
from libact.models import LogisticRegression

qs = UncertaintySampling(
         dataset, # Dataset object
         model=LogisticRegression(C=0.1)
     )

Note that the model given in the model parameter must be a ContinuousModel that supports the predict_real method.

References

[1]	Settles, Burr. “Active learning literature survey.” University of Wisconsin, Madison 52.55-66 (2010): 11.

make_query(return_score=False)

Return the index of the sample to be queried and labeled, along with the selection score of each sample. Read-only.

No modification to the internal states.

Returns:

ask_id (int) – The index of the next unlabeled sample to be queried and labeled.
score (list of (index, score) tuples) – Selection scores of the unlabeled entries; the larger the better.

libact.query_strategies.variance_reduction module

Variance Reduction

class libact.query_strategies.variance_reduction.VarianceReduction(*args, **kwargs)

Bases: libact.base.interfaces.QueryStrategy

Variance Reduction

This class implements the variance reduction active learning algorithm [1].

Parameters:

model ({libact.model.LogisticRegression instance, 'LogisticRegression'}) – The model used for variance reduction to evaluate the variance. Only logistic regression is supported for now.
sigma (float, >0, optional (default=100.0)) – 1/sigma is added to the diagonal of the Fisher information matrix as a regularization term.
optimality ({'trace', 'determinant', 'eigenvalue'}, optional (default='trace')) – The type of optimal design. The options are the trace, determinant, or maximum eigenvalue of the inverse Fisher information matrix. Only ‘trace’ is supported for now.
n_jobs (int, optional (default=1)) – The number of processors used to estimate the expected variance.
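
The roles of sigma and the ‘trace’ optimality can be illustrated on a tiny case. The sketch below assumes binary logistic regression with two features, builds the Fisher information as the sum of p(1-p) x x', adds 1/sigma to the diagonal as described for sigma, and returns the trace of the inverse. All function names and values are hypothetical, and the library's multiclass computation is more involved than this.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_fisher(points, weights, sigma):
    """2x2 Fisher information of binary logistic regression plus (1/sigma) * I."""
    a = b = d = 0.0
    for x0, x1 in points:
        p = sigmoid(weights[0] * x0 + weights[1] * x1)
        s = p * (1.0 - p)  # variance of the predicted label at this point
        a += s * x0 * x0
        b += s * x0 * x1
        d += s * x1 * x1
    return [[a + 1.0 / sigma, b], [b, d + 1.0 / sigma]]

def trace_of_inverse(m):
    """Trace of the inverse of a 2x2 matrix: (m00 + m11) / det(m)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (m[0][0] + m[1][1]) / det

points = [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
fisher = regularized_fisher(points, weights=[0.5, -0.25], sigma=100.0)
print(trace_of_inverse(fisher))
```

Without the 1/sigma term the matrix can be singular when few points are available; the regularizer keeps the inverse well defined.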

References

[1]	Schein, Andrew I., and Lyle H. Ungar. “Active learning for logistic regression: an evaluation.” Machine Learning 68.3 (2007): 235-265.

[2]	Settles, Burr. “Active learning literature survey.” University of Wisconsin, Madison 52.55-66 (2010): 11.

make_query()

Return the index of the sample to be queried and labeled. Read-only.

No modification to the internal states.

Returns:	ask_id – The index of the next unlabeled sample to be queried and labeled.
Return type:	int

Module contents

Concrete query strategy classes.