hundred_hammers package
Config module
Global configuration for the library.
Model_zoo module
This module provides easy access to all available machine learning models, along with default model lists for classification and regression tasks.
Base module
- class HundredHammersBase(models: Iterable[Tuple[str, BaseEstimator, dict]] = None, metrics: Iterable[str | callable] = None, eval_metric: str | callable = None, input_transform: TransformerMixin | str = None, cross_validator: callable = None, cross_validator_params: dict = None, test_size: float = 0.2, n_train_evals: int = 1, n_val_evals: int = 1, show_progress_bar: bool = True, seed_strategy: str = 'sequential')
Bases: object
Base HundredHammers class. Implements methods for automatic machine learning like evaluating a list of models and performing hyperparameter optimization.
- Parameters:
models (Iterable[Tuple[str, BaseEstimator, dict]]) – List of models to evaluate.
metrics (Iterable[str | callable]) – Metrics to use to evaluate the models.
eval_metric (str | callable) – Target metric to use in hyperparameter optimization.
input_transform (TransformerMixin | str) – Input normalization strategy, given either as a string (‘MinMax’, ‘MaxAbs’, ‘Standard’, ‘Norm’, ‘Robust’) or as a normalization class.
cross_validator (callable) – Cross Validator to use in the evaluation.
cross_validator_params (dict) – Parameters for the Cross Validator.
test_size (float) – Fraction of the dataset to use for testing (for example, 0.2 holds out 20%).
n_train_evals (int) – Number of times to vary the training/test separation seed.
n_val_evals (int) – Number of times to vary the cross-validation seed.
show_progress_bar (bool) – Whether to show a progress bar during the evaluation.
seed_strategy (str) – Strategy used to generate the seeds for the different evaluations (‘sequential’ or ‘random’)
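The seed parameters above can be pictured with a small sketch. It is only an illustration under stated assumptions: `sketch_seeds` and `base_seed` are hypothetical names, and the reading of ‘sequential’ as consecutive integers and ‘random’ as seeds drawn from a random generator is a guess, not the library's confirmed behavior.

```python
import random


def sketch_seeds(n_evals, seed_strategy="sequential", base_seed=0):
    """Hypothetical helper: one seed per repeated evaluation."""
    if seed_strategy == "sequential":
        # Assumption: consecutive integers starting at base_seed.
        return [base_seed + i for i in range(n_evals)]
    if seed_strategy == "random":
        # Assumption: seeds drawn from a generator seeded with base_seed.
        rng = random.Random(base_seed)
        return [rng.randrange(2**32) for _ in range(n_evals)]
    raise ValueError(f"unknown seed strategy: {seed_strategy!r}")


# One seed per train/test split (n_train_evals) and per cross-validation run (n_val_evals).
train_seeds = sketch_seeds(3, "sequential")
val_seeds = sketch_seeds(2, "random")
```

With `n_train_evals=3` and `n_val_evals=2`, a scheme like this would evaluate each model under six seed combinations.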
- property full_report: DataFrame
Pandas dataframe reflecting the results of the last evaluation of the models with extra information.
- Returns:
Dataframe with the performance of each of the models.
- Return type:
DataFrame
- property report: DataFrame
Pandas dataframe reflecting the results of the last evaluation of the models.
- Returns:
Dataframe with the performance of each of the models.
- Return type:
DataFrame
- property best_params: List[Tuple[str, dict]]
List of the best hyperparameters found for each model.
- Returns:
List of the best hyperparameters obtained for each model.
- Return type:
List[Tuple[str, dict]]
- property trained_models: Iterable[tuple[str, BaseEstimator, dict]]
Get the trained models.
- Returns:
A list of models in the form of tuples (name, model, hyperparameters).
- Return type:
Iterable[tuple[str, BaseEstimator, dict]]
- evaluate(X: ndarray, y: ndarray, optim_hyper: bool = True, hyperoptimizer: HyperOptimizer | None = None) → DataFrame
- tune_models(X: ndarray, y: ndarray, hyperoptimizer: HyperOptimizer | None = None, split_idx: int = 1, progress: Progress | None = None) → List[Tuple[str, BaseEstimator, dict]]
Tune each model using cross-validation.
- Parameters:
X (ndarray) – Input observations.
y (ndarray) – Target values.
hyperoptimizer (HyperOptimizer) – Hyperparameter optimizer that will find the best parameters for each model.
- Returns:
The tuned models, as (name, model, hyperparameters) tuples.
- Return type:
List[Tuple[str, BaseEstimator, dict]]
- optimize_hyperparams(X: ndarray, y: ndarray, hyperoptimizer: HyperOptimizer | None = None, split_idx: int = 1, progress: Progress | None = None) → List[dict]
Obtain the best set of parameters for each of the models.
- Parameters:
X (ndarray) – Input data.
y (ndarray) – Target data.
hyperoptimizer (HyperOptimizer) – Hyperparameter optimizer that will find the best parameters for each model.
- Returns:
List of the best hyperparameters obtained for each model.
- Return type:
List[dict]
Classifier module
- class HundredHammersClassifier(models=None, metrics=None, eval_metric=None, input_transform=None, cross_validator=<class 'sklearn.model_selection._split.StratifiedKFold'>, cross_validator_params=None, test_size=0.2, n_train_evals=1, n_val_evals=1, show_progress_bar=False, seed_strategy='sequential')
Bases: HundredHammersBase
HundredHammers class specialized in classification models. Implements methods for automatic machine learning like evaluating a list of models and performing hyperparameter optimization.
- Parameters:
models – List of models to evaluate (has a default list of models)
metrics – Metrics to use to evaluate the models (has a default list of metrics)
eval_metric – Target metric to use in hyperparameter optimization (default is the first metric in metrics)
input_transform – Input normalization strategy used. Specified as a string or the normalization class. (‘MinMax’, ‘MaxAbs’, ‘Standard’, ‘Norm’, ‘Robust’)
cross_validator – Cross Validator to use in the evaluation (default StratifiedKFold)
cross_validator_params – Parameters for the Cross Validator (default {“shuffle”: True, “n_splits”: 5})
test_size – Fraction of the dataset to use for testing (default 0.2)
n_train_evals – Number of times to vary the training/test separation seed.
n_val_evals – Number of times to vary the cross-validation seed.
show_progress_bar – Show progress bar in the evaluation (default False)
seed_strategy – Strategy used to generate the seeds for the different evaluations (‘sequential’ or ‘random’)
Regressor module
- class HundredHammersRegressor(models=None, metrics=None, eval_metric=None, input_transform=None, cross_validator=<class 'sklearn.model_selection._split.KFold'>, cross_validator_params=None, test_size=0.2, n_val_evals=1, n_train_evals=1, show_progress_bar=False, seed_strategy='sequential')
Bases: HundredHammersBase
HundredHammers class specialized in regression models. Implements methods for automatic machine learning like evaluating a list of models and performing hyperparameter optimization.
- Parameters:
models – List of models to evaluate (has a default list of models)
metrics – Metrics to use to evaluate the models (has a default list of metrics)
eval_metric – Target metric to use in hyperparameter optimization (default is the first metric in metrics)
input_transform – Input normalization strategy used. Specified as a string or the normalization class. (‘MinMax’, ‘MaxAbs’, ‘Standard’, ‘Norm’, ‘Robust’)
cross_validator – Cross Validator to use in the evaluation (default KFold)
cross_validator_params – Parameters for the Cross Validator (default {“shuffle”: True, “n_splits”: 5})
test_size – Fraction of the dataset to use for testing (default 0.2)
n_train_evals – Number of times to vary the training/test separation seed.
n_val_evals – Number of times to vary the cross-validation seed.
show_progress_bar – Show progress bar in the evaluation (default False)
seed_strategy – Strategy used to generate the seeds for the different evaluations (‘sequential’ or ‘random’)
Hyperparameters module
- add_known_model_def(def_dict: dict)
Adds the hyperparameter definition of a new model to the list of known hyperparameters and known models.
The definition should be a dictionary that follows this schema:
    {
        'model': <Name>,
        <hyperparam_name>: Choose one of:
            - {"type": "real", "min": <number>, "max": <number>},
            - {"type": "integer", "min": <number>, "max": <number>},
            - {"type": "categorical", "values": [<any>]}
    }
There can be any number of hyperparameters, even 0. They MUST correspond to the arguments used in the model constructor, or you will get an error in the hyperparameter search step.
- Parameters:
def_dict (dict) – dictionary that defines the hyperparameters of a new model.
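As an illustration of the schema above, the following sketch builds one definition and checks it before it would be registered. `validate_model_def` is a hypothetical helper, not part of hundred_hammers, and the model name, ranges, and the assumption that `'model'` holds the class name as a string are all illustrative.

```python
def validate_model_def(def_dict: dict) -> None:
    """Hypothetical check that a definition follows the documented schema."""
    if "model" not in def_dict:
        raise ValueError("definition needs a 'model' entry")
    for name, spec in def_dict.items():
        if name == "model":
            continue
        kind = spec.get("type")
        if kind in ("real", "integer"):
            # Numeric hyperparameters need a valid [min, max] range.
            if "min" not in spec or "max" not in spec:
                raise ValueError(f"{name}: 'min' and 'max' are required")
            if spec["min"] > spec["max"]:
                raise ValueError(f"{name}: 'min' must not exceed 'max'")
        elif kind == "categorical":
            if not isinstance(spec.get("values"), list):
                raise ValueError(f"{name}: 'values' must be a list")
        else:
            raise ValueError(f"{name}: unknown type {kind!r}")


# Illustrative definition; keys other than 'model' name constructor arguments.
ridge_def = {
    "model": "Ridge",
    "alpha": {"type": "real", "min": 0.01, "max": 10.0},
    "max_iter": {"type": "integer", "min": 100, "max": 1000},
    "solver": {"type": "categorical", "values": ["auto", "svd", "cholesky"]},
}

validate_model_def(ridge_def)  # raises ValueError if the schema is violated
```

A dictionary that passes such a check could then be handed to `add_known_model_def(ridge_def)`.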
- find_hyperparam_def(model: BaseEstimator) → dict
Obtains the hyperparameter definition for the given model.
- Parameters:
model (BaseEstimator) – Model for which we want to find the hyperparameters.
- Returns:
Hyperparameter definition for the model.
- Return type:
dict
- find_hyperparam_grid(model: BaseEstimator, n_grid_points: int = 10) → dict
Obtains a grid of hyperparameters to optimize for the model.
- Parameters:
model (BaseEstimator) – Model for which we want to find the hyperparameters.
n_grid_points (int) – Number of values to pick for each hyperparameter.
- Returns:
Hyperparameter grid for the model.
- Return type:
dict
- construct_hyperparam_grid(hyperparam_grid_def: dict, n_grid_points: int = 10) → dict
Generate a grid of hyperparameters from their definition.
- Parameters:
hyperparam_grid_def (dict) – Definition of the hyperparameters to be generated as a grid.
n_grid_points (int) – Number of values to pick for each hyperparameter.
- Returns:
Hyperparameter grid to use in grid search.
- Return type:
dict
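A rough sketch of how a grid could be derived from a definition is shown here. It is not the library's implementation: `sketch_grid` is a hypothetical function, and the choice of evenly spaced points for numeric ranges is an assumption (the real function may space values differently).

```python
def sketch_grid(hyperparam_def: dict, n_grid_points: int = 10) -> dict:
    """Hypothetical grid builder: n_grid_points values per numeric hyperparameter."""
    grid = {}
    for name, spec in hyperparam_def.items():
        if name == "model":
            continue
        if spec["type"] == "categorical":
            # Categorical hyperparameters contribute all of their values.
            grid[name] = list(spec["values"])
            continue
        low, high = spec["min"], spec["max"]
        steps = max(n_grid_points - 1, 1)
        # Assumption: evenly spaced points between min and max.
        points = [low + (high - low) * i / steps for i in range(n_grid_points)]
        if spec["type"] == "integer":
            # Round to integers, drop duplicates, and sort.
            points = sorted({round(p) for p in points})
        grid[name] = points
    return grid


example_def = {
    "model": "Example",
    "alpha": {"type": "real", "min": 0.0, "max": 1.0},
    "depth": {"type": "integer", "min": 1, "max": 3},
    "kernel": {"type": "categorical", "values": ["rbf", "linear"]},
}
grid = sketch_grid(example_def, n_grid_points=5)
```

The result maps each hyperparameter name to a list of candidate values, which is the shape a grid search expects.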
- find_hyperparam_random(model: BaseEstimator, n_samples: int = 10) → dict
Obtains randomly sampled hyperparameter values to optimize for the model.
- Parameters:
model (BaseEstimator) – Model for which we want to find the hyperparameters.
n_samples (int) – Number of values to sample for each hyperparameter.
- Returns:
Randomly sampled hyperparameter values for the model.
- Return type:
dict
- construct_hyperparam_random(hyperparam_grid_def: dict, n_samples: int = 10) → dict
Generate random samples of hyperparameters from their definition.
- Parameters:
hyperparam_grid_def (dict) – Definition of the hyperparameters to be sampled.
n_samples (int) – Number of values to sample for each hyperparameter.
- Returns:
Sampled hyperparameter values to use in random search.
- Return type:
dict
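The random counterpart can be sketched in the same spirit. Again this is an assumption-laden illustration: `sketch_random_samples` and its `seed` argument are hypothetical, and uniform sampling is assumed because the source does not state which distribution is used.

```python
import random


def sketch_random_samples(hyperparam_def: dict, n_samples: int = 10, seed: int = 0) -> dict:
    """Hypothetical sampler: n_samples random values per hyperparameter."""
    rng = random.Random(seed)
    samples = {}
    for name, spec in hyperparam_def.items():
        if name == "model":
            continue
        if spec["type"] == "real":
            # Assumption: uniform sampling over [min, max].
            samples[name] = [rng.uniform(spec["min"], spec["max"]) for _ in range(n_samples)]
        elif spec["type"] == "integer":
            # randint includes both endpoints.
            samples[name] = [rng.randint(spec["min"], spec["max"]) for _ in range(n_samples)]
        else:
            samples[name] = [rng.choice(spec["values"]) for _ in range(n_samples)]
    return samples


example_def = {
    "model": "Example",
    "alpha": {"type": "real", "min": 0.0, "max": 1.0},
    "depth": {"type": "integer", "min": 1, "max": 3},
    "kernel": {"type": "categorical", "values": ["rbf", "linear"]},
}
samples = sketch_random_samples(example_def, n_samples=4)
```

Fixing the seed makes the sampled values reproducible between runs.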
Hyperparameter Optimization module
- class HyperOptimizer(metric: str | callable | Tuple[str, callable, dict] = 'MSE')
Bases: ABC
Hyperparameter Optimizer interface.
- Parameters:
metric (str or callable or Tuple[str, callable, dict]) – function that calculates the error of the predictions of a model compared with the real dataset.
- abstract best_params(X: ndarray, y: ndarray, model: BaseEstimator, param_grid: dict | None = None) → dict
Obtains the best set of parameters for the given model and dataset.
- Parameters:
X (ndarray) – input dataset.
y (ndarray) – target dataset.
model (BaseEstimator) – machine learning model to evaluate.
param_grid (dict) – grid of parameters to search over.
- Return type:
dict
- class HyperOptimizerGridSearch(metric: str | callable = 'MSE', n_folds_tune: int = 5, n_grid_points: int = 10)
Bases: HyperOptimizer
Grid Search Hyperparameter Optimizer.
- Parameters:
metric (str or callable or Tuple[str, callable, dict]) – function that calculates the error of the predictions of a model compared with the real dataset.
n_folds_tune (int) – number of splits in cross validation for grid search.
n_grid_points (int) – number of points to choose per parameter when the grid is constructed.
- best_params(X: ndarray, y: ndarray, model: BaseEstimator, param_grid: dict | None = None)
Obtains the best set of parameters for the given model and dataset.
- Parameters:
X (ndarray) – input dataset.
y (ndarray) – target dataset.
model (BaseEstimator) – machine learning model to evaluate.
param_grid (dict) – grid of parameters to search over.
- Return type:
dict
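The idea behind a grid-search `best_params` can be sketched without any machine learning dependency. This is a simplified stand-in, not the class's actual code: `exhaustive_search` and `toy_error` are hypothetical, and the error function replaces the cross-validated metric that the real optimizer would compute over `n_folds_tune` folds.

```python
from itertools import product
from typing import Callable


def exhaustive_search(param_grid: dict, error_fn: Callable[[dict], float]) -> dict:
    """Hypothetical grid search: try every combination, keep the lowest error."""
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = error_fn(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params


def toy_error(params: dict) -> float:
    # Stand-in for a cross-validated error such as MSE; minimal at alpha=0.5, depth=2.
    return (params["alpha"] - 0.5) ** 2 + (params["depth"] - 2) ** 2


best = exhaustive_search({"alpha": [0.0, 0.5, 1.0], "depth": [1, 2, 3]}, toy_error)
print(best)  # {'alpha': 0.5, 'depth': 2}
```

The number of evaluations grows multiplicatively with the values per hyperparameter, which is why `n_grid_points` is worth keeping small.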
- class HyperOptimizerRandomSearch(metric: str | callable = 'MSE', n_folds_tune: int = 5, n_iter: int = 10)
Bases: HyperOptimizer
Random Search Hyperparameter Optimizer.
- Parameters:
metric (str or callable or Tuple[str, callable, dict]) – function that calculates the error of the predictions of a model compared with the real dataset.
n_folds_tune (int) – number of splits in cross validation for random search.
n_iter (int) – number of samples to take for each parameter.
- best_params(X: ndarray, y: ndarray, model: BaseEstimator, param_grid: dict | None = None)
Obtains the best set of parameters for the given model and dataset.
- Parameters:
X (ndarray) – input dataset.
y (ndarray) – target dataset.
model (BaseEstimator) – machine learning model to evaluate.
param_grid (dict) – grid of parameters to search over.
- Return type:
dict
Metric_alias module
This module provides some alternative names for metrics implemented in sklearn.
Plots module
- plot_confusion_matrix(X, y, model, class_dict, title='', test_size=0.2, seed=0, filepath=None, display=True)
Plot confusion matrix for a given model.
- Parameters:
X – input observations
y – target values
model – model to evaluate
class_dict – dictionary with class names (ex.: {0: “No”, 1: “Yes”})
title – title of the plot
test_size – fraction of the dataset to use for testing
seed – random seed
filepath – path to save the plot
display – whether to display the plot
- plot_regression_pred(X, y, models, y_label='', title='', test_size=0.2, metric=None, seed=0, filepath=None, display=True)
Plot the predictions of the regression models.
- Parameters:
X – input observations
y – target values
models – list of models to evaluate
y_label – name of the target variable
title – title of the plot
test_size – fraction of the dataset to use for testing
metric – metric to use for evaluation
seed – random seed
filepath – path to save the plot
display – whether to display the plot
- plot_batch_results(df, metric_name, title='', filepath=None, display=True)
Plot the results of the batch evaluation.
- Parameters:
df – results dataframe
metric_name – metric to plot
title – title of plot
filepath – filepath to save plot
display – whether to display the plot
- plot_multiple_datasets(df, metric_name, id_col='Code', title='', line_at_0=False, higher_is_better=True, filepath=None, display=True)
Plot the results of the batch evaluation across multiple datasets.
- Parameters:
df – results dataframe
metric_name – metric to plot
id_col – column containing the ID of the dataset
title – title of plot
line_at_0 – determines if a line is plotted at 0
higher_is_better – determines if higher values are better
filepath – filepath to save plot
display – whether to display the plot
Utils module
- class NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: JSONEncoder
Special JSON encoder for numpy types.
- default(o)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
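The same `default` hook can be applied to array-like values. The sketch below is not the library's `NumpyEncoder`: `DuckArrayEncoder` and `FakeArray` are hypothetical names, and it relies on duck typing (any object exposing `tolist()`, as NumPy arrays and scalars do) so that it runs without NumPy installed.

```python
import json


class DuckArrayEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes any object that exposes tolist()."""

    def default(self, o):
        # NumPy arrays and scalars expose tolist(); convert them to plain Python.
        if hasattr(o, "tolist"):
            return o.tolist()
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(o)


class FakeArray:
    """Stand-in for an array type, used only to keep the example dependency-free."""

    def __init__(self, data):
        self._data = list(data)

    def tolist(self):
        return list(self._data)


encoded = json.dumps({"scores": FakeArray([0.1, 0.2])}, cls=DuckArrayEncoder)
print(encoded)  # {"scores": [0.1, 0.2]}
```

Passing the encoder through `cls=` keeps the call site unchanged for values that are already serializable.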