hundred_hammers package

Config module

Global configuration for the library.

Model_zoo module

This module provides easy access to all available machine learning models, as well as some default models for classification and regression tasks.

Base module

class HundredHammersBase(models: Iterable[Tuple[str, BaseEstimator, dict]] = None, metrics: Iterable[str | callable] = None, eval_metric: str | callable = None, input_transform: TransformerMixin | str = None, cross_validator: callable = None, cross_validator_params: dict = None, test_size: float = 0.2, n_train_evals: int = 1, n_val_evals: int = 1, show_progress_bar: bool = True, seed_strategy: str = 'sequential')

Bases: object

Base HundredHammers class. Implements methods for automatic machine learning, such as evaluating a list of models and performing hyperparameter optimization.

Parameters:
  • models (Iterable[Tuple[str, BaseEstimator, dict]]) – List of models to evaluate.

  • metrics (Iterable[str | callable]) – Metrics to use to evaluate the models.

  • eval_metric (str | callable) – Target metric to use in hyperparameter optimization.

  • input_transform (TransformerMixin | str) – Input normalization strategy, given either as a string ('MinMax', 'MaxAbs', 'Standard', 'Norm', 'Robust') or as a normalization class.

  • cross_validator (callable) – Cross Validator to use in the evaluation.

  • cross_validator_params (dict) – Parameters for the Cross Validator.

  • test_size (float) – Fraction of the dataset to use for testing.

  • n_train_evals (int) – Number of times to vary the seed of the train/test split.

  • n_val_evals (int) – Number of times to vary the cross-validation seed.

  • show_progress_bar (bool) – Whether to show a progress bar during the evaluation.

  • seed_strategy (str) – Strategy used to generate the seeds for the different evaluations ('sequential' or 'random').
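The seed-related parameters (n_train_evals, n_val_evals, seed_strategy) can be pictured with the small sketch below. It is a hypothetical illustration, not the library's code: make_seeds is an invented helper, and it assumes that 'sequential' means consecutive integers while 'random' means RNG-drawn integers, and that every train/test seed is paired with every cross-validation seed.

```python
import random
from typing import List


def make_seeds(n_evals: int, strategy: str = "sequential", base_seed: int = 0) -> List[int]:
    """Hypothetical helper: return one seed per evaluation round.

    Assumption: 'sequential' yields consecutive integers starting at base_seed,
    while 'random' draws integers from an RNG seeded with base_seed.
    """
    if strategy == "sequential":
        return list(range(base_seed, base_seed + n_evals))
    if strategy == "random":
        rng = random.Random(base_seed)
        return [rng.randrange(2**31) for _ in range(n_evals)]
    raise ValueError(f"unknown seed_strategy: {strategy!r}")


# Assumption: each train/test seed is paired with each cross-validation seed,
# so n_train_evals=3 and n_val_evals=2 give 3 * 2 = 6 evaluation rounds.
train_seeds = make_seeds(3)
val_seeds = make_seeds(2)
rounds = [(t, v) for t in train_seeds for v in val_seeds]
print(rounds)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

Under these assumptions, the total number of fits grows with the product n_train_evals × n_val_evals, so raising both values multiplies the evaluation time.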

property full_report: DataFrame

Pandas dataframe reflecting the results of the last evaluation of the models with extra information.

Returns:

Dataframe with the performance of each of the models.

Return type:

DataFrame

property report: DataFrame

Pandas dataframe reflecting the results of the last evaluation of the models.

Returns:

Dataframe with the performance of each of the models.

Return type:

DataFrame

property best_params: List[Tuple[str, dict]]

List of the best hyperparameters found for each model.

Returns:

List of the best hyperparameters obtained for each model.

Return type:

List[Tuple[str, dict]]

property trained_models: Iterable[tuple[str, BaseEstimator, dict]]

Get the trained models.

Returns:

A list of models in the form of tuples (name, model, hyperparameters).

Return type:

Iterable[tuple[str, BaseEstimator, dict]]

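evaluate(X: ndarray, y: ndarray, optim_hyper: bool = True, hyperoptimizer: HyperOptimizer | None = None) → DataFrame

Evaluate the models on the given dataset and return the results as a DataFrame. When optim_hyper is True, the hyperparameters of each model are first optimized with hyperoptimizer.

tune_models(X: ndarray, y: ndarray, hyperoptimizer: HyperOptimizer | None = None, split_idx: int = 1, progress: Progress | None = None) → List[Tuple[str, BaseEstimator, dict]]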

Tune each of the models using cross-validation.

Parameters:
  • X (ndarray) – Input observations.

  • y (ndarray) – Target values.

  • hyperoptimizer (HyperOptimizer) – Hyperparameter optimizer that will find the best parameters for each model.

Returns:

The tuned models.

Return type:

List[Tuple[str, BaseEstimator, dict]]

optimize_hyperparams(X: ndarray, y: ndarray, hyperoptimizer: HyperOptimizer | None = None, split_idx: int = 1, progress: Progress | None = None) → List[dict]

Obtain the best set of parameters for each of the models.

Parameters:
  • X (ndarray) – Input data.

  • y (ndarray) – Target data.

  • hyperoptimizer (HyperOptimizer) – Hyperparameter optimizer that will find the best parameters for each model.

Returns:

List of the best hyperparameters obtained for each model.

Return type:

List[dict]

Classifier module

class HundredHammersClassifier(models=None, metrics=None, eval_metric=None, input_transform=None, cross_validator=<class 'sklearn.model_selection._split.StratifiedKFold'>, cross_validator_params=None, test_size=0.2, n_train_evals=1, n_val_evals=1, show_progress_bar=False, seed_strategy='sequential')

Bases: HundredHammersBase

HundredHammers class specialized in classification models. Implements methods for automatic machine learning, such as evaluating a list of models and performing hyperparameter optimization.

Parameters:
  • models – List of models to evaluate (has a default list of models)

  • metrics – Metrics to use to evaluate the models (has a default list of metrics)

  • eval_metric – Target metric to use in hyperparameter optimization (default is the first metric in metrics)

  • input_transform – Input normalization strategy, given either as a string ('MinMax', 'MaxAbs', 'Standard', 'Norm', 'Robust') or as a normalization class.

  • cross_validator – Cross Validator to use in the evaluation (default StratifiedKFold)

  • cross_validator_params – Parameters for the Cross Validator (default {"shuffle": True, "n_splits": 5})

  • test_size – Fraction of the dataset to use for testing (default 0.2)

  • n_train_evals – Number of times to vary the seed of the train/test split.

  • n_val_evals – Number of times to vary the cross-validation seed.

  • show_progress_bar – Show progress bar in the evaluation (default False)

  • seed_strategy – Strategy used to generate the seeds for the different evaluations ('sequential' or 'random').

Regressor module

class HundredHammersRegressor(models=None, metrics=None, eval_metric=None, input_transform=None, cross_validator=<class 'sklearn.model_selection._split.KFold'>, cross_validator_params=None, test_size=0.2, n_val_evals=1, n_train_evals=1, show_progress_bar=False, seed_strategy='sequential')

Bases: HundredHammersBase

HundredHammers class specialized in regression models. Implements methods for automatic machine learning, such as evaluating a list of models and performing hyperparameter optimization.

Parameters:
  • models – List of models to evaluate (has a default list of models)

  • metrics – Metrics to use to evaluate the models (has a default list of metrics)

  • eval_metric – Target metric to use in hyperparameter optimization (default is the first metric in metrics)

  • input_transform – Input normalization strategy, given either as a string ('MinMax', 'MaxAbs', 'Standard', 'Norm', 'Robust') or as a normalization class.

  • cross_validator – Cross Validator to use in the evaluation (default KFold)

  • cross_validator_params – Parameters for the Cross Validator (default {"shuffle": True, "n_splits": 5})

  • test_size – Fraction of the dataset to use for testing (default 0.2)

  • n_train_evals – Number of times to vary the seed of the train/test split.

  • n_val_evals – Number of times to vary the cross-validation seed.

  • show_progress_bar – Show progress bar in the evaluation (default False)

  • seed_strategy – Strategy used to generate the seeds for the different evaluations ('sequential' or 'random').

Hyperparameters module

add_known_model_def(def_dict: dict)

Adds the hyperparameter definition of a new model to the lists of known hyperparameters and known models.

The definition should be a dictionary that follows this schema:

{
    'model': <Name>,
    <hyperparam_name>:
        Choose one of:
        - {"type": "real", "min": <number>, "max": <number>},
        - {"type": "integer", "min": <number>, "max": <number>},
        - {"type": "categorical", "values": [<any>]}
}

A definition may contain any number of hyperparameters, including zero. Their names MUST match the arguments of the model constructor; otherwise the hyperparameter search step will raise an error.

Parameters:

def_dict (dict) – dictionary that defines the hyperparameters of a new model.
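As a concrete instance of the schema, the sketch below writes a definition for scikit-learn's LogisticRegression (C, max_iter and solver are real constructor arguments) and checks it with check_model_def, a hypothetical validator that is not part of hundred_hammers. The ranges are arbitrary, and using the class name as the 'model' value is an assumption about what <Name> expects.

```python
from typing import Any, Dict

# Illustrative definition for scikit-learn's LogisticRegression; the ranges
# are arbitrary examples, and the 'model' value format is an assumption.
lr_def: Dict[str, Any] = {
    "model": "LogisticRegression",
    "C": {"type": "real", "min": 0.01, "max": 10.0},
    "max_iter": {"type": "integer", "min": 100, "max": 1000},
    "solver": {"type": "categorical", "values": ["lbfgs", "liblinear"]},
}


def check_model_def(def_dict: Dict[str, Any]) -> None:
    """Hypothetical checker (not part of hundred_hammers) for the schema above."""
    if "model" not in def_dict:
        raise ValueError("definition needs a 'model' entry")
    for name, spec in def_dict.items():
        if name == "model":
            continue
        kind = spec.get("type")
        if kind in ("real", "integer"):
            if spec["min"] > spec["max"]:
                raise ValueError(f"{name}: 'min' must not exceed 'max'")
        elif kind == "categorical":
            if not spec.get("values"):
                raise ValueError(f"{name}: 'values' must be a non-empty list")
        else:
            raise ValueError(f"{name}: unknown type {kind!r}")


check_model_def(lr_def)  # passes silently
```

A definition that passes a check like this would then be registered with add_known_model_def(lr_def).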

find_hyperparam_def(model: BaseEstimator) → dict

Obtains the hyperparameter definition for the given model.

Parameters:

model (BaseEstimator) – Model for which we want to find the hyperparameters.

Returns:

Hyperparameter definition for the model.

Return type:

dict

find_hyperparam_grid(model: BaseEstimator, n_grid_points: int = 10) → dict

Obtains a grid of hyperparameters to optimize for the model.

Parameters:
  • model (BaseEstimator) – Model for which we want to find the hyperparameters.

  • n_grid_points (int) – Number of values to pick for each hyperparameter.

Returns:

Hyperparameter grid for the model.

Return type:

dict

construct_hyperparam_grid(hyperparam_grid_def: dict, n_grid_points: int = 10) → dict

Generate a grid of hyperparameters from their definition.

Parameters:
  • hyperparam_grid_def (dict) – Definition of the hyperparameters to be generated as a grid.

  • n_grid_points (int) – Number of values to pick for each hyperparameter.

Returns:

Hyperparameter grid to use in grid search.

Return type:

dict

find_hyperparam_random(model: BaseEstimator, n_samples: int = 10) → dict

Obtains a random sample of hyperparameter values to optimize for the model.

Parameters:
  • model (BaseEstimator) – Model for which we want to find the hyperparameters.

  • n_samples (int) – Number of values to pick for each hyperparameter.

Returns:

Randomly sampled hyperparameter values for the model.

Return type:

dict

construct_hyperparam_random(hyperparam_grid_def: dict, n_samples: int = 10) → dict

Generate randomly sampled hyperparameter values from their definition.

Parameters:
  • hyperparam_grid_def (dict) – Definition of the hyperparameters to sample from.

  • n_samples (int) – Number of values to pick for each hyperparameter.

Returns:

Sampled hyperparameter values to use in random search.

Return type:

dict
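The random counterpart can be sketched the same way. construct_random_sketch is hypothetical, and its seed argument is an addition for reproducibility that the documented signature does not have; it assumes uniform sampling within [min, max] for numeric types and uniform choice among categorical values.

```python
import random
from typing import Any, Dict, List


def construct_random_sketch(grid_def: Dict[str, Any], n_samples: int = 10, seed: int = 0) -> Dict[str, List[Any]]:
    """Hypothetical sketch: draw n_samples values per hyperparameter."""
    rng = random.Random(seed)
    samples: Dict[str, List[Any]] = {}
    for name, spec in grid_def.items():
        if name == "model":  # the model name is not a hyperparameter
            continue
        kind = spec["type"]
        if kind == "real":
            samples[name] = [rng.uniform(spec["min"], spec["max"]) for _ in range(n_samples)]
        elif kind == "integer":
            # randint includes both endpoints
            samples[name] = [rng.randint(spec["min"], spec["max"]) for _ in range(n_samples)]
        elif kind == "categorical":
            samples[name] = [rng.choice(spec["values"]) for _ in range(n_samples)]
        else:
            raise ValueError(f"{name}: unknown type {kind!r}")
    return samples


example_def = {
    "C": {"type": "real", "min": 0.0, "max": 1.0},
    "degree": {"type": "integer", "min": 1, "max": 3},
    "kernel": {"type": "categorical", "values": ["linear", "rbf"]},
}
samples = construct_random_sketch(example_def, n_samples=4, seed=42)
print({name: len(values) for name, values in samples.items()})  # {'C': 4, 'degree': 4, 'kernel': 4}
```

Unlike the grid version, the number of values per hyperparameter here is exactly n_samples, independent of how wide each range is.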

Hyperparameter Optimization module

class HyperOptimizer(metric: str | callable | Tuple[str, callable, dict] = 'MSE')

Bases: ABC

Hyperparameter Optimizer interface.

Parameters:

metric (str or callable or Tuple[str, callable, dict]) – function that calculates the error of a model's predictions with respect to the true target values.

abstract best_params(X: ndarray, y: ndarray, model: BaseEstimator, param_grid: dict | None = None) → dict

Obtains the best set of parameters for the given model and dataset.

Parameters:
  • X (ndarray) – input dataset.

  • y (ndarray) – target dataset.

  • model (BaseEstimator) – machine learning model to evaluate.

  • param_grid (dict) – grid of parameters to search over.

Return type:

dict

class HyperOptimizerGridSearch(metric: str | callable = 'MSE', n_folds_tune: int = 5, n_grid_points: int = 10)

Bases: HyperOptimizer

Grid Search Hyperparameter Optimizer.

Parameters:
  • metric (str or callable or Tuple[str, callable, dict]) – function that calculates the error of a model's predictions with respect to the true target values.

  • n_folds_tune (int) – number of cross-validation folds used in the grid search.

  • n_grid_points (int) – number of points to choose per parameter when the grid is constructed.

best_params(X: ndarray, y: ndarray, model: BaseEstimator, param_grid: dict | None = None)

Obtains the best set of parameters for the given model and dataset.

Parameters:
  • X (ndarray) – input dataset.

  • y (ndarray) – target dataset.

  • model (BaseEstimator) – machine learning model to evaluate.

  • param_grid (dict) – grid of parameters to search over.

Return type:

dict
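The core loop of a grid search can be sketched with the standard library alone. grid_search_sketch is hypothetical: it scores each combination with a plain error function, whereas HyperOptimizerGridSearch evaluates candidates with n_folds_tune-fold cross-validation. As with the metric above, lower error is treated as better.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional


def grid_search_sketch(
    param_grid: Dict[str, List[Any]],
    error_fn: Callable[[Dict[str, Any]], float],
) -> Optional[Dict[str, Any]]:
    """Hypothetical sketch: try every combination and keep the lowest error."""
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        error = error_fn(params)  # stands in for a cross-validated error
        if error < best_error:
            best_params, best_error = params, error
    return best_params


def toy_error(p: Dict[str, Any]) -> float:
    # Toy error surface with its minimum at alpha=0.5, degree=2
    return (p["alpha"] - 0.5) ** 2 + abs(p["degree"] - 2)


grid = {"alpha": [0.0, 0.25, 0.5, 0.75, 1.0], "degree": [1, 2, 3]}
print(grid_search_sketch(grid, toy_error))  # {'alpha': 0.5, 'degree': 2}
```

The cost is one evaluation per combination (here 5 × 3 = 15), which is why n_grid_points strongly affects running time.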

class HyperOptimizerRandomSearch(metric: str | callable = 'MSE', n_folds_tune: int = 5, n_iter: int = 10)

Bases: HyperOptimizer

Random Search Hyperparameter Optimizer.

Parameters:
  • metric (str or callable or Tuple[str, callable, dict]) – function that calculates the error of a model's predictions with respect to the true target values.

  • n_folds_tune (int) – number of cross-validation folds used in the random search.

  • n_iter (int) – number of samples to take for each parameter.

best_params(X: ndarray, y: ndarray, model: BaseEstimator, param_grid: dict | None = None)

Obtains the best set of parameters for the given model and dataset.

Parameters:
  • X (ndarray) – input dataset.

  • y (ndarray) – target dataset.

  • model (BaseEstimator) – machine learning model to evaluate.

  • param_grid (dict) – grid of parameters to search over.

Return type:

dict
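For comparison, a random search evaluates only a sample of combinations. random_search_sketch is hypothetical; it treats n_iter as the number of sampled combinations (as scikit-learn's RandomizedSearchCV does), while the description above speaks of samples per parameter, so the library's exact semantics may differ. A plain error function again replaces the cross-validated score.

```python
import random
from typing import Any, Callable, Dict, List, Optional


def random_search_sketch(
    param_grid: Dict[str, List[Any]],
    error_fn: Callable[[Dict[str, Any]], float],
    n_iter: int = 10,
    seed: int = 0,
) -> Optional[Dict[str, Any]]:
    """Hypothetical sketch: evaluate n_iter random combinations, keep the best."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_iter):
        # Pick one value per hyperparameter independently
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        error = error_fn(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params


def toy_error(p: Dict[str, Any]) -> float:
    return (p["alpha"] - 0.5) ** 2 + abs(p["degree"] - 2)


grid = {"alpha": [0.0, 0.25, 0.5, 0.75, 1.0], "degree": [1, 2, 3]}
best = random_search_sketch(grid, toy_error, n_iter=8, seed=1)
print(best)
```

The cost is fixed at n_iter evaluations regardless of grid size, at the price of possibly missing the best combination.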

Metric_alias module

This module provides some alternative names for metrics implemented in sklearn.

process_metric(metric: str | callable, metric_params: dict = None) → Tuple[str, callable, dict]

Converts a metric into a tuple containing its name, its function and its parameters.

Parameters:
  • metric – a string or callable that represents the error function.

  • metric_params – parameters for the metric function.
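The normalization that process_metric performs can be sketched as follows. Everything here is hypothetical: the real module maps its aliases to scikit-learn metrics, while this sketch uses small pure-Python mse and mae functions, and only 'MSE' is an alias confirmed by this page (it is the default metric of HyperOptimizer).

```python
from typing import Callable, Dict, Optional, Sequence, Tuple, Union

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical alias table; the real module maps its aliases to sklearn metrics.
METRIC_ALIASES: Dict[str, Metric] = {"MSE": mse, "MAE": mae}


def process_metric_sketch(
    metric: Union[str, Metric], metric_params: Optional[dict] = None
) -> Tuple[str, Metric, dict]:
    """Hypothetical sketch: normalize a metric into (name, function, params)."""
    params = {} if metric_params is None else dict(metric_params)
    if isinstance(metric, str):
        return metric, METRIC_ALIASES[metric], params
    if callable(metric):
        return metric.__name__, metric, params
    raise TypeError(f"unsupported metric: {metric!r}")


name, fn, params = process_metric_sketch("MSE")
print(name, fn([1, 2, 3], [1, 2, 5]), params)  # MSE 1.3333333333333333 {}
```

Returning a uniform tuple lets the rest of the code handle aliases and user-supplied callables the same way.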

Plots module

plot_confusion_matrix(X, y, model, class_dict, title='', test_size=0.2, seed=0, filepath=None, display=True)

Plot confusion matrix for a given model.

Parameters:
  • X – input observations

  • y – target values

  • model – model to evaluate

  • class_dict – dictionary mapping class labels to display names (e.g. {0: "No", 1: "Yes"})

  • title – title of the plot

  • test_size – fraction of the dataset to use for testing

  • seed – random seed

  • filepath – path to save the plot

  • display – whether to display the plot
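The matrix behind such a plot is a table of (true label, predicted label) counts. The sketch below shows only that counting step and how class_dict supplies the axis names; confusion_counts is a hypothetical helper, and the split with test_size and seed, the model fitting and the drawing itself are left out. Rows are true classes and columns are predicted classes, the convention scikit-learn uses.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    class_dict: Dict[Hashable, str],
) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical sketch: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    labels = list(class_dict)  # label order follows class_dict
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    names = [class_dict[label] for label in labels]
    return names, matrix


names, matrix = confusion_counts([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], {0: "No", 1: "Yes"})
print(names, matrix)  # ['No', 'Yes'] [[1, 1], [1, 2]]
```

The diagonal holds the correct predictions; here 3 of the 5 test samples are classified correctly.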

plot_regression_pred(X, y, models, y_label='', title='', test_size=0.2, metric=None, seed=0, filepath=None, display=True)

Plot the predictions of the regression models.

Parameters:
  • X – input observations

  • y – target values

  • models – list of models to evaluate

  • y_label – name of the target variable

  • title – title of the plot

  • test_size – fraction of the dataset to use for testing

  • metric – metric to use for evaluation

  • seed – random seed

  • filepath – path to save the plot

  • display – whether to display the plot

plot_batch_results(df, metric_name, title='', filepath=None, display=True)

Plot the results of the batch evaluation.

Parameters:
  • df – results dataframe

  • metric_name – metric to plot

  • title – title of plot

  • filepath – filepath to save plot

  • display – whether to display the plot

plot_multiple_datasets(df, metric_name, id_col='Code', title='', line_at_0=False, higher_is_better=True, filepath=None, display=True)

Plot the results of the batch evaluation across multiple datasets.

Parameters:
  • df – results dataframe

  • metric_name – metric to plot

  • id_col – column containing the ID of the dataset

  • title – title of plot

  • line_at_0 – determines if a line is plotted at 0

  • higher_is_better – determines if higher values are better

  • filepath – filepath to save plot

  • display – whether to display the plot

Utils module

class NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

Special JSON encoder for NumPy types.

default(o)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

from json import JSONEncoder

class IterableEncoder(JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
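A NumPy-aware encoder follows the same pattern. The sketch below is hypothetical and avoids importing NumPy by duck typing: both NumPy arrays and NumPy scalars expose .tolist(), which returns plain Python data. The actual NumpyEncoder may instead check isinstance against specific NumPy types; FakeArray is a stand-in used only to keep the example self-contained.

```python
import json
from typing import Any


class NumpyLikeEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes any object that exposes .tolist(),
    which covers NumPy arrays and NumPy scalar types."""

    def default(self, o: Any) -> Any:
        if hasattr(o, "tolist"):
            return o.tolist()
        # Let the base class raise TypeError for anything else
        return super().default(o)


class FakeArray:
    """Stand-in for a NumPy array in this self-contained example."""

    def tolist(self):
        return [1, 2, 3]


print(json.dumps({"scores": FakeArray()}, cls=NumpyLikeEncoder))  # {"scores": [1, 2, 3]}
```

Since NumpyEncoder is a JSONEncoder subclass, it is used the same way: json.dumps(data, cls=NumpyEncoder).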