learning_machines_drift package

Submodules

learning_machines_drift.backends module

Backend module.

class Backend(*args, **kwargs)

Bases: Protocol

A protocol class for a Backend.

clear_logged_dataset(tag: str) → bool

Delete directory containing logged files.

Parameters:: tag (str) – Path to logged directory.
Returns:: True if the tag/logged path exists, False if it does not.
Return type:: bool

clear_reference_dataset(tag: str) → bool

Delete directory containing reference files.

Parameters:: tag (str) – Path to reference directory.

load_logged_dataset(tag: str) → Dataset

Return a Dataset from the union of logged data.

Parameters:: tag (str) – Tag identifying dataset.

load_reference_dataset(tag: str) → Dataset

Load reference dataset from reference path.

Parameters:: tag (str) – Tag identifying dataset.

save_logged_features(tag: str, identifier: UUID, dataframe: DataFrame) → None

Save logged features using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the logged dataset.
dataframe (pd.DataFrame) – The dataframe that needs saving.

save_logged_labels(tag: str, identifier: UUID, labels: Series) → None

Save logged labels using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the labels of the dataset.
labels (pd.Series) – The labels series to be saved.

save_logged_latents(tag: str, identifier: UUID, dataframe: DataFrame) → None

Save optionally passed latents dataframe using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the latents of the dataset.
dataframe (pd.DataFrame) – The dataframe of latents to be saved.

save_reference_dataset(tag: str, dataset: Dataset) → None

Saves passed dataset to backend under tag.

Parameters:

tag (str) – A tag for locating the dataset within the backend.
dataset (Dataset) – Reference dataset to be saved.
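
Because Backend is a Protocol, any class that provides the listed methods can stand in for it without inheriting from it. The sketch below illustrates that idea with a reduced, hypothetical protocol containing only clear_logged_dataset; the names ClearableBackend and InMemoryBackend are assumptions made for illustration and are not part of the package.

```python
from typing import Dict, Protocol, runtime_checkable


@runtime_checkable
class ClearableBackend(Protocol):
    """Hypothetical, reduced protocol with one method from the listing above."""

    def clear_logged_dataset(self, tag: str) -> bool:
        ...


class InMemoryBackend:
    """Illustrative backend that keeps logged data in a dict instead of on disk."""

    def __init__(self) -> None:
        self.logged: Dict[str, list] = {}

    def clear_logged_dataset(self, tag: str) -> bool:
        # Return True only if something was stored under the tag.
        return self.logged.pop(tag, None) is not None


backend = InMemoryBackend()
backend.logged["my_model"] = ["batch-1"]
# No inheritance is involved: the class matches the protocol structurally.
satisfies_protocol = isinstance(backend, ClearableBackend)
```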

class FileBackend(root_dir: Union[str, Path])

Bases: object

Implements the Backend protocol for writing files to the filesystem.

clear_logged_dataset(tag: str) → bool

Delete directory containing logged files.

Parameters:: tag (str) – Path to logged directory.
Returns:: True if the tag/logged path exists, False if it does not.
Return type:: bool
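
As a rough illustration of the behaviour described above, the following sketch deletes a logged directory and reports whether it existed. The directory layout root_dir/tag/logged and the function name clear_logged_dir are assumptions; the actual FileBackend implementation may differ.

```python
import shutil
from pathlib import Path
from typing import Union


def clear_logged_dir(root_dir: Union[str, Path], tag: str) -> bool:
    """Delete <root_dir>/<tag>/logged and report whether it existed."""
    logged_path = Path(root_dir) / tag / "logged"  # assumed layout
    if not logged_path.exists():
        return False
    shutil.rmtree(logged_path)  # removes the directory and its contents
    return True
```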

clear_reference_dataset(tag: str) → bool

Delete directory containing reference files.

Parameters:: tag (str) – Path to reference directory.

load_logged_dataset(tag: str) → Dataset

Return a Dataset from the union of logged data.

Parameters:: tag (str) – Tag identifying dataset.

load_reference_dataset(tag: str) → Dataset

Load reference dataset from reference path.

Parameters:: tag (str) – Tag identifying dataset.

save_logged_features(tag: str, identifier: UUID, dataframe: DataFrame) → None

Save logged features using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the logged dataset.
dataframe (pd.DataFrame) – The dataframe that needs saving.

save_logged_labels(tag: str, identifier: UUID, labels: Series) → None

Save logged labels using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the labels of the dataset.
labels (pd.Series) – The labels series to be saved.

save_logged_latents(tag: str, identifier: UUID, dataframe: Optional[DataFrame]) → None

Save optionally passed latents dataframe using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the latents of the dataset.
dataframe (Optional[pd.DataFrame]) – The dataframe of latents to be saved.

save_reference_dataset(tag: str, dataset: Dataset) → None

Saves passed dataset to backend under tag.

Parameters:

tag (str) – A tag for locating the dataset within the backend.
dataset (Dataset) – Reference dataset to be saved.

get_identifier(path_object: Union[str, Path]) → Optional[UUID]

Extract the UUID from the filename. The filename should have the format UUID + some other text and a file extension. The UUID should match the regex in the pattern variable UUIDHex4.

Parameters:

path_object (Union[str, Path]) – Path or filename from which to extract the UUID.

Returns:

Optional universally unique identifier (UUID) extracted from path_object.

Return type:

Optional[UUID]
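
A minimal sketch of this kind of extraction is shown below. The regular expression is an assumed stand-in for the UUIDHex4 pattern (its exact definition is not given here), and extract_identifier is a hypothetical name rather than the package function.

```python
import re
from pathlib import Path
from typing import Optional, Union
from uuid import UUID, uuid4

# Assumed pattern: a hyphenated version-4 UUID at the start of the filename.
UUID_HEX4 = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_identifier(path_object: Union[str, Path]) -> Optional[UUID]:
    """Return the UUID prefixed to the filename, or None if there is none."""
    match = UUID_HEX4.match(Path(path_object).name)
    return UUID(match.group(0)) if match else None


identifier = uuid4()
found = extract_identifier(Path("my_model") / "logged" / f"{identifier}_features.csv")
missing = extract_identifier("features.csv")
```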

learning_machines_drift.datasets module

Datasets module with functions for generating example data.

example_dataset(n_rows: int, seed: Optional[int] = None) → Tuple[DataFrame, Series, DataFrame]

Generates data and returns features, labels and latents.

Parameters:

n_rows (int) – Number of rows/samples.
seed (Optional[int]) – Random seed for reproducibly generating data.

Returns:

A dataset tuple of generated features, labels and latents.

Return type:

Tuple[pd.DataFrame, pd.Series, pd.DataFrame]

logistic_model(x_mu: ndarray[Any, dtype[float64]] = array([0., 0., 0.]), x_scale: ndarray[Any, dtype[float64]] = array([1., 1., 1.]), x_corr: ndarray[Any, dtype[float64]] = array([[1., 0.4, 0.], [0.4, 1., 0.], [0., 0., 1.]]), alpha: float = 0.5, beta: ndarray[Any, dtype[float64]] = array([1., 0.5, 0.]), size: int = 50, seed: Optional[int] = None, return_latents: bool = False) → Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], Optional[ndarray[Any, dtype[float64]]]]

Generate synthetic features, labels and latents.

Features are generated from a multivariate normal distribution, where the mean vector, scale vector and correlation matrix can be specified, allowing users to simulate covariate drift.

Labels are generated with a logistic regression model. The regression parameters are controlled with the beta parameter, allowing simulation of concept drift.

Latents are a single feature characterizing the Bernoulli probability generated by the model.

Parameters:

x_mu (NDArray[np.float64]) – Mean vector of features. Defaults to np.array([0.0, 0.0, 0.0]).
x_scale (NDArray[np.float64]) – Scale of features. Defaults to np.array([1.0, 1.0, 1.0]).
x_corr (NDArray[np.float64]) – Correlation matrix giving the correlation between features. Defaults to np.array([[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]).
alpha (float) – Regression alpha parameter. Defaults to 0.5.
beta (NDArray[np.float64]) – Regression beta parameters. Defaults to np.array([1.0, 0.5, 0.0]).
size (int) – Number of samples to draw from model. Defaults to 50.
seed (Optional[int]) – Random seed for reproducibly generating data. Defaults to None.
return_latents (bool) – Return underlying prediction value before thresholding as ‘latent’ data. Defaults to False.

Returns:

Tuple of features, labels and (optional) latents generated.

Return type:

Tuple[NDArray[np.float64], NDArray[np.float64], Optional[NDArray[np.float64]]]
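
The generative process described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package implementation: the covariance is assumed to be the correlation matrix scaled by the per-feature scales, labels are assumed to be Bernoulli draws from the logistic probability, and the names cholesky and sketch_logistic_model are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def cholesky(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sketch_logistic_model(
    x_mu: Sequence[float],
    x_scale: Sequence[float],
    x_corr: Sequence[Sequence[float]],
    alpha: float,
    beta: Sequence[float],
    size: int = 50,
    seed: Optional[int] = None,
) -> Tuple[List[List[float]], List[int], List[float]]:
    rng = random.Random(seed)
    n = len(x_mu)
    # Assumption: covariance = correlation scaled by the per-feature scales.
    cov = [[x_corr[i][j] * x_scale[i] * x_scale[j] for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    features, labels, latents = [], [], []
    for _ in range(size):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated normal sample: x = mu + L z.
        x = [x_mu[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        logit = alpha + sum(b * x_i for b, x_i in zip(beta, x))
        prob = 1.0 / (1.0 + math.exp(-logit))  # the "latent" probability
        features.append(x)
        latents.append(prob)
        labels.append(1 if rng.random() < prob else 0)  # assumed Bernoulli draw
    return features, labels, latents


corr = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]
features, labels, latents = sketch_logistic_model(
    [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], corr, alpha=0.5, beta=[1.0, 0.5, 0.0], size=20, seed=1
)
```

In this sketch, changing x_mu, x_scale or x_corr shifts the feature distribution (covariate drift), while changing beta alters the relationship between features and labels (concept drift).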

learning_machines_drift.display module

Class for displaying drift scores between reference and registered datasets.

class Display

Bases: object

A class for converting a dictionary of drift scores to displayed output.

classmethod plot(result: StructuredResult, score_name: Optional[str] = None, score_type: str = 'pvalue', alpha: float = 0.05) → Tuple[Figure, Any]

Plot method for displaying a set of scores on a subplot grid.

Parameters:

result (StructuredResult) – Structured result from a drift score measurement.
score_type (str) – Either “statistic” or “pvalue”.
score_name (Optional[str]) – Name of score to be plotted and used as plot title.
alpha (float) – Value of alpha to be used in p-value plots.

Returns:

tuple of fig and subplot array.

Return type:

Tuple[plt.Figure, Any]

classmethod table(result: StructuredResult, verbose: bool = True) → DataFrame

Gets a pandas dataframe and optionally prints a table of results from drift scoring.

Parameters:

result (StructuredResult) – Structured result from a drift score measurement.
verbose (bool) – Whether to print the table of results.

Returns:

Dataframe of scores.

Return type:

pd.DataFrame

learning_machines_drift.registry module

Module for registry handling storage and logging of datasets.

class Registry(tag: str, expect_features: bool = True, expect_labels: bool = True, expect_latent: bool = False, backend: Optional[Backend] = None, clear_logged: bool = False, clear_reference: bool = False)

Bases: object

Class for registry for logging datasets.

backend

Optional backend for data.

Type:: Optional[Backend]

tag

Tag identifying dataset.

Type:: str

ref_dataset

Optional reference dataset.

Type:: Optional[Dataset]

registered_features

Optional registered features.

Type:: Optional[pd.DataFrame]

registered_labels

Optional registered labels.

Type:: Optional[pd.Series]

registered_latents

Optional registered latents.

Type:: Optional[pd.Series]

expect_features

Whether features are expected in registry.

Type:: bool

expect_labels

Whether a labels series is expected in registry.

Type:: bool

expect_latent

Whether latents are expected in registry.

Type:: bool

all_registered() → bool

Checks whether all expected datasets are registered.

Returns:: True if all expected datasets are registered, False otherwise.
Return type:: bool

property identifier: UUID

Gets the identifier of the registry.

Returns:: The identifier.
Return type:: UUID

log_dataset(dataset: Dataset) → None

Logs a dataset in registered data.

Parameters:: dataset (Dataset) – New dataset to be logged.

log_features(features: DataFrame) → None

Logs dataset features in registered data.

Parameters:: features (pd.DataFrame) – Features dataframe to be registered.

log_labels(labels: Series) → None

Logs dataset labels in registered data.

Parameters:: labels (pd.Series) – Labels series to be registered.

log_latents(latent: DataFrame) → None

Logs dataset latents in registered data.

Parameters:: latent (pd.DataFrame) – Latents dataframe to be registered.

ref_summary() → BaselineSummary

Returns a JSON summary describing the shapes of the dataset features, labels and latents.

Returns:: Summary of the dataset shapes.
Return type:: BaselineSummary

register_ref_dataset(features: DataFrame, labels: Series, latents: Optional[DataFrame] = None) → None

Registers passed reference data.

Parameters:

features (pd.DataFrame) – Reference features to be stored.
labels (pd.Series) – Reference labels to be stored.
latents (Optional[pd.DataFrame]) – Reference latents to be stored.

property registered_dataset: Dataset

Gets the registered dataset.

Returns:: The registered dataset.
Return type:: Dataset

save_reference_dataset(dataset: Dataset) → None

Registers passed reference data.

Parameters:: dataset (Dataset) – Reference dataset to be stored.

learning_machines_drift.filter module

Module with class to filter a dataset.

class Comparison(value)

Bases: Enum

Comparison enum for ‘LESS’, ‘GREATER’ and ‘EQUAL’ cases.

EQUAL = 3

GREATER = 2

LESS = 1

class Condition(comparison_str: str, value: Any)

Bases: object

Condition class comprising a ‘comparison’ and a ‘value’.

comparison: Comparison

value: Any

class Filter(conditions: Optional[dict[str, List[learning_machines_drift.filter.Condition]]])

Bases: object

Filter class.

Filters a given dataset through an AND operation applied across all passed conditions.

conditions: Optional[dict[str, List[learning_machines_drift.filter.Condition]]]

Dict with key (variable) and value as a list of (condition, value) to be used for filtering.

Type:: dict[str, List[Condition]]

transform(dataset: Dataset) → Dataset

Transform the passed dataset given filter.

Parameters:: dataset (Dataset) – the dataset to be filtered.
Returns:: transformed dataset given filters.
Return type:: Dataset
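
The AND semantics described above can be illustrated with plain Python rows. This is a sketch only: the package operates on Dataset objects, whereas here rows are dictionaries, conditions are (comparison, value) pairs, and filter_rows is a hypothetical helper. The Comparison enum mirrors the values listed above.

```python
import operator
from enum import Enum
from typing import Any, Dict, List, Tuple


class Comparison(Enum):
    LESS = 1
    GREATER = 2
    EQUAL = 3


# Map each comparison case to the operator it stands for.
OPERATORS = {
    Comparison.LESS: operator.lt,
    Comparison.GREATER: operator.gt,
    Comparison.EQUAL: operator.eq,
}


def filter_rows(
    rows: List[Dict[str, Any]],
    conditions: Dict[str, List[Tuple[Comparison, Any]]],
) -> List[Dict[str, Any]]:
    """Keep only the rows that satisfy every condition on every variable."""
    return [
        row
        for row in rows
        if all(
            OPERATORS[comparison](row[variable], value)
            for variable, pairs in conditions.items()
            for comparison, value in pairs
        )
    ]


rows = [
    {"age": 25, "group": "a"},
    {"age": 40, "group": "a"},
    {"age": 40, "group": "b"},
]
conditions = {"age": [(Comparison.GREATER, 30)], "group": [(Comparison.EQUAL, "a")]}
kept = filter_rows(rows, conditions)
```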

learning_machines_drift.monitor module

Monitor class for interacting with data and scoring drift.

class Monitor(tag: str, backend: Optional[Backend] = None)

Bases: object

A class for monitoring data, with data loading from a backend and drift scoring with the Metrics class.

tag

The tag where data for monitoring is located within backend.

Type:: str

ref_dataset

The reference dataset.

Type:: Optional[Dataset]

registered_dataset

The logged, registered dataset for drift comparison to reference dataset.

Type:: Optional[Dataset]

load_data(drift_filter: Optional[Filter] = None) → Monitor

Load data from backend into monitor.

Parameters:

drift_filter (Filter, optional) – An optional filter with conditions applied to both reference and registered loaded data.

Returns:

The calling Monitor instance with (optionally) filtered datasets loaded.

Return type:

Monitor

property metrics: Metrics

Drift metrics.

Raises:

ReferenceDatasetMissing – The reference dataset is None.
ValueError – There is no additional registered data.

learning_machines_drift.exceptions module

Exceptions module.

exception ReferenceDatasetMissing

Bases: Exception

Raised when no reference dataset has been logged.

learning_machines_drift.metrics module

Class for scoring drift between reference and registered datasets.

class Metrics(reference_dataset: Dataset, registered_dataset: Dataset, random_state: Optional[int] = None)

Bases: object

A class with metrics for scoring data drift between registered and reference datasets.

reference_dataset

Reference dataset for drift measures.

Type:: Dataset

registered_dataset

Registered/logged dataset for drift measures.

Type:: Dataset

random_state

Optional seeding for reproducibility.

Type:: Optional[int]

get_boundary_adherence() → StructuredResult

For each feature the proportion of registered data that lies within the minimum and maximum of the reference dataset.

See SDMetrics for further details.

Returns:

The boundary adherence of the registered dataset compared to the reference dataset.

Return type:

StructuredResult
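
For a single numeric feature, the proportion described above can be sketched as follows. This is a plain-Python illustration of the definition, not the SDMetrics implementation, and boundary_adherence is a hypothetical helper name.

```python
from typing import Sequence


def boundary_adherence(reference: Sequence[float], registered: Sequence[float]) -> float:
    """Proportion of registered values inside [min(reference), max(reference)]."""
    low, high = min(reference), max(reference)
    inside = sum(1 for value in registered if low <= value <= high)
    return inside / len(registered)


# Two of the four registered values (1 and 2) lie within the reference range [0, 3].
score = boundary_adherence([0, 1, 2, 3], [1, 2, 5, -1])
```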

get_range_coverage() → StructuredResult

For each feature the proportion of the range of the registered data that is covered by the reference dataset.

See SDMetrics for further details.

Returns:

The range of the registered dataset compared to the reference dataset.

Return type:

StructuredResult

logistic_detection(normalize: bool = False, score_type: Optional[str] = None, seed: Optional[int] = None, verbose: bool = True) → StructuredResult

Calculates a measure of similarity using a fitted logistic regression to predict the reference or registered label. The SDMetrics package source is adapted to permit an optional score_type and seed to be given, allowing alternative and reproducible metrics.

score_type can be:

None: defaults to scoring of logistic_detection method.
“f1”: Cross-validated F1 score with 0.5 threshold.
“roc_auc”: Cross-validated receiver operating characteristic (area under the curve).

Parameters:

score_type (Optional[str]) – None for default or string; “f1” and “roc_auc” currently implemented.
seed (Optional[int]) – Optional integer for reproducibility of scoring as cross-validation performed.
verbose (bool) – Boolean for verbose output to stdout.

Returns:

Score providing an overall similarity measure of reference and registered datasets.

Return type:

StructuredResult

scipy_kolmogorov_smirnov(verbose: bool = True) → StructuredResult

Calculates feature-wise two-sample Kolmogorov-Smirnov test for goodness of fit. Assumes continuous underlying distributions but scores are still interpretable if data is approximately continuous.

Parameters:: verbose (bool) – Boolean for verbose output to stdout.
Returns:: Structured result of statistics and p-values by feature.
Return type:: StructuredResult
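
To make the test statistic concrete, the sketch below computes the two-sample Kolmogorov-Smirnov statistic for one feature as the largest gap between the two empirical distribution functions. It omits the p-value, which the method obtains from SciPy, and ks_statistic is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Maximum absolute difference between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    return max(
        abs(
            bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted)
        )
        # The supremum is attained at one of the observed data points.
        for x in a_sorted + b_sorted
    )


same = ks_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
```

Identical samples give a statistic of 0, while samples with no overlap give 1.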

scipy_mannwhitneyu(verbose: bool = True) → StructuredResult

Calculates feature-wise Mann-Whitney U test, a nonparametric test of the null hypothesis that the distribution underlying sample x is the same as the distribution underlying sample y. Provides a test for the difference in location of two distributions. Assumes continuous underlying distributions but scores are still interpretable if data is approximately continuous.

Parameters:: verbose (bool) – Boolean for verbose output to stdout.
Returns:: Structured result of statistics and p-values by feature.
Return type:: StructuredResult

scipy_permutation(agg_func: Callable[[...], float] = <function mean>, verbose: bool = True) → StructuredResult

Performs a feature-wise permutation test. By default, the statistic used to measure differences under permutations of labels is the mean.

Parameters:

agg_func (Callable[..., float]) – Function for comparing two samples.
verbose (bool) – Print outputs

Returns:

Dictionary with keys as features and values as scipy.stats.permutation_test object with test results.

Return type:

StructuredResult
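
The idea behind a permutation test can be sketched for a single feature with the standard library. This illustration assumes the statistic is the absolute difference of agg_func between the two samples; the method itself relies on scipy.stats.permutation_test, and permutation_pvalue is a hypothetical helper name.

```python
import random
from statistics import mean
from typing import Callable, Optional, Sequence


def permutation_pvalue(
    x: Sequence[float],
    y: Sequence[float],
    agg_func: Callable[[Sequence[float]], float] = mean,
    n_resamples: int = 999,
    seed: Optional[int] = None,
) -> float:
    """Estimate how often shuffled group labels give a difference at least as large."""
    rng = random.Random(seed)
    observed = abs(agg_func(x) - agg_func(y))
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group membership at random
        diff = abs(agg_func(pooled[: len(x)]) - agg_func(pooled[len(x):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


p_same = permutation_pvalue([1, 2, 3], [1, 2, 3], seed=0)
p_shifted = permutation_pvalue(list(range(10)), list(range(100, 110)), seed=0)
```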

class Wrapper(value)

Bases: Enum

Enum for specifying the calculation type.

TYPE_OTHER = 2

TYPE_SDMETRIC = 3

TYPE_TUPLE = 1

learning_machines_drift.types module

Module of drift types.

class BaselineSummary(*, shapes: ShapeSummary)

Bases: BaseModel

Class for storing a shape summary with JSON string representation.

shapes: ShapeSummary

A shape summary instance of a dataset.

Type:: ShapeSummary

class Dataset(features: DataFrame, labels: Series, latents: Optional[DataFrame] = None)

Bases: object

Class for representing a drift dataset.

property feature_names: List[str]

Returns a list of features dataframe columns.

Returns:: A list of feature column names as strings.
Return type:: List[str]

features: DataFrame

A combined dataframe of input features and ground truth labels.

Type:: pd.DataFrame

labels: Series

A series of predicted labels from a model.

Type:: pd.Series

latents: Optional[DataFrame] = None

An optional dataframe of latent variables per sample.

Type:: Optional[pd.DataFrame]

unify() → DataFrame

Returns a column-wise concatenated dataframe of features, labels and latents.

Returns:

Column-wise concatenated dataframe of features, labels and latents.

Return type:

pd.DataFrame

class FeatureSummary(*, n_rows: int, n_features: int)

Bases: BaseModel

Provides a summary of a features dataframe.

n_features: int

Number of features (columns).

Type:: int

n_rows: int

Number of samples (rows).

Type:: int

class LabelSummary(*, n_rows: int, n_labels: int)

Bases: BaseModel

Provides a summary of a labels series.

n_labels: int

Number of distinct labels. For example, for binary data, this would be equal to 2.

Type:: int

n_rows: int

Number of samples (rows).

Type:: int

class LatentSummary(*, n_rows: int, n_latents: int)

Bases: BaseModel

Provides a summary of a latents dataframe.

n_latents: int

Number of latent features (columns).

Type:: int

n_rows: int

Number of samples (rows).

Type:: int

class ShapeSummary(*, features: FeatureSummary, labels: LabelSummary, latents: Optional[LatentSummary] = None)

Bases: BaseModel

Provides a summary of the object shapes in a dataset of features, labels and latents.

features: FeatureSummary

Features shape summary.

Type:: FeatureSummary

labels: LabelSummary

Labels shape summary.

Type:: LabelSummary

latents: Optional[LatentSummary]

Optional latents shape summary.

Type:: Optional[LatentSummary]

class StructuredResult(method_name: str, results: Dict[str, Dict[str, float]])

Bases: object

A type for representing a result from the hypothesis tests module.

method_name: str

Name of the scoring method used.

Type:: str

results: Dict[str, Dict[str, float]]

Dictionary of results with keys as feature_name or, if for a unified dataset, “single_value”. Values are a dictionary containing the result statistic and p-value (if available) for a given method_name.

Type:: Dict[str, Dict[str, float]]
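
The nested dictionary layout can be illustrated with a small stand-in dataclass. The inner keys "statistic" and "pvalue" are assumed from the score types named for Display.plot, the numbers are made up for illustration, and ResultSketch and flagged_features are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResultSketch:
    """Stand-in mirroring the two documented fields of StructuredResult."""

    method_name: str
    results: Dict[str, Dict[str, float]]


def flagged_features(result: ResultSketch, alpha: float = 0.05) -> List[str]:
    """Names of features whose p-value falls below alpha."""
    return [
        name
        for name, scores in result.results.items()
        if scores.get("pvalue", 1.0) < alpha  # entries without a p-value are skipped
    ]


# Made-up values, for illustration only.
example = ResultSketch(
    method_name="scipy_kolmogorov_smirnov",
    results={
        "age": {"statistic": 0.31, "pvalue": 0.002},
        "income": {"statistic": 0.05, "pvalue": 0.64},
    },
)
flagged = flagged_features(example)
```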

Module contents

Tools for measuring data drift.

class Dataset(features: DataFrame, labels: Series, latents: Optional[DataFrame] = None)

Bases: object

Class for representing a drift dataset.

property feature_names: List[str]

Returns a list of features dataframe columns.

Returns:: A list of feature column names as strings.
Return type:: List[str]

features: DataFrame

A combined dataframe of input features and ground truth labels.

Type:: pd.DataFrame

labels: Series

A series of predicted labels from a model.

Type:: pd.Series

latents: Optional[DataFrame] = None

An optional dataframe of latent variables per sample.

Type:: Optional[pd.DataFrame]

unify() → DataFrame

Returns a column-wise concatenated dataframe of features, labels and latents.

Returns:

Column-wise concatenated dataframe of features, labels and latents.

Return type:

pd.DataFrame

class Display

Bases: object

A class for converting a dictionary of drift scores to displayed output.

classmethod plot(result: StructuredResult, score_name: Optional[str] = None, score_type: str = 'pvalue', alpha: float = 0.05) → Tuple[Figure, Any]

Plot method for displaying a set of scores on a subplot grid.

Parameters:

result (StructuredResult) – Structured result from a drift score measurement.
score_type (str) – Either “statistic” or “pvalue”.
score_name (Optional[str]) – Name of score to be plotted and used as plot title.
alpha (float) – Value of alpha to be used in p-value plots.

Returns:

tuple of fig and subplot array.

Return type:

Tuple[plt.Figure, Any]

classmethod table(result: StructuredResult, verbose: bool = True) → DataFrame

Gets a pandas dataframe and optionally prints a table of results from drift scoring.

Parameters:

result (StructuredResult) – Structured result from a drift score measurement.
verbose (bool) – Whether to print the table of results.

Returns:

Dataframe of scores.

Return type:

pd.DataFrame

class FileBackend(root_dir: Union[str, Path])

Bases: object

Implements the Backend protocol for writing files to the filesystem.

clear_logged_dataset(tag: str) → bool

Delete directory containing logged files.

Parameters:: tag (str) – Path to logged directory.
Returns:: True if the tag/logged path exists, False if it does not.
Return type:: bool

clear_reference_dataset(tag: str) → bool

Delete directory containing reference files.

Parameters:: tag (str) – Path to reference directory.

load_logged_dataset(tag: str) → Dataset

Return a Dataset from the union of logged data.

Parameters:: tag (str) – Tag identifying dataset.

load_reference_dataset(tag: str) → Dataset

Load reference dataset from reference path.

Parameters:: tag (str) – Tag identifying dataset.

save_logged_features(tag: str, identifier: UUID, dataframe: DataFrame) → None

Save logged features using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the logged dataset.
dataframe (pd.DataFrame) – The dataframe that needs saving.

save_logged_labels(tag: str, identifier: UUID, labels: Series) → None

Save logged labels using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the labels of the dataset.
labels (pd.Series) – The labels series to be saved.

save_logged_latents(tag: str, identifier: UUID, dataframe: Optional[DataFrame]) → None

Save optionally passed latents dataframe using tag as the path with UUID prepended to filename.

Parameters:

tag (str) – Tag identifying dataset.
identifier (UUID) – A unique identifier for the latents of the dataset.
dataframe (Optional[pd.DataFrame]) – The dataframe of latents to be saved.

save_reference_dataset(tag: str, dataset: Dataset) → None

Saves passed dataset to backend under tag.

Parameters:

tag (str) – A tag for locating the dataset within the backend.
dataset (Dataset) – Reference dataset to be saved.

class Filter(conditions: Optional[dict[str, List[learning_machines_drift.filter.Condition]]])

Bases: object

Filter class.

Filters a given dataset through an AND operation applied across all passed conditions.

conditions: Optional[dict[str, List[learning_machines_drift.filter.Condition]]]

Dict with key (variable) and value as a list of (condition, value) to be used for filtering.

Type:: dict[str, List[Condition]]

transform(dataset: Dataset) → Dataset

Transform the passed dataset given filter.

Parameters:: dataset (Dataset) – the dataset to be filtered.
Returns:: transformed dataset given filters.
Return type:: Dataset

class Monitor(tag: str, backend: Optional[Backend] = None)

Bases: object

A class for monitoring data, with data loading from a backend and drift scoring with the Metrics class.

tag

The tag where data for monitoring is located within backend.

Type:: str

ref_dataset

The reference dataset.

Type:: Optional[Dataset]

registered_dataset

The logged, registered dataset for drift comparison to reference dataset.

Type:: Optional[Dataset]

load_data(drift_filter: Optional[Filter] = None) → Monitor

Load data from backend into monitor.

Parameters:

drift_filter (Filter, optional) – An optional filter with conditions applied to both reference and registered loaded data.

Returns:

The calling Monitor instance with (optionally) filtered datasets loaded.

Return type:

Monitor

property metrics: Metrics

Drift metrics.

Raises:

ReferenceDatasetMissing – The reference dataset is None.
ValueError – There is no additional registered data.

class Registry(tag: str, expect_features: bool = True, expect_labels: bool = True, expect_latent: bool = False, backend: Optional[Backend] = None, clear_logged: bool = False, clear_reference: bool = False)

Bases: object

Class for registry for logging datasets.

backend

Optional backend for data.

Type:: Optional[Backend]

tag

Tag identifying dataset.

Type:: str

ref_dataset

Optional reference dataset.

Type:: Optional[Dataset]

registered_features

Optional registered features.

Type:: Optional[pd.DataFrame]

registered_labels

Optional registered labels.

Type:: Optional[pd.Series]

registered_latents

Optional registered latents.

Type:: Optional[pd.Series]

expect_features

Whether features are expected in registry.

Type:: bool

expect_labels

Whether a labels series is expected in registry.

Type:: bool

expect_latent

Whether latents are expected in registry.

Type:: bool

all_registered() → bool

Checks whether all expected datasets are registered.

Returns:: True if all expected datasets are registered, False otherwise.
Return type:: bool

property identifier: UUID

Gets the identifier of the registry.

Returns:: The identifier.
Return type:: UUID

log_dataset(dataset: Dataset) → None

Logs a dataset in registered data.

Parameters:: dataset (Dataset) – New dataset to be logged.

log_features(features: DataFrame) → None

Logs dataset features in registered data.

Parameters:: features (pd.DataFrame) – Features dataframe to be registered.

log_labels(labels: Series) → None

Logs dataset labels in registered data.

Parameters:: labels (pd.Series) – Labels series to be registered.

log_latents(latent: DataFrame) → None

Logs dataset latents in registered data.

Parameters:: latent (pd.DataFrame) – Latents dataframe to be registered.

ref_summary() → BaselineSummary

Returns a JSON summary describing the shapes of the dataset features, labels and latents.

Returns:: Summary of the dataset shapes.
Return type:: BaselineSummary

register_ref_dataset(features: DataFrame, labels: Series, latents: Optional[DataFrame] = None) → None

Registers passed reference data.

Parameters:

features (pd.DataFrame) – Reference features to be stored.
labels (pd.Series) – Reference labels to be stored.
latents (Optional[pd.DataFrame]) – Reference latents to be stored.

property registered_dataset: Dataset

Gets the registered dataset.

Returns:: The registered dataset.
Return type:: Dataset

save_reference_dataset(dataset: Dataset) → None

Registers passed reference data.

Parameters:: dataset (Dataset) – Reference dataset to be stored.

class StructuredResult(method_name: str, results: Dict[str, Dict[str, float]])

Bases: object

A type for representing a result from the hypothesis tests module.

method_name: str

Name of the scoring method used.

Type:: str

results: Dict[str, Dict[str, float]]

Dictionary of results with keys as feature_name or, if for a unified dataset, “single_value”. Values are a dictionary containing the result statistic and p-value (if available) for a given method_name.

Type:: Dict[str, Dict[str, float]]