API Reference

saiph

saiph.fit(df: DataFrame, nf: int | None = None, col_weights: Dict[str, int | float] | None = None, sparse: bool = False) Model

Fit a PCA, MCA or FAMD model on data, inferring which of the three to use from the data.

Datetimes must be stored as numbers of seconds since epoch.

Parameters:
  • df – Data to project.

  • nf – Number of components to keep. default: None, which uses all columns.

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns:

The model for transforming new data.

Return type:

Model
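
The requirement above that datetimes be stored as numbers of seconds since epoch can be met with a small conversion step before the DataFrame is built. The following is a minimal sketch using only the standard library; the helper name `to_epoch_seconds`, the sample values, and the choice to treat naive datetimes as UTC are assumptions made for illustration, not part of saiph.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt: datetime) -> float:
    """Return the number of seconds between the Unix epoch and `dt`."""
    # Assumption: naive datetimes are interpreted as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


# Hypothetical datetime column, converted before it is placed in the DataFrame.
signup_dates = [
    datetime(1970, 1, 2, tzinfo=timezone.utc),
    datetime(2000, 1, 1, tzinfo=timezone.utc),
]
signup_seconds = [to_epoch_seconds(d) for d in signup_dates]
```

The resulting numeric column can then be handled like any other continuous variable.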

saiph.fit_transform(df: DataFrame, nf: int | None = None, col_weights: Dict[str, int | float] | None = None) Tuple[DataFrame, Model]

Fit a PCA, MCA or FAMD model on data, inferring which of the three to use from the data, and return the transformed data.

Datetimes must be stored as numbers of seconds since epoch.

Parameters:
  • df – Data to project.

  • nf – Number of components to keep. default: None, which uses all columns.

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns:

The transformed data and the model for transforming new data.

Return type:

Tuple[DataFrame, Model]

saiph.inverse_transform(coord: DataFrame, model: Model, *, use_approximate_inverse: bool = False, use_max_modalities: bool = True, seed: int | None = None) DataFrame

Return original format dataframe from coordinates.

Parameters:
  • coord – coordinates of the individuals to reverse transform

  • model – model used for projection

  • use_approximate_inverse – the matrix is not invertible when n_individuals < n_dimensions; setting this to True computes a biased approximation instead. default: False

  • use_max_modalities – for each variable, it assigns to the individual the modality with the highest proportion (True) or a random modality weighted by their proportion (False). default: True

  • seed – seed to fix randomness if use_max_modalities = False. default: None

Returns:

coordinates transformed into original space.

Retains shape, encoding and structure.

Return type:

DataFrame
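
The two behaviours selected by use_max_modalities can be illustrated on a single categorical variable. This is a sketch of the idea only, not saiph's implementation: the function `pick_modality` and the proportions dictionary are hypothetical, and the random branch uses the standard library's `random.Random` rather than whatever generator the library uses internally.

```python
import random
from typing import Dict, Optional


def pick_modality(
    proportions: Dict[str, float],
    use_max_modalities: bool = True,
    seed: Optional[int] = None,
) -> str:
    """Choose one modality of a variable from its reconstructed proportions."""
    if use_max_modalities:
        # Deterministic: keep the modality with the highest proportion.
        return max(proportions, key=lambda name: proportions[name])
    # Random: draw one modality, weighted by its proportion.
    rng = random.Random(seed)
    names = list(proportions)
    weights = [proportions[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical proportions for one individual and one variable.
color_proportions = {"red": 0.2, "green": 0.7, "blue": 0.1}
most_likely = pick_modality(color_proportions)
sampled = pick_modality(color_proportions, use_max_modalities=False, seed=0)
```

Fixing the seed makes the random draw repeatable, which is the role of the seed parameter described above.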

saiph.stats(model: Model, df: DataFrame, explode: bool = False) Model

Compute the contributions and cos2.

Parameters:
  • model – Model computed by fit.

  • df – original dataframe

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False). Only valid for categorical variables.

Returns:

model populated with contributions and cos2.

Return type:

Model
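
The effect of explode can be pictured as the difference between reporting one contribution per modality and one per variable. The sketch below assumes contributions are available per (variable, modality) pair; the function name, the "variable=modality" label format, and the numbers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def summarize_contributions(
    per_modality: Dict[Tuple[str, str], float], explode: bool = False
) -> Dict[str, float]:
    """Report contributions per modality (explode=True) or per variable."""
    if explode:
        # Keep one entry per modality, labelled "variable=modality" (hypothetical format).
        return {f"{var}={mod}": value for (var, mod), value in per_modality.items()}
    # Sum the modalities of each variable into a single contribution.
    totals: Dict[str, float] = defaultdict(float)
    for (var, _mod), value in per_modality.items():
        totals[var] += value
    return dict(totals)


# Hypothetical contributions (in percent) on one axis.
contributions = {
    ("color", "red"): 10.0,
    ("color", "blue"): 15.0,
    ("size", "small"): 5.0,
}
per_variable = summarize_contributions(contributions)
per_modality = summarize_contributions(contributions, explode=True)
```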

saiph.transform(df: DataFrame, model: Model, *, sparse: bool = False) DataFrame

Scale and project into the fitted numerical space.

Parameters:
  • df – DataFrame to transform.

  • model – Model computed by fit.

Returns:

Coordinates of the dataframe in the fitted space.

Return type:

DataFrame

saiph.models

class saiph.models.Model(dummy_categorical: List[str], original_dtypes: pandas.core.series.Series, original_categorical: List[str], original_continuous: List[str], nf: int, column_weights: numpy.ndarray[Any, numpy.dtype[numpy.float64]], row_weights: numpy.ndarray[Any, numpy.dtype[numpy.float64]], explained_var: numpy.ndarray[Any, numpy.dtype[numpy.float64]], explained_var_ratio: numpy.ndarray[Any, numpy.dtype[numpy.float64]], variable_coord: pandas.core.frame.DataFrame, V: numpy.ndarray[Any, numpy.dtype[numpy.float64]], modalities_types: Dict[str, str], U: numpy.ndarray[Any, numpy.dtype[numpy.float64]], s: numpy.ndarray[Any, numpy.dtype[numpy.float64]] | None = None, mean: pandas.core.series.Series | None = None, std: pandas.core.series.Series | None = None, prop: pandas.core.series.Series | None = None, _modalities: numpy.ndarray[Any, numpy.dtype[numpy.bytes_]] | None = None, D_c: numpy.ndarray[Any, numpy.dtype[numpy.float64]] | None = None, type: str | None = None, is_fitted: bool = False, correlations: pandas.core.frame.DataFrame | None = None, contributions: pandas.core.frame.DataFrame | None = None, cos2: pandas.core.frame.DataFrame | None = None, dummies_col_prop: numpy.ndarray[Any, numpy.dtype[numpy.float64]] | None = None)

Bases: object

D_c: ndarray[Any, dtype[float64]] | None = None
U: ndarray[Any, dtype[float64]]
V: ndarray[Any, dtype[float64]]
column_weights: ndarray[Any, dtype[float64]]
contributions: DataFrame | None = None
correlations: DataFrame | None = None
cos2: DataFrame | None = None
dummies_col_prop: ndarray[Any, dtype[float64]] | None = None
dummy_categorical: List[str]
explained_var: ndarray[Any, dtype[float64]]
explained_var_ratio: ndarray[Any, dtype[float64]]
is_fitted: bool = False
mean: Series | None = None
modalities_types: Dict[str, str]
nf: int
original_categorical: List[str]
original_continuous: List[str]
original_dtypes: Series
prop: Series | None = None
row_weights: ndarray[Any, dtype[float64]]
s: ndarray[Any, dtype[float64]] | None = None
std: Series | None = None
type: str | None = None
variable_coord: DataFrame

saiph.famd

FAMD projection module.

saiph.reduction.famd.center(df: DataFrame, quanti: List[str], quali: List[str]) Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Center data, scale it, compute modalities and proportions of each categorical.

Used as internal function during fit.

NB: saiph.reduction.famd.scaler is better suited when a Model is already fitted.

Parameters:
  • df – DataFrame to center.

  • quanti – Indices of continuous variables.

  • quali – Indices of categorical variables.

Returns:

  • df_scale – The scaled DataFrame.

  • mean – Mean of the input dataframe.

  • std – Standard deviation of the input dataframe.

  • prop – Proportion of each categorical.

  • _modalities – Modalities for the MCA.
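
For the categorical side of this step, the modalities and their proportions can be computed by simple counting. The sketch below works on a plain list of values for one column; the function name and the sorted ordering of modalities are assumptions for illustration, and the library's own scaling of the dummy columns is not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def modalities_and_proportions(values: List[str]) -> Tuple[List[str], Dict[str, float]]:
    """Return the distinct modalities of a column and the share of rows in each."""
    counts = Counter(values)
    # Assumption: modalities are listed in sorted order.
    modalities = sorted(counts)
    proportions = {m: counts[m] / len(values) for m in modalities}
    return modalities, proportions


# Hypothetical categorical column.
tool = ["hammer", "hammer", "saw", "wrench"]
modalities, prop = modalities_and_proportions(tool)
```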

saiph.reduction.famd.compute_categorical_cos2(model: Model, df: DataFrame, min_nf: int) DataFrame

Compute the cos2 statistic for categorical variables.

Parameters:
  • model – model

  • df – dataframe

  • min_nf – number of degrees of freedom

Returns:

dataframe of categorical cos2

Return type:

DataFrame

saiph.reduction.famd.compute_continuous_cos2(model: Model, scaled_df: DataFrame, min_nf: int, s: ndarray[Any, dtype[float64]], U: ndarray[Any, dtype[float64]]) DataFrame

saiph.reduction.famd.fit(df: DataFrame, nf: int | None = None, col_weights: ndarray[Any, dtype[float64]] | None = None, center: Callable[[DataFrame, List[str], List[str]], Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]] = <function center>, seed: int | None = None) Model

Fit a FAMD model on data.

Parameters:
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns:

The model for transforming new data.

Return type:

Model

saiph.reduction.famd.fit_transform(df: DataFrame, nf: int | None = None, col_weights: ndarray[Any, dtype[float64]] | None = None, seed: int | None = None) Tuple[DataFrame, Model]

Fit a FAMD model on data and return transformed data.

Parameters:
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns:

  • coord – The transformed data.

  • model – The model for transforming new data.

saiph.reduction.famd.get_variable_contributions(model: Model, df: DataFrame, explode: bool = False) Tuple[DataFrame, DataFrame]

Compute the contributions of the df variables within the fitted space.

Parameters:
  • model – Model computed by fit.

  • df – dataframe to compute contributions from

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns:

tuple of contributions and cos2.

saiph.reduction.famd.scaler(model: Model, df: DataFrame) DataFrame

Scale data using mean, std, modalities and proportions of each categorical from model.

Parameters:
  • model – Model computed by fit.

  • df – DataFrame to scale.

Returns:

The scaled DataFrame.

Return type:

DataFrame

saiph.reduction.famd.stats(model: Model, df: DataFrame, explode: bool = False) Model

Compute contributions and cos2.

Parameters:
  • model – Model computed by fit.

  • df – dataframe to compute statistics from

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns:

model populated with contribution and cos2.

Return type:

Model

saiph.reduction.famd.transform(df: DataFrame, model: Model, *, scaler: Callable[[Model, DataFrame], DataFrame] = <function scaler>) DataFrame

Scale and project into the fitted numerical space.

Parameters:
  • df – DataFrame to transform.

  • model – Model computed by fit.

Returns:

Coordinates of the dataframe in the fitted space.

Return type:

DataFrame

saiph.mca

MCA projection module.

saiph.reduction.mca.center(df: DataFrame) Tuple[DataFrame, ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Center data and compute modalities.

Used as internal function during fit.

NB: saiph.reduction.mca.scaler is better suited when a Model is already fitted.

Parameters:

df – DataFrame to center.

Returns:

  • df_centered – The centered DataFrame.

  • _modalities – Modalities for the MCA.

  • row_sum – Sums line by line.

  • column_sum – Sums column by column.

saiph.reduction.mca.fit(df: DataFrame, nf: int | None = None, col_weights: ndarray[Any, dtype[float64]] | None = None, seed: int | None = None) Model

Fit a MCA model on data.

Parameters:
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns:

The model for transforming new data.

Return type:

Model

saiph.reduction.mca.fit_transform(df: DataFrame, nf: int | None = None, col_weights: ndarray[Any, dtype[float64]] | None = None, seed: int | None = None) Tuple[DataFrame, Model]

Fit a MCA model on data and return transformed data.

Parameters:
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns:

  • coord – The transformed data.

  • model – The model for transforming new data.

saiph.reduction.mca.get_variable_contributions(model: Model, df: DataFrame, explode: bool = False) DataFrame

Compute the contributions of the df variables within the fitted space.

Parameters:
  • model – Model computed by fit.

  • df – dataframe to compute contributions from

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns:

contributions

saiph.reduction.mca.scaler(model: Model, df: DataFrame) DataFrame

Scale data using modalities from model.

Parameters:
  • model – Model computed by fit.

  • df – DataFrame to scale.

Returns:

The scaled DataFrame.

Return type:

DataFrame

saiph.reduction.mca.stats(model: Model, df: DataFrame, explode: bool = False) Model

Compute the contributions.

Parameters:
  • model – Model computed by fit.

  • df – dataframe to compute contributions from in the original space

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns:

model populated with contributions.

saiph.reduction.mca.transform(df: DataFrame, model: Model) DataFrame

Scale and project into the fitted numerical space.

Parameters:
  • df – DataFrame to transform.

  • model – Model computed by fit.

Returns:

Coordinates of the dataframe in the fitted space.

Return type:

DataFrame

saiph.pca

PCA projection module.

saiph.reduction.pca.center(df: DataFrame) Tuple[DataFrame, Series, Series]

Center data and standardize it. Compute mean and std values.

Used as internal function during fit.

NB: saiph.reduction.pca.scaler is better suited when a Model is already fitted.

Parameters:

df – DataFrame to center.

Returns:

  • df – The centered DataFrame.

  • mean – Mean of the input dataframe.

  • std – Standard deviation of the input dataframe.

saiph.reduction.pca.fit(df: DataFrame, nf: int | None = None, col_weights: ndarray[Any, dtype[float64]] | None = None, seed: int | None = None) Model

Fit a PCA model on data.

Parameters:
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns:

The model for transforming new data.

Return type:

Model

saiph.reduction.pca.fit_transform(df: DataFrame, nf: int | None = None, col_weights: ndarray[Any, dtype[float64]] | None = None, seed: int | None = None) Tuple[DataFrame, Model]

Fit a PCA model on data and return transformed data.

Parameters:
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns:

  • coord – The transformed data.

  • model – The model for transforming new data.

saiph.reduction.pca.scaler(model: Model, df: DataFrame) DataFrame

Scale data using mean and std from model.

Parameters:
  • model – Model computed by fit.

  • df – DataFrame to scale.

Returns:

The scaled DataFrame.

Return type:

DataFrame
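
The distinction between center and scaler comes down to where the mean and standard deviation come from: center computes them from the data it is given, while scaler reuses the values stored at fit time so that new rows land in the same space. A minimal single-column sketch follows; the function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def center_column(train: List[float]) -> Tuple[List[float], float, float]:
    """Standardize a column and return the statistics that were used."""
    mean = fmean(train)
    std = pstdev(train)  # Assumption: population standard deviation.
    return [(x - mean) / std for x in train], mean, std


def scale_with_fitted(mean: float, std: float, new: List[float]) -> List[float]:
    """Standardize new values with statistics computed on the training data."""
    return [(x - mean) / std for x in new]


train_scaled, fitted_mean, fitted_std = center_column([0.0, 10.0])
new_scaled = scale_with_fitted(fitted_mean, fitted_std, [20.0])
```

Recomputing the statistics on the new rows instead would shift and rescale them differently from the training data, which is why the fitted values are reused.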

saiph.reduction.pca.transform(df: DataFrame, model: Model) DataFrame

Scale and project into the fitted numerical space.

Parameters:
  • df – DataFrame to transform.

  • model – Model computed by fit.

Returns:

Coordinates of the dataframe in the fitted space.

Return type:

DataFrame

saiph.svd

saiph.reduction.utils.svd.get_direct_randomized_svd(A: ndarray[Any, dtype[float64]], l_retained_dimensions: int, q: int = 2, seed: int | None = None) Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]]

Compute a fixed-rank SVD approximation using random methods.

The computation of the randomized SVD is generally faster than a regular SVD when we retain a smaller number of dimensions than the dimension of the matrix.

From https://arxiv.org/abs/0909.4061, algorithm 5.1 page 29 (Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. Halko, Nathan and Martinsson, Per-Gunnar and Tropp, Joel A.)

Parameters:
  • A – input matrix, shape (m, n)

  • l_retained_dimensions – target number of retained dimensions, l<min(m,n)

  • q – exponent of the power method. The higher this exponent, the more precise the SVD will be, but the more complex it is to compute. Default 2

  • seed – random seed. Default None

Returns:

  • U – unitary matrix having left singular vectors as columns, shape (m,l)

  • S – vector of the singular values, shape (l,)

  • Vt – unitary matrix having right singular vectors as rows, shape (l,n)

saiph.reduction.utils.svd.get_randomized_subspace_iteration(A: ndarray[Any, dtype[float64]], l_retained_dimensions: int, *, q: int = 2, seed: int | None = None) ndarray[Any, dtype[float64]]

Generate a subspace for more efficient SVD computation using random methods.

From https://arxiv.org/abs/0909.4061, algorithm 4.4 page 27 (Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. Halko, Nathan and Martinsson, Per-Gunnar and Tropp, Joel A.)

Parameters:
  • A – input matrix, shape (m, n)

  • l_retained_dimensions – target number of retained dimensions, l<min(m,n)

  • q – exponent of the power method. The higher this exponent, the more precise the SVD will be, but the more complex it is to compute. Default 2

  • seed – random seed. Default None

Returns:

Q – matrix whose range approximates the range of A, shape (m, l)

Return type:

ndarray[Any, dtype[float64]]
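
The two stages described in the cited paper (a randomized range finder with power iterations, then an exact SVD of the small projected matrix) can be sketched with NumPy as follows. This is an illustration written from the paper's description, not saiph's code: the function names are hypothetical and details such as the choice of test matrix are assumptions.

```python
import numpy as np


def randomized_subspace_iteration(A, l, q=2, seed=None):
    """Return Q with l orthonormal columns whose range approximates that of A."""
    rng = np.random.default_rng(seed)
    # Gaussian test matrix, shape (n, l).
    omega = rng.standard_normal((A.shape[1], l))
    Q, _ = np.linalg.qr(A @ omega)
    for _iteration in range(q):
        # Power iteration, re-orthonormalized at each half step for stability.
        Q_tilde, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q_tilde)
    return Q


def direct_randomized_svd(A, l, q=2, seed=None):
    """Approximate rank-l SVD: project A on range(Q), then decompose exactly."""
    Q = randomized_subspace_iteration(A, l, q=q, seed=seed)
    B = Q.T @ A  # Small matrix, shape (l, n).
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_small, S, Vt


# Hypothetical rank-3 matrix, shape (50, 20).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))
U, S, Vt = direct_randomized_svd(A, 3, seed=1)
```

Only the l-row matrix B goes through a full SVD, which is where the speed-up over a regular SVD comes from when l is much smaller than min(m, n).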

saiph.reduction.utils.svd.get_svd(df: DataFrame, nf: int | None = None, *, svd_flip: bool = True, seed: int | None = None) Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]]

Compute Singular Value Decomposition.

Parameters:
  • df – Matrix to decompose, shape (m, n)

  • nf – target number of dimensions to retain (number of singular values). It keeps the nf largest singular values and the nf associated singular vectors. Default None.

  • svd_flip – Whether to use svd_flip on U and V or not. Default True

  • seed – random seed. Default None

Returns:

  • U – unitary matrix having left singular vectors as columns, shape (m,l)

  • S – vector of the singular values, shape (l,)

  • Vt – unitary matrix having right singular vectors as rows, shape (l,n)
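
Singular vectors are only defined up to sign, so a sign convention is needed to make results reproducible across runs and solvers. The sketch below shows truncation to nf components followed by one common convention, in which each column of U is flipped so that its largest-magnitude entry is positive. That this matches what the svd_flip option does in saiph is an assumption, and the function name is hypothetical.

```python
import numpy as np


def truncated_svd_with_flip(A, nf=None, flip=True):
    """Exact SVD, truncated to nf components, with a deterministic sign choice."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    if nf is not None:
        # Singular values come back in descending order, so slicing keeps the largest.
        U, S, Vt = U[:, :nf], S[:nf], Vt[:nf, :]
    if flip:
        # Sign of the largest-magnitude entry in each column of U.
        max_rows = np.argmax(np.abs(U), axis=0)
        signs = np.sign(U[max_rows, np.arange(U.shape[1])])
        signs[signs == 0] = 1.0
        # Flip U columns and Vt rows together so the product is unchanged.
        U = U * signs
        Vt = Vt * signs[:, np.newaxis]
    return U, S, Vt


# Hypothetical input matrix, shape (6, 4).
A = np.random.default_rng(0).standard_normal((6, 4))
U2, S2, Vt2 = truncated_svd_with_flip(A, nf=2)
U_full, S_full, Vt_full = truncated_svd_with_flip(A)
```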

saiph.visualization

Visualization functions.

saiph.visualization.plot_circle(model: Model, dimensions: List[int] | None = None, min_cor: float = 0.1, max_var: int = 7) None

Plot correlation circle.

Parameters:
  • model – The model for transforming new data.

  • dimensions – Dimensions to plot on each axis

  • min_cor – Minimum correlation threshold to display arrow. default: 0.1

  • max_var – Number of variables to display (in descending order). default: 7

saiph.visualization.plot_explained_var(model: Model, max_dims: int = 10, cumulative: bool = False) None

Plot explained variance per dimension.

Parameters:
  • model – Model computed by fit.

  • max_dims – Maximum number of dimensions to plot
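
The values behind such a plot can be prepared from the model's explained_var_ratio attribute. The sketch below shows how max_dims and cumulative would plausibly shape the plotted series; it operates on a plain list, the function name is hypothetical, and it is an assumption that the library computes the series this way.

```python
from itertools import accumulate
from typing import List, Sequence


def explained_variance_series(
    explained_var_ratio: Sequence[float], max_dims: int = 10, cumulative: bool = False
) -> List[float]:
    """Return the per-dimension (or running total) ratios for the first max_dims."""
    kept = list(explained_var_ratio[:max_dims])
    return list(accumulate(kept)) if cumulative else kept


# Hypothetical explained variance ratios for four dimensions.
ratios = [0.5, 0.25, 0.125, 0.125]
per_dimension = explained_variance_series(ratios, max_dims=3)
running_total = explained_variance_series(ratios, max_dims=3, cumulative=True)
```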

saiph.visualization.plot_projections(model: Model, data: DataFrame, dim: Tuple[int, int] = (0, 1)) None

Plot projections in reduced space for input data.

Parameters:
  • model – Model computed by fit.

  • data – Data to plot in the reduced space

  • dim – Axes to use for the 2D plot (default (0,1))

saiph.visualization.plot_var_contribution(values: ndarray[Any, dtype[float64]], names: ndarray[Any, dtype[bytes_]], title: str = 'Variables contributions') None

Plot the variable contributions for a given dimension.