API Reference

saiph

saiph.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[Dict[str, Union[int, float]]] = None, sparse: bool = False) → Model

Fit a PCA, MCA or FAMD model on data, automatically choosing which one to use from the column types.

Datetimes must be stored as numbers of seconds since epoch.

Parameters

df – Data to project.
nf – Number of components to keep. default: None, which uses all columns.
col_weights – Weight assigned to each variable, keyed by column name (more weight = more importance in the axes). default: None, which gives every variable a weight of 1.

Returns

The model for transforming new data.

Return type

Model
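Because datetimes must be passed as seconds since epoch, they need converting before fitting. The sketch below shows one way to do that with pandas. It assumes naive (timezone-less) timestamps that are already in UTC, and the column names `signup` and `amount` are hypothetical. The converted frame is what would then be handed to `saiph.fit`.

```python
import pandas as pd

# Hypothetical input with a datetime column and a continuous column.
df = pd.DataFrame(
    {
        "signup": pd.to_datetime(["1970-01-01 00:01:00", "2000-01-01 00:00:00"]),
        "amount": [10.0, 20.0],
    }
)

# Replace the datetimes by whole seconds since the Unix epoch.
# Naive timestamps are assumed to already be in UTC.
epoch = pd.Timestamp("1970-01-01")
df["signup"] = (df["signup"] - epoch) // pd.Timedelta(seconds=1)

print(df["signup"].tolist())  # [60, 946684800]
```

Floor division by a one-second `Timedelta` yields integer seconds; sub-second precision, if any, is dropped.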

saiph.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[Dict[str, Union[int, float]]] = None) → Tuple[DataFrame, Model]

Fit a PCA, MCA or FAMD model on data, automatically choosing which one to use from the column types.

Datetimes must be stored as numbers of seconds since epoch.

Parameters

df – Data to project.
nf – Number of components to keep. default: None, which uses all columns.
col_weights – Weight assigned to each variable, keyed by column name (more weight = more importance in the axes). default: None, which gives every variable a weight of 1.

Returns

coord – The transformed data.
model – The model for transforming new data.

Return type

Tuple[DataFrame, Model]

saiph.inverse_transform(coord: DataFrame, model: Model, *, use_approximate_inverse: bool = False, use_max_modalities: bool = True, seed: Optional[int] = None) → DataFrame

Transform coordinates back into a DataFrame in the original format.

Parameters

coord – Coordinates of the individuals to transform back.
model – Model used for the projection.
use_approximate_inverse – The matrix is not invertible when n_individuals < n_dimensions; set to True to compute a biased approximation instead. default: False
use_max_modalities – For each variable, assign to each individual the modality with the highest proportion (True) or a random modality weighted by the proportions (False). default: True
seed – Seed that fixes the randomness when use_max_modalities=False. default: None

Returns

inverse – Coordinates transformed back into the original space, retaining the shape, encoding and structure of the original data.

Return type

DataFrame
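The two behaviours of `use_max_modalities` can be illustrated in isolation. The helper `pick_modalities` below is hypothetical and is not saiph's code. It assumes the reconstructed proportions for one categorical variable are non-negative, with one row per individual and one column per modality.

```python
import numpy as np


def pick_modalities(props, modalities, use_max_modalities=True, seed=None):
    """Pick one modality per individual (row) from per-modality proportions.

    Hypothetical helper mirroring the documented use_max_modalities behaviour;
    assumes every row of `props` is non-negative with a positive sum.
    """
    if use_max_modalities:
        # Deterministic: the modality with the highest proportion wins.
        return modalities[np.argmax(props, axis=1)]
    # Random: draw a modality with probability proportional to its proportion.
    rng = np.random.default_rng(seed)
    weights = props / props.sum(axis=1, keepdims=True)
    return np.array([rng.choice(modalities, p=row) for row in weights])


# Hypothetical reconstructed proportions for one categorical variable.
modalities = np.array(["red", "green", "blue"])
proportions = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])

print(pick_modalities(proportions, modalities))  # ['red' 'blue']
print(pick_modalities(proportions, modalities, use_max_modalities=False, seed=0))
```

Passing the same `seed` makes the random draw reproducible, which is the role the `seed` parameter plays.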

saiph.stats(model: Model, df: DataFrame, explode: bool = False) → Model

Compute the contributions and cos2.

Parameters

model – Model computed by fit.
df – original dataframe
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False). Only valid for categorical variables.

Returns

Model populated with contributions and cos2.

Return type

Model
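To make `explode` concrete, here is a minimal pandas sketch. It assumes per-modality contributions are stored one row per dummy column; the row labels, the axis labels `Dim. 1` / `Dim. 2` and the modality-to-variable mapping are all hypothetical, not saiph's internal naming.

```python
import pandas as pd

# Hypothetical per-modality contributions (in %) to the first two axes.
contrib = pd.DataFrame(
    {"Dim. 1": [30.0, 10.0, 40.0, 20.0], "Dim. 2": [5.0, 45.0, 25.0, 25.0]},
    index=["color_red", "color_blue", "size_S", "size_L"],
)

# Hypothetical mapping from each dummy (modality) row to its variable.
modality_to_variable = {
    "color_red": "color",
    "color_blue": "color",
    "size_S": "size",
    "size_L": "size",
}

# explode=True keeps one row per modality (contrib as is);
# explode=False sums the rows belonging to each variable.
per_variable = contrib.groupby(modality_to_variable).sum()
print(per_variable)
```

Summing preserves each axis total (100 % here), so the per-variable view is just a coarser partition of the same contributions.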

saiph.transform(df: DataFrame, model: Model, *, sparse: bool = False) → DataFrame

Scale and project into the fitted numerical space.

Parameters

df – DataFrame to transform.
model – Model computed by fit.

Returns

Coordinates of the dataframe in the fitted space.

Return type

DataFrame

saiph.models

class saiph.models.Model(dummy_categorical: List[str], original_dtypes: pandas.core.series.Series, original_categorical: List[str], original_continuous: List[str], nf: int, column_weights: numpy.ndarray[Any, numpy.dtype[numpy.float64]], row_weights: numpy.ndarray[Any, numpy.dtype[numpy.float64]], explained_var: numpy.ndarray[Any, numpy.dtype[numpy.float64]], explained_var_ratio: numpy.ndarray[Any, numpy.dtype[numpy.float64]], variable_coord: pandas.core.frame.DataFrame, V: numpy.ndarray[Any, numpy.dtype[numpy.float64]], modalities_types: Dict[str, str], U: numpy.ndarray[Any, numpy.dtype[numpy.float64]], s: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None, mean: Optional[pandas.core.series.Series] = None, std: Optional[pandas.core.series.Series] = None, prop: Optional[pandas.core.series.Series] = None, _modalities: Optional[numpy.ndarray[Any, numpy.dtype[numpy.bytes_]]] = None, D_c: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None, type: Optional[str] = None, is_fitted: bool = False, correlations: Optional[pandas.core.frame.DataFrame] = None, contributions: Optional[pandas.core.frame.DataFrame] = None, cos2: Optional[pandas.core.frame.DataFrame] = None, dummies_col_prop: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None)

Bases: object

D_c: Optional[ndarray[Any, dtype[float64]]] = None

U: ndarray[Any, dtype[float64]]

V: ndarray[Any, dtype[float64]]

column_weights: ndarray[Any, dtype[float64]]

contributions: Optional[DataFrame] = None

correlations: Optional[DataFrame] = None

cos2: Optional[DataFrame] = None

dummies_col_prop: Optional[ndarray[Any, dtype[float64]]] = None

dummy_categorical: List[str]

explained_var: ndarray[Any, dtype[float64]]

explained_var_ratio: ndarray[Any, dtype[float64]]

is_fitted: bool = False

mean: Optional[Series] = None

modalities_types: Dict[str, str]

nf: int

original_categorical: List[str]

original_continuous: List[str]

original_dtypes: Series

prop: Optional[Series] = None

row_weights: ndarray[Any, dtype[float64]]

s: Optional[ndarray[Any, dtype[float64]]] = None

std: Optional[Series] = None

type: Optional[str] = None

variable_coord: DataFrame

saiph.famd

FAMD projection module.

saiph.reduction.famd.center(df: DataFrame, quanti: List[str], quali: List[str]) → Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Center and scale the data, and compute the modalities and proportions of each categorical variable.

Used as an internal function during fit.

NB: saiph.reduction.famd.scaler is better suited when a Model is already fitted.

Parameters

df – DataFrame to center.
quanti – Names of the continuous variables.
quali – Names of the categorical variables.

Returns

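df_scale – The scaled DataFrame.
mean – Mean of the input DataFrame.
std – Standard deviation of the input DataFrame.
prop – Proportion of each categorical modality.
_modalities – Modalities for the MCA.

Return type

Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]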
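A simplified sketch of the usual FAMD scaling, for orientation only: continuous columns are standardized, and each one-hot (dummy) column is divided by the square root of its proportion and then centered. This follows the textbook formulation; saiph's actual `center` may differ in details (for example the `ddof` used for the standard deviation), and it also returns the modalities, which this sketch omits. `famd_center_sketch` and the sample columns are hypothetical.

```python
import numpy as np
import pandas as pd


def famd_center_sketch(df, quanti, quali):
    """Simplified FAMD scaling; returns (scaled, mean, std, prop)."""
    # Continuous part: standardize each column (ddof=0 is an assumption here).
    mean = df[quanti].mean()
    std = df[quanti].std(ddof=0)
    scaled_quanti = (df[quanti] - mean) / std

    # Categorical part: one-hot encode, divide each dummy by the square root
    # of its proportion, then center it.
    dummies = pd.get_dummies(df[quali].astype(str)).astype(float)
    prop = dummies.mean()
    scaled_quali = dummies / np.sqrt(prop)
    scaled_quali = scaled_quali - scaled_quali.mean()

    return pd.concat([scaled_quanti, scaled_quali], axis=1), mean, std, prop


df = pd.DataFrame({"age": [20.0, 30.0, 40.0, 50.0], "color": ["red", "blue", "red", "red"]})
scaled, mean, std, prop = famd_center_sketch(df, ["age"], ["color"])
print(scaled.round(3))
```

Dividing by the square root of the proportion gives rare modalities more weight, which is what balances categorical and continuous variables in FAMD.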

saiph.reduction.famd.compute_categorical_cos2(model: Model, df: DataFrame, min_nf: int) → DataFrame

Compute the cos2 statistic for categorical variables.

Parameters

model – Model computed by fit.
df – DataFrame to compute the statistic from.
min_nf – Number of degrees of freedom.

Returns

DataFrame of categorical cos2.

Return type

DataFrame

saiph.reduction.famd.compute_continuous_cos2(model: Model, scaled_df: DataFrame, min_nf: int, s: ndarray[Any, dtype[float64]], U: ndarray[Any, dtype[float64]]) → DataFrame

Compute the cos2 statistic for continuous variables.
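For a continuous variable, cos2 on a component is conventionally the squared correlation between the variable and that component's coordinates. The sketch below computes it directly on synthetic standardized data using a plain NumPy SVD. It illustrates the statistic, not saiph's implementation, which instead takes `min_nf`, `s` and `U` as inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] += X[:, 0]  # make two variables correlated

# Standardize, then get individual coordinates from an SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
coords = U * s  # coordinates of the individuals on each component

# cos2 of variable j on component k = squared correlation between them.
n_vars, n_comp = Z.shape[1], coords.shape[1]
cos2 = np.empty((n_vars, n_comp))
for j in range(n_vars):
    for k in range(n_comp):
        cos2[j, k] = np.corrcoef(Z[:, j], coords[:, k])[0, 1] ** 2

print(cos2.sum(axis=1))  # each variable's cos2 sums to 1 over all components
```

The row sums equal 1 only when every component is kept; truncating to fewer components gives the share of each variable captured by the retained axes.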

saiph.reduction.famd.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None, center: Callable[[DataFrame, List[str], List[str]], Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]] = center) → Model

Fit a FAMD model on data.

Parameters

df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The model for transforming new data.

Return type

Model

saiph.reduction.famd.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) → Tuple[DataFrame, Model]

Fit a FAMD model on data and return transformed data.

Parameters

df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

coord – The transformed data.
model – The model for transforming new data.

Return type

Tuple[DataFrame, Model]

saiph.reduction.famd.get_variable_contributions(model: Model, df: DataFrame, explode: bool = False) → Tuple[DataFrame, DataFrame]

Compute the contributions of the df variables within the fitted space.

Parameters

model – Model computed by fit.
df – dataframe to compute contributions from
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns

tuple of contributions and cos2.
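For the continuous (PCA-like) part, a common definition of a variable's contribution to an axis is its squared loading expressed in percent: each row of the right singular vectors has unit norm, so the contributions on every axis add up to 100. The sketch assumes unit column weights and uses synthetic data; it is not saiph's implementation, and categorical modalities would need the FAMD-specific weighting on top.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)

# Rows of Vt are unit vectors, so squared loadings split each axis into shares.
contributions = 100 * Vt**2  # shape (n_components, n_variables)
print(contributions.sum(axis=1))  # each axis sums to 100
```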

saiph.reduction.famd.scaler(model: Model, df: DataFrame) → DataFrame

Scale data using the mean, std, modalities and categorical proportions stored in the model.

Parameters

model – Model computed by fit.
df – DataFrame to scale.

Returns

The scaled DataFrame.

Return type

DataFrame

saiph.reduction.famd.stats(model: Model, df: DataFrame, explode: bool = False) → Model

Compute contributions and cos2.

Parameters

model – Model computed by fit.
df – dataframe to compute statistics from
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns

model populated with contribution and cos2.

Return type

Model

saiph.reduction.famd.transform(df: DataFrame, model: Model, *, scaler: Callable[[Model, DataFrame], DataFrame] = scaler) → DataFrame

Scale and project into the fitted numerical space.

Parameters

df – DataFrame to transform.
model – Model computed by fit.

Returns

Coordinates of the dataframe in the fitted space.

Return type

DataFrame

saiph.mca

MCA projection module.

saiph.reduction.mca.center(df: DataFrame) → Tuple[DataFrame, ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Center data and compute modalities.

Used as an internal function during fit.

NB: saiph.reduction.mca.scaler is better suited when a Model is already fitted.

Parameters

df – DataFrame to center.

Returns

df_centered – The centered DataFrame.
_modalities – Modalities for the MCA.
row_sum – Sum of each row.
column_sum – Sum of each column.

Return type

Tuple[DataFrame, ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]
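As a point of reference, the textbook correspondence-analysis centering behind MCA works on the indicator (one-hot) table: divide by the grand total, take the row and column sums (masses), and form standardized residuals. This sketch follows that formulation with hypothetical data; saiph's `mca.center` may differ in details.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "S", "L"]})

# Complete disjunctive (indicator) table: one column per modality.
Z = pd.get_dummies(df).astype(float).to_numpy()

# Correspondence matrix and its margins.
P = Z / Z.sum()
row_sum = P.sum(axis=1)  # row masses
column_sum = P.sum(axis=0)  # column masses

# Standardized residuals: (P - r c^T) / sqrt(r c^T).
expected = np.outer(row_sum, column_sum)
centered = (P - expected) / np.sqrt(expected)
print(centered.round(3))
```

Every individual has exactly one modality per variable, so all row masses are equal (here 1/3); the SVD of `centered` then yields the MCA axes.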

saiph.reduction.mca.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) → Model

Fit an MCA model on data.

Parameters

df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The model for transforming new data.

Return type

Model

saiph.reduction.mca.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) → Tuple[DataFrame, Model]

Fit an MCA model on data and return transformed data.

Parameters

df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

coord – The transformed data.
model – The model for transforming new data.

Return type

Tuple[DataFrame, Model]

saiph.reduction.mca.get_variable_contributions(model: Model, df: DataFrame, explode: bool = False) → DataFrame

Compute the contributions of the df variables within the fitted space.

Parameters

model – Model computed by fit.
df – dataframe to compute contributions from
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns

contributions

saiph.reduction.mca.scaler(model: Model, df: DataFrame) → DataFrame

Scale data using modalities from model.

Parameters

model – Model computed by fit.
df – DataFrame to scale.

Returns

The scaled DataFrame.

Return type

DataFrame

saiph.reduction.mca.stats(model: Model, df: DataFrame, explode: bool = False) → Model

Compute the contributions.

Parameters

model – Model computed by fit.
df – DataFrame, in the original space, to compute contributions from.
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns

The model populated with contributions.

saiph.reduction.mca.transform(df: DataFrame, model: Model) → DataFrame

Scale and project into the fitted numerical space.

Parameters

df – DataFrame to transform.
model – Model computed by fit.

Returns

Coordinates of the dataframe in the fitted space.

Return type

DataFrame

saiph.pca

PCA projection module.

saiph.reduction.pca.center(df: DataFrame) → Tuple[DataFrame, Series, Series]

Center and standardize the data, and compute the mean and std of each column.

Used as an internal function during fit.

NB: saiph.reduction.pca.scaler is better suited when a Model is already fitted.

Parameters

df – DataFrame to center.

Returns

df – The centered DataFrame.
mean – Mean of the input DataFrame.
std – Standard deviation of the input DataFrame.

Return type

Tuple[DataFrame, Series, Series]
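The distinction between `center` and `scaler` matters for new data: statistics are computed once at fit time and then reused, never recomputed on the incoming rows. A minimal pandas sketch with hypothetical columns; it uses pandas' default sample standard deviation (`ddof=1`), which is an assumption and may not match saiph's choice.

```python
import pandas as pd

# Hypothetical training data.
train = pd.DataFrame(
    {"height": [150.0, 160.0, 170.0, 180.0], "weight": [50.0, 60.0, 65.0, 85.0]}
)

# "center" step at fit time: compute and keep the statistics.
mean = train.mean()
std = train.std()  # pandas default: sample std (ddof=1); saiph's choice may differ
centered = (train - mean) / std

# "scaler" step for new data: reuse the stored statistics, do not recompute them.
new = pd.DataFrame({"height": [165.0], "weight": [65.0]})
scaled_new = (new - mean) / std
print(scaled_new)  # both values are 0.0: the new row sits exactly at the training mean
```

Recomputing the mean on a single new row would always give zeros, which is why a fitted model's stored statistics are the right reference.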

saiph.reduction.pca.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) → Model

Fit a PCA model on data.

Parameters

df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The model for transforming new data.

Return type

Model

saiph.reduction.pca.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) → Tuple[DataFrame, Model]

Fit a PCA model on data and return transformed data.

Parameters

df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

coord – The transformed data.
model – The model for transforming new data.

Return type

Tuple[DataFrame, Model]

saiph.reduction.pca.scaler(model: Model, df: DataFrame) → DataFrame

Scale data using mean and std from model.

Parameters

model – Model computed by fit.
df – DataFrame to scale.

Returns

The scaled DataFrame.

Return type

DataFrame

saiph.reduction.pca.transform(df: DataFrame, model: Model) → DataFrame

Scale and project into the fitted numerical space.

Parameters

df – DataFrame to transform.
model – Model computed by fit.

Returns

Coordinates of the dataframe in the fitted space.

Return type

DataFrame
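"Scale and project" can be sketched with plain NumPy: scale with the statistics stored at fit time, then multiply by the transposed right singular vectors of the first `nf` axes. The helper `transform_sketch` is hypothetical and ignores column weights; on the training data the projection must reproduce `U * s`, which the example checks.

```python
import numpy as np

rng = np.random.default_rng(42)
X_train = rng.normal(size=(30, 3))

# "Fit": store scaling statistics and the right singular vectors.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mean) / std
U, s, Vt = np.linalg.svd(Z_train, full_matrices=False)


def transform_sketch(X, mean, std, Vt, nf):
    """Scale with the stored statistics, then project onto the first nf axes."""
    Z = (X - mean) / std
    return Z @ Vt[:nf].T


coords = transform_sketch(X_train, mean, std, Vt, nf=2)
# On the training data, the projection matches U * s for the kept axes.
print(np.allclose(coords, U[:, :2] * s[:2]))  # True
```

The identity holds because `Z = U diag(s) Vt` and the rows of `Vt` are orthonormal, so `Z @ Vt.T = U diag(s)`.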

saiph.svd

saiph.reduction.utils.svd.SVD(df: DataFrame, svd_flip: bool = True) → Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Compute Singular Value Decomposition.

Parameters

df – Matrix to decompose.
svd_flip – Whether to apply svd_flip to U and V, which fixes the signs of the singular vectors so the output is deterministic.

Returns

U – Unitary matrix having left singular vectors as columns.
s – The singular values.
V – Unitary matrix having right singular vectors as rows.

Return type

Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]
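Singular vectors are only defined up to sign, so two runs (or two LAPACK builds) can return flipped columns. The sketch below implements the common convention also used by scikit-learn's `svd_flip` in its U-based mode: make the largest-magnitude entry of each column of U positive and flip the matching row of V. Whether saiph uses exactly this convention is an assumption, and `svd_flip_sketch` is a hypothetical name.

```python
import numpy as np


def svd_flip_sketch(U, Vt):
    """Flip signs so that the largest |entry| of each column of U is positive."""
    max_abs_rows = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[max_abs_rows, np.arange(U.shape[1])])
    return U * signs, Vt * signs[:, np.newaxis]


rng = np.random.default_rng(7)
A = rng.normal(size=(6, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_f, Vt_f = svd_flip_sketch(U, Vt)

# Flipping a column of U and the matching row of Vt leaves the product unchanged.
print(np.allclose((U_f * s) @ Vt_f, A))  # True
```

Feeding in `-U` and `-Vt` gives back the same flipped pair, which is exactly the determinism the flag is meant to provide.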

saiph.visualization

Visualization functions.

saiph.visualization.plot_circle(model: Model, dimensions: Optional[List[int]] = None, min_cor: float = 0.1, max_var: int = 7) → None

Plot correlation circle.

Parameters

model – The model for transforming new data.
dimensions – Dimensions shown on each axis.
min_cor – Minimum correlation threshold to display arrow. default: 0.1
max_var – Number of variables to display (in descending order). default: 7

saiph.visualization.plot_explained_var(model: Model, max_dims: int = 10, cumulative: bool = False) → None

Plot explained variance per dimension.

Parameters

model – Model computed by fit.
max_dims – Maximum number of dimensions to plot. default: 10
cumulative – Whether to plot the cumulative explained variance. default: False
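The quantities behind this plot can be computed from the model's `explained_var`. The sketch assumes the ratio is each value divided by the total and that the cumulative view is a running sum of those ratios truncated to `max_dims`; the eigenvalues are hypothetical and no plotting library is involved.

```python
import numpy as np

# Hypothetical explained variance (eigenvalue) for each dimension.
explained_var = np.array([2.5, 1.0, 0.5])
max_dims = 10  # mirrors the documented default

explained_var_ratio = explained_var / explained_var.sum()
shown = explained_var_ratio[:max_dims]  # at most max_dims bars
cumulative = np.cumsum(shown)  # what cumulative=True is assumed to display

print(shown)  # ratios 0.625, 0.25, 0.125
print(cumulative)  # running totals 0.625, 0.875, 1.0
```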

saiph.visualization.plot_projections(model: Model, data: DataFrame, dim: Tuple[int, int] = (0, 1)) → None

Plot projections in reduced space for input data.

Parameters

model – Model computed by fit.
data – Data to plot in the reduced space
dim – Axes to use for the 2D plot (default (0,1))

saiph.visualization.plot_var_contribution(values: ndarray[Any, dtype[float64]], names: ndarray[Any, dtype[bytes_]], title: str = 'Variables contributions') → None

Plot the variable contributions for a given dimension.