API Reference

saiph

saiph.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[Dict[str, Union[int, float]]] = None, sparse: bool = False) Model

Fit a PCA, MCA or FAMD model on data, inferring which method has to be used.

Datetimes must be stored as numbers of seconds since epoch.

Parameters
  • df – Data to project.

  • nf – Number of components to keep. default: None, which uses all columns.

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The model for transforming new data.

Return type

model
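The requirement that datetimes be stored as numbers of seconds since epoch can be met with a small conversion step before calling fit. The following is a minimal sketch using only the standard library. The helper name to_epoch_seconds and the sample dates are hypothetical, and treating naive datetimes as UTC is an assumption, not something saiph prescribes.

```python
from datetime import datetime, timezone
from typing import List


def to_epoch_seconds(value: datetime) -> float:
    """Return seconds since the Unix epoch (hypothetical helper).

    Assumption: naive datetimes are interpreted as UTC.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.timestamp()


# Hypothetical datetime column, converted before it is handed to fit.
signup_dates: List[datetime] = [
    datetime(2021, 1, 1),
    datetime(2021, 1, 2, 12, 0),
]
signup_seconds = [to_epoch_seconds(d) for d in signup_dates]
```

The resulting list of floats can replace the datetime column in the DataFrame so that it is handled as a continuous variable.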

saiph.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[Dict[str, Union[int, float]]] = None) Tuple[DataFrame, Model]

Fit a PCA, MCA or FAMD model on data and return the transformed data, inferring which method has to be used.

Datetimes must be stored as numbers of seconds since epoch.

Parameters
  • df – Data to project.

  • nf – Number of components to keep. default: None, which uses all columns.

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The transformed data. model: The model for transforming new data.

Return type

coord
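The top-level functions take col_weights as a mapping from column name to weight, while the documented default is a vector of ones with one entry per column. The sketch below shows one way such a mapping could be expanded into a per-column vector. The helper expand_col_weights is hypothetical, and giving unspecified columns a weight of 1 is an assumption based on the documented default, not a confirmed description of saiph's internals.

```python
from typing import Dict, List, Optional, Union

Number = Union[int, float]


def expand_col_weights(
    columns: List[str],
    col_weights: Optional[Dict[str, Number]] = None,
) -> List[float]:
    """Return one weight per column, defaulting to 1.0 (hypothetical helper)."""
    col_weights = col_weights or {}
    unknown = sorted(set(col_weights) - set(columns))
    if unknown:
        raise KeyError(f"Weights given for unknown columns: {unknown}")
    # Assumption: columns missing from the mapping keep the default weight.
    return [float(col_weights.get(name, 1.0)) for name in columns]


columns = ["age", "income", "city"]  # hypothetical column names
default_weights = expand_col_weights(columns)
custom_weights = expand_col_weights(columns, {"income": 2})
```

Here custom_weights doubles the importance of the income column and leaves the other two at the default.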

saiph.inverse_transform(coord: DataFrame, model: Model, *, use_approximate_inverse: bool = False, use_max_modalities: bool = True, seed: Optional[int] = None) DataFrame

Return original format dataframe from coordinates.

Parameters
  • coord – Coordinates of individuals to reverse transform.

  • model – model used for projection

  • use_approximate_inverse – The matrix is not invertible when n_individuals < n_dimensions; setting this to True computes a biased approximation instead. default: False

  • use_max_modalities – for each variable, it assigns to the individual the modality with the highest proportion (True) or a random modality weighted by their proportion (False). default: True

  • seed – seed to fix randomness if use_max_modalities = False. default: None

Returns

Coordinates transformed back into the original space.

Retains shape, encoding and structure.

Return type

inverse
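The two behaviours selected by use_max_modalities can be illustrated on a single categorical variable. This is a minimal sketch of the idea described above, not saiph's implementation. The function pick_modality and the proportions are hypothetical.

```python
import random
from typing import Dict, Optional


def pick_modality(
    proportions: Dict[str, float],
    use_max_modalities: bool = True,
    seed: Optional[int] = None,
) -> str:
    """Choose one modality from its estimated proportions (hypothetical helper)."""
    if use_max_modalities:
        # Deterministic: the modality with the highest proportion wins.
        return max(proportions, key=lambda name: proportions[name])
    # Random draw weighted by proportion; the seed makes it reproducible.
    rng = random.Random(seed)
    names = list(proportions)
    weights = [proportions[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


proportions = {"red": 0.6, "green": 0.3, "blue": 0.1}
deterministic = pick_modality(proportions)
sampled_a = pick_modality(proportions, use_max_modalities=False, seed=42)
sampled_b = pick_modality(proportions, use_max_modalities=False, seed=42)
```

With the same seed, the two sampled values are identical, which is the purpose of the seed parameter when use_max_modalities is False.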

saiph.stats(model: Model, df: DataFrame, explode: bool = False) Model

Compute the contributions and cos2.

Parameters
  • model – Model computed by fit.

  • df – original dataframe

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False). Only valid for categorical variables.

Returns

Model populated with contributions and cos2.

Return type

model
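The effect of explode can be pictured as the difference between keeping one contribution per modality and summing those contributions per variable. The sketch below shows only that aggregation step. The function name, the modality names and the numbers are hypothetical, and the real output of stats is a DataFrame rather than a dict.

```python
from collections import defaultdict
from typing import Dict


def aggregate_contributions(
    modality_contributions: Dict[str, float],
    modality_to_variable: Dict[str, str],
) -> Dict[str, float]:
    """Sum per-modality contributions into per-variable contributions."""
    totals: Dict[str, float] = defaultdict(float)
    for modality, value in modality_contributions.items():
        totals[modality_to_variable[modality]] += value
    return dict(totals)


# Hypothetical contributions for one axis, as with explode=True.
exploded = {"color_red": 20.0, "color_blue": 30.0, "size_S": 50.0}
owner = {"color_red": "color", "color_blue": "color", "size_S": "size"}

# Equivalent per-variable view, as with explode=False.
summed = aggregate_contributions(exploded, owner)
```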

saiph.transform(df: DataFrame, model: Model, *, sparse: bool = False) DataFrame

Scale and project into the fitted numerical space.

Parameters
  • df – DataFrame to transform.

  • model – Model computed by fit.

Returns

Coordinates of the dataframe in the fitted space.

Return type

coord

saiph.models

class saiph.models.Model(dummy_categorical: List[str], original_dtypes: pandas.core.series.Series, original_categorical: List[str], original_continuous: List[str], nf: int, column_weights: numpy.ndarray[Any, numpy.dtype[numpy.float64]], row_weights: numpy.ndarray[Any, numpy.dtype[numpy.float64]], explained_var: numpy.ndarray[Any, numpy.dtype[numpy.float64]], explained_var_ratio: numpy.ndarray[Any, numpy.dtype[numpy.float64]], variable_coord: pandas.core.frame.DataFrame, V: numpy.ndarray[Any, numpy.dtype[numpy.float64]], modalities_types: Dict[str, str], U: numpy.ndarray[Any, numpy.dtype[numpy.float64]], s: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None, mean: Optional[pandas.core.series.Series] = None, std: Optional[pandas.core.series.Series] = None, prop: Optional[pandas.core.series.Series] = None, _modalities: Optional[numpy.ndarray[Any, numpy.dtype[numpy.bytes_]]] = None, D_c: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None, type: Optional[str] = None, is_fitted: bool = False, correlations: Optional[pandas.core.frame.DataFrame] = None, contributions: Optional[pandas.core.frame.DataFrame] = None, cos2: Optional[pandas.core.frame.DataFrame] = None, dummies_col_prop: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None)

Bases: object

D_c: Optional[ndarray[Any, dtype[float64]]] = None
U: ndarray[Any, dtype[float64]]
V: ndarray[Any, dtype[float64]]
column_weights: ndarray[Any, dtype[float64]]
contributions: Optional[DataFrame] = None
correlations: Optional[DataFrame] = None
cos2: Optional[DataFrame] = None
dummies_col_prop: Optional[ndarray[Any, dtype[float64]]] = None
dummy_categorical: List[str]
explained_var: ndarray[Any, dtype[float64]]
explained_var_ratio: ndarray[Any, dtype[float64]]
is_fitted: bool = False
mean: Optional[Series] = None
modalities_types: Dict[str, str]
nf: int
original_categorical: List[str]
original_continuous: List[str]
original_dtypes: Series
prop: Optional[Series] = None
row_weights: ndarray[Any, dtype[float64]]
s: Optional[ndarray[Any, dtype[float64]]] = None
std: Optional[Series] = None
type: Optional[str] = None
variable_coord: DataFrame

saiph.famd

FAMD projection module.

saiph.reduction.famd.center(df: DataFrame, quanti: List[str], quali: List[str]) Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Center data, scale it, compute modalities and proportions of each categorical.

Used as internal function during fit.

NB: saiph.reduction.famd.scaler is better suited when a Model is already fitted.

Parameters
  • df – DataFrame to center.

  • quanti – Names of continuous variables.

  • quali – Names of categorical variables.

Returns

The scaled DataFrame. mean: Mean of the input dataframe. std: Standard deviation of the input dataframe. prop: Proportion of each categorical. _modalities: Modalities for the MCA.

Return type

df_scale

saiph.reduction.famd.compute_categorical_cos2(model: Model, df: DataFrame, min_nf: int) DataFrame

Compute the cos2 statistic for categorical variables.

Parameters
  • model – model

  • df – dataframe

  • min_nf – number of degrees of freedom

Return type

dataframe of categorical cos2
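For readers unfamiliar with the cos2 statistic, the sketch below uses the usual textbook definition: the squared coordinate on an axis divided by the sum of squared coordinates over all axes. This is an assumption about the general concept. The exact computation in compute_categorical_cos2 and compute_continuous_cos2 may differ, and the cos2 helper here is hypothetical.

```python
from typing import List


def cos2(coordinates: List[float]) -> List[float]:
    """Share of a point's squared norm carried by each axis."""
    total = sum(c * c for c in coordinates)
    if total == 0:
        # A point at the origin is not represented on any axis.
        return [0.0 for _ in coordinates]
    return [c * c / total for c in coordinates]


# Hypothetical coordinates of one variable on two axes.
quality = cos2([3.0, 4.0])
```

Values close to 1 on an axis indicate that the variable is well represented by that axis, and the values sum to 1 when all axes are kept.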

saiph.reduction.famd.compute_continuous_cos2(model: Model, scaled_df: DataFrame, min_nf: int, s: ndarray[Any, dtype[float64]], U: ndarray[Any, dtype[float64]]) DataFrame

saiph.reduction.famd.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None, center: Callable[[DataFrame, List[str], List[str]], Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]] = <function center>) Model

Fit a FAMD model on data.

Parameters
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The model for transforming new data.

Return type

model

saiph.reduction.famd.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Tuple[DataFrame, Model]

Fit a FAMD model on data and return transformed data.

Parameters
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The transformed data. model: The model for transforming new data.

Return type

coord

saiph.reduction.famd.get_variable_contributions(model: Model, df: DataFrame, explode: bool = False) Tuple[DataFrame, DataFrame]

Compute the contributions of the df variables within the fitted space.

Parameters
  • model – Model computed by fit.

  • df – dataframe to compute contributions from

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns

tuple of contributions and cos2.

saiph.reduction.famd.scaler(model: Model, df: DataFrame) DataFrame

Scale data using mean, std, modalities and proportions of each categorical from model.

Parameters
  • model – Model computed by fit.

  • df – DataFrame to scale.

Returns

The scaled DataFrame.

Return type

df_scaled

saiph.reduction.famd.stats(model: Model, df: DataFrame, explode: bool = False) Model

Compute contributions and cos2.

Parameters
  • model – Model computed by fit.

  • df – dataframe to compute statistics from

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns

model populated with contribution and cos2.

Return type

model

saiph.reduction.famd.transform(df: DataFrame, model: Model, *, scaler: Callable[[Model, DataFrame], DataFrame] = <function scaler>) DataFrame

Scale and project into the fitted numerical space.

Parameters
  • df – DataFrame to transform.

  • model – Model computed by fit.

Returns

Coordinates of the dataframe in the fitted space.

Return type

coord

saiph.mca

MCA projection module.

saiph.reduction.mca.center(df: DataFrame) Tuple[DataFrame, ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Center data and compute modalities.

Used as internal function during fit.

NB: saiph.reduction.mca.scaler is better suited when a Model is already fitted.

Parameters
  • df – DataFrame to center.

Returns

The centered DataFrame. _modalities: Modalities for the MCA. row_sum: Sums line by line. column_sum: Sums column by column.

Return type

df_centered
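The modalities, row sums and column sums returned here are easiest to picture on a small indicator (one-hot) table. The sketch below builds such a table from categorical rows and computes the two sums. It is an illustration under assumed naming, with modalities written as variable_value, and it does not reproduce the centering performed by saiph.

```python
from typing import Dict, List

# Hypothetical categorical data: one dict per individual.
rows: List[Dict[str, str]] = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "S"},
    {"color": "red", "size": "L"},
]

# One column per modality, named "<variable>_<value>" (assumed convention).
modalities = sorted({f"{var}_{val}" for row in rows for var, val in row.items()})

# Indicator table: 1 when the individual carries the modality, else 0.
table = [
    [1 if m in {f"{var}_{val}" for var, val in row.items()} else 0 for m in modalities]
    for row in rows
]

row_sum = [sum(line) for line in table]            # sums line by line
column_sum = [sum(col) for col in zip(*table)]     # sums column by column
```

Each row sums to the number of categorical variables, and each column sum counts the individuals carrying that modality.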

saiph.reduction.mca.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Model

Fit an MCA model on data.

Parameters
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The model for transforming new data.

Return type

model

saiph.reduction.mca.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Tuple[DataFrame, Model]

Fit an MCA model on data and return transformed data.

Parameters
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The transformed data. model: The model for transforming new data.

Return type

coord

saiph.reduction.mca.get_variable_contributions(model: Model, df: DataFrame, explode: bool = False) DataFrame

Compute the contributions of the df variables within the fitted space.

Parameters
  • model – Model computed by fit.

  • df – dataframe to compute contributions from

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns

contributions

saiph.reduction.mca.scaler(model: Model, df: DataFrame) DataFrame

Scale data using modalities from model.

Parameters
  • model – Model computed by fit.

  • df – DataFrame to scale.

Returns

The scaled DataFrame.

Return type

df_scaled

saiph.reduction.mca.stats(model: Model, df: DataFrame, explode: bool = False) Model

Compute the contributions.

Parameters
  • model – Model computed by fit.

  • df – dataframe in the original space to compute contributions from

  • explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)

Returns

model populated with contributions.

saiph.reduction.mca.transform(df: DataFrame, model: Model) DataFrame

Scale and project into the fitted numerical space.

Parameters
  • df – DataFrame to transform.

  • model – Model computed by fit.

Returns

Coordinates of the dataframe in the fitted space.

Return type

coord

saiph.pca

PCA projection module.

saiph.reduction.pca.center(df: DataFrame) Tuple[DataFrame, Series, Series]

Center data and standardize it. Compute mean and std values.

Used as internal function during fit.

NB: saiph.reduction.pca.scaler is better suited when a Model is already fitted.

Parameters
  • df – DataFrame to center.

Returns

The centered DataFrame. mean: Mean of the input dataframe. std: Standard deviation of the input dataframe.

Return type

df

saiph.reduction.pca.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Model

Fit a PCA model on data.

Parameters
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The model for transforming new data.

Return type

model

saiph.reduction.pca.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Tuple[DataFrame, Model]

Fit a PCA model on data and return transformed data.

Parameters
  • df – Data to project.

  • nf – Number of components to keep. default: min(df.shape)

  • col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])

Returns

The transformed data. model: The model for transforming new data.

Return type

coord

saiph.reduction.pca.scaler(model: Model, df: DataFrame) DataFrame

Scale data using mean and std from model.

Parameters
  • model – Model computed by fit.

  • df – DataFrame to scale.

Returns

The scaled DataFrame.

Return type

df
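The distinction between center (used during fit) and scaler (used once a Model is fitted) comes down to where the mean and std come from. The sketch below illustrates that split on a single column with the standard library. The function names center_column and scale_column are hypothetical, and the use of the population standard deviation and of a guard for constant columns are assumptions, not confirmed saiph behaviour.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def center_column(values: List[float]) -> Tuple[List[float], float, float]:
    """Fit-time step: compute mean and std, then standardize."""
    mean = fmean(values)
    std = pstdev(values)  # assumption: population standard deviation
    if std == 0:
        std = 1.0  # assumption: leave constant columns unscaled
    return [(v - mean) / std for v in values], mean, std


def scale_column(values: List[float], mean: float, std: float) -> List[float]:
    """Transform-time step: reuse the statistics stored at fit time."""
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 6.0]
scaled_train, mean, std = center_column(train)

# New data is scaled with the training statistics, not its own.
scaled_new = scale_column([4.0, 8.0], mean, std)
```

Reusing the stored statistics keeps new data in the same space as the training data, which is why scaler is the better fit once a Model exists.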

saiph.reduction.pca.transform(df: DataFrame, model: Model) DataFrame

Scale and project into the fitted numerical space.

Parameters
  • df – DataFrame to transform.

  • model – Model computed by fit.

Returns

Coordinates of the dataframe in the fitted space.

Return type

coord

saiph.svd

saiph.reduction.utils.svd.SVD(df: DataFrame, svd_flip: bool = True) Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

Compute Singular Value Decomposition.

Parameters
  • df – Matrix to decompose.

  • svd_flip – Whether to use svd_flip on U and V or not.

Returns

Unitary matrix having left singular vectors as columns. s: The singular values. V: Unitary matrix having right singular vectors as rows.

Return type

U
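Singular vectors are only defined up to a sign, so the same matrix can yield different but equally valid U and V. A sign-flip step removes that ambiguity. The sketch below shows one common convention, assumed here: for each component, make the largest-magnitude entry of the left singular vector positive and apply the same sign to the matching right singular vector. The function flip_signs and its inputs are hypothetical, and the convention used by the svd_flip option may differ.

```python
from typing import List, Tuple

Vector = List[float]


def flip_signs(
    u_columns: List[Vector], v_rows: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Make each component's sign deterministic (hypothetical helper).

    u_columns holds the left singular vectors, one list per component.
    v_rows holds the matching right singular vectors.
    """
    flipped_u, flipped_v = [], []
    for u_col, v_row in zip(u_columns, v_rows):
        pivot = max(u_col, key=abs)  # entry with the largest magnitude
        sign = -1.0 if pivot < 0 else 1.0
        # Flipping both vectors leaves the product U * diag(s) * V unchanged.
        flipped_u.append([sign * x for x in u_col])
        flipped_v.append([sign * x for x in v_row])
    return flipped_u, flipped_v


u_columns = [[-0.8, 0.6], [0.6, 0.8]]
v_rows = [[-1.0, 0.0], [0.0, 1.0]]
flipped_u, flipped_v = flip_signs(u_columns, v_rows)
```

Only the first component is flipped in this example, because its largest-magnitude entry was negative.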

saiph.visualization

Visualization functions.

saiph.visualization.plot_circle(model: Model, dimensions: Optional[List[int]] = None, min_cor: float = 0.1, max_var: int = 7) None

Plot correlation circle.

Parameters
  • model – The model for transforming new data.

  • dimensions – Dimensions to display on each axis

  • min_cor – Minimum correlation threshold to display arrow. default: 0.1

  • max_var – Number of variables to display (in descending order). default: 7

saiph.visualization.plot_explained_var(model: Model, max_dims: int = 10, cumulative: bool = False) None

Plot explained variance per dimension.

Parameters
  • model – Model computed by fit.

  • max_dims – Maximum number of dimensions to plot
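The cumulative flag is not described above. A plausible reading, assumed here, is that it switches between per-dimension and running-total explained variance. The sketch below computes both series for the first max_dims dimensions. The ratio values are made up, and no plotting is performed.

```python
from itertools import accumulate

# Hypothetical explained variance ratios, one per dimension.
explained_var_ratio = [0.5, 0.25, 0.125, 0.125]
max_dims = 3

# Per-dimension values, truncated to the requested number of dimensions.
per_dimension = explained_var_ratio[:max_dims]

# Running total, assumed to correspond to cumulative=True.
cumulative = list(accumulate(per_dimension))
```

The cumulative series makes it easy to read off how many dimensions are needed to reach a given share of the variance.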

saiph.visualization.plot_projections(model: Model, data: DataFrame, dim: Tuple[int, int] = (0, 1)) None

Plot projections in reduced space for input data.

Parameters
  • model – Model computed by fit.

  • data – Data to plot in the reduced space

  • dim – Axes to use for the 2D plot. default: (0, 1)

saiph.visualization.plot_var_contribution(values: ndarray[Any, dtype[float64]], names: ndarray[Any, dtype[bytes_]], title: str = 'Variables contributions') None

Plot the variable contributions for a given dimension.