API Reference
saiph
- saiph.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[Dict[str, Union[int, float]]] = None, sparse: bool = False) Model
Fit a PCA, MCA or FAMD model on data, inferring which method has to be used.
Datetimes must be stored as numbers of seconds since epoch.
- Parameters
df – Data to project.
nf – Number of components to keep. default: None, which uses all columns.
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])
- Returns
The model for transforming new data.
- Return type
model
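The datetime requirement above can be met by converting values before fitting. The following is a minimal sketch using only the standard library; the helper name to_epoch_seconds is hypothetical and not part of saiph, and naive datetimes are assumed to be in UTC.

```python
from datetime import datetime, timezone


def to_epoch_seconds(value: datetime) -> float:
    """Return the number of seconds since 1970-01-01 UTC for a datetime."""
    # Assumption: naive datetimes are interpreted as UTC.
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.timestamp()


timestamps = [datetime(2021, 1, 1), datetime(2021, 1, 2, 12, 0)]
seconds = [to_epoch_seconds(t) for t in timestamps]
```

The resulting floats can replace the datetime column in the DataFrame passed to fit.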
- saiph.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[Dict[str, Union[int, float]]] = None) Tuple[DataFrame, Model]
Fit a PCA, MCA or FAMD model on data, inferring which method has to be used.
Datetimes must be stored as numbers of seconds since epoch.
- Parameters
df – Data to project.
nf – Number of components to keep. default: None, which uses all columns.
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])
- Returns
The transformed data. model: The model for transforming new data.
- Return type
coord
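The signature above takes col_weights as a mapping from column name to weight, with a default of one per column. The sketch below shows one way such a mapping could be expanded into a full weight vector. It is an illustration, not saiph's implementation: expand_col_weights is a hypothetical helper, and the rule that unlisted columns keep a weight of 1 is an assumption.

```python
from typing import Dict, List, Optional, Union

Number = Union[int, float]


def expand_col_weights(
    columns: List[str], col_weights: Optional[Dict[str, Number]] = None
) -> List[float]:
    """Build one weight per column, defaulting to 1.0."""
    col_weights = col_weights or {}
    unknown = set(col_weights) - set(columns)
    if unknown:
        raise KeyError(f"Unknown columns in col_weights: {sorted(unknown)}")
    # Assumption: columns missing from the mapping keep a weight of 1.0.
    return [float(col_weights.get(name, 1.0)) for name in columns]


columns = ["age", "income", "city"]
weights = expand_col_weights(columns, {"income": 2})
```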
- saiph.inverse_transform(coord: DataFrame, model: Model, *, use_approximate_inverse: bool = False, use_max_modalities: bool = True, seed: Optional[int] = None) DataFrame
Return original format dataframe from coordinates.
- Parameters
coord – coordinates of individuals to reverse transform
model – model used for projection
use_approximate_inverse – the matrix is not invertible when n_individuals < n_dimensions; an approximation with bias can be done by setting this to True. default: False
use_max_modalities – for each variable, it assigns to the individual the modality with the highest proportion (True) or a random modality weighted by their proportion (False). default: True
seed – seed to fix randomness if use_max_modalities = False. default: None
- Returns
coordinates transformed into original space. Retains shape, encoding and structure.
- Return type
inverse
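The two behaviours of use_max_modalities described above can be sketched for a single categorical variable. This is a simplified illustration under stated assumptions, not saiph's code: pick_modality and the proportions dictionary are hypothetical, and the standard library random module stands in for whatever generator the library uses.

```python
import random
from typing import Dict, Optional


def pick_modality(
    proportions: Dict[str, float],
    use_max_modalities: bool = True,
    seed: Optional[int] = None,
) -> str:
    """Choose one modality of a categorical variable from its proportions."""
    if use_max_modalities:
        # Deterministic: keep the modality with the highest proportion.
        return max(proportions, key=proportions.get)
    # Random: draw one modality, weighted by its proportion.
    rng = random.Random(seed)
    names = list(proportions)
    return rng.choices(names, weights=[proportions[n] for n in names], k=1)[0]


proportions = {"red": 0.2, "green": 0.7, "blue": 0.1}
best = pick_modality(proportions)
drawn = pick_modality(proportions, use_max_modalities=False, seed=42)
```

Fixing the seed makes the random draw reproducible, which is the purpose the reference gives for the seed parameter.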
- saiph.stats(model: Model, df: DataFrame, explode: bool = False) Model
Compute the contributions and cos2.
- Parameters
model – Model computed by fit.
df – original dataframe
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False). Only valid for categorical variables.
- Returns
model populated with contributions and cos2.
- Return type
model
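The effect of explode can be pictured as a choice between per-modality values and their sum per original variable. The sketch below illustrates the summed case on made-up numbers. The helper sum_by_variable, the modality names and the modality-to-variable mapping are all hypothetical; how saiph names dummy columns is not stated in this reference.

```python
from collections import defaultdict
from typing import Dict


def sum_by_variable(
    modality_contrib: Dict[str, float], modality_to_variable: Dict[str, str]
) -> Dict[str, float]:
    """Sum per-modality contributions into one value per original variable."""
    totals: Dict[str, float] = defaultdict(float)
    for modality, value in modality_contrib.items():
        totals[modality_to_variable[modality]] += value
    return dict(totals)


# Hypothetical contributions (in percent) on one axis.
exploded = {
    "color_red": 10.0,
    "color_blue": 15.0,
    "size_small": 30.0,
    "size_large": 45.0,
}
mapping = {
    "color_red": "color",
    "color_blue": "color",
    "size_small": "size",
    "size_large": "size",
}
collapsed = sum_by_variable(exploded, mapping)
```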
saiph.models
- class saiph.models.Model(dummy_categorical: List[str], original_dtypes: pandas.core.series.Series, original_categorical: List[str], original_continuous: List[str], nf: int, column_weights: numpy.ndarray[Any, numpy.dtype[numpy.float64]], row_weights: numpy.ndarray[Any, numpy.dtype[numpy.float64]], explained_var: numpy.ndarray[Any, numpy.dtype[numpy.float64]], explained_var_ratio: numpy.ndarray[Any, numpy.dtype[numpy.float64]], variable_coord: pandas.core.frame.DataFrame, V: numpy.ndarray[Any, numpy.dtype[numpy.float64]], modalities_types: Dict[str, str], U: numpy.ndarray[Any, numpy.dtype[numpy.float64]], s: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None, mean: Optional[pandas.core.series.Series] = None, std: Optional[pandas.core.series.Series] = None, prop: Optional[pandas.core.series.Series] = None, _modalities: Optional[numpy.ndarray[Any, numpy.dtype[numpy.bytes_]]] = None, D_c: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None, type: Optional[str] = None, is_fitted: bool = False, correlations: Optional[pandas.core.frame.DataFrame] = None, contributions: Optional[pandas.core.frame.DataFrame] = None, cos2: Optional[pandas.core.frame.DataFrame] = None, dummies_col_prop: Optional[numpy.ndarray[Any, numpy.dtype[numpy.float64]]] = None)
Bases: object
- D_c: Optional[ndarray[Any, dtype[float64]]] = None
- U: ndarray[Any, dtype[float64]]
- V: ndarray[Any, dtype[float64]]
- column_weights: ndarray[Any, dtype[float64]]
- contributions: Optional[DataFrame] = None
- correlations: Optional[DataFrame] = None
- cos2: Optional[DataFrame] = None
- dummies_col_prop: Optional[ndarray[Any, dtype[float64]]] = None
- dummy_categorical: List[str]
- explained_var: ndarray[Any, dtype[float64]]
- explained_var_ratio: ndarray[Any, dtype[float64]]
- is_fitted: bool = False
- mean: Optional[Series] = None
- modalities_types: Dict[str, str]
- nf: int
- original_categorical: List[str]
- original_continuous: List[str]
- original_dtypes: Series
- prop: Optional[Series] = None
- row_weights: ndarray[Any, dtype[float64]]
- s: Optional[ndarray[Any, dtype[float64]]] = None
- std: Optional[Series] = None
- type: Optional[str] = None
- variable_coord: DataFrame
saiph.famd
FAMD projection module.
- saiph.reduction.famd.center(df: DataFrame, quanti: List[str], quali: List[str]) Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]
Center data, scale it, compute modalities and proportions of each categorical.
Used as internal function during fit.
NB: saiph.reduction.famd.scaler is better suited when a Model is already fitted.
- Parameters
df – DataFrame to center.
quanti – Indices of continuous variables.
quali – Indices of categorical variables.
- Returns
The scaled DataFrame. mean: Mean of the input dataframe. std: Standard deviation of the input dataframe. prop: Proportion of each categorical. _modalities: Modalities for the MCA.
- Return type
df_scale
- saiph.reduction.famd.compute_categorical_cos2(model: Model, df: DataFrame, min_nf: int) DataFrame
Compute the cos2 statistic for categorical variables.
- Parameters
model – model
df – dataframe
min_nf – number of degrees of freedom
- Return type
dataframe of categorical cos2
- saiph.reduction.famd.compute_continuous_cos2(model: Model, scaled_df: DataFrame, min_nf: int, s: ndarray[Any, dtype[float64]], U: ndarray[Any, dtype[float64]]) DataFrame
- saiph.reduction.famd.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None, center: Callable[[DataFrame, List[str], List[str]], Tuple[DataFrame, ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]] = <function center>) Model
Fit a FAMD model on data.
- Parameters
df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])
- Returns
The model for transforming new data.
- Return type
model
- saiph.reduction.famd.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Tuple[DataFrame, Model]
Fit a FAMD model on data and return transformed data.
- Parameters
df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])
- Returns
The transformed data. model: The model for transforming new data.
- Return type
coord
- saiph.reduction.famd.get_variable_contributions(model: Model, df: DataFrame, explode: bool = False) Tuple[DataFrame, DataFrame]
Compute the contributions of the df variables within the fitted space.
- Parameters
model – Model computed by fit.
df – dataframe to compute contributions from
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)
- Returns
tuple of contributions and cos2.
- saiph.reduction.famd.scaler(model: Model, df: DataFrame) DataFrame
Scale data using mean, std, modalities and proportions of each categorical from model.
- Parameters
model – Model computed by fit.
df – DataFrame to scale.
- Returns
The scaled DataFrame.
- Return type
df_scaled
- saiph.reduction.famd.stats(model: Model, df: DataFrame, explode: bool = False) Model
Compute contributions and cos2.
- Parameters
model – Model computed by fit.
df – dataframe to compute statistics from
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)
- Returns
model populated with contribution and cos2.
- Return type
model
- saiph.reduction.famd.transform(df: DataFrame, model: Model, *, scaler: Callable[[Model, DataFrame], DataFrame] = <function scaler>) DataFrame
Scale and project into the fitted numerical space.
- Parameters
df – DataFrame to transform.
model – Model computed by fit.
- Returns
Coordinates of the dataframe in the fitted space.
- Return type
coord
saiph.mca
MCA projection module.
- saiph.reduction.mca.center(df: DataFrame) Tuple[DataFrame, ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]
Center data and compute modalities.
Used as internal function during fit.
NB: saiph.reduction.mca.scaler is better suited when a Model is already fitted.
- Parameters
df – DataFrame to center.
- Returns
The centered DataFrame. _modalities: Modalities for the MCA. row_sum: Sums line by line. column_sum: Sums column by column.
- Return type
df_centered
- saiph.reduction.mca.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Model
Fit an MCA model on data.
- Parameters
df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])
- Returns
The model for transforming new data.
- Return type
model
- saiph.reduction.mca.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Tuple[DataFrame, Model]
Fit an MCA model on data and return transformed data.
- Parameters
df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])
- Returns
The transformed data. model: The model for transforming new data.
- Return type
coord
- saiph.reduction.mca.get_variable_contributions(model: Model, df: DataFrame, explode: bool = False) DataFrame
Compute the contributions of the df variables within the fitted space.
- Parameters
model – Model computed by fit.
df – dataframe to compute contributions from
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)
- Returns
contributions
- saiph.reduction.mca.scaler(model: Model, df: DataFrame) DataFrame
Scale data using modalities from model.
- Parameters
model – Model computed by fit.
df – DataFrame to scale.
- Returns
The scaled DataFrame.
- Return type
df_scaled
- saiph.reduction.mca.stats(model: Model, df: DataFrame, explode: bool = False) Model
Compute the contributions.
- Parameters
model – Model computed by fit.
df – dataframe to compute contributions from in the original space
explode – whether to split the contributions of each modality (True) or sum them as the contribution of the whole variable (False)
- Returns
model populated with contributions.
saiph.pca
PCA projection module.
- saiph.reduction.pca.center(df: DataFrame) Tuple[DataFrame, Series, Series]
Center data and standardize it. Compute mean and std values.
Used as internal function during fit.
NB: saiph.reduction.pca.scaler is better suited when a Model is already fitted.
- Parameters
df – DataFrame to center.
- Returns
The centered DataFrame. mean: Mean of the input dataframe. std: Standard deviation of the input dataframe.
- Return type
df
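Centering and standardizing, as described above, amounts to subtracting each column's mean and dividing by its standard deviation. Below is a minimal NumPy sketch of that idea. It is not saiph's implementation: center_and_scale is a hypothetical name, the population standard deviation (ddof=0) is an assumption, and so is the handling of constant columns.

```python
import numpy as np


def center_and_scale(values: np.ndarray):
    """Center each column on its mean and divide by its standard deviation."""
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    # Assumption: constant columns are left unscaled to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (values - mean) / std, mean, std


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled, mean, std = center_and_scale(data)
```

Keeping mean and std lets new data be scaled with the statistics of the fitted data, which is the role this reference assigns to scaler.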
- saiph.reduction.pca.fit(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Model
Fit a PCA model on data.
- Parameters
df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])
- Returns
The model for transforming new data.
- Return type
model
- saiph.reduction.pca.fit_transform(df: DataFrame, nf: Optional[int] = None, col_weights: Optional[ndarray[Any, dtype[float64]]] = None) Tuple[DataFrame, Model]
Fit a PCA model on data and return transformed data.
- Parameters
df – Data to project.
nf – Number of components to keep. default: min(df.shape)
col_weights – Weight assigned to each variable in the projection (more weight = more importance in the axes). default: np.ones(df.shape[1])
- Returns
The transformed data. model: The model for transforming new data.
- Return type
coord
saiph.svd
- saiph.reduction.utils.svd.SVD(df: DataFrame, svd_flip: bool = True) Tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]
Compute Singular Value Decomposition.
- Parameters
df – Matrix to decompose.
svd_flip – Whether to use svd_flip on U and V or not.
- Returns
Unitary matrix having left singular vectors as columns. s: The singular values. V: Unitary matrix having right singular vectors as rows.
- Return type
U
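A decomposition with the return shape described above can be sketched with NumPy. Singular vectors are only defined up to sign, so a flip step fixes a convention to make results deterministic. The convention used here (the largest-magnitude entry of each column of U is made positive) is an assumption borrowed from common practice, and svd_with_flip is a hypothetical name; the reference does not state which rule saiph applies.

```python
import numpy as np


def svd_with_flip(matrix: np.ndarray, flip: bool = True):
    """Thin SVD with an optional deterministic sign convention."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    if flip:
        # Assumed convention: the largest-magnitude entry of each column of U
        # is made positive; the matching row of Vt gets the same sign.
        max_rows = np.argmax(np.abs(U), axis=0)
        signs = np.sign(U[max_rows, np.arange(U.shape[1])])
        signs[signs == 0] = 1.0
        U = U * signs
        Vt = Vt * signs[:, np.newaxis]
    return U, s, Vt


X = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
U, s, Vt = svd_with_flip(X)
```

Flipping a column of U together with the matching row of Vt leaves the product unchanged, so the input matrix is still recovered from the three factors.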
saiph.visualization
Visualization functions.
- saiph.visualization.plot_circle(model: Model, dimensions: Optional[List[int]] = None, min_cor: float = 0.1, max_var: int = 7) None
Plot correlation circle.
- Parameters
model – The model for transforming new data.
dimensions – Dimensions to display on each axis
min_cor – Minimum correlation threshold to display arrow. default: 0.1
max_var – Number of variables to display (in descending order). default: 7
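The interaction of min_cor and max_var can be sketched as a filter followed by a ranking. The ranking criterion below (largest absolute correlation with either plotted dimension) is an assumption, as are the helper name select_variables and the sample values; the reference only states that a threshold and a maximum count apply.

```python
from typing import Dict, List, Tuple


def select_variables(
    correlations: Dict[str, Tuple[float, float]],
    min_cor: float = 0.1,
    max_var: int = 7,
) -> List[str]:
    """Pick the variables to draw, strongest first."""
    # Assumption: a variable is ranked by its largest absolute correlation
    # with either of the two plotted dimensions.
    strength = {name: max(abs(x), abs(y)) for name, (x, y) in correlations.items()}
    kept = [name for name, value in strength.items() if value >= min_cor]
    kept.sort(key=lambda name: strength[name], reverse=True)
    return kept[:max_var]


correlations = {"age": (0.9, 0.1), "income": (0.05, 0.02), "height": (-0.4, 0.6)}
shown = select_variables(correlations, min_cor=0.1, max_var=2)
```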
- saiph.visualization.plot_explained_var(model: Model, max_dims: int = 10, cumulative: bool = False) None
Plot explained variance per dimension.
- Parameters
model – Model computed by fit.
max_dims – Maximum number of dimensions to plot
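The cumulative flag is not described above. Assuming it switches the plot from per-dimension values to running totals, the quantities involved could be computed as in the sketch below. The singular values are made up, and the relation between singular values and explained variance (proportional to their squares) is the usual PCA convention, taken here as an assumption about the model.

```python
import numpy as np

# Hypothetical singular values of a fitted decomposition.
s = np.array([3.0, 2.0, 1.0])

# Assumption: variance carried by each dimension is proportional to s**2.
explained_var = s ** 2
explained_var_ratio = explained_var / explained_var.sum()
cumulative_ratio = np.cumsum(explained_var_ratio)

# Keep only the first max_dims dimensions, as the plot would.
max_dims = 2
to_plot = cumulative_ratio[:max_dims]
```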
- saiph.visualization.plot_projections(model: Model, data: DataFrame, dim: Tuple[int, int] = (0, 1)) None
Plot projections in reduced space for input data.
- Parameters
model – Model computed by fit.
data – Data to plot in the reduced space
dim – Axes to use for the 2D plot. default: (0, 1)
- saiph.visualization.plot_var_contribution(values: ndarray[Any, dtype[float64]], names: ndarray[Any, dtype[bytes_]], title: str = 'Variables contributions') None
Plot the variable contributions for a given dimension.