innovate.utils package

Submodules

innovate.adopt.categorization module

innovate.adopt.categorization.categorize_adopters(model: DiffusionModel, t: Sequence[float]) → DataFrame

Categorizes adopters based on the fitted diffusion model.

Parameters:
  • model – A fitted diffusion model.

  • t – A sequence of time points.

Returns:

A pandas DataFrame with the adopter categories for each time point.
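
The reference does not say which categorization scheme is used. One widely used scheme is Rogers' five adopter categories, which split the eventual adopter population at cumulative shares of 2.5%, 16%, 50% and 84%. The sketch below assumes that scheme and takes the cumulative adoption share as input; `ROGERS_CUTOFFS` and `rogers_category` are hypothetical names, not part of innovate.

```python
from typing import List, Tuple

# Assumption: Rogers' cumulative cut-offs (share of eventual adopters).
ROGERS_CUTOFFS: List[Tuple[float, str]] = [
    (0.025, "Innovators"),
    (0.16, "Early Adopters"),
    (0.50, "Early Majority"),
    (0.84, "Late Majority"),
]


def rogers_category(share: float) -> str:
    """Map a cumulative adoption share in [0, 1] to a Rogers category."""
    for upper, label in ROGERS_CUTOFFS:
        if share <= upper:
            return label
    return "Laggards"


# Illustrative shares, e.g. model predictions divided by market potential.
shares = [0.01, 0.10, 0.40, 0.70, 0.95]
labels = [rogers_category(s) for s in shares]
```

Here `labels` holds one category per time point, which mirrors the one-row-per-time-point shape described for the returned DataFrame.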

innovate.utils.metrics module

innovate.utils.metrics.calculate_aic(n_params: int, n_samples: int, rss: float) → float

Calculates the Akaike Information Criterion (AIC).

Assumes errors are normally distributed.

innovate.utils.metrics.calculate_bic(n_params: int, n_samples: int, rss: float) → float

Calculates the Bayesian Information Criterion (BIC).

Assumes errors are normally distributed.
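
The reference does not show the formulas behind the two criteria. Under the stated normality assumption, a common least-squares form (with additive constants dropped) is AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), where n is the number of samples and k the number of parameters. The sketch below assumes that form; `gaussian_aic` and `gaussian_bic` are hypothetical names, and the library's values may differ by a constant.

```python
import math


def gaussian_aic(n_params: int, n_samples: int, rss: float) -> float:
    """AIC for a least-squares fit with Gaussian errors, constants dropped."""
    return n_samples * math.log(rss / n_samples) + 2 * n_params


def gaussian_bic(n_params: int, n_samples: int, rss: float) -> float:
    """BIC for a least-squares fit with Gaussian errors, constants dropped."""
    return n_samples * math.log(rss / n_samples) + n_params * math.log(n_samples)


# Same residual sum of squares, different parameter counts (made-up numbers):
# the larger model is penalised, so its score is higher (worse).
aic_small = gaussian_aic(n_params=2, n_samples=50, rss=12.5)
aic_large = gaussian_aic(n_params=5, n_samples=50, rss=12.5)
```

Lower values are better for both criteria. Under this form BIC penalises each parameter by ln(n) rather than 2, so it favours smaller models once n exceeds about 7.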

innovate.utils.metrics.calculate_mae(y_true: Sequence[float], y_pred: Sequence[float]) → float

Calculates the Mean Absolute Error (MAE).

innovate.utils.metrics.calculate_mape(y_true: Sequence[float], y_pred: Sequence[float]) → float

Calculates the Mean Absolute Percentage Error (MAPE).

innovate.utils.metrics.calculate_mse(y_true: Sequence[float], y_pred: Sequence[float]) → float

Calculates the Mean Squared Error (MSE).

innovate.utils.metrics.calculate_r_squared(y_true: Sequence[float], y_pred: Sequence[float]) → float

Calculates the R-squared (coefficient of determination).

innovate.utils.metrics.calculate_rmse(y_true: Sequence[float], y_pred: Sequence[float]) → float

Calculates the Root Mean Squared Error (RMSE).

innovate.utils.metrics.calculate_rss(y_true: Sequence[float], y_pred: Sequence[float]) → float

Calculates the Residual Sum of Squares (RSS).
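
The squared-error metrics in this module are closely related: MSE is RSS divided by the number of samples, RMSE is the square root of MSE, and R-squared compares RSS with the total sum of squares around the mean of y_true. The plain-Python sketch below uses these textbook definitions; the function names are hypothetical and it is not the library's implementation.

```python
import math
from typing import Sequence


def rss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred))


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: RSS averaged over the samples."""
    return rss(y_true, y_pred) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as y."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - RSS/TSS; undefined when y_true is constant (TSS == 0)."""
    mean = sum(y_true) / len(y_true)
    tss = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - rss(y_true, y_pred) / tss


# Made-up data: only the last prediction is off, by 2.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
metrics = {
    "RSS": rss(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "R-squared": r_squared(y_true, y_pred),
}
```

For this data RSS is 4, MSE and RMSE are both 1, and R-squared is about 0.2, because the total sum of squares around the mean 2.5 is 5.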

innovate.utils.metrics.calculate_smape(y_true: Sequence[float], y_pred: Sequence[float]) → float

Calculates the Symmetric Mean Absolute Percentage Error (SMAPE).
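
MAPE and SMAPE differ only in the denominator: MAPE scales each absolute error by |y_true|, while SMAPE scales it by the mean of |y_true| and |y_pred|. SMAPE has several conventions in the literature, and the reference does not say which one is used or whether results are fractions or percentages. The sketch below assumes percentages and the mean-of-magnitudes denominator; `mape` and `smape` are hypothetical names.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when any y_true value is zero (division by zero).
    """
    terms = [abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)]
    return 100.0 * sum(terms) / len(terms)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE, in percent, with (|y| + |y_hat|) / 2 as denominator."""
    terms = [
        abs(a - b) / ((abs(a) + abs(b)) / 2.0)
        for a, b in zip(y_true, y_pred)
    ]
    return 100.0 * sum(terms) / len(terms)


# Made-up data: one 10% over-prediction and one 10% under-prediction.
y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]
mape_value = mape(y_true, y_pred)
smape_value = smape(y_true, y_pred)
```

Both metrics break down when the denominator is zero, which can happen with adoption series that start at zero, so it is worth checking the input before relying on either value.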

innovate.utils.model_evaluation module

innovate.utils.model_evaluation.compare_models(models: Dict[str, DiffusionModel], t_true: Sequence[float], y_true: Sequence[float]) → DataFrame

Compares multiple diffusion models based on various goodness-of-fit metrics.

Parameters:
  • models – A dictionary where keys are model names (str) and values are fitted DiffusionModel instances.

  • t_true – The true time points.

  • y_true – The true cumulative adoption values.

Returns:

A pandas DataFrame containing the comparison metrics for each model.

innovate.utils.model_evaluation.compute_residuals(model: DiffusionModel, t: Sequence[float], y: Sequence[float]) → ndarray

Return the residuals for a fitted model.

innovate.utils.model_evaluation.find_best_model(comparison_df: DataFrame, metric: str = 'RMSE', minimize: bool = True) → Tuple[str, Dict[str, Any]]

Identifies the best performing model from a comparison DataFrame.

Parameters:
  • comparison_df – The DataFrame returned by compare_models.

  • metric – The metric to use for comparison (e.g., 'RMSE', 'R-squared').

  • minimize – If True, the best model has the minimum value for the metric. If False, the best model has the maximum value.

Returns:

A tuple containing the name of the best model and its full results row.
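
The selection rule described above can be sketched in plain Python. The example stands in for the comparison DataFrame with a dictionary of per-model metric rows; `pick_best`, the model labels, and all numbers are made up for illustration and are not taken from innovate.

```python
from typing import Any, Dict, Tuple


def pick_best(
    results: Dict[str, Dict[str, Any]],
    metric: str = "RMSE",
    minimize: bool = True,
) -> Tuple[str, Dict[str, Any]]:
    """Return (name, row) of the model with the best value for `metric`."""
    chooser = min if minimize else max
    best_name = chooser(results, key=lambda name: results[name][metric])
    return best_name, results[best_name]


# Hypothetical comparison results: model name -> metrics row.
results = {
    "model_a": {"RMSE": 4.2, "R-squared": 0.95},
    "model_b": {"RMSE": 5.1, "R-squared": 0.97},
}

# Error metrics are minimised; R-squared must be maximised instead.
best_by_rmse, _ = pick_best(results, metric="RMSE", minimize=True)
best_by_r2, _ = pick_best(results, metric="R-squared", minimize=False)
```

The point of the `minimize` flag is visible here: leaving it at True while ranking by R-squared would select the worst-fitting model.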

innovate.utils.model_evaluation.get_fit_metrics(model: DiffusionModel, t: Sequence[float], y: Sequence[float]) → Dict[str, float]

Calculates various goodness-of-fit metrics for a model.

Parameters:
  • model – The fitted diffusion model.

  • t – The time points.

  • y – The true cumulative adoption values.

Returns:

A dictionary containing the calculated metrics.

innovate.utils.model_evaluation.model_aic(model: DiffusionModel, t: Sequence[float], y: Sequence[float]) → float

Return the Akaike Information Criterion for a fitted model.

innovate.utils.model_evaluation.model_bic(model: DiffusionModel, t: Sequence[float], y: Sequence[float]) → float

Return the Bayesian Information Criterion for a fitted model.

innovate.utils.model_evaluation.residual_acf(model: DiffusionModel, t: Sequence[float], y: Sequence[float], nlags: int = 40) → ndarray

Return the autocorrelation function of model residuals.
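
For orientation, the sample autocorrelation at lag k is the lag-k autocovariance of the residuals divided by their lag-0 autocovariance. The plain-Python sketch below uses that textbook definition; `sample_acf` is a hypothetical name and the capping of `nlags` is an assumption, not documented behaviour of this function.

```python
from typing import List, Sequence


def sample_acf(residuals: Sequence[float], nlags: int) -> List[float]:
    """Sample autocorrelation for lags 0..nlags (capped at n - 1)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    # Lag-0 sum of squares; zero for constant residuals, where ACF is undefined.
    denom = sum(c * c for c in centered)
    max_lag = min(nlags, n - 1)
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Made-up residuals that alternate in sign: strong negative lag-1 correlation.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
acf = sample_acf(residuals, nlags=2)
```

Values far from zero at low lags suggest the model is missing structure in the data. With short adoption series, a lag count well below the default of 40 may be more appropriate.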

innovate.utils.model_evaluation.residual_pacf(model: DiffusionModel, t: Sequence[float], y: Sequence[float], nlags: int = 40) → ndarray

Return the partial autocorrelation function of model residuals.

innovate.utils.preprocessing module

innovate.utils.preprocessing.aggregate_time_series(data: Series | DataFrame, freq: str) → Series | DataFrame

Aggregates time series data to a specified frequency (e.g., 'D', 'W', 'M').

innovate.utils.preprocessing.apply_rolling_average(data: Series, window: int) → Series

Applies a rolling average to a time series.

Parameters:
  • data – A pandas Series.

  • window – The size of the rolling window.

Returns:

A pandas Series with the rolling average applied.
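
As an illustration of what a rolling average does, the plain-Python sketch below computes a trailing mean over the last `window` values. Whether the library uses a trailing or centered window, and how it treats the first incomplete windows, is not stated here; the sketch leaves those positions as None, similar to the NaN values that pandas `Series.rolling(window).mean()` produces by default. `trailing_mean` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def trailing_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Mean of the last `window` values; None until a full window exists."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


# Made-up per-period adoption counts.
smoothed = trailing_mean([2.0, 4.0, 6.0, 8.0], window=2)
```

A larger window smooths more aggressively but also delays the apparent timing of peaks, which matters when the smoothed series is later fitted with a diffusion model.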

innovate.utils.preprocessing.apply_sarima(data: Series, order: Tuple[int, int, int], seasonal_order: Tuple[int, int, int, int]) → Series

Fits a SARIMA model to a time series and returns the fitted values.

Parameters:
  • data – A pandas Series.

  • order – The (p, d, q) order of the model: the number of AR parameters, the number of differences, and the number of MA parameters.

  • seasonal_order – The (P, D, Q, s) seasonal order of the model.

Returns:

A pandas Series with the fitted values from the SARIMA model.

innovate.utils.preprocessing.apply_stl_decomposition(data: Series, period: int = None, robust: bool = True) → Tuple[Series, Series, Series]

Applies Seasonal-Trend decomposition using Loess (STL) to a time series.

Parameters:
  • data – A pandas Series with a DatetimeIndex.

  • period – The period of the seasonality. If None, the function attempts to infer it from the data.

  • robust – Whether to use robust fitting (less sensitive to outliers).

Returns:

A tuple of (trend, seasonal, residuals) as pandas Series.

innovate.utils.preprocessing.cumulative_sum(data: Sequence[float]) → ndarray

Calculates the cumulative sum of a sequence.
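
The evaluation functions in this package expect cumulative adoption values, so per-period counts usually need to be accumulated first. The sketch below shows the operation with the standard library on made-up counts; the function documented above returns a NumPy array rather than a list.

```python
from itertools import accumulate

# Made-up new adopters per period.
per_period = [5.0, 12.0, 30.0, 18.0]

# Running total: element i is the sum of per_period[0..i].
cumulative = list(accumulate(per_period))
```

The result is `[5.0, 17.0, 47.0, 65.0]`, the same values `numpy.cumsum` would give for this input.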

innovate.utils.preprocessing.ensure_datetime_index(data: Series | DataFrame) → Series | DataFrame

Ensures a pandas Series or DataFrame has a datetime index.

Module contents