Demographic Modelling

The src.demographic_modelling module provides tools to simulate demographic changes in a population over time. This is a key part of the "Historical Analysis Enhancements" outlined in the project roadmap.

The initial version of this module focuses on two key aspects of demographic change:

  1. Aging: All individuals in the population are aged forward by one year.
  2. Fertility: New individuals (babies) are added to the population based on age-specific fertility rates.

Key Functions

age_population_forward()

This is the main function for performing the demographic simulation for a single year.

src.demographic_modelling.age_population_forward(df, year)

Ages a population DataFrame forward by one year and simulates births.

This function performs two main operations:

  1. Increments the 'age' of every individual in the DataFrame by 1.
  2. Simulates new births based on age-specific fertility rates for the given year.

It assumes the presence of 'age', 'sex', and 'family_id' columns. 'sex' is expected to be 'Male' or 'Female'.

Parameters:

Name | Type | Description | Default
df | DataFrame | The input population as a pandas DataFrame. | required
year | int | The starting year of the population. The function will simulate events to create the population for year + 1. | required

Returns:

Type | Description
DataFrame | A new pandas DataFrame representing the population in the next year.

Source code in src/demographic_modelling.py
def age_population_forward(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """
    Ages a population DataFrame forward by one year and simulates births.

    This function performs two main operations:
    1.  Increments the 'age' of every individual in the DataFrame by 1.
    2.  Simulates new births based on age-specific fertility rates for the
        given year. It assumes the presence of 'age', 'sex', and 'family_id'
        columns. 'sex' is expected to be 'Male' or 'Female'.

    Args:
        df: The input population as a pandas DataFrame.
        year: The starting year of the population. The function will simulate
              events to create the population for year + 1.

    Returns:
        A new pandas DataFrame representing the population in the next year.
    """
    print(f"Aging population from {year} to {year + 1}...")

    # 1. Age the existing population
    aged_df = df.copy()
    aged_df["age"] = aged_df["age"] + 1

    # 2. Simulate births
    fertility_data = get_fertility_data()
    year_str = str(year)

    if year_str not in fertility_data:
        print(f"Warning: No fertility data for year {year}. No births will be simulated.")
        return aged_df

    rates_for_year = fertility_data[year_str]

    women_of_childbearing_age = aged_df[(aged_df["sex"] == "Female") & (aged_df["age"] >= 15) & (aged_df["age"] <= 49)]

    new_births = []

    for _, woman in women_of_childbearing_age.iterrows():
        fertility_rate = _get_rate_for_age(woman["age"], rates_for_year)
        if random.random() < fertility_rate:  # nosec B311
            # A birth occurs!
            new_baby = {
                # Inherit family-level characteristics
                "family_id": woman["family_id"],
                "region": woman.get("region", "Unknown"),
                # Baby-specific characteristics
                "age": 0,
                "sex": random.choice(["Male", "Female"]),  # nosec B311
                # Assume babies have no income or assets initially
                "income": 0,
                "assets": 0,
            }
            # Add other columns with default values if they exist in the dataframe
            for col in aged_df.columns:
                if col not in new_baby:
                    new_baby[col] = 0

            new_births.append(new_baby)

    if new_births:
        print(f"Simulated {len(new_births)} new births.")
        babies_df = pd.DataFrame(new_births)
        final_df = pd.concat([aged_df, babies_df], ignore_index=True)
    else:
        print("No births were simulated.")
        final_df = aged_df

    return final_df
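
Because the function assumes the 'age', 'sex', and 'family_id' columns and treats only the exact string 'Female' as eligible for births, it can help to check the input before simulating. The following is a minimal sketch of such a check; validate_population, REQUIRED_COLUMNS, and VALID_SEX_VALUES are hypothetical names and are not part of src.demographic_modelling.

```python
import pandas as pd

# Hypothetical helper, not part of the module: the column names and the
# allowed 'sex' values are taken from the function's docstring.
REQUIRED_COLUMNS = ("age", "sex", "family_id")
VALID_SEX_VALUES = {"Male", "Female"}


def validate_population(df: pd.DataFrame) -> None:
    """Raise ValueError if df lacks the columns or values the function expects."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Population is missing required columns: {missing}")

    # Rows with any other 'sex' value would silently never give birth.
    unexpected = set(df["sex"].dropna().unique()) - VALID_SEX_VALUES
    if unexpected:
        raise ValueError(
            f"Unexpected values in 'sex': {sorted(map(str, unexpected))}"
        )


# Example: a two-person population that passes the check.
population = pd.DataFrame(
    {"age": [30, 32], "sex": ["Female", "Male"], "family_id": [1, 1]}
)
validate_population(population)
```

Running a check like this first turns a silent outcome (no births because 'sex' is coded as 'F') into an explicit error.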

Data Source for Fertility Rates

The simulation of births relies on age-specific fertility rates. The get_fertility_data() function currently loads this data from a placeholder file: src/data/fertility_rates.json.

Important: The data in this file is for demonstration purposes only and is not based on real, comprehensive historical data. For accurate research, this data should be replaced with official data from a source like Stats NZ Infoshare.
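
The layout of fertility_rates.json and the implementation of _get_rate_for_age() are not shown on this page. The source only confirms that the top-level keys are year strings (for example "1990"). The sketch below assumes, purely for illustration, that each year maps age-band labels such as "15-19" to an annual birth probability. The band labels, the rate values, and the lookup_rate helper are all hypothetical, so adapt them to the real file before relying on them.

```python
import json

# Hypothetical file contents. The band labels and rates are invented for
# illustration and are NOT real fertility data.
EXAMPLE_JSON = """
{
  "1990": {"15-19": 0.03, "20-29": 0.12, "30-39": 0.08, "40-49": 0.01}
}
"""


def lookup_rate(age: int, rates_for_year: dict) -> float:
    """Return the rate of the band containing age, or 0.0 if none matches.

    Hypothetical stand-in for _get_rate_for_age(), assuming "low-high" keys.
    """
    for band, rate in rates_for_year.items():
        low, high = (int(part) for part in band.split("-"))
        if low <= age <= high:
            return float(rate)
    return 0.0


fertility_data = json.loads(EXAMPLE_JSON)
rates_1990 = fertility_data["1990"]  # keyed by year string, as in the source

print(lookup_rate(25, rates_1990))
print(lookup_rate(50, rates_1990))
```

Under this assumed layout, replacing the placeholder with official data would mean regenerating the JSON with one entry per year and one rate per age band.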

Current Limitations & Future Work

This is the first version of the demographic modelling module and has several limitations that could be addressed in future work:

  • Mortality: The model does not yet simulate deaths.
  • Family Formation: The model only adds children to existing families; it does not simulate the formation of new family units.
  • Migration: The model does not account for immigration or emigration.
  • Data: The underlying fertility data is a placeholder.

Usage Example

A full working example is available in examples/run_demographic_simulation.py. The basic process is to load a population DataFrame for a given year and pass it to age_population_forward() to obtain the population for the following year.

# Example snippet from examples/run_demographic_simulation.py

population_1991 = age_population_forward(
    df=population_1990,
    year=1990
)
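
To project several years, the single-year call can be chained, feeding each result into the next step. The sketch below is one way to do that; simulate_years is a hypothetical helper, not part of the module. It takes the step function as an argument so the loop stays independent of the module, and it optionally seeds Python's global random module, which the source uses for birth draws, so that runs are repeatable.

```python
import random
from typing import Callable, Dict, Optional, TypeVar

P = TypeVar("P")  # the population type, e.g. a pandas DataFrame


def simulate_years(
    population: P,
    start_year: int,
    n_years: int,
    step: Callable[[P, int], P],
    seed: Optional[int] = None,
) -> Dict[int, P]:
    """Apply step once per year and return a snapshot for every year.

    Hypothetical helper: step is expected to behave like
    age_population_forward(df, year), returning the population for year + 1.
    """
    if n_years < 0:
        raise ValueError("n_years must be non-negative")
    if seed is not None:
        random.seed(seed)  # makes the random birth draws repeatable

    snapshots = {start_year: population}
    current = population
    for year in range(start_year, start_year + n_years):
        current = step(current, year)
        snapshots[year + 1] = current
    return snapshots


# With the real module this might be called as:
#   from src.demographic_modelling import age_population_forward
#   snapshots = simulate_years(population_1990, 1990, 5,
#                              age_population_forward, seed=42)
#   population_1995 = snapshots[1995]
```

Note that for any year missing from the fertility data, the function prints a warning and only ages the population, so a long projection may silently contain years without births.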