# Data
library(diversedata) # Diverse Data Hub data sets
# Core libraries
library(tidyverse)
library(lubridate)
# Spatial & mapping
library(sf)
library(terra)
library(ggmap)
library(ggspatial)
library(maptiles)
library(leaflet)
library(leaflet.extras)
# Visualization & color
library(viridis)
# Tables & reporting
library(gt)
library(kableExtra)
# Modeling & interpretation
library(marginaleffects)
library(broom)
Historical Alberta Wildfire Data
About the Data
This data set contains information on wildfires in Alberta, Canada, compiled from official government sources under the Open Government Licence – Alberta.
The data was gathered to monitor, assess, and respond to wildfire risks across different regions. Wildfires have far-reaching environmental, social, and economic consequences. From an equity and inclusion perspective, analyzing wildfire data can reveal geographic and resource-based disparities in detection and containment efforts, and highlight how certain populations face greater risks due to climate change and limited infrastructure.
In particular, Alberta experiences some of the most severe and frequent wildfires in Canada due to its vast forested areas, dry climate, and increasing temperatures linked to climate change. Wildfires in Alberta can lead to widespread evacuations, destroy homes and livelihoods, and disproportionately affect rural and Indigenous communities, who may lack access to adequate emergency services and infrastructure. Understanding the patterns of wildfire occurrence and spread helps policymakers, environmental planners, and emergency services allocate resources more equitably and implement effective mitigation strategies. This data set enables data-driven approaches to reduce the impact of wildfires and support more resilient and inclusive disaster management practices across Alberta and beyond.
Metadata
Variables
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
year | ID | Integer | Year of the wildfire incident | Year | No |
fire_number | ID | String | Unique identifier for each wildfire | - | No |
current_size | Feature | Numeric | Final estimated size of the wildfire | Hectares | No |
size_class | Feature | Categorical | Size classification based on fire area | - | No |
latitude | Feature | Numeric | Latitude coordinate of fire origin | Degrees | No |
longitude | Feature | Numeric | Longitude coordinate of fire origin | Degrees | No |
fire_origin | Metadata | Categorical | General location or region where fire started | - | No |
general_cause | Feature | Categorical | Broad cause classification of the fire | - | Yes |
responsible_group | Metadata | Categorical | Agency or group responsible for managing the fire | - | Yes |
activity_class | Feature | Categorical | Activity classification at the time of ignition | - | Yes |
true_cause | Feature | Categorical | Detailed fire cause (e.g., Arson Known, Hot Exhaust, Line Impact, Unattended Fire, etc.) | - | No |
fire_start_date | Time | Date | Date the fire started | YYYY-MM-DD | No |
detection_agent_type | Feature | Categorical | Type of detection method used (e.g., lookout, patrol, aircraft) | - | No |
detection_agent | Feature | Categorical | Specific agent who detected the fire | - | Yes |
assessment_hectares | Feature | Numeric | Officially assessed size of the fire | Hectares | No |
fire_spread_rate | Feature | Numeric | Rate at which the fire spread | Hectares/hour | Yes |
fire_type | Feature | Categorical | Fire behavior classification (e.g., surface, crown, ground) | - | No |
fire_position_on_slope | Feature | Categorical | Position of the fire on slope (e.g., bottom, mid-slope, ridge) | - | Yes |
weather_conditions_over_fire | Feature | Text | Description of weather at the fire location | - | Yes |
temperature | Feature | Numeric | Temperature at the fire location | °C | Yes |
relative_humidity | Feature | Numeric | Relative humidity at the fire location | % | Yes |
wind_direction | Feature | Categorical | Wind direction during the fire | - | Yes |
wind_speed | Feature | Numeric | Wind speed during the fire | km/h | Yes |
fuel_type | Feature | Categorical | Dominant vegetation or material burned (e.g., grass, timber) | - | Yes |
initial_action_by | Metadata | Categorical | Group that initiated suppression efforts | - | Yes |
ia_arrival_at_fire_date | Time | DateTime | Time when initial action crews arrived | YYYY-MM-DD | Yes |
ia_access | Feature | Categorical | Level of access for initial attack teams (e.g., road, helicopter only) | - | Yes |
fire_fighting_start_date | Time | DateTime | Time when firefighting activities officially started | YYYY-MM-DD | Yes |
fire_fighting_start_size | Feature | Numeric | Fire size at the time firefighting began | Hectares | Yes |
bucketing_on_fire | Feature | Binary | Whether aerial bucketing was used on the fire | Yes/No | Yes |
first_bh_date | Time | DateTime | Date of first blackline containment | YYYY-MM-DD | Yes |
first_bh_size | Feature | Numeric | Fire size at time of first blacklining | Hectares | Yes |
first_uc_date | Time | DateTime | Date when fire was first declared under control | YYYY-MM-DD | Yes |
first_uc_size | Feature | Numeric | Fire size when first declared under control | Hectares | Yes |
first_ex_size_perimeter | Feature | Numeric | Estimated fire perimeter at the time of first extinguishment | Kilometers | Yes |
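The Missing Values column above can be checked directly against the data. The following is a minimal sketch that counts missing values per variable. It runs on a small toy tibble (`wildfire_toy`, with made-up values) rather than the real data set, so the names `wildfire_toy` and `missing_summary` are assumptions for illustration only; the same pipeline should apply to the full data frame.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the wildfire data (hypothetical values, illustration only)
wildfire_toy <- tibble(
  fire_number   = c("F001", "F002", "F003"),
  current_size  = c(0.1, 12.5, 480),
  general_cause = c("Lightning", NA, "Recreation"),
  temperature   = c(21.5, NA, NA)
)

# Count missing values in every column, then reshape to one row per variable
missing_summary <- wildfire_toy |>
  summarise(across(everything(), ~ sum(is.na(.x)))) |>
  pivot_longer(everything(), names_to = "variable", values_to = "n_missing") |>
  arrange(desc(n_missing))

missing_summary
```

Variables listed with "Yes" in the table would be expected to show a non-zero count when this is run on the real data.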
Key Features of the Data Set
Each row represents a single wildfire incident and includes information such as:
- temperature – The recorded air temperature (°C) at or near the fire location; higher temperatures often increase fire intensity and spread.
- wind_speed – Speed of wind (km/h) during the fire; stronger winds can accelerate fire spread and complicate suppression.
- relative_humidity – The percentage of moisture in the air; lower humidity typically increases fire risk by drying out vegetation.
- fire_spread_rate – The rate at which the fire expanded (e.g., hectares/hour); reflects the fire’s growth dynamics.
- fire_type – Classification of fire behavior (e.g., surface, crown); influences how fires are managed and controlled.
- fuel_type – The dominant type of vegetation or material burned (e.g., grass, timber); determines fire intensity and burn characteristics.
- ia_access – Indicator of how easily suppression crews could access the fire location; limited access can delay response.
- latitude – Geographic latitude coordinate of the fire’s origin; used for spatial analysis and regional modeling.
- longitude – Geographic longitude coordinate of the fire’s origin; used alongside latitude for location-specific insights.
Purpose and Use Cases
This data set is designed to support analysis of:
- Factors contributing to the spread, intensity, and size of wildfires
- The impact of weather conditions and fuel types on fire behavior
- Geographic and seasonal patterns in wildfire occurrence
- The effectiveness and timeliness of initial suppression efforts
- Relationships between fire causes, detection methods, and responsible parties
Case Study
Objective
Large wildfires pose serious environmental, social, and economic challenges, especially as climate conditions become more extreme. Identifying the key environmental and human factors linked to these fires can help guide more effective prevention and response strategies.
So, our main question is:
Can we identify the environmental and human factors most associated with large wildfires?
According to Natural Resources Canada, wildfires exceeding 200 hectares in final size are classified as “large fires.” While these fires represent a small percentage of all wildfires, they account for the majority of the total area burned annually.
The goal is to explore potential predictors of fire size, such as weather, fire cause, and detection method, and provide insights that could inform early interventions and resource planning.
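The 200-hectare threshold described above translates into a simple flag, and the claim that large fires are few but account for most of the burned area can be checked with a grouped summary. This is a minimal sketch on a toy tibble; the values in `wildfire_toy` are made up for illustration and are not results from the real data set.

```r
library(dplyr)

# Toy stand-in for the wildfire data (hypothetical sizes in hectares)
wildfire_toy <- tibble(
  fire_number  = c("F001", "F002", "F003", "F004", "F005"),
  current_size = c(0.1, 2, 15, 150, 1200)
)

large_fire_summary <- wildfire_toy |>
  # Flag fires whose final size exceeds 200 ha
  mutate(large_fire = current_size > 200) |>
  group_by(large_fire) |>
  summarise(n_fires = n(), area_burned = sum(current_size)) |>
  # summarise() drops the grouping here, so these shares are over all fires
  mutate(
    share_of_fires = n_fires / sum(n_fires),
    share_of_area  = area_burned / sum(area_burned)
  )

large_fire_summary
```

Comparing `share_of_fires` with `share_of_area` for the `large_fire == TRUE` row shows how concentrated the burned area is in a few incidents.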
Analysis
Loading Libraries
1. Data Cleaning & Processing
- Converted fire size to numeric
- Created a binary variable large_fire (TRUE if >200 ha)
- Filtered out incomplete records
# Reading Data
wildfire_data <- wildfire

# Clean and prepare base data
wildfire_clean <- wildfire_data |>
  # Keep only records with a positive assessed size
  filter(!is.na(assessment_hectares), assessment_hectares > 0) |>
  mutate(
    # Flag large fires (final size above 200 ha)
    large_fire = current_size > 200,
    true_cause = as.factor(true_cause),
    detection_agent_type = as.factor(detection_agent_type),
    temperature = as.numeric(temperature),
    wind_speed = as.numeric(wind_speed)
  )

# Drop unused levels for modeling
wildfire_clean <- wildfire_clean |>
  filter(!is.na(true_cause), !is.na(detection_agent_type)) |>
  mutate(
    true_cause = droplevels(true_cause),
    detection_agent_type = droplevels(detection_agent_type)
  )
2. Exploratory Data Analysis
Map of Wildfire Size and Location in Alberta
This interactive map displays the geographic distribution and relative size of wildfires across Alberta, using red circles sized by fire area. Each point represents a wildfire event, with larger circles indicating more extensive burns. The map reveals regions with concentrated wildfire activity and visually emphasizes differences in fire magnitude across the province.
Note
To provide geographic context for our wildfire data, we added a shapefile representing Alberta’s boundaries.
This shapefile was sourced from the Alberta Government Open Data Portal and specifically corresponds to the Electoral Division Shapefile (Bill 33, 2017).
The data was processed and transformed to the appropriate geographic coordinate system to enable mapping alongside our wildfire data set.
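The map code relies on two spatial objects, `wildfire_sf` and `alberta_shape`, whose construction is not shown. The following is a minimal sketch of how they might be built with `sf`, assuming the fire coordinates are in WGS84 longitude/latitude (EPSG:4326), which is what leaflet expects. The toy tibble, the shapefile path in the comment, and the rectangular stand-in for the boundary are all hypothetical.

```r
library(sf)
library(dplyr)

# Toy stand-in for wildfire_clean (hypothetical values, illustration only)
wildfire_toy <- tibble(
  fire_number  = c("F001", "F002"),
  current_size = c(0.5, 350),
  latitude     = c(55.1, 57.3),
  longitude    = c(-114.2, -111.6)
)

# Turn the coordinate columns into point geometries in WGS84
wildfire_sf <- wildfire_toy |>
  filter(!is.na(latitude), !is.na(longitude)) |>
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

# With the real shapefile, something like this would apply (path is hypothetical):
# alberta_shape <- st_read("path/to/alberta_boundaries.shp") |> st_transform(4326)

# Stand-in boundary: a rough bounding rectangle around Alberta.
# st_transform() is a no-op here, but is needed when the source file
# uses a projected coordinate system.
alberta_shape <- st_bbox(
  c(xmin = -120, ymin = 49, xmax = -110, ymax = 60),
  crs = st_crs(4326)
) |>
  st_as_sfc() |>
  st_as_sf() |>
  st_transform(4326)
```

Both objects end up in the same coordinate system, so they can be passed to `addPolygons()` and `addCircles()` together.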
# Interactive map: Alberta boundaries plus one circle per wildfire
leaflet() |>
  addProviderTiles("CartoDB.Positron") |>
  setView(lng = -115, lat = 55, zoom = 5.5) |>
  addPolygons(data = alberta_shape,
              color = "#CCCCCC",
              weight = 0.5,
              fillOpacity = 0.02,
              group = "Alberta Boundaries") |>
  # Circle radius grows with the square root of fire size
  addCircles(data = wildfire_sf,
             radius = ~sqrt(current_size) * 30,
             fillOpacity = 0.6,
             color = "red",
             stroke = FALSE,
             group = "Wildfires") |>
  addLayersControl(overlayGroups = c("Alberta Boundaries", "Wildfires"),
                   options = layersControlOptions(collapsed = FALSE)) |>
  addLegend(position = "bottomright",
            title = "Wildfire Size (approx.)",
            colors = "red",
            labels = "Larger = Bigger fire")