# Data
library(diversedata) # Diverse Data Hub data sets
# Core libraries
library(tidyverse)
library(lubridate)
# Spatial & mapping
library(sf)
library(terra)
library(ggmap)
library(ggspatial)
library(maptiles)
library(leaflet)
library(leaflet.extras)
# Visualization & color
library(viridis)
# Tables & reporting
library(gt)
library(kableExtra)
# Modeling & interpretation
library(marginaleffects)
library(broom)
Key Features of the Data Set
Each row represents a single wildfire incident and includes information such as:
- Environmental conditions (temperature, wind_speed, relative_humidity) – Air temperature (°C), wind speed (km/h), and relative humidity (%) near the fire; these influence fire intensity and spread.
- Fire behavior metrics (fire_spread_rate, fire_type, fuel_type) – Fire expansion rate (ha/h), behavior class (surface, crown), and dominant fuel; these determine growth and management.
- Access difficulty (ia_access) – Ease of crew access; low access delays response.
- Location coordinates (latitude, longitude) – Decimal degrees of the fire origin; used for mapping and spatial analysis.
Purpose and Use Cases
This data set is designed to support analysis of:
- Factors contributing to the spread, intensity, and size of wildfires
- The impact of weather conditions and fuel types on fire behavior
- Geographic and seasonal patterns in wildfire occurrence
- The effectiveness and timeliness of initial suppression efforts
- Relationships between fire causes, detection methods, and responsible parties
Case Study
Objective
Large wildfires pose serious environmental, social, and economic challenges, especially as climate conditions become more extreme. Identifying the key environmental and human factors linked to these fires can help guide more effective prevention and response strategies.
So, our main question is:
Can we identify the environmental and human factors most associated with large wildfires?
According to Natural Resources Canada, wildfires exceeding 200 hectares in final size are classified as “large fires.” While these fires represent a small percentage of all wildfires, they account for the majority of the total area burned annually.
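To make the 200-hectare threshold concrete, here is a minimal sketch using made-up fire sizes (the values and the `toy_fires` name are illustrative assumptions, not drawn from the data set). It flags large fires and compares their share of incidents with their share of the total area burned.

```r
# Hypothetical fire sizes in hectares (made-up values, not from the data set)
toy_fires <- data.frame(
  fire_id = 1:8,
  current_size = c(0.1, 0.5, 2, 10, 45, 150, 800, 5000)
)

# Flag fires above the 200 ha "large fire" threshold
toy_fires$large_fire <- toy_fires$current_size > 200

# Share of incidents that are large
share_of_fires <- mean(toy_fires$large_fire)

# Share of total burned area contributed by those large fires
share_of_area <- sum(toy_fires$current_size[toy_fires$large_fire]) /
  sum(toy_fires$current_size)

share_of_fires
share_of_area
```

In this toy example, a quarter of the fires are large, yet they account for well over 90% of the area burned, which mirrors the pattern described above.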
The goal is to explore potential predictors of fire size, such as weather, fire cause, and detection method, and provide insights that could inform early interventions and resource planning.
Analysis
Loading Libraries
1. Data Cleaning & Processing
- Converted fire size to numeric
- Created a binary variable large_fire (TRUE if >200 ha)
- Filtered out incomplete records
# Reading Data
wildfire_data <- wildfire

# Clean and prepare base data
wildfire_clean <- wildfire_data |>
  filter(!is.na(assessment_hectares), assessment_hectares > 0) |>
  mutate(
    large_fire = current_size > 200,
    true_cause = as.factor(true_cause),
    detection_agent_type = as.factor(detection_agent_type),
    temperature = as.numeric(temperature),
    wind_speed = as.numeric(wind_speed)
  )

# Drop unused levels for modeling
wildfire_clean <- wildfire_clean |>
  filter(!is.na(true_cause), !is.na(detection_agent_type)) |>
  mutate(
    true_cause = droplevels(true_cause),
    detection_agent_type = droplevels(detection_agent_type)
  )
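The droplevels() step matters because filtering rows does not remove factor levels that no longer occur, and empty levels can cause problems in later modeling. The sketch below illustrates this with base R on a tiny made-up data frame (the category values and the `toy` and `kept` names are hypothetical).

```r
# Hypothetical records; category values are made up for illustration
toy <- data.frame(
  true_cause = factor(c("Unsafe Fire", "Arson", "Unsafe Fire", NA)),
  detection_agent_type = factor(c("LKT", NA, "AIR", "GRP"))
)

# Keep only rows where both factors are present
kept <- toy[!is.na(toy$true_cause) & !is.na(toy$detection_agent_type), ]

# The factor still remembers "Arson" even though no remaining row uses it
levels_before <- levels(kept$true_cause)

# droplevels() discards levels with zero remaining observations
kept$true_cause <- droplevels(kept$true_cause)
levels_after <- levels(kept$true_cause)

levels_before
levels_after
```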
2. Exploratory Data Analysis
Map of Wildfire Size and Location in Alberta
This interactive map displays the geographic distribution and relative size of wildfires across Alberta, using red circles sized by fire area. Each point represents a wildfire event, with larger circles indicating more extensive burns. The map reveals regions with concentrated wildfire activity and visually emphasizes differences in fire magnitude across the province.
Note
To provide geographic context for our wildfire data, we added a shapefile representing Alberta’s boundaries.
This shapefile was sourced from the Alberta Government Open Data Portal and specifically corresponds to the Electoral Division Shapefile (Bill 33, 2017).
The data was processed and transformed to the appropriate geographic coordinate system to enable mapping alongside our wildfire data set.
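The preparation code for the spatial layers is not shown in this section. As a rough sketch of what such a step could look like, the example below builds a point layer from latitude and longitude columns and reprojects a boundary polygon to the same coordinate system. All object names (`toy_points`, `toy_sf`, `toy_boundary`) and coordinates are made up, and the rectangular boundary is only a stand-in for the real shapefile. It assumes the coordinates are WGS84 decimal degrees (EPSG:4326), which is what leaflet expects.

```r
library(sf)

# Hypothetical ignition points (made-up coordinates and sizes)
toy_points <- data.frame(
  current_size = c(0.5, 320, 12),
  longitude = c(-114.1, -111.4, -117.3),
  latitude = c(51.0, 56.7, 55.2)
)

# Turn the decimal-degree columns into point geometries in WGS84
toy_sf <- st_as_sf(
  toy_points,
  coords = c("longitude", "latitude"),
  crs = 4326,
  remove = FALSE  # keep the original coordinate columns
)

# Stand-in boundary: a simple rectangle, stored in a projected CRS
# (EPSG:3857) to mimic a shapefile that is not in longitude/latitude
toy_boundary <- st_as_sfc(
  st_bbox(c(xmin = -120, ymin = 49, xmax = -110, ymax = 60),
          crs = st_crs(4326))
)
toy_boundary_projected <- st_transform(toy_boundary, 3857)

# Reproject the boundary so it matches the point layer before mapping
toy_boundary_wgs84 <- st_transform(toy_boundary_projected, st_crs(toy_sf))
```

With a real shapefile, st_read() would take the place of the stand-in rectangle; the st_transform() call is the part that aligns the two layers.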
# map
leaflet() |>
  addProviderTiles("CartoDB.Positron") |>
  setView(lng = -115, lat = 55, zoom = 5.5) |>
  addPolygons(data = alberta_shape,
              color = "#CCCCCC",
              weight = 0.5,
              fillOpacity = 0.02,
              group = "Alberta Boundaries") |>
  addCircles(data = wildfire_sf,
             radius = ~sqrt(current_size) * 30,
             fillOpacity = 0.6,
             color = "red",
             stroke = FALSE,
             group = "Wildfires") |>
  addLayersControl(overlayGroups = c("Alberta Boundaries", "Wildfires"),
                   options = layersControlOptions(collapsed = FALSE)) |>
  addLegend(position = "bottomright",
            title = "Wildfire Size (approx.)",
            colors = "red",
            labels = "Larger = Bigger fire")
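The radius in the map code uses a square root of the fire size. A short check with made-up sizes suggests why: because a circle's area grows with the square of its radius, taking the square root makes the drawn area proportional to the fire size, so very large fires do not swamp the map. The variable names below are illustrative only.

```r
# Hypothetical fire sizes in hectares
sizes_ha <- c(1, 100, 10000)

# Same scaling as the map: the radius grows with the square root of size
radius_m <- sqrt(sizes_ha) * 30

# Area of each drawn circle
circle_area <- pi * radius_m^2

# Drawn area relative to the smallest fire tracks the size ratio
area_ratio <- circle_area / circle_area[1]
size_ratio <- sizes_ha / sizes_ha[1]

radius_m
all.equal(area_ratio, size_ratio)
```

Scaling the radius directly by current_size instead would make a fire 100 times larger appear 10,000 times larger in area.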