Diverse Data Hub

    Diverse Data Hub is an open educational resource offering curated data sets focused on equity, diversity, inclusion, and other socially relevant topics. It is designed to support students, educators, and researchers in accessing and working with meaningful data in their teaching, learning, and analysis.

    Data sets are available through the diversedata R package, allowing for straightforward integration into data science workflows. Each data set includes detailed documentation and contextual background to support informed exploration and connection to real-world topics. Example case studies are also included to illustrate practical applications.


    Featured Data Sets

    Women’s March Madness

    This data set tracks every NCAA Division I Women’s Basketball Tournament appearance from 1982 through 2018. It includes team seeds, results, bid types, season and conference records, and regional placements. It is useful for analyzing team success, the impact of seeding, conference strength, and historical trends in women’s college basketball over nearly four decades.


    BC Indigenous Businesses

    The BC Indigenous Business Listings data set, compiled in 2025 by the Government of British Columbia, offers a detailed snapshot of Indigenous-owned businesses throughout the province. It includes information on Indigenous entrepreneurship in both urban and rural areas and highlights economic activity across various traditional territories and sectors.


    How Couples Meet and Stay Together

    This data set contains responses from a 2022 survey of people across the U.S. designed to understand how couples meet and stay together. It focuses on how relationships were influenced by the COVID-19 pandemic and offers insight into modern relationships, including changes in dating habits and how couples adapted during a challenging time.


    Wildfire

    This data set on Canadian wildfires includes information on fire size, cause, location, detection method, response, and weather conditions. Collected from official sources, it supports wildfire risk assessment and response planning. It also highlights social and geographic disparities, emphasizing the impacts on remote and underserved communities that face climate-related and infrastructure challenges.


    Gender Assessment

    The Gender Assessment data set from the World Benchmarking Alliance evaluates nearly 2,000 of the world’s most influential companies on gender equality. Covering data from 2023 and 2024, the data set highlights early corporate actions and progress on gender issues. It serves as a benchmarking tool for identifying gaps and holding companies accountable for advancing gender equality.


    Global Rights

    This data set provides yearly, country-level information on LGBTQ+ rights, economic indicators, and education spending. The data is compiled from Our World in Data, with primary sources including Equaldex, the World Bank, and other open-access data sets.


    Install diversedata

    To install the diversedata R package from GitHub, follow these steps:

    Prerequisites

    • Ensure you have R installed (download from CRAN)

    • Install the devtools package, if needed:

    install.packages("devtools")

    Installation

    Install diversedata directly from GitHub using:

    devtools::install_github("diverse-data-hub/diversedata")

    Loading the Package

    After installation, load the package into your R session:

    library(diversedata)

    Usage

    Once the package is installed and loaded, you can explore the available data sets and their documentation:

    # List available data sets
    data(package = "diversedata")
    
    # View documentation for a specific data set
    ?wildfire
    
    # To load a data set into the environment:
    data("wildfire")

    License

    CC BY 4.0


    Development Team

    Owners

    • Katie Burak PhD
    • Elham E Khoda PhD

    Collaborators

    • Azin Piran
    • Francisco Ramirez
    • Siddarth Subrahmanian

    This page is built with Quarto.