Diverse Data Hub

    Diverse Data Hub is an open educational resource offering curated data sets focused on equity, diversity, inclusion, and other socially relevant topics. It is designed to support students, educators, and researchers in accessing and working with meaningful data.

    Data sets are available through the diversedata R and Python packages, allowing for straightforward integration into data science workflows. Each data set includes detailed documentation and contextual background to support informed exploration and connection to real-world topics. Example case studies are also included to illustrate practical applications.

    Featured Data Sets

    Women’s March Madness

    This data set tracks every NCAA Division I Women’s Basketball Tournament appearance from 1982 through 2018. It includes team seeds, results, bid types, season and conference records, and regional placements. It is useful for analyzing team success, seeding impact, conference strength, and historical trends in women’s college basketball across nearly four decades.

    BC Indigenous Businesses

    The BC Indigenous Business Listings data set, compiled in 2025 by the Government of British Columbia, offers a detailed snapshot of Indigenous-owned businesses throughout the province. It includes information on Indigenous entrepreneurship in urban and rural areas and highlights economic activities across various traditional territories and sectors.

    Install diversedata

    • R Package
    • Python Package

    R Package

    Prerequisites

    • R (>= 3.5): Download from CRAN

    • devtools (R package)

      install.packages("devtools")

    Installation

    Install diversedata directly from GitHub using:

    devtools::install_github("diverse-data-hub/diversedata")

    Usage

    Once installed, you can explore the available data sets and their documentation:

    library(diversedata)
    
    # List available data sets
    data(package = "diversedata")
    
    # View documentation for a specific data set
    ?wildfire
    
    # To load a data set into the environment:
    data("wildfire")

    Python Package

    Prerequisites

    • Python (>= 3.12) & pip: Download from python.org

    • pandas Python package (>= 2.3.1): Installed automatically as a dependency when you install the diversedata Python package via pip

    Installation

    The diversedata Python package can be installed via pip:

    pip install diversedata
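    To confirm that the install succeeded, one option is to query the installed package metadata from Python. The sketch below uses only the standard library; the helper name installed_version is hypothetical and is not part of diversedata.

```python
from importlib import metadata
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed.

    Hypothetical helper for illustration; not part of the diversedata API.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages named in the prerequisites above.
    for name in ("diversedata", "pandas"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

    If diversedata is reported as not installed, check that pip installed into the same Python environment you are running.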

    Usage

    Once installed, you can explore the available data sets and their documentation:

    import diversedata as dd
    
    # List available datasets
    dd.list_available_datasets()
    
    # View documentation for a specific dataset
    dd.print_data_description('wildfire')
    
    # To load a dataset and save it to an object:
    df = dd.load_data('wildfire')
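    After loading, a quick per-column summary is often a useful first step. The following is a minimal sketch that assumes dd.load_data returns a pandas DataFrame (suggested by the pandas prerequisite, but not confirmed here). The helpers summarize and load_or_none are hypothetical and are not part of diversedata.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and unique count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "n_unique": df.nunique(),  # missing values are not counted as unique
        }
    )


def load_or_none(name: str):
    """Load a diversedata data set, or return None if the package is unavailable."""
    try:
        import diversedata as dd
    except ImportError:
        return None
    # Assumption: load_data returns a pandas DataFrame.
    return dd.load_data(name)


if __name__ == "__main__":
    df = load_or_none("wildfire")
    if df is None:
        print("diversedata is not installed; run `pip install diversedata` first.")
    else:
        print(df.shape)
        print(summarize(df))
```

    Because summarize accepts any DataFrame, the same check can be reused for each data set returned by dd.load_data.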

    License

    CC BY 4.0
