Diverse Data Hub

    Gender Assessment data cleaning

    import pandas as pd
    # Load the dataset
    df = pd.read_csv("../data/raw/gender-assessment/gender_assessment.csv")
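    # Inspect the data (df.info() prints its summary and returns None,
    # so wrapping it in print() also emits a trailing "None")
    print(df.info())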
    print(f"Initial number of rows: {len(df)}") 
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2000 entries, 0 to 1999
    Data columns (total 79 columns):
     #   Column                                                                                                                                                                                                                                                                                                                                                               Non-Null Count  Dtype  
    ---  ------                                                                                                                                                                                                                                                                                                                                                               --------------  -----  
     0   WBA ID                                                                                                                                                                                                                                                                                                                                                               2000 non-null   object 
     1   Company Name                                                                                                                                                                                                                                                                                                                                                         2000 non-null   object 
     2   HQ Country                                                                                                                                                                                                                                                                                                                                                           2000 non-null   object 
     3   HQ Region                                                                                                                                                                                                                                                                                                                                                            2000 non-null   object 
     4   ISIN                                                                                                                                                                                                                                                                                                                                                                 1475 non-null   object 
     5   WBA Industry                                                                                                                                                                                                                                                                                                                                                         1995 non-null   object 
     6   Ownership                                                                                                                                                                                                                                                                                                                                                            1996 non-null   object 
     7   Year assessed                                                                                                                                                                                                                                                                                                                                                        2000 non-null   int64  
     8   Overall Gender Assessment Score                                                                                                                                                                                                                                                                                                                                      2000 non-null   float64
     9   Percentage of Total Possible Score 
    (out of 52.3)                                                                                                                                                                                                                                                                                                                    2000 non-null   int64  
     10  A01. Strategic action                                                                                                                                                                                                                                                                                                                                                2000 non-null   int64  
     11  A01.EA The company made a public commitment to gender equality and women’s empowerment (e.g. signatory to the UN Women’s Empowerment Principles, or having made another public commitment at CEO level).                                                                                                                                                             2000 non-null   object 
     12  A02. Gender targets                                                                                                                                                                                                                                                                                                                                                  2000 non-null   int64  
     13  A02.EA The company discloses one or more time-bound targets on gender equality and women’s empowerment with regard to its workplace.                                                                                                                                                                                                                                 2000 non-null   object 
     14  A02.EC The company discloses one or more time-bound targets on gender equality and women’s empowerment with regard  to its supply chain.                                                                                                                                                                                                                             2000 non-null   object 
     15  A04. Gender-responsive human rights due diligence process                                                                                                                                                                                                                                                                                                            2000 non-null   int64  
     16  A04.EA The company discloses what gender-related human rights impacts it has assessed and prioritised as being salient (i.e. most severe and potentially irremediable if not addressed).                                                                                                                                                                             2000 non-null   object 
     17  A04.EB The company consults with women or women's groups as part of the risk identification and assessment process.                                                                                                                                                                                                                                                  2000 non-null   object 
     18  A05. Grievance mechanisms                                                                                                                                                                                                                                                                                                                                            2000 non-null   float64
     19  A05.EA The company has a gender-responsive mechanism through which employees can report grievances.                                                                                                                                                                                                                                                                  2000 non-null   object 
     20  A05.EB The company has one or more channel(s)/mechanism(s), or participates in a shared mechanism, accessible to all external individuals and communities who may be adversely impacted by the company (or individuals or organisations acting on their behalf or who are otherwise in a position to be aware of adverse impacts), to raise complaints or concerns.  2000 non-null   object 
     21  A05.EC The company collects, analyses and monitors sex-disaggregated grievance data (e.g. number of grievances reported, number of grievances remediated).                                                                                                                                                                                                           2000 non-null   object 
     22  A06. Stakeholder engagement                                                                                                                                                                                                                                                                                                                                          2000 non-null   int64  
     23  A06.EA The company does employee surveys or other engagement mechanisms that specifically address gender equality & women’s empowerment issues.                                                                                                                                                                                                                      2000 non-null   object 
     24  A07. Corrective action process                                                                                                                                                                                                                                                                                                                                       2000 non-null   float64
     25  A07.EA The company screens for gender-related issues among its suppliers as part of its audit process. Can score Partially Met for .5.                                                                                                                                                                                                                               2000 non-null   object 
     26  A07.EB The company identifies any gender-related issues as requiring corrective action to be taken by a supplier within a set period of time in order to remediate the issue.                                                                                                                                                                                        2000 non-null   object 
     27  B01. Gender equality in leadership                                                                                                                                                                                                                                                                                                                                   2000 non-null   int64  
     28  B01.EA The company maintains a gender balance (between 40-60%) at the highest governance body.                                                                                                                                                                                                                                                                       2000 non-null   object 
     29  B01.EB The company maintains a gender balance (between 40-60%) at the senior executive level.                                                                                                                                                                                                                                                                        2000 non-null   object 
     30  B01.EC The company maintains a gender balance (between 40-60%) at the senior management level.                                                                                                                                                                                                                                                                       2000 non-null   object 
     31  B01.ED The company maintains a gender balance (between 40-60%) at the middle/other management level.                                                                                                                                                                                                                                                                 2000 non-null   object 
     32  B02. Professional development and recruitment                                                                                                                                                                                                                                                                                                                        2000 non-null   int64  
     33  B02.EA The company offers professional development programmes (e.g. mentoring programme(s), leadership coaching, access to internal and/or external professional networks, educational programs, and formal sponsorship programmes.                                                                                                                                  2000 non-null   object 
     34  B02.EB The company tracks the number of women who are participating in these programmes.                                                                                                                                                                                                                                                                             2000 non-null   object 
     35  B03. Sex-disaggregated employee data                                                                                                                                                                                                                                                                                                                                 2000 non-null   int64  
     36  B03.EA The company collects sex-disaggregated data on the gender balance of its employees by occupational function.                                                                                                                                                                                                                                                  2000 non-null   object 
     37  B03.EB The company collects sex-disaggregated data on the percentage of employees promoted.                                                                                                                                                                                                                                                                          2000 non-null   object 
     38  B03.EC The company collects sex-disaggregated data on the annual turnover of employees.                                                                                                                                                                                                                                                                              2000 non-null   object 
     39  B03.ED The company collect sex-disaggregated data on the annual absenteeism levels of employees.                                                                                                                                                                                                                                                                     2000 non-null   object 
     40  B04. Gender equality leadership in the supply chain                                                                                                                                                                                                                                                                                                                  2000 non-null   int64  
     41  B04.EA The company collects or requires its suppliers to collect sex-disaggregated data by leadership level across the supply chain.                                                                                                                                                                                                                                 2000 non-null   object 
     42  B06. Enabling environment for freedom of association and collective bargaining                                                                                                                                                                                                                                                                                       2000 non-null   int64  
     43  B06.EB The company describes how it supports the practices of its business relationships in relation to freedom of association and collective bargaining.                                                                                                                                                                                                            2000 non-null   object 
     44  B07. Gender-responsive procurement                                                                                                                                                                                                                                                                                                                                   2000 non-null   int64  
     45  B07.EA The company made a public commitment to gender-responsive procurement.                                                                                                                                                                                                                                                                                        2000 non-null   object 
     46  B07.EB The company procures from women-owned businesses.                                                                                                                                                                                                                                                                                                             2000 non-null   object 
     47  C01. Gender pay gap                                                                                                                                                                                                                                                                                                                                                  2000 non-null   int64  
     48  C01.EA The company collects sex-disaggregated pay data.                                                                                                                                                                                                                                                                                                              2000 non-null   object 
     49  C01.EB The company collects sex-disaggregated pay data by different pay bands, occupational functions, or other financial benefits.                                                                                                                                                                                                                                  2000 non-null   object 
     50  C01.EC The company uses a third party to undertake/verify its pay gap analysis.                                                                                                                                                                                                                                                                                      2000 non-null   object 
     51  C02. Paid primary and secondary carer leave                                                                                                                                                                                                                                                                                                                          2000 non-null   float64
     52  C02.EA The company has a global policy of providing at least 14 weeks of paid primary carer leave offered to full-time employees.                                                                                                                                                                                                                                    2000 non-null   object 
     53  C02.EB The company monitors the return-to-work rate of employees after primary carer leave and their retention a year after primary carer leave.                                                                                                                                                                                                                     2000 non-null   object 
     54  C02.EC The company has a global policy of providing at least two weeks of paid secondary carer leave offered to full-time employees.                                                                                                                                                                                                                                 2000 non-null   object 
     55  C02.ED The company tracks the number of workers who take secondary carer leave.                                                                                                                                                                                                                                                                                      2000 non-null   object 
     56  C03. Childcare and other family support                                                                                                                                                                                                                                                                                                                              2000 non-null   int64  
     57  C03.EA The company offers childcare support to employees.                                                                                                                                                                                                                                                                                                            2000 non-null   object 
     58  C03.EB The company offers other family support to its employees.                                                                                                                                                                                                                                                                                                     2000 non-null   object 
     59  C04. Flexible work                                                                                                                                                                                                                                                                                                                                                   2000 non-null   int64  
     60  C04.EA The company offers flexible working hours to its employees (the ability to alter the start and end of the day).                                                                                                                                                                                                                                               2000 non-null   object 
     61  C04.EB The company collects sex-disaggregated data on the number of employees who have flexible working hour arrangements.                                                                                                                                                                                                                                           2000 non-null   object 
     62  C04.EC The company offers flexible work locations to its employees (the ability to work from home/telecommuting).                                                                                                                                                                                                                                                    2000 non-null   object 
     63  C04.ED The company collects sex-disaggregated data on the number of employees who have flexible work location arrangements.                                                                                                                                                                                                                                          2000 non-null   object 
     64  C06. Living wage in the supply chain                                                                                                                                                                                                                                                                                                                                 2000 non-null   int64  
     65  C06.EA The company requires its suppliers to pay their workers a living wage.                                                                                                                                                                                                                                                                                        2000 non-null   object 
     66  C06.EB The company takes specific actions to help ensure its suppliers pay their workers a living wage.                                                                                                                                                                                                                                                              2000 non-null   object 
     67  D01. Health, safety and well-being in the workplace                                                                                                                                                                                                                                                                                                                  2000 non-null   float64
     68  D01.EA The company has a publicly available policy statement committing it to respect the health and safety of its employees.                                                                                                                                                                                                                                        2000 non-null   object 
     69  D01.EB The company discloses sex-disaggregated information on health and safety for its employees.                                                                                                                                                                                                                                                                   2000 non-null   object 
     70  D01.EC The company provides coverage of the costs associated with any of the following health information and services: maternal health, sexual and reproductive health, and mental health. It has to provide more than two different services for a full score. Partially met if only one service is provided.                                                      2000 non-null   object 
     71  D02. Safe and healthy work in the supply chain                                                                                                                                                                                                                                                                                                                       2000 non-null   int64  
     72  D02.EA The company has a publicly available statement of policy that expects its business relationships to commit to respecting the health and safety of their workers.                                                                                                                                                                                              2000 non-null   object 
     73  D02.EC The company discloses how it monitors the health and safety performance of its business relationships.                                                                                                                                                                                                                                                        2000 non-null   object 
     74  E01. Violence and harassment prevention                                                                                                                                                                                                                                                                                                                              2000 non-null   float64
     75  E01.EA The company has publicly available policies in place regarding violence and harassment in the workplace (e.g., zero tolerance policy, safe transport policy, etc.). Can score Partially Met for .5.                                                                                                                                                           2000 non-null   object 
     76  E02. Violence and harassment remediation                                                                                                                                                                                                                                                                                                                             2000 non-null   float64
     77  E02.EA The company has a remediation process for addressing violence and harassment grievances in the workplace. Can score Partially Met for .5.                                                                                                                                                                                                                     2000 non-null   object 
     78  E02.EB The company collects, analyses and monitors sex-disaggregated data on the remediation of violence and harassment grievances.                                                                                                                                                                                                                                  2000 non-null   object 
    dtypes: float64(7), int64(17), object(55)
    memory usage: 1.2+ MB
    None
    Initial number of rows: 2000
    # Keep only the indicator scores and remove the element-level columns
    # Drop columns whose values are all in ['Met', 'Unmet', 'Partially Met']
    columns_to_drop = []
    for col in df.columns:
        unique_vals = df[col].dropna().unique()
        if all(val in ['Met', 'Unmet', 'Partially Met'] for val in unique_vals):
            columns_to_drop.append(col)
    
    # WBA ID and ISIN are not needed for the analysis, so drop them as well
    columns_to_drop.extend(['WBA ID', 'ISIN'])
                           
    df = df.drop(columns=columns_to_drop)
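The element-column detection above can also be sketched with `Series.isin` on a toy frame. The helper name `find_element_columns` and the toy column names are hypothetical, not part of the original notebook. The sketch additionally skips columns with no data, because `all()` over an empty sequence is `True` and an all-missing column would otherwise be flagged for removal. This guard makes no difference for the dataset above, where every column is fully populated.

```python
import pandas as pd

# Labels used by element-level columns (taken from the cell above)
ELEMENT_LABELS = ["Met", "Unmet", "Partially Met"]


def find_element_columns(frame):
    """Return columns whose non-missing values are all element labels."""
    element_cols = []
    for col in frame.columns:
        values = frame[col].dropna()
        # Skip columns with no data so they are not flagged by accident
        if not values.empty and values.isin(ELEMENT_LABELS).all():
            element_cols.append(col)
    return element_cols


# Toy data (hypothetical): one indicator score, one element column, one empty column
toy = pd.DataFrame({
    "A01. Strategic action": [1.0, 0.5, 0.0],
    "A01.EA element": ["Met", "Unmet", "Partially Met"],
    "Empty column": [None, None, None],
})

element_cols = find_element_columns(toy)
print(element_cols)
```

Only the element column is selected; the numeric indicator column and the empty column are kept.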
    # Drop rows with missing values in critical columns
    df = df.dropna(subset=["Company Name ", "HQ Country", "Overall Gender Assessment Score"])
    # Clean column names (strip whitespace, convert to lowercase, replace spaces with underscores)
    df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
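A minimal sketch of what this normalization does, applied to a few header strings. The raw headers here are assumptions reconstructed from the code in this notebook, not copied from the CSV. It shows that `str.strip()` only trims the ends of a name and that replacing `' '` leaves an embedded newline in place, which is why a cleaned name can still contain `\n`.

```python
import pandas as pd

# Assumed raw headers (hypothetical reconstruction for illustration)
raw = pd.Index([
    "Company Name ",
    "HQ Country",
    "Percentage of Total Possible Score \n(out of 52.3)",
])

# Same chain as the cell above; regex=False makes the literal replacement explicit
cleaned = raw.str.strip().str.lower().str.replace(" ", "_", regex=False)

for before, after in zip(raw, cleaned):
    print(repr(before), "->", repr(after))
```

The third name keeps its newline character, so any later lookup or rename has to include `\n` in the key.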
    # Rename long column names to shorter ones
    rename_map = {
        'company_name': 'company',
        'hq_country': 'country',
        'hq_region': 'region',
        'wba_industry': 'industry',
        'year_assessed': 'year',
        'overall_gender_assessment_score': 'score',
        'percentage_of_total_possible_score_\n(out_of_52.3)': 'percent_score',
        "a01._strategic_action": "strategic_action",
        "a02._gender_targets": "gender_targets",
        "a04._gender-responsive_human_rights_due_diligence_process": "gender_due_diligence",
        "a05._grievance_mechanisms": "grievance_mechanisms",
        "a06._stakeholder_engagement": "stakeholder_engagement",
        "a07._corrective_action_process": "corrective_action",
        "b01._gender_equality_in_leadership": "gender_leadership",
        "b02._professional_development_and_recruitment": "development_recruitment",
        "b03._sex-disaggregated_employee_data": "employee_data_by_sex",
        "b04._gender_equality_leadership_in_the_supply_chain": "supply_chain_gender_leadership",
        "b06._enabling_environment_for_freedom_of_association_and_collective_bargaining": "enabling_environment_union_rights",
        "b07._gender-responsive_procurement": "gender_procurement",
        "c01._gender_pay_gap": "gender_pay_gap",
        "c02._paid_primary_and_secondary_carer_leave": "carer_leave_paid",
        'c03._childcare_and_other_family_support': 'childcare_support',
        'c04._flexible_work': 'flex_work',
        'c06._living_wage_in_the_supply_chain': 'living_wage_supply_chain',
        'd01._health,_safety_and_well-being_in_the_workplace': 'health_safety',
        'd02._safe_and_healthy_work_in_the_supply_chain': 'health_safety_supply_chain',
        'e01._violence_and_harassment_prevention': 'violence_prevention',
        'e02._violence_and_harassment_remediation': 'violence_remediation'
    }
    
    # Apply renaming
    df = df.rename(columns=rename_map)
    # Ensure 'score' and 'percent_score' are numeric
    df['score'] = pd.to_numeric(df['score'], errors='coerce')
    df['percent_score'] = pd.to_numeric(df['percent_score'], errors='coerce')
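Because `errors='coerce'` silently turns unparseable values into `NaN`, it can be worth counting how many values were lost in the conversion. The sketch below uses made-up values, not values from the dataset.

```python
import pandas as pd

# Toy scores (hypothetical): two valid numbers, one bad string, one missing value
scores = pd.Series(["11.3", "16.9", "n/a", None])

numeric = pd.to_numeric(scores, errors="coerce")

# Values that were present before conversion but are NaN afterwards
newly_missing = int(numeric.isna().sum() - scores.isna().sum())
print(f"Values coerced to NaN: {newly_missing}")
```

A non-zero count would indicate entries that need a closer look before they are treated as missing.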
    # Remove duplicates
    df = df.drop_duplicates()
    # Save cleaned file
    df.to_csv("../data/clean/genderassessment.csv", index=False)
    # Validate cleaned data
    clean_data = pd.read_csv("../data/clean/genderassessment.csv")
    print(f"Final cleaned dataset rows: {len(clean_data)}")  # Final row count
    clean_data.head()
    Final cleaned dataset rows: 2000
      | company | country | region | industry | ownership | year | score | percent_score | strategic_action | gender_targets | ... | gender_procurement | gender_pay_gap | carer_leave_paid | childcare_support | flex_work | living_wage_supply_chain | health_safety | health_safety_supply_chain | violence_prevention | violence_remediation
    0 | 3M | United States | North America | Chemicals | Public | 2023 | 11.3 | 22 | 1 | 0 | ... | 1 | 0 | 0.0 | 0 | 2 | 0 | 1.0 | 2 | 1.0 | 0.0
    1 | Asos | United Kingdom | Europe & Central Asia | Apparel & Footwear | Public | 2023 | 16.9 | 32 | 1 | 0 | ... | 0 | 0 | 0.0 | 1 | 1 | 2 | 0.5 | 2 | 0.5 | 0.0
    2 | A.P. Moller - Maersk | Denmark | Europe & Central Asia | Freight & logistics | Public | 2024 | 10.9 | 21 | 1 | 1 | ... | 0 | 0 | 0.0 | 0 | 0 | 0 | 1.0 | 2 | 1.0 | 0.0
    3 | ABB | Switzerland | Europe & Central Asia | Capital Goods | Public | 2023 | 12.8 | 25 | 1 | 1 | ... | 0 | 0 | 1.0 | 0 | 0 | 0 | 1.0 | 2 | 1.0 | 0.0
    4 | AbbVie | United States | North America | Pharmaceuticals & Biotechnology | Public | 2023 | 15.4 | 30 | 1 | 0 | ... | 1 | 0 | 0.0 | 2 | 1 | 0 | 1.0 | 2 | 1.0 | 0.0

    5 rows × 29 columns
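Beyond the row count, the cleaned file could be checked more systematically. The following is a minimal sketch, assuming the renamed columns `company`, `country` and `score`; the helper `check_clean` is hypothetical and not part of the original notebook. It reports missing values in critical columns, duplicate rows, and a non-numeric `score` column.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_clean(frame, critical=("company", "country", "score")):
    """Return a list of human-readable problems found in a cleaned frame."""
    problems = []
    # Missing values in the critical columns
    missing = frame[list(critical)].isna().sum()
    for col, count in missing[missing > 0].items():
        problems.append(f"missing values in {col}: {count}")
    # Fully duplicated rows
    n_dupes = int(frame.duplicated().sum())
    if n_dupes:
        problems.append(f"duplicate rows: {n_dupes}")
    # The score column should be numeric after cleaning
    if not is_numeric_dtype(frame["score"]):
        problems.append("score is not numeric")
    return problems


# Toy frame (hypothetical) with one repeated row
toy = pd.DataFrame({
    "company": ["3M", "Asos", "Asos"],
    "country": ["United States", "United Kingdom", "United Kingdom"],
    "score": [11.3, 16.9, 16.9],
})

problems = check_clean(toy)
print(problems)
```

An empty list means none of these checks failed; in the notebook, the same helper could be called on `clean_data`.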

     
     
