`r lifecycle::badge("stable")`

Computes the relative variation of subgroups (versions) within a larger grouping: the ratio of each subgroup's variability to the total variability of the group. Use it to assess how consistent the variation within individual subgroups is compared with the overall variation.

relative_variation(df, grouping_col, value_col, variability_metric = sd)

Arguments

df

Dataframe containing the data to be analyzed

grouping_col

Column name in `df` that identifies the subgroups (versions) whose variability is compared with the total variability. It is passed unquoted, as in the examples (e.g. `cyl`)

value_col

Column name in `df` that contains the numerical values whose variability is being measured

variability_metric

Function to compute variability. Defaults to `sd` (standard deviation), but other functions (e.g., `var`) can be used to customize the measure of variation

Value

Dataframe containing the relative variation ratio for each subgroup (`version`) within the specified grouping. The output includes:

- `grouping_col`: The grouping identifier
- `var_version`: The variability (e.g., standard deviation) of the subgroup
- `var_total`: The total variability of the entire group
- `ratio`: The ratio of subgroup variation to total variation (`var_version / var_total`)
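
The columns described above can be reproduced by hand as a cross-check. The following is a minimal base R sketch, not the package's implementation: `sketch_relative_variation()` is a hypothetical helper, and it assumes that `var_total` is the metric applied to all values together, which is consistent with the `mtcars` output shown in the examples.

```r
# Hypothetical helper (not part of the package): variability of each subgroup
# divided by the variability of all values taken together.
sketch_relative_variation <- function(values, groups, metric = sd) {
  var_version <- tapply(values, groups, metric)  # one value per subgroup
  var_total <- metric(values)                    # all rows together
  data.frame(
    group = names(var_version),
    var_version = as.numeric(var_version),
    var_total = var_total,
    ratio = as.numeric(var_version) / var_total
  )
}

# Standard deviation (the default) and variance as the variability metric.
by_sd <- sketch_relative_variation(mtcars$mpg, mtcars$cyl)
by_var <- sketch_relative_variation(mtcars$mpg, mtcars$cyl, metric = var)

round(by_sd$ratio, 3)
#> [1] 0.748 0.241 0.425
```

Because the variance is the square of the standard deviation, the ratios in `by_var` are the squares of those in `by_sd`. Switching the metric therefore changes the scale of `ratio` but not the ordering of the subgroups.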

Examples

# Typical usage.
relative_variation(
  df = mtcars,
  grouping_col = cyl,
  value_col = mpg
)
#> # A tibble: 3 × 4
#>     cyl var_version var_total ratio
#>   <dbl>       <dbl>     <dbl> <dbl>
#> 1     4        4.51      6.03 0.748
#> 2     6        1.45      6.03 0.241
#> 3     8        2.56      6.03 0.425

# Use with dplyr::group_by().
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
mtcars |>
  group_by(gear) |>
  relative_variation(
    grouping_col = cyl,
    value_col = mpg
  )
#> # A tibble: 8 × 5
#>    gear   cyl var_version var_total   ratio
#>   <dbl> <dbl>       <dbl>     <dbl>   <dbl>
#> 1     3     4      NA          3.37 NA     
#> 2     3     6       2.33       3.37  0.692 
#> 3     3     8       2.77       3.37  0.823 
#> 4     4     4       4.81       5.28  0.911 
#> 5     4     6       1.55       5.28  0.294 
#> 6     5     4       3.11       6.66  0.467 
#> 7     5     6      NA          6.66 NA     
#> 8     5     8       0.566      6.66  0.0850
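
The `NA` rows in the grouped output correspond to gear/cylinder combinations with a single car in `mtcars`, for which `sd()` is undefined. The base R check below is an illustrative sketch that uses only the built-in dataset, not any package function.

```r
# Count the cars in every gear/cyl combination.
counts <- table(gear = mtcars$gear, cyl = mtcars$cyl)
counts["3", "4"]  # one car with 3 gears and 4 cylinders
counts["5", "6"]  # one car with 5 gears and 6 cylinders

# The standard deviation of a single observation is NA.
single <- mtcars$mpg[mtcars$gear == 3 & mtcars$cyl == 4]
sd(single)
#> [1] NA
```

Subgroups with fewer than two observations can be filtered out beforehand if `NA` ratios are not wanted.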