Generating exact proportions of categorical values




        By default, the simulator generates random samples. This means that if you have a categorical variable with a given set of proportions for its values, then you will get samples distributed with these proportions on average. But the exact numbers are not guaranteed.

        So if you generate 40 samples, you might get 18 males and 22 females one time, then 21 males and 19 females the next.
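To see why the counts drift, here is a minimal Python sketch of plain random sampling. It is not the simulator's own code; the function name `random_sample_counts` and the `seed` parameter are made up for this illustration.

```python
import random
from collections import Counter

def random_sample_counts(values, weights, n, seed=None):
    """Draw n independent samples and count how often each value appears.

    Hypothetical helper for illustration only, not part of the simulator.
    """
    rng = random.Random(seed)
    draws = rng.choices(values, weights=weights, k=n)
    return Counter(draws)

if __name__ == "__main__":
    # Each run targets 50:50 on average, but the split usually differs per run.
    for seed in range(3):
        counts = random_sample_counts(["male", "female"], [0.5, 0.5], 40, seed=seed)
        print(seed, dict(counts))
```

Each run always totals 40 samples, but nothing forces the split to be exactly 20 and 20.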

        If you want to generate a set of samples whose values match the given proportions exactly, for example exactly 20 males and 20 females, click the Settings gear icon for the variable in the simulation view.

        This then lets you check the exact proportions setting.

        Now the values will always come out exactly 50:50, provided you generate an even number of samples.

        It also works for more complex proportions. For example, with proportions of 50%, 30% and 20%, generating 10 samples gives exactly 5, 3 and 2 samples of the respective values.
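One way to picture exact mode is as "build the exact counts, then shuffle". The Python sketch below follows that idea. It is an assumption about how such a feature could work, not the simulator's actual implementation, and the name `exact_sample` is hypothetical. Raising an error when the proportions do not divide the sample count evenly is a choice made for this sketch only; the article does not say what the simulator does in that case.

```python
import random
from collections import Counter

def exact_sample(proportions, n, seed=None):
    """Return n samples whose value counts match the proportions exactly.

    proportions maps each value to its fraction (fractions should sum to 1).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    samples = []
    for value, fraction in proportions.items():
        target = n * fraction
        count = round(target)
        # Sketch-only choice: refuse sample counts that cannot be split exactly.
        if abs(count - target) > 1e-9:
            raise ValueError(f"{n} samples cannot give exactly {fraction:.0%} of {value!r}")
        samples.extend([value] * count)
    rng.shuffle(samples)  # the order is random, the counts are fixed
    return samples

if __name__ == "__main__":
    result = exact_sample({"A": 0.5, "B": 0.3, "C": 0.2}, 10, seed=0)
    print(Counter(result))
```

With 10 samples the counts are always 5, 3 and 2, whatever the seed; only the order of the values changes.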

        It also works when generating a sample set in chunks: if you generate the 10-sample dataset above as two runs of 5 samples (with new run turned off), the exact proportions rule is still obeyed.
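The chunked case can be pictured as drawing from a single pre-shuffled pool without replacement. The sketch below assumes the total number of samples is known up front, which may not be how the simulator handles it; the class name `ExactPool` and its `draw` method are hypothetical.

```python
import random
from collections import Counter

class ExactPool:
    """Hands out samples in chunks from one pool with exact value counts.

    Hypothetical class for illustration only; assumes the total is known
    in advance and that every total * fraction is a whole number.
    """

    def __init__(self, proportions, total, seed=None):
        rng = random.Random(seed)
        self._pool = []
        for value, fraction in proportions.items():
            self._pool.extend([value] * round(total * fraction))
        rng.shuffle(self._pool)

    def draw(self, k):
        """Remove and return the next k samples from the pool."""
        chunk, self._pool = self._pool[:k], self._pool[k:]
        return chunk

if __name__ == "__main__":
    pool = ExactPool({"A": 0.5, "B": 0.3, "C": 0.2}, total=10, seed=0)
    first_run = pool.draw(5)
    second_run = pool.draw(5)
    print(Counter(first_run + second_run))
```

Neither run of 5 needs to match the proportions on its own; it is the combined 10 samples that come out as 5, 3 and 2.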

        This only works for variables that don't depend on others
        If your simulation model is complex and has connections that let one variable affect another, then the affected variables can't be set to exact mode, because forcing exact proportions would prevent them from obeying the dependency.



