Researchers Reduce Bias in AI Models While Maintaining or Improving Accuracy
Machine-learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.
For instance, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model may make incorrect predictions for female patients when deployed in a hospital.
To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing a large amount of data, hurting the model's overall performance.
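In code, subgroup balancing amounts to downsampling every subgroup to the size of the smallest one. The sketch below is only an illustration of that idea; the column names ("sex", "label") are hypothetical placeholders, not details from the study.

```python
# Illustrative sketch: balance a training set by downsampling each subgroup
# to the size of the smallest one. Note how much data gets discarded.
import pandas as pd

def balance_by_subgroup(df: pd.DataFrame, group_col: str, seed: int = 0) -> pd.DataFrame:
    """Return a copy of df in which every subgroup has the same number of rows."""
    min_size = df[group_col].value_counts().min()
    return (df.groupby(group_col)
              .sample(n=min_size, random_state=seed)
              .reset_index(drop=True))

# Toy example: a clinical dataset that is mostly male patients.
data = pd.DataFrame({
    "sex":   ["M"] * 90 + ["F"] * 10,
    "label": [0, 1] * 45 + [0, 1] * 5,
})
balanced = balance_by_subgroup(data, "sex")
print(balanced["sex"].value_counts())  # 10 M, 10 F -- 80 of 100 rows discarded
```

The example makes the drawback concrete: equalizing the subgroups here throws away 80 percent of the training data.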
MIT researchers developed a new technique that identifies and removes the specific points in a training dataset that contribute most to a model's failures on minority subgroups. By removing far fewer datapoints than other approaches, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.
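The researchers' exact procedure is not reproduced here. The sketch below is a minimal, hedged illustration of the general idea of scoring individual training points by their estimated effect on the worst-performing subgroup and dropping the most harmful ones; it uses a simple first-order gradient-alignment proxy with a logistic-regression model, and all function and variable names are illustrative assumptions.

```python
# Minimal first-order sketch (NOT the published algorithm): estimate, for each
# training example, whether removing it would reduce the loss of the
# worst-performing validation subgroup, using gradient alignment as a proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression

def per_example_grads(clf, X, y):
    """Per-example gradient of the logistic loss w.r.t. the weights (labels in {0, 1})."""
    p = clf.predict_proba(X)[:, 1]          # model's probability of class 1
    return (p - y)[:, None] * X             # shape: (n_examples, n_features)

def removal_benefit_scores(X_tr, y_tr, X_val, y_val, groups_val):
    """Higher score = removing that training point is expected to help the worst group."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # Identify the subgroup with the lowest validation accuracy.
    accs = {g: clf.score(X_val[groups_val == g], y_val[groups_val == g])
            for g in np.unique(groups_val)}
    worst_group = min(accs, key=accs.get)
    mask = groups_val == worst_group

    # A training point whose gradient is anti-aligned with the worst group's
    # average gradient is pulling the model away from that group; removing it
    # should, to first order, lower that group's loss.
    g_worst = per_example_grads(clf, X_val[mask], y_val[mask]).mean(axis=0)
    g_train = per_example_grads(clf, X_tr, y_tr)
    return -(g_train @ g_worst)

# Usage sketch: drop the k highest-scoring points, then retrain on the rest.
# scores = removal_benefit_scores(X_tr, y_tr, X_val, y_val, groups_val)
# keep = np.argsort(scores)[:-k]
# clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
```

Unlike blanket balancing, a targeted approach like this removes only the handful of points estimated to hurt the underrepresented group, leaving the rest of the dataset, and the model's overall accuracy, largely intact.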
In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more prevalent than labeled data in many applications.
This method could also be combined with other approaches to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure that underrepresented patients aren't misdiagnosed due to a biased AI model.
"Many other algorithms that attempt to resolve this issue assume each datapoint matters as much as every other datapoint. In this paper, we are revealing that presumption is not real.