Biased population samples are a prevalent problem in the social sciences. We therefore present two novel methods, based on positive-unlabeled learning, to mitigate this bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine sample weights. The first method, maximum representative subsampling (MRS), uses a classifier to iteratively remove instances from the biased data set, by assigning them a sample weight of 0, until the biased data set aligns with the representative one. The second method, Soft-MRS, is a variant of MRS that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare our methods against existing techniques, evaluating how well sample weights created with MRS or Soft-MRS minimize differences and improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study from resilience research, exploring the influence of resilience on voting behavior. Our work addresses the issue of bias in the social sciences, among other fields, and provides a versatile methodology for bias reduction based on machine learning. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant.
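The iterative removal that the abstract describes for MRS can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the classifier, the stopping rule, or how many instances are removed per iteration. The sketch assumes a one-dimensional logistic regression, an in-sample AUC threshold as the stopping criterion, and a fixed number of removals per round. All names (`mrs_weights`, `fit_logistic`, `auc`, `drop_per_round`, `auc_target`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, steps=100, lr=0.5):
    """Fit a 1-D logistic regression by gradient descent; returns (slope, intercept)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def mrs_weights(biased, representative, drop_per_round=5, auc_target=0.55):
    """Return 0/1 sample weights for `biased` (toy MRS-style loop, assumptions as stated above)."""
    weights = [1.0] * len(biased)
    while True:
        kept = [i for i, w in enumerate(weights) if w > 0.0]
        if len(kept) <= drop_per_round:
            break  # safety stop: never empty the biased sample
        # Label 1 = remaining biased instance, label 0 = representative instance.
        xs = [biased[i] for i in kept] + list(representative)
        ys = [1.0] * len(kept) + [0.0] * len(representative)
        a, b = fit_logistic(xs, ys)
        score = {i: a * biased[i] + b for i in kept}
        rep_scores = [a * x + b for x in representative]
        # Assumed stopping rule: the classifier can barely tell the two sets apart.
        # The AUC is computed in-sample here for brevity.
        if auc(list(score.values()), rep_scores) <= auc_target:
            break
        # Assign weight 0 to the instances the classifier is most sure are "biased".
        for i in sorted(kept, key=score.get, reverse=True)[:drop_per_round]:
            weights[i] = 0.0
    return weights


# Synthetic data: the biased sample is shifted relative to the representative one.
rng = random.Random(0)
representative = [rng.gauss(0.0, 1.0) for _ in range(100)]
biased = [rng.gauss(1.0, 1.0) for _ in range(100)]
weights = mrs_weights(biased, representative)
kept_values = [x for x, w in zip(biased, weights) if w > 0.0]
print(len(kept_values), sum(kept_values) / len(kept_values))
```

In this toy setting the mean of the retained instances moves toward the mean of the representative sample. A Soft-MRS-style variant would adjust the weights gradually instead of setting them to 0; the abstract does not state the update rule, so it is not sketched here.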

Subject indexing

local	Human behaviour
local	Computer science

Subject classification (DDC)

Medicine and health

List of contributors

https://frl.publisso.de/adhoc/uri/SGF1cHRtYW5uLCBUb255|https://frl.publisso.de/adhoc/uri/RmVsbGVueiwgU29waGll|https://frl.publisso.de/adhoc/uri/TmF0aGFuLCBMYWtzYW4=|https://frl.publisso.de/adhoc/uri/VMO8c2NoZXIsIE9saXZlcg==|https://frl.publisso.de/adhoc/uri/S3JhbWVyLCBTdGVmYW4=

Label

frl:6472718

Funder

Bundesministerium für Bildung und Forschung

Grant number

031L0217A

Funding programme

DIASyM project

Files

Discriminative machine learning for maximal representative subsampling

Object type

article

Described by

1000	@id	frl:6472718.rdf
1000	Created on	2023-12-14T08:16:56.474+0100
1000	Created by	336
1000	describes	frl:6472718
1000	Edited by	317
1000	Last edited	2023-12-18T08:35:38.733+0100
1000	Object edited	Mon Dec 18 08:35:26 CET 2023

See also

frl:6472718

OAI ID

oai:frl.publisso.de:frl:6472718

Metadata visibility

public

Data visibility

public

Discriminative machine learning for maximal representative subsampling

Hauptmann, Tony

Fellenz, Sophie

Nathan, Laksan

Tüscher, Oliver

Kramer, Stefan
