A Data Synthesis Framework for Few-Shot Biomedical Relation Extraction via Semantic Deconstruction and Re-synthesis

Authors

  • Hai Zhu, School of Business, Nanfang College Guangzhou, Guangzhou, Guangdong, China

DOI:

https://doi.org/10.53469/jrse.2025.07(12).04

Keywords:

Data Synthesis, Large Language Models, Relation Extraction, Semantic Deconstruction, In-process Verification, Drug-Drug Interaction

Abstract

In Biomedical Natural Language Processing (BioNLP), and particularly in Drug-Drug Interaction (DDI) relation extraction, the development of high-performance deep learning models has long been constrained by a scarcity of high-quality annotated data. Although Large Language Models (LLMs) have demonstrated strong capabilities in text generation and understanding, applying them directly to data synthesis in specialized domains raises serious problems, including frequent “factual hallucinations,” monotonous logical structures, and privacy leakage. Existing data augmentation methods, such as simple synonym replacement or two-stage prompting based on the “generate-filter” paradigm, alleviate data scarcity to some extent but cannot guarantee, at the source, the logical self-consistency and structural diversity of the synthetic data. To overcome this bottleneck, this paper proposes a “Semantic Deconstruction and Re-synthesis” (SDR) data synthesis framework. Unlike traditional black-box generation, SDR adopts a white-box “Deconstruction-Planning-Reconstruction” methodology. First, a small number of seed samples are deconstructed into fine-grained semantic fragments, which populate an extensible semantic fragment repository. Next, an LLM acts as a “Knowledge Architect” that logically recombines these fragments. Crucially, an In-process Verification mechanism checks the abstract factual skeleton before text generation, blocking the propagation of logical fallacies into the generated text. Experiments on the DDI Corpus dataset show that a local model (BioBERT) fine-tuned on SDR-synthesized data significantly outperforms existing methods in the 1-shot, 5-shot, and 10-shot settings. Notably, in the 10-shot setting, SDR achieves an F1 score of 60.43%, indicating the framework’s effectiveness and generalization ability in low-resource scenarios.
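The “Deconstruction-Planning-Reconstruction” flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fragment schema, the prompts, or the verification rules, so every name below (`Skeleton`, `deconstruct`, `plan`, `verify`, `reconstruct`, `synthesize`) and the seed sentences are assumptions made for illustration. In particular, `plan` uses random recombination as a stand-in for the LLM “Knowledge Architect,” and `verify` performs only structural checks, whereas the paper's In-process Verification presumably also checks factual consistency.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Skeleton:
    """Hypothetical abstract factual skeleton: two entities, a relation, a sentence pattern."""
    drug_a: str
    drug_b: str
    relation: str
    template: str  # sentence pattern with {a} and {b} slots


# Illustrative seed samples (assumed format, not taken from the paper).
SEEDS = [
    {"text": "Aspirin may increase the anticoagulant effect of warfarin.",
     "a": "Aspirin", "b": "warfarin", "label": "effect"},
    {"text": "Ketoconazole inhibits the metabolism of midazolam.",
     "a": "Ketoconazole", "b": "midazolam", "label": "mechanism"},
]


def deconstruct(seeds):
    """Deconstruction: split seeds into a repository of entities and sentence patterns."""
    repo = {"drugs": set(), "templates": {}}
    for s in seeds:
        repo["drugs"].update([s["a"], s["b"]])
        pattern = s["text"].replace(s["a"], "{a}").replace(s["b"], "{b}")
        repo["templates"].setdefault(s["label"], set()).add(pattern)
    return repo


def plan(repo, rng):
    """Planning: stand-in for the LLM step; recombine fragments into a skeleton."""
    relation = rng.choice(sorted(repo["templates"]))
    drug_a, drug_b = rng.sample(sorted(repo["drugs"]), 2)
    template = rng.choice(sorted(repo["templates"][relation]))
    return Skeleton(drug_a, drug_b, relation, template)


def verify(sk, repo):
    """Verification: structural checks on the skeleton before any text is generated."""
    return (
        sk.drug_a != sk.drug_b
        and {sk.drug_a, sk.drug_b} <= repo["drugs"]
        and sk.template in repo["templates"].get(sk.relation, set())
        and "{a}" in sk.template
        and "{b}" in sk.template
    )


def reconstruct(sk):
    """Reconstruction: render a verified skeleton into a labeled training sample."""
    text = sk.template.format(a=sk.drug_a, b=sk.drug_b)
    return {"text": text, "a": sk.drug_a, "b": sk.drug_b, "label": sk.relation}


def synthesize(seeds, n, seed=0):
    """Run plan -> verify -> reconstruct until n samples pass verification."""
    rng = random.Random(seed)
    repo = deconstruct(seeds)
    samples, attempts = [], 0
    while len(samples) < n and attempts < 50 * n:
        attempts += 1
        sk = plan(repo, rng)
        if verify(sk, repo):  # rejected skeletons never reach text generation
            samples.append(reconstruct(sk))
    return samples


if __name__ == "__main__":
    for sample in synthesize(SEEDS, 3):
        print(sample["label"], "|", sample["text"])
```

The point of the sketch is the ordering: the check runs on the skeleton, so a rejected plan costs no text generation. Because random recombination can pair drugs with relations that are not factually supported, the output of this toy version is not valid training data; a real pipeline would need the factual verification the abstract describes.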

Published

2025-12-30

How to Cite

Zhu, H. (2025). A Data Synthesis Framework for Few-Shot Biomedical Relation Extraction via Semantic Deconstruction and Re-synthesis. Journal of Research in Science and Engineering, 7(12), 20–24. https://doi.org/10.53469/jrse.2025.07(12).04

Section

Articles