Natural Language Processing - Based Structured Data Extraction from Unstructured Clinical Notes


  • Arjit Amol More, Sipna College of Engineering and Technology, Amravati, Maharashtra, India



Electronic health records, Unstructured Clinical Notes, Natural Language Processing, patient data extraction, SpaCy, healthcare informatics


Electronic Health Records (EHRs) are pivotal in modern healthcare, holding a wealth of patient information. They are real-time, patient-centered records that make information available instantly and securely to authorized users. However, a substantial portion of this data resides in unstructured clinical notes, which makes it difficult to extract and use. This paper investigates the issues posed by unstructured clinical notes and the application of Natural Language Processing (NLP) techniques in the healthcare sector to extract structured patient data from them. By applying NLP algorithms, healthcare institutions can unlock insights held within EHRs, leading to improved patient care, clinical research, and administrative efficiency. The paper covers various NLP approaches, the implementation of pre-trained SpaCy and Med7 models for extracting structured data, and the potential for future advancements in this critical area of healthcare informatics.






How to Cite

More, A. A. (2024). Natural Language Processing - Based Structured Data Extraction from Unstructured Clinical Notes. Journal of Contemporary Medical Practice, 6(8), 327–330.