THIS ARTICLE/PRESS RELEASE IS PAID FOR AND PRESENTED BY the Norwegian Centre for E-health Research

The use of unstructured data has not been fully explored and exploited, but this type of data has enormous potential.

Unstructured data in health records can provide better health treatment

Most health data for patients is unstructured data such as doctor's notes, emails and medical images. New analysis methods will be good for both treatment and health.

Electronic health records (EHRs) are healthcare professionals' most important work tools for quick, easy and secure access to necessary information about the patient, regardless of where the patient becomes ill or receives treatment.

All relevant and necessary information about the patient must be documented in the medical record.

Challenges and opportunities in use

“Unstructured data can contribute to better treatments,” says senior researcher Maryam Tayefi at the Norwegian Centre for E-health Research.

Seven researchers at the Norwegian Centre for E-health Research have analysed the challenges and opportunities of using all available data in electronic health records.

They have examined the topic thoroughly and suggested new areas for research in an advanced review published in the scientific journal WIREs Computational Statistics.

They have looked in particular at how statisticians and other scientists, such as computer scientists and clinicians, can contribute to this important field.

Structured and unstructured data

Electronic health records contain a lot of valuable information about individual patients and the population.

Health data are classified as either structured or unstructured.

Structured health data is standardized and can easily be transferred between health information systems. For example, the patient's name, date of birth, or blood test result may be recorded in a structured data format.

Unstructured health data is not standardized and accounts for as much as 80 percent of all health data for a patient.

Such data can be e-mails, voice recordings, doctor's notes, the patient's symptoms or signs of illness, radiology and pathology reports, epicrisis, family history, medical images and X-rays that doctors and other health personnel store in electronic health records.
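The difference can be illustrated with a minimal Python sketch. It is not taken from the review: the record fields, the sample note and the `extract_blood_pressure` helper are all hypothetical. A structured value can be read directly from a named field, while the same kind of value in a doctor's note has to be found in free text first, here with a simple pattern match.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StructuredRecord:
    """Hypothetical structured fields: each value has a fixed name and type."""
    name: str
    date_of_birth: str       # ISO 8601 date, e.g. "1980-05-17"
    hemoglobin_g_dl: float   # blood test result


# Hypothetical free-text note: the blood pressure is buried in prose.
NOTE = "Patient reports headache for three days. BP 142/91 mmHg, pulse regular."

# Matches e.g. "BP 142/91" and captures the systolic and diastolic values.
BP_PATTERN = re.compile(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)


def extract_blood_pressure(note: str) -> Optional[Tuple[int, int]]:
    """Return (systolic, diastolic) if the note mentions a BP reading, else None."""
    match = BP_PATTERN.search(note)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


record = StructuredRecord("Jane Doe", "1980-05-17", 13.8)
print(record.hemoglobin_g_dl)        # structured: read the field directly
print(extract_blood_pressure(NOTE))  # unstructured: must be parsed out of text
```

A hand-written pattern like this only covers one way of writing a blood pressure reading, which hints at why analysing real clinical text takes so much manual effort.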

Time consuming to analyse

In other words, unstructured data does not follow a given structure and has until recently been challenging to analyse. The process of analysing it is complex and time-consuming, and often requires disproportionate manual effort.

However, advances in data storage capacity, computing power and machine learning are making it increasingly easy to retrieve information from unstructured data and analyse it. Thus, the opportunities to use such data are increasing.

Adopt new methods

Among unstructured data, clinical text and medical images are the two most common and important sources of information.

Advanced statistical methods in natural language analysis, machine learning, deep learning and the conversion of digital images to data have increasingly been used to analyse clinical text and images.

“Natural language analysis and machine learning are used to the greatest extent by researchers, but clinicians, patients and care providers will also benefit greatly from the technology to make full use of unstructured data,” says senior researcher Maryam Tayefi at the Department of Health Analytics at the Norwegian Centre for E-health Research.

Better data for better health

“Unstructured data contains a lot of valuable information. Such data can contribute to better treatment and make it easier to use decision support systems that clinicians, patients and healthcare professionals can greatly benefit from,” says Tayefi.

She says that larger amounts of data require more time and resources, but this will open up more opportunities for researchers and experts in the field to ultimately improve the health of the population.

The combination of large amounts of varied data, computing power and better methods has also led to great strides in the processing of unstructured health data. It enables fast and automated production of machine learning algorithms that can analyse complex data with accurate results.

Protecting patients' privacy

There are still many open questions for researchers regarding the use of unstructured data that must be examined before this information can be used effectively in decision support tools for both patients and healthcare professionals.

Structured and unstructured data are stored in many different systems in different formats.

In the advanced review, the researchers ask the following questions, among others:

  • How can we simplify the process of unstructured data analysis and reduce manual effort?
  • How do we protect patients' privacy while improving the quality and availability of unstructured data?

“To answer this, we must look more closely at different methods for maintaining patients' privacy. New methods that can combine unstructured and structured data are expected to be extremely important in future innovations. To improve data availability, it will be crucial to have methods for extracting information from electronic health records and self-collected data,” says Tayefi.

Inventing new tools

Although there are many challenges that are not yet fully resolved and that may prevent the use of unstructured data, it is still possible to design useful diagnostic and decision support tools that include all available data.

“Several attempts have been made to extract important information from electronic health records using machine learning and statistical methods. There is a good possibility that such methods will lead to useful decision support tools and recommendation systems for both patients and healthcare professionals. In the future, such tools will very likely be an integrated part of the care general practitioners and hospitals can offer,” Tayefi believes.

Research on unstructured data alone is a new field for many statisticians, so researchers with a background in machine learning and statistics will be crucial in developing new methods that can be used in the near future.

“The underlying idea is to extract information from all kinds of unstructured data, but it makes sense to start with each source separately.”

She says that the development of systems that include a reliable calculation of uncertainty is important for statisticians. When it comes to health apps and programs, this is of particular interest since users must have a clear idea of the uncertainty of a proposed decision.
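As an illustration of what reporting uncertainty alongside an estimate can look like, here is a minimal Python sketch of a percentile bootstrap interval. The article does not say which method the researchers have in mind, so the choice of the bootstrap is an assumption, and the `bootstrap_interval` function and the risk scores are hypothetical.

```python
import random
import statistics


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Approximate percentile bootstrap interval for the mean of `values`.

    Illustrative only: resamples the data with replacement, computes the mean
    of each resample, and returns the central `level` share of those means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical risk scores produced by a decision support tool.
scores = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69, 0.64, 0.57]

estimate = statistics.fmean(scores)
lower, upper = bootstrap_interval(scores)
print(f"estimated risk {estimate:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

Showing the interval next to the point estimate is one simple way to give the user of a health app an idea of how much a proposed decision can be trusted.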

“In order to preserve the privacy of the patient, an alternative would be to examine a group of 'equal patients'. Another method could be to use synthetic data that is artificially created or created by algorithms, rather than actual measurements,” Tayefi concludes.
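One way to read the idea of examining a group of 'equal patients' is that no patient should be distinguishable from the others who share the same general attributes. The Python sketch below checks this using k-anonymity, a common formalization that the article itself does not name, so its use here is an assumption. The records, the attribute names and both functions are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same attribute values."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of attribute values is shared by at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical, already generalized records (age bands instead of exact ages).
patients = [
    {"age_band": "40-49", "region": "North", "diagnosis": "asthma"},
    {"age_band": "40-49", "region": "North", "diagnosis": "diabetes"},
    {"age_band": "40-49", "region": "North", "diagnosis": "asthma"},
    {"age_band": "60-69", "region": "South", "diagnosis": "copd"},
    {"age_band": "60-69", "region": "South", "diagnosis": "asthma"},
]

print(is_k_anonymous(patients, ["age_band", "region"], k=2))
print(is_k_anonymous(patients, ["age_band", "region"], k=3))
```

The check only looks at the attributes listed as identifying; it says nothing about what free-text notes attached to the same records might reveal.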


Maryam Tayefi et al. Challenges and opportunities beyond structured data in analysis of electronic health records. WIREs Comput Stat, 2021.

