November 9, 2021

From Documents to Datasets: Evolving Information Exchange in Drug Development

Regulatory professionals who recognise the following image will surely appreciate just how far we have come in the last 25 years.

Let’s recall the dark old days when major marketing applications for local affiliates arrived from head office ready-printed, on paper, literally on the back of a truck.

In my own experience, working in the Australian subsidiary of one of the largest global pharma companies, local requirements inevitably meant adjustments. This in turn meant a large (or very large) effort to repaginate the dossier, using the numbering machine shown in Figure 1, a true torture device.

Eventually, all was well, all was complete, and the submission was sent off to the Therapeutic Goods Administration. On another truck.

Figure 1: A numbering machine

Today, we remain heavily reliant on documentation to communicate and exchange information around drug development. Slowly but surely, however, we are moving inexorably away from documents and towards data.


The volume and complexity of submissions are increasing rapidly. For example, the complete ISO IDMP (Identification of Medicinal Products) schema contains a couple of orders of magnitude more data points than XEVMPD (the Extended EudraVigilance Medicinal Product Dictionary).

In preparing for IDMP, much of the effort over the last several years has gone into finding the source of truth for the many additional pieces of information this new standard requires.

Each company has had to identify which of their internal departments owns the information, and the format in which that information currently resides. Often, the information needed sits in a dusty warehouse, in an archive. On paper.

Companies have combined manual work with technology, including optical character recognition and artificial intelligence (e.g., natural language processing and machine learning), in different proportions and with mixed success, to extract structured data from unstructured documents.

Structured content authoring, where existing data is used to create the documents still required today, may in future be viewed as a ‘holding technology’ that eases today’s outdated document shuffling, not unlike my digital photocopier of twenty years ago.


“Ongoing initiatives to transform regulatory data exchange will drive improved efficiencies and transparency while reducing costs and review times, bringing new medicines to patients sooner.”

– Sandra Vignes, Technical Solutions Director, Calyx

Because documents remain important today, it certainly makes sense to invest in the document management system (DMS) that best meets your needs, while recognising that its role in regulatory affairs will diminish over time.

The drive from documents to data is unrelenting. We are likely to see the end of the dossier in the coming years.

Projects are already underway to transform how regulatory data is exchanged between marketing authorization holders and regulatory agencies, including indirect data interchange via the cloud. This should drive efficiencies and cost savings, greater transparency, and shorter review times, bringing new medicines to patients sooner.

Datasets will continue to grow, as they already have been. Combined with greater global standardization, such as multi-regional IDMP adoption, this means safety signals will be identified earlier. Regional interoperability, such as cooperation on international prescriptions and against drug counterfeiting, will be greatly enhanced.

Combined with artificial intelligence tools, such as DeepMind’s AlphaFold project, the use of data will continue to expand. Solutions to one challenge will open opportunities to solve many more.

Fasten your seat belts.
