Medical Coding in Clinical Trials

Written by
Florentin Ory
Published on
May 6, 2026

Every verbatim term an investigator enters in your eCRF eventually lands on a coder's desk. In a mid-size multinational trial, that queue grows fast. Industry benchmarks consistently show that 20 to 30% of verbatim terms require manual review, even when auto-coding is enabled. That is the standing workload your data management team absorbs before the database can lock.

Medical coding converts those free-text entries into standardized codes using internationally recognized dictionaries. Adverse events go through MedDRA. Concomitant medications go through WHODrug. Without that mapping, "headache," "cephalalgia," and "head pain" remain three separate records instead of one. Regulators see gaps. Safety teams miss signals. Biostatisticians get noise.

This article explains how medical coding works, where it creates friction, and what changes when it runs inside your EDC.

What Is Medical Coding in Clinical Trials?

Medical coding maps investigator-reported verbatim terms to hierarchical dictionary entries, creating structured data that can be analyzed, compared, and submitted across sites, studies, and regulatory agencies.

The two standard dictionaries in clinical research:

  • MedDRA (Medical Dictionary for Regulatory Activities): covers adverse events, medical history, and indications. MedDRA organizes terms across five levels, from Lowest Level Terms (LLT) up through Preferred Terms (PT), High Level Terms (HLT), and High Level Group Terms (HLGT) to System Organ Classes (SOC). Most day-to-day coding happens at the PT level (see the sketch after this list). MedDRA publishes two new versions per year, in March and September.
  • WHODrug (WHO Drug Dictionary): covers concomitant and prior medications, mapped to ATC classification codes. Like MedDRA, WHODrug follows a semi-annual release schedule: March and September.
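
To make the hierarchy concrete, here is a minimal sketch in Python of how several verbatim-level terms roll up to a single Preferred Term. The term names follow the headache example above; the class, codes, and parent links are illustrative placeholders, not actual MedDRA content, which is only available under license.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MedDRATerm:
    code: str
    name: str
    level: str                      # "LLT", "PT", "HLT", "HLGT", or "SOC"
    parent: "MedDRATerm | None" = None

# Placeholder codes; real codes come from the licensed MedDRA dictionary.
soc  = MedDRATerm("90000001", "Nervous system disorders", "SOC")
hlgt = MedDRATerm("90000002", "Headaches", "HLGT", soc)
hlt  = MedDRATerm("90000003", "Headaches NEC", "HLT", hlgt)
pt   = MedDRATerm("90000004", "Headache", "PT", hlt)

# Several Lowest Level Terms roll up to the same Preferred Term, which is
# what turns "headache", "cephalalgia", and "head pain" into one record.
llts = [
    MedDRATerm("90000005", "Headache", "LLT", pt),
    MedDRATerm("90000006", "Cephalalgia", "LLT", pt),
    MedDRATerm("90000007", "Head pain", "LLT", pt),
]

def preferred_term(term: MedDRATerm) -> MedDRATerm:
    """Walk up the hierarchy until the PT level is reached."""
    node = term
    while node.level != "PT" and node.parent is not None:
        node = node.parent
    return node

for llt in llts:
    print(f"{llt.name!r} -> PT {preferred_term(llt).name!r}")
```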

Most modern EDC systems include coding as a built-in capability, though the depth of that integration varies significantly.

Why Medical Coding Cannot Be an Afterthought

When coding falls behind, the consequences reach beyond the data management team. Four areas take the hit directly:

  • Regulatory submissions. Submissions to the FDA and EMA require coded adverse event and medication data. Inconsistent or missing codes can delay or block approvals. There is no workaround at submission.
  • Data consistency. MedDRA maps "headache," "cephalalgia," and "head pain" to a single Preferred Term. Without that, your adverse event analysis overstates the number of distinct events and understates the frequency of any one term.
  • Safety signal detection. Pharmacovigilance teams identify patterns in adverse events across studies and populations, including real-world evidence studies. Uncoded or miscoded data breaks the signal before it can be detected.
  • Database lock timeline. A backlog of unresolved verbatim terms is one of the most common causes of database lock delays. Each week your coding queue stays open is a week added to your submission timeline.

How the Coding Workflow Runs

The process starts when an investigator enters a verbatim term in the eCRF. What happens next determines how much of your team's time goes to coding.

The EDC system attempts to match the verbatim term against dictionary entries using predefined rules and synonym tables. Terms that match go through automatically. The 20 to 30% that do not match queue for manual review: ambiguous descriptions, typos, site-specific abbreviations, entries in multiple languages.
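
As a rough illustration of that matching step, the sketch below normalizes a verbatim term and looks it up in a synonym table; anything without a match falls to the manual queue. The normalization rules and synonym entries are hypothetical simplifications, not the matching logic of any particular EDC.

```python
import re

SYNONYMS = {
    # normalized verbatim text -> coded dictionary term (PT level)
    "headache": "Headache",
    "cephalalgia": "Headache",
    "head pain": "Headache",
    "nausea": "Nausea",
}

def normalize(verbatim: str) -> str:
    """Lowercase, trim, and collapse whitespace before lookup."""
    return re.sub(r"\s+", " ", verbatim.strip().lower())

def auto_code(verbatim: str) -> tuple[str | None, str]:
    """Return (coded_term, status): matched terms code automatically,
    everything else goes to the manual review queue."""
    term = SYNONYMS.get(normalize(verbatim))
    if term is not None:
        return term, "auto-coded"
    return None, "manual review"

for entry in ["Headache ", "cephalalgia", "pounding head since dose 2"]:
    print(entry, "->", auto_code(entry))
```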

A trained coder searches the relevant dictionary, selects the appropriate MedDRA term at PT level, and submits the code for approval. A data manager reviews and locks the entry as part of the data cleaning process. The more of that sequence that runs inside a single platform, the fewer handoffs your team manages between capture and lock.

Where Traditional Coding Creates Friction

The 20 to 30% manual review rate is not just a volume problem. Four structural factors keep it elevated across the life of a study.

  • Language and site variability. A decentralized trial with 40 sites across 10 countries generates verbatim terms in English, French, Spanish, and German. Each site brings its own conventions and abbreviations. Every term the system does not recognize goes to the manual queue.
  • Dictionary version cycles. MedDRA and WHODrug each publish two new versions per year, in March and September. Each release can require partial re-coding of existing terms. When that process runs outside the EDC, it creates a separate reconciliation task.
  • Disconnected tools. When coding happens in a system separate from data capture, your team exports terms, codes them, and imports the results back into the EDC. Each handoff is a delay and a potential discrepancy. An EDC with built-in coding removes that gap.
  • Static synonym tables. Auto-coding runs against a fixed set of recognized synonyms. If synonym tables are not updated as the study generates new terms, the manual queue stays large from first patient through last visit (a minimal sketch of that feedback loop follows this list).
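
Picking up that last point, here is a small sketch, under the same simplified assumptions as the matcher above, of the feedback loop that keeps a synonym table from staying static: the first occurrence of an unmatched term is coded manually and added to the table, so the next occurrence auto-codes. The terms and decisions are illustrative only.

```python
synonyms = {"headache": "Headache"}        # starting synonym table
queue = ["Cephalalgia", "cephalalgia"]     # same unmatched term, two sites

for verbatim in queue:
    key = verbatim.strip().lower()
    if key in synonyms:
        print(f"{verbatim!r} auto-coded to {synonyms[key]!r}")
    else:
        coded = "Headache"                 # coder's manual decision
        synonyms[key] = coded              # table grows with the study
        print(f"{verbatim!r} manually coded to {coded!r}")
```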

Medical Coding with Datacapt

Datacapt integrates medical coding directly into its EDC. Coders work in the same environment where data is captured. No exports, no parallel system, no reconciliation step between tools.

In practice, that changes the day-to-day workload for your coding team:

  • Auto-coding runs as data enters. When an investigator submits a verbatim term, the system searches MedDRA or WHODrug immediately. Terms that match go to review without manual intervention. Your coders focus on what the system cannot resolve.
  • Unmatched terms stay in context. Coders search, select, and approve terms in the platform where the original eCRF entry sits. No exports, no toggling between systems.
  • Synonym tables grow across studies. Each term your team codes manually becomes a candidate for the synonym list. Over time, your auto-coding rate improves across studies, not just within one.
  • Dictionary updates are managed in the platform. March and September version releases for both MedDRA and WHODrug are handled without exports or re-import. Your team applies the update, reviews affected terms, and moves on.
  • Every coding action is tracked. The audit trail captures who coded what, when, and against which dictionary version, covering the key 21 CFR Part 11 requirements: audit trail, access controls, and electronic signatures for records used in regulatory submissions.

Datacapt pairs medical coding with ePRO, eConsent, and randomization in one clinical trial platform. The data your coders resolve is the same data your biostatisticians analyze. No gap between capture and submission.


Florentin Ory
CEO & Co-Founder

Florentin combines clinical research know-how with a true passion for product design. Attentive to detail and obsessed with user experience, he ensures that Datacapt remains a high-performance platform that’s also intuitive and accessible to every user.
