What is the Horizon Scanning Database?
The Horizon Scanning Database identifies and tracks new pharmaceuticals expected to enter the international market until market entry has been achieved. Each tracked item is a topic, defined here as a specific pharmaceutical substance, indicated for a specific disease or condition, in a specific delivery route and pharmaceutical form, and developed or manufactured by a specific company.
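Because a topic is identified by the combination of five attributes, one way to model it is as an immutable composite key. The following is a minimal Python sketch under stated assumptions: the `Topic` class, its field names, and the case-insensitive matching rule are hypothetical and are not taken from the database's actual schema. The substance and company names in the example are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """Hypothetical model of one topic; field names are illustrative only."""

    substance: str    # pharmaceutical substance
    indication: str   # disease or condition
    route: str        # delivery route, e.g. "subcutaneous"
    dosage_form: str  # pharmaceutical form, e.g. "solution for injection"
    company: str      # developer or manufacturer

    def key(self) -> tuple[str, ...]:
        # Assumed matching rule: compare attributes ignoring case and surrounding spaces.
        return tuple(
            value.strip().casefold()
            for value in (self.substance, self.indication, self.route,
                          self.dosage_form, self.company)
        )


if __name__ == "__main__":
    a = Topic("exampleumab", "psoriasis", "subcutaneous", "solution for injection", "Acme Pharma")
    b = Topic("Exampleumab", "Psoriasis", "Subcutaneous", "Solution for injection", "ACME Pharma")
    c = Topic("exampleumab", "psoriasis", "intravenous", "concentrate", "Acme Pharma")
    print(a.key() == b.key())  # True: same topic despite different capitalisation
    print(a.key() == c.key())  # False: a different route and form make a separate topic
```

Treating the five attributes as a single key means that a new delivery route or pharmaceutical form for an existing substance is tracked as a separate topic, consistent with the definition above.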
What are the components of the Horizon Scanning Database?
Once complete, the system will consist, at a high level, of three logical technology layers that correspond to the functional stages commonly used in data engineering:
- Ingest
- Store/Explore/Process
- Serve/Distribute
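To make the three layers concrete, here is a minimal sketch assuming an RSS 2.0 feed as the source and a local directory standing in for the data lake. The function names (`ingest_rss`, `store_raw`, `serve`), the `raw/dt=YYYY-MM-DD` partition layout, the JSON Lines file format, and the sample feed are illustrative assumptions, not the system's actual design.

```python
from __future__ import annotations

import json
import tempfile
import xml.etree.ElementTree as ET
from datetime import date
from pathlib import Path

# Invented sample feed used only for the demonstration below.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Phase 3 results for exampleumab</title><link>https://example.org/1</link></item>
<item><title>Company quarterly update</title><link>https://example.org/2</link></item>
</channel></rss>"""


def ingest_rss(xml_text: str, source: str) -> list[dict]:
    """Ingest layer: turn an RSS 2.0 document into raw records."""
    root = ET.fromstring(xml_text)
    return [
        {
            "source": source,
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        }
        for item in root.iter("item")
    ]


def store_raw(records: list[dict], lake_root: Path, run_date: date) -> Path:
    """Store layer: append records as JSON Lines to a (hypothetical) date-partitioned raw zone."""
    partition = lake_root / "raw" / f"dt={run_date.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / "records.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    return path


def serve(lake_root: Path, keyword: str) -> list[dict]:
    """Serve layer: return stored records whose title contains the keyword (case-insensitive)."""
    hits = []
    for path in sorted(lake_root.glob("raw/dt=*/records.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                rec = json.loads(line)
                if keyword.casefold() in rec["title"].casefold():
                    hits.append(rec)
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        store_raw(ingest_rss(SAMPLE_FEED, "example-feed"), lake, date(2024, 1, 15))
        print(serve(lake, "phase 3"))
```

Keeping raw harvested records untouched in the store layer, and applying curation only afterwards, means ingestion rules can be revised and re-run without harvesting the sources again.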
An illustration of the components in more detail, showing how they will support reporting, search, content analysis, and the production of IHSI High Impact Reports, is as follows:
Harvesting and Ingestion: The Automated Data Collection and Curation Process
Initial data loading will combine an automated data collection and curation process (ADCCP; see ADCCP Data Harvesting and Ingestion below for details) with human-led data collection and curation.
The ECRI ADCCP includes regular automated data collection through RSS feeds, web scraping, and API calls, with the collected data persisted to a data lake. ECRI data scientists and specialists will use data virtualisation, search, and analytics tools to curate how these data are ingested. This ongoing work stream will continually review and refine the ingestion of data from harvested clinical trial records, news releases, news reports, and other regularly captured sources, guided by business rules and dictionaries maintained by the ECRI data specialists. The output of the ingestion curation team will be data pipelines that apply normalisation, categorisation, tagging, and taxonomies to the records streamed to the staging warehouse.
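A curation pipeline driven by business rules and dictionaries might look roughly like the following minimal sketch. The dictionaries (`SYNONYMS`, `CATEGORY_RULES`, `TAXONOMY`), their entries, and the `curate` function are hypothetical placeholders for the rules the ECRI data specialists maintain; production pipelines would be considerably richer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical curation dictionaries; real ones would be maintained by data specialists.
SYNONYMS = {                      # variant name -> canonical substance name
    "exampleumab": "exampleumab",
    "exu-101": "exampleumab",     # invented development code
}
CATEGORY_RULES = [                # first matching pattern wins
    (re.compile(r"\bphase\s*(1|2|3|i{1,3})\b", re.I), "clinical_trial"),
    (re.compile(r"\bapprov(al|ed)\b", re.I), "regulatory"),
]
TAXONOMY = {                      # term -> taxonomy node
    "psoriasis": "dermatology/psoriasis",
    "asthma": "respiratory/asthma",
}


@dataclass
class CuratedRecord:
    text: str
    substances: set[str] = field(default_factory=set)
    category: str = "uncategorised"
    tags: set[str] = field(default_factory=set)


def curate(raw_text: str) -> CuratedRecord:
    # Normalisation: collapse runs of whitespace.
    rec = CuratedRecord(text=" ".join(raw_text.split()))
    lowered = rec.text.casefold()
    # Normalise substance names via the synonym dictionary (whole-word matches only).
    for variant, canonical in SYNONYMS.items():
        if re.search(rf"\b{re.escape(variant)}\b", lowered):
            rec.substances.add(canonical)
    # Categorisation: apply the first matching business rule.
    for pattern, category in CATEGORY_RULES:
        if pattern.search(rec.text):
            rec.category = category
            break
    # Tagging against the taxonomy.
    for term, node in TAXONOMY.items():
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            rec.tags.add(node)
    return rec


if __name__ == "__main__":
    print(curate("Phase 3 trial of  EXU-101 in plaque psoriasis"))
```

Keeping the rules in data structures rather than in code lets specialists extend the dictionaries without changing pipeline logic.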
Staging Warehouse: Reporting, Search, Content Analysis, and High Impact Report Production
The staging warehouse will be the set of primary relational databases that back both the internal reporting and analysis performed by horizon scanning analysts and the data delivered to users through the Horizon Scanning Database website.
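One possible relational layout for the staging warehouse is sketched below using SQLite, chosen only because it ships with Python. The table names, column names, and helper functions are assumptions for illustration. The point shown is that a `UNIQUE` constraint over the five topic attributes lets records harvested from many sources attach to a single topic row, which internal reports can then aggregate.

```python
import sqlite3

# Hypothetical schema; not the production warehouse design.
SCHEMA = """
CREATE TABLE topic (
    topic_id    INTEGER PRIMARY KEY,
    substance   TEXT NOT NULL,
    indication  TEXT NOT NULL,
    route       TEXT NOT NULL,
    dosage_form TEXT NOT NULL,
    company     TEXT NOT NULL,
    UNIQUE (substance, indication, route, dosage_form, company)
);
CREATE TABLE source_record (
    record_id INTEGER PRIMARY KEY,
    topic_id  INTEGER NOT NULL REFERENCES topic(topic_id),
    source    TEXT NOT NULL,
    title     TEXT NOT NULL
);
"""

TOPIC_COLS = "substance, indication, route, dosage_form, company"


def open_warehouse(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def upsert_topic(conn: sqlite3.Connection, *attrs: str) -> int:
    """Insert the topic if it is new, then return its id (existing or new)."""
    conn.execute(f"INSERT OR IGNORE INTO topic ({TOPIC_COLS}) VALUES (?, ?, ?, ?, ?)", attrs)
    row = conn.execute(
        "SELECT topic_id FROM topic WHERE substance = ? AND indication = ? "
        "AND route = ? AND dosage_form = ? AND company = ?",
        attrs,
    ).fetchone()
    return row[0]


def add_record(conn: sqlite3.Connection, topic_id: int, source: str, title: str) -> None:
    conn.execute(
        "INSERT INTO source_record (topic_id, source, title) VALUES (?, ?, ?)",
        (topic_id, source, title),
    )


def records_per_topic(conn: sqlite3.Connection) -> list:
    """A simple internal report: number of harvested records per topic."""
    return conn.execute(
        "SELECT t.substance, t.route, COUNT(r.record_id) "
        "FROM topic t LEFT JOIN source_record r ON r.topic_id = t.topic_id "
        "GROUP BY t.topic_id ORDER BY t.topic_id"
    ).fetchall()
```

For example, calling `upsert_topic` twice with the same five attributes returns the same `topic_id`, so records from a trial registry and a news release about the same product accumulate under one topic.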
The horizon scanning analysts and research assistants will rely on the following capabilities to speed up the identification of candidate records for expert medical stakeholder review:
- Search across the staging warehouse, supported by natural language processing (NLP)
- Automated and manual tagging of records in the staging warehouse, including the addition of metadata
- Email and push notifications, plus integration with collaboration and workflow tools such as Microsoft SharePoint and Teams
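The search and tagging capabilities can be sketched with a small in-memory inverted index. This is a deliberately simple stand-in: it uses plain keyword matching rather than NLP, and the `RecordIndex` class, its methods, and the rule that automated tagging never overwrites a manual tag are assumptions made for illustration.

```python
from __future__ import annotations

import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    # Plain lower-cased alphanumeric tokens; a real system might add stemming or entity recognition.
    return TOKEN.findall(text.casefold())


class RecordIndex:
    """Hypothetical keyword index with tags recorded as {tag: origin} per record."""

    def __init__(self) -> None:
        self.records: dict[int, str] = {}
        self.postings: defaultdict[str, set[int]] = defaultdict(set)
        self.tags: defaultdict[int, dict[str, str]] = defaultdict(dict)

    def add(self, record_id: int, text: str) -> None:
        self.records[record_id] = text
        for tok in tokenize(text):
            self.postings[tok].add(record_id)

    def search(self, query: str) -> list[int]:
        """Return ids of records containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)

    def auto_tag(self, rules: dict[str, str]) -> None:
        # setdefault leaves any existing tag (and its origin) untouched.
        for term, tag in rules.items():
            for rid in self.search(term):
                self.tags[rid].setdefault(tag, "automated")

    def manual_tag(self, record_id: int, tag: str) -> None:
        self.tags[record_id][tag] = "manual"
```

Recording the origin of each tag lets analysts distinguish machine suggestions from their own judgements when reviewing candidate records.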
The analyst team will use the staging warehouse to identify topics for which topic profiles will be drafted, followed by stakeholder review and High Impact Report (HIR) topic selection.