Initial data loading will combine an automated data collection and curation process (ADCCP; see ADCCP Data Harvesting and Ingestion below for details) with human data collection and curation processes.
The ECRI ADCCP includes regular automated data collection via RSS feeds, web scraping, and API calls, the results of which will be persisted to a data lake. ECRI data scientists and specialists will use data virtualisation, search, and analytics tools to curate ingestion processing. This ongoing work stream will continually review and refine the ingestion of data from harvested trials, news releases, news reports, and other regularly captured sources. The curation work will be guided by business rules and dictionaries maintained by the ECRI data specialists. The ingestion curation team will deliver data pipelines that apply normalisation, categorisation, tagging, and taxonomies to the records streamed to the staging warehouse.
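As an illustration only, the curation pipeline described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the ECRI implementation: the `Record` type, the `SYNONYMS` dictionary, the `CATEGORY_RULES` business rules, and all function names are hypothetical, and the example terms are invented for demonstration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical dictionary maintained by data specialists: variant -> canonical term.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
}

# Hypothetical business rules: category name -> keywords that trigger it.
CATEGORY_RULES = {
    "clinical_trial": ("randomised", "randomized", "phase ii", "phase iii"),
    "regulatory": ("approval", "clearance", "recall"),
}


@dataclass
class Record:
    source: str  # assumed source labels, e.g. "rss", "scrape", "api"
    text: str
    categories: list = field(default_factory=list)
    tags: list = field(default_factory=list)


def normalise(record):
    """Collapse whitespace and lower-case the harvested text."""
    record.text = re.sub(r"\s+", " ", record.text).strip().lower()
    return record


def categorise(record):
    """Assign every category whose keywords appear in the text."""
    record.categories = sorted(
        name
        for name, keywords in CATEGORY_RULES.items()
        if any(keyword in record.text for keyword in keywords)
    )
    return record


def tag(record):
    """Tag the record with the canonical form of each dictionary term found."""
    found = set()
    for variant, canonical in SYNONYMS.items():
        # Word boundaries stop short variants matching inside longer words.
        if re.search(rf"\b{re.escape(variant)}\b", record.text):
            found.add(canonical)
    record.tags = sorted(found)
    return record


# Order matters: matching in later steps relies on normalised text.
PIPELINE = (normalise, categorise, tag)


def run_pipeline(records):
    """Yield curated records one at a time, as a stream to staging might."""
    for record in records:
        for step in PIPELINE:
            record = step(record)
        yield record
```

Keeping the rules and dictionaries as plain data, separate from the pipeline steps, reflects the idea that data specialists maintain them without changing pipeline code; a real deployment would presumably load them from a managed store rather than module-level constants.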