Vocabulary_ID=20
The following describes the procedure to update
the FDB ETC vocabulary (concepts, synonyms, relationships, ancestry) with the
latest release of the source.
The
load process is a complete refresh of data. All new vocabulary
entieties will be added. All vocabulary entities that do not exist in
the latest files set will be marked as deprecated.
The concept hierarchy is defined in OMOP for FDB ETC concepts that are part of the following:
– Enhanced Therapeutic Classification
1.1 Scripts Transfer
All scripts and files listed at the bottom of this page should be transferred to specially created directory (eg /data/FDB_ETC).
Script Name |
Description |
20_create_schema.sql |
Create prestage-stage schema with tables and indexes
|
20_load_to_prestage.sh
| Load RxNorm row-data |
20_transform_row_concepts.sql
|
Convert and store in stage table concept_stage |
20_transform_row_relations.sql |
Convert and store in stage table concept_relationship_stage |
20_load_concepts.sql |
Load only new concepts into concept table, add invalid code information |
20_load_relations.sql |
Load only new relationships into relationship table, add invalid code information. |
20_load_to_prestage_ETC_ID.ctl | Concept Descriptons control file |
20_load_to_prestage_ETC_SEQ.ctl | Concept SEQNO control file |
1.2 Download data from source
1.2.1 Download data from FDB source
The most current international release is
located at the following URL: http://www.fdbhealth.com/
Download the latest release file (e.g.FDB_20120815.zip)
Transfer downloaded zip file to /data/FDB_ETC directory.
Zip file suffix contains release date (e.g. 07022012=02-Jul-2012)
Latest production zip file is ~2MB in size
1.2.2 Download data from RxNorm source
The most current international release is
located at the following URL: http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html
Historical archives
located at: http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormarchive.html
Note: Make sure you perform updates in chronological order. So, if you want to
process a January and April release, execute the below steps for January, and
after successful completion, for April.
Download the latest release file (e.g. RxNorm_full_07022012.zip)
Transfer downloaded zip file to /data/FDB_ETC directory.
Zip file suffix contains release date (e.g. 07022012=02-Jul-2012)
Latest production zip file is ~136MB in size
Note:
For the rest of the document we will be referring to current schema as FDB_ETC_20120702 you would have to adjust
this entry to correspond to current release date file
1.3 Build Database Schema
All tables will be located in the schema FDB_ETC_<DateOfSourceFile> as following: FDB_ETC_20120131
The following steps will create a schema and
will grant to a schema owner appropriate permissions.
In addition to the schema, it will create tables to
load raw data: RXNCONSO, RETCTBL0_ETC_ID and RETCGC0_ETC_GCNSEQNO as well as staging tables:
CONCEPT_STAGE, CONCEPT_ANCESTOR_STAGE,
CONCEPT_RELATIONSHIP_STAGE,
SOURCE_TO_CONCEPT_MAP_STAGE
$ sqlplus System/<SystemPass>@DEV_VOCAB @20_create_schema.sql <FBD_ETC_Schema> <Pass_FDB_ETC_Schema>
Format of FBD_ETC_Schema should be FBD_ETC_YYYYMMDD (e.g. FBD_ETC_20120815)
1.4 Raw data load into Oracle tables
1.4.1 Extract files from the zip
archive into the current directory
$ unzip -u RxNorm_full_07022012.zip -d rxnorm_20120702
1.4.2 Verify that the following 3 files have been created:
1.4.2.1 RxNorm raw data
RXNCONSO.RRF (Latest file size ~96MB)
1.4.2.2 FDB raw data
RETCTBL0_ETC_ID.rrf ( Search it in FDB source )
RETCGC0_ETC_GCNSEQNO.rrf
1.4.3.
Load raw data into
Oracle.
1.4.3.1
Load RxNorm raw data into
Oracle.
This sqlldr process will run for about 2min.
$ ./20_load_to_prestage.sh /data/FDB_ETC/rxnorm_20120702 <FDB_ETC_Schema>/<Pass_FDB_ETC_Realise_Schema>
1.4.3.2
Load FDB raw data into
Oracle.
Each sqlldr process will run for about 1min.
$ sqlldr FDB_ETC_20120815/myPass@DEV_VOCAB control=20_load_to_prestage_ETC_ID.ctl
$ sqlldr FDB_ETC_20120815/myPass@DEV_VOCAB control=20_load_to_prestage_ETC_SEQ.ctl
1.4.3.3 Verify that files with extension .bad are empty, no records have been rejected.
1.4.3.4 Copy data and intermediate files to backup area. In this step, downloaded files will be archived to preserve source of the data. Discuss a location of backup area with your system administrator.
$ cp -Ru /data/FDB_ETC/*.zip /data/backup_area ; cp -Ru /data/FDB_ETC/*.log /data/backup_area
1.4.3.5 Verify that number of records loaded is equivalent to prior production load
$ sqlplus FDB_ETC_20120815/myPass@DEV_VOCAB
Execute the following SQL commands:
SELECT count(*) FROM RXNCONSO; |
Current production row count is: 1,036,726 Rows
SELECT count(*) FROM RETCTBL0_ETC_ID; |
Current production row count is: 2,601 Rows
SELECT count(*) FROM RETCGC0_ETC_GCNSEQNO; |
Current production row count is: 29,846 Rows
1.5 Loading Staging Tables from raw
1.5.1 Convert and store in
staging table concepts
Loaded from raw staged data RETCTBL0_ETC_ID&RXNCONSO into the staged data CONCEPT_STAGE
In this step we will perform substitute GUID
identifiers into numerical format.
$ sqlplus FDB_ETC_20120815/myPass@DEV_VOCAB @20_transform_row_concepts.sql
1.5.2 Verify that number of records loaded is equivalent to prior production load
In this step we will verify that number of active records in staging
tables is valid prior to transfer data to historical DEV schema.
a. Number of Records in stage table
b. Number of records in DEV schema not deleted (active, prior load production)
c. How many records would be added to DEV schema table
d. How many active DEV records will be marked for deletion
In most cases number of records in stage (a) should be greater than number of active records in production (b)
In rare occasion that might not be the case. It is responsibility of
the developer running the scripts to verify that that is the the valid.
$ sqlplus FDB_ETC_20120815/myPass@DEV_VOCAB
Execute the following SQL commands:
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt FROM CONCEPT_STAGE c WHERE C.VOCABULARY_ID = 20 UNION ALL SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 20 AND NVL (d.INVALID_REASON, 'X') <> 'D' UNION ALL SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt FROM CONCEPT_STAGE c WHERE C.VOCABULARY_ID = 20 AND NOT EXISTS (SELECT 1 FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 20 AND c.CONCEPT_CODE = D.CONCEPT_CODE) UNION ALL SELECT '- How many DEV active will be marked for deletion' AS scr, COUNT (8) cnt FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 20 AND NVL (d.INVALID_REASON, 'X') <> 'D' AND NOT EXISTS (SELECT 1 FROM CONCEPT_STAGE c WHERE c.VOCABULARY_ID = 20 AND c.CONCEPT_CODE = D.CONCEPT_CODE);
|
Current Result:
- Num Rec in stage 2601
- Num Rec in DEV not deleted 2489
- How many records would be new in DEV added 112
- How many DEV active will be marked for deletion 5
1.5.3 Load new concepts into DEV schema concept table,mark deprecated concepts as deleted.
Valid start date, valid end date and invalid
reason are left empty for now.
Transfer records from CONCEPT_STAGE to DEV.CONCEPT table
$ sqlplus FDB_ETC_20120815/myPass@DEV_VOCAB @20_load_concepts.sql
1.5.4 Load new concept relationships stage, identify invalid codes
information
Loaded from the raw staged data RXNREL into the staged data CONCEPT_RELATIONSHIP_STAGE,
identify invalid code information
$ sqlplus FDB_ETC_20120815/myPass@DEV_VOCAB @20_transform_row_relations.sql
1.5.5 Verify that number of records loaded is
equivalent to prior production load
$ sqlplus FDB_ETC_20120815/myPass@DEV_VOCAB
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt FROM CONCEPT_RELATIONSHIP_STAGE c UNION ALL SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt FROM DEV.CONCEPT_RELATIONSHIP d WHERE EXISTS (SELECT 1 FROM DEV.CONCEPT c WHERE c.VOCABULARY_ID IN (08, 20) AND c.CONCEPT_ID = CONCEPT_ID_1) AND EXISTS (SELECT 1 FROM DEV.CONCEPT c WHERE c.VOCABULARY_ID IN (08, 20) AND c.CONCEPT_ID = CONCEPT_ID_2) AND NVL (d.INVALID_REASON_CD, 'X') <> 'D' AND D.RELATIONSHIP_ID IN (010, 130, 144, 244) UNION ALL SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt FROM CONCEPT_RELATIONSHIP_STAGE c WHERE NOT EXISTS (SELECT 1 FROM DEV.CONCEPT_RELATIONSHIP d WHERE c.CONCEPT_ID_1 = d.CONCEPT_ID_1 AND c.CONCEPT_ID_2 = d.CONCEPT_ID_2 AND c.RELATIONSHIP_ID = d.RELATIONSHIP_ID) UNION ALL SELECT '- How many DEV active will be marked for deletion' AS scr, COUNT (8) cnt FROM DEV.CONCEPT_RELATIONSHIP d WHERE EXISTS (SELECT 1 FROM DEV.CONCEPT c WHERE c.VOCABULARY_ID IN (08, 20) AND c.CONCEPT_ID = CONCEPT_ID_1) AND EXISTS (SELECT 1 FROM DEV.CONCEPT c WHERE c.VOCABULARY_ID IN (08, 20) AND c.CONCEPT_ID = CONCEPT_ID_2) AND NVL (d.INVALID_REASON_CD, 'X') <> 'D' AND NOT EXISTS (SELECT 1 FROM CONCEPT_RELATIONSHIP_STAGE c WHERE c.CONCEPT_ID_1 = d.CONCEPT_ID_1 AND c.CONCEPT_ID_2 = d.CONCEPT_ID_2 AND c.RELATIONSHIP_ID = d.RELATIONSHIP_ID) AND D.RELATIONSHIP_ID IN (010, 130, 144, 244);
|
Current Result:
- Num Rec in stage
|
69284
|
- Num Rec in DEV
not deleted
|
69284
|
- How many
records would be new in DEV added
|
0
|
- How many DEV
active will be marked for deletion
|
0
|
1.5.6
Load new concept relationships into DEV schema,mark deprecated concept relatuionships as deleted
Transfer records from CONCEPT_RELATIONSHIP_STAGE to DEV.CONCEPT_RELATIONSHIP table
$ sqlplus FDB_ETC_20120815/myPass@DEV_VOCAB @20_load_relations.sql