Vocabulary_ID=28 The following describes the procedure to update the VA Product vocabulary (concepts, synonyms, relationships, ancestry) with the latest release of the source. The load process is a complete refresh of data. All new vocabulary entieties will be added. All vocabulary entities that do not exist in the latest files set will be marked as deprecated.
1.1 Scripts TransferAll scripts and files listed at the bottom of this page should be transferred to specially created directory (eg /data/VAPROD).
Script Name | Description | 28_create_schema.sql | Create prestage-stage schema with tables and indexes
| 28_load_to_prestage.sh | Load RxNorm row-data | 28_transform_row_concepts.sql | Convert and store in stage table concept_stage | 28_load_concepts.sql | Load only new concepts into concept table, add invalid code information | 28_transform_row_relations.sql
| Convert and store in stage table concept_relationship_stage
| 28_load_relations.sql
| Load only new relationships into relationship table, add invalid code information.
| 28_transform_row_maps.sql
| Convert and store in stage table SOURCE_TO_CONCEPT_MAP_STAGE | 28_load_maps.sql | Load only new maps into SOURCE_TO_CONCEPT_MAP table, add invalid code information |
1.2 Download data from sourceThe most current international release is located at the following URL: http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html Historical archives located at: http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormarchive.html Note: Make sure you perform updates in chronological order. So, if you want to process a January and April release, execute the below steps for January, and after successful completion, for April. Download the latest release file (e.g. RxNorm_full_08062012.zip) Transfer downloaded zip file to /data/VAPROD directory. Zip file suffix contains release date (e.g. 08062012=06-Aug-2012) Latest production zip file is ~131MB in size Note: For the rest of the document we will be referring to current schema as VAPROD_20120806 you would have to adjust this entry to correspond to current release date file
1.3 Build Database SchemaAll tables will be located in the schema SNOMED_<DateOfSourceFile> as following: VAPROD_20120131
The following steps will create a schema and
will grant to a schema owner appropriate permissions.
In addition to the schema, it will create tables to
load raw data: RXNATOMARCHIVE,RXNCONSO,RXNCUI,RXNCUICHANGES,RXNDOC,RXNREL,RXNSAB,RXNSAT and RXNSTY as well as staging tables: CONCEPT_STAGE, CONCEPT_ANCESTOR_STAGE, CONCEPT_RELATIONSHIP_STAGE, SOURCE_TO_CONCEPT_MAP_STAGE $ sqlplus System/<SystemPass>@DEV_VOCAB @28_create_schema.sql <VAPROD_Schema> <Pass_VAPROD_Schema>
Format of VAPROD_Schema should be VAPROD_YYYYMMDD (e.g. VAPROD_20120806)
Staging tables match Vocabulary schema version 4.0 with slight modification. Modifications made to the tables are: CONCEPT_STAGE.CONCEPT_ID is nullable CONCEPT_RELATIONSHIP_STAGE .REL_ID is nullable
1.4 Raw data load into Oracle tables1.4.1 Extract files from the zip archive into the current directory $ unzip -u RxNorm_full_08062012.zip -d rxnorm_20120806
1.4.2 Verify that the following 2 files have been created: RXNCONSO.RRF (Latest file size ~96MB) RXNATOMARCHIVE.RRF (Latest file size ~42MB) RXNCUI.RRF (Latest file size ~1MB) RXNCUICHANGES.RRF (Latest file size ~1MB) RXNDOC.RRF (Latest file size ~1MB) RXNREL.RRF (Latest file size ~230MB) RXNSAB.RRF (Latest file size ~8KB) RXNSAT.RRF (Latest file size ~385MB) RXNSTY.RRF (Latest file size ~15MB)
1.4.4. Load raw data into Oracle. This sqlldr process will run for about 2min. $ ./28_load_to_prestage.sh /data/VAPROD/rxnorm_20120806 <VAPROD_Schema>/<Pass_VAPROD_Schema>
1.4.4.1 Verify that files with extension .bad are empty, no records have been rejected.
1.4.4.2 Copy data and intermediate files to backup area. In this step, downloaded files will be archived to preserve source of the data. Discuss a location of backup area with your system administrator.
$ cp -Ru /data/VAPROD/*.zip /data/backup_area ; cp -Ru /data/VAPROD/*.log /data/backup_area 1.4.4.3 Verify that number of records loaded is equivalent to prior production load
$ sqlplus VAPROD_20120806/myPass@DEV_VOCAB Execute the following SQL commands: SELECT count(*) FROM RXNCONSO; | Current production row count is: 979,203 Rows
SELECT count(*) FROM RXNREL;
| Current production row count is: 3,485,382 Rows
1.5 Loading Staging Tables from raw
1.5.1 Convert and store in staging table concepts Loaded from raw staged data RXNCONSO into the staged data CONCEPT_STAGE In this step we will perform substitute GUID identifiers into numerical format.
$ sqlplus VAPROD_20120806/myPass@DEV_VOCAB @28_transform_row_concepts.sql
1.5.2 Verify that number of records loaded is equivalent to prior production load In this step we will verify that number of active records in staging tables is valid prior to transfer data to historical DEV schema.
a. Number of Records in stage table
b. Number of records in DEV schema not deleted (active, prior load production)
c. How many records would be added to DEV schema table
d. How many active DEV records will be marked for deletion In most cases number of records in stage (a) should be greater than number of active records in production (b) In rare occasion that might not be the case. It is responsibility of the developer running the scripts to verify that that is the the valid.
$ sqlplus VAPROD_20120806/myPass@DEV_VOCAB
Execute the following SQL commands:
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt FROM CONCEPT_STAGE c WHERE C.VOCABULARY_ID = 28 UNION ALL SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 28 AND NVL (d.INVALID_REASON, 'X') <> 'D' UNION ALL SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt FROM CONCEPT_STAGE c WHERE C.VOCABULARY_ID = 28 AND NOT EXISTS (SELECT 1 FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 28 AND c.CONCEPT_CODE = D.CONCEPT_CODE) UNION ALL SELECT '- How many DEV active will be marked for deletion' AS scr, COUNT (8) cnt FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 28 AND NVL (d.INVALID_REASON, 'X') <> 'D' AND NOT EXISTS (SELECT 1 FROM CONCEPT_STAGE c WHERE c.VOCABULARY_ID = 28 AND c.CONCEPT_CODE = D.CONCEPT_CODE);
|
Current Result:
- Num Rec in stage 16344
- Num Rec in DEV not deleted 16195
- How many records would be new in DEV added 149
- How many DEV active will be marked for deletion 0
1.5.3 Load new concepts into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid reason are left empty for now. Transfer records from CONCEPT_STAGE to DEV.CONCEPT table
$ sqlplus VAPROD_20120806/myPass@DEV_VOCAB @28_load_concepts.sql
1.5.4 Load new concept relationships stage, identify invalid codes information
Loaded from the raw staged data RXNREL into the staged data CONCEPT_RELATIONSHIP_STAGE, identify invalid code information
$ sqlplus VAPROD_20120806/myPass@DEV_VOCAB @28_transform_row_relations.sql
1.5.5 Verify that number of records loaded is equivalent to prior production load
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt FROM CONCEPT_RELATIONSHIP_STAGE d WHERE EXISTS (SELECT 1 FROM DEV.CONCEPT c WHERE c.VOCABULARY_ID IN ( 28) AND c.CONCEPT_ID = CONCEPT_ID_1) UNION ALL SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt FROM DEV.CONCEPT_RELATIONSHIP d WHERE EXISTS (SELECT 1 FROM DEV.CONCEPT c WHERE c.VOCABULARY_ID IN ( 28) AND c.CONCEPT_ID = CONCEPT_ID_1) AND NVL(d.INVALID_REASON, 'X') <> 'D' UNION ALL SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt FROM CONCEPT_RELATIONSHIP_STAGE c WHERE NOT EXISTS (SELECT 1 FROM DEV.CONCEPT_RELATIONSHIP d WHERE c.CONCEPT_ID_1 = d.CONCEPT_ID_1 AND c.CONCEPT_ID_2 = d.CONCEPT_ID_2 AND c.RELATIONSHIP_ID = d.RELATIONSHIP_ID) AND EXISTS (SELECT 1 FROM DEV.CONCEPT c WHERE c.VOCABULARY_ID IN ( 28) AND c.CONCEPT_ID = CONCEPT_ID_1) UNION ALL SELECT '- How many DEV active will be marked for deletion' AS scr, COUNT (8) cnt FROM DEV.CONCEPT_RELATIONSHIP d WHERE EXISTS (SELECT 1 FROM DEV.CONCEPT c WHERE c.VOCABULARY_ID IN ( 28) AND c.CONCEPT_ID = CONCEPT_ID_1) AND NVL(d.INVALID_REASON, 'X') <> 'D' AND NOT EXISTS (SELECT 1 FROM CONCEPT_RELATIONSHIP_STAGE c WHERE c.CONCEPT_ID_1 = d.CONCEPT_ID_1 AND c.CONCEPT_ID_2 = d.CONCEPT_ID_2 AND c.RELATIONSHIP_ID = d.RELATIONSHIP_ID);
|
Current Result:
- Num Rec in stage 149867
- Num Rec in DEV not deleted 147175
- How many records would be new in DEV added 1357
- How many DEV active will be marked for deletion 100
1.5.6 Load new concept relationships into DEV schema,mark deprecated concept relatuionships as deleted Transfer records from CONCEPT_RELATIONSHIP_STAGE to DEV.CONCEPT_RELATIONSHIP table $ sqlplus VAPROD_20120806/myPass@DEV_VOCAB @28_load_relations.sql
1.5.7 Convert and store in staging table maps Loaded from raw staged data RXNCONSO into the staged data SOURCE_TO_CONCEPT_MAP_STAGE
$ sqlplus VAPROD_20120806/myPass@DEV_VOCAB @28_transform_row_maps.sql
1.5.8 Verify that number of records loaded is equivalent to prior production load In this step we will verify that number of active records in staging tables is valid prior to transfer data to historical DEV schema.
a. Number of Records in stage table
b. Number of records in DEV schema not deleted (active, prior load production)
c. How many records would be added to DEV schema table
d. How many active DEV records will be marked for deletion In most cases number of records in stage (a) should be greater than number of active records in production (b) In rare occasion that might not be the case. It is responsibility of the developer running the scripts to verify that that is the the valid.
$ sqlplus VAPROD_20120806/myPass@DEV_VOCAB
Execute the following SQL commands:
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt FROM SOURCE_TO_CONCEPT_MAP_STAGE c WHERE c.SOURCE_VOCABULARY_ID IN (28) UNION ALL SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt FROM DEV.SOURCE_TO_CONCEPT_MAP d WHERE d.SOURCE_VOCABULARY_ID IN (28) AND D.TARGET_VOCABULARY_ID IN (08) AND NVL (d.INVALID_REASON, 'X') <> 'D' UNION ALL SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt FROM SOURCE_TO_CONCEPT_MAP_STAGE c WHERE c.SOURCE_VOCABULARY_ID IN (28) AND c.TARGET_VOCABULARY_ID IN (08) AND NOT EXISTS (SELECT 1 FROM DEV.SOURCE_TO_CONCEPT_MAP d WHERE d.SOURCE_VOCABULARY_ID IN (28) AND c.SOURCE_CODE = D.SOURCE_CODE AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID AND d.MAPPING_TYPE = c.MAPPING_TYPE AND d.TARGET_CONCEPT_ID = c.TARGET_CONCEPT_ID AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID) UNION ALL SELECT '- How many DEV active will be marked for deletion' AS scr, COUNT (8) cnt FROM DEV.SOURCE_TO_CONCEPT_MAP d WHERE d.SOURCE_VOCABULARY_ID IN (28) AND D.TARGET_VOCABULARY_ID IN (08) AND NVL (d.INVALID_REASON, 'X') <> 'D' AND d.VALID_START_DATE < TO_DATE ( SUBSTR (USER, REGEXP_INSTR (USER, '_[[:digit:]]') + 1, 256), 'YYYYMMDD') AND NOT EXISTS (SELECT 1 FROM SOURCE_TO_CONCEPT_MAP_STAGE c WHERE c.SOURCE_VOCABULARY_ID IN (28) AND c.SOURCE_CODE = D.SOURCE_CODE AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID AND d.MAPPING_TYPE = c.MAPPING_TYPE AND d.TARGET_CONCEPT_ID = c.TARGET_CONCEPT_ID AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID) AND EXISTS (SELECT 1 FROM SOURCE_TO_CONCEPT_MAP_STAGE c WHERE d.SOURCE_CODE = c.SOURCE_CODE AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID AND d.MAPPING_TYPE = c.MAPPING_TYPE AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID);
|
Current Result:
- Num Rec in stage 12814
- Num Rec in DEV not deleted 24994
- How many records would be new in DEV added 169
- How many DEV active will be marked for deletion 8
1.5.9 Load new maps into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid reason are left empty for now. Transfer records from SOURCE_TO_CONCEPT_MAP_STAGE to DEV.SOURCE_TO_CONCEPT_MAP table
$ sqlplus
VAPROD_20120806/myPass@DEV_VOCAB @28_load_maps.sql |
 Updating...
Serg Vereshagin, Oct 25, 2013, 7:51 AM
Serg Vereshagin, Oct 25, 2013, 7:51 AM
Serg Vereshagin, Oct 25, 2013, 7:51 AM
Serg Vereshagin, Oct 25, 2013, 7:51 AM
Serg Vereshagin, Oct 25, 2013, 7:51 AM
Serg Vereshagin, Oct 25, 2013, 7:51 AM
Serg Vereshagin, Oct 25, 2013, 7:51 AM
Serg Vereshagin, Oct 25, 2013, 7:51 AM
|