Drug Domain‎ > ‎

FDA SPL

Vocabulary_ID=50

The following describes the procedure to update the FDA SPL vocabulary  with the latest release of the source.

The load process is a complete refresh of data. All new vocabulary entieties will be added. All vocabulary entities that do not exist in the latest files set will be marked as deprecated.


1.1 Scripts Transfer

All scripts and files listed at the bottom of this page should be transferred to specially created directory (eg /data/FDASPL).


 Script Name  Description
50_create_schema.sql  Create prestage-stage schema with tables and indexes
50_load_to_prestage.sh  Load RxNorm row-data
50_load_to_prestage_FDA_PRO.ctl FDA Concept control file
50_transform_row_maps.sql      Convert and store in stage table SOURCE_TO_CONCEPT_MAP_STAGE
50_load_maps.sql 
 Load only new maps into SOURCE_TO_CONCEPT_MAP table, add invalid code information


1.2 Download data from source


        1.2.1 Download data from RxNorm source

                The most current international release is located at the following URL: http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html

                Historical archives located at: http://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormarchive.html

                Note: Make sure you perform updates in chronological order. So, if you want to process a January and April release, execute the below steps for January, and after successful completion, for April.

                Download the latest release file  (e.g. RxNorm_full_01062014.zip)

                Transfer downloaded zip file to /data/FDASPL directory.

                Zip file suffix contains release date (e.g. 08062012=06-Aug-2012)
                Latest production zip file is ~131MB in size

                Note: For the rest of the document we will be referring to current schema as FDASPL_20120806 you would have to adjust this entry to correspond to current release date file.

                        All ctl-files for loading RxNorm raw data are situated in RxNorm***.zip from the data provider.



         1.2.2 Download data from FDA source

                    The most current international release is located at the following URL:

                     http://www.fda.gov/downloads/Drugs/DevelopmentApprovalProcess/ndc.zip


    
               
Download the latest release file (e.g. ndc.zip)

                     Transfer downloaded zip file to /data/FDASPL directory.
                     Latest production zip file is ~14MB in size



1.3 Build Database Schema

All tables will be located in the schema FDASPL_<DateOfSourceFile> as following: FDASPL_20120131
The following steps will create a schema and will grant to a schema owner appropriate permissions. 


In addition to the schema, it will create tables to load raw data: RXNATOMARCHIVE,RXNCONSO,RXNCUI,RXNCUICHANGES,RXNDOC,RXNREL,RXNSAB,RXNSAT,FDA_NDC_PRODUCTS and RXNSTY  as well as staging tables: 

CONCEPT_STAGE, CONCEPT_ANCESTOR_STAGE, 

 

CONCEPT_RELATIONSHIP_STAGE, 

 

SOURCE_TO_CONCEPT_MAP_STAGE

$ sqlplus System/<SystemPass>@DEV_VOCAB @50_create_schema.sql <FDASPL_Schema> <Pass_FDASPL_Schema>

Format of  FDASPL_Schema should be FDASPL_YYYYMMDD (e.g.  FDASPL_20120806  )


Staging tables match Vocabulary schema version 4.0 with slight modification. Modifications made to the tables are:

        CONCEPT_STAGE.CONCEPT_ID is nullable     

        SOURCE_TO_CONCEPT_MAP_STAGE

.SOURCE_TO_CONCEPT_MAP_ID is nullable

        

        CONCEPT_RELATIONSHIP_STAGE

 .REL_ID is nullable

1.4 Raw data load into Oracle tables

1.4.1 Extract files from the zip archive into the current directory

        $ unzip -u RxNorm_full_12022013.zip -d rxnorm_20131202

                Format of RXNORM_directory should be rxnorm_YYYYMMDD (e.g. rxnorm_20131202)


1.4.2 Verify that the following 9 files have been created:

      RXNCONSO.RRF    (Latest file size ~96MB)

    RXNATOMARCHIVE.RRF    (Latest file size ~42MB)

    RXNCUI.RRF    (Latest file size ~1MB)

    RXNCUICHANGES.RRF    (Latest file size ~1MB)

    RXNDOC.RRF    (Latest file size ~1MB)

    RXNREL.RRF    (Latest file size ~230MB)

    RXNSAB.RRF    (Latest file size ~8KB)

    RXNSAT.RRF    (Latest file size ~385MB)

   RXNSTY.RRF    (Latest file size ~15MB)


 1.4.3    Load RxNorm raw data into Oracle. 

            This sqlldr process will run for about 3 min.

            $ chmod 744 50_load_to_prestage.sh

            $ ./50_load_to_prestage.sh /data/FDASPL/rxnorm_20131202 <FDASPL_Schema>/<Pass_FDASPL_Schema>

  

           1.4.4 Verify that files with extension .bad are empty, no records have been rejected.


   
1.4.5 Extract files from the zip archive into the current directory

        $ unzip -u ndc.zip

         

             1.4.6    Load FDA raw data into Oracle. 

             This sqlldr process will run for about 1min.

                  $ sqlldr FDASPL_20140128/myPass@DEV_VOCAB control=50_load_to_prestage_FDA_PRO.ctl


           1.4.7 Verify that files with extension .bad are empty, no records have been rejected.


           1.4.8 Copy data and intermediate files to backup area In this step, downloaded files will be archived to preserve source of the data. Discuss a location of backup area with your system administrator.         

                    $ cp -Ru /data/FDASPL/*.zip /data/backup_area ; cp -Ru /data/FDASPL/*.log /data/backup_area


            1.4.9  Verify that number of records loaded is equivalent to prior production load

                     sqlplus FDASPL_20120806/myPass@DEV_VOCAB

            Execute the following SQL commands:

    SELECT count(*) FROM RXNCONSO;
Current production row count is:     1,052,899 Rows

 SELECT count(*) FROM RXNREL;
Current production row count is:     4,7837,173 Rows

    SELECT count(*) FROM FDA_NDC_PRODUCTS;
Current production row count is:     75,362 Rows

 SELECT count(*) FROM RXNSAT;
Current production row count is:     7,385,564 Rows






1.5 Loading Staging Tables from raw


1.5.1 Convert and store in staging table maps

Loaded from raw staged data RXNCONSO into the staged data SOURCE_TO_CONCEPT_MAP_STAGE


 

$  sqlplus FDASPL_20120806/myPass@DEV_VOCAB @50_transform_row_maps.sql


1.5.2  Verify that number of records loaded is equivalent to prior production load

    In this step we will verify that number of active records in staging tables is valid prior to transfer data to historical DEV schema. 

    a. Number of Records in stage table

    b. Number of records in DEV schema not deleted (active, prior load production)

    c. How many records would be added to DEV schema table

            d. How many active DEV records will be marked for deletion


    In most cases  number of records in stage (a) should be greater than number of active records in production (b) 
    In rare occasion that might not be the case. It is responsibility of the developer running the scripts to verify that that is the the valid.



     $ sqlplus FDASPL_20120806/myPass@DEV_VOCAB





        Execute the following SQL commands:
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt
  FROM SOURCE_TO_CONCEPT_MAP_STAGE c
 WHERE c.SOURCE_VOCABULARY_ID IN (50)
UNION ALL
SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt
  FROM DEV.SOURCE_TO_CONCEPT_MAP d
 WHERE     d.SOURCE_VOCABULARY_ID IN (50)
       AND D.TARGET_VOCABULARY_ID IN (08)
       AND NVL (d.INVALID_REASON, 'X') <> 'D'
UNION ALL
SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt
  FROM SOURCE_TO_CONCEPT_MAP_STAGE c
 WHERE c.SOURCE_VOCABULARY_ID IN (50) AND c.TARGET_VOCABULARY_ID IN (08)
       AND NOT EXISTS
                  (SELECT 1
                     FROM DEV.SOURCE_TO_CONCEPT_MAP d
                    WHERE     d.SOURCE_VOCABULARY_ID IN (50)
                          AND c.SOURCE_CODE = D.SOURCE_CODE
                          AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID
                          AND d.MAPPING_TYPE = c.MAPPING_TYPE
                          AND d.TARGET_CONCEPT_ID = c.TARGET_CONCEPT_ID
                          AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID)
UNION ALL
SELECT '- How many DEV active will be marked for deletion' AS scr,
       COUNT (8) cnt
  FROM DEV.SOURCE_TO_CONCEPT_MAP d
 WHERE     d.SOURCE_VOCABULARY_ID IN (50)
       AND D.TARGET_VOCABULARY_ID IN (08)
       AND NVL (d.INVALID_REASON, 'X') <> 'D'
       AND d.VALID_START_DATE <
              TO_DATE (
                 SUBSTR (USER, REGEXP_INSTR (USER, '_[[:digit:]]') + 1, 256),
                 'YYYYMMDD')
       AND NOT EXISTS
                  (SELECT 1
                     FROM SOURCE_TO_CONCEPT_MAP_STAGE c
                    WHERE     c.SOURCE_VOCABULARY_ID IN (50)
                          AND c.SOURCE_CODE = D.SOURCE_CODE
                          AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID
                          AND d.MAPPING_TYPE = c.MAPPING_TYPE
                          AND d.TARGET_CONCEPT_ID = c.TARGET_CONCEPT_ID
                          AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID)
       AND EXISTS
              (SELECT 1
                 FROM SOURCE_TO_CONCEPT_MAP_STAGE c
                WHERE     d.SOURCE_CODE = c.SOURCE_CODE
                      AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID
                      AND d.MAPPING_TYPE = c.MAPPING_TYPE
                      AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID);

Current Result:
- Num Rec in stage                                                   62800
- Num Rec in DEV not deleted                                     54867
- How many records would be new in DEV added            3893
- How many DEV active will be marked for deletion          312

    1.5.3    Load new maps into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid reason are left empty for now.
          Transfer records from SOURCE_TO_CONCEPT_MAP_STAGE to DEV.SOURCE_TO_CONCEPT_MAP table

    $  sqlplus FDASPL_20120806/myPass@DEV_VOCAB @50_load_maps.sql

ċ
50_create_schema.sql
(10k)
Serg Vereshagin,
Jan 31, 2014, 5:35 AM
ċ
50_load_maps.sql
(6k)
Serg Vereshagin,
Jan 31, 2014, 3:22 AM
ċ
50_load_to_prestage.sh
(5k)
Serg Vereshagin,
Jan 31, 2014, 3:22 AM
ċ
50_load_to_prestage_FDA_PRO.ctl
(1k)
Serg Vereshagin,
Jan 31, 2014, 3:22 AM
ċ
50_transform_row_maps.sql
(11k)
Serg Vereshagin,
Jan 31, 2014, 4:57 AM
Comments