Procedure Domain‎ > ‎

CPT-4

Vocabulary_ID=4

The following describes the procedure to update the CPT-4 vocabulary  with the latest release of the source.

The load process is a complete refresh of data. All new vocabulary entieties will be added. All vocabulary entities that do not exist in the latest files set will be marked as deprecated.


1.1 Scripts Transfer

All scripts and files listed at the bottom of this page should be transferred to specially created directory (eg /data/CPT).


 Script Name Description
04_create_schema.sql Create prestage-stage schema with tables and indexes
04_transform_row_maps.sql Convert and store in stage table SOURCE_TO_CONCEPT_MAP_STAGE
04_load_relations.sql  Load only new relationships into relationship table, add invalid code information.
04_load_concepts.sql
 Load only new concepts into concept table, add invalid code information
04_load_maps.sql Load only new maps into SOURCE_TO_CONCEPT_MAP table, add invalid code information
04_transform_row_concepts.sql   
 Convert and store in stage table concept_stage
04_transform_row_relations.sql 
 Convert and store in stage table concept_relationship_stage


1.2 Download data from source

 

            1.2.1 Download data from UMLS source

                    The most current international release is located at the following URL: http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html

                    Note: Make sure you perform updates in chronological order. So, if you want to process a January and April release, execute the below steps for January, and after successful completion, for April.

                    Download the latest release file   (e.g. ) 

 
2012AA UMLS Files
May 7 2012 11:00:00 ET
2012AA.CHK
2012AA.MD5
2012AA-1-meta.nlm
2012AA-2-meta.nlm
2012AA-otherks.nlm
mmsys.zip
Copyright_Notice.txt
README.txt

                     Transfer downloaded zip file to /data/CPT directory.

                     Latest production zip file is ~4GB in size

  • Download and extract all UMLS data and zip files to the same directory.
  • Read the README file before downloading, which includes information on installing the UMLS Knowledge Sources and highlights changes for this release.
  • Additional documentation is available at http://www.nlm.nih.gov/research/umls.
  • At 1M bits/second, it will take approximately 4 hours to download the complete UMLS release. Downloading is advised only if you have a very fast Internet connection.


  • 1.3 Build Database Schema

    All tables will be located in the schema CPT_<DateOfSourceFile> as following: CPT_20120131
    The following steps will create a schema and will grant to a schema owner appropriate permissions. 


    In addition to the schema, it will create tables to load raw data: MRCONSO  as well as staging tables: CONCEPT_STAGE, CONCEPT_ANCESTOR_STAGE, 

    CONCEPT_RELATIONSHIP_STAGE,

    SOURCE_TO_CONCEPT_MAP_STAGE

    $ sqlplus System/<SystemPass>@DEV_VOCAB @01_create_schema.sql <CPT_Schema> <Pass_CPT_Schema>

    Format of CPT_Schema should be CPT_YYYYMMDD (e.g. CPT_20120131)


    1.4 Raw data load into Oracle tables


            1.4.1    Load UMLS raw data into Oracle.  

    INSTALLING FROM DVD OR HARD DISK
    - For best results the DVD drive should be 6X or higher.
    - Insert DVD and start MetamorphoSys:
      - Windows: The DVD should autorun. If not, or if installing from hard disk, go to the DVD root directory and click on "run.bat."

    Click on "Install UMLS" button.
    Each UMLS release includes MetamorphoSys, required to install Knowledge Sources files, and to create, search and browse customized Metathesaurus subsets. MetamorphoSys requires a minimum of 30 GB of free hard disk and takes 2-10 hours to run on a range of platforms tested. The actual time will depend on your configuration, hardware and operating system platforms.

    MetamorphoSys can generate custom load scripts for Oracle when creating a Metathesaurus subset or installing the Semantic Network. Instructions for loading a subset of the UMLS Metathesaurus or Semantic Network into a database, by using the scripts and files created by MetamorphoSys, are provided below.  Some modification of the scripts created by MetamorphoSys might be needed under special circumstances.

    • Create a schema specifically for this data.
    • Set tablespace at unlimited, or high enough to ensure the amount of space available is what is needed by the data. Granting UNLIMITED TABLESPACE is acceptable as long as "AUTOEXTEND" is turned on to allow for unlimited table space growth.
    • Create synonyms for tables or only access tables using the same login used to create them.
    • Do not use the SYS/SYSTEM account to load data into the database since the default table space is SYSTEM, which is reserved for the data dictionary.
    • Use MTH or another account to insert data into the database; do not use "AS SYSDBA" to see the tables. Setting the proper grants from the beginning will solve this problem.
    • The loader script must be changed to SCHEMA.TABLE format to insert data into the proper tables if the SYS/SYSTEM account is used.
    • The database must use the UTF-8 character set.

    Go to the top-level UMLS directory and click on "populate_oracle_db.bat" 

            

            1.4.2 Copy data and intermediate files to backup area In this step, downloaded files will be archived to preserve source of the data. Discuss a location of backup area with your system administrator.         

            $ cp -Ru /data/CPT/*.zip /data/backup_area ; cp -Ru /data/CPT/*.log /data/backup_area


            1.4.3  Verify that number of records loaded is equivalent to prior production load

                         sqlplus  CPT_20120507/myPass@DEV_VOCAB

                Execute the following SQL commands:

      SELECT count(*) FROM MRCONSO;
                Current production row count is:     10,810,680 Rows


    1.5 Loading Staging Tables from raw


    1.5.1 Convert and store in staging table concepts

    Loaded from raw staged data RXNCONSO into the staged data CONCEPT_STAGE

    In this step we will perform substitute GUID identifiers into numerical format.

     

    $  sqlplus CPT_20120507/myPass@DEV_VOCAB @04_transform_row_concepts.sql


    1.5.2  Verify that number of records loaded is equivalent to prior production load

        In this step we will verify that number of active records in staging tables is valid prior to transfer data to historical DEV schema. 

        a. Number of Records in stage table

        b. Number of records in DEV schema not deleted (active, prior load production)

        c. How many records would be added to DEV schema table

                 d. How many active DEV records will be marked for deletion


        In most cases  number of records in stage (a) should be greater than number of active records in production (b) 
        In rare occasion that might not be the case. It is responsibility of the developer running the scripts to verify that that is the the valid.



         $ sqlplus  CPT_20120507/myPass@DEV_VOCAB

     Execute the following SQL commands:

    SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt
      FROM CONCEPT_STAGE c
     WHERE C.VOCABULARY_ID = 04
    UNION ALL
    SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt
      FROM DEV.CONCEPT d
     WHERE d.VOCABULARY_ID = 04 AND NVL (d.INVALID_REASON, 'X') <> 'D'
    UNION ALL
    SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt
      FROM CONCEPT_STAGE c
     WHERE C.VOCABULARY_ID = 04
           AND NOT EXISTS
                  (SELECT 1
                     FROM DEV.CONCEPT d
                    WHERE d.VOCABULARY_ID = 04 AND c.CONCEPT_CODE = D.CONCEPT_CODE)
    UNION ALL
    SELECT '- How many DEV active will be marked for deletion' AS scr,
           COUNT (8) cnt
      FROM DEV.CONCEPT d
     WHERE     d.VOCABULARY_ID = 04
           AND NVL (d.INVALID_REASON, 'X') <> 'D'
           AND NOT EXISTS
                  (SELECT 1
                     FROM CONCEPT_STAGE c
                    WHERE c.VOCABULARY_ID = 04 AND c.CONCEPT_CODE = D.CONCEPT_CODE);

    Current Result:
    - Num Rec in stage                                                    9706
    - Num Rec in DEV not deleted                                     9932
    - How many records would be new in DEV added            185
    - How many DEV active will be marked for deletion         420

        1.5.3    Load new concepts into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid reason are left empty for now.
              Transfer records from CONCEPT_STAGE to DEV.CONCEPT table


        $  sqlplus CPT_20120507/myPass@DEV_VOCAB @04_load_concepts.sql



    1.5.4    Load new concept relationships stage, identify invalid codes  information 

           Loaded from the raw staged data RXNCONSO into the staged data CONCEPT_RELATIONSHIP_STAGE, identify invalid code information     

          $ 
    sqlplus CPT_20120507/myPass@DEV_VOCAB @04_transform_row_relations.sql 


    1.5.5  Verify that number of records loaded is equivalent to prior production load





            SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt
              FROM CONCEPT_RELATIONSHIP_STAGE c
            UNION ALL
            SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt
              FROM DEV.CONCEPT_RELATIONSHIP d
             WHERE     EXISTS
                          (SELECT 1
                             FROM DEV.CONCEPT c
                            WHERE c.VOCABULARY_ID  IN (01, 04) AND c.CONCEPT_ID = CONCEPT_ID_1)
                   AND EXISTS
                          (SELECT 1
                             FROM DEV.CONCEPT c
                            WHERE c.VOCABULARY_ID  IN (01, 04) AND c.CONCEPT_ID = CONCEPT_ID_2)
                   AND NVL(d.INVALID_REASON, 'X') <> 'D'
                   AND d.RELATIONSHIP_ID IN (94, 228)
            UNION ALL
            SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt
              FROM CONCEPT_RELATIONSHIP_STAGE c
             WHERE NOT EXISTS
                          (SELECT 1
                             FROM DEV.CONCEPT_RELATIONSHIP d
                            WHERE     c.CONCEPT_ID_1 = d.CONCEPT_ID_1
                                  AND c.CONCEPT_ID_2 = d.CONCEPT_ID_2
                                  AND c.RELATIONSHIP_ID = d.RELATIONSHIP_ID)
            UNION ALL
            SELECT '- How many DEV active will be marked for deletion' AS scr,
                   COUNT (8) cnt
              FROM DEV.CONCEPT_RELATIONSHIP d
             WHERE     EXISTS
                          (SELECT 1
                             FROM DEV.CONCEPT c
                            WHERE c.VOCABULARY_ID IN (01, 04)  AND c.CONCEPT_ID = CONCEPT_ID_1)
                   AND EXISTS
                          (SELECT 1
                             FROM DEV.CONCEPT c
                            WHERE c.VOCABULARY_ID IN (01, 04)  AND c.CONCEPT_ID = CONCEPT_ID_2)
                   AND NVL(d.INVALID_REASON, 'X') <> 'D'
                   AND d.RELATIONSHIP_ID IN (94, 228)
                   AND NOT EXISTS
                              (SELECT 1
                                 FROM CONCEPT_RELATIONSHIP_STAGE c
                                WHERE     c.CONCEPT_ID_1 = d.CONCEPT_ID_1
                                      AND c.CONCEPT_ID_2 = d.CONCEPT_ID_2
                                      AND c.RELATIONSHIP_ID = d.RELATIONSHIP_ID);
    Current Result:
    - Num Rec in stage                                                   383926
    - Num Rec in DEV not deleted                                     382726
    - How many records would be new in DEV added            218
    - How many DEV active will be marked for deletion          570
    1.5.6   Load new concept relationships into DEV schema,mark deprecated concept relatuionships as deleted

    Transfer records from CONCEPT_RELATIONSHIP_STAGE to DEV.CONCEPT_RELATIONSHIP table       


    sqlplus CPT_20120507/myPass@DEV_VOCAB @04_load_relations.sql 


        

               1.5.7 Convert and store in staging table maps

    Loaded from raw staged data RXNCONSO into the staged data SOURCE_TO_CONCEPT_MAP_STAGE


     

    $  sqlplus CPT_20120507/myPass@DEV_VOCAB @04_transform_row_maps.sql


    1.5.7  Verify that number of records loaded is equivalent to prior production load

             In this step we will verify that number of active records in staging tables is valid prior to transfer data to historical DEV schema. 

        a. Number of Records in stage table

        b. Number of records in DEV schema not deleted (active, prior load production)

        c. How many records would be added to DEV schema table

                d. How many active DEV records will be marked for deletion


                In most cases  number of records in stage (a) should be greater than number of active records in production (b) 
                In rare occasion that might not be the case. It is responsibility of the developer running the scripts to verify that that is the the valid.



         $ sqlplus CPT_20120507/myPass@DEV_VOCAB



            Execute the following SQL commands:
    SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt
      FROM SOURCE_TO_CONCEPT_MAP_STAGE c
     WHERE c.SOURCE_VOCABULARY_ID IN (04)
    UNION ALL
    SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt
      FROM DEV.SOURCE_TO_CONCEPT_MAP d
     WHERE     d.SOURCE_VOCABULARY_ID IN (04)
           AND D.TARGET_VOCABULARY_ID IN (04)
           AND NVL (d.INVALID_REASON_CD, 'X') <> 'D'
    UNION ALL
    SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt
      FROM SOURCE_TO_CONCEPT_MAP_STAGE c
     WHERE c.SOURCE_VOCABULARY_ID IN (04) AND c.TARGET_VOCABULARY_ID IN (04)
           AND NOT EXISTS
                      (SELECT 1
                         FROM DEV.SOURCE_TO_CONCEPT_MAP d
                        WHERE     d.SOURCE_VOCABULARY_ID IN (04)
                              AND c.SOURCE_CODE = D.SOURCE_CODE
                              AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID
                              AND d.MAPPING_TYPE = c.MAPPING_TYPE
                              AND d.TARGET_CONCEPT_ID = c.TARGET_CONCEPT_ID
                              AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID)
    UNION ALL
    SELECT '- How many DEV active will be marked for deletion' AS scr,
           COUNT (8) cnt
      FROM DEV.SOURCE_TO_CONCEPT_MAP d
     WHERE     d.SOURCE_VOCABULARY_ID IN (04)
           AND D.TARGET_VOCABULARY_ID IN (04)
           AND NVL (d.INVALID_REASON_CD, 'X') <> 'D'
           AND d.VALID_START_DATE <
                  TO_DATE (
                     SUBSTR (USER, REGEXP_INSTR (USER, '_[[:digit:]]') + 1, 256),
                     'YYYYMMDD')
           AND NOT EXISTS
                      (SELECT 1
                         FROM SOURCE_TO_CONCEPT_MAP_STAGE c
                        WHERE     c.SOURCE_VOCABULARY_ID IN (04)
                              AND c.SOURCE_CODE = D.SOURCE_CODE
                              AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID
                              AND d.MAPPING_TYPE = c.MAPPING_TYPE
                              AND d.TARGET_CONCEPT_ID = c.TARGET_CONCEPT_ID
                              AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID)
           AND EXISTS
                  (SELECT 1
                     FROM SOURCE_TO_CONCEPT_MAP_STAGE c
                    WHERE     d.SOURCE_CODE = c.SOURCE_CODE
                          AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID
                          AND d.MAPPING_TYPE = c.MAPPING_TYPE
                          AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID);

                    Current Result:
    - Num Rec in stage                                                   9639
    - Num Rec in DEV not deleted                                     9453
    - How many records would be new in DEV added            284
    - How many DEV active will be marked for deletion          0


        1.5.8   Load new maps into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid reason are left empty for now.
              Transfer records from SOURCE_TO_CONCEPT_MAP_STAGE to DEV.SOURCE_TO_CONCEPT_MAP table

        $  sqlplus CPT_20120507/myPass@DEV_VOCAB @04_load_maps.sql



    ċ
    04_create_schema.sql
    (9k)
    Serg Vereshagin,
    Oct 24, 2013, 2:42 AM
    ċ
    04_load_concepts.sql
    (4k)
    Serg Vereshagin,
    Oct 24, 2013, 2:42 AM
    ċ
    04_load_maps.sql
    (6k)
    Serg Vereshagin,
    Oct 24, 2013, 2:43 AM
    ċ
    04_load_relations.sql
    (4k)
    Serg Vereshagin,
    Oct 24, 2013, 2:43 AM
    ċ
    04_transform_row_concepts.sql
    (3k)
    Serg Vereshagin,
    Oct 24, 2013, 2:43 AM
    ċ
    04_transform_row_maps.sql
    (3k)
    Serg Vereshagin,
    Oct 24, 2013, 2:43 AM
    ċ
    04_transform_row_relations.sql
    (4k)
    Serg Vereshagin,
    Oct 24, 2013, 2:43 AM
    Comments