Condition Domain‎ > ‎

SNOMED

Vocabulary_ID=1

The following describes the procedure to update the SNOMED vocabulary (concepts, synonyms, relationships, ancestry) with the latest release of the source.

The load process is a complete refresh of data. All new vocabulary entieties will be added. All vocabulary entities that do not exist in the latest files set will be marked as deprecated.

The concept hierarchy is defined in OMOP for SNOMED concepts that are part of the following:

       Clinical finding (Condition) hierarchy

       Procedure hierarchy


1.1 Scripts Transfer

All scripts and files listed at the bottom of this page should be transferred to specially created directory (eg /data/SNOMED).


 Script Name  Description
01_create_schema.sql  Create prestage-stage schema with tables and indexes
01_load_to_prestage_concepts.ctl  Concept Descriptons control file
01_load_to_prestage_relations.ctl  Concept Relationship control file
01_load_to_prestage_ref.ctl Association Reference control file
01_load_to_prestage_concepts_short.ctl Concept control file
01_load_to_sct1_concept.ctl SCT1 Concept control file
01_load_to_prestage_core.ctl SCT1 Concept Relationship control file
01_transform_row_relations.sql
 Convert and store in stage table concept_relationship_stage
01_transform_row_concepts.sql      Convert and store in stage table concept_stage
01_load_concepts.sql   Load only new concepts into concept table, add invalid code information
01_load_relations.sql    Load only new relationships into relationship table, add invalid code information.
01_build_hierarchy.sql    Generated the ancestry data (010 RelID) without cycles
01_build_stage_ancestry.sql    Generated unique ancestor-descendant concept pairs from the staged data
01_convert_foreign_characters.pl
 Perl script to concert foreign characters to plain ASCII.
01_set_concept_class.sql  Refresh DEV.CONCEPT based on CONCEPT_ANCESTOR_STAGE



1.2 Download data from source

1.2.1 Download data from SNOMED source

The most current international release is located at the following URL: http://www.nlm.nih.gov/research/umls/licensedcontent/snomedctfiles.html

Historical archives located at: http://www.nlm.nih.gov/research/umls/licensedcontent/snomedctarchive.html

Note: Make sure you perform updates in chronological order. So, if you want to process a January and April release, execute the below steps for January, and after successful completion, for April.

Download the latest release file (e.g. SnomedCT_Release_INT_20130131.zip)
Transfer downloaded zip file to
/data/SNOMED directory.

Zip file suffix contains release date (e.g. 20130131=31-Jan-2013)
Latest production zip file is ~0.5GB in size

Note: For the rest of the document we will be referring to current schema as SNOMED_20120731 you would have to adjust this entry to correspond to current release date file


1.2.2 Download data from Health and Social Care Information Centre (HSCIC) source

                    The most current international release is located at the following URL: https://www.uktcregistration.nss.cfh.nhs.uk/trud3/user/authenticated/group/0/pack/8

 Download the latest release file (e.g. nhs_snomed_13.7.0_20120627000001.zip )


1.3 Build Database Schema

All tables will be located in the schema SNOMED_<DateOfSourceFile> as following: SNOMED_20120131
The following steps will create a schema and will grant to a schema owner appropriate permissions. 


In addition to the schema, it will create tables to load raw data: SCT2_DESC_FULL_EN_INT and SCT2_RELA_FULL_INT as well as staging tables: CONCEPT_STAGE, CONCEPT_ANCESTOR_STAGE, 

CONCEPT_RELATIONSHIP_STAGE,

SOURCE_TO_CONCEPT_MAP_STAGE

$ sqlplus System/<SystemPass>@DEV_VOCAB @01_create_schema.sql <SNOMED_Schema> <Pass_SNOMED_Schema>

Format of SNOMED_Schema should be SNOMED_YYYYMMDD (e.g. SNOMED_20120131)


Staging tables match Vocabulary schema version 4.0 with slight modification. Modifications made to the tables are:

CONCEPT_STAGE.CONCEPT_ID is nullable

           CONCEPT_RELATIONSHIP_STAGE

.REL_ID is nullable

We also create intermediate staging table: CONCEPT_TREE_STAGE


1.4 Raw data load into Oracle tables

1.4.1 Extract files from the zip archives into the current directory

  $ unzip SnomedCT_Release_INT_*.zip -u -W "SnomedCT_Release_INT_*/RF2Release/Full/Terminology/*.*" -d $PWD
            $ unzip SnomedCT_Release_INT_*.zip -u -W "SnomedCT_Release_INT_*/RF2Release/Full/Refset/Content/*.*" -d $PWD

            $  mv $PWD/SnomedCT_Release_INT_*/RF2Release/Full/Terminology/*.* $PWD
            $  mv $PWD/SnomedCT_Release_INT_*/RF2Release/Full/Refset/Content/*.* $PWD

  $  unzip nhs_snomed_*.zip -u -W "SnomedCT_GB1000001_*/Terminology/Content/*.*" -d $PWD
            $  mv $PWD/SnomedCT_GB1000001_*/Terminology/Content/*.* $PWD



1.4.2 Verify that the following 2 files have been created:

        sct2_Relationship_Full_INT_20120731.txt     (Latest file size ~470MB)

        sct2_Description_Full-en_INT_20120731.txt  (Latest file size ~180MB)


1.4.3 Filter out extraneous copyright messages in the data file. Create consistent file names that are not date dependent. Some of the SNOMED code descriptions contain foreign language characters that may not display correctly in the database.  The perl script 01_convert_foreign_characters.pl will convert these characters to their English equivalent.  The script reads the file passed in an write to STDOUT.
$ perl 01_convert_foreign_characters.pl inFile.txt >outFile.txt
outFile.txt will have the English equivalents.


$ grep     'All rights reserved.'  sct2_Description_Full-en_INT_20120731.txt  > sct2_Description_Full-en_INT.Copyright

grep -v 'All rights reserved.'  sct2_Description_Full-en_INT_20120731.txt  > sct2_Description_Full-en_INT.tx

$ perl 01_convert_foreign_characters.pl   sct2_Description_Full-en_INT.tx  >  sct2_Description_Full-en_INT.txt

$ mv 
sct2_Relationship_Full_INT_20120731.txt   sct2_Relationship_Full_INT.txt

$ mv der2_cRefset_AssociationReferenceFull_INT_20120731.txt  der2_cRefset_AssociationReferenceFull_INT.txt

$ grep -v 'All rights reserved.'  sct2_Concept_Full_INT_20120731.txt > sct2_Concept_Full_INT.txt

$ mv sct1_Concepts_National_GB1000001_20120627.txt sct1_Concepts.txt

$ mv sct1_Relationships_National_GB1000001_20120627.txt sct1_Relationships_Core_INT.txt



1.4.4.    Load raw data into Oracle.

        Each sqlldr process will run for about 1min.

          $ sqlldr SNOMED_20120731/myPass@DEV_VOCAB control=01_load_to_prestage_concepts.ctl

          $ sqlldr SNOMED_20120731/myPass@DEV_VOCAB control=01_load_to_prestage_relations.ctl

          $ sqlldr SNOMED_20120731/myPass@DEV_VOCAB control=01_load_to_prestage_ref.ctl

          $ sqlldr SNOMED_20120131/myPass@DEV_VOCAB control=01_load_to_prestage_concepts_short.ctl

          $ sqlldr SNOMED_20120131/myPass@DEV_VOCAB control=01_load_to_sct1_concept.ctl

          $ sqlldr SNOMED_20120131/myPass@DEV_VOCAB control=01_load_to_prestage_core.ctl




1.4.4.1 Verify that files with extension .bad are empty, no records have been rejected.

1.4.4.2 Copy data and intermediate files to backup area In this step, downloaded files will be archived to preserve source of the data. Discuss a location of backup area with your system administrator.         

        $ cp -Ru /data/SNOMED/*.zip /data/backup_area ; cp -Ru /data/SNOMED/*.log /data/backup_area

1.4.4.3  Verify that number of records loaded is approximate equivalent to prior stage load

 $ sqlplus SNOMED_20120731/myPass@DEV_VOCAB

Execute the following SQL commands:

 SELECT count(*) FROM SCT2_DESC_FULL_EN_INT;
Current production row count is:  1,448,898 Rows

 SELECT count(*) FROM SCT2_RELA_FULL_INT;
Current production row count is:  4,458,960  Rows

 SELECT count(*) FROM der2_cRefset_AssRefFull_INT;
Current production row count is:  135,445  Rows


 SELECT count(*) FROM SCT2_CONCEPT_FULL_INT;
Current production row count is:  496,827  Rows

 
SELECT count(*) FROM sct1_Concepts;
Current production row count is:  355,035 Rows

 SELECT count(*) FROM sct1_Relationships_Core_INT;
Current production row count is:  1,442,518  Rows



1.5 Loading Staging Tables from raw


1.5.1 Convert and store in staging table concepts

Loaded from raw staged data SCT2_DESC_FULL_EN_INT into the staged data CONCEPT_STAGE

In this step we will perform substitute GUID identifiers into numerical format.

 

$  sqlplus SNOMED_20120731/myPass@DEV_VOCAB @01_transform_row_concepts.sql




1.5.2  Verify that number of records loaded is equivalent to prior production load

    In this step we will verify that number of active records in staging tables is valid prior to transfer data to historical DEV schema. 

a. Number of Records in stage table

b. Number of records in DEV schema not deleted (active, prior load production)

c. How many records would be added to DEV schema table

d. How many active DEV records will be marked for deletion


    In most cases  number of records in stage (a) should be greater than number of active records in production (b) 
    In rare occasion that might not be the case. It is responsibility of the developer running the scripts to verify that that is the the valid.



     $ sqlplus SNOMED_20120731/myPass@DEV_VOCAB



        Execute the following SQL commands:
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt
  FROM CONCEPT_STAGE c
 WHERE C.VOCABULARY_ID = 01
UNION ALL
SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt
  FROM DEV.CONCEPT d
 WHERE d.VOCABULARY_ID = 01 AND NVL (d.INVALID_REASON, 'X')  NOT IN ( 'D', 'U')
UNION ALL
SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt
  FROM CONCEPT_STAGE c
 WHERE C.VOCABULARY_ID = 01
       AND NOT EXISTS
              (SELECT 1
                 FROM DEV.CONCEPT d
                WHERE d.VOCABULARY_ID = 01 AND c.CONCEPT_CODE = D.CONCEPT_CODE)
UNION ALL
SELECT '- How many DEV active will be marked for deletion' AS scr,
       COUNT (8) cnt
  FROM DEV.CONCEPT d
 WHERE     d.VOCABULARY_ID = 01
       AND NVL (d.INVALID_REASON, 'X')  NOT IN ( 'D', 'U')
       AND NOT EXISTS
              (SELECT 1
                 FROM CONCEPT_STAGE c
                WHERE c.VOCABULARY_ID = 01 AND c.CONCEPT_CODE = D.CONCEPT_CODE);               

Current Result:
- Num Rec in stage                                                   298818
- Num Rec in DEV not deleted                                    297944
- How many records would be new in DEV added            1119
- How many DEV active will be marked for deletion          304


1.5.3    Load new concepts into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid reason are left empty for now.
      Transfer records from CONCEPT_STAGE to DEV.CONCEPT table

$  sqlplus SNOMED_20120731/myPass@DEV_VOCAB @01_load_concepts.sql


1.5.4   
Load new concept relationships stage, identify invalid codes 
information 
     Loaded from the raw staged data SCT2_RELA_FULL_INT into the staged data CONCEPT_RELATIONSHIP_STAGE, identify invalid code information     

$ 
sqlplus SNOMED_20120731/myPass@DEV_VOCAB @01_transform_row_relations.sql 


SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt
  FROM CONCEPT_RELATIONSHIP_STAGE c
UNION ALL
SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt
  FROM DEV.CONCEPT_RELATIONSHIP d
 WHERE     EXISTS
              (SELECT 1
                 FROM DEV.CONCEPT c
                WHERE c.VOCABULARY_ID = 01 AND c.CONCEPT_ID = CONCEPT_ID_1)
       AND EXISTS
              (SELECT 1
                 FROM DEV.CONCEPT c
                WHERE c.VOCABULARY_ID = 01 AND c.CONCEPT_ID = CONCEPT_ID_2)
       AND NVL(d.INVALID_REASON, 'X') <> 'D'
UNION ALL
SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt
  FROM CONCEPT_RELATIONSHIP_STAGE c
 WHERE NOT EXISTS
              (SELECT 1
                 FROM DEV.CONCEPT_RELATIONSHIP d
                WHERE     c.CONCEPT_ID_1 = d.CONCEPT_ID_1
                      AND c.CONCEPT_ID_2 = d.CONCEPT_ID_2
                      AND c.RELATIONSHIP_ID = d.RELATIONSHIP_ID)
UNION ALL
SELECT '- How many DEV active will be marked for deletion' AS scr,
       COUNT (8) cnt
  FROM DEV.CONCEPT_RELATIONSHIP d
 WHERE     EXISTS
              (SELECT 1
                 FROM DEV.CONCEPT c
                WHERE c.VOCABULARY_ID = 01 AND c.CONCEPT_ID = CONCEPT_ID_1)
       AND EXISTS
              (SELECT 1
                 FROM DEV.CONCEPT c
                WHERE c.VOCABULARY_ID = 01 AND c.CONCEPT_ID = CONCEPT_ID_2)
       AND NVL(d.INVALID_REASON, 'X') <> 'D'
       AND NOT EXISTS
                  (SELECT 1
                     FROM CONCEPT_RELATIONSHIP_STAGE c
                    WHERE     c.CONCEPT_ID_1 = d.CONCEPT_ID_1
                          AND c.CONCEPT_ID_2 = d.CONCEPT_ID_2
                          AND c.RELATIONSHIP_ID = d.RELATIONSHIP_ID);
Current Result:
- Num Rec in stage                                                          1780178
- Num Rec in DEV not deleted                                         1654886
- How many records would be new in DEV added            139814
- How many DEV active will be marked for deletion            16452

 

1.5.5    Load new concept relationships into DEV schema,mark deprecated concept relatuionships as deleted

Transfer records from CONCEPT_RELATIONSHIP_STAGE to DEV.CONCEPT_RELATIONSHIP table       


sqlplus SNOMED_20120731/myPass@DEV_VOCAB @01_load_relations.sql 





1.6 Updating CONCEPT, CONCEPT_ANCESTOR.

Creates the ancestry data between concepts based on their relationships.

There should be only two hierarchies in the table CONCEPT_ANCESTOR in case of SNOMED: "Procedure" and "Clinical finding".

SNOMED hierarchy provided with the source data does not function properly. It contains invalid relationships such as: Father-Son-Father... (cycle relationships)

In Staging schema,we  create 2 concept relationship tables with hierarchy matching source SNOMED hierarchy but have removed cycle relationships. Those tables are:

 Table NameDescriptionTop Level Concept ID
 CONCEPT_REL_STAGE_PROCEDURE Procedure hierarchy 4322976
 CONCEPT_REL_STAGE_CLIN_FINDING Clinical Findings hierarchy 441840

The following 2 steps will create table in staging schema, populating concept relationship tree.



1.6.1 Create staging table CONCEPT_REL_STAGE_PROCEDURE, Populate hierarchy based on top "Procedure" concept


 $  sqlplus SNOMED_20120731/myPass@DEV_VOCAB @01_build_hierarchy.sql PROCEDURE 4322976



1.6.2 Create staging table CONCEPT_REL_STAGE_CLIN_FINDING, Populate hierarchy based on top "Clinical Finding" concept


 $  sqlplus SNOMED_20120731/myPass@DEV_VOCAB @01_build_hierarchy.sql CLIN_FINDING 441840



1.6.3 Generate unique ancestor-descendant concept pairs from the staged data
    In staging schema, populate CONCEPT_ANCESTOR_STAGE for "Procedure" and "Clinical Finding" hierarchies that reside in CONCEPT_REL_STAGE_PROCEDURE and
    CONCEPT_REL_STAGE_CLIN_FINDING

     $  sqlplus SNOMED_20120131/myPass@DEV_VOCAB @01_build_stage_ancestry.sql PROCEDURE

     $  sqlplus SNOMED_20120131/myPass@DEV_VOCAB @01_build_stage_ancestry.sql CLIN_FINDING



1.6.4 Verify effect of data load on DEV.CONCEPT_ANCESTOR table will have

In this step we will verify that number of active records in staging tables is valid prior to transfer data to  DEV schema. 

a. Number of Records in stage table

b. Number of records in DEV schema (prior load production)


    In most cases  number of records in stage (a) should be greater than number of active records in production (b) 
    In rare occasion that might not be the case. It is responsibility of the developer running the scripts to verify that that is the the valid.


   
    Execute the following SQL commands:


     $  sqlplus SNOMED_20120131/myPass@DEV_VOCAB



        Execute the following SQL commands:
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt
      FROM CONCEPT_ANCESTOR_STAGE c
    UNION ALL
    SELECT '- Num Rec in DEV' AS scr, COUNT (8) cnt
          FROM DEV.CONCEPT_ANCESTOR da
          WHERE EXISTS
              (SELECT 1
                 FROM DEV.CONCEPT dca
                WHERE dca.VOCABULARY_ID = 01
                      AND dca.CONCEPT_ID = da.ANCESTOR_CONCEPT_ID)
     AND EXISTS
              (SELECT 1
                 FROM DEV.CONCEPT dcd
                WHERE dcd.VOCABULARY_ID = 01
                      AND dcd.CONCEPT_ID = da.DESCENDANT_CONCEPT_ID)  ;


         Current Result:

- Num Rec in stage                            2,947,403
- Num Rec in DEV                              2,881,194




1.6.5 Based on CONCEPT_ANCESTOR_STAGE we will refresh DEV.CONCEPT table

     sqlplus SNOMED_20130131/myPass@DEV_VOCAB @01_set_concept_class.sql



Feedback - Snomed


------------------------------------------------------------------------------------------------------ END ---------------------------------------------------------------------------------------------------------
ċ
01_build_hierarchy.sql
(4k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_build_stage_ancestry.sql
(6k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_convert_foreign_characters.pl
(1k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_create_schema.sql
(10k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_load_concepts.sql
(5k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_load_relations.sql
(5k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_load_to_prestage_concepts.ctl
(1k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_load_to_prestage_concepts_short.ctl
(1k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_load_to_prestage_core.ctl
(1k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_load_to_prestage_ref.ctl
(1k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_load_to_prestage_relations.ctl
(1k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_load_to_sct1_concept.ctl
(0k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_set_concept_class.sql
(4k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_transform_row_concepts.sql
(11k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
ċ
01_transform_row_relations.sql
(10k)
Serg Vereshagin,
Sep 25, 2013, 8:44 AM
Comments