Vocabulary_ID=5
The following describes the procedure to update the HCPCS vocabulary with the latest release of the source.
The
load process is a complete refresh of data. All new vocabulary
entieties will be added. All vocabulary entities that do not exist in
the latest files set will be marked as deprecated.
1.1 Scripts TransferAll scripts and files listed at the bottom of this page should be transferred to specially created directory (eg /data/HCPCS).
Script Name | Description |
05_create_schema.sql | Create prestage-stage schema with tables and indexes
|
05_transform_row_maps.sql | Convert and store in stage table SOURCE_TO_CONCEPT_MAP_STAGE |
05_load_to_prestage_concepts.ctl | Concepts control file |
05_load_concepts.sql
| Load only new concepts into concept table, add invalid code information |
05_load_maps.sql | Load only new maps into SOURCE_TO_CONCEPT_MAP table, add invalid code information
|
05_transform_row_concepts.sql
| Convert and store in stage table concept_stage |
1.2 Download data from source
1.2.1 Download data from CMS source
The most current international release is located at the following URL: http://www.cms.gov/Medicare/Coding/HCPCSReleaseCodeSets/Alpha-Numeric-HCPCS-Items/CMS1253559.html
Note: Make
sure you perform updates in chronological order. So, if you want to
process a January and April release, execute the below steps for
January, and after successful completion, for April.
Download the latest release file (e.g.12anweb.zip (2012 Alpha-Numeric HCPCS File [ZIP, 808KB] ) )
Transfer downloaded zip file to /data/HCPCS directory.
Latest production zip file is ~1MB in size
1.2.2 Download data from UMLS source
The most current international release is located at the following URL: http://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html
Note: Make
sure you perform updates in chronological order. So, if you want to
process a January and April release, execute the below steps for
January, and after successful completion, for April.
Download the latest release file
(e.g. )
Transfer downloaded zip file to /data/HCPCS directory. Latest production zip file is ~4GB in size
Download and extract all UMLS data and zip files to the same directory.
Read
the README file before downloading, which includes information on
installing the UMLS Knowledge Sources and highlights changes for this
release.
Additional documentation is available at http://www.nlm.nih.gov/research/umls.
At
1M bits/second, it will take approximately 4 hours to download the
complete UMLS release. Downloading is advised only if you have a very
fast Internet connection.
1.3 Build Database Schema
All tables will be located in the schema HCPCS_<DateOfSourceFile> as following: HCPCS_20120131
The following steps will create a schema and
will grant to a schema owner appropriate permissions.
In addition to the schema, it will create tables to load raw data: MRCONSO as well as staging tables: CONCEPT_STAGE, CONCEPT_ANCESTOR_STAGE,
CONCEPT_RELATIONSHIP_STAGE,
SOURCE_TO_CONCEPT_MAP_STAGE
$ sqlplus System/<SystemPass>@DEV_VOCAB @05_create_schema.sql < HCPCS_Schema> <Pass_HCPCS_Schema>
Format of HCPCS_Schema should be HCPCS_YYYYMMDD (e.g. HCPCS_20120507)
1.4 Raw data load into Oracle tables
1.4.1 Extract files from the zip archive into the current directory
$ unzip -u 12anweb.zip
1.4.2 Verify that the following file(s) have been created:
12anweb.txt (Latest file size ~1MB)
1.4.3 Create consistent file names that are not release dependent.
$ mv 12anweb.txt tXXanweb_V3.txt
1.4.4 Load CMS raw data into Oracle. Each sqlldr process will run for about 1min.
$ sqlldr HCPCS_20120507/myPass@DEV_VOCAB control=05_load_to_prestage_concepts.ctl
1.4.5 Verify that files with extension .bad are empty, no records have been rejected.
1.4.6 Verify that number of records loaded is equivalent to prior production load
$ sqlplus HCPCS_20120507/myPass@DEV_VOCAB
Execute the following SQL commands:
SELECT count(*) FROM txxanweb_V3; |
Current production row count is: 9,439 Rows
1.4.7
Load UMLS raw data into Oracle.
INSTALLING FROM DVD OR HARD DISK
- For best results the DVD drive should be 6X or higher.
- Insert DVD and start MetamorphoSys:
- Windows: The DVD should autorun. If not, or if installing from hard
disk, go to the DVD root directory and click on "run.bat."
Click on "Install UMLS" button.
Each UMLS release includes MetamorphoSys, required to install Knowledge
Sources files, and to create, search and browse customized Metathesaurus
subsets. MetamorphoSys requires a minimum of 30 GB of free hard disk
and takes 2-10 hours to run on a range of platforms tested. The actual
time will depend on your configuration, hardware and operating system
platforms.
MetamorphoSys can generate custom load scripts for Oracle when creating a Metathesaurus subset or installing the Semantic
Network. Instructions for loading a subset of the UMLS Metathesaurus or
Semantic Network into a database, by using the scripts and files created
by MetamorphoSys, are provided below. Some modification of the scripts
created by MetamorphoSys might be needed under special circumstances.
- Create a schema specifically for this data.
- Set tablespace at unlimited, or high enough to ensure the amount of
space available is what is needed by the data. Granting UNLIMITED
TABLESPACE is acceptable as long as "AUTOEXTEND" is turned on to allow
for unlimited table space growth.
- Create synonyms for tables or only access tables using the same login used to create them.
- Do not use the SYS/SYSTEM account to load data into the database
since the default table space is SYSTEM, which is reserved for the data
dictionary.
- Use MTH or another account to insert data into the database; do not
use "AS SYSDBA" to see the tables. Setting the proper grants from the
beginning will solve this problem.
- The loader script must be changed to SCHEMA.TABLE format to insert
data into the proper tables if the SYS/SYSTEM account is used.
- The database must use the UTF-8 character set.
Go to the top-level UMLS directory and click on "populate_oracle_db.bat"
1.4.8 Copy data and intermediate files to backup area. In this step, downloaded files will be archived to preserve source of the data. Discuss a location of backup area with your system administrator.
$ cp -Ru /data/HCPCS/*.zip /data/backup_area ; cp -Ru /data/HCPCS/*.log /data/backup_area
1.4.9 Verify that number of records loaded is equivalent to prior production load
$ sqlplus HCPCS_20120507/myPass@DEV_VOCAB
Execute the following SQL commands:
SELECT count(*) FROM MRCONSO; |
Current production row count is: 10,810,680 Rows
1.5 Loading Staging Tables from raw
1.5.1 Convert and store in
staging table concepts
Loaded from raw staged data RXNCONSO into the staged data CONCEPT_STAGE
In this step we will perform substitute GUID
identifiers into numerical format.
$ sqlplus HCPCS_20120507/myPass@DEV_VOCAB @05_transform_row_concepts.sql
1.5.2 Verify that number of records loaded is equivalent to prior production load
In this step we will verify that number of active records in staging
tables is valid prior to transfer data to historical DEV schema.
a. Number of Records in stage table
b. Number of records in DEV schema not deleted (active, prior load production)
c. How many records would be added to DEV schema table
d. How many active DEV records will be marked for deletion
In most cases number of records in stage (a) should be greater than number of active records in production (b)
In rare occasion that might not be the case. It is responsibility of
the developer running the scripts to verify that that is the the valid.
$ sqlplus HCPCS_20120507/myPass@DEV_VOCAB
Execute the following SQL commands:
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt FROM CONCEPT_STAGE c WHERE C.VOCABULARY_ID = 05 UNION ALL SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 05 AND NVL (d.INVALID_REASON, 'X') <> 'D' UNION ALL SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt FROM CONCEPT_STAGE c WHERE C.VOCABULARY_ID = 05 AND NOT EXISTS (SELECT 1 FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 05 AND c.CONCEPT_CODE = D.CONCEPT_CODE) UNION ALL SELECT '- How many DEV active will be marked for deletion' AS scr, COUNT (8) cnt FROM DEV.CONCEPT d WHERE d.VOCABULARY_ID = 05 AND NVL (d.INVALID_REASON, 'X') <> 'D' AND NOT EXISTS (SELECT 1 FROM CONCEPT_STAGE c WHERE c.VOCABULARY_ID = 05 AND c.CONCEPT_CODE = D.CONCEPT_CODE);
|
Current Result:
- Num Rec in stage 5952
- Num Rec in DEV not deleted 5819
- How many records would be new in DEV added 209
- How many DEV active will be marked for deletion 78
1.5.3 Load new concepts into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid
reason are left empty for now.
Transfer records from CONCEPT_STAGE to DEV.CONCEPT table
$ sqlplus HCPCS_20120507/myPass@DEV_VOCAB @05_load_concepts.sql
1.5.4 Convert and store in
staging table maps
Loaded from raw staged data RXNCONSO into the staged data SOURCE_TO_CONCEPT_MAP_STAGE
$ sqlplus HCPCS_20120507/myPass@DEV_VOCAB @05_transform_row_maps.sql
1.5.5 Verify that number of records loaded is equivalent to prior production load
In this step we will verify that number of active records in staging
tables is valid prior to transfer data to historical DEV schema.
a. Number of Records in stage table
b. Number of records in DEV schema not deleted (active, prior load production)
c. How many records would be added to DEV schema table
d. How many active DEV records will be marked for deletion
In most cases number of records in stage (a) should be greater than number of active records in production (b)
In rare occasion that might not be the case. It is responsibility of
the developer running the scripts to verify that that is the the valid.
$ sqlplus HCPCS_20120507/myPass@DEV_VOCAB
Execute the following SQL commands:
SELECT '- Num Rec in stage' AS scr, COUNT (8) cnt FROM SOURCE_TO_CONCEPT_MAP_STAGE c WHERE c.SOURCE_VOCABULARY_ID IN (05) UNION ALL SELECT '- Num Rec in DEV not deleted' AS scr, COUNT (8) cnt FROM DEV.SOURCE_TO_CONCEPT_MAP d WHERE d.SOURCE_VOCABULARY_ID IN (05) AND D.TARGET_VOCABULARY_ID IN (05) AND NVL (d.INVALID_REASON, 'X') <> 'D' UNION ALL SELECT '- How many records would be new in DEV added' AS scr, COUNT (8) cnt FROM SOURCE_TO_CONCEPT_MAP_STAGE c WHERE c.SOURCE_VOCABULARY_ID IN (05) AND c.TARGET_VOCABULARY_ID IN (05) AND NOT EXISTS (SELECT 1 FROM DEV.SOURCE_TO_CONCEPT_MAP d WHERE d.SOURCE_VOCABULARY_ID IN (05) AND c.SOURCE_CODE = D.SOURCE_CODE AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID AND d.MAPPING_TYPE = c.MAPPING_TYPE AND d.TARGET_CONCEPT_ID = c.TARGET_CONCEPT_ID AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID) UNION ALL SELECT '- How many DEV active will be marked for deletion' AS scr, COUNT (8) cnt FROM DEV.SOURCE_TO_CONCEPT_MAP d WHERE d.SOURCE_VOCABULARY_ID IN (05) AND D.TARGET_VOCABULARY_ID IN (05) AND NVL (d.INVALID_REASON, 'X') <> 'D' AND d.VALID_START_DATE < TO_DATE ( SUBSTR (USER, REGEXP_INSTR (USER, '_[[:digit:]]') + 1, 256), 'YYYYMMDD') AND NOT EXISTS (SELECT 1 FROM SOURCE_TO_CONCEPT_MAP_STAGE c WHERE c.SOURCE_VOCABULARY_ID IN (05) AND c.SOURCE_CODE = D.SOURCE_CODE AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID AND d.MAPPING_TYPE = c.MAPPING_TYPE AND d.TARGET_CONCEPT_ID = c.TARGET_CONCEPT_ID AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID) AND EXISTS (SELECT 1 FROM SOURCE_TO_CONCEPT_MAP_STAGE c WHERE d.SOURCE_CODE = c.SOURCE_CODE AND d.SOURCE_VOCABULARY_ID = c.SOURCE_VOCABULARY_ID AND d.MAPPING_TYPE = c.MAPPING_TYPE AND d.TARGET_VOCABULARY_ID = c.TARGET_VOCABULARY_ID);
|
Current Result:
- Num Rec in stage 6533
- Num Rec in DEV not deleted 6752
- How many records would be new in DEV added 209
- How many DEV active will be marked for deletion 0
1.5.6 Load new maps into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid
reason are left empty for now.
Transfer records from SOURCE_TO_CONCEPT_MAP_STAGE to DEV.SOURCE_TO_CONCEPT_MAP table
$ sqlplus HCPCS_20120507/myPass@DEV_VOCAB @05_load_maps.sql