Vocabulary_ID=13 The following describes the procedure to update the Race vocabulary with the latest release of the source. The load process is a complete refresh of data. All new vocabulary entieties will be added. All vocabulary entities that do not exist in the latest files set will be marked as deprecated. 1.1 Scripts Transfer All scripts and files listed at the bottom of this page should be transferred to specially created directory (eg /data/Race).
1.2 Download data from source1.2.1 Download data from CMS sourceThe most current international release is located at the following URL: http://www.cdc.gov/phin/library/resources/vocabulary/CDC%20Race%20&%20Ethnicity%20Background%20and%20Purpose.pdf Note: Make sure you perform updates in chronological order. So, if you want to process a January and April release, execute the below steps for January, and after successful completion, for April. Download the latest release file (e.g. Race_Ethnicity_CodeSet.pdf) Latest production zip file is ~1MB in size 1.3 Build Database SchemaAll tables will be located in the schema Race_<DateOfSourceFile> as following: Race_20120131 In addition to the schema, it will create tables to load raw data: RACE_CONCEPTS as well as staging tables: CONCEPT_STAGE, CONCEPT_ANCESTOR_STAGE, CONCEPT_RELATIONSHIP_STAGE, SOURCE_TO_CONCEPT_MAP_STAGE
1.4 Raw data load into Oracle tables1.4.1 Extract data from the pdf file into the csv file in the current directory RACE_CONCEPTS.txt (Latest file size ~1MB) 1.4.3 Load RACE raw data into Oracle. Each sqlldr process will run for about 1min. $ sqlldr RACE_20120918/myPass@DEV_VOCAB control=13_load_to_prestage_concepts.ctl 1.4.5 Verify that files with extension .bad are empty, no records have been rejected. 1.4.6 Copy data and intermediate files to backup area. In this step, downloaded files will be archived to preserve source of the data. Discuss a location of backup area with your system administrator. $ cp -Ru /data/Race/*.zip /data/backup_area ; cp -Ru /data/Race/*.log /data/backup_area 1.4.6 Verify that number of records loaded is equivalent to prior production load $ sqlplus RACE_20120918/myPass@DEV_VOCAB Execute the following SQL commands:
1.5 Loading Staging Tables from raw 1.5.1 Convert and store in staging table concepts
$ sqlplus RACE_20120918/myPass@DEV_VOCAB @13_transform_row_concepts.sql
d. How many active DEV records will be marked for deletion In most cases number of records in stage (a) should be greater than number of active records in production (b) In rare occasion that might not be the case. It is responsibility of the developer running the scripts to verify that that is the the valid. $ sqlplus RACE_20120918/myPass@DEV_VOCAB Execute the following SQL commands:
1.5.3 Load new concepts into DEV schema concept table,mark deprecated concepts as deleted. Valid start date, valid end date and invalid
reason are left empty for now. $ sqlplus RACE_20120918/myPass@DEV_VOCAB @13_load_concepts.sql |
Demographic Domain >