As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Importing CSV Files with All Character Data

From sasCommunity
Jump to: navigation, search


As part of the decommissioning effort for sasCommunity.org this article/tip has been migrated to communities.sas.com.

The new home for this article/tip is Importing CSV Files with All Character Data (https://communities.sas.com/t5/SAS-Communities-Library/Importing-CSV-Files-with-All-Character-Data/ta-p/475811)



When using PROC IMPORT to read CSV files, the SCANTYPE=MIXED option is not available. This means that we do not have an option to force variables to be character. The GUESSINGROWS option can be used, but it can be costly for large files, or potentially not sufficient. Although it does require two passes of the data, the following technique can be used to force all columns to be character when using PROC IMPORT on a CSV file.

The technique works by inserting an asterisk ( * ) in each data column, before the file is seen by PROC IMPORT. A potential side benefit is that this allows us to use a smaller value for the GUESSINGROWS option, however remember that the variable length is determined during the scanning process and also depends on GUESSINGROWS.

First Step

Insert a row of asterisks, one for each incoming column, into the data steam. This requires a first pass of the data.

We read the first row which contains the names of the variables. These are counted and a series of asterisks are written on the first data line following the variable names. The asterisk forces PROC IMPORT to see each column as a character variable.

   * HOLDIT is a temporary location;
   filename holdit temp  lrecl=32000;
 
   * Point to the csv file;
   filename rawcsv "YOUR PATH.csv"  lrecl=32000;
 
   data _null_;
      file holdit;
      infile rawcsv end=done;
 
      * Read the first observation (variable names);
      input;
 
      * Write the var names;
      put _infile_;
      * Count the variables (the names are comma separated);
      wcount = countw(_infile_,',');
      * Write an asterisk for each variable - forces each to be character;
      * but will not set the var length;
      * string = catt('*',repeat(',*',wcount-2));
      put string;
 
      * Read and write the data portion of the raw data;
      do until(done);
         input;
         **********
         *  Pre processing of the data could be done here
         **********;
         * Write the record;
         put _infile_;
      end;
      stop;
      run;

Reading the Data

PROC IMPORT is then used to import the modified CSV file into a SAS data set. The GUESSINGROWS option can be set to a small number, and the DATAROW=2 is used to make sure that the row of asterisks is included in the read, but not the variable names.

   * Read the Altered CSV file into a SAS dataset;
   PROC IMPORT OUT= fromcsv 
               DATAFILE= holdit 
               DBMS=CSV 
               REPLACE;
      guessingrows=4;
      GETNAMES=YES;
      DATAROW=2; 
      run;
   * Clear the filerefs for the next data set;
   filename holdit;
   filename rawcsv;

Removing the Asterisk

The first time this new data set is used the asterisks can be removed using the FIRSTOBS option.

   data noasterisk;
      set fromcsv(firstobs=2);
      * Doing other stuff here.;
      run;