Numeric representation issues in SAS

From sasCommunity
This is a work in progress. You can contribute to this article.


Introduction

Things are not always what they seem with numbers in SAS.

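This is not really the fault of SAS. Like most software, SAS stores numeric values in floating-point form, and many decimal fractions (0.1, for example) have no exact floating-point representation.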

Problem areas

(need to fill in details)

  • Comparing calculated non-integer values to constant values
  • Inexact accumulation (special case of the above)
  • Insufficient storage to hold even an integer exactly (e.g., large 16-digit integers on platforms other than the mainframe)
  • Losing precision when storing non-integer values in fewer than 8 bytes
  • Changes in numeric representation or calculated values when moving from one platform to another
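
The first two problem areas can be reproduced in a short DATA step. This is a minimal sketch, assuming an IEEE 754 platform (Unix or PC); the variable names are illustrative only, and results on a mainframe may differ.

```sas
data _null_;
   /* Comparing a calculated non-integer value to a constant */
   x = 0.1 + 0.2;
   if x = 0.3 then put 'x equals 0.3';
   else put 'x does not equal 0.3';

   /* HEX16. displays the stored floating-point bit patterns */
   y = 0.3;
   put x= hex16. y= hex16.;

   /* Inexact accumulation: add 0.1 ten times, then compare with 1 */
   total = 0;
   do i = 1 to 10;
      total = total + 0.1;
   end;
   if total = 1 then put 'total equals 1';
   else put 'total does not equal 1';

   /* Show the size of the discrepancy */
   diff = total - 1;
   put diff= e12.;
run;
```

On an IEEE platform both comparisons are expected to take the ELSE branch, and the two HEX16. values should differ only in the final hex digit.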

Best practices

  • Don't use the LENGTH statement to store a numeric variable in fewer than 8 bytes if it might contain non-integer values.
  • Be careful when using the LENGTH statement to store a numeric variable in fewer than 8 bytes, even if all of its values are integers. Consider whether the values to be stored might grow at some later time. Also consider whether your program and data might someday migrate to a platform with different numeric storage characteristics. Check a reference table or use the ??? function to determine the largest integer that your platform can store in a given number of bytes. Finally, consider using the COMPRESS=BINARY data set option rather than the LENGTH statement to reduce storage requirements for numeric variables.
  • Another option to consider is to convert values to integers. For example, currency values can easily be converted to integers by multiplying them by an appropriate factor (e.g., by 100 for USD, CAD, and other currencies with two decimal places) and then using the ROUND function to remove any residual representation error.
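
The integer conversion in the last item might look like the following. This is a sketch, assuming an IEEE 754 platform; the amount 19.99 and all variable names are hypothetical.

```sas
data _null_;
   price = 19.99;   /* hypothetical amount in dollars */

   /* Multiplying alone can leave a value slightly off the intended integer */
   raw_cents = price * 100;
   if raw_cents = 1999 then put 'raw_cents is exactly 1999';
   else put 'raw_cents is not exactly 1999';

   /* ROUND with no second argument rounds to the nearest integer */
   cents = round(price * 100);
   if cents = 1999 then put 'cents is exactly 1999';
   else put 'cents is not exactly 1999';

   /* Arithmetic on integer cents is exact; divide only for display */
   total_cents = cents * 3;
   total = total_cents / 100;
   put total_cents= total= dollar10.2;
run;
```

Keeping the stored value in cents means that later sums and comparisons work on integers, which are represented exactly as long as they stay within the limits for the variable's length.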


Largest integer that can be safely stored in a given length, by platform

Length in bytes   Mainframe                     Unix / PC
2                 256                           N/A
3                 65,536                        8,192
4                 16,777,216                    2,097,152
5                 4,294,967,296                 536,870,912
6                 1,099,511,627,776             137,438,953,472
7                 281,474,976,710,656           35,184,372,088,832
8                 72,057,594,037,927,936        9,007,199,254,740,992

(Source: Tech Report TS-654)
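
One way to check these limits on your own platform is the CONSTANT function with the 'EXACTINT' argument. The sketch below assumes that your SAS release supports that argument and that you are on Unix or PC, where the minimum numeric length is 3. The data set and variable names are hypothetical.

```sas
/* List the largest exactly representable integer for each length */
data _null_;
   do nbytes = 3 to 8;
      max_exact = constant('exactint', nbytes);
      put nbytes= max_exact= comma26.;
   end;
run;

/* Store values in 3 and 8 bytes; truncation happens when the */
/* observation is written to the data set, not in the DATA step itself */
data short;
   length int3 frac3 3 int8 frac8 8;
   int3  = 8193;   /* one more than the 3-byte limit on Unix/PC */
   int8  = 8193;
   frac3 = 0.1;    /* a non-integer value */
   frac8 = 0.1;
run;

/* Read the values back and compare */
data _null_;
   set short;
   put int3= int8=;
   put frac3= best16. frac8= best16.;
run;
```

On an IEEE platform, int3 is expected to come back as 8192 rather than 8193, and frac3 should differ visibly from frac8.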


  • Be careful when comparing numeric values that may be non-integer. If the values were arrived at by accumulation or through other calculations, they may not be exactly what you expect. Use the ROUND function as necessary to eliminate these effects.
  • Be careful when incrementing the index value in a DO-loop by a non-integer value. You may not get the number of iterations that you expect. Rather than specifying your ending value after TO, you can put the test in a WHILE or UNTIL clause instead, where you can avoid the problem by using ROUND in the comparison.
  • Be careful when using PROC FORMAT to specify ranges for numeric values that are not guaranteed to be integers. Either round the values before applying the format, or use the FUZZ= option in PROC FORMAT.
  • Be aware that there may be (very slight) differences in numeric values when data sets or programs are moved from one platform to another.
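
The rounding techniques in the items above can be sketched as follows. This assumes an IEEE 754 platform; the format name tenthf, the fuzz factor, and the variable names are hypothetical choices for illustration.

```sas
/* 1. Round an accumulated value before comparing it */
data _null_;
   total = 0;
   do i = 1 to 10;
      total = total + 0.1;
   end;
   if round(total, 0.01) = 1 then put 'Rounded total equals 1';
   else put 'Rounded total does not equal 1';
run;

/* 2. Non-integer increment: TO versus a rounded WHILE test */
data _null_;
   n_to = 0;
   do x = 0 to 0.3 by 0.1;
      n_to = n_to + 1;
   end;

   n_while = 0;
   do x = 0 by 0.1 while (round(x, 0.1) <= 0.3);
      n_while = n_while + 1;
   end;

   /* Print both iteration counts for comparison */
   put n_to= n_while=;
run;

/* 3. A format on a non-integer value, with an explicit fuzz factor */
proc format;
   value tenthf (fuzz=0.000001)
      0.3   = 'three tenths'
      other = 'something else';
run;

data _null_;
   x = 0.1 + 0.2;
   /* Option A: round the value before applying the format */
   label_a = put(round(x, 0.1), tenthf.);
   /* Option B: rely on the FUZZ= option of the format */
   label_b = put(x, tenthf.);
   put label_a= label_b=;
run;
```

In the second step, the WHILE version should run four times (for 0, 0.1, 0.2 and 0.3). The count for the TO version is printed alongside it so that you can see whether your platform agrees.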

References

SAS documentation (9.1.3)

Numeric Precision in SAS Software

Describes how SAS stores numeric values, discusses how to troubleshoot problems relating to floating-point representation, provides instructions on determining how many bytes are needed to store a number accurately, and summarizes problems that may occur when transferring data between operating systems.

The ROUND function

The FORMAT Procedure, Informat and Format Options (including FUZZ=)

Other references

Dealing with Numeric Representation Error in SAS Applications (TS-230)

Very detailed discussion, including numeric representation theory, how numbers are represented in SAS, when you may have problems with representation error or loss of significance, how to display exactly what you have, and different methods for handling the problems.

Numeric Precision 101 (TS-654)

Shorter than TS-230. Somewhat less discussion of numeric theory, less formal tone, a number of examples. Suggests things to keep in mind, includes short section on moving data between operating systems.

Quality Control With SAS® Numeric Data (NESUG 16 paper by Paul Gorrell)

Contains an excellent summary of how SAS stores numeric data and what issues you can run into, in an overall quality control context.

Numeric Data in SAS®: Guidelines for Storage and Display (NESUG 15 paper by Paul Gorrell)

Numeric Length: Concepts and Consequences (NESUG 20 paper by Paul Gorrell)