# Driving Distances and Drive Times using SAS and Google Maps

## STRAIGHT LINE DISTANCE

Version 9.2 of SAS contains new functions that allow a user to compute geodesic distances. The ZIPCITYDISTANCE function uses information from the SASHELP.ZIPCODE data set and computes the distance between zip code centroids for any two zip codes specified by the user. The GEODIST function allows a user to specify two coordinates in terms of latitude and longitude and computes the distance between those coordinates. Both functions use the Vincenty distance formula. ZIPCITYDISTANCE produces distances in miles to one decimal place, while GEODIST will produce distances in either miles or kilometers and is not restricted to only one decimal place.

Prior to Version 9.2, a user had to use a data step and write an equation to compute such distances. The most common method was to use the Haversine formula. The Haversine formula distances are not as accurate as the new Vincenty-based computations. The following is an example that computes the distance between the centroids of two zip codes: 12203, my residence; 20500, the White House in Washington DC.

```
data _null_;

* ZIPCITYDISTANCE function;
d1 = zipcitydistance(12203,20500);

* find lat/long of center of 12203;
zip = 12203;
set sashelp.zipcode (rename=(x=long1 y=lat1)) key=zip/unique;

* find lat/long of center of 20500;
zip = 20500;
set sashelp.zipcode (rename=(x=long2 y=lat2)) key=zip/unique;

* GEODIST function;
d2 = geodist(lat1, long1, lat2, long2, 'M');

*
convert latitude and longitude from degrees
to radians for use in Haversine formula
;
d_to_r = constant('pi')/180;
lat1  = lat1  * d_to_r;
long1 = long1 * d_to_r;
lat2  = lat2  * d_to_r;
long2 = long2 * d_to_r;

*
HAVERSINE formula (earth radius of 3949.99 miles
used to produce distance in miles)
;
d3 = 3949.99 * arcos(sin(lat1) * sin(lat2) + cos(lat1) *
cos(lat2) * cos(long2 - long1));

put
"ZIPCITYDISTANCE: " d1 /
"GEODIST        : " d2 /
"HAVERSINE      : " d3;

stop;
run;
```

The two SAS functions produced the same result, with more decimal places (more "accuracy") in the GEODIST result. The Haversine formula produced a value close to that of the SAS functions.

```
ZIPCITYDISTANCE: 311.1
GEODIST        : 311.08525431
HAVERSINE      : 310.38688606
```
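
The expression in the DATA step above is the spherical law of cosines form of the great-circle distance. As a minimal sketch, the haversine form proper can be written as follows. The coordinates are the two latitude/longitude pairs used later on this page, and the mean Earth radius of 3958.76 miles is an assumption of this sketch rather than a value from the original code. Part of the gap between the third result above and the SAS functions may simply reflect the 3949.99 radius.

```
data _null_;
   * coordinates in degrees, taken from the latitude and longitude example on this page;
   lat1 = 42.691560;  long1 = -73.827840;
   lat2 = 35.805410;  long2 = -78.797679;

   * assumed mean Earth radius in miles;
   r = 3958.76;

   * differences converted from degrees to radians;
   d_to_r = constant('pi') / 180;
   dlat   = (lat2  - lat1)  * d_to_r;
   dlong  = (long2 - long1) * d_to_r;

   * haversine of the central angle, then the distance;
   h     = sin(dlat/2)**2 + cos(lat1*d_to_r) * cos(lat2*d_to_r) * sin(dlong/2)**2;
   d_hav = 2 * r * arsin(sqrt(h));

   * GEODIST for comparison with degrees in and miles out;
   d_geo = geodist(lat1, long1, lat2, long2, 'DM');

   put "HAVERSINE: " d_hav / "GEODIST  : " d_geo;
run;
```

Both values are written to the SAS log, so the two approaches can be compared for any pair of coordinates.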

Before proceeding to the next section, please note the following. In the data set SASHELP.ZIPCODE, the variable ZIP is numeric, and the ZIPCITYDISTANCE function, which uses that data set, requires either numeric constants or numeric variables as arguments. Also, in the above example, two observations were read from SASHELP.ZIPCODE using KEY= to find the observation that contains data for the specified zip code (a numeric variable). That type of access requires that the data set SASHELP.ZIPCODE be indexed on ZIP. If it is not, you can add an index to that data set using PROC DATASETS.

```
proc sort data=sashelp.zipcode;
by zip;
run;

proc datasets lib=sashelp nolist;
modify zipcode;
index create zip;
quit;
```
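
If you cannot, or would rather not, modify a data set in the SASHELP library (write access to SASHELP is not guaranteed at every site, which is an assumption about your installation), one possible alternative is to build an indexed copy in WORK and read that copy with KEY=. A minimal sketch:

```
* copy SASHELP.ZIPCODE to WORK and build a simple index on ZIP;
data work.zipcode (index=(zip));
   set sashelp.zipcode;
run;

* the index is listed near the end of the PROC CONTENTS output;
proc contents data=work.zipcode;
run;
```

The SET statements in the earlier example would then refer to WORK.ZIPCODE instead of SASHELP.ZIPCODE.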

Finally, SASHELP.ZIPCODE is updated quarterly and the latest copy can be found at SAS Maps Online.

## DRIVING DISTANCE AND DRIVE TIME

The three distance estimates computed in the previous section with SAS functions and the Haversine formula are all straight line distances. There are occasions where that type of estimate is what you want (for example, how far is my house from a pollution source). On other occasions what you want is not the straight line distance but a driving distance. There are instances where the straight line and driving distances will be close, but that is not the usual case. If Google Maps is used to find the distance between zips 12203 and 20500, the value is 376 miles (with an estimated driving time of about 6 hours and 20 minutes).

Given only one combination of coordinates (or in this case, zip codes), entering the values in Google Maps to get the driving distance and drive time is no problem. With a large number of coordinates, entering the values manually on the Google Maps web site can be very time consuming. In that situation, the URL access method within SAS can be used to access the Google Maps web site multiple times and extract both the driving distance and drive time each time the site is accessed.

The following shows how to access the Google Maps web site once:

```
* enter two zip codes;
%let z1=12203;
%let z2=20500;

* no changes required below this line;

filename z temp;

data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

data _null_;
infile z recfm=f lrecl=&filesize. eof=done;
input @ '<div class="altroute-rcol altroute-info">  <span>'  @;
input text $50.;
distance = input(scan(text,1," "),comma12.);
units    = scan(text,2,"< ");
time     = scan(text,5,"<>");
file print;
put "DRIVING DISTANCE BETWEEN &z1 AND &z2 : "
distance units" (TIME: " time ")";
stop;
done:
file print;
put "CANNOT FIND THE DISTANCE OR TIME BETWEEN &z1 AND &z2, TRY ANOTHER COMBINATION";
run;

filename x clear;
filename z clear;
```

The first FILENAME statement shows the minimum amount of information Google Maps requires to generate a map showing driving directions between two zip codes. There is an "&saddr" in the URL that is part of the web address and not a macro variable. The %NRSTR function prevents SAS from interpreting "&saddr" as the name of a macro variable. The two macro variables &z1 and &z2 are substituted in the web address. The next FILENAME statement creates a FILEREF Z that is used in the next data step (the destination TEMP puts anything written to FILEREF Z into SAS WORK space and the file is deleted when the WORK library is cleared at the end of the SAS session).
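
The FILENAME statement described above might look like the following sketch. The host name, the path and the `daddr` parameter name are assumptions made for illustration (only "&saddr", %NRSTR, &z1 and &z2 are named in the text), and an address of this form should not be expected to return the page layout that the later data steps rely on.

```
%let z1=12203;
%let z2=20500;

* hypothetical address, the host and path and first parameter name are assumptions;
filename x url "http://maps.google.com/maps?daddr=&z2.%nrstr(&saddr)=&z1";

* write the resolved address to the log;
%put %nrstr(http://maps.google.com/maps?daddr=)&z2.%nrstr(&saddr=)&z1;

filename x clear;
```

The period after &z2 ends the macro variable reference, and %NRSTR keeps the macro processor from trying to resolve a macro variable named SADDR.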

The two data steps used to read Google Maps output and then to find the driving distance and time are based on the papers by Rick Langston cited at the end of this posting. The first data step reads the entire contents of the web page that Google Maps produces when it "sees" the URL in the first FILENAME statement. The file is read one byte at a time and each byte is written to the temporary output file defined in the second FILENAME statement. After all the bytes are read, the total number of bytes read is stored in the macro variable &filesize.

The second data step reads the file created by the first data step and treats the entire string of bytes as one (long) record. The "@ string" feature of the INPUT statement is used to search for the HTML fragment that immediately precedes the distance in the web page. When that string is found, the next 50 characters of text are read as the variable TEXT and the contents of TEXT are parsed using various functions to find the driving distance (and units since Google Maps can produce either miles or kilometers) and the driving time (for now, as a character variable ... more on that later).
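
The two-step technique can be tried without any web access. The following is a minimal sketch in which a short, made-up stand-in for a web page is written to a temporary file, copied one byte at a time, and then searched with the "@ string" feature. The fileref SRC, the page content and the `<span>` marker are all hypothetical.

```
filename src temp;
filename z temp;

* hypothetical stand-in for the page returned by the URL;
data _null_;
   file src recfm=f lrecl=100;
   put '<html><body><span>369 mi</span> - about <span>6 hours 50 mins</span></body></html>';
run;

* step 1, copy one byte at a time and count the bytes;
data _null_;
   infile src recfm=f lrecl=1 end=eof;
   file z recfm=f lrecl=1;
   input @1 x $char1.;
   put @1 x $char1.;
   if eof;
   call symputx('filesize',_n_);
run;

* step 2, treat the copy as one long record and search for a marker string;
data _null_;
   infile z recfm=f lrecl=&filesize. eof=done;
   input @'<span>' text $30.;
   distance = input(scan(text,1,' '),comma12.);
   units    = scan(text,2,'< ');
   put distance= units=;
   stop;
done:
   put 'MARKER NOT FOUND';
run;

filename src clear;
filename z clear;
```

If the marker string is changed to something that does not occur in the file, the INPUT statement runs off the end of the single record and control passes to the DONE label.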

The above code produced the following in the output window of an interactive SAS session.

```
DRIVING DISTANCE BETWEEN 12203 AND 20500 : 369 mi  (TIME: 6 hours 50 mins )
```

Now that it has been demonstrated how to compute one distance and time, let's move to the situation of multiple zip codes and multiple uses of Google Maps using SAS. The following is an example that shows how to find the driving distance and drive time from zip 12203 to five zips that all are related in some way to user MSZ03.

```
* data set with zip codes (99999 is an invalid zip code);
data zip_info;
input zip @@;
datalines;
02138 13502 99999 20037 94117 12144
;
run;

* place number of zips in a macro variable;
data _null_;
call symputx('nzips',obs);
stop;
set zip_info nobs=obs;
run;

* delete any data set named DISTANCE_TIME that might exist in the WORK library;
proc datasets lib=work nolist;
delete distance_time;
quit;

* create a macro that contains a loop to access Google Maps multiple times;
%macro distance_time;
%do j=1 %to &nzips;
data _null_;
nrec = &j;
set zip_info point=nrec;
call symputx('z2',put(zip,z5.));
stop;
run;

* zip 12203 hard-coded as part of the URL;
filename z temp;

* same technique used in the example with two zip codes;
data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

* drive time as a numeric variable;
data temp;
retain zip &z2;
infile z recfm=f lrecl=&filesize. eof=done;
input @ '<div class="altroute-rcol altroute-info">  <span>'  @;
input text $50.;
distance = input(scan(text,1," "),comma12.);
units    = scan(text,2,"< ");
text     = scan(text,5,"<>");
* convert times to seconds;
select;
* combine days and hours;
when (find(text,'day') ne 0)  time = 86400*input(scan(text,1,' '),best.) +
3600*input(scan(text,3,' '),best.);
* combine hours and minutes;
when (find(text,'hour') ne 0) time = 3600*input(scan(text,1,' '),best.) +
60*input(scan(text,3,' '),best.);
* just minutes;
otherwise                     time = 60*input(scan(text,1,' '),best.);
end;
output;
keep zip distance units time;
stop;
done:
output;
run;

filename x clear;
filename z clear;

* add an observation to the data set DISTANCE_TIME;
proc append base=distance_time data=temp;
run;
%end;
%mend;

* use the macro;

%distance_time;

* add the straight line distance to the data set;
data distance_time;
set distance_time;
zip_city = zipcitydistance(12203,zip);
run;

proc print data=distance_time;
format zip z5. time time6.;
run;
```

In the above SAS code, the data step that finds the driving distance and time in the Google Maps web page is modified to produce the variable TIME as a numeric variable with a value in seconds. This allows a SAS-based comparison of drive times between zips. The value 12203 was hard-coded in the URL since we are interested in the distance between that zip code and a group of other zips. The results of the SAS code are as follows.

```
Obs      zip    distance    units      time    zip_city
1     02138      173.0      mi        3:04      139.4
2     13502       90.7      mi        1:38       76.5
3     99999         .                    .         .
4     20037      374.0      mi        6:48      311.2
5     94117     2931.0      mi       47:00     2558.0
6     12144       11.7      mi        0:16        7.6
```

Notice the difference between the driving distances and the distances produced using ZIPCITYDISTANCE. The values of TIME are shown as hours and minutes (for example, 47:00, or one day and 23 hours, between Albany/12203 and San Francisco/94117). A slightly modified version of the above SAS code was used to produce a similar data set with 1,200+ observations. Using a moderately fast web connection, the job took only a few minutes.
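
The conversion of the drive time text to seconds can be tried on its own. The following is a minimal sketch that applies the same SELECT logic to a few made-up time strings read from DATALINES. The data set name TIMES and the sample strings are hypothetical, and the SUM function is a swapped-in variation (not the original code) so that a string such as "6 hours" with no minutes still yields a value.

```
data times;
   input text $30.;
   select;
      * combine days and hours;
      when (find(text,'day') ne 0)
         time = sum(86400*input(scan(text,1,' '),best12.),
                    3600*input(scan(text,3,' '),best12.));
      * combine hours and minutes;
      when (find(text,'hour') ne 0)
         time = sum(3600*input(scan(text,1,' '),best12.),
                    60*input(scan(text,3,' '),best12.));
      * just minutes;
      otherwise
         time = 60*input(scan(text,1,' '),best12.);
   end;
   format time time6.;
   datalines;
1 day 23 hours
6 hours 50 mins
6 hours
16 mins
;
run;

proc print data=times noobs;
run;
```

Because TIME is stored in seconds, the TIME6. format displays it as hours and minutes even when the value exceeds 24 hours.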

One last item: access to Google Maps is not limited to zip codes.

```
* my home;

* someplace in North Carolina;

data _null_;
run;

filename z temp;

data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

data _null_;
infile z recfm=f lrecl=&filesize. eof=done;
input @ '<div class="altroute-rcol altroute-info">  <span>'  @;
input text $50.;
distance = input(scan(text,1," "),comma12.);
units    = scan(text,2,"< ");
time     = scan(text,5,"<>");
file print;
stop;
done:
file print;
run;

filename x clear;
filename z clear;
```

One could also find distances between locations defined by address. The above SAS code is similar to that used with zip codes and it produced the following.

```
DISTANCE BETWEEN
59 Lenox Ave 12203
SAS Campus Drive 27513
638 MILES
```

If the SAS and Google Maps combination cannot find a distance, you should see the following (notice the zip code used on the second address).

```
CANNOT FIND THE DISTANCE BETWEEN
59 Lenox Ave 12203
SAS Campus Drive 99999
```

The following is a suggestion for finding driving distances and times for a number of different pairs of addresses. The ZIPCITYDISTANCE function is also used to show a comparison between driving and straight line distances.

```
* data set with IDs and ADDRESSES;
input
@01 id      $5.
;
datalines;
12345    59 LENOX AVE 12203            1 UNIVERSITY PL 12144
98989    616 COSBY ROAD 13502          59 LENOX AVE 12203
99999    59 Lenox Ave 12203            SAS Campus Drive 27513
87878    370 Frederick St 94117        2129 N St NW 20037
;
run;

* place number of addresses in a macro variable;
data _null_;
stop;
run;

* delete any data set named DISTANCE_TIME that might exist in the WORK library;
proc datasets lib=work nolist;
delete distance_time;
quit;

* use a loop within a macro to access Google Maps multiple times;
%macro distance_time;
data _null_;
nrec = &j;
call symputx('id',id);
stop;
run;

filename z temp;

data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

data temp;
id = "&id";

infile z recfm=f lrecl=&filesize. eof=done;
input @ '<div class="altroute-rcol altroute-info">  <span>'  @;
input text $50.;
distance = input(scan(text,1," "),comma12.);
units    = scan(text,2,"< ");
text     = scan(text,5,"<>");
* convert times to seconds;
select;
* combine days and hours;
when (find(text,'day') ne 0)  time = 86400*input(scan(text,1,' '),best.) +
3600*input(scan(text,3,' '),best.);
* combine hours and minutes;
when (find(text,'hour') ne 0) time = 3600*input(scan(text,1,' '),best.) +
60*input(scan(text,3,' '),best.);
* just minutes;
otherwise                     time = 60*input(scan(text,1,' '),best.);
end;
output;
label
id    = 'ID #'
distance = 'DISTANCE (MILES)'
time     = 'TIME (HR:MIN)'
;
stop;
done:
output;
run;

filename x clear;
filename z clear;

proc append base=distance_time data=temp;
run;
%end;
%mend;

* use the macro;
%distance_time;

*
add the distance between ZIP centroids
(should be "in same ballpark" as driving distance)
;
data distance_time;
set distance_time;
* ZIP should be LAST entry in the addresses;
label zip_city = 'ZIPCITYDISTANCE FUNCTION';
run;

title "DRIVING DISTANCE AND TIMES";
proc print data=distance_time label noobs;
format time time6.;
run;
```

The above macro produced the following results (scroll to right to see ZIPCITYDISTANCE results).

```
DRIVING DISTANCE AND TIMES

                                                            DISTANCE        TIME ZIPCITYDISTANCE

12345    59 LENOX AVE 12203        1 UNIVERSITY PL 12144         5.5        0:15             7.6
98989    616 COSBY ROAD 13502      59 LENOX AVE 12203           91.3        1:41            76.5
99999    59 LENOX AVE 12203        SAS CAMPUS DRIVE 27513      638.0       11:35           544.6
87878    370 FREDERICK ST 94117    2129 N ST NW 20037         2817.0       45:00          2441.8
```

Though the variable UNITS was kept in the data set DISTANCE_TIME, it is not shown in the PROC PRINT output given that all addresses are within the US and Google Maps produces all such distances in miles.

If you have locations expressed in terms of latitude and longitude, the following will calculate driving distance and time.

```
%let ll1=%str(42.691560,-73.827840);
%let ll2=%str(35.805410,-78.797679);

* no changes required below this line;

filename z temp;

data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

data _null_;
infile z recfm=f lrecl=&filesize. eof=done;
input @ '<div class="altroute-rcol altroute-info">  <span>'  @;
input text $50.;
distance = input(scan(text,1," "),comma12.);
units    = scan(text,2,"< ");
time     = scan(text,5,"<>");
file print;
put "DRIVING DISTANCE BETWEEN &ll1 AND &ll2 : "
distance units" (TIME: " time ")";
stop;
done:
file print;
put "CANNOT FIND THE DRIVING DISTANCE BETWEEN &ll1 AND &ll2 : " /
"TRY ANOTHER PAIR OF COORDINATES";
stop;
run;

filename x clear;
filename z clear;
```

The above code produced the following in the output window of an interactive SAS session. If you compare the driving distance with that of the earlier example that used the addresses in zip codes 12203 and 27513, you will see that the latitude and longitude could well be in Albany, NY and Cary, NC.

```
DRIVING DISTANCE BETWEEN 42.691560,-73.827840 AND 35.805410,-78.797679 :

644 mi  (TIME: 11 hours 38 mins )
```

A macro version of the above that would find the distance from a specified location to a series of other locations with all locations in terms of latitude and longitude might look as follows. The example uses a random sample of locations from the data set SASHELP.ZIPCODE and finds distances and times to each sample observation from the centroid of zip 12203.

```
*
create a data set with locations specified in latitude and longitude
a random sample of 5 observations from SASHELP.ZIPCODE
use SEED=0 to get a new sample each time program is run
;

proc surveyselect data=sashelp.zipcode (keep=zip city statecode x y)
out=lat_long sampsize=5 seed=0;
run;

*
place number of locations in a macro variable
in this example you know it is 5
but you might not know in another use of the SAS code
;
data _null_;
call symputx('nlls',obs);
stop;
set lat_long nobs=obs;
run;

* create a macro that contains a loop to access Google Maps multiple times;

%macro distance_time;

* delete any data set named DISTANCE_TIME that might exist in the WORK library;
proc datasets lib=work nolist;
delete distance_time;
quit;

%do j=1 %to &nlls;
data _null_;
nrec = &j;
set lat_long point=nrec;
call symputx('ll2',catx(',',y,x));
stop;
run;

* lat/long of centroid of zip 12203 hard-coded as part of the URL;
filename x url
filename z temp;

* same technique used in the example with a pair of lat/long coordinates;
data _null_;
infile x recfm=f lrecl=1 end=eof;
file z recfm=f lrecl=1;
input @1 x $char1.;
put @1 x $char1.;
if eof;
call symputx('filesize',_n_);
run;

* drive time as a numeric variable;
data temp;
infile z recfm=f lrecl=&filesize. eof=done;
input @ '<div class="altroute-rcol altroute-info">  <span>'  @;
input text $50.;
distance = input(scan(text,1," "),comma12.);
units    = scan(text,2,"< ");
text     = scan(text,5,"<>");
* convert times to seconds;
select;
* combine days and hours;
when (find(text,'day') ne 0)  time = 86400*input(scan(text,1,' '),best.) +
3600*input(scan(text,3,' '),best.);
* combine hours and minutes;
when (find(text,'hour') ne 0) time = 3600*input(scan(text,1,' '),best.) +
60*input(scan(text,3,' '),best.);
* just minutes;
otherwise                     time = 60*input(scan(text,1,' '),best.);
end;
output;
keep distance time;
stop;
done:
output;
run;

filename x clear;
filename z clear;

* add an observation to the data set DISTANCE_TIME;
proc append base=distance_time data=temp;
run;
%end;
%mend;

* use the macro;
%distance_time;

*
add variables from original data set to new data set distance_time
use geodist function to calculate straight line distance
;
data distance_time;
set distance_time;
set lat_long point=_n_;
straight_line = round(geodist(42.691560,-73.827840,y,x,'DM'), 0.01);
run;

proc print data=distance_time noobs label;
var x y time distance straight_line zip city statecode;
format zip z5. time time6. ;
run;
```

The variable labels shown in the output have been propagated to the data set from SASHELP.ZIPCODE.

```
 Longitude      Latitude
(degrees) of  (degrees) of
the center    the center                                  The                    Two-letter
(centroid)    (centroid)                     straight_  5-digit   Name of        abbrev. for
of ZIP Code.  of ZIP Code.    time  distance     line    ZIP Code  city/org       state name.

-74.511826     39.522108     4:27     260      221.65    08241    Port Republic      NJ
-89.925360     35.191860    19:13    1187     1007.45    38134    Memphis            TN
-93.806961     45.281139    21:12    1257     1009.35    55588    Monticello         MN
-97.204245     27.869269    32:00    1975     1661.00    78362    Ingleside          TX
-122.146785     44.731480    47:00    2885     2388.54    97342    Detroit            OR
```
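
One simple way to summarize the gap between driving and straight line distances is the ratio of the two. The following is a minimal sketch that uses the values printed above. The data set name RATIO and the variable RATIO are hypothetical, and in practice the ratio could be computed directly in the data step that creates STRAIGHT_LINE.

```
* values copied from the PROC PRINT output above;
data ratio;
   input zip distance straight_line;
   * driving miles per straight line mile;
   ratio = distance / straight_line;
   format zip z5. ratio 5.2;
   datalines;
08241  260  221.65
38134 1187 1007.45
55588 1257 1009.35
78362 1975 1661.00
97342 2885 2388.54
;
run;

proc print data=ratio noobs;
run;
```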

If you do not already know about the data set SASHELP.ZIPCODE, you can read about it in the papers "ZIP Code 411: A Well-Kept SAS Secret" and "ZIP Code 411: Decoding SASHELP.ZIPCODE and Other SAS® Maps Online Mysteries".

## WALKING, BIKING, OR DRIVING

There is an additional argument you can supply in any of the FILENAME statements that produce the URLs used to access Google Maps. That argument can be used to switch from driving distance/time to either walking or biking. In the last example, the changes to be made are as follows.

```
from ...

to ...

and from ... %macro distance_time;
to ... %macro distance_time(type);

then ...
walking:  %distance_time(w);
biking:  %distance_time(b);
driving: %distance_time;
```

These distances/times would probably be most helpful when using locations in an urban area.

## PUBLIC TRANSIT TIMES

Several changes were made to the SAS code posted above to produce public transit times. Although Google Maps does not report a distance when finding transit times, a straight-line distance is added using the ZIPCITYDISTANCE function. The SAS code that produces the transit times can be downloaded as TRANSIT_TIME. As with walking or biking, these times would probably be most helpful when using locations in an urban area. The following shows an example of transit times for three sets of addresses.

```
PUBLIC TRANSIT TIMES
MILES
TIME      ZIPCITYDISTANCE
12345    59 LENOX AVE 12203           1 UNIVERSITY PL 12144       1:28           7.6
23456    61 ACADEMY RD 12208          59 LENOX AVE 12203          0:34           2.9
99999    45 ROCKFELLER PLAZA 10111    116TH ST 10027              0:22           3.9
```

There are potential problems with all of the above SAS code. Finding the desired values of distance and time within the web page produced by Google Maps is based on the current structure of those web pages. This page has been edited several times to take into account changes made in the web page structure that occurred since the various examples were first posted (latest changes on April 12, 2011).

NOTE ... The SGF paper that accompanies the methods shown on this page ("Driving Distances and Times Using SAS® and Google Maps") contains SAS code based on an earlier web page structure and that code no longer works.

The URL used in the FILENAME statement also reflects the current way of doing things with Google Maps. If either or both (i.e., web page content, URL construction) change, one would have to modify the SAS code.

For more about using URL access in SAS, you can read Rick Langston's papers: "Handling Large Stream Files with the @'string' Feature" and "Creating SAS® Data Sets from HTML Table Definitions".

NOTE ... There is a SAS problem note "Problem Note 34098: SAS® is unable to connect to the host when specifying the host name while using the FTP/URL/Socket/EMAIL FILENAME engine" related to using the URL access method. If the code posted here produces an error message about not being able to connect to Google Maps, you may have to apply a HOTFIX to your SAS installation.

If you have any questions about this posting, you are welcome to email the author, Mike.

This work was funded in part by NIH grant HHSN267200700019C from the Eunice Kennedy Shriver National Institute of Child Health and Human Development.