Macro method to use Google Maps™ and SAS® to find the shortest driving and straight line distances between 2 addresses in the United States

From sasCommunity
Jump to: navigation, search

Abstract

Google Maps™ is a very useful tool in finding driving distances between addresses and defining a geographic representation of an address which can be used to find straight line distances between two addresses. Often distances between addresses and geocoding for locations can be useful in research. Using macro and data step SAS® code, the shortest driving distance between two addresses can be found by searching the html code resulting from a Google Maps™ driving directions search between two addresses in the United States. In addition, a search of the html code resulting from a singular address Google Maps™ search, will enable one to define the latitude and longitude for a location. This will allow use of the SAS® GEODIST function to find the straight line distance between locations. Partial addresses are taken into consideration with this method, which was developed using SAS® v9.3 for Windows®.


Online Materials

View the pdf for Macro method to use Google Maps™ and SAS® to find the shortest driving and straight line distances between 2 addresses in the United States.

MidWest SAS User group presentation for "Macro method to use Google Maps™ and SAS® to find the shortest driving and straight line distances between 2 addresses in the United States" MWSUG 2016 Presentation.

SAS Program (zip file) GoogleAndSAS.
Spreadsheet with Address pairs (zip file) Addresses.

Full Code:

*Define macro to clear work location when program is run;
%macro ClearWk();
	proc datasets lib=work nolist kill;
	quit;
run;
%mend;
 
*Data directory;
%let FinData=<Data Location>;
libname FinData "&FinData";
 
*Clear 'Work';
%ClearWk();
 
*Import Raw data/create calculated variables;
proc import out=AddressX
			datafile="&FinData\Addresses.xlsx"
			dbms=excel replace;
			range="Addresses$";
			getnames=yes;
			mixed=yes;
			scantext=yes;
			usedate=yes;
			scantime=yes;
run;
 
/*Adjust format of address to be Google Maps™ friendly*/
data Address;
	retain subjid Address1 Address2;
	set AddressX nobs=obsnum;
	length Address1 Address2 $200.;
	format Address1 Address2 $200.;
	informat Address1 Address2 $200.;
*If just starting Street Address is complete;
if Address^='' and City1^='' and State1^='' and Zip1^=. then Address1=tranwrd(left(trim(Address))," ","+")||"+"||
	tranwrd(left(trim(City1))," ","+")||","||left(trim(State1))||"+"||left(strip(put(Zip1,best12.)));
*If just starting City is missing;
else if Address^='' and State1^='' and Zip1^=. then Address1=tranwrd(left(trim(Address))," ","+")||"+"||left(trim(State1))||
	"+"||left(strip(put(Zip1,best12.)));
*If just starting State is missing;
else if Address^='' and City1^='' and Zip1^=. then Address1=tranwrd(left(trim(Address))," ","+")||"+"||
	tranwrd(left(trim(City1))," ","+")||"+"||left(strip(put(Zip1,best12.)));
*If just starting Zip Code is missing;
else if Address^='' and City1^='' and State1^='' then Address1=tranwrd(left(trim(Address))," ","+")||"+"||
	tranwrd(left(trim(City1))," ","+")||","||left(trim(State1));
*If just starting Street Address is missing;
else if City1^='' and State1^='' and Zip1^=. then Address1=tranwrd(left(trim(City1))," ","+")||","||left(trim(State1))||
	"+"||left(strip(put(Zip1,best12.)));
*If starting State and Zip Code is missing;
else if Address^='' and City1^='' then Address1=tranwrd(left(trim(Address))," ","+")||"+"||
	tranwrd(left(trim(City1))," ","+");
*If starting City and Zip Code are missing;
else if Address^='' and State1^='' then Address1=tranwrd(left(trim(Address))," ","+")||"+"||left(trim(State1));
*If starting City and State are missing;
else if Address^='' and Zip1^=. then Address1=tranwrd(left(trim(Address))," ","+")||"+"||left(strip(put(Zip1,best12.)));
*If starting Street Address and State is missing;
else if City1^='' and Zip1^=. then Address1=tranwrd(left(trim(City1))," ","+")||"+"||left(strip(put(Zip1,best12.)));
*If starting Street Address and Zip Code are missing;
else if City1^='' and State1^='' then Address1=tranwrd(left(trim(City1))," ","+")||","||left(trim(State1));
*If starting Street Address and City are missing;
else if State1^='' and Zip1^=. then Address1=left(trim(State1))||"+"||left(strip(put(Zip1,best12.)));
*If starting Street Address, City, and State are missing;
else if Zip1^=. then Address1=left(strip(put(Zip1,best12.)));
*If starting City, State, and Zip Code are missing;
else if Address^='' then Address1=tranwrd(left(trim(Address))," ","+");
*If starting Street Address, State, and Zip Code are missing;
else if City1^='' then Address1=tranwrd(left(trim(City1))," ","+");
*If starting Street Address, City, and Zip Code are missing;
else if State1^='' then Address1=left(trim(State1));
*If just ending Street Address is complete;
if Hospital_Address^='' and HCity^='' and HState^='' and HZip^=. then Address2=tranwrd(left(trim(Hospital_Address)),
	" ","+")||"+"||tranwrd(left(trim(HCity))," ","+")||","||left(trim(HState))||"+"||left(strip(put(HZip,best12.)));
*If just ending City is missing;
else if Hospital_Address^='' and HState^='' and HZip^=. then Address2=tranwrd(left(trim(Hospital_Address))," ","+")||
	"+"||left(trim(HState))||"+"||left(strip(put(HZip,best12.)));
*If just ending State is missing;
else if Hospital_Address^='' and HCity^='' and HZip^=. then Address2=tranwrd(left(trim(Hospital_Address))," ","+")||
	",+"||tranwrd(left(trim(HCity))," ","+")||"+"||left(strip(put(HZip,best12.)));
*If just ending Zip Code is missing;
else if Hospital_Address^='' and HCity^='' and HState^='' then Address2=tranwrd(left(trim(Hospital_Address))," ","+")||
	",+"||tranwrd(left(trim(HCity))," ","+")||","||left(trim(HState));
*If just ending Street Address is missing;
else if HCity^='' and HState^='' and HZip^=. then Address2=tranwrd(left(trim(HCity))," ","+")||","||left(trim(HState))||
	"+"||left(strip(put(HZip,best12.)));
*If ending State and Zip Code is missing;
else if Hospital_Address^='' and HCity^='' then Address2=tranwrd(left(trim(Hospital_Address))," ","+")||"+"||
	tranwrd(left(trim(HCity))," ","+");
*If ending City and Zip Code are missing;
else if Hospital_Address^='' and HState^='' then Address2=tranwrd(left(trim(Hospital_Address))," ","+")||"+"||
	left(trim(HState));
*If ending City and State are missing;
else if Hospital_Address^='' and HZip^=. then Address2=tranwrd(left(trim(Hospital_Address))," ","+")||"+"||
	left(strip(put(HZip,best12.)));
*If ending Street Address and State is missing;
else if HCity^='' and HZip^=. then Address2=tranwrd(left(trim(HCity))," ","+")||"+"||left(strip(put(HZip,best12.)));
*If ending Street Address and Zip Code are missing;
else if HCity^='' and HState^='' then Address2=tranwrd(left(trim(HCity))," ","+")||","||left(trim(HState));
*If ending Street Address and City are missing;
else if HState^='' and HZip^=. then Address2=left(trim(HState))||"+"||left(strip(put(HZip,best12.)));
*If ending Street Address, City, and State are missing;
else if HZip^=. then Address2=left(strip(put(HZip,best12.)));
*If ending City, State, and Zip Code are missing;
else if Hospital_Address^='' then Address2=tranwrd(left(trim(Hospital_Address))," ","+");
*If ending Street Address, State, and Zip Code are missing;
else if HCity^='' then Address2=tranwrd(left(trim(HCity))," ","+");
*If ending Street Address, City, and Zip Code are missing;
else if HState^='' then Address2=left(trim(HState));
*Assign macro variable obsnum=number of observations;
call symput('obsnum',obsnum);
run;
 
/*Find shortest driving distance for each address combination*/
*Create dataset to append all distances;
data drdist;
run;
 
%macro ShortDrDist(j=);
	*Create loop to access each observation;
	%do i=1 %to &j;
 
	*Assign macro variables needed (Addresses and Subject ID);
	data _null_;
		set Address;
	if _n_=&i then do;
		call symput('addr1',trim(strip(Address1)));
		call symput('addr2',trim(strip(Address2)));
		call symput('Subjid',Subjid);
	end;
	run;
 
		/*Find Road distances*/
		*Define Google Maps url with starting and ending addresses;
		filename gmaps url "http://maps.google.com/maps?daddr=&addr1.%nrstr(&saddr)=&addr2";
		*Find all driving distances Google provides;
		data drdistX&i (drop=sitecode idrivdist rdstart);
			infile gmaps recfm=f lrecl=1000 end=eof;
			input sitecode $1000.;
		Subjid="&subjid";
		AddStart=tranwrd(tranwrd("&addr1","+"," "),",",", ");
		AddEnd=tranwrd(tranwrd("&addr2","+"," "),",",", ");
		*Search html code for the word miles, keep only those entries;
		if find(sitecode,'miles');
		*Find column where word miles starts;
		idrivdist=find(sitecode,'miles');
		*Define length of substring to keep, ending right before miles start;
		rdstart=idrivdist-1;
		*Scan the substring ending right before miles and keep the numeric value right before miles;
		drivdist=input(compress(scan(substr(sitecode,1,rdstart),1,' "','bc'),','),best12.);
		run;
		*Sort results by distance;
		proc sort data=drdistX&i; by drivdist; run;
		*Keep observation with the shortest distance;
		data drdist&i;
			set drdistX&i;
			by drivdist;
		if _n_=1;
		run;
		*Add observation to final driving distances dataset;
		data drdist;
			set drdist
				drdist&i;
		if drivdist^=.;
		label AddStart='Starting Address'
			  AddEnd='Ending Address'
			  drivdist='Shortest Driving Distance (miles)'
			  Subjid='Subject Id';
		run;
	*Disassociate current assigned gmaps;
	filename gmaps clear;
	%end;
%mend;
 
%ShortDrDist(j=&obsnum);
 
/*Find straightline distance for each address combination*/
*Create datasets to append latitudes and longitudes;
data sldist1;
run;
data sldist2;
run;
%macro StrLineDist(j=);
	*Create loop to access each observation;
	%do i=1 %to &j;
		*Assign macro variables needed (Addresses and Subject ID);
		data _null_;
			set Address;
		if _n_=&i then do;
			call symput('addr1',trim(strip(Address1)));
			call symput('addr2',trim(strip(Address2)));
			call symput('Subjid',Subjid);
		end;
		run;
 
		/*Find Lattitude and longitude in sitecode*/
		*Define Google Maps url with starting addresses;
		filename gmaps url "http://maps.google.com/maps?daddr=&addr1";
		*Define sitecode as 1000 character parts of location html for starting address;
		data sldist1&i (drop=sitecode markst);
			infile gmaps recfm=f lrecl=1000 end=eof;
			input sitecode $1000.;
		Subjid="&subjid";
		AddStart=tranwrd(tranwrd("&addr1","+"," "),",",", ");
		*Find text proceeding latitude and logitude in html code for starting address;
		if find(sitecode,'",[null,null,') then do;
			*Assign variable to starting location of procedding text for starting address;
			markst=find(sitecode,'",[null,null,');
			*Scan substring starting with proceeding text for starting address latitude and longitude;
			lat1=input(scan(substr(sitecode,markst),4,',]'),best12.);
			long1=input(scan(substr(sitecode,markst),5,',]'),best12.);
			output;
		end;
		run;
		*Append observation to dataset of starting address lat/long;
		data sldist1;
			set sldist1
				sldist1&i;
		if lat1^=.;
		label AddStart='Starting Address'
			  lat1='Starting Address latitude'
			  long1='Starting Address longitude'
			  Subjid='Subject Id';
		run;
		proc sort data=sldist1; by Subjid; run;
		*Disassociate current assigned gmaps;
		filename gmaps clear;
		*Define Google Maps url with ending addresses;
		filename gmaps url "http://maps.google.com/maps?daddr=&addr2";
		*Define sitecode as 1000 character parts of location html for ending address;
		data sldist2&i (drop=sitecode markst);
			infile gmaps recfm=f lrecl=1000 end=eof;
			input sitecode $1000.;
		Subjid="&subjid";
		AddEnd=tranwrd(tranwrd("&addr2","+"," "),",",", ");
		*Find text proceeding latitude and logitude in html code for ending address;
		if find(sitecode,'",[null,null,') then do;
			*Assign variable to starting location of procedding text for ending address ;
			markst=find(sitecode,'",[null,null,');
			*Scan substring starting with proceeding text for ending address latitude and longitude;
			lat2=input(scan(substr(sitecode,markst),4,',]'),best12.);
			long2=input(scan(substr(sitecode,markst),5,',]'),best12.);
			output;
		end;
		run;
		*Append observation to dataset of ending address lat/longs;
		data sldist2;
			set sldist2
				sldist2&i;
		if lat2^=.;
		label AddEnd='Ending Address'
			  lat2='Ending Address latitude'
			  long2='Ending Address longitude'
			  Subjid='Subject Id';
		run;
		proc sort data=sldist2; by Subjid; run;
		*Disassociate current assigned gmaps;
		filename gmaps clear;
	%end;
%mend;
 
%StrLineDist(j=&obsnum);
 
*Merge strting and ending dataset by subjid;
data sldist;
	merge sldist1
	  	  sldist2;
	by Subjid; 
*Calculate straight line distance between starting and ending addessses using latitude, longitude, and SAS geodist 
	function;
sldistg=geodist(lat1,long1,lat2,long2,'M');
*Calculate straight line distance between starting and ending addessses using latitude, longitude, and Haversine
	formula ();
sldisth=3949.99*arcos(sin(lat1*(constant('pi')/180))*sin(lat2*(constant('pi')/180))+cos(lat1*(constant('pi')/180))
		*cos(lat2*(constant('pi')/180))*cos(long2*(constant('pi')/180)-long1*(constant('pi')/180)));
label sldistg='SAS GEODIST function distance (miles)'
	  sldisth='Haversine formula distance (miles)';
run;
 
*Merge both distance variables into one dataset;
data dist;
	retain subjid AddStart lat1 long1 AddEnd lat2 long2 drivdist sldistg sldisth;
	merge sldist (drop=AddEnd AddStart)
		  drdist;
	by subjid;
run;
proc sort data=dist out=FinData.GoogleAndSAS; by subjid; run;

Contact Info

Please check out my Laurie Bishop. You can also email.