As the first step in the decommissioning of sasCommunity.org the site has been converted to read-only mode.


Here are some tips for How to share your SAS knowledge with your professional network.


Creating Manhattan Plots with SAS/Graph

From sasCommunity
Jump to: navigation, search

A Manhattan plot is a type of bar chart that is sometimes used in the visualization of genome-wide association data. Some examples can be found by using Google to search for "Manhattan plots" then clicking on "Images" on the upper left-hand side of the search results. You can see Manhattan plots used in publications by clicking on either of the following links: example 1 or example 2.

A Google search of "Manhattan plots" turns up several references to creating such plots with R but no references to creating them with SAS. Likewise, a search on "Manhattan plots" at Lex Jansen's "one-stop shopping" source for SAS papers finds no papers with SAS code that produce the type of plot shown in the published examples cited above.

The original idea for the SAS code shown below first appeared on the SAS-L listserv and was posted by Kevin Viel. That code included the data step to create the "fake data" used to generate the Manhattan plot. Several modifications have been made to the original SAS code and the following can be used to produce plots with several different color schemes.

The most recent modification of the SAS code uses the NOFRAME option in GPLOT and a STYLE option in an AXIS statement to suppress the drawing of various lines that surround the plot. Those lines are added using ANNOTATE. This modification removes white space that occurs between the last 'vertical bar' and the end of the x-axis when PROC GPLOT is left to chose the x-axis maximum value.

* macro that can be used later to generate symbols for plots with two alternating colors;
%macro twocolors(c1,c2);
%do j=1 %to 21 %by 2;
symbol&j v=dot c=&c1;
symbol%eval(&j+1) v=dot c=&c2;
%end;
%mend;
 
* some fake data;
data manhattan ;
bp=1; 
do c=1 to 22 ;
do _n_=1 to ( 1e6 - c * 10000 ) - 1 by 1000 ;
   bp + _n_ / 1e6 ;
   logp = -log( ranuni(2)) ;
   output ;
end;
end;
run;
 
* find maximum value for the x-axis, store in a macro variable;
proc sql noprint;
select 1.005*ceil(max(bp)) into :maxbp 
from manhattan;
quit;
 
* 
find mean of BP within each chromosome (C)
used later to position x-axis labels
;
proc summary data=manhattan nway;
class c;
var bp ;
output out=mbp mean=;
run;
 
* 
annotate data set used to add x-axis labels
"manually" add the frame around the graph
possibly add a horizontal reference line
;
data anno ;
retain position '8' xsys ysys '2' y 0 function 'label' text 'xx';
do until (last1);
   set mbp (keep = bp c) end=last1;;
   x = round(bp) ;
   text = cat(c);
   output;
end;
 
* top of frame;
xsys = '1'; ysys = '1'; function = 'move'; x = 0; y=100; output;
xsys = '2'; function = 'draw'; x = &maxbp ; output;
* bottom of frame;
xsys = '1'; ysys = '1'; function = 'move'; x = 0; y=0; output;
xsys = '2'; function = 'draw'; x = &maxbp ; output;
 
* horizontal reference line (if needed);
xsys = '1'; ysys = '2'; function = 'move'; x = 0; y=4; output;
xsys = '2'; function = 'draw'; x = &maxbp ; output;
run;
 
* reset all then set some graphics options;
goptions reset=all ftext='calibri' htext=2 gunit=pct 
         dev=gif xpixels=2000 ypixels=1500 gsfname=gout;
 
* make some room around the plot (white space);
title1 ls=2;
title2 a=90 ls=2;
title3 a=-90 ls=2;
footnote1 ls=2;
 
* let SAS choose the colors;
symbol1 v=dot r=22;
 
* two alternating colors;
* gray-scale;
*%twocolors(gray33,graycc);
* blue and blue-green;
*%twocolors(cx2C7FB8,cx7FCDBB);
 
* suppress drawing of any x-axis feature;
axis1 value=none major=none minor=none label=none style=0;
* rotate y-axis label;
axis2 label=(angle=90 "-Log10(p)");
 
* destination for the plot;
filename gout 'z:\manhattan1.gif';
 
* use PROC GPLOT to create the plot;
proc gplot data=manhattan ;
plot logp*bp=c / haxis = axis1
                 vaxis = axis2
                 href  = &maxbp
                 annotate = anno
                 nolegend
		 noframe
;
run;
quit;

The values used in the macro for the two-alternating color schemes were found at the Color Brewer web site. Three plots created using the above SAS code can be found using the following links: SAS-chosen colors; two-alternating gray scale colors; two-alternating colors, blue/blue-green.

A variation on the above SAS code labels "low probability" points. The following shows the modifications made to the above SAS code.

<placed after first data step>
* add some fake info to the data set (SNP name and a p-value);
data manhattan;
set manhattan;
snp_name = cats('rs',_n_);
if ranuni(0) lt .0005 then p_value = 10e-6;
else p_value = 0.1;
run;
 
<modified data step to create the annotate data set>
* 
annotate data set used to add x-axis labels
"manually" add the frame around the graph
possibly add a horizontal reference line
add labels to selected points;
;
data anno ;
length color $8 text $25;
retain position '8' xsys ysys '2' y 0 function 'label' when 'a';
do until (last1);
   set mbp (keep = bp c) end=last1;;
   x = round(bp) ;
   text = cat(c);
   output;
end;
 
* top of frame;
xsys = '1'; ysys = '1'; function = 'move'; x = 0; y=100; output;
xsys = '2'; function = 'draw'; x = &maxbp ; output;
* bottom of frame;
xsys = '1'; ysys = '1'; function = 'move'; x = 0; y=0; output;
xsys = '2'; function = 'draw'; x = &maxbp ; output;
 
* horizontal reference line (if needed);
xsys = '1'; ysys = '2'; function = 'move'; x = 0; y=4; output;
xsys = '2'; function = 'draw'; x = &maxbp ; output;
 
* this portion adds labels for points with p_value le 10e-6;
function = 'label';
hsys = '3';
size = 1.5;
position = '5';
cbox = 'white';
color = 'blue';
do until (last2);
   set manhattan end=last2;
   where p_value le 10e-6;
   x = bp;
   y = logp;
   text = snp_name;
   output;
end;
 
run;

Addition of the above modifications produced the following: Manhattan plot with some labeled points. If you try the above code that labels points, you will not get the same set of labeled points as shown in the example since the SAS code uses a seed of zero for the RANUNI function in the data step that adds the low p-value to randomly-selected observations.

If you have any questions about this posting, you are welcome to send me a note by clicking here ... email Mike. To see my other SAS Community postings click here.