17
Apr

Factor Analysis Tips: Unexpected Things I Learned at SAS Global Forum

Are you still re-ordering your factor pattern by sorting columns in Excel? Well, do I have a tip or two for you.

The cool thing about some large conferences is that even the things you hadn’t planned on attending can be worthwhile. For example, during one time slot, I didn’t have anything in particular scheduled and Diane Suhr was doing a talk on factor analysis and cluster analysis. Now, I published my first paper on factor analysis in 1990, so I was mostly interested in the cluster analysis part.

After all of those years, how did I not know that PROC FACTOR had an option to flag factor loadings over a certain value? Somehow, I missed that, can you believe it?

I also missed the REORDER option, which reorders the variables in the output: first come the variables with their highest loading on the first factor, sorted from largest to smallest loading, then the variables with their highest loading on the second factor, and so on.

It’s super-simple. Use FLAG=value to mark every loading whose absolute value is larger than that value with an asterisk, and REORDER to reorder the variables, like so.

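/* FLAG=.35 marks loadings above .35 in absolute value; REORDER sorts variables by factor */
proc factor data=principal n=3 rotate=varimax scree flag=.35 reorder ;
var x1 x2 x3 x4 ;
run ;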

You can see the results below. With a small number of variables like this example, it doesn’t make much difference, but in an analysis with 40 or 50 variables it can make patterns in your data much easier to identify.

output with reordered factors
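For readers who want to see what FLAG= and REORDER actually do to the table, here is a minimal sketch in JavaScript (the language used elsewhere on this blog) that applies the same two rules to a small loading matrix by hand. The function name reorderAndFlag and the loadings are hypothetical, made up for illustration; this is not how SAS implements it, just the sorting and flagging rules described above.

```javascript
// Sketch of the REORDER and FLAG= rules from PROC FACTOR, applied by hand.
// names: variable names; loadings: one row of factor loadings per variable.
function reorderAndFlag(names, loadings, flag) {
  const rows = names.map((name, i) => {
    const abs = loadings[i].map(Math.abs);
    const best = abs.indexOf(Math.max(...abs)); // factor with the highest |loading|
    return { name, row: loadings[i], best, bestAbs: abs[best] };
  });
  // Group by that factor (first factor first), then largest loading first within a group
  rows.sort((a, b) => a.best - b.best || b.bestAbs - a.bestAbs);
  // Mark any loading whose absolute value exceeds the flag cutoff with an asterisk
  return rows.map((r) => ({
    name: r.name,
    cells: r.row.map((v) => v.toFixed(2) + (Math.abs(v) > flag ? "*" : "")),
  }));
}

// Hypothetical loadings for x1-x4 on three factors
const table = reorderAndFlag(
  ["x1", "x2", "x3", "x4"],
  [[0.2, 0.8, 0.1], [0.9, 0.1, 0.0], [0.1, 0.3, 0.7], [0.6, 0.5, 0.2]],
  0.35
);
for (const r of table) console.log(r.name.padEnd(4), r.cells.join("  "));
```

With 40 or 50 variables, the same grouping is what puts each factor’s variables next to each other.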


I am a backwards woman. I write about statistics and statistical software in my spare time, and my day job is making video games. In my defense, the latest series of those games teaches statistics – in Spanish and English.

Aztech Games

9
Apr

SAS Global Forum started out as planned …

The first time I went to SAS Global Forum, over 30 years ago, it was actually called SUGI (SAS Users Group International) and it was in Reno, NV. I was a just-divorced single mom and there was no such thing as a Working Mothers Room (which I noticed signs for here in Denver). I paid for a bonded sitter, on contract with the hotel, to come to my room and watch my toddler. That toddler is now CEO of 7 Generation Games. So, yeah, it’s been a minute.

Having been to these events for over 30 years, not to mention a dozen or so WUSS (Western Users of SAS Software) conferences, I thought I might need to put some effort into learning new stuff. My plan was to pick one product that I wanted to learn more about and make my own little personal strand on that. I picked SAS Enterprise Miner. I hadn’t used it a lot, and not at all lately, and I thought it might be a good choice to introduce students to data mining, a topic I just touch on in my multivariate statistics course.

The first session was 10 Tips Learned in 20 Years of Enterprise Miner, by Melodie Rush. Did you realize that the nodes in EM are in alphabetical order? No, me neither. I also didn’t know that the Reporter node could automatically generate documentation. If you are registered for the conference, you can download the presentation from the app, even if you didn’t attend.

There wasn’t another Enterprise Miner presentation in the morning, so I wandered over to The Quad and talked to Tom Grant in SAS Global Academic, who told me that now you can download a tiny little 26 KB file and run SAS Enterprise Miner on the SAS server, whether you use Windows or Mac. I remember something like this years ago, but it was deathly slow and it sucked. Your other option was to install SAS EM on your desktop, which did not exactly require sacrificing a goat, taking your computer apart and putting it back together with each piece bathed in goat’s blood, but it wasn’t all that much easier.

Well, times have changed! I already had a SAS OnDemand for Academics account, so I clicked to get Enterprise Miner. A file called main.jnlp downloaded, and when I double-clicked it, my Mac said it was from an unidentified developer, so I went into the preferences and chose to open it anyway.

Then I got a message that my version of Java was out of date. I clicked to update it, was directed to the download, and installed the update.

Did that, clicked on the main.jnlp again and will you look at that …

 

SAS Enterprise Miner

The whole process took less than five minutes …

leaving me time to head over to the convention center and see what Scott Leslie and Tricia Aanderude have to say about health outcomes and visual analytics.

How fast does EM in the cloud run, you ask? Well, I am in a hotel where the Wi-Fi is about the same as in my apartment in Santiago, that is, somewhere midway between Santa Monica and North Dakota speed. It runs fine. I can see using it as a demo in a class or making instructional videos with it. Screens don’t pop up as fast as if it were a regular web page, but so far the delay is not enough to annoy students using it for analyses or teachers using it to demonstrate.

So far, today’s Enterprise Miner strand was a success. After that, however, things definitely did not go according to plan, but they were still great. I’ll have more on that in my next post.

Speaking of not according to plan … I’m giving a presentation at SAS Global Forum at 11 a.m., Tuesday, April 10, in room 207. I’ll talk about the connections between SAS and building games with JavaScript, how I got from Santa Monica, California, to Santiago, Chile, and where SAS can take you in the most unexpected ways.

 

26
Feb

Whipping your data into shape with SAS: Day 2, Fixing Errors & Identifying Input Datasets

Last post, we happily uploaded our data, read it into SAS using a combination of SAS utilities and coding, decided all was lovely and used this code to concatenate the 4 datasets.

DATA allplants ;
set import1 - import4 ;   /* numbered range list: import1 through import4 */
run ;

If you get an error at this point, what should you do?

Let’s say you get the error below.

118
119 Data allplants ;
120 set import1 - import4 ;
ERROR: Variable Finance_Commission___Interest_Co has been defined as both character and numeric.
121 run ;

This is one of those situations where you can be too clever. We aren’t going to use this variable in the analysis, so let’s just drop it. Ask yourself: do I need this variable? If the answer is no, as in this case, just drop it.

  • The (drop=) option after the data set name drops the variable(s) you list.
  • The (in=a) option creates a temporary variable, a, that is true if the record comes from the data set import1 and false otherwise.
  • Since both options go in parentheses after the data set name, you include both of them in the same set of parentheses.
  • Now that you have variables denoting the source data set, you can use them in IF-THEN-ELSE statements like any other variable.

Data allplants ;
/* set the length first; otherwise group gets the length of its first value (7) and longer labels are truncated */
length group $ 9 ;
set import1 (drop=Finance_Commission___Interest_Co in=a)
    import2 (drop=Finance_Commission___Interest_Co in=b)
    import3 (drop=Finance_Commission___Interest_Co in=c)
    import4 (drop=Finance_Commission___Interest_Co in=d) ;
if a then group = "student" ;
else if b then group = "control" ;
else if c then group = "developer" ;
else if d then group = "testcase" ;
run ;

Now we’ve dropped the troublesome variable and have a group variable based on the source.
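If it helps to see what a SET statement with several data sets and IN= flags does row by row, here is a rough JavaScript analogy, assuming each data set is an array of plain objects. concatWithSource and the sample rows are hypothetical; the DATA step above does all of this for you.

```javascript
// Rough analogy of SET import1-import4 with DROP= and IN= flags:
// stack the data sets one after another, drop one field, and label each row by source.
function concatWithSource(datasets, labels, dropVar) {
  const out = [];
  datasets.forEach((rows, k) => {
    for (const row of rows) {
      const copy = { ...row }; // don't modify the input rows
      delete copy[dropVar];    // like (drop=...)
      copy.group = labels[k];  // like: if a then group = "student"; ...
      out.push(copy);
    }
  });
  return out;
}

// Hypothetical mini data sets; "fee" is text in one and numeric in the other,
// the same kind of conflict that caused the error above
const import1 = [{ day: 1, cash: 100, fee: "n/a" }];
const import2 = [{ day: 1, cash: 90, fee: 5 }, { day: 2, cash: 95, fee: 5 }];
const allplants = concatWithSource([import1, import2], ["student", "control"], "fee");
console.log(allplants);
```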

So, this code SEEMS like it should work and the data are all good. We look at the log and see no errors, but maybe we should take some more steps just to be safe.

 

14-year-olds

What would those be? Let’s think about this.


If you’d like a much easier dose of statistics, and a brief break from maturity, while learning about Latin American history and culture, check out AzTech: The Story Begins

Aztech Games

24
Feb

Whipping your data into shape with SAS: Part 1 for Today

I’m sure I’ve written about this before – after all, I’ve been writing this blog for 10 years – but here’s something I’ve been thinking about:

Most students don’t graduate with nearly enough experience with real data.

You can use government websites with de-identified data from surveys, and I do, but I teach primarily engineering and business students so it would be helpful to have some business data, too. Unfortunately, businesses aren’t lining up to hand me their financial, inventory and manufacturing data (bunch of jerks!)

So, I downloaded a free app, Medica Scientific, from the App Store and ran a simulation of data for a medical device company. Some friends did the same, which gave me 4 data sets, as if from 4 different companies.

Now that I have 4 Excel files with the data, here’s a tip before you upload them. By default, SAS is going to import the first worksheet, so move the worksheet you want to the first position. In this case, it’s a worksheet named “Financials”. Since SAS will use the first worksheet, it could just as well be named “A whale ate my sandwich”, but that wouldn’t be as obvious.

While you are at it, take a look at the data and check that the variable names are in the first row. ALWAYS give your data at least a cursory glance. If it is millions of records, opening the file isn’t feasible, and we cover other ‘quick looks’ in class.

1. Upload the file into the desired directory
2. Under Tasks and Utilities select Utilities and then Import Data
3. Click select file and then navigate to the folder where your file is and click open
4. You’ll see a bunch of code but nothing actually happens until you click on the little running guy.

menus to select data to import

First select the data set

the import data window

Have you clicked the running guy? Good!

 

Okay, now you have your code. Not only has SAS imported your data file into SAS, it’s also written the code for you.

FILENAME REFFILE '/home/annmaria/examples/simulation/Tech2Demo.xlsx';
PROC IMPORT DATAFILE=REFFILE DBMS=XLSX OUT=WORK.IMPORT1;
GETNAMES=YES;   /* use the first row as variable names */
RUN;
PROC CONTENTS DATA=WORK.IMPORT1;
RUN;

Now, if you had a nice professor who only gave you one data set, you would be done, which is why I showed you the easy way to do it.

However, very often, we want to compare several factories or departments or whatever it is.

Also, life comes with problems. Sigh.

One of your problems, which you’d notice if you opened the data set, is that the variables have names like “Simulation Day”. I don’t want spaces in my variable names.

My second problem is that I need to upload all of my files and concatenate them so I have one long file.

Let’s attack both of these at once. First, upload the rest of your files.

Now,  open a new SAS program and at the top of your file, put this:

OPTION VALIDVARNAME=V7 ;

It will make life easier in general if your variable names don’t have spaces in them. The option above automatically converts variable names into valid SAS names, replacing the spaces with underscores.
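To get a feel for what VALIDVARNAME=V7 does to column headers, here is an approximate sketch in JavaScript. It assumes the basic rules as I understand them: each character that isn’t a letter, digit or underscore becomes an underscore, a name that starts with a digit gets a leading underscore, and names are cut to 32 characters. toV7Name is a hypothetical helper, and SAS’s own conversion may differ in edge cases, so check PROC CONTENTS for the names SAS actually produced.

```javascript
// Approximation of how VALIDVARNAME=V7 turns column headers into SAS names.
// These rules are assumptions; PROC CONTENTS shows the real result.
function toV7Name(header) {
  let name = header.replace(/[^A-Za-z0-9_]/g, "_"); // each invalid character -> "_"
  if (/^[0-9]/.test(name)) name = "_" + name;       // names can't start with a digit
  return name.slice(0, 32);                         // SAS names are at most 32 characters
}

console.log(toV7Name("Simulation Day"));
console.log(toV7Name("1st Quarter Sales"));
```

The 32-character limit is why long Excel headers come out chopped off, like Finance_Commission___Interest_Co in the next post.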

Now, to import all four files, including the first one again so that it gets the new variable names, create a new SAS program and copy and paste the code created by your IMPORT procedure FOUR TIMES (yes, four).

From Captain Obvious:

Captain Obvious wearing her obvious hat

Although you’d think this would be obvious, experience has shown that I need to say it.

  • Do NOT copy the code in this blog post. Copy the code produced by your own IMPORT procedure; it will have your own directory name.
  • Do NOT name every output data set IMPORT1 because if you do, each step will replace the data set and you will end up with one dataset and be sad.

Since I want to replace the first file, I’m going to need to add the REPLACE option in the first PROC IMPORT statement.

OPTION VALIDVARNAME=V7 ;

FILENAME REFFILE '/home/annmaria/examples/simulation/Tech2Demo.xlsx';
PROC IMPORT DATAFILE=REFFILE DBMS=XLSX
REPLACE   /* overwrite the WORK.IMPORT1 created earlier */
OUT=WORK.IMPORT1;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT1;
RUN;

FILENAME REFFILE '/home/annmaria/examples/simulation/Tech2Demo2.xlsx';
PROC IMPORT DATAFILE=REFFILE DBMS=XLSX
REPLACE OUT=WORK.IMPORT2;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT2;
RUN;

Do that two more times for the last two data sets.

Did you need to use the utility? Couldn’t you just have written the code from the beginning? Yes. I just wanted to show you that the utility existed. If you only had one file and it had valid variable names, which is a very common situation, you would be done at that point.

In a real-life scenario, you would want to combine all of these into one long file so you could compare clinics, plants, whatever. Super easy.

[If you have write access to a directory, you could create a permanent data set here using a LIBNAME statement, but I’m going to assume that you are a student and you do not. The default is to write to the temporary WORK library.]

DATA allplants ;
set import1 - import4 ;
run ;

If you get an error at this point, what should you do?

There are a few different answers to that question and I will answer them in my next post.

SUPPORT MY DAY JOB. IT’S FUN AND FREE!
YOU CAN DOWNLOAD A SPIRIT LAKE DEMO FOR YOUR WINDOWS COMPUTER FROM THE MICROSOFT STORE

16
Feb

How SAS Helped Me Make Our Best-Selling Educational Game: Part 2

Last time, I wrote a bit about the requirements of a game to match the most synonyms in one minute, and about how what I learned using SAS was the basis for several parts of the game. This activity is going into Making Camp Premium, which will be a paid version of our best-selling game, Making Camp Ojibwe. I don’t know if you can call it best-selling because you can download it for free, and Spirit Lake has been around longer so it has more players, but Making Camp gets more new downloads each month than any of our other games. This is surprising since the game is written in JavaScript and we have other games made with Unity that have way cooler effects. Just goes to show you can’t predict perfectly what kids will like.

While you are waiting for me to finish this game, head to the app store and get Making Camp Ojibwe, free, for your iPad.

Now, back to the synonyms game. We’d finished the timer, which, when it ended, showed your total points and a happy or sad image.

Okay, this first part is boring, just initializing a bunch of variables I will use later.

var thisone = 0;
var boxmove = 0;
var thisel;
var thesepts = 0;
var question ;
var correct = 0 ;


// This is the array of words. The first is displayed as the word to match ;
// The next three words are synonyms and the last four words are incorrect answers ;

var words = [
  ["large", "big","enormous","gigantic","awkward","introspective","sane", "bulbous"],
    ["fast", "rapid","quick","speedy","awkward","boring","dull", "bulbous"],
    ["fat", "stout", "thick", "overweight", "thin", "unprofitable", "sense", "dazzling"],
    ["bad", "terrible", "not good", "awful", "couch", "sad", "ugly", "usual"],
    ["angry", "mad", "furious", "livid", "happy", "simple", "connected", "personal"],
    ["tale", "story", "fable", "yarn", "hind leg", "hippo", "newspaper", "earnest"],
    ["little", "small", "tiny", "itty bitty", "large", "thoughtless", "sleek", "perturbed"],
    ["strange", "odd", "queer", "weird", "couch", "sad", "ugly", "happy"],
    ["rare", "uncommon","unusual","not typical","irate","musical","aromatic", "within"]
];

I need more rows in this array. If you feel creative and want to help a sister out, post a word and 3 synonyms in the comments. Getting back to SAS, I have used SAS arrays since they first came out and were implicitly indexed. In other words, it’s been a minute. If one-dimensional arrays were great, two-dimensional arrays were great-squared. Some people will tell you that JavaScript does not have two-dimensional arrays and rather, you have an array of arrays. To those people, I say, “Bah, humbug!”

Systematic Random Sampling Saves the Day

Alrighty, then, on to creating the synonym problem. Sometimes you can be too clever. My challenge was to make sure that the choices were put in random order so that the first 3 boxes weren’t always the correct answers. I went through a lot of possible solutions where I tried to splice the array to pull out a word at random, then pull another random choice from the shortened array, using the length attribute.

After all of that, I realized there was a really simple solution. Pull out a random number. Take the word at that position and the rest of the items in the row, then start at the beginning again. Systematic random sampling. Yep. Super simple. Every useful programming language on earth has a random number function, including SAS, of course. First, we randomly pull a row out of the array. Then we start at position n in that row, where n is a random integer from 1 to 7 (look at qnum to see how we get that); position 0 holds the word to match, so this is the (n+1)th word. That word goes in the first box. Then the next box gets the next word in order. When we get to the end of the row, the next box gets the first synonym. So, if my random number is 5, the boxes for the choices hold words # 5, 6, 7, 1, 2, 3, 4, and boxes 4-6 are the correct answers.
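Here is that wrap-around logic on its own, stripped of the jQuery, so you can check the worked example with a start of 5. choiceOrder and correctBoxes are hypothetical names for this sketch; the game itself does the same thing inline with qnum in createProblem(). As in the game, positions 1-3 of a row are the synonyms.

```javascript
// Box order for one problem: start at position `start` (1-7) and wrap around,
// skipping position 0, which holds the word to match.
function choiceOrder(start, nChoices = 7) {
  const order = [];
  let q = start;
  for (let box = 1; box <= nChoices; box++) {
    order.push(q);
    q = q < nChoices ? q + 1 : 1; // same as: if (qnum < 7) {qnum++;} else {qnum = 1;}
  }
  return order;
}

// Boxes (numbered from 1) that hold one of the three synonyms (positions 1-3)
function correctBoxes(order, nSynonyms = 3) {
  return order
    .map((q, i) => (q <= nSynonyms ? i + 1 : null))
    .filter((box) => box !== null);
}

const start = 1 + Math.floor(Math.random() * 7); // random start from 1 to 7, like qnum
console.log(start, choiceOrder(start), correctBoxes(choiceOrder(start)));
```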

Next, we have a

for (var i=1; i < 8; i++) {

some code

}

It really is exactly the same as

DO i = 1 to 7 ;

*** some code ;

END ;

After that, there are some IF-THEN-ELSE and assignment statements. The only things not really applicable to SAS are the draggable function and appending some divs to the page.

I started this post writing about how everything in SAS made it easy for me to develop games using JavaScript but now that I think of it, it would work just as well the other way and if you know some JavaScript, learning SAS would be a piece of cake.  You can check out the code below. It’s getting late here in Santiago, Chile and I still want to call my infinitely patient husband back in California so I’ll pick up next time on scoring the answers right or wrong.

/* THIS CREATES THE PROBLEM.
A word is selected randomly from the array, then the start point in the list of synonyms is randomly selected.
This is systematic random sampling. The words are put in boxes for the divs starting with
the random number and when it gets to 7, it goes back to the beginning of the word list
(but after the word you are finding the synonym for, that's why you need the 1+ )
Divs that get the first 3 synonyms in the array are assigned a class of 'right'
and the others are assigned a class of 'wrongb'.
Draggable function is assigned to each of the choice boxes created.
If the choice is correct, the variable thisone is assigned the value of 1 when the box is dragged;
*/

function createProblem() {
    // Pick a random row; element 0 is the word to match
    question = Math.floor(Math.random() * words.length);
    $("#segment2").text(words[question][0]);

    var qnum = 1 + Math.floor(Math.random() * 7); // Start at random number, 1 to 7 ;

    for (var i = 1; i < 8; i++) {
        var divid = "#div" + i;   // selector for the i-th container div
        var boxid = "box" + i;    // id attribute, so no leading '#'

        // Positions 1-3 hold synonyms (class 'right'); 4-7 are distractors ('wrongb')
        var cls = (qnum < 4) ? "right" : "wrongb";
        $(divid).append('<div class="smallbox draggable ' + cls + '" id="' + boxid + '">' +
            words[question][qnum] + '</div>');

        // Wrap around after the last word, skipping position 0
        if (qnum < 7) { qnum++; }
        else { qnum = 1; }
    }

    // Make every choice box draggable once, after all boxes exist
    $('.draggable').draggable({
        start: function (event, ui) {
            if ($(this).hasClass('right')) {
                thisone = 1;
                thisel = this;
            }
            else { thisone = 0; }
        }
    });
}

// END CREATION OF WORD BANK PROBLEM ;

15
Feb

SAS taught me how to make best-selling games

If you know anything about SAS, you might think from the title that I used my mad data analysis skills to figure out what works and what doesn’t for games. While that is somewhat true, it is not at all what this post is  about.  In fact, learning SAS first helped me a lot when it came to actually MAKING games. No, there is not a lick of SAS code in our games, but the concepts and ideas came to me fairly easily because of my experience using SAS.

(If you read this and start to post a comment saying I could have learned everything here from Python or C or whatever your favorite language is, I am sure you are right. The fact is, though, I didn’t. )

Let me give you an example:

The object of the game is to match as many synonyms as possible in one minute. This is what has to happen:

  1. On loading the page, randomly select a word to display on the screen, start the timer and music
  2. Show the number of seconds on the page, going down every second
  3. On the page, show 7 other words, 3 that are synonyms and 4 that are not synonyms, making sure that the correct and incorrect words show up in random order.
  4. If the player drags a correct word into the box, it turns green and adds 1 point to the score.
  5. If the player drags an incorrect word, the box turns red.
  6. If all three choice boxes are filled, all the boxes are cleared and a new word and choice boxes are shown
  7. When the time is up, if the player has a perfect score, show a happy image and appropriate text.
  8. If the player doesn’t have a perfect score, show a less happy image and appropriate text.
  9. When time is up, show a button the player can click to play again.

thumbs up for getting a perfect score

What in the heck does all of that have to do with SAS? It’s all written in JavaScript (the reason for that is a post for another day), but let’s look at some code:

<script type="text/javascript">
    $(document).ready(function () {

        //Timer script ;
        var time = 60000;
        var timer ;

This first bit just starts a script and begins a function that will execute when the document is ready. That is, I don’t want JavaScript to try acting on elements that aren’t loaded yet. My first exposure to writing functions was in the 1980s. It was a very significant event. I swear, I even remember the cramped graduate assistant office at the University of California, Riverside where I read my first book of SAS macros. I think it was a book of macros written by users. This is how we distributed things before the Internet. I thought the idea of writing my own functions was the coolest thing I had ever heard.

Now, for the timer. Everyone knows what a variable is, or you do if you’ve done anything with any programming language. Here, I am initializing the time to 60,000 milliseconds. Initializing a variable: another basic idea I learned from SAS. I’m going to use that other variable, timer, later to hold the interval that runs the myTimer function over and over. Just wait.

//Timer script ;
function myTimer() {
    if (time > 0) {
        // Show the remaining time in seconds, then subtract one second
        var nowTime = time / 1000;
        document.getElementById("timer").innerText = nowTime;
        time = time - 1000;
    }
    else {
        // Time is up: show 0 and stop the interval stored in timer
        document.getElementById("timer").innerText = "0";
        clearInterval(timer);

        $("#form1").hide();
        //IF ALL OF YOUR ANSWERS WERE CORRECT ;
        if (correct === boxmove) {
            $("#correct").text("PERFECT! You answered " + thesepts + " correctly.");
            $("#correcto").slideDown('slow');
            playAudioLocal("../../sounds/correct1");
        }
        else {
            $("#wrongo").show();
            $("#incorrect").text("You answered " + thesepts + " correctly.").slideDown('slow');
            playAudioLocal("../../sounds/flute");
        }
        $("#redo").show();
    }
}

Except for a few specific details, everything in the script above, I learned or improved from using SAS.

IF-THEN-DO-END: instead of DO and END, I have an opening { and a closing }, but it’s the same thing.

If the time is greater than 0, the variable nowTime is set to time divided by 1,000, since most people would prefer to see their time in seconds rather than milliseconds. By the way, nowTime is a local variable, defined within a function. Local variables are another idea I first learned from SAS macros, thank you very much. The text of the element in the page with the id ‘timer’ is now set to the number of seconds remaining (nowTime). Then we deduct another 1,000 milliseconds (one second) from time.

ELSE-DO is another common bit of SAS code. If there is no time left, do all of this stuff: set the displayed time to 0, stop calling the timer function, and so on.

You can have nested IF-THEN-DO code in SAS, as I do here in my JavaScript.

While SAS didn’t introduce me to text functions, it’s where I learned a lot of them. Here, we  have a JavaScript text function where I’m concatenating a string with a variable and then another string.
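If you want to check the countdown arithmetic without a browser, here is a minimal sketch that runs the same logic as myTimer() in a plain loop instead of on an interval, collecting what would be displayed. countdownDisplay is a hypothetical helper; the page updates, showing and hiding, and sounds are left out.

```javascript
// Same arithmetic as myTimer(): show time/1000, subtract 1000 ms per tick,
// and show "0" once time reaches 0 (the tick where the game calls clearInterval).
function countdownDisplay(startMs) {
  const shown = [];
  let time = startMs;
  while (true) {
    if (time > 0) {
      shown.push(String(time / 1000)); // seconds remaining
      time = time - 1000;
    } else {
      shown.push("0"); // time is up; the real game stops the interval here
      break;
    }
  }
  return shown;
}

console.log(countdownDisplay(5000)); // the game itself starts at 60000
```

Starting from 60,000 milliseconds, that is 60 ticks showing 60 down to 1, plus the final tick showing 0.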

So, we’ve knocked off numbers 2, 7, 8 and 9. All of the showing and hiding of elements had nothing to do with SAS. That part was straight jQuery, but that was the easy part. Actually, this whole part was pretty easy. A few tricky bits show up later on. Maybe I’ll get to them in my next post. While you are waiting with bated breath …..

Check out Making Camp because maturity is overrated.  Learn Ojibwe history, brush up on your math skills and build out your virtual wigwam.

wigwam

 

29
Jan

Maybe you *can* use SAS to teach art majors

I was supposed to be teaching statistics to undergraduate Fine Arts majors this semester but I’m going to Santiago to open a Latin American office for 7 Generation Games instead.

I’m a bit disappointed because, even though when I was younger and got asked at cocktail parties what I did for a living, I would say,

I teach statistics to people who don’t want to learn it.

teaching Fine Arts majors would still have been a new experience.

I was planning on using Excel to teach that course. However, as I take a closer look at SAS Studio I think it might be feasible to use SAS.

First of all, it’s free for academics and you can use it on any device, including an iPad. I know because I’ve tested it.

Second, and more important for this group, you can use the tasks and do some real-life analyses with almost no coding.

For example, I want to know whether the students we tested on American Indian reservations who had a family member addicted to methamphetamine were, on average, over the cutoff for depressive symptoms. On the scale we used, the CESD-C, the cutoff score is 15.

Step 1: Run the code to assign the directory with the data I made available for the course, for example,

libname in "/home/annmaria.demars/data_analysis_examples";
run;

Step 2: Under the TASKS menu on the left select STATISTICS and then t TESTS

selecting t-tests

 

3.  Next to the DATA field you’ll see a thing that looks kind of like a spreadsheet. It’s supposed to symbolize a data file. Click on that and a box will come up that lets you pick the directory (library) and the file within it. In my case, it is the CESD_score file.

selecting the data

4. Now that I have my dataset selected, from the ROLES menu  I select one-sample t-test.

5. Click the + next to Analysis Variable and select the dependent variable, in my case, this is CESDTotal

Data selected for one-sample t-test

6. Now click on the OPTIONS tab. A two-tailed test is selected as the default. That’s good, leave it. The null hypothesis tested by default is that the mean is equal to 0, but I want to change that to 15. Then just click the little running guy at the top to get results.

options for t-test

 

I showed the results in a previous post: the mean for my sample of 18 youth was 21 (p < .05).

What if we did an UPPER one-tailed t-test? Then my p-value is .015 instead of .03.

What if we did a LOWER one-tailed test? Then my p-value is 1.0.

Getting these latter 2 tests takes about 5 seconds. All I need to do is change the option for tails and click on the running man again.
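The one-tailed p-values move the way they do because the t distribution is symmetric: when t is positive, the upper-tail p is half the two-tailed p, and the lower-tail p is one minus that. Here is a minimal sketch of that conversion; oneSidedP is a hypothetical helper name, not a SAS option.

```javascript
// Convert a two-sided p-value from a t test into the two one-sided p-values.
// Relies on the t distribution being symmetric around 0.
function oneSidedP(pTwoSided, t) {
  const upper = t >= 0 ? pTwoSided / 2 : 1 - pTwoSided / 2; // H1: mean > H0 value
  return { upper, lower: 1 - upper };                       // H1: mean < H0 value
}

// Two-sided p = .0317 with t = 2.34, from the CESD-C example
console.log(oneSidedP(0.0317, 2.34));
```

SAS works from the unrounded two-sided p, so its printed one-sided values can differ slightly in the last digit from what you get by plugging in a rounded number.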

Now, in just a few minutes, I have results under three different alternative hypotheses, from an actual study. My students and I can start discussing what that means.

Bottom line, check out SAS Studio. It may be more of an option for your students than you think.

monkey

Meet the howler monkey in Aztech Games

 

Speaking of baby steps for learning statistics, check out Aztech Games. You can play them in English or Spanish on your iPad. Learn statistics and Latin American history at the same time.

30
Dec

What would you do if one person changed your results?

This is a hypothetical question, but it could easily happen. Let me give you a real example.

Using a mobile phone game, we administered a standard depression screening measure (CESD-C) to 18 children living on or near an American Indian reservation. All children had a family member who was an alcoholic or addicted to drugs. I decided to do a one-sample t-test of the hypothesis that the mean for this population = 15, which is the cutoff value for symptoms of depression. Here is the code, but I didn’t write it myself (more about that later).

/* two-sided test of H0: mean = 15, with the H0 value shown on the plots */
PROC TTEST DATA=cesd_score SIDES=2 H0=15 plots(showh0);

var CESDTotal;
run;

The results are shown below, with  a mean of 21 and a range from 3 to 38.

ttest results

You can see that the t-value of 2.34 is significant at p < .05; that is, the mean for this sample is significantly different from the cutoff score of 15. You can see more results here. What if it hadn’t been, though? What if, instead of .0317, the probability had been .0517?

What if dropping this one person, with a score of 3, changed the result? In fact, dropping that person did change the mean, to 22, and the p-value, to .0115. You can see all of those results here.
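For anyone who wants to see the arithmetic behind that t-value, and why removing one low score moves it, here is a minimal JavaScript sketch of the one-sample t statistic, t = (mean - 15) / (s / √n) with n - 1 degrees of freedom. oneSampleT and the scores are hypothetical; these are not the study data, and the sketch stops at t because JavaScript has no built-in t distribution for computing the p-value.

```javascript
// One-sample t statistic: t = (mean - mu0) / (sd / sqrt(n)), with df = n - 1.
function oneSampleT(xs, mu0) {
  const n = xs.length;
  const mean = xs.reduce((s, x) => s + x, 0) / n;
  const variance = xs.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1); // sample variance
  const se = Math.sqrt(variance / n);
  return { n, mean, sd: Math.sqrt(variance), t: (mean - mu0) / se, df: n - 1 };
}

// Hypothetical scores (not the study data), with and without the lowest score
const scores = [3, 12, 16, 18, 20, 22, 25, 27, 30, 38];
const withAll = oneSampleT(scores, 15);
const dropped = oneSampleT(scores.filter((x) => x !== 3), 15);
console.log(withAll.mean.toFixed(2), withAll.t.toFixed(2));
console.log(dropped.mean.toFixed(2), dropped.t.toFixed(2));
```

Dropping a low outlier does two things at once here: it raises the mean and shrinks the standard deviation, so t gets bigger for both reasons.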

So, let’s say that hypothetically dropping out this outlier WOULD change your results. Would you do it? Would you report it?

Think about it. In a couple of days, I will give you my answer and my justification.

As to not having coded it – I used the tasks in SAS Studio which I found to be pretty fun, but more on that in my next post.


Play Aztech: Meet the Maya – for your iPad in the app store, in Spanish and English.  The second in our series of bilingual games teaching basic statistics and Latin American history. Only $1.99 

girl in jungle

P.S. There is a third possibility here, which is changing the test from a two-tailed test to a one-tailed test. Surely, an argument can be made that we don’t expect children with a family member who is addicted to alcohol or drugs to be less depressed than the cutoff score? They would either be equal or more depressed. Personally, I don’t buy that argument. I could accept that the sample might be more depressed than the average, but I’m not sure one could justify that the mean necessarily MUST be more than the cutoff for depressive symptoms.

 

 

 

27
Dec

DO statistics and you can go almost anywhere

Let me say right off the bat that the number of contracts I’ve had where people wanted me to tell them what to do, I can count on one hand – and I’ve been in business 30 years. Generally, whether it is an executive in an organization where I’m an employee or a client for my consulting services, people don’t want me to tell them what to do:

Hey, you should do a repeated measures ANOVA.

Nope, they want me to DO it. It’s funny how often I find myself doing the same procedures for vastly different organizations, everywhere from the middle of Missouri to downtown Los Angeles to American Indian reservations in North Dakota to (soon) Santiago, Chile.

view over the top of my ipad

There are also those procedures I only use once in a great while, but that’s the topic of another post. Here are a couple of my go-to procedures.

Fisher’s Exact Test

Earlier this year I wrote about Fisher’s Exact Test and how I had used this teeny bit of code

PROC FREQ DATA = install ;
TABLES rural*install / CHISQ ;   /* for a 2x2 table, CHISQ also gives Fisher's exact test */
RUN ;

for everything from testing whether urban school districts have significantly more bureaucratic barriers to using educational technology than rural districts (they do) to testing whether mortality rates are lower in a specialized hospital unit than for patients with the same diagnosis in a standard unit.
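Since PROC FREQ hides the calculation, here is a minimal JavaScript sketch of what Fisher’s exact test computes for a 2x2 table: the hypergeometric probability of every table with the same row and column totals. The two-sided p-value is taken as the sum of the probabilities of all tables no more likely than the one observed, which is the usual definition and, as far as I know, the one SAS uses. fisherExact2x2 is a hypothetical name, and the example is Fisher’s classic tea-tasting table, not the school district data.

```javascript
// Fisher's exact test for a 2x2 table [[a, b], [c, d]].
// Works on the log scale to avoid huge factorials.
function logFactorial(n) {
  let s = 0;
  for (let i = 2; i <= n; i++) s += Math.log(i);
  return s;
}

function fisherExact2x2([[a, b], [c, d]]) {
  const r1 = a + b, r2 = c + d, c1 = a + c, n = a + b + c + d;
  // Hypergeometric probability of the table whose top-left cell is x (margins fixed)
  const prob = (x) =>
    Math.exp(
      logFactorial(r1) + logFactorial(r2) + logFactorial(c1) + logFactorial(n - c1) -
      logFactorial(n) - logFactorial(x) - logFactorial(r1 - x) -
      logFactorial(c1 - x) - logFactorial(r2 - c1 + x)
    );
  const pObs = prob(a);
  let twoSided = 0, rightSided = 0;
  for (let x = Math.max(0, c1 - r2); x <= Math.min(r1, c1); x++) {
    const p = prob(x);
    if (p <= pObs * (1 + 1e-7)) twoSided += p; // tables no more likely than observed
    if (x >= a) rightSided += p;               // tables at least as extreme toward larger a
  }
  return { pObs, twoSided: Math.min(1, twoSided), rightSided };
}

console.log(fisherExact2x2([[3, 1], [1, 3]])); // tea-tasting example
```

The small tolerance in the comparison keeps tables that are exactly as likely as the observed one from being dropped because of floating-point rounding.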

Confidence Limits for the Mean

Working with small samples in rural communities, I often don’t have the luxury of a control group. I know this makes me sound like a terrible researcher and that I never read a quantitative methods or experimental design textbook. However, let me give you an example of the types of conversations I have all of the time.

Me:  I’d like to use your program as a control group. I’ll come in and test all of your students and then two months later, I’ll test them all again.

Principal/ Superintendent/ Program Director:  You mean you want me to take up two periods of class / counseling time for your tests?

Me: Yes.

Them: You wouldn’t actually be giving our students any services or educational program, you’d just be taking two hours from all of our students.

Me: Yes, and then I’ll compare their results to those of the students who do get services.

Them: What do our students get out of it?

You can see where this conversation is going. One solution might be to pay all of the students some amount to stay after school or come in for an extra counseling period or whatever is being compared, so they aren’t missing out on services to take the test. However, Institutional Review Boards are cautious about substantial incentives because they worry that very low-income people might be coerced into participating, and for some of the people in our research, $10 is a lot of money.

The result is that I don’t always have a control group, but all is not lost. Being smarter than I look (yes, really), I often use standardized measures for which a lot of published research documents the mean, so I can do a one-sample test against that value.

proc means data=cesd_score alpha=.05 clm mean std ;
var cesdtotal ;
run ;

This gives me the 95% confidence interval for the mean, and I can see whether my sample is significantly different from the published value. For example, with a sample of 18 children from an American Indian reservation, the mean score on the CESD-C, a measure of depression, was 21. The cutoff for considering a respondent as showing depressive symptoms is 15. With a confidence interval from 15.6 to 26.4, I can be 95% confident that the population mean is above the cutoff for depressive symptoms; equivalently, a two-tailed test at the .05 level rejects the hypothesis that the population mean equals 15. Notice that even the lower confidence limit is above the screening cutoff of 15.
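For anyone checking the arithmetic, the CLM option in PROC MEANS gives the usual t-based interval. With n = 18 there are 17 degrees of freedom and t(.975, 17) is about 2.110, so a half-width of 5.4 corresponds to a standard error of about 2.56, or a standard deviation of roughly 10.9. Those last two figures are back-calculated from the rounded limits, not reported values.

$$\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \;=\; 21 \pm 2.110 \times \frac{s}{\sqrt{18}} \;\approx\; 21 \pm 5.4 \;\Rightarrow\; [15.6,\ 26.4]$$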

There is an interesting question related to this specific study, but it will have to wait for tomorrow since I have to head to the airport in a few hours. This week, I’m heading to Missouri. If you want to meet up and talk statistics, video games or just drink beer, let me know.


Play Aztech: The Story Begins – free for your iPad in the app store, in Spanish and English. It’s the first in our series of bilingual games teaching math and history.

girl in jungle

 

3
Oct

Becoming a real software developer, using SAS, or whatever

God spare me from the self-taught software developer who knows only the latest thing.

God

I’m not against the latest thing, whether it is React or Ember or Python games on Raspberry Pi or whatever it is today. My objection is to the fallacy that it is the only thing or even the most important thing. Let me enlighten you as to why I am loath to hire self-taught programmers, no matter how many of the ‘most elegant’ techniques their example project showcases.

There are several things you learn as a grown-up programmer (which The Invisible Developer tells me I should not call myself, because it sounds lower status than software developer. Again, I ignore him. Do not be misled by this into believing he is not high on my list. He just brought me a martini with bleu cheese-stuffed olives.)

martini

What Self-Taught Programmers Aren’t Taught

If you taught yourself to code through an online coding school, videos or books from Safari O’Reilly, that shows an admirable amount of motivation. If you already have some experience as a software developer and this is how you learned a new language, that’s great. Maybe we can hang out and work together. If, however, that is your ONLY source of knowledge and experience, probably not. There are a few things self-taught programmers generally are not taught, simply because they are not working as part of a team.

  1. Testing. Testing. Testing. I said it three times because it is important. I think I will say it again. Testing. Testing. This is why I need the martini. If you are developing an application, you need to test EVERYTHING. If I had a dollar for every time someone told me, “I tested everything but …” I would never need to seek investor funding again; I would just pull money from the piles in every room in my house. However much you think you need to test your software, you are wrong. The answer is, “More.”

    You need to test it on other machines besides yours. I learned this from SAS code that ran on Mac (yes, there was SAS on Mac a very long time ago) but not on Windows, or on Windows but not on Unix. You can’t look down your nose at people who aren’t running Windows 10, because that is only half of the people who run Windows and less than 20% of the total market. SAS is actually a good starting point for learning this, because it runs on a lot of devices with few changes, but you do need to change the LIBNAME and FILENAME statements, for example. Similarly, we make games now that run on Mac, Windows, iOS and Android. At a minimum, you need to do a separate build, but sometimes you need to make major changes. For example, Android has some limitations on app size that iOS does not.

    Test whether your software installs. Test whether it opens. Test the most basic uses. For SAS, this would be creating a temporary data set, reading in data with a DATALINES statement and doing a PROC MEANS. For our educational games, it might be playing all the way through, getting all of the answers correct. Test extreme cases. For SAS, this might be merging several enormous data sets, applying user-created formats, calling macros to manipulate the data and then performing a multivariate analysis of variance.

    For our games, it would mean getting every single problem wrong and quitting the game and logging back in many times, maybe after every problem. It would include entering completely illogical numbers, say, that you had picked 9,145,087 berries, and seeing if the program really tried to put over 9 million berries in the baskets.

    I’m sure you can think of some more extreme cases, but you get the idea.

    I can’t emphasize testing enough. The problem with someone who creates applications on his or her own is that the person understands completely how the software is supposed to work, and so uses it exactly that way. Real testing includes things like wandering off a clearly marked path in a game, “just to see what would happen.” It is having people enter “as often as I can” instead of male or female for sex.

    I once asked someone how he managed to test a game where the image that showed the key for deciphering the message was missing and he said, “I knew what the image was supposed to be.” This was not the answer I was looking for.

  2. Debugging is most of your life as a software developer. Basically, you write code for a few minutes and then swear and debug it for hours. Once you have a little experience, you learn to test and debug as you go, and you never write huge blocks of code only to find they don’t work and then have to figure out where in there the bugs are. You will learn all types of tricks of the trade for debugging. These include printing out the first few records of a data set to make sure it looks like you expect. With JavaScript, it might be writing the value of a variable to the console. Either way, the point is the same: you are testing little bits of code as you go and seeing that the result is what you expected. You also learn to debug all the way through. With SAS, you might apply the statements you have written to a data set from the documentation and verify that you got the same results. With a game, you might collect all of the objects in a scene and then check that the variable recording the number of objects equals what you expect.

    In any program that you are writing, you learn to break it into modules and test each of those modules. So you are debugging it in chunks by writing out the values of key variables both in small steps, even after each statement if you are really running into problems, and in medium steps, say, at the end of each SAS DATA step or procedure, or after the execution of each function.

    I’m not saying that self-taught programmers don’t debug their code, because obviously they do. No one always writes code that works perfectly the whole time. What I am saying is that if you are self-taught, you only know the debugging techniques that you have figured out for yourself, as opposed to picking up ideas from your colleagues.

  3. A third part of being a grown-up software developer often missed by those who are self-taught is how to document the software. Comments are your friend. I had a colleague who made fun of me for how many comments I put in my code, but the next year, when we had to do a similar project again, I could turn to him and say, “Who’s laughing now, bitches?” I have never met a programmer who enjoyed writing documentation. I have met a lot of programmers who were happy they had written it. If you are always chasing the latest thing, you might never be in the situation where you need to revisit something you did a year or two ago. If you are not part of a team, you probably are not worrying whether some nonexistent team member can understand your code. On the contrary, you might be trying some really cool new ideas just because they’re interesting. I’m not against that; in fact, I completely understand. However, you need to document those cool new things. And if you take the attitude, “Well, everyone should be expected to know the function call to integrate Lua with PHP,” come here a little closer so I can slap you.
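The “most basic” SAS test from the first point above can be sketched like this: a temporary data set read in with DATALINES, followed by a PROC MEANS. It also borrows the 9 million berries idea as an extreme-case check. Everything in it (the data set, the player and berries variables, the values and the 1,000-berry ceiling) is hypothetical.

```sas
/* Smoke test: does the most basic code path run at all?
   Data set, variables, values and the ceiling are all hypothetical. */
data work.smoke ;
  input player $ berries ;
  /* Extreme-case check: flag values no player could plausibly enter.
     The 1,000-berry ceiling is an assumed limit for illustration.
     Missing values are flagged too, since SAS treats missing as smaller than any number. */
  if berries < 0 or berries > 1000 then do ;
    putlog "WARNING: illogical value for " player= berries= ;
    bad_value = 1 ;
  end ;
  else bad_value = 0 ;
  datalines ;
Ana 12
Ben 7
Cruz 9145087
Dee 0
;
run ;

/* If this prints sensible N, mean, min and max, the basics work */
proc means data=work.smoke n mean min max ;
  var berries bad_value ;
run ;
```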

Here’s why being part of a software development team is usually a crucial aspect of your career progress: most people don’t really want to do any of the things I mentioned. Testing isn’t nearly as fun as writing code. No one likes to write documentation. Everyone knows that debugging is crucial, but at the time, putting in all of those statements to check every single variable’s value after every manipulation usually seems so time-consuming when you are just sure it was correct anyway. When you’re on a team, you can’t get away with cutting corners and skipping the not-fun parts nearly so much. You also realize how crucial those parts are when other people on the team have no idea what in the hell you were doing when you wrote that function or macro or nested DO loop.
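In SAS, checking a variable’s value after every manipulation can be as simple as a PUTLOG statement inside the DATA step (the small steps) plus a PROC PRINT of the first few records and a PROC MEANS after it (the medium steps). A minimal sketch, with a hypothetical data set and variables:

```sas
/* Debugging in chunks. Data set and variable names are hypothetical. */
data work.scores ;
  input id q1 q2 q3 ;
  total = sum(q1, q2, q3) ;
  /* Small step: write intermediate values to the log for the first few records only */
  if _n_ <= 3 then putlog "DEBUG: " id= q1= q2= q3= total= ;
  datalines ;
1 2 3 4
2 1 . 5
3 0 0 0
4 4 4 4
;
run ;

/* Medium step: eyeball the first few records after the DATA step finishes */
proc print data=work.scores (obs=5) ;
run ;

/* ...and confirm counts, missing values and ranges look like what you expected */
proc means data=work.scores n nmiss min max ;
  var q1 q2 q3 total ;
run ;
```

Here the check earns its keep on record 2: SUM(q1, q2, q3) skips the missing q2 and returns 6, whereas q1 + q2 + q3 would return a missing value. Which one you want depends on how your measure handles missing items, and the log line tells you which one you got.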

Sorry, but I don’t think a weekend hackathon is any substitute, no matter how many prizes you won. Not unless you had to return to the same hackathon six months later and update the project with a completely new set of people.

I don’t want to leave you all depressed, though, so I do have two pieces of advice. For debugging, there are plenty of software conferences you can attend where you can find sessions on debugging tips. You may also meet people at those conferences whom you could end up working with on a team, on some project that interests all of you.

Blogging is a great way to document what you have been doing. On this blog and my other company blog, I often write down whatever I have been working on lately, just so I remember when I run into a similar problem six months down the road. You’d be surprised how often I Google a question and one of the first answers that pops up is a blog post I wrote years ago.

Speaking of games – check out Making Camp, you can get it here for free. Play it and learn stuff, because maturity is overrated.

wigwam

If you want to learn even more stuff, you can get a bilingual version of Making Camp for your iPad for only $1.99 and brush up on your Spanish like you always said you were going to but didn’t.
