Compatibility between model data and the target population

To make accurate predictions, it is necessary that the sample data you use for model development is compatible with the target population. The distribution of each input used in the model should be similar in the sample and the target population. In your model you should include only those variables [...]

The post Compatibility between model data and the target population appeared first on SAS Learning Post.


RDU airport -vs- surrounding parks and nature

In growing areas such as the Research Triangle Park, there is always the tough decision between developing land for business, or keeping it natural for parks and recreation. SAS is fortunate to have the great hiking trails at Umstead Park just across the road to the north - it's the [...]

The post RDU airport -vs- surrounding parks and nature appeared first on SAS Learning Post.


A preliminary analysis of the Nobel Laureates

Every year in early October, the eyes of the world turn to Sweden and Norway, where the Nobel Prize winners are announced to the world. The Nobel Prize is considered the world's most prestigious award. Since 1901, the Prize has been presented to individuals and organizations that have made significant achievements in the fields of physics, chemistry, physiology or medicine, world peace and literature in each year (there were several exceptions during war years). In 1968, Sveriges Riksbank established the Sveriges Riksbank Prize in Economic Sciences in memory of Alfred Nobel, founder of the Nobel Prize. Today, individuals or organizations who are awarded Nobel Prizes and the Prize in Economic Sciences are called Nobel Laureates.

So far, more than 900 Nobel Laureates have been awarded. In this post, I wanted to learn a little more about these impressive individuals. Where were these Nobel Laureates from? Why do they get awarded? Is there any common characteristics you’ll find in these Laureates? Below you’ll find a preliminary analysis of Nobel Laureates using SAS Visual Analytics.

The analysis is based on data from List_of_Nobel_laureates, List of Nobel laureates by university affiliation and Nobel Laureates datasets at Kaggle, which definitely has some missing and inconsistent values. I have cleaned the data to correct for some obvious inconsistency as possible for my analysis.

How many Nobel Laureates have their been so far?

Recently, 12 new Nobel Laureates were awarded by the 2017 Nobel Prizes and Prize in Economic Sciences, and that makes 923 Laureates in total since the first Nobel Prize in 1901. Some Laureates share one prize, so we see more shared Laureates total in below table. While we see 27 organization winners of the Peace prize, most Laureates are individual winners.

analysis of the Nobel Laureates

The chart below shows the overall trend of annual total Nobel Laureates is increasing year-over-year, as more and more winners are sharing the Prize. The purple circle on the plot indicates that there are shared winners in that year. The average number of winners is about eight each year. Yet there was only one winner in 1916 for the Literature Prize. The most winners came in 2001, with 15 Laureates sharing the prizes. I also note from the chart that during the First World War, there were very few Nobel Prizes awarded, and during the Second World War, there were none.

Moreover, we know that most Nobel Laureates are awarded one Nobel Prize, yet I learned from childhood that the female scientist Marie Curie received two Nobel Prizes. If you search the datasets for winners awarded more than one Prize, you’ll find four scientists accomplished this feat. They are: Marie Curie, Linus Pauling, John Bardeen and Frederick Sanger.

Do Nobel Laureates live longer?

The answer is YES, per the research by Prof. Andrew Oswald from University of Warwick. Winning a Nobel prize adds about 1.5 years to the lifespan of Nobel Laureates compared to those who were merely nominated. Of course, it is not because of the monetary benefits that come with the Nobel Prize, but because of ‘the deep links between mind and body’, and that ‘happiness’ may make people live longer, which makes sense to me.

Since I don’t have the data of Nobel Prize nominees, let’s only test the lifespan of the Nobel Laureates and the ages they got awarded. The average life age of all Nobel Laureates is nearly 80, much older than the global average life expectancy of 71.4 years-old (according to World Health Organization 2015). Digging a bit more, we see Martin Luther King is the Nobel laureate (Peace, 1964) who died at youngest age. He was assassinated at 39 years old. Laureates who lived longest are Rita Levi-Montalcini (Medicine, 1986) and Ronald H. Coase (Economics, 1991), who both lived to 103 years old. You may also notice that the distribution of the Laureates’ lifespan is left skewed, the Nobel Prize winners certainly live longer than most.

In addition, something more worth noting:

  • The most laureates with the longest lifespans are from the Economics and Medicine categories. The Nobel Prize winning economists live longer than other categories’ winners on average. The average lifespan of these economists is about 86 years-old, five years longer than the second category of Medicine.
  • Economics winners are winning the awards at the highest age – 67 years-old on average. More digging shows that the oldest awarded age is 90 when Leonid Hurwicz (Economics, 2007) was awarded his Prize. We see the average awarded age of Physics winners is 56, which is 10+ years younger than that of the Economics winners. Thus, we get the impression that economists need more time to have outstanding achievements.
  • If we compare the time span between Laurates’ average awarded ages and their lifespan, the Physics Prize winners enjoy the longest life time after winning the award – about 20 years on average.
  • It is also worth noting that the Nobel Peace winners have the largest span of awarded age, about 70 years’ span. That’s because the youngest Nobel Laureates Malala Yousafzai, who got awarded of Nobel Peace Prize at 17 years-old in 2014.

The chart below is created in SAS Visual Analytics and shows the awarded ages of all individual Nobel Laureates in different prize categories. The reference line is the average awarded age of 59. It is very easy to note that no Nobel Prize was awarded during 1940-1943 due to the Second World War.

From which universities have Nobel Laureates graduated?

Next, let’s look at the educational background of Nobel Laureates. The left chart below obviously shows that much more Nobel winners hold Doctorate degrees than those of Bachelor or Master degrees. If we see the chart for Literature and Peace categories on the right, the difference is not that big. From the data, we know that the educational background of Nobel Laureates in Physics, Chemistry, Medicine and Economics categories (I call these four categories the scientific categories for easier description later) has the higher percentage of doctorate than that of winners in the Literature and Peace categories.

To learn more about the universities the Laureates in these scientific categories are graduated from, I ranked the top 10 university affiliations for the scientific categories in below chart, and their distribution among these categories, as well as the countries in which these universities are located.

The top 10 university affiliations were selected basing on the highest degree of the scientific categories’ Laureates obtained. That is, if one winner held a Master degree from Harvard University and a Doctorate degree from University of Cambridge, he/she is counted in University of Cambridge but not in the Harvard University. From the parallel coordinates plot, you may have noticed that the Physics in University of Cambridge and the Medicine in Harvard University are their greatest majors respectively. On the right, it shows the countries where these top 10 university affiliations are in United States, United Kingdom, France and Germany. The bar charts on the left show the percentage of educational degrees (Doctorate, Master, Bachelor) of each in the scientific categories (according to the available dataset). In the bottom chart, top 10 universities are ranked by their percentages. Perhaps now you have a great university in your mind for future education?

Next, I created the chart below to show the top eight countries having the university affiliations that more Nobel Prize winners graduated from. (Here the chart only shows for scientific categories, thus it excludes the Nobel Literature Prize and Peace prize.). An obvious trend we see from the chart is that the United States has the most Laureates spanning in the scientific categories after the Second World War, while Germany has more Laureates in the scientific categories comparatively before World War II.

Why do the Nobel Laureates get awarded?

Per the ‘nobelprize.org’, in his excerpt of the will, Alfred Nobel (1833-1896) dictates that his entire remaining estate should be used to endow "prizes to those who, during the preceding year, shall have conferred the greatest benefit to mankind." So Alfred's interests are reflected in the Prize, which said “The whole of his remaining realizable estate constitutes a fund, and the annually interest shall be divided into five equal parts, which shall be apportioned as follows: one part to the person who shall have made the most important discovery or invention within the field of physics; one part to the person who shall have made the most important chemical discovery or improvement; one part to the person who shall have made the most important discovery within the domain of physiology or medicine; one part to the person who shall have produced in the field of literature the most outstanding work in an ideal direction; and one part to the person who shall have done the most or the best work for fraternity between nations, for the abolition or reduction of standing armies and for the holding and promotion of peace congresses.”

Since it’s not easy to seek evidence in the datasets that Nobel Laureates are awarded by fulfilling Alfred’s will, what I do is to use SAS Visual Analytics text topics analysis performing some preliminary text analysis of the ‘Motivation’ field in the dataset for a validation to some extent. The ‘Motivation’ is given by ‘nobelprize.org’ for why the Laureate gets awarded. The analysis shows that the most frequently mentioned word is ‘discovery’, while the most 5 frequently appeared words include ‘work’, ‘development’, ‘contribution’, and ‘theory’. And from the topics analysis result, the top 10 topics are about ‘discovery’, ‘human”, “structure”, “economic”,” technique”, etc., which are reflecting Alfred Nobel‘s will in establishing the Prize. Moreover, the sentimental analysis result shows that the statements in the ‘Motivation’ field are mainly neutral (being ‘objective’), even though there are few positive and negative sentimental statements.


I hope you’ve found this analysis of Nobel Laureates data interesting. I believe there are still many other perspectives you can analyze to get insights. Is there anything interesting you see?

A preliminary analysis of the Nobel Laureates was published on SAS Users.


Transforming Analytics into an Art Form

As analytics advances into more areas of our everyday life there is an increased need to understand just what it is that analytics is accomplishing for us.  I find there is a quicker acceptance of an analytical solution and even excitement if there is a good conceptual understanding of what [...]

The post Transforming Analytics into an Art Form appeared first on SAS Learning Post.


Los Angeles ozone levels - 30th anniversary revisit

The movie Blade Runner came out about 30 years ago, portraying a dystopian future set in 2019 Los Angeles. Many of the technology predictions in the movie have actually become reality - but how about the weather and pollution? The movie was all about smog and a lack of sunshine, and I wondered [...]

The post Los Angeles ozone levels - 30th anniversary revisit appeared first on SAS Learning Post.


Trends in U.S. refugee admissions

"Coming to America" - it's been the name of a funny movie and a dramatic song (can you name the actor & singer without cheating?!?) It's also been a dream for many, and an action for some. People have been coming to America both legally and illegally for many years, but there's one special category [...]

The post Trends in U.S. refugee admissions appeared first on SAS Learning Post.


SAS and sports analytics (plotting kayak/SUP race results)

I recently paddled in a boat race, and was wondering how I did compared to all the other paddlers. And being a Graph Guy, I decided I should find a cool way to graph the data ... Here's some background information ... There's a great organization called Bridge II Sports [...]

The post SAS and sports analytics (plotting kayak/SUP race results) appeared first on SAS Learning Post.


Most popular international destination, from each US state

Orbitz recently published a map showing the most popular international travel destination for each of the 50 US states. It was an interesting map ... but of course, me being the Graph Guy, I had to pick it apart and create my own version. Follow along, and explore this interesting [...]

The post Most popular international destination, from each US state appeared first on SAS Learning Post.


The 'virtual wall' along the US/Mexican border

Recent news reports indicate that the number of illegal immigrants apprehended along the US/Mexican border has dropped significantly since Trump took office. So, even though Trump hasn't had time to build a physical wall, perhaps there's a virtual wall in place now (built from the change in attitudes)? I've seen [...]

The post The 'virtual wall' along the US/Mexican border appeared first on SAS Learning Post.


Amusement park attendance (could Wikipedia be wrong?!?)

How do the North American amusement parks compare in popularity? If this question was to come up during a lunch discussion, I bet someone would pull out their smartphone and go to Wikipedia for the answer. But is Wikipedia the definitive answer - how can we tell if Wikipedia is wrong? [...]

The post Amusement park attendance (could Wikipedia be wrong?!?) appeared first on SAS Learning Post.

Back to Top