24
Jun

Add loess smoothers to residual plots

When fitting a least squares regression model to data, it is often useful to create diagnostic plots of the residuals versus the explanatory variables. If the model fits the data well, the plots of the residuals should not display any patterns. Systematic patterns can indicate that you need to include [...]

The post Add loess smoothers to residual plots appeared first on The DO Loop.

21
Jun

Learn the three easiest ways to load data into CAS tables

For every project in SAS®, the first step is almost always making your data available. This blog shows you how to load three of the most common input data types—a data set, a text file, and a Microsoft Excel file—into SAS® Cloud Analytic Services (CAS) tables. The three methods that [...]

Learn the three easiest ways to load data into CAS tables was published on SAS Users.

20
Jun

A playful way to get your hands on SAS: the Data Science Escape Rooms

Move over video games and sports. Make room for escape rooms. This burgeoning form of entertainment found its roots in the video gaming movement. Escape rooms tap into a player's drive to reach the next level, solve a puzzle and win. Escape rooms present a physical game that traps you [...]

A playful way to get your hands on SAS: the Data Science Escape Rooms was published on SAS Users.

19
Jun

Airbnb's top wish-listed homes in NC!

As the sharing economy grows, you can catch a ride on Uber, have meals delivered by Grubhub ... and even stay in someone else's house (rather than a hotel) via sites like Airbnb. And speaking of that last one, I recently read an article listing Airbnb's top 20 'wish-listed' homes [...]

The post Airbnb's top wish-listed homes in NC! appeared first on Graphically Speaking.

19
Jun

Python Pandas : Drop columns from Dataframe

In this tutorial, we will cover how to remove (drop) one or more columns from a pandas DataFrame.
What is pandas in Python?
pandas is a Python package for data manipulation. It provides functions for the following data tasks:
  1. Drop or keep rows and columns
  2. Aggregate data by one or more columns
  3. Sort or reorder data
  4. Merge or append multiple DataFrames
  5. String functions to handle text data
  6. Date/time functions to handle date or time columns
Import or Load Pandas library
To use any Python library, we first need to load it with the import statement.
import pandas as pd
import numpy as np
Let's create a sample DataFrame for illustration.
The code below creates 4 columns named A through D.
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
          A         B         C         D
0 -1.236438 -1.656038 1.655995 -1.413243
1 0.507747 0.710933 -1.335381 0.832619
2 0.280036 -0.411327 0.098119 0.768447
3 0.858730 -0.093217 1.077528 0.196891
4 -0.905991 0.302687 0.125881 -0.665159
5 -2.012745 -0.692847 -1.463154 -0.707779

Drop a column in Python

In pandas, the drop( ) function is used to remove column(s). axis=1 tells pandas to drop columns instead of rows.
df.drop(['A'], axis=1)
Column A has been removed. See the output shown below.
          B         C         D
0 -1.656038 1.655995 -1.413243
1 0.710933 -1.335381 0.832619
2 -0.411327 0.098119 0.768447
3 -0.093217 1.077528 0.196891
4 0.302687 0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779
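A quick way to confirm that drop( ) returns a new DataFrame, rather than changing df, is to compare the columns before and after. This minimal sketch uses a seeded version of the sample data (the seed and the names dropped and dropped_kw are illustrative assumptions). It also shows the columns= keyword, which pandas accepts as an equivalent to passing labels together with axis=1.

```python
import numpy as np
import pandas as pd

# Same shape as the sample data above; the seed is arbitrary, only for reproducibility
np.random.seed(0)
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

# drop() returns a new DataFrame; df itself keeps all four columns
dropped = df.drop(['A'], axis=1)

# columns= is an equivalent way to drop labels along axis 1
dropped_kw = df.drop(columns=['A'])

print(list(dropped.columns))       # ['B', 'C', 'D']
print(list(df.columns))            # ['A', 'B', 'C', 'D']
print(dropped.equals(dropped_kw))  # True
```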
To store the remaining columns in a new DataFrame, newdf, use the command below.
newdf = df.drop(['A'], axis=1)
To delete the column permanently from the original DataFrame df, use the option inplace=True.
df.drop(['A'], axis=1, inplace=True)
#Check columns in df after dropping column A
df.columns

Output
Index(['B', 'C', 'D'], dtype='object')
The inplace= parameter may be deprecated (removed) in a future release of pandas, which means it might stop working in an upcoming version. Avoid this parameter unless you already use it habitually; instead, store the result of dropping columns in a new DataFrame (as explained in the section above).

If you want to change the existing DataFrame without inplace=, reassign the result: df = df.drop(['A'], axis=1)
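The two ways of changing the existing DataFrame can be compared side by side. In this sketch the seeded data and the names base, df_inplace and df_reassign are illustrative. Both approaches end with the same columns. Note that with inplace=True, drop( ) returns None, so writing df = df.drop(..., inplace=True) would replace df with None.

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # arbitrary seed, only for a repeatable run
base = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

# Option 1: modify in place; drop() then returns None instead of a DataFrame
df_inplace = base.copy()
ret = df_inplace.drop(['A'], axis=1, inplace=True)

# Option 2: reassign the returned DataFrame to the same name
df_reassign = base.copy()
df_reassign = df_reassign.drop(['A'], axis=1)

print(ret)                             # None
print(df_inplace.equals(df_reassign))  # True
```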

Remove Multiple Columns in Python

Specify all the columns you want to remove in a list and pass it to the drop( ) function.
Method I
df2 = df.drop(['B','C'], axis=1)
Method II
cols = ['B','C']
df2 = df.drop(cols, axis=1)
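When you drop several columns at once, every label in the list must exist, or pandas raises a KeyError. The sketch below uses a small hand-made DataFrame and a deliberately missing label 'Z' (both illustrative). It shows that default behaviour and the errors='ignore' option of drop( ), which skips labels that are not present.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

cols = ['B', 'C']
df2 = df.drop(cols, axis=1)
print(list(df2.columns))  # ['A', 'D']

# By default, a label that is not present raises KeyError
try:
    df.drop(['B', 'Z'], axis=1)
except KeyError as err:
    print('KeyError:', err)

# errors='ignore' drops only the labels that exist
df3 = df.drop(['B', 'Z'], axis=1, errors='ignore')
print(list(df3.columns))  # ['A', 'C', 'D']
```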
Select or Keep Columns
If you wish to select a column (instead of dropping it), use the command
df['A']
To select multiple columns, you can submit the following code.
df[['A','B']]
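One detail worth checking is what each form returns. A single label gives a Series, while a list of labels gives a DataFrame. Keeping columns is also the mirror image of dropping all the others. The sketch below uses a small illustrative DataFrame and Index.difference to build the list of columns to drop.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

one = df['A']         # single label -> Series
two = df[['A', 'B']]  # list of labels -> DataFrame

# Keep A and B by dropping every other column
kept = df.drop(columns=df.columns.difference(['A', 'B']))

print(type(one).__name__)  # Series
print(type(two).__name__)  # DataFrame
print(two.equals(kept))    # True
```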

How to drop column by position number from pandas Dataframe?

You can find the name of the first column with df.columns[0]; indexing in Python starts from 0. Pass that name to drop( ) to remove the first column.
df.drop(df.columns[0], axis=1)
To drop multiple columns by position (for example, the first and third columns), specify the positions in a list, [0,2].
cols = [0,2]
df.drop(df.columns[cols], axis=1)
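The positional approach first turns positions into column names and then drops by name. This sketch uses a small illustrative DataFrame to show that step. It compares the result with an iloc selection that keeps every position not listed in cols. One design note: if two columns share a name, dropping by name removes both, whereas the iloc form excludes only the listed positions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

cols = [0, 2]                   # first and third columns
print(list(df.columns[cols]))   # ['A', 'C']

by_name = df.drop(df.columns[cols], axis=1)

# Alternative: keep every position that is not in cols
keep = [i for i in range(df.shape[1]) if i not in cols]
by_iloc = df.iloc[:, keep]

print(list(by_name.columns))    # ['B', 'D']
print(by_name.equals(by_iloc))  # True
```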

Drop columns by name pattern

df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})
   X1  X_2  YX  Y_1  Z
0 1 2 3 2 5
1 2 3 4 3 6
2 3 4 5 4 7
3 4 5 6 5 8
4 5 6 7 6 9

Drop columns whose names start with the letter 'X'

df.loc[:,~df.columns.str.contains('^X')]
How does it work?
  1. ^X is a regular expression that matches the letter 'X' at the beginning of a column name.
  2. df.columns.str.contains('^X') returns the boolean array [True, True, False, False, False]:
    True where the condition is met, otherwise False.
  3. The ~ operator negates the condition.
  4. df.loc[:, mask] selects all rows and only the columns where mask is True.
It can also be written as:
df.drop(df.columns[df.columns.str.contains('^X')], axis=1)
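The steps above can be checked directly on the sample data. This sketch prints the boolean mask and confirms that the loc and drop forms agree. It also adds Index.str.startswith, which tests the prefix literally without a regular expression.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

mask = df.columns.str.contains('^X')  # one boolean per column
print(mask.tolist())                  # [True, True, False, False, False]

via_loc = df.loc[:, ~mask]
via_drop = df.drop(df.columns[mask], axis=1)

# Without regex: startswith checks the literal prefix
via_prefix = df.loc[:, ~df.columns.str.startswith('X')]

print(list(via_loc.columns))          # ['YX', 'Y_1', 'Z']
print(via_loc.equals(via_drop) and via_loc.equals(via_prefix))  # True
```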
Other Examples
#Removing columns whose name contains string 'X'
df.loc[:,~df.columns.str.contains('X')]

#Removing columns whose name contains either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]

#Removing columns whose name ends with string 'X'
df.loc[:,~df.columns.str.contains('X$')]
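Running the three patterns against the sample DataFrame makes their differences concrete. This sketch collects the surviving column names for each pattern in a dictionary, named results for illustration. Keep in mind that str.contains is case-sensitive by default; passing case=False makes it case-insensitive.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

# Map each regex pattern to the columns that remain after removing matches
results = {}
for pat in ['X', 'X|Y', 'X$']:
    remaining = df.loc[:, ~df.columns.str.contains(pat)]
    results[pat] = list(remaining.columns)

for pat, cols in results.items():
    print(pat, '->', cols)
# Expected: 'X' leaves ['Y_1', 'Z'], 'X|Y' leaves ['Z'],
# 'X$' leaves ['X1', 'X_2', 'Y_1', 'Z']
```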

Drop columns where percentage of missing values is greater than 50%

df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
'B':[4,np.nan,np.nan,5,np.nan]
})
The percentage of missing values in a column is the mean of df.isnull() for that column, because each True (missing) counts as 1 and each False as 0.
cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)
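To see why the mean works as a percentage, print the intermediate values for the sample data. Column A has 2 missing values out of 5 (0.4) and column B has 3 out of 5 (0.6), so only B crosses the 50% threshold. The name share_missing in this sketch is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 3, np.nan, 5, np.nan],
                   'B': [4, np.nan, np.nan, 5, np.nan]})

# Fraction of missing values per column (True counts as 1, False as 0)
share_missing = df.isnull().mean()
print(share_missing.round(2).tolist())  # [0.4, 0.6]

cols = df.columns[share_missing > 0.5]
result = df.drop(cols, axis=1)
print(list(result.columns))             # ['A']
```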
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains.


19
Jun

Influential observations in a linear regression model: The DFFITS and Cook's D statistics

A previous article describes the DFBETAS statistics for detecting influential observations, where "influential" means that if you delete the observation and refit the model, the estimates for the regression coefficients change substantially. Of course, there are other statistics that you could use to measure influence. Two popular ones are the [...]

The post Influential observations in a linear regression model: The DFFITS and Cook's D statistics appeared first on The DO Loop.

18
Jun

Deploy SAS Viya on AWS - Quick Start

As a data scientist, did you ever come to the point where you felt the need for an evolved analytics platform bringing together the disparate skills of open source and commercial software? A system that can enable advanced analytic capabilities. This is now possible and easy to implement. With many [...]

Deploy SAS Viya on AWS - Quick Start was published on SAS Users.

18
Jun

Create Infographics with R

This tutorial explains how to create infographic charts in R. The word infographics combines two words, information and graphics, and simply means a graphical, visual representation of information. Infographics are visually appealing and attract the audience's attention. In presentations, they add a WOW factor and make you stand out in a crowd.
Install the packages used for Infographic Charts
You can install these packages with install.packages(). The package echarts4r.assets is not available on CRAN, so install it from GitHub with devtools::install_github("JohnCoene/echarts4r.assets").
  1. waffle
  2. extrafont
  3. tidyverse
  4. echarts4r
  5. echarts4r.assets

Waffle (Square Pie Chart)

In this section we will see how to create a waffle chart in R. Waffle charts are also known as square pie or matrix charts. They show the distribution of a categorical variable and are an alternative to the pie chart. Use them when there are fewer than 4 categories; the more categories there are, the harder the chart is to read. The following example shows the percentage of respondents who answered 'yes' or 'no' in a survey.

library(waffle)
waffle(
c('Yes=70%' = 70, 'No=30%' = 30), rows = 10, colors = c("#FD6F6F", "#93FB98"),
title = 'Responses', legend_pos="bottom"
)
Use Icon in Waffle
Steps to download and install Font Awesome fonts
  1. Load the extrafont library by running library(extrafont)
  2. Download the Font Awesome font from this URL https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/fonts/fontawesome-webfont.ttf and install it
  3. Import the downloaded font with extrafont::font_import(path = "C:/Users/DELL/Downloads", pattern = "awesome", prompt = FALSE). In R strings, write Windows paths with forward slashes (or doubled backslashes); single backslashes are treated as escape characters.
  4. Load the fonts with loadfonts(device = "win")
  5. Check whether Font Awesome was installed successfully by running fonts()[grep("Awesome", fonts())]. It should return FontAwesome
In the example below, we show the performance of girls in a particular subject. The option use_glyph= sets the icon shown in the chart, and glyph_size= sets the size of the icon.

waffle(
c(`Poor=10` =10, `Average=18` = 18, `Excellent=7` =7), rows = 5, colors = c("#FD6F6F", "#93FB98", "#D5D9DD"),
use_glyph = "female", glyph_size = 12 ,title = 'Girls Performance', legend_pos="bottom"
)
How to align multiple waffle charts
The iron( ) function left-aligns multiple waffle plots. You can use ggplot2 functions to customize each plot; for example, the program below center-aligns the title with plot.title =.

iron(
waffle(
c('TRUE' = 7, 'FALSE' = 3),
colors = c("pink", "grey70"),
use_glyph = "female",
glyph_size = 12,
title = "Female vs Male",
rows = 1,
legend_pos = "none"
) + theme(plot.title = element_text(hjust = 0.5))
,
waffle(
c('TRUE' = 8, 'FALSE' = 2),
colors = c("skyblue", "grey70"),
use_glyph = "male",
glyph_size = 12,
rows = 1,
legend_pos = "none"
)
)

Pictorial Charts in R

Pictorial charts show data as scaled pictures or images instead of bars or columns. They are also called pictogram charts. Let's create some sample data for illustration.

df22 <- data.frame(
x = sort(LETTERS[1:5], decreasing = TRUE),
y = sort(sample(20:80,5))
)

x y
1 E 27
2 D 29
3 C 45
4 B 46
5 A 78
The e_pictorial(value, symbol) function creates pictorial plots. Its second parameter, symbol, accepts the built-in symbols circle, rect, roundRect, triangle, diamond, pin and arrow, as well as icons, images and SVG paths. Built-in symbols are specified like symbol = "rect".

library(echarts4r)
library(echarts4r.assets)

df22 %>%
e_charts(x) %>%
e_pictorial(y, symbol = ea_icons("user"),
symbolRepeat = TRUE, z = -1,
symbolSize = c(20, 20)) %>%
e_theme("westeros") %>%
e_title("People Icons") %>%
e_flip_coords() %>%
# Hide Legend
e_legend(show = FALSE) %>%
# Remove Gridlines
e_x_axis(splitLine=list(show = FALSE)) %>%
e_y_axis(splitLine=list(show = FALSE)) %>%
# Format Label
e_labels(fontSize = 16, fontWeight ='bold', position = "right", offset=c(10, 0))
Add Images in Chart
If you are using images, prefix each image address with image://. In the code below, the paste0( ) function prepends it to the image address.

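# Unity and Buddha hold the image addresses; the URLs are missing from this copy of the post
Unity <- "..."
Buddha <- "..."
data <- data.frame(
x = c("Statue of Unity", "Spring Temple Buddha"),
value = c(182, 129),
symbol = c(paste0("image://", Unity),
paste0("image://", Buddha))
)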

data %>%
e_charts(x) %>%
e_pictorial(value, symbol) %>%
e_theme("westeros") %>%
e_legend(FALSE) %>%
# Title Alignment
e_title("Statues Height", left='center', padding=10) %>%
e_labels(show=TRUE) %>%
e_x_axis(splitLine=list(show = FALSE)) %>%
e_y_axis(show=FALSE, min=0,max=200, interval=20, splitLine=list(show = FALSE))
Pencil Chart in R
Instead of bars, this chart uses pencils to compare values.

df02 <- data.frame(
x = LETTERS[1:10],
y = sort(sample(10:80,10), decreasing = TRUE)
)

df02 %>%
e_charts(x) %>%
e_pictorial(y, symbol = paste0("image://","https://1.bp.blogspot.com/-klwxpFekdEQ/XOubIhkalyI/AAAAAAAAHlE/25psl9x4oNkbJoLc2CKTXgV2pEj6tAvigCLcBGAs/s1600/pencil.png")) %>%
e_theme("westeros") %>%
e_title("Pencil Chart", padding=c(10,0,0,50))%>%
e_labels(show = TRUE)%>%
e_legend(show = FALSE) %>%
e_x_axis(splitLine=list(show = FALSE)) %>%
e_y_axis(show=FALSE, splitLine=list(show = FALSE))

Fill Male, Female Icons based on percentage

To find an SVG path, download the desired SVG file from https://iconmonstr.com/, open it in Chrome, and copy the path data from the page source. In the code below, each path is prefixed with path://.

gender = data.frame(gender=c("Male", "Female"), value=c(65, 35),
path = c('path://M18.2629891,11.7131596 L6.8091608,11.7131596 C1.6685112,11.7131596 0,13.032145 0,18.6237673 L0,34.9928467 C0,38.1719847 4.28388932,38.1719847 4.28388932,34.9928467 L4.65591984,20.0216948 L5.74941883,20.0216948 L5.74941883,61.000787 C5.74941883,65.2508314 11.5891201,65.1268798 11.5891201,61.000787 L11.9611506,37.2137775 L13.1110872,37.2137775 L13.4831177,61.000787 C13.4831177,65.1268798 19.3114787,65.2508314 19.3114787,61.000787 L19.3114787,20.0216948 L20.4162301,20.0216948 L20.7882606,34.9928467 C20.7882606,38.1719847 25.0721499,38.1719847 25.0721499,34.9928467 L25.0721499,18.6237673 C25.0721499,13.032145 23.4038145,11.7131596 18.2629891,11.7131596 M12.5361629,1.11022302e-13 C15.4784742,1.11022302e-13 17.8684539,2.38997966 17.8684539,5.33237894 C17.8684539,8.27469031 15.4784742,10.66467 12.5361629,10.66467 C9.59376358,10.66467 7.20378392,8.27469031 7.20378392,5.33237894 C7.20378392,2.38997966 9.59376358,1.11022302e-13 12.5361629,1.11022302e-13',
'path://M28.9624207,31.5315864 L24.4142575,16.4793596 C23.5227152,13.8063773 20.8817445,11.7111088 17.0107398,11.7111088 L12.112691,11.7111088 C8.24168636,11.7111088 5.60080331,13.8064652 4.70917331,16.4793596 L0.149791395,31.5315864 C-0.786976655,34.7595013 2.9373074,35.9147532 3.9192135,32.890727 L8.72689855,19.1296485 L9.2799493,19.1296485 C9.2799493,19.1296485 2.95992025,43.7750224 2.70031069,44.6924335 C2.56498417,45.1567684 2.74553639,45.4852068 3.24205501,45.4852068 L8.704461,45.4852068 L8.704461,61.6700801 C8.704461,64.9659872 13.625035,64.9659872 13.625035,61.6700801 L13.625035,45.360657 L15.5097899,45.360657 L15.4984835,61.6700801 C15.4984835,64.9659872 20.4191451,64.9659872 20.4191451,61.6700801 L20.4191451,45.4852068 L25.8814635,45.4852068 C26.3667633,45.4852068 26.5586219,45.1567684 26.4345142,44.6924335 C26.1636859,43.7750224 19.8436568,19.1296485 19.8436568,19.1296485 L20.3966199,19.1296485 L25.2043926,32.890727 C26.1862111,35.9147532 29.9105828,34.7595013 28.9625083,31.5315864 L28.9624207,31.5315864 Z M14.5617154,0 C17.4960397,0 19.8773132,2.3898427 19.8773132,5.33453001 C19.8773132,8.27930527 17.4960397,10.66906 14.5617154,10.66906 C11.6274788,10.66906 9.24611767,8.27930527 9.24611767,5.33453001 C9.24611767,2.3898427 11.6274788,0 14.5617154,0 L14.5617154,0 Z'))

gender %>%
e_charts(gender) %>%
e_x_axis(splitLine=list(show = FALSE),
axisTick=list(show=FALSE),
axisLine=list(show=FALSE),
axisLabel= list(show=FALSE)) %>%
e_y_axis(max=100,
splitLine=list(show = FALSE),
axisTick=list(show=FALSE),
axisLine=list(show=FALSE),
axisLabel=list(show=FALSE)) %>%
e_color(color = c('#69cce6','#eee')) %>%
e_pictorial(value, symbol = path, z=10, name= 'realValue',
symbolBoundingData= 100, symbolClip= TRUE) %>%
e_pictorial(value, symbol = path, name= 'background',
symbolBoundingData= 100) %>%
e_labels(position = "bottom", offset= c(0, 10),
textStyle =list(fontSize= 20, fontFamily= 'Arial',
fontWeight ='bold',
color= '#69cce6'),
formatter="{@[1]}% {@[0]}") %>%
e_legend(show = FALSE) %>%
e_theme("westeros")

Show icon as label in plot

In label =, specify the Unicode escape of the Font Awesome icon, and set family = "FontAwesome".

library(ggplot2)
ggplot (mtcars) +
geom_text( aes ( mpg , wt , colour = factor ( cyl )),
label = "\uf1b9",
family = "FontAwesome" ,
size = 7)
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains.


18
Jun

Understanding Item Response Theory with SAS

What is Item Response Theory? Item Response Theory (IRT) is a way to analyze responses to tests or questionnaires with the goal of improving measurement accuracy and reliability. A common application is in testing a student’s ability or knowledge. Today, all major psychological and educational tests are built using IRT. [...]

Understanding Item Response Theory with SAS was published on SAS Users.
