29
Jun

Key Value Object in SAS Visual Analytics

Introduced with SAS Visual Analytics 8.2 is a new object named: Key Value. The intent of this object is call attention to an aggregated value for a measure, a category, or both. For additional specifics,

I’ve mocked up several reports to show some of the combinations available to give you an idea of what the Key Value object can look like. Toward the end of the blog, I will add additional reports to provide design ideas for placement and action assignments. Click on any image to enlarge.

Text Style with Measure Value Highlight

Here you can see in the report, I am highlighting two measure values. Both are representing the Highest value but I selected one to show the aggregation and the other not. I was able to mimic similar headings by renaming my data item. Be sure to look at the Options and Roles pane for the assignments to understand how I accomplished this.

Infographic Style with Measure Value Highlight

Here is the same information represented using the Infographic style. Seeing the same report using the different styles allows you to quickly determine the most powerful and appropriate visualization to meet your needs. We cannot control the size of the circle, only the color. In this case, the circles are different thicknesses because of the number of characters used to represent the measure values inside the visual.

Text Style displaying both Measure Value and Category

In this report, I have shown how to use the Text style to display both a Category value and a Measure value. As you can see, only one can be Highlighted, i.e. given the largest font. Notice that in this report I used the object’s Title to help explain the key value being displayed, this is a recommended best practice.

Infographic Style displaying both Measure Value and Category

Below I am using the Infographic style and notice in the Options pane below that I had to use the Additional information attribute to better label the data. Make sure that when you are in the design phase and toggling between text and infographic to review and test the available Key Value Style attributes to better label the visual.

Text Style with Category Value Highlight

In this report, I show how to highlight the Category value using the Text style. Since I chose to not use any of the available label attributes it is critical that I use the object’s Title to better explain the key value displayed.

Infographic Style with Category Value Highlight

In this report, you can see how I changed the layout from the previous report to make the Key Value object side-by-side the other report objects. If you are interested in using the Infographic style with the circle enabled then you may have to adjust your report design to accommodate for the space the circle needs to display. Remember not to shy away from adding white space to your report, it can assist when adding emphasis to a particular visual, in this case a key value.

Summary

Some important things to remember about using the Key Value object in your reports:

  • Use the Key Value object’s Title to inform your users what the number or category value represents so there is no ambiguity as to if they are looking at the maximum or minimum value.
  • When determining which style you prefer, Text or Infographic, it may be easier to make a duplicate of the Page and then adjust the style attributes till you find the desired combination.
  • Take time to adjust the arrangement of objects on your report to get the most pleasing configuration. Don’t shy away from leaving white space in your report. You can also experiment with the Container object using the Precision container type to layer the Key Value object.
  • The Key Value object will be affected by Report and Page Prompts like any other report object and you can even define Actions to filter the Key Value object.

Here are some additional examples of using the Key Value object:

In this example, you can see from the Actions Diagram how the Key Value object is being filtered. First, by the two page prompts and second, there is a direct filter action defined from the List Control object

In this example, we can see from the Actions Diagram that I used the Container object. I then selected the Precision container type and overlaid the Key Value object on the Line Chart. The only filters applied to these report objects are the page prompts.

And in this last example, you can see how I have no Report or Page prompts or any other filters impacting the Key Value objects. Therefore, these values are representative for the entire data.

Key Value Object in SAS Visual Analytics was published on SAS Users.

13
Jun

A look at the new data pane and data item features in SAS Visual Analytics 8.2

In SAS Visual Analytics 8.2 on SAS Viya 3.3, there are a number of new data features available. Some of these features are completely new, and some are features from the 7.x release that had not yet been included in the 8.1 release.  I’ll cover a few of these new features in this post.

First of all, the Data pane interface has changed to enable users to access actions via fewer and better organized menus.

Data item properties can also be displayed for viewing or editing with a single click.

The new Change data source action displays a Repair report window if report data items are not in the new data source.  The window enables you to replace the missing data items with replacement data items from the new data source before continuing with the change.

Speaking of mapping, in SAS Visual Analytics 8.2, linked selections and filters can automatically be add to objects, and the objects may use different data sources. In that case, you can manually map data sources from the data pane.  The + icon enables you to add additional pairs of mappings.

When you create a new Geography data item in SAS Visual Analytics 8.2, in addition to using Predefined names and codes or your own custom latitude and longitude data items, you can now also use custom polygon shapes to display your own custom regions. Once you select Custom polygon shapes, you specify, in additional dialogs, the characteristics of your polygon provider.  You can use a CAS table or an Esri Feature Service.

For more information on custom polygons, see my previous blog here.

If you need to use and Esri shape file for your polygon data, there are macros available in VA 8.2 to convert the data to a SAS dataset and to load the data into CAS.

  • %SHPCNTNT display the contents of the shape file
  • %SHPIMPRT converts the shapefile into a SAS dataset and loads it into CAS.

The Custom Sort feature is also back in SAS Visual Analytics 8.2. Just right-click the data item, select Custom sort, and then select and order your data values.

For creating a new derived data item, there are several new calculations available for measures:

And speaking of creating calculated data items, you’ll want to check out three useful new operators that are available in SAS Visual Analytics 8.2:

A look at the new data pane and data item features in SAS Visual Analytics 8.2 was published on SAS Users.

8
Jun

Saving and reloading SAS Viya configuration

SAS Viya provides import and export functionality for user-created content like reports and data plans. Often, in addition to content, an administrator will want to save configuration so that it can be reloaded or updated and applied to a different system. SAS Viya provides the capability to save and reload configuration using the SAS Viya command-line interfaces that are previous blog post.

The

It is possible to save a set of configuration settings and reload them to the same or a different system. This can be useful when you have your configuration established and you wish to keep a backup, or make a selective backup of configuration prior to making a change.

The connection to LDAP is a key early step in a SAS Viya implementation. With the configuration CLI, once you have the SAS Viya LDAP configuration established, you can export it to a file, and then use that file (with any necessary modifications) to stage additional systems, or as a backup prior to making changes to your existing systems configuration.

How to save and reload configuration

As always, when using the command-line interfaces you must

./sas-admin configuration configurations list --definition-name sas.identities.providers.ldap.user  --service identities

 

Next, using the id from the previous step you can list the configuration properties.

./sas-admin configuration configurations show -id b313a5a7-1c73-4f4a-9d3d-bba05b626939

 

Save LDAP Configuration

The save process creates json files. The following steps use the download command to save to json files the connection, user and group configuration instances for the SAS Viya connection to LDAP.

./sas-admin configuration configurations download --target /tmp/ldapconnection.json  --definition-name sas.identities.providers.ldap.connection  --service identities
 
 
./sas-admin configuration configurations download --target /tmp/user.json  --definition-name sas.identities.providers.ldap.user  --service identities
 
 
./sas-admin configuration configurations download --target /tmp/group.json  --definition-name sas.identities.providers.ldap.group  --service identities

 

You should open the json files and check that the correct configuration has been saved. It is possible for the process to complete without errors and return json that is not what you are expecting. This would cause problems with your reload, so checking the saved json is important.

You can keep the JSON file as is, or make changes to key attributes. You may want to do this if you are importing to a different system.

Load the SAS Viya LDAP Configuration

To load you simply use the update command and pass the json file.

./sas-admin configuration configurations update --file /tmp/ldapconnection.json
 
./sas-admin configuration configurations update --file /tmp/user.json
 
./sas-admin configuration configurations update --file /tmp/group.json

 

The impact of isDefault

There is a value, isDefault, stored within the configuration which has an impact on the persistence of changes made to configuration.

isDefault impacts how services treat existing configuration when a service starts. When a service starts a setting of:

  • isDefault=true in the existing configuration means the service will overwrite the configuration object with new defaults.
  • isDefault=false in the existing configuration means the service will NOT overwrite the existing configuration object.

In other words, if the configuration is flagged as “default” then the service is permitted to update or add to the default values.

Objects created by the services at startup always have isDefault set to true. Objects created in Environment Manager always have isDefault set to false. This means changes in Environment Manager are always respected by services on restart, they will not be overwritten.  But services are allowed to overwrite their own defaults at startup.

When using the CLI, the administrator needs to decide what is the appropriate value for isDefault. If you require the configuration change to persist across service restarts then set isDefault=false.

Saving and Reloading Micro-Service Logging Levels

Let’s look at another use case for save and reload of configuration. Updating micro-service logging configuration levels in batch can be very useful. You may want to save your current logging configuration and modify it to raise logging levels. You may create multiple json files with different logging configurations for different scenarios. When debugging an issue in the environment you could load a verbose logging configuration. If you wish to keep the new configuration you would edit the json and set IsDefault=false.

The step below saves all configuration instances created from the logging.level configuration definition. These configuration instances control the logging level for the SAS Viya microservices and servers.

./sas-admin configuration configurations download --definition-name logging.level -target /tmp/default_logging.level.txt

 

If you wish to persist your new logging configuration, edit the file to set metadata.isDefault=false, save the new file and then and update the logging configuration using the update command:

./sas-admin configuration configurations update --file /tmp/new_logging.level.txt

 

When you are done, you can use the original file to reset the logging level back to default values.

In most cases a server restart is not required after a configuration update, find details in the administration guide.

Saving and reloading SAS Viya configuration was published on SAS Users.

7
Jun

SAS Visual Analytics on the new SAS platform

Who doesn’t love a good makeover story? We know we do at BNL Consulting—and so do our customers. That’s why we’re thrilled with SAS Visual Analytics sophisticated new look and feel as well as its expanded functionality. SAS Visual Analytics takes advantage of the performance and scalability of the SAS Viya platform, providing a Business Intelligence framework that can work with massive amounts of data, bringing forward the powerful analytics that has made SAS the market leader in this space.

With SAS Viya, we get an even more unified platform with a consistent look and feel across applications which have been rewritten as zero client HTML5 web applications. The visualizations that are available in SAS Visual Analytics are embedded within the other tools as well. Now customers—from analysts to business executives—can point and click all from their web browser to get the information they need to make better business decisions right out of the box.

The newest version of SAS Visual Analytics provides direct access to an in-memory set of data as well. The process of loading data that can be made available to SAS Visual Analytics has been streamlined to give users fast access to information to create dashboards, reports and perform ad-hoc analysis. And because the application is so intuitive, users can be up and running in a matter of hours, not days.

In what has become a highly competitive market for business analytics—with BI tools being ubiquitous—SAS Visual Analytics running on SAS Viya provides a formidable means to analyze data, helping organizations tell a story—simply and easily. With SAS Visual Analytics on their side, customers will no longer need to reach outside the platform and make additional purchases. The unified platform is extremely compelling.

New features and benefits

SAS Visual Analytics customers will also benefit from a number of new features, including:

  • A new look and feel that provides drag and drop layout controls to size dashboard objects. New graph, chart, and map objects offer greater control of the properties to make them presentation-ready.
  • Objects on the screen can “talk” to each other, listening in on object events to filter accordingly in unison.
  • With no more LASR server, the process of loading and accessing data is much faster, with even better performance than the previous in-memory LASR server architecture.
  • There is now the ability to create custom HTML5/JavaScript objects, which can be hooked up to the data loaded in memory (data-driven content). HTML/JavaScript/D3 has become the standard in web development, opening up almost countless options for SAS to visualize data and turn it into useful information. Before, custom development was sometimes needed, but SAS Visual Analytics can handle a customer’s needs, all with point and click, drag and drop.

Next steps

When organizations use BI tools, they expect appealing visuals on top of rich data that can tell a story. SAS Visual Analytics on SAS Viya delivers in a big way, ushering in a new era for SAS BI users. We encourage customers to try out SAS Visual Analytics and see for themselves. We have found that a successful path is to start with small-scale proof of concepts, designed to satisfy a few critical uses. As you become more comfortable, expand to include more users and different types of projects.

With all the software options out there—both COTS and Open Source—we want customers to feel confident that SAS Visual Analytics for SAS Viya is the right choice. For more details, you can contact BNL at BNLConsulting.com.

SAS Visual Analytics on the new SAS platform was published on SAS Users.

31
May

Using Dynamic Text in a SAS Visual Analytics Report

You will not find an object in SAS Visual Analytics named Dynamic Text. Instead, you will find a Text object that allows you to insert dynamically driven data items. By using the Text object’s dynamic capabilities you can build custom report titles, object titles, emphasize measures and even supply the last modified time of the data source in your SAS Visual Analytics Report. In this post, I will outline the ways how you can leverage the Text object’s dynamic capabilities.

In this example report below, I have used a red font color to indicate the dynamically driven text.
Dynamic Text in a SAS Visual Analytics Report

Let’s take a look the available dynamic roles in the Text object. You can see from the Objects pane that the Text object is grouped under Other.

From the Data pane we have the ability to add both Measure and Parameter data items. From the interactive editor of the Text object shown below, we also have the ability to insert the Table Modified Time and Interactive Filters.

The following sections will demonstrate how to configure each of these dynamically driven elements of the Text object.

Interactive Filters

The out of the box display for Interactive Filters includes the selected values for control objects added to either the Report or Page Prompt areas.

To edit, be sure you are in Edit mode of Explore and Visualize. Click on the Text object to make it the active window and double click inside, then the interactive editor will open. Next, click on the Interactive Filters button. Use your cursor to position where you would like to add static text. In this case, I added the qualifier Default filter information:.

Multiple control object values are separated by a comma and also accommodates multi-value control objects.

Parameters

While the Interactive Filter functionality is extremely useful, you may want to use prompt values more granularly to create custom report titles or even object titles. To do this, you must first create a parameter to hold the value selected in the control object, then use that parameter in the Text object.

In my example report, I have two prompts and two custom object titles leveraging parameters. Let’s look at each one individually.

First is the Report Prompt, which prompts for year.

1.     Create your prompt by using the Control object of your choice and assigning the desired data role.
2.     Create a parameter that corresponds to the data type and assign it to the Control object’s Parameter Role.
3.     For the Text object, assign the same parameter to the Text object’s Parameter Role.
4.     Double click on the Text object, use your cursor to add static text as you like.

The steps are similar for the Page Prompt, which prompts for region.

1.     Create your prompt by using the Control object of your choice and assigning the desired data role.
2.     Create a parameter that corresponds to the data type and assign it to the Control object’s Parameter Role.
3.     For the Text object, assign the same parameter to the Text object’s Parameter Role.
4.     Double click on the Text object, use your cursor to add static text as you like.

Even though I demonstrate how to do this for both Report and Page Prompts, this same technique can be used for report canvas prompts. You just have to be sure you store the selected value(s) in a parameter that you can then use in the Text object’s Parameter Role.

Measures

Very much the same way the Text object’s Roles are used to assign the Parameter values, we can do the same thing with a measure. This measure will be affected by any Report or Page Prompts automatically, but if you want to use a report canvas prompt you will need to create the Actions to the Text object appropriately.

Here you can see we are using the measure TotalExpense which is an aggregated measure of Expenses. Like in the previous examples, be sure to assign the measure to the Text object then double click to open the editor and use your cursor to add the static text.

The only applied filters for this aggregated measure are the selected year and region, therefore this Sum _ByGroup_ will return the Total Expenses for that Year and Region.

Table Modified Time

The last capability of dynamic text available in the Text object is the Table Modified Time.

The out of the box display uses the fully qualified datetime stamp and cannot be altered to a different format. To edit, double click inside the Text object and the editor will open. Then click on the Table Modified Time button. Next, use your cursor to position where you would like to add static text. In this case, I added the qualifier Data last updated:.

Conclusion

There are two main takeaways from this blog post. First is that you can easily build dynamic customizable titles, emphasize measures or parameter values.

Second, look to use the Text object for your dynamic text needs.

Here is a quick mapping as a review of what was detailed in the steps above.

 

Using Dynamic Text in a SAS Visual Analytics Report was published on SAS Users.

7
May

SAS tools for GDPR privacy compliant reporting

SAS reporting tools for GDPR and other privacy protection lawThe European Union’s General Data Protection Regulation (GDPR) taking effect on 25 May 2018 pertains not only to organizations located within the EU; it applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.

If the GDPR acronym still does not mean much to you, think of the one that does – HIPAA, FERPA, COPPA, CIPSEA, or any other that is relevant to your jurisdiction – this blog post is equally applicable to all of them.

The GDPR prohibits personal data processing revealing such individual characteristics as race or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, as well as the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health, and data concerning a natural person’s sex life or sexual orientation. It also has special rules for data relating to criminal convictions or offenses and the processing of children’s personal data.

Whenever SAS users produce reports on demographic data, there is always a risk of inadvertently revealing personal data protected by law, especially when reports are generated automatically or interactively via dynamic data queries. Even for aggregate reports there is a high potential for such exposure.

Suppose you produce an aggregate cross-tabulation report on a small demographic group, representing a count distribution by students’ grade and race. It is highly probable that you can get the count of 1 for some cells in the report, which will unequivocally identify persons and thus disclose their education record (grade) by race. Even if the count is not equal to 1, but is equal to some other small number, there is still a risk of possible deducing or disaggregating of Personally Identifiable Information (PII) from surrounding data (other cells, row and column totals) or related reports on that small demographic group.

The following are the four selected SAS tools that allow you to take care of protecting personal data in SAS reports by suppressing counts in small demographic group reports.

1. Automatic data suppression in SAS reports

This blog post explains the fundamental concepts of data suppression algorithms. It takes you behind the scenes of the iterative process of complementary data suppression and walks you through SAS code implementing a primary and secondary complementary suppression algorithm. The suppression code uses BASE SAS – DATA STEPs, SAS macros, PROC FORMAT, PROC MEANS, and PROC REPORT.

2. Implementing Privacy Protection-Compliant SAS® Aggregate Reports

This SAS Global Forum 2018 paper solidifies and expands on the above blog post. It walks you through the intricate logic of an enhanced complementary suppression process, and demonstrates SAS coding techniques to implement and automatically generate aggregate tabular reports compliant with privacy protection law. The result is a set of SAS macros ready for use in any reporting organization responsible for compliance with privacy protection.

3. In SAS Visual Analytics you can create derived data items that are aggregated measures.  SAS Visual Analytics 8.2 on SAS Viya introduces a new Type for the aggregated measures derived data items called Data Suppression. Here is an excerpt from the documentation on the Data Suppression type:

“Obscures aggregated data if individual values could easily be inferred. Data suppression replaces all values for the measure on which it is based with asterisk characters (*) unless a value represents the aggregation of a specified minimum number of values. You specify the minimum in the Suppress data if count less than parameter. The values are hidden from view, but they are still present in the data query. The calculation of totals and subtotals is not affected.

Some additional values might be suppressed when a single value would be suppressed from a subgroup. In this case, an additional value is suppressed so that the suppressed value cannot be inferred from totals or subtotals.

A common use of suppressed data is to protect the identity of individuals in aggregated data when some crossings are sparse. For example, if your data contains testing scores for a school district by demographics, but one of the demographic categories is represented only by a single student, then data suppression hides the test score for that demographic category.

When you use suppressed data, be sure to follow these best practices:

  • Never use the unsuppressed version of the data item in your report, even in filters and ranks. Consider hiding the unsuppressed version in the Data pane.
  • Avoid using suppressed data in any object that is the source or target of a filter action. Filter actions can sometimes make it possible to infer the values of suppressed data.
  • Avoid assigning hierarchies to objects that contain suppressed data. Expanding or drilling down on a hierarchy can make it possible to infer the values of suppressed data.”

This Data Suppression type functionality is significant as it represents the first such functionality embedded directly into a SAS product.

4. Is it sensitive? Mask it with data suppression

This blog post provides an example of using the above Data Suppression type aggregated measures derived data items in SAS Visual Analytics.

We need your feedback!

We want to hear from you.  Is this blog post useful? How do you comply with GDPR (or other Privacy Law of your jurisdiction) in your organization? What SAS privacy protection features would you like to see in future SAS releases?

SAS tools for GDPR privacy compliant reporting was published on SAS Users.

19
Apr

SAS Visual Analytics filters on periodic calculations: Apply them or ignore them!

In SAS Visual Analytics 7.4 on 9.4M5 and SAS Visual Analytics 8.2 on SAS Viya, the periodic operators have a new additional parameter that controls how filtering on the date data item used in the calculation affects the calculated values.

The new parameter values are:

SAS Visual Analytics filters

These parameter values enable you to improve the appearance of reports based on calculations that use periodic operators. You can have periods that produce missing values for periodic calculations removed from the report, but still available for use in the calculations for later periods. These parameter settings also enable you to provide users with a prompt for choosing the data to display in a report, without having any effect on the calculations themselves.

The following will illustrate the points above, using periodic Revenue calculations based on monthly data from the MEGA_CORP table. New aggregated measures representing Previous Month Revenue (RelativePeriod) and Same Month Last Year (ParallelPeriod) will be displayed as measures in a crosstab. The default _ApplyAllFilters_ is in effect for both, as shown below, but there are no current filters on report or objects.

The Change from Previous Month and Change From Same Month Last Year calculations, respectively, are below:

The resulting report is a crosstab with Date by Month and Product Line in the Row roles, and Revenue, along with the 4 aggregations, in the Column roles.  All calculations are accurate, but of course, the calculations result in missing values for the first month (Jan2009) and for the first year (2009).

An improvement to the appearance of the report might be to only show Date by Month values beginning with Jan2010, where there are no non-missing values.  Why not apply a filter to the crosstab (shown below), so that the interval shown is Jan2010 to the most recent date?

With the above filter applied to the crosstab, the result is shown below—same problem, different year!

This is where the new parameter on our periodic operators is useful. We would like to have all months used in the calculations, but only the months with non-missing values for both of the periodic calculations shown in the crosstab. So, edit both periodic calculations to change the default _ApplyAllFilters_ to _IgnoreAllTimeFrameFilters_, so that the filters will filter the data in the crosstab, but not for the calculations. When the report is refreshed, only the months with non-missing values are shown:

This periodic operator parameter is also useful if you want to enable users to select a specific month, for viewing only a subset of the crosstab results.

For a selection prompt, add a Drop-Down list to select a MONYY value and define a filter action from the Drop-Down list to the Crosstab. To prevent selection of a month value with missing calculation values, you will also want to apply a filter to the Drop-Down list as you did for the crosstab, displaying months Jan2010 and after in the list.

Now the user can select a month, with all calculations relative to that month displayed, shown in the examples below:

Note that, at this point, since you’ve added the action from the drop-down list to the crosstab, you actually no longer need the filter on the crosstab itself.  In addition, if you remove the crosstab filter, then all of your filters will now be from prompts or actions, so you could use the _IgnoreInteractiveTimeFrameFilters_ parameter on your periodic calculations instead of the _IgnoreTimeFrameFilters_ parameter.

You will also notice that, in release 8.2 of SAS Visual Analytics that the performance of the periodic calculations has been greatly improved, with more of the work done in CAS.

Be sure to check out all of the periodic operators, documented here for SAS Visual Analytics 7.4 and SAS Visual Analytics filters on periodic calculations: Apply them or ignore them! was published on SAS Users.

30
Mar

Play with classification of Iris data using gradient boosting

Gradient boosting is one of the most widely used machine learning models in practice, with more and more people like to use it in Kaggle competitions. Are you interested in seeing how to use gradient boosting model for classification in SAS Visual Data Mining and Machine Learning? Here I play with the classification of Fisher’s Iris flower dataset using gradient boosting, and this may serve as a start point to those interested in trying the classification models in SAS Visual Data Mining and Machine Learning product.

Fisher’s Iris data is a well-known dataset in data mining. Per Wikipedia, Fisher developed a linear discriminant model to distinguish the species from each other by the features provided in the dataset. You may already see people run different classification models on this dataset, such as neural network. What I am interested in, is to see how well SAS gradient boosting model will do the species classification.

#1  Explore the dataset

We can easily load Fisher’s Iris dataset from SASHelp.Iris into SAS Viya. The dataset consists of 50 samples each species of Iris SetosaVirginica and Versicolor, totally 150 records with five attributes: Petal Length, Petal Width, Sepal Length, Sepal width and Iris Species. The dataset itself is already well-formed, with neither missing values, nor outliers. Take a quick look of the dataset in SAS Visual Analytics as below.

Gradient boosting

From the chart, we see that the iris species of ‘Setosa’ can be easily distinguished from the ‘Versicolor’ and ‘Virginica’ species by the length and width of their petals and sepals. However, this is not the case for the latter two species, some of them are staggered closely, which makes it a little hard to distinguish each other by these features.

#2  Prepare Data

There is not much effort needed to prepare the data for the prediction. But one thing I’d like to mention here is about the standardization of measure variables. By viewing the measure details in SAS Visual Analytics, we see that neither Petal Length distribution nor Petal Width distribution is normal. You may wonder if we need to normalize the data before applying it to the model for analysis, but this leads to one great thing I like the Gradient Boosting model. Users do not need to explicitly standardize quantitative data. Tree-base models should be robust to such problem in an input feature, since the algorithm is based on node splits. (Here is an article discussing a similar problem.)

So, here my data preparation is just doing the data partitioning before starting the classification on iris species. I need to make sure each partition will follow the same distribution on different species in the iris dataset. This can be achieved easily in SAS Visual Analytics by adding a partition data item - by setting the Sampling method to ‘Stratified sampling’ and add the ‘Iris Species’ as the column to be stratified by. I define two partitions so I have training partition, validation partition. I set 60% for training, and 40% for validation partition, with random seed 1234. Thus, a categorical data item ‘Partition’ is added, with value of 0 for validation, 1 for training partition. (For easier understanding in the charts, I’ve created a custom category called ‘Partitions’ based on the ‘Partition’ data item values.)

The charts below show that the 150 rows in Fisher’s Iris dataset are distributed equally into three species, and the created partitions are sampled with the same percentage among the three species.

#3  Train the gradient boosting model

Training various models in SAS Visual Data Mining and Machine Learning allows us to appreciate the advantages of visualization, and it’s very straight-forward for users. In ‘Objects’ tab, drag and drop the ‘Gradient Boosting’ to the canvas. Assign the ‘Iris Species’ as response variable, and ‘Petal Length, Petal Width, Sepal Length, Sepal width’ as predictors. Then set the ‘Partition’ data item for Partition ID. After that, the system will train the model and show the model assessment. I’ve taken a screenshot for ‘Virginica’ event as below.

The response variable of Iris Species has three event levels – ‘Setosa’, ‘Versicolor’ and ‘Virginica’, and we can choose desired event level to have a look of the model output. In addition, we may switch the assessment plot of Lift to ROC plot, or to Misclassification plot (Note: the misclassification plot is based on event level, thus it will show the ‘Setosa’ and ‘NOT Setosa’ species if we choose the ‘Setosa’ event.). Below is a screenshot with ROC plot and the model assessment statistics.

In practice, training models usually cost a lot of effort in tuning model parameters. SAS Visual Data Mining and Machine Learning has provided the ‘Autotune’ feature that can help this, users may decide some settings like maximum iterations, seconds, and evaluations and the product will choose the optimal values for the hyperparameters of the model. Considering that this dataset only has 150 samples, I won’t bother to do the hyperparameters tuning.

#4  Make prediction by the model

Now I can start to make predictions from the gradient boosting model for the data in testing partition. There are several ways to go here. In Visual Data Mining and Machine Learning, on the right-button mouse menu, either click the ‘Export model…’ or click the ‘Derive predicted…’ menu. The first one will export the model codes, so you can run it in SAS Studio with your data to be predicted. The latter one is very straight-forward in SAS Visual Data Mining and Machine Learning. It will pop up the ‘New Prediction Items’ page, where you may choose to get the predicted value and its probability values for all the levels of Iris Species. These data items will be added to the iris CAS table for further evaluation. Since the iris dataset has three species in the sample, I need to set ‘All levels’ so the prediction will give out the classification in three species and their probabilities.

#5  Review the prediction result

In the model assessment tab, we already see the model assessment statistics for model evaluation. We may also switch to ‘Variable Importance’ tab, or ‘Lift’ tab, ‘ROC’ tab, and ‘Misclassification’ tab to see more about the model. Here I’d like to visually compare the predicted species value with the iris species value provided in the dataset.

To show how many failures of the classification visually, I perform following actions:

  • In SAS Visual Analytics, create a list table to show all 150 rows of the iris dataset. Since there is no primary key in the dataset, the SAS Visual Analytics list table will do aggregation for measure variables by default, so be sure to set the ‘Detail data’ option in the Options tab.
  • Create a calculated item (named ‘equals’) to compare if the values of ‘Iris Species’ and ‘Predicted: Iris Species’ columns are equal: {IF ( 'Iris Species'n = 'Predicted: Iris Species'n ) RETURN 1 ELSE 0. }
  • Define a display rule with the calculated item to highlight the misclassified rows. I’ve sorted the table by above ‘equals’ value so those rows without equal value of ‘Iris Species’ and ‘Predicted : Iris Species’ columns are shown on top.

We see four rows are misclassified by the model, 3 of them are from training partition and 1 from validation partition. So far, the result looks not bad, right?

We may continue to tune the parameters of gradient boosting model easily in SAS Visual Data Mining and Machine Learning, to improve the model. For example, if I set smaller leaf size value to 2 instead of the default value of 5, the model accuracy will be improved (too good to be true?). See below screenshot for a comparison.

Of course, people may like to try tuning other parameters, or to generate more features to refine the model. Anyway, it is easy-to-use and straight-forwarded to do classification using gradient boosting model in SAS Visual Data Mining and Machine Learning. In addition, there are many other models in SAS Visual Data Mining and Machine Learning people may like to run for classification. Do you like to play with the other models for practicing?

Play with classification of Iris data using gradient boosting was published on SAS Users.

26
Mar

Authoring a data-driven book - yes, we do that!

At SAS we use data to accomplish many of our everyday tasks. At SAS Books, we have now even used data to create a data-driven book, An Introduction to SAS Visual Analytics.

The post Authoring a data-driven book - yes, we do that! appeared first on SAS Learning Post.

16
Mar

SAS Data Studio Code Transform (Part 1)

SAS Data Studio is a new application in SAS Viya 3.3 that provides a mechanism for performing simple, self-service data preparation tasks to prepare data for use in SAS Visual Analytics or other applications. It is accessed via the Prepare Data menu item or tile on SAS Home. Note: A user must belong to the Data Builders group in order to have access to this menu item.

In SAS Data Studio, you can either select to create a new data plan or open an existing one. A data plan starts with a source table and consists of transforms (steps) that are performed against that table. A plan can be saved and a target table can be created based on the transformations applied in the plan.

SAS Data Studio Code Transform

SAS Data Studio

In a previous blog post, I discussed the Data Quality transforms in SAS Studio.  This post is about the Code transform which enables you to create custom code to perform actions or transformations on a table. To add custom code using the Code transform, select the code language from the drop-down menu, and then enter the code in the text box.  The following code languages are available: CASL or DATA step.

Code Transform in SAS Data Studio

Each time you run a plan, the table and library names might change. To avoid errors, you must use variables in place of table and caslib names in your code within SAS Data Studio. Indicating variables in place of table and library names eliminates the possibility that the code will fail due to name changes.  Errors will occur if you use literal values. This is because session table names can change during processing.  Use the following variables:

  • _dp_inputCaslib – variable for the input CAS library name.
  • _dp_inputTable – variable for the input table name.
  • _dp_outputCaslib – variable for the output CAS library name.
  • _dp_outputTable –  variable for the output table name.

Note: For DATA step only, variables must be enclosed in braces, for example, data {{_dp_outputTable}} (caslib={{_dp_outputCaslib}});.

The syntax of “varname”n is needed for variable names with spaces and/or special characters.  Refer to the Avoiding Errors When Using Name Literals help topic for more Information.  There are also several

CASL Code Example

The CASL code example above uses the ActionSet fedSQL to create a summary table of counts by the standardized State value.  The results of this code are pictured below.

Results from CASL Code Example

For more information on the available action sets, refer to the SAS® Cloud Analytic Services 3.3: CASL Reference guide.

DATA Step Code Example

In this DATA step code example above, the BY statement is used to group all records with the same BY value. If you use more than one variable in a BY statement, a BY group is a group of records with the same combination of values for these variables. Each BY group has a unique combination of values for the variables.  On the CAS server, there is no guarantee of global ordering between BY groups. Each DATA step thread can group and order only the rows that are in the same data partition (thread).  Refer to the help topic

Results from DATA Step Code Example

For more information about DATA step, refer to the In my next blog post, I will review some more code examples that you can use in the Code transform in SAS Data Studio. For more information on SAS Data Studio and the Code transform, please refer to this SAS Data Studio Code Transform (Part 1) was published on SAS Users.

Back to Top