Creating an AI Assistant for SAS Viya in 5 steps (@sassoftware/viya-assistantjs) - Part I
Recent Library Articles
Recently in the SAS Community Library: SAS' @kumardeva debunks the myth that developing AI assistants is too hard. He shows you how to use the @sassoftware/viya-assistantjs library to jump start your development.
Democratizing in this context means making Databricks accessible to a wider range of users within an organization, not just code-focused data scientists or engineers. By combining SAS Viya's user-friendly interfaces and Databricks Lakehouse, more people within an organization can participate in data analysis, decision-making, and innovation, regardless of their technical background. This approach enables organizations to maximize the value of their data assets and drive better business outcomes.
Democratizing tribal knowledge in a digital catalog
Democratizing Databricks with SAS Viya involves converting informal and tacit tribal knowledge about data, analytics, BI dashboards, pipelines, business rules, and decisions into easily accessible digitalized knowledge. This is facilitated by the SAS Information Catalog, which provides effortless access to such information.
Irrespective of the data's location, which in this instance is a Databricks Lakehouse, the SAS Information Catalog offers users a comprehensive understanding of their data from both business and technical perspectives. On the technical front, it provides details like data size, column count, and storage format (e.g., Spark). Meanwhile, from a business standpoint, it covers aspects such as status (approved, under review, flagged, etc.), associated business terms and tags, data ownership, automatic explanation, information privacy, and semantic type detection (for PII compliance).
Lastly, from an analytics standpoint, SAS Information Catalog automatically analyzes your columns to aid in understanding quality issues and determining if the data is suitable for analytics purposes.
Figure 1: SAS Information Catalog shows technical and business metadata for a table in Databricks.
Democratizing data literacy through appropriate tools
The SAS Information Catalog is a valuable resource for all users within an organization and facilitates the pursuit of a highly data-literate organization. Improving data literacy is also about providing the proper tools to users. For those eager to delve deep into their Databricks sources, SAS Visual Analytics offers a comprehensive suite of capabilities. Suitable for a wide array of users, including statisticians, data scientists, data engineers, and business analysts, SAS Visual Analytics facilitates ad-hoc analysis for a thorough understanding of Databricks sources. Users can leverage both basic and advanced analytics features extensively. Furthermore, for individuals aiming to create highly informative and visually compelling dashboards that accelerate time to market, SAS Visual Analytics seamlessly integrates robust statistical analysis with appealing business intelligence dashboards.
Figure 2: Dashboard providing insight into readmissions for hospitals in an area of Sweden.
Navigating from SAS Information Catalog to SAS Visual Analytics is as simple as clicking on the Actions menu and choosing "Explore and Visualize." This action automatically transfers the Databricks Spark source into high-speed SAS memory, ensuring immediate accessibility for a wide range of analysis capabilities within SAS Visual Analytics.
Figure 3: Analyzing a Databricks data source in SAS Visual Analytics.
Democratizing code development with the right tools
Truly skilled coders, along with exceptional data scientists and engineers, are a rare and highly sought-after group. This scarcity can lead to a potential lag in important innovations and hinder the achievement of business growth and objectives. However, trying to convert other groups of business users within an organization into proficient coders is a misguided approach. So, how can you empower the rest of your organization—individuals who possess deep business knowledge and a profound understanding of the data—to develop data and analytics pipelines without investing extensive time in enhancing their technical skills?
Patric Hamilton elucidates this development process in his latest blog post, Data Brilliance Unleashed: SAS Data Quality against Databricks - Precision, Performance, Perfection. The key lies in offering user-friendly tools capable of handling both straightforward tasks and intricate, technical coding without requiring users to write a single line of code. The crucial aspect is to ensure these tools facilitate collaboration, enabling both coders and non-coders to assist each other in transforming innovations into production.
Figure 4: SAS Studio data pipeline for entity resolution to create golden or master records.
Conclusion
This comprehensive approach to making Databricks accessible with SAS Viya involves three main strategies: using SAS Viya's user-friendly interfaces and Databricks Lakehouse to increase access, making tribal knowledge easily available through the SAS Information Catalog, and equipping users with the necessary tools for data literacy and code development. By making Databricks available to a wider range of users and providing intuitive tools for analysis and collaboration, organizations can maximize the value of their information assets and achieve better business outcomes.
Learn more about SAS and Databricks
Harness the analytical power of your Databricks platform with SAS
Data everywhere and anyhow! Gain insights from across the clouds with SAS
Elevated efficiency and reduced cost: SAS in the era of Cloud Adoption
SAS and Databricks: Your Practical Guide to Data Access and Analysis
Data to Databricks? No need to recode - get your existing SAS jobs to SAS Viya in the cloud
Maximize Coding and Data Freedom with SAS, Python and Databricks
Data Brilliance Unleashed: SAS Data Quality against Databricks - Precision, Performance, Perfection
Please note: Data used in screenshots are either fictitious demo data or open public data.
Hello, I am getting the following error messages when trying to merge two datasets. One of the datasets I am getting from a csv file, so maybe the issue could be there? I was trying to specify the length of the PID variable for the redcap_sort dataset from the redcap one, which is the one we got from the csv file. However, I keep getting messages that the variable has multiple lengths and it keeps truncating the data. Any PID after 999 gets shortened. So 1000 and 1001 become 100, 1010 becomes 101, etc. Any help or a nudge in the right direction would be greatly appreciated, thank you so much.
Edit: The programming with the csv file already has:
data work.redcap;
%let _EFIERR_ = 0;
infile &csv_file delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=1 ;
informat pid $500. ;
informat pid_ini $500. ;
and the code for format:
format pid $500. ;
It has this for all the variables. I thought the above code would make it so that the variables would have that limit of 500 characters?
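One possible explanation, offered as a sketch rather than a diagnosis: in a DATA step, a variable's length is fixed by the first place the compiler encounters it. If the dataset listed first in the MERGE stores pid with a shorter length, the $500 length from the CSV import is ignored and longer values can be cut off. The example below assumes that situation. The dataset names short_pid and long_pid and the $3 length are hypothetical stand-ins, not taken from the post.

```sas
/* Hypothetical stand-in: pid stored with length $3 */
data work.short_pid;
    length pid $3;
    input pid $;
    datalines;
100
101
;
run;

/* Hypothetical stand-in: pid stored with length $500, as in the CSV import */
data work.long_pid;
    length pid $500;
    input pid $;
    datalines;
100
1000
1001
;
run;

/* pid takes its length ($3) from the first dataset listed,
   so values such as 1000 coming from long_pid can be cut to 100 */
data work.truncated;
    merge work.short_pid work.long_pid;
    by pid;
run;

/* Declaring the length BEFORE the MERGE statement sets it first,
   so the longer values are kept intact */
data work.fixed;
    length pid $500;
    merge work.short_pid work.long_pid;
    by pid;
run;
```

Running PROC CONTENTS on both input datasets before the merge shows which one carries the shorter length. The INFORMAT and FORMAT statements in the import step only affect the dataset created there, not the other dataset in the merge.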
**ADDENDUM to original post: I realized that this issue was being caused by starting with a "RETAIN" statement, which I use to put the variables in the desired order. But I'd still like to leave this question up because I'd appreciate any feedback on:
How does a RETAIN statement work?
When does it affect the outputs of a command in a DATA step?
Does anyone have alternate/preferred strategies for reordering the variables in a dataset?
Thanks!
***********************************************************************
Original post:
Hello SAS community,
I'm very confused about how SAS deciphers "IF" Statements in the DATA step. In this specific case, I'm working with an account dataset that has some conflicting information about when accounts close, and I am constructing an "effective" close date. Earlier in my data step, I used some IF statements to construct my desired close date. The last step is to convert that numeric close date to a string variable in the format YYYYMM. Here's what I tried:
DATA WORK.dates_test;
    SET WORK.raw_dates;
    close_eff_n = acct_close_dte_n;
    IF closed = 1 AND acct_close_dte_n = . THEN DO;
        close_eff_n = maxdate_n;
    END;
    *(omitting some additional logic used here for parsimony);
    IF close_eff_n > 0 THEN DO;
        close_dte_eff = put(close_eff_n,yymmn.);
    END;
RUN;
I had earlier written this last segment as:
close_dte_eff = put(close_eff_n,yymmn.);
but this populated the string variable close_dte_eff with a value of "." when close_eff_n was missing, which is why I'm now trying to implement this conditional logic. The problem is: where this condition fails, SAS populates the close_dte_eff field with whatever the last non-failed value was, which is completely incorrect. e.g.
I have:
close_eff_n
01MAR2023
01APR2023
.
.
01JUL2021
I want:
close_eff_n    close_dte_eff
01MAR2023      202303
01APR2023      202304
.
.
01JUL2021      202107
But instead I get:
close_eff_n    close_dte_eff
01MAR2023      202303
01APR2023      202304
.              202304
.              202304
01JUL2021      202107
When I tried to replicate this problem with a simplified dataset, i.e. just taking the final input variables and creating the desired output, I got the result I want, so I suspect it might have something to do with the preceding IF-statements. I can think of plenty of workarounds to get this to work as intended, so my question is not so much how to fix this, but why is this happening? There's something fundamental about how the "IF-statement" is being processed where rows that fail the "IF" condition are being populated with the value of the last row that met that condition, and I would like to understand when SAS applies this behavior and when it does not. I can see this being a useful feature in some limited cases, but it's generally not what I would want to do when applying conditional logic. I had thought that these sort of situations where SAS operates on one row depending on what was in the previous row only happen when there is a "BY" statement, but obviously that's incorrect as there is no "BY" statement in this DATA step.
I'd really appreciate some explanation as to when actions are applied to rows that do not meet the specified condition in an "IF" statement, and how to control that behavior, so I can make sure that the commands I write are applying to the rows that I expect them to apply to. Please let me know if I can provide any other context or information that would be helpful.
Many thanks,
Scott
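On the RETAIN questions in the addendum, a minimal sketch may help. Variables created by assignment in a DATA step are normally reset to missing at the start of each iteration, while variables named in a RETAIN statement are not. The IF statement itself does nothing to rows that fail the condition; the old value is simply never cleared. The example below assumes a RETAIN that lists the new variable for ordering purposes, as the addendum describes. The dataset names with_retain, no_retain, and ordered are hypothetical, and the test data reuses the dates from the post.

```sas
/* Test data taken from the values shown in the post */
data work.raw_dates;
    format close_eff_n date9.;
    do close_eff_n = '01MAR2023'd, '01APR2023'd, ., ., '01JUL2021'd;
        output;
    end;
run;

/* RETAIN used for ordering: close_dte_eff is NOT reset between rows,
   so rows that fail the IF keep the value from an earlier row */
data work.with_retain;
    retain close_eff_n close_dte_eff;
    set work.raw_dates;
    if close_eff_n > 0 then close_dte_eff = put(close_eff_n, yymmn6.);
run;

/* No RETAIN: close_dte_eff starts each iteration as missing (blank),
   so rows that fail the IF stay blank */
data work.no_retain;
    set work.raw_dates;
    length close_dte_eff $6;
    if close_eff_n > 0 then close_dte_eff = put(close_eff_n, yymmn6.);
run;

/* One alternative for reordering: pick the column order in a separate
   step, after the values have been computed */
proc sql;
    create table work.ordered as
    select close_dte_eff, close_eff_n
    from work.no_retain;
quit;
```

Variables read by SET are overwritten on every row, so listing them in a RETAIN is harmless. The carry-over only shows up for variables that the step creates itself, which would explain why a simplified test without the RETAIN statement behaved as expected.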