I’m excited to announce the Dataverse 3.1.4 release is now available.
This GA release boosts Dataverse's data discovery and data-driven authoring capabilities by introducing data quality indicators, histograms, and analysis statistics in the Data Viewer, and by enhancing the Data Viewer to identify missing data using the Is Null/Is Not Null filter criteria.
The Excel File node has been enhanced to streamline the acquisition of data from Excel files. The new Experimental Python Transform node simplifies the processing of data using the industry-standard Python language, and will form the basis of new nodes that will progressively replace a number of C++ nodes currently built on the existing Transform node, which uses the Dataverse script language.
The Convert to Library Node functionality streamlines the process of creating reusable analytic components.
The Dataverse Enterprise Server edition's capabilities have been enhanced with the introduction of the Explorer role, integration with CA Technologies' Single Sign-On (Siteminder) for seamless user authentication, improved integration with LDAP/AD directory systems, and scheduling feature enhancements. The release also delivers numerous performance, stability and usability improvements.
Data Viewer Enhancements
Data Quality Indicators
The new data quality indicators provide an immediate view of potential data quality issues in each field.
The length and color of the bar indicate the scale of the issues in (the sample of) the data displayed in the Data Viewer.
Clicking on a field’s Data Quality Indicator bar provides access to a histogram and field statistics:
The statistics displayed depend on the field’s data type (e.g. string, double or date). You can move the ends of the grey bar under the main histogram to zoom in on a section of the data values. Dragging the bar moves the visible section of the zoomed histogram.
Additional Data Viewer authoring capabilities
The Data Viewer's filter operators now include 'Is Null' and 'Is Not Null', streamlining the identification and handling of missing data.
When the required filter criteria have been applied in the Data Viewer, the filter criteria can be added to the canvas as a configured Filter node or Split node - without using any scripting:
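The Is Null/Is Not Null criteria correspond to standard null checks on a field. As an illustrative sketch only (the record and field names below are hypothetical, not Dataverse internals), the equivalent logic in plain Python is:

```python
# Hypothetical records resembling a Data Viewer sample; None marks missing data.
records = [
    {"id": 1, "city": "Boston"},
    {"id": 2, "city": None},
    {"id": 3, "city": "London"},
]

# 'Is Not Null' keeps records that have a value; 'Is Null' keeps the missing ones.
is_not_null = [r for r in records if r["city"] is not None]
is_null = [r for r in records if r["city"] is None]
```

A Filter node keeps one of these subsets; a Split node routes each subset to a separate output.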
Excel File Node
The Excel File node now attempts to determine the data type of the imported data fields, simplifying the acquisition of data from spreadsheets. Where it is not possible to determine the data type (e.g. because the field's data contains cells with mixed data types, such as numeric and string), the data will be output with the default data type of unicode.
Note: this enhancement may affect the operation of subsequent nodes in existing data flows when those nodes are configured to expect all-unicode fields. If the original behavior is required on a specific node, the node's new 'DetectFieldTypes' property can be set to False. A Dataverse compatibility property can also revert all Excel File nodes to the original behavior; see the release notes for additional information.
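The detection-with-fallback behavior described above can be sketched in a few lines of Python. This is an illustrative assumption about the rule, not the node's actual implementation: a column with a single consistent type keeps that type, and anything mixed falls back to the default unicode type.

```python
def detect_field_type(values):
    """Infer one data type for a column of cell values.

    Sketch of the described rule: if all non-empty cells share a type,
    use it; otherwise fall back to the default 'unicode' type.
    """
    types = {type(v) for v in values if v is not None}
    if len(types) == 1:
        return types.pop().__name__
    return "unicode"  # mixed (or empty) column: default data type
```

For example, `detect_field_type([1, 2, 3])` yields "int", while `detect_field_type([1, "a"])` falls back to "unicode".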
Python Transform Node (Experimental)
A new Experimental node is now available that enables you to use the Python language to transform data.
Use of a standard programming language reduces the learning curve for new users, allows you to use existing coding skills, and enables you to leverage the wealth of publicly available help/resources on the usage of Python. Standard Python functions can also be used to leverage Python functionality that is not available with Dataverse script.
The node palette’s ‘Show Experimental Nodes’ option controls the visibility of the Experimental nodes:
Note: As the status of the node is currently ‘Experimental’ it is not yet fully supported and may be subject to change. Lavastorm would welcome your feedback on this node.
Unlike the existing Transform node, the Python Transform node uses a two-step configuration approach:
- Specify input/output metadata mappings and the metadata for new output fields
- Specify the processing logic to apply to each record
The metadata for the mapping of input fields to output fields is explicitly defined in the node's 'ConfigureFields' property:
New output fields can be defined by specifying the name and the data type of the new field in the ‘ConfigureFields’ property, for example:
out1.newField = Unicode
out1.DOB = datetime.date
You specify the required transform logic in the ‘ProcessFields’ property:
You can define variables and calculated values in the 'ProcessFields' property and use them in your transform logic. For example, if a new output field has been defined using:
out1.newValue = int
Then ‘ProcessFields’ could be configured to output a calculated value in the new field using:
offset = 10
out1.newValue = in1.id + offset
See the node help for additional configuration details.
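The two-step model above can be mimicked in plain Python as a rough sketch. The names and structures here are hypothetical (the real node evaluates the 'ConfigureFields' and 'ProcessFields' properties itself); the point is the separation of declared output metadata from per-record logic:

```python
# Step 1 (cf. 'ConfigureFields'): declare the output fields and their types.
output_fields = {"id": int, "newValue": int}

def process_record(in1):
    """Step 2 (cf. 'ProcessFields'): per-record transform logic."""
    offset = 10
    out1 = {"id": in1["id"], "newValue": in1["id"] + offset}
    # Coerce each value to the type declared in step 1.
    return {name: output_fields[name](value) for name, value in out1.items()}

rows = [{"id": 1}, {"id": 2}]
result = [process_record(r) for r in rows]
```

With the input rows above, `result` holds newValue 11 and 12, matching the `in1.id + offset` example.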
JDBC Query and JDBC Execute Performance
The performance of the JDBC Query node and JDBC Execute node has been significantly improved when querying data in Oracle, especially for larger result sets.
The nodes have a new property ‘FetchSize’ that enables you to specify the number of rows that should be fetched from the database when more rows are needed for a result set.
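The idea behind 'FetchSize' is common to database client APIs: rather than fetching rows one at a time, the client retrieves them in batches of a configurable size. As an illustration only (using Python's DB-API with sqlite3 standing in for Oracle; this is not the node's implementation), `cursor.arraysize` plays the analogous role:

```python
import sqlite3

# Build a small in-memory table of 250 rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])

cur.execute("SELECT id FROM t")
cur.arraysize = 100          # analogous to FetchSize: rows fetched per batch
batches = []
while True:
    batch = cur.fetchmany()  # fetches cur.arraysize rows at a time
    if not batch:
        break
    batches.append(len(batch))
# batches is [100, 100, 50]
conn.close()
```

A larger batch size reduces the number of round trips to the database at the cost of more client-side memory per fetch, which is why it matters most for large result sets.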
Convert to Library Node
You can now convert selected nodes on the canvas into a library node – allowing you to re-use proven analytic logic in other projects and share components with colleagues for enhanced trust in the data, improved productivity, and repeatable results.
You can specify the library node's name, its category in the node palette, and its icon and color:
Library nodes can be shared with other users, either by publishing them in the Public workspace or exporting the node as an .lna file. Nodes are displayed in the Directory view with a 'cube' icon:
Run Now Option for a Scheduled Data Flow
You can now choose to manually run a selected schedule:
When you select ‘Run Now…’ you are prompted to enter any Run properties for the data flow:
The manual run is included in the list of scheduled runs:
The schedule can be run provided it is not already running.
Single Sign On (SSO)
When deploying an instance of the Dataverse Enterprise Server edition, the instance can now be configured to integrate with CA Technologies' Single Sign-On (Siteminder) Access Gateway or Secure Proxy Server, allowing users to be authenticated using SSO.
Access to Dataverse is controlled via a reverse proxy (as outlined in the deployment architecture shown below), using one of the following methods:
- Integration with the Siteminder Access Gateway
- Integration with the Siteminder Secure Proxy Server (a component available in older Siteminder versions, pre-Access Gateway)
To identify the Dataverse user, Dataverse consumes the ‘SM_User’ header that is passed to it from the Siteminder Access Gateway.
Note: Dataverse v3.1.4 has been developed and tested against CA Single Sign-On (Siteminder) v12.6 using the Siteminder Access Gateway method.
Dynamic User Creation
Where a Dataverse Enterprise Server deployment has been configured to use SSO or LDAP/AD integration, Dataverse can also be configured to automatically create a user within Dataverse the first time they log in if the user does not already exist.
Automatic LDAP/AD Synchronization
When Dataverse Enterprise Server is configured with LDAP/AD integration, Dataverse can be configured to automatically synchronize with the LDAP/AD server at regular intervals.
User Role Enhancements
A new ‘Explorer’ role has been added to allow data flow ‘consumers’ to collaborate with designers and manage scheduling tasks.
The ‘End User’ role has been renamed to ‘Designer’ and the ‘Scheduler’ role has been renamed to ‘Designer – Scheduler’.
Audit Logging Documentation Enhancements
The Dataverse audit logging documentation has been expanded in this release to explain how the audit entries map to user actions.
Resolved issues include:
Processing date data types
When processing input fields that contained a date data type, the Change Metadata node and the Data Converter node would incorrectly output the year element for years after 1970. This issue has now been resolved. LAE-8778, LAE-8734
Decision Forest node
The Decision Forest node could generate unstable results due to an incorrect default parameter value being sent to the embedded R node. This issue has now been resolved. LAE-8189
Importing legacy data flows
Previously, Dataverse would fail to import a legacy data flow if the legacy data flow contained a Composite node with multiple Bypass nodes or Bundler nodes, or if the legacy data flow contained an instance of a library node that had an equals sign (=) in its name. These issues have now been resolved. LAE-8826, LAE-8823
Running legacy data flows
Previously, legacy data flows containing run properties that were set as "required" would fail to execute. This issue has now been resolved. LAE-8893
CSV/Delimited Input node
The default delimiter is now correctly dictated by the value set in the Format property if no value is specified in the ‘FieldDelimiter’ property. LAE-8743
The CSV/Delimited Input node now works correctly when the '|' character is used as the 'FieldDelimiter' and a comma-separated list of fields is specified in the 'FieldNames' property. LAE-8852
Data viewer display of white space
The data viewer now correctly displays leading and trailing white space for field values. LAE-8854