I’m pleased to announce the Dataverse 3.1.5 release is now available.
This generally available release introduces the Python language as the de facto scripting language for the creation of custom data transformations in Dataverse.
The use of the Python language for data transformation in Dataverse provides you with a number of benefits, including:
- The ability to re-use existing Python programming knowledge and skills
- Access to the wealth of knowledge base information available online
- Accelerated time to value for businesses when hiring new employees to work with Dataverse
- Improved performance when handling larger data sets
As the Dataverse application transitions from using the Lavastorm-proprietary Dataverse script to the Python language, nodes which used the proprietary Dataverse script are being progressively superseded by new Python-based nodes.
This release also introduces a number of application enhancements and a range of operational, performance and stability improvements.
The Python-based Transform node was originally introduced as an Experimental node in Dataverse 3.1.4 and is now Generally Available. The Experimental node was called “Python Transform”; the GA version of the node is called the “Transform” node.
The new Transform node replaces the previous Transform node. It is available in the palette in the Transformation and Aggregation category and is displayed in the ‘Favorites’ view.
The original Transform node has been renamed to Transform (Superseded) and is available in the ‘All Nodes’ view. This original node remains available for you to use, but Lavastorm recommends using the new Transform node in new projects.
The Transform node now supports GroupBy functionality, allowing it to transform sub-groups of the input data. The sub-groups are specified using the fields configured in the node's ‘GroupBy’ property.
The grouping functions provide a range of aggregation methods including:
The Transform node also improves the handling of Null values. Null fields on data records are bound to a Null object, and comparisons are made against this Null object rather than the Python None object. This allows comparisons with Null, e.g.
if in1.MyField is Null:
    # Do something interesting
    pass
In addition, the “fn” module provides a set of Null-safe functions for data comparisons and string manipulation. The functions comprise:
The fn.desc() function is also available for use in grouping and sorting properties, e.g. the ‘SortBy’ property of the Sort node and the ‘GroupBy’ property of the Transform and Aggregate nodes.
For further information on Python scripting within Dataverse, the grouping functions and Null handling, see the Python scripting section of your Dataverse installation's help or the Python scripting section of the online Dataverse documentation.
The Transform node is now used as the basis for a number of other Dataverse nodes. For nodes that offer script-less configuration using a grid editor (Filter, Sort), the Advanced expression functionality is now based on the Python language.
When using the ‘Insert Fields’ property menu to add a field reference into the Python script, the inserted reference uses the fields operator, e.g.
- fields.myField – when the field name does not contain spaces
- fields['my Field'] – when the field name contains spaces
Nodes based on the Transform node also use the fields operator.
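The two reference styles correspond to attribute access and item access in Python. The sketch below shows how a single object can support both; it is a stand-alone illustration, and the `Fields` class and its sample values are hypothetical rather than the object Dataverse provides.

```python
class Fields:
    """Hypothetical record wrapper; not the Dataverse `fields` object."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, name):
        # Item access works for any field name, including names with spaces.
        return self._values[name]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None


fields = Fields({"OrderID": 1001, "Due Date": "2017-06-30"})
order_id = fields.OrderID        # attribute style: name has no spaces
due_date = fields["Due Date"]    # item style: name contains a space
```

Attribute style is limited to names that are valid Python identifiers, which is why a name such as 'Due Date' needs the bracket form.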
The new Transform node is now used by a number of other Dataverse nodes e.g. Sleep, Get Metadata. As these nodes did not present any script properties, there are no differences in use and they continue to use their previous names.
Where a superseded node allowed the use of the Lavastorm-proprietary script within a property, the new version of the node uses the Python language. The previous version of the node is indicated by ‘(Superseded)’ in the node name.
The original nodes are now called:
- Agg Ex (Superseded)
- Band by Strata (Superseded)
- Filter (Superseded)
- Remove Duplicates (Superseded)
- Sort (Superseded)
- Split (Superseded)
- Transform (Superseded)
- Trim Fields (Superseded)
There are configuration differences for some of the node properties due to the differences in language syntax. For example, when the ‘Input Fields’ property reference is used to insert a field name:
- OrderID would now be referenced as fields.OrderID or fields['OrderID']
- ‘Due Date’ would now be referenced as fields['Due Date']
The Sort node provides a UI control in the property panel that allows you to easily select the fields to sort by.
You can drag the field placeholders left/right to change the column order in which the records are sorted.
The context menu for each field allows the sort direction to be set and, for string type fields, it provides access to additional sort options.
When an option has been enabled, the field’s menu icon changes to blue and white.
The new Aggregate node is a replacement for the Agg Ex node.
It enables you to aggregate data or sub-groups of the data using a number of aggregation methods, e.g.
The node leverages the Python language to enable the creation of custom aggregation methods.
The ‘GroupBy’ property specifies the field(s) by which the aggregate values will be grouped:
If the 'GroupBy' property is not specified, all the input records are treated as a single group. Multiple fields can be specified as a comma-separated list of field names.
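The grouping rule described above can be sketched in plain Python as follows. This runs outside Dataverse; the `split_into_groups` helper, the record layout and the field names are hypothetical and only mimic the described behavior (an empty list yields a single group, several names yield a composite key).

```python
from itertools import groupby


def split_into_groups(records, group_by=""):
    """Group records by a comma-separated list of field names.

    Hypothetical helper: an empty `group_by` puts every record in one group.
    """
    names = [n.strip() for n in group_by.split(",") if n.strip()]

    def key(rec):
        return tuple(rec[n] for n in names)

    # groupby() only merges adjacent records, so sort on the key first.
    ordered = sorted(records, key=key)
    return [(k, list(g)) for k, g in groupby(ordered, key=key)]


records = [
    {"Region": "East", "Year": 2016, "Amount": 10},
    {"Region": "West", "Year": 2016, "Amount": 5},
    {"Region": "East", "Year": 2017, "Amount": 7},
    {"Region": "East", "Year": 2016, "Amount": 3},
]
by_region_year = split_into_groups(records, "Region, Year")
single_group = split_into_groups(records)
```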
The default sort order is ascending, but you can reverse this using the fn.desc() function, e.g.
fn.desc(fields['My Field'])  # where the field name contains space characters
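To show what reversing the order of one field means when several fields are involved, the sketch below sorts by one field ascending and another descending in plain Python. The `desc` wrapper here is a hypothetical stand-in written for this example; it is not the implementation of fn.desc(), and the sample rows are invented.

```python
from functools import total_ordering


@total_ordering
class desc:
    """Hypothetical key wrapper that inverts ordering; not fn.desc() itself."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # Reverse the comparison so larger values sort first.
        return self.value > other.value


rows = [
    {"Region": "East", "My Field": 3},
    {"Region": "West", "My Field": 9},
    {"Region": "East", "My Field": 8},
]
# Region ascending, then 'My Field' descending within each region.
ordered = sorted(rows, key=lambda r: (r["Region"], desc(r["My Field"])))
result = [(r["Region"], r["My Field"]) for r in ordered]
```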
The ‘ConfigureFields’ property specifies the metadata of the field(s) to be output in the aggregate records.
Typically, if you have specified GroupBy fields, these should be added to the output record.
You define the variable that holds the aggregate value in the 'ConfigureFields' property.
The method must be one of the group() functions, e.g. group.sum()
The group() functions can accept additional arguments to modify their behavior, e.g. whether or not to ignore Null values – see the node help documentation for more details.
The ‘ProcessRecords’ property specifies how the node will use the input records to update the aggregate values.
Typically this would be configured to update the aggregate with each input record, e.g. using out1 += in1
Custom logic can be used to modify this behavior. For example, if you only wanted to update the aggregate value on every second record, you could use:
if node.execCount % 2 == 0:  # Modulus remainder is 0
    out1 += in1
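The effect of the modulus test can be checked outside Dataverse with the plain-Python sketch below. The `aggregate_every_second` helper is hypothetical, and whether node.execCount starts at 0 or at 1 is an assumption to confirm in the node help, so the sketch shows both cases.

```python
def aggregate_every_second(values, start=0):
    """Sum only the values whose record counter is even (hypothetical helper).

    `start` is the counter value given to the first record.
    """
    total = 0
    for exec_count, value in enumerate(values, start=start):
        if exec_count % 2 == 0:  # remainder 0 means an even counter
            total += value
    return total


values = [10, 20, 30, 40, 50]
from_zero = aggregate_every_second(values, start=0)  # counters 0..4
from_one = aggregate_every_second(values, start=1)   # counters 1..5
```

With a zero-based counter the first, third and fifth records are aggregated; with a one-based counter it is the second and fourth, so the starting value decides which half of the records contributes.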
The ‘ImplicitWriteEvaluation’ property allows you to configure when the node implicitly outputs the aggregate record:
- At the end of the group (default)
- For each record
For example, this could be used to generate a running total for an input field, grouped by another field:
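The difference between the two settings can be pictured with the plain-Python sketch below, which emits either one total per group or a running total per record. It is a stand-alone illustration: the `aggregate` helper, its `emit_each_record` flag and the sample data are hypothetical and do not use the Dataverse API.

```python
from itertools import groupby
from operator import itemgetter


def aggregate(records, group_field, value_field, emit_each_record=False):
    """Sum `value_field` per group (hypothetical helper, not Dataverse code).

    emit_each_record=False: one output per group, written at the group end.
    emit_each_record=True:  one output per input record (a running total).
    """
    out = []
    ordered = sorted(records, key=itemgetter(group_field))
    for key, group in groupby(ordered, key=itemgetter(group_field)):
        total = 0
        for rec in group:
            total += rec[value_field]
            if emit_each_record:
                out.append((key, total))
        if not emit_each_record:
            out.append((key, total))
    return out


records = [
    {"Customer": "A", "Amount": 5},
    {"Customer": "A", "Amount": 7},
    {"Customer": "B", "Amount": 2},
    {"Customer": "A", "Amount": 1},
]
per_group = aggregate(records, "Customer", "Amount")
running = aggregate(records, "Customer", "Amount", emit_each_record=True)
```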
The Cat node now supports automatic promotion of string data type to ‘unicode’ data type:
- When the Cat node is used to combine multiple data sets containing textual information, the field on one input may have a ‘string’ data type and the corresponding named field on other inputs may have a ‘unicode’ data type
- In this situation, the Cat node now auto-promotes the field with the ‘string’ data type to the ‘unicode’ data type when the node’s ‘TypeConversion’ property is set to ‘None’ or ‘Numeric’ (default)
This streamlines the combining of data from multiple data sources that use different data types for textual data.
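As a rough model of the promotion rule, the sketch below merges the field types of several inputs and promotes ‘string’ to ‘unicode’ when the two meet. The `merge_field_types` helper and the sample schemas are hypothetical; the sketch covers only the string/unicode case and does not model the ‘TypeConversion’ property or any other conversion the node may perform.

```python
def merge_field_types(*schemas):
    """Combine per-input {field: type} maps, promoting string to unicode.

    Hypothetical helper: any other type mismatch raises TypeError here,
    which is a simplification and not a statement about the Cat node.
    """
    merged = {}
    for schema in schemas:
        for name, dtype in schema.items():
            current = merged.setdefault(name, dtype)
            if current == dtype:
                continue
            if {current, dtype} == {"string", "unicode"}:
                merged[name] = "unicode"  # promote the narrower text type
            else:
                raise TypeError(f"{name}: cannot combine {current} and {dtype}")
    return merged


input_a = {"Name": "string", "Total": "int"}
input_b = {"Name": "unicode", "Total": "int"}
combined = merge_field_types(input_a, input_b)
```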
Node Property Configuration Improvements
Some library nodes provide default property values. These are indicated by the ‘down-arrow’ badge:
In some circumstances you may want to set a property explicitly to a blank value. This can now be done using the ‘Set to Blank’ menu option:
The node property indicates this condition has been set by displaying the ‘Blank’ badge:
You can also restore a Blank property value to its inherited default value using the ‘Restore Default Value’ menu option:
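The distinction between an inherited default, an explicit blank and an explicit value can be pictured as a three-state lookup. The sketch below is a minimal illustration under that assumption; the `BLANK` and `UNSET` markers, the `resolve` function and the delimiter example are hypothetical and say nothing about how Dataverse stores property values.

```python
BLANK = object()   # marker: property explicitly set to blank
UNSET = object()   # marker: no explicit value, so the default applies


def resolve(explicit, default):
    """Return the effective property value (hypothetical resolution rule)."""
    if explicit is UNSET:
        return default
    if explicit is BLANK:
        return ""
    return explicit


default_delimiter = ","
inherited = resolve(UNSET, default_delimiter)    # falls back to the default
blanked = resolve(BLANK, default_delimiter)      # explicit blank wins
overridden = resolve(";", default_delimiter)     # explicit value wins
```

Using separate markers keeps "never set" and "deliberately empty" apart, which a plain empty string could not do.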
Data Viewer Enhancements
Previously, Dataverse allowed filters to be applied in the Data Viewer and then added to the canvas as a Filter node or a Split node.
When the Filter or Split node is added to the canvas, the node's property panel displays the grid editor as before:
However, the Filter and Split nodes now use the Python-based Transform node. When the Advanced tab is opened, it shows the Python script that was automatically generated to implement the filter or split criteria:
If required, the Python script can be modified to create a custom filter or split criterion.
Directory View Panels
The width of the side panels in the Directory view can now be adjusted by grabbing the control at the edge of the panels. The panels can also be collapsed by double-clicking on the edge control:
Date times in the Directory view are now displayed in the local format for your system.
Note that the displayed date-time format depends on your browser's language settings.
The following issues have been resolved in this release. The fixes address reported issues with Dataverse nodes, improve the import of legacy data flows, streamline interworking with LDAP and improve application stability.
LAE-9037, LAE-9053, LAE-9052, LAE-8938, LAE-8758
LAE-9109, LAE-9074, LAE-8984, LAE-8983, LAE-8961
LAE-8951, LAE-8930, LAE-8921, LAE-8783, LAE-8270
LAE-7558, LAE-9016, LAE-9067, LAE-8977, LAE-8769
LAE-9036, LAE-9050, LAE-8986
See the release notes (PDF) for details of the resolved issues.