I’m pleased to announce the Dataverse 3.1.8 release is now available.
This Generally Available release provides a range of improvements that:
- simplify the configuration of the JDBC nodes
- improve the acquisition of data from Excel files
- extend Dataverse's Python scripting capabilities
- streamline access to data stored in a cloud-based Data Warehouse
This release also introduces a number of application enhancements and a range of operational, performance and stability improvements.
It is now simpler to configure the JDBC Query, JDBC Execute and JDBC Store nodes when using one of the database drivers that are shipped with Dataverse:
Microsoft SQL Server
When the ‘DbType’ property is set, the node automatically uses the appropriate driver shipped with Dataverse.
You can configure the node to access other databases using the properties in the Advanced section of the node property panel.
If the target database does not require a username and/or password, e.g. when querying an Access database, the corresponding properties can now be left unset.
You can access other databases or customize the attributes for a database connection using the properties in the node’s Advanced configuration section:
- If the 'DbUrl' property is specified, it overrides the settings entered in the 'DbType' and 'DbName' properties.
- Driver-specific configuration options can be configured as a series of key:value pairs with the ‘&’ character as the delimiter.
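As a rough illustration of the two rules above, the following Python sketch shows 'DbUrl' taking precedence over 'DbType' and 'DbName', and a 'key:value' option string being split on the ‘&’ delimiter. The function names, the host name and the option keys are hypothetical; this is not how the JDBC nodes are implemented, only a way to picture the behaviour described.

```python
def effective_target(db_url=None, db_type=None, db_name=None):
    """Return the settings that win: 'DbUrl' overrides 'DbType' and 'DbName'."""
    if db_url:
        return {"DbUrl": db_url}
    return {"DbType": db_type, "DbName": db_name}


def parse_driver_options(raw):
    """Split 'key:value&key:value' into a dict (format assumed from the text)."""
    options = {}
    for pair in raw.split("&"):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing '&'
        key, sep, value = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"Expected key:value, got {pair!r}")
        options[key.strip()] = value.strip()
    return options


# Hypothetical values, for illustration only.
print(effective_target(db_url="jdbc:sqlserver://dbhost:1433;databaseName=sales",
                       db_type="Microsoft SQL Server", db_name="ignored"))
print(parse_driver_options("loginTimeout:30&encrypt:true"))
# {'loginTimeout': '30', 'encrypt': 'true'}
```

Only the first ‘:’ in each pair is treated as the separator, so values that themselves contain a colon are preserved.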
You can deploy the JDBC database driver files to an accessible directory.
- The 'DbDriverClasspath' property can then be set to enable the node to identify the required driver file or the directory containing the driver and dependent files.
- Multiple directories can be specified as a semicolon-separated list.
- It is recommended that a separate sub-directory is used for each database driver to minimize the likelihood of any Java class clashes.
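To make the list format concrete, here is a minimal Python sketch that splits a semicolon-separated 'DbDriverClasspath' value into its entries. The function name and the directory paths are hypothetical, chosen to show one sub-directory per driver as recommended above.

```python
def split_driver_classpath(value):
    """Split a semicolon-separated classpath value into non-empty entries."""
    entries = [entry.strip() for entry in value.split(";")]
    return [entry for entry in entries if entry]


# Hypothetical layout: one sub-directory per database driver.
classpath = "/opt/jdbc/postgresql;/opt/jdbc/oracle"
print(split_driver_classpath(classpath))
# ['/opt/jdbc/postgresql', '/opt/jdbc/oracle']
```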
Excel File Node
The Excel File Node now provides an optional property ‘TrimFieldNames’ that allows you to specify whether to trim leading and trailing whitespace characters from the input column names when the data is imported from the Excel workbook(s).
You can use the AutoName option of the ‘DuplicateFieldNameAction’ property to disambiguate output field names where the input column names differ only by leading or trailing whitespace characters.
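The interaction between trimming and duplicate names can be pictured with a short Python sketch. It trims each column name and, where trimming produces a name that is already taken, appends a numeric suffix. The suffix scheme is an assumption made for illustration; the names AutoName actually generates may differ.

```python
def resolve_field_names(columns, trim=True):
    """Optionally trim column names, then make any duplicates unique."""
    used = set()
    result = []
    for name in columns:
        base = name.strip() if trim else name
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}{n}"  # assumed naming scheme, not Dataverse's
        used.add(candidate)
        result.append(candidate)
    return result


print(resolve_field_names(["Amount", " Amount ", "Date"]))
# ['Amount', 'Amount2', 'Date']
```

Without trimming, "Amount" and " Amount " are already distinct names, so no renaming is needed.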
Python Scripting Enhancements
A number of nodes use ‘record-key’ properties, namely:
- The join key fields in the correlation nodes - Join, Lookup, Merge
- The sort-by fields in the Sort and Aggregation nodes
- The group-by fields in the Transform node.
Pattern objects can now be used in record-key properties. This allows you to identify multiple input fields on which the node will join/sort/group data using the pattern match.
For example, you can specify a wildcard pattern in the ‘GroupBy’ script in the Aggregate node’s Advanced tab. The node will then group data by all input fields matching the pattern.
If a pattern is specified it must match at least one input field name.
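The matching rule can be sketched in plain Python. The example below uses the standard library's `fnmatch` module as a stand-in for Dataverse pattern objects, whose actual API is not shown here; the function name and field names are hypothetical.

```python
from fnmatch import fnmatchcase


def fields_matching(pattern, field_names):
    """Return the input fields matching a wildcard pattern, in input order."""
    matches = [name for name in field_names if fnmatchcase(name, pattern)]
    if not matches:
        # Mirrors the rule that a pattern must match at least one input field.
        raise ValueError(f"Pattern {pattern!r} matches no input field")
    return matches


fields = ["region_code", "region_name", "sales"]
print(fields_matching("region_*", fields))
# ['region_code', 'region_name']
```

With these inputs, a group-by on `region_*` would be equivalent to listing `region_code` and `region_name` explicitly.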
Amazon Redshift JDBC Driver
The Amazon Redshift JDBC driver is again included in the set of drivers that are shipped with Dataverse.
Application Start Time
The time taken for the Dataverse application to start up has been reduced.
Application Start Resource Requirements
When running on Linux, the Dataverse application will not start if insufficient system resources have been allocated for its correct operation, i.e. where the ‘ulimit’ maximum permitted number of files or processes is too low.
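If you want to inspect the relevant limits before starting the application, a small Python sketch along these lines can help. It reads the soft limits for open files and processes using the standard `resource` module (Unix only). The minimum values are hypothetical placeholders; this announcement does not state the thresholds Dataverse enforces.

```python
import resource

# Hypothetical minimums for illustration only.
MIN_OPEN_FILES = 4096
MIN_PROCESSES = 2048


def limit_shortfalls(min_files, min_procs):
    """Return (name, current, required) for each soft limit below its minimum."""
    checks = [
        ("open files", resource.RLIMIT_NOFILE, min_files),
        ("processes", resource.RLIMIT_NPROC, min_procs),
    ]
    shortfalls = []
    for name, res, required in checks:
        soft, _hard = resource.getrlimit(res)
        # RLIM_INFINITY means "unlimited", so it can never be too low.
        if soft != resource.RLIM_INFINITY and soft < required:
            shortfalls.append((name, soft, required))
    return shortfalls


for name, current, required in limit_shortfalls(MIN_OPEN_FILES, MIN_PROCESSES):
    print(f"ulimit for {name} is {current}; at least {required} assumed necessary")
```

The same values are reported by `ulimit -n` and `ulimit -u` in the shell of the user that runs the application.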
Access to Node Help
The link to the node help in the node properties panel now includes an ‘information’ icon to help users discover how to access node help.
The Dataverse audit log now contains information on the nodes that have been executed.
Log entries are created when a node is executed and when execution ends. The log entries include details of the data flow name, the node description, the username and whether the node execution passed or failed.
We are continuing to transition the Dataverse application to leverage Python rather than the proprietary Dataverse script language.
This release introduces a replacement for the Hash Split node. The previous version of the node is now called Hash Split (Superseded).
The Hash Split node enables you to split the input record set into multiple streams, allowing parallel processing of subsets of the input data.
The node generates a hash over the fields listed in the ‘SplitFields’ property.
Two output pins are present on the node by default. Additional output pins can be specified in the ‘Outputs’ section on the Define tab of the node’s configuration properties.
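The general idea of a hash split can be sketched in Python as follows. Each record's split-field values are hashed and the result, modulo the number of outputs, selects the output stream, so records with the same key always travel together. The hash function, the function names and the sample records are assumptions for illustration; the node's actual hashing scheme is not described in this announcement.

```python
import hashlib


def output_index(record, split_fields, num_outputs=2):
    """Choose an output stream from a hash of the record's split-field values."""
    key = "\x1f".join(str(record[field]) for field in split_fields)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_outputs


def hash_split(records, split_fields, num_outputs=2):
    """Distribute records across num_outputs lists by hashed key."""
    streams = [[] for _ in range(num_outputs)]
    for record in records:
        streams[output_index(record, split_fields, num_outputs)].append(record)
    return streams


# Hypothetical input records.
records = [
    {"customer": "A", "amount": 10},
    {"customer": "B", "amount": 20},
    {"customer": "A", "amount": 30},
]
for i, stream in enumerate(hash_split(records, ["customer"], num_outputs=3)):
    print(i, stream)
```

Because the stream is chosen from the key alone, both records for customer "A" land in the same output, which is what makes per-key processing safe to run in parallel.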
The Hash Split (Superseded) node remains available in the node library but the new node is recommended for use in current projects.
The following issues have been resolved in this release:
LAE-9559, LAE-9629, LAE-9701, LAE-9709, LAE-9722,
LAE-9723, LAE-9725, LAE-9731, LAE-9762, LAE-9766,
LAE-9767, LAE-9772, LAE-9786, LAE-9796, LAE-9809,
See the release notes for details of the resolved issues.
SharePoint 2010 nodes
The following nodes are deprecated in this release:
- Folder List for SharePoint 2010
- Download for SharePoint 2010
- Upload for SharePoint 2010
Equivalent functionality is available for use with SharePoint 2013 document libraries using the corresponding SharePoint 2013 nodes:
- Folder List for SharePoint 2013
- Download for SharePoint 2013
- Upload for SharePoint 2013
The SharePoint 2010 nodes are still available in the node library but the SharePoint 2013 nodes are recommended for current projects.
Note: Microsoft ended mainstream support for SharePoint 2010 in October 2015.