Saturday, 28 December 2019

SSAS Processing Error: Unicode string issue during dimension processing

SSAS and the database engine use different comparison rules depending on the collation, character sets, and handling of blanks in the middle or at the end of a string. This becomes an issue during SSAS processing when key values used to establish attribute relationships must be an exact match. Sometimes, what passes as a ‘match’ in the database engine is seen by SSAS as a non-matching value, resulting in processing errors that can be a challenge to track down if the value happens to be a blank! This article describes the problem in more detail and provides various workarounds.

Actual error (with placeholder values)

Errors in the OLAP storage engine: The attribute key cannot be found when processing: Table: 'MyDimTable', Column: 'Column1', Value: 'alzイ'. The attribute is 'Column1'.
Errors in the OLAP storage engine: The attribute key was converted to an unknown member because the attribute key was not found. Attribute MyDimTableKey of Dimension: MyDimTable from Database: MyDB, Record: 3.
Notice the Unicode value, where the blank terminator is from a Japanese character set. If you get the “attribute key cannot be found” error and the value contain a Unicode blank in the middle or end of the string, you are most likely seeing the effects of these different comparison rules.

Cause:

The problem arises when Analysis Services uses different comparison rules when processing attribute relationships.
By default, the relational database engine uses a width-insensitive collation, such that the following strings are interpreted as equivalent values:
  • string1+<double-byte-blank>+string2
  • string1+<single-byte-blank>+string2
Notice the first member has a double-byte space/blank and the second member has a single-byte space/blank, at the same position in the member name.
If these strings were used as keys to relate rows from different tables, the database engine would recognize these strings as the same value and create the relationship accordingly.
Now suppose that you are processing an Analysis Services database that uses these strings as KeyColumns in an attribute relationship. Unlike the database engine (set to width-insensitive collation), SSAS will interpret these as different strings, generating an error that a matching record cannot be found, and possibly registering one or more of the values as the unknown member.
The attribute key cannot be found because to SSAS, string1+<double-byte-blank>+string2 is not the same as string1+<single-byte-blank>+string2, and therefore fails to meet the criteria used to establish an attribute relationship.

Resolution:

If this behavior is acceptable, then you should do nothing.
However, if want to SSAS to exhibit the same behaviors as the relational database engine, you can use one of the following workarounds:
  • Set ProcessingGroup to ByTable (instead of the default, ByAttribute). This setting is specified in Dimension Designer, in SQL Server Data Tools, on the dimension definition.
  • Set server configuration property | Language/Collation, to be width-sensitive and the dimension definition to be width-sensitive. You can set this in Management Studio, in server properties.
By using either workaround, each string in our example, (string1+<double-byte-blank>+string2 and string1+<single-byte-blank>+string2) would each be considered a viable match for the other (in terms of an attribute relationship), allowing processing to succeed.
Alternatively, you can address the issue in the relational database by changing double-byte spaces to single-byte spaces. See Server-Side Programming with Unicode  . For information about the T-SQL REPLACE function, see http://technet.microsoft.com/en-us/library/ms186862.aspx 

Notes:

Width-insensitive is the default collation for the SQL Server relational engine, so if you are working with global data, you are more likely to run into this issue when processing an Analysis Services database.
DBCS is a double-byte character set. A blank character in DBCS is Unicode 12288 (hex 3000). SBCS is a single-byte character set. A blank character in SBCS is 32. Width-sensitivity on the collation will determine whether these are interpreted as the same or different values, for strings having a trailing blank.






No comments:

Post a Comment

Azure AzCopy Command in Action

Azure AzCopy Command  in Action -  Install - Module - Name Az - Scope CurrentUser - Repository PSGallery - Force # This simple PowerShell ...