Enrich Your Data through Attribute Mapping
This article explains attribute mapping in the data consolidation process, discussing different mapping criteria and conversion methods used to match and convert source attributes to the attributes defined in the data models.
Overview
In addition to eliminating duplicates in datasets, consolidation also helps to create a comprehensive and integrated view of the information provided by the data sources. Given that the attributes in your data sources may differ from the unified data models (UDM) in the Brinqa Platform, the consolidation feature enables you to establish mappings between source attributes and target attributes.
System administrators can view attribute mappings by navigating to Administration > Data > Models, locating the data model, and clicking Consolidation. The provided screenshot demonstrates the mapping of the Hostnames attribute within the Host data model in a test system. The Order precedence criterion dictates that the Hostnames attribute stores the first non-empty value encountered, following a top-to-bottom evaluation.
As with identifier designation, Manual entry takes precedence over other sources in attribute mapping by default: it is listed at the top of each drop-down list under Attribute mappings. You can adjust the order if this doesn't match your preference.
Most of the Brinqa Integration+ connectors provide a default mapping between source attributes and target attributes. You can view the attribute mapping on the Consolidation page of the data model and modify it if necessary. If the default mapping does not capture the attributes you need, you can add your own mapping to the list. If you have introduced custom attributes for the data model, you must add the appropriate mapping for those attributes.
Brinqa recommends ordering the source attributes according to their reliability and credibility for optimal results.
Mapping criteria
Along with Order precedence, Brinqa also supports Collection, Max, and Min as mapping criteria. The following table provides further information for each criterion:
Table 1: Supported mapping criteria
Criteria | Description | Example |
---|---|---|
Collection | Combine all distinct values from the source attributes and store them in the target attribute. The target attribute must be of a multivalued type to accommodate the combined values. | The mapping of the Tags attribute in the Host data model uses the Collection criterion, ensuring that the tags from all the sources are combined and stored. |
Max | Find the maximum value among the attributes and save the corresponding source attribute in the target attribute. If a comparison attribute is provided, the sorting takes place based on it; otherwise, it's performed on the source attribute. | The mapping of the Last assessed attribute in the Host data model uses the Max criterion, ensuring that the most recent date from all the sources is stored. To store the hostname of the last-assessed machine in the target Hostnames attribute, select Hostnames in Source attribute and Last assessed in Comparison attribute. |
Min | Find the minimum value among the attributes and save the corresponding source attribute in the target attribute. If a comparison attribute is provided, the sorting takes place based on it; otherwise, it's performed on the source attribute. | The mapping of the First seen attribute in the Host data model uses the Min criterion, ensuring that the earliest date from all the sources is stored. To store the hostname of the first-seen machine in the target Hostnames attribute, select Hostnames in Source attribute and First seen in Comparison attribute. |
Order precedence | Evaluate the source attributes from top to bottom and save the first non-empty value in the target attribute. Brinqa recommends ordering the source attributes according to their reliability and credibility for optimal results. | The mapping of the Operating system attribute in the Host data model uses the Order precedence criterion, ensuring that the most accurate value from all the sources is stored. |
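The four criteria can be illustrated with a minimal Python sketch. This is not the platform's implementation: the candidate dictionaries, the function names, and the choice to skip empty values under Max and Min are all assumptions made for illustration.

```python
# Hypothetical model: each candidate is one source's contribution, listed in
# the order configured under Attribute mappings. "comparison" is optional.

def is_empty(value):
    return value is None or value == "" or value == []

def order_precedence(candidates):
    """Return the first non-empty value, evaluating top to bottom."""
    for c in candidates:
        if not is_empty(c["value"]):
            return c["value"]
    return None

def collection(candidates):
    """Combine all distinct values into one multivalued result."""
    combined = []
    for c in candidates:
        values = c["value"] if isinstance(c["value"], list) else [c["value"]]
        for v in values:
            if not is_empty(v) and v not in combined:
                combined.append(v)
    return combined

def _sort_key(c):
    # Sort on the comparison attribute when provided, else on the value itself.
    return c["comparison"] if "comparison" in c else c["value"]

def extreme(candidates, pick=max):
    """Max (pick=max) or Min (pick=min); skipping empties is an assumption."""
    usable = [c for c in candidates
              if not is_empty(c["value"]) and not is_empty(_sort_key(c))]
    return pick(usable, key=_sort_key)["value"] if usable else None

# Hostnames from three sources, compared on a "Last assessed"-style date.
hosts = [
    {"value": "web-01", "comparison": "2024-01-05"},
    {"value": "web-01.example.com", "comparison": "2024-03-10"},
    {"value": "", "comparison": "2024-04-01"},
]
print(order_precedence(hosts))   # first non-empty value
print(extreme(hosts, max))       # hostname with the latest comparison date
print(extreme(hosts, min))       # hostname with the earliest comparison date
print(collection([{"value": ["prod", "web"]}, {"value": ["web", "linux"]}]))
```

The ISO-style date strings in the example compare correctly as plain strings, which keeps the sketch free of date parsing.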
Data conversions
When mapping source attributes to target attributes, the Brinqa Platform enables you to define how your data is formatted to ensure proper matching. This is done through the Conversion field (as shown in the screenshot from the overview), where you can indicate that the attribute type is boolean, date, number, or string, and specify its format if necessary. The platform uses the format you define to match the incoming data and stores it in the default format supported by the UDM. The string converter provides the additional ability to modify the consolidated value. By default, no specific format is set.
Boolean conversions
When you indicate that an attribute is a boolean, the imported value is trimmed to remove leading and trailing whitespace characters. If the trimmed value is "true" (case insensitive), "y" (case insensitive), or "1", the attribute is stored as true. If the value is anything else, or is null, the attribute is stored as false.
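The boolean rule can be written as a short Python sketch. The function name `to_boolean` is hypothetical; only the trimming and the accepted tokens come from the description above.

```python
TRUE_TOKENS = {"true", "y", "1"}

def to_boolean(raw):
    """Trim the value, then compare case-insensitively against the true tokens."""
    if raw is None:
        return False  # null is stored as false
    return raw.strip().lower() in TRUE_TOKENS

for sample in ["  TRUE ", "Y", "1", "yes", "", None]:
    print(repr(sample), to_boolean(sample))
```

Note that under this rule values such as "yes" or "on" are stored as false, so it is worth checking what a source actually sends before choosing the boolean conversion.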
Date conversions
Different data sources use varying formats to store dates, which can include diverse styles such as Unix time, ISO 8601, or custom date formats. These variations require careful consideration to ensure accurate data interpretation and processing. The following table summarizes the different date and time formats that the platform supports.
Table 2: Supported date conversions
Option | Description |
---|---|
Date Time Best Effort | Indicates that the platform should detect the date time format on a best-effort basis, following this order: Date Time Unix (Detect), Date Time RFC, then Date Time ISO. TIP: Brinqa recommends refraining from selecting this option, especially with large datasets. Although it's simpler for you, it's the most time-consuming and resource-intensive method due to the extensive evaluation required. |
Date Time Custom | Indicates that the date is in a proprietary format. If you select this option, a new field displays for you to specify the format using the Java Date Time Formatter patterns. As you type the patterns, the platform displays the output using the current date as an example. |
Date Time ISO | Indicates that the date is in ISO 8601 format: 'yyyy-MM-ddTHH:mm:ss.sss UTC'. The platform checks these possible patterns: |
Date Time RFC | Indicates that the date follows the RFC 1123 format: 'day, dd MMM yyyy HH:mm GMT'. |
Date Time Unix (Detect) | Indicates that the date is in Unix time, but the format is uncertain. The platform checks if it's in seconds or milliseconds by comparing it to the seconds-based Unix time 100 years in the future. Smaller numbers are assumed to be seconds, larger ones milliseconds. |
Date Time Unix (Milliseconds) | Indicates that the date is Unix time in milliseconds. |
Date Time Unix (Seconds) | Indicates that the date is Unix time in seconds. |
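The seconds-versus-milliseconds detection and the best-effort ordering can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's code: the function names are hypothetical, "100 years" is approximated as 36,525 days, and the standard-library parsers stand in for the platform's RFC and ISO handling (they may accept or reject slightly different inputs).

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def unix_detect(raw, now=None):
    """Treat numbers below 'now + 100 years' (in seconds) as seconds, else milliseconds."""
    now = now or datetime.now(timezone.utc)
    threshold = (now + timedelta(days=36525)).timestamp()  # assumed 100-year span
    value = float(raw)
    seconds = value if value < threshold else value / 1000.0
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def best_effort(raw, now=None):
    """Try Unix (Detect), then RFC, then ISO; return None if nothing matches."""
    parsers = (
        lambda s: unix_detect(s, now),
        parsedate_to_datetime,      # stand-in for the RFC option
        datetime.fromisoformat,     # stand-in for the ISO option
    )
    for parse in parsers:
        try:
            return parse(raw)
        except (TypeError, ValueError, OverflowError, OSError):
            continue
    return None

# The same instant expressed four ways.
for sample in ["1700000000", "1700000000000",
               "Tue, 14 Nov 2023 22:13:20 GMT", "2023-11-14T22:13:20+00:00"]:
    print(sample, "->", best_effort(sample))
```

The chain of attempts in `best_effort` also shows why the Best Effort option costs more than a specific format: every value that is not Unix time pays for at least one failed parse before it succeeds.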
Number conversions
The options for number attributes are straightforward: you can specify whether an incoming number is an integer or a real number.
String conversions
For string attributes, you can specify how the platform should convert your data before storing it: no conversion, simple string conversion, or regular expression (regex) conversion.
Simple string conversions
The Brinqa Platform offers several basic string conversion options, as outlined in the table below, to ensure consistency and cleanliness of your data before consolidation.
Table 3: Supported simple string conversions
Option | Description |
---|---|
Lowercase | Convert the incoming string to lowercase. |
Trim | Remove any leading and trailing whitespace from the incoming string. |
Trim and Lowercase | Remove any leading and trailing whitespace from the incoming string and convert it to lowercase. |
Trim and Uppercase | Remove any leading and trailing whitespace from the incoming string and convert it to uppercase. |
Uppercase | Convert the incoming string to uppercase. |
Regex conversions
With the Regex option, you can not only specify a pattern to match the incoming string but also use capture groups to extract information for further manipulation. As shown in the screenshot, you can input a pattern in the Regex field, and define how the matched string should be processed in the Substitution field.
For example, if the incoming string is 'abc' but you want to store it as 'cba', you can use the regex pattern (a)(b)(c) to match the string. This pattern creates three groups: group 1 contains 'a', group 2 contains 'b', and group 3 contains 'c'. Then, by using the {group number} syntax, you can specify {3}{2}{1} in the Substitution field to reorder the characters.
Brinqa recommends using tools like Regex101 to test and confirm that your pattern is correct.
Some key points to consider:
- When you specify a regex pattern, all instances of the pattern in the source value will be matched. Using the same example above, the result would be 'cbacba' if the incoming string is 'abcabc'.
- The substitution string can be left empty, in which case all matches will be dropped. For instance, if you do not specify a substitution string in the previous example, 'abc123abc' would become '123'.
- If the specified regex or substitution results in an empty string, the output will either be null or the value defined by the "null handler", if one is set.
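The substitution behavior described above can be approximated with Python's `re` module. This is a sketch only: the platform's regex engine may differ from Python's in details, and `apply_regex` and its `null_value` parameter are hypothetical names used to mimic the {group number} syntax and the null handler.

```python
import re

def apply_regex(value, pattern, substitution="", null_value=None):
    """Replace every match of pattern, expanding {n} to capture group n."""
    def expand(match):
        # Swap each {n} in the substitution for the text of group n.
        return re.sub(r"\{(\d+)\}",
                      lambda ref: match.group(int(ref.group(1))) or "",
                      substitution)
    result = re.sub(pattern, expand, value)
    # An empty result becomes null, or the null handler's value if one is set.
    return result if result != "" else null_value

print(apply_regex("abc", "(a)(b)(c)", "{3}{2}{1}"))      # groups reordered
print(apply_regex("abcabc", "(a)(b)(c)", "{3}{2}{1}"))   # every match is replaced
print(apply_regex("abc123abc", "(a)(b)(c)"))             # empty substitution drops matches
print(apply_regex("abc", "(a)(b)(c)", null_value="n/a")) # empty result uses the null value
```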
You can define more than one regex pattern by clicking Add Regex. To help you get started, the platform provides some commonly used patterns that you can select using the drop-down next to Add Regex:
- Add with whitespace handler: Starts the regex pattern with \s+ to match one or more whitespace characters.
- Add with empty string handler: Starts the regex pattern with ^$ to match an empty string.
- Add null handler: Adds a handler to manage situations where the incoming value is null.
When multiple regex patterns are specified, the platform applies them in order until it finds a match, at which point the corresponding substitution will be made. If more than one pattern matches, only the first match is applied. If none of the patterns match, the source value remains unchanged.
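The ordering rule can be sketched in Python as a loop over (pattern, substitution) pairs. The names `convert`, `rules`, and `null_value` are hypothetical, the substitutions here are plain strings without group references, and Python's `re` module stands in for the platform's regex engine.

```python
import re

def convert(value, rules, null_value=None):
    """Apply the first rule whose pattern matches; leave unmatched values unchanged."""
    if value is None:
        return null_value            # role played by the null handler
    for pattern, substitution in rules:
        if re.search(pattern, value):
            result = re.sub(pattern, substitution, value)
            return result if result != "" else null_value
    return value                     # no pattern matched

rules = [
    (r"^$", "unknown"),   # empty string handler
    (r"\s+", "_"),        # whitespace handler
]
print(convert("", rules))                        # handled by the first rule
print(convert("web server 01", rules))           # handled by the second rule
print(convert("web01", rules))                   # unchanged
print(convert(None, rules, null_value="n/a"))    # null handler value
print(convert("ab", [("a", "1"), ("b", "2")]))   # only the first matching rule applies
```

Because only the first matching rule is applied, more specific patterns generally belong above more general ones.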
Logging and troubleshooting
If the Brinqa Platform encounters a mismatch while converting your data, for example when the date format is set to ISO but the incoming value is an arbitrary string, the value for that attribute is discarded and the event is logged.
To view these logs, System administrators can navigate to Administration > System > Logs and enter the word 'dropped' to search for log messages similar to the following:
To keep the logs manageable, the platform records only the first 10 dropped values for each attribute.
There are two possible solutions when attribute values are dropped:
- If the attribute in the source record is in an incorrect format, the number of dropped values is usually small compared to the total record count. In this case, you can fix the format in the data source, rerun data integration, and then run consolidation again.
- If the format specified for data conversion is incorrect, a large number of values, or even all of them, may be dropped. In this case, update the option for data conversion and run consolidation again.