Data Consolidation

This article details the data consolidation process, exploring the steps involved in combining similar data from different sources and mapping source attributes to the attributes defined in the data models.

What is data consolidation?

Data consolidation refers to the process of combining data from multiple sources into a unified and consistent format. It involves gathering data from various systems, databases, or files and merging them into a single data set. The purpose of data consolidation is to not only eliminate duplicates, but also create a comprehensive and integrated view of the information that can be easily analyzed and used for decision-making purposes.

Consolidation plays a crucial role in the data orchestration process within the Brinqa Platform. After individual source data models (SDM) have been created from imported data, consolidation combines identical data from multiple sources and maps them to the unified data models (UDM). To illustrate this process, the following diagram simplifies it using two sources, CrowdStrike and Qualys Vulnerability Management (VM):

Consolidation diagram

Figure 1. Multiple source data models consolidated into unified data models.

When examining host details on the Inventory > Hosts page (hold the pointer over the record and click Details), you can click the Sources tab to view how the specific host record has been consolidated. The following screenshot showcases a host record consolidated from four different sources:

Sources tab in host details

For this host record, while the UID value is different across all sources, the MAC address is the same for Armis, CrowdStrike, and Qualys VM. Similarly, the hostname value is identical for Armis, CrowdStrike, and ServiceNow. The data from these four sources have been merged into a unified entry in the Brinqa Platform, effectively eliminating any duplications.

How does data consolidation work?

In the Brinqa Platform, consolidation is configured in the data model, which serves as a template for the imported data set. The Consolidation page, found within data models, offers two main functionalities:

  • Identifier designation: This feature lets you select and arrange the identifiers for data consolidation. By specifying which attributes serve as identifiers and in what order, you determine how data from different sources are combined and unified.

  • Attribute mapping: This feature enriches the consolidated data set by mapping attributes from different sources to corresponding attributes in the data model. This process ensures that the consolidated data set incorporates relevant and comprehensive information.

The following sections provide detailed explanations and examples for these functionalities.

Achieve data deduplication through identifier designation

When importing data from various sources, it's typical to encounter overlapping information, such as an identical MAC address or hostname for a given host. To avoid importing duplicate data sets, Brinqa Integration+ connectors designate specific attributes as identifiers for data consolidation.

An identifier is a piece of information that uniquely identifies the host, regardless of the data source that provides the information. For example, the following screenshot showcases the Host data model in a test system. The data models listed under Sources are the SDMs that the Brinqa Platform created after importing from those data sources, and the identifiers are those used to consolidate the source data sets. The number displayed next to each identifier represents the count of SDMs using this identifier. You can click the identifier to see which SDMs are using it.

Consolidation list of identifiers

When importing data from each source, Brinqa follows a top-to-bottom order of precedence for evaluating identifiers. It stops the evaluation after a match is found. Referring to the screenshot above, Brinqa starts by evaluating the Instance ID identifier.

  • If the instance ID of the incoming record matches that of an existing host, Brinqa consolidates the incoming record with the existing host and proceeds to the next record.
  • If the instance ID doesn't match, Brinqa creates a new host record.
  • If the instance ID is empty or missing from the incoming record, Brinqa then assesses the MAC addresses identifier in a similar manner, followed by the Public IP address, and so on. If none of the identifiers yield a match, Brinqa creates a new host record.
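The evaluation order above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Brinqa's implementation: the attribute names in `IDENTIFIER_ORDER`, the `consolidate` function, and the "fill in missing attributes" merge rule are all assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical identifier order, evaluated top to bottom.
IDENTIFIER_ORDER = ["instance_id", "mac_address", "public_ip_address", "hostname", "uid"]


def consolidate(incoming: dict, hosts: list[dict]) -> dict:
    """Merge `incoming` into a matching host, or add it as a new host."""
    for identifier in IDENTIFIER_ORDER:
        value = incoming.get(identifier)
        if not value:
            # Empty or missing: assess the next identifier in the list.
            continue
        for host in hosts:
            if host.get(identifier) == value:
                # Match found: merge and stop evaluating.
                # Assumed merge rule: only fill in attributes the host lacks.
                for key, val in incoming.items():
                    host.setdefault(key, val)
                return host
        # The identifier has a value but matches no host: create a new record.
        break
    new_host = dict(incoming)
    hosts.append(new_host)
    return new_host
```

Note that a record with a populated but unmatched instance ID becomes a new host even if its hostname matches an existing one, because evaluation stops at the first populated identifier.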

Data consolidation is case sensitive by default. However, you can click the Ignore Case toggle to enable case-insensitive matching on a per-identifier basis.

For multi-value attributes such as MAC addresses, Brinqa goes through each value in the incoming record one by one, looking for a match in an existing host. This evaluation stops as soon as a match is found; if no value matches, Brinqa proceeds to the next record.
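As a rough sketch of how case-insensitive and multi-value matching might behave, the hypothetical helper below compares incoming values against an existing host's values one by one. The function name `values_match` and its `ignore_case` parameter are illustrative stand-ins for the Ignore Case toggle, not part of any Brinqa API.

```python
def values_match(incoming, existing, ignore_case=False):
    """Return True if any incoming value equals any existing value."""

    def as_list(value):
        # Treat a single value and a collection of values uniformly.
        if value is None:
            return []
        return list(value) if isinstance(value, (list, tuple, set)) else [value]

    def normalize(value):
        # Case-sensitive by default; fold case only when requested.
        return value.casefold() if ignore_case and isinstance(value, str) else value

    existing_values = {normalize(v) for v in as_list(existing)}
    for value in as_list(incoming):
        if normalize(value) in existing_values:
            return True  # stop at the first match
    return False
```

With this sketch, `values_match(["AA:BB", "CC:DD"], ["cc:dd"])` is false, while the same call with `ignore_case=True` is true.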

If you have added or modified records manually, Brinqa adds an SDM named Manual entry to the list of sources for consolidation. By default, the Manual entry SDM takes precedence over other sources for identifier designation. In other words, Manual entry appears first in the drop-down list for each identifier. You can adjust the order if this doesn't align with your preference.

TIP #1

Brinqa recommends always placing UID last in the identifier list as a catch-all mechanism, because UID values tend to be unique to each source.

Why are there still duplicates sometimes?

While Brinqa offers a predefined list of identifiers and their prioritized order, it may not align perfectly with the reliability and credibility of your specific data sources. Consequently, this misalignment can occasionally lead to duplication in the data set. For example, if your machines can have multiple IP addresses but possess unique hostnames, using Brinqa's default order would result in duplicates because Public IP address is consolidated before Hostname, as shown in the previous screenshot. To address this issue, one possible solution is to move Hostname higher in the identifiers list.
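The effect of identifier order on duplicates can be illustrated with a small simulation. This is a simplified model under stated assumptions: `consolidate_all`, the attribute names, and the sample IP addresses and hostnames are invented for the example, and the matching rule is the "first populated identifier decides" behavior described earlier.

```python
def consolidate_all(records, order):
    """Consolidate records using the given identifier order; return the hosts."""
    hosts = []
    for record in records:
        target = None
        for identifier in order:
            value = record.get(identifier)
            if not value:
                continue  # empty or missing: try the next identifier
            # The first populated identifier decides: match or new record.
            target = next((h for h in hosts if h.get(identifier) == value), None)
            break
        if target is None:
            hosts.append(dict(record))
        else:
            for key, val in record.items():
                target.setdefault(key, val)
    return hosts


# One machine reported with two different public IP addresses.
records = [
    {"public_ip_address": "203.0.113.10", "hostname": "web01"},
    {"public_ip_address": "203.0.113.11", "hostname": "web01"},
]

ip_first = consolidate_all(records, ["public_ip_address", "hostname"])
hostname_first = consolidate_all(records, ["hostname", "public_ip_address"])
```

In this model, evaluating Public IP address first yields two host records for the same machine, while evaluating Hostname first yields one.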

Similarly, when multiple sources contain the same attribute, such as MAC address, there can be significant variations in the reliability and credibility of the source data. By default, Brinqa arranges the source attributes for each identifier according to the order of data integration creation. If CrowdStrike is consolidated before other sources, for example, but the data from CrowdStrike tends to be less reliable, your consolidated data may also become unreliable. To address this issue, you can move CrowdStrike lower in the attribute list for the MAC_ADDRESSES identifier or even remove it from the list entirely. This adjustment can help ensure the consolidation is based on more dependable data sources.

Lastly, duplicate entries may arise from the quality of your data. For example, if a significant number of your host records lack hostnames, they will be consolidated together based on the Hostname identifier. To address this situation, either populate the hostname for your records or move the Hostname identifier lower in the list. This adjustment can help mitigate the occurrence of duplicate entries.

Enrich data through attribute mapping

In addition to eliminating duplicates in data sets, consolidation also helps to create a comprehensive and integrated view of the information provided by the data sources. Given that the attributes in your data sources may differ from the predefined data models in the Brinqa Platform, the consolidation feature enables you to establish mappings between source attributes and target attributes.

The Brinqa Integration+ connectors provide a default mapping between source attributes and target attributes. You can view the attribute mapping on the Consolidation page of the data model and make modifications if necessary. In case the default mapping fails to capture the desired attributes, you can add your own mapping to the list. Furthermore, if you have introduced custom attributes for the data model, you must add the appropriate mapping for those attributes.

The following screenshot demonstrates the mapping of the Hostname attribute within the Host data model in a test system. The Order precedence criterion dictates that the Hostname attribute stores the first non-empty value encountered, following a top-to-bottom evaluation.

Consolidation identifier attribute mapping

Just like how identifier designation works, the Manual entry SDM takes precedence over other sources in attribute mapping. In other words, Manual entry is listed at the top of each drop-down list under Attribute mappings. You can adjust the order if this doesn't align with your preference.

TIP #2

Brinqa recommends ordering the source attributes according to their reliability and credibility for optimal results.

In addition to Order precedence, Brinqa also supports Collection, Max, and Min as mapping criteria. The following table provides further information for each criterion:

Criteria | Description | Example
--- | --- | ---
Collection | Combine all distinct values from the source attributes and store them in the target attribute. The target attribute must be of a multivalued type to accommodate the combined values. | The mapping of the Tags attribute in the Host data model uses the Collection criterion, ensuring that the tags from all the sources are combined and stored.
Max | Find the maximum value among the attributes and save the corresponding source attribute in the target attribute. If a comparison attribute is provided, the sorting takes place based on it; otherwise, it's performed on the source attribute. | The mapping of the Last assessed attribute in the Host data model uses the Max criterion, ensuring that the most recent date from all the sources is stored. To store the hostname of the last-assessed machine in the target Hostname attribute, select Hostname in Source attribute and Last assessed in Comparison attribute.
Min | Find the minimum value among the attributes and save the corresponding source attribute in the target attribute. If a comparison attribute is provided, the sorting takes place based on it; otherwise, it's performed on the source attribute. | The mapping of the First seen attribute in the Host data model uses the Min criterion, ensuring that the earliest date from all the sources is stored. To store the hostname of the first-seen machine in the target Hostname attribute, select Hostname in Source attribute and First seen in Comparison attribute.
Order precedence | Evaluate the source attributes from top to bottom and save the first non-empty value in the target attribute. Brinqa recommends ordering the source attributes according to their reliability and credibility for optimal results. | The mapping of the Operating system attribute in the Host data model uses the Order precedence criterion, ensuring that the most accurate value from all the sources is stored.
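The four mapping criteria can be approximated with the following Python sketch. The function names, the `rows` sample data, and the attribute keys are hypothetical and chosen only to mirror the descriptions above; they are not taken from the Brinqa Platform.

```python
from datetime import date


def order_precedence(values):
    """Return the first non-empty value, evaluating top to bottom."""
    return next((v for v in values if v not in (None, "", [])), None)


def collection(values):
    """Combine all distinct values from the sources, keeping first-seen order."""
    combined = []
    for source_values in values:
        for item in source_values or []:
            if item not in combined:
                combined.append(item)
    return combined


def pick(rows, source_attr, comparison_attr=None, use_max=True):
    """Max/Min: compare on `comparison_attr` if given, else on `source_attr`,
    and return the winning row's `source_attr` value."""
    key = comparison_attr or source_attr
    candidates = [r for r in rows if r.get(key) is not None]
    if not candidates:
        return None
    chooser = max if use_max else min
    return chooser(candidates, key=lambda r: r[key]).get(source_attr)


# Hypothetical source records for one host, listed in precedence order.
rows = [
    {"hostname": "web01", "tags": ["prod"], "last_assessed": date(2024, 1, 5)},
    {"hostname": "web01.example.com", "tags": ["prod", "linux"], "last_assessed": date(2024, 2, 1)},
]

hostname = order_precedence([r["hostname"] for r in rows])
tags = collection([r["tags"] for r in rows])
last_assessed = pick(rows, "last_assessed")                # Max on the attribute itself
latest_hostname = pick(rows, "hostname", "last_assessed")  # Max with a comparison attribute
```

Passing `use_max=False` gives the Min behavior, for example to select the hostname from the source with the earliest date.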

How are manual entries consolidated?

If you modify a UDM record manually, Brinqa adds an SDM to represent your manual entries. The following screenshot shows the Sources tab of a host record that has been manually modified. Notice the source named Brinqa Manual Entry:

Sources tab in host details

Manual entries undergo data orchestration as an SDM, thus they are evaluated against the same consolidation identifiers as the other sources. In this example, since the SDMs are first consolidated on the MAC_ADDRESSES attribute, the Brinqa Manual Entry adopts the same MAC address value, ensuring its consolidation into this UDM during subsequent orchestration processes.

To elaborate, suppose you want to add an Instance ID to this record, which is the first identifier evaluated for consolidation. You'll encounter an error upon saving your modification. This occurs because if the Instance ID field is updated with a new value, the Brinqa Manual Entry might merge with a different SDM when data orchestration runs, breaking its existing consolidation.

Furthermore, you cannot modify the MAC_ADDRESSES field of this record either. This restriction is in place because if the MAC address value is updated or removed, the UDM will have to consolidate based on Public IP address, which is the next identifier on the list. And since it's empty, the consolidation would occur on Hostname instead. In essence, this manual entry would no longer be consolidated on MAC_ADDRESSES, which is not allowed.

TIP #3

To maintain the integrity of the UDM, do not modify the attributes that have been used as consolidation identifiers.

Nevertheless, sometimes you might encounter an error, similar to the following, when trying to modify an attribute that isn't an identifier:

Consolidation identifier error

This error indicates that the identifier used by the record you're modifying is Hostname, but the first identifier specified in the corresponding data model is UID, so your modification cannot be saved. (A manual entry automatically sets the UID field by default.)

There are multiple situations where this error may occur. One scenario is when the order of identifiers listed for the host record, which can be verified on the Sources tab, differs from the order specified in the Host data model. The Brinqa Platform detects the mismatch and issues the error. To resolve this issue, adjust the order of identifiers in the data model to align with the host record.

TIP #4

Changing the order of identifiers in the data model could impact manual entries and lead to the loss of your modifications to the original UDM. This could happen when the manual entry SDM is no longer consolidated with the original UDM due to the change of identifiers.

Should you need to rearrange the identifiers, make sure to run the consolidation flow afterward and update the manual entries accordingly.

You might encounter the aforementioned error in another scenario, where the manual entry uses a different attribute than the identifier. For example, in the following screenshot, the identifier is Instance ID, but the attribute of the manual entry is UID. This inconsistency is not supported and would lead to the previous error.

Consolidation identifier manual entry

TIP #5

When setting up identifiers, the SDM attribute name must match the identifier name; only differences in letter case are allowed.

Not all data models support consolidation

Not all data models support consolidation out of the box. Parent data models, such as Asset, Finding, Finding Definition, or Ticket, lack the specificity to build the attack surface or comprehend the cybersecurity posture. Therefore, they don't support consolidation by default.

The following data models support consolidation out of the box:

  • Data models extending Asset: Account, Application, Certification, Cloud resource, Code project, Code repository, Container, Container image, Device, Host, Host image, IP range, Network segment, Package, Service, Site, Site certificate, and Subnet.

  • Data models extending Entity model: Assessment, Attack vector, Business service, Business unit, Company, CPE record, CVE record, Installed package, Threat intelligence, and Weakness.

  • Data models extending Finding: Dynamic code finding, Manual finding, Open source finding, Pentest finding, Static code finding, Violation, and Vulnerability.

  • Data models extending Finding definition: Dynamic code finding definition, Manual finding definition, Open source finding definition, Pentest finding definition, Static code finding definition, Violation definition, and Vulnerability definition.

  • Data models extending Ticket: Dynamic code ticket, Manual ticket, Open source ticket, Pentest ticket, Static code ticket, Violation ticket, and Vulnerability ticket.

To find out if the data model supports consolidation, follow these steps:

  1. Navigate to Administration > Data > Models.

  2. Locate the data model.

  3. On the Overview page, check if the Supports consolidation option is selected.

    If this option has been enabled, you should see Consolidation in the left menu.

Launch consolidation flows

If you have modified any consolidation settings in a data model, your changes will apply after the data orchestration runs. However, if you want the new consolidation to go into effect immediately, follow these steps:

  1. Navigate to Administration > Data > Models.

  2. Open the data model whose consolidation settings you modified and click Flows.

  3. Click the consolidation flow for your data model. For example, if you've modified the consolidation for the Host data model, click Host consolidation flow.

  4. Click Launch, and then click Launch again in the confirmation dialog.