
Data Consolidation

This article details the data consolidation process, exploring the steps involved in combining similar data from different resources and mapping source attributes to the attributes defined in the data models.

What is data consolidation?

Data consolidation refers to the process of combining data from multiple sources into a unified and consistent format. It involves gathering data from various systems, databases, or files and merging them into a single data set. The purpose of data consolidation is to not only eliminate duplicates, but also create a comprehensive and integrated view of the information that can be easily analyzed and used for decision-making purposes.

Consolidation plays a crucial role in the data orchestration process within the Brinqa Platform. After individual source data models (SDM) have been created from imported data, consolidation combines identical data from multiple sources and maps them to the unified data models (UDM). To illustrate this process, the following diagram simplifies it using two sources, CrowdStrike and Qualys Vulnerability Management (VM):

Consolidation diagram

Figure 1. Multiple source data models consolidated into unified data models.

When examining host details on the Inventory > Hosts page (hold the pointer over the record and click Details), you can click the Sources tab to view how the specific host record has been consolidated. The following screenshot showcases a host record consolidated from four different sources:

Sources tab in host details

For this host record, while the UID value is different across all sources, the MAC address is the same for Armis, CrowdStrike, and Qualys VM. Similarly, the hostname value is identical for Armis, CrowdStrike, and ServiceNow. The data from these four sources have been merged into a unified entry in the Brinqa Platform, effectively eliminating any duplications.

How does data consolidation work?

In the Brinqa Platform, consolidation is configured in the data model, which serves as a template for the imported data set. The Consolidation page, found within data models, offers two main functionalities:

  • Identifier designation: This feature lets you select and arrange the identifiers for data consolidation. By specifying which attributes serve as identifiers and in what order, you determine how data from different sources are combined and unified.

  • Attribute mapping: This feature enriches the consolidated data set by mapping attributes from different sources to corresponding attributes in the data model. This process ensures that the consolidated data set incorporates relevant and comprehensive information.

The following sections provide detailed explanations and examples for these functionalities.

Achieve data deduplication through identifier designation

When importing data from various sources, it's typical to encounter overlapping information, such as the same MAC address or hostname reported for a given host by more than one source. To avoid importing duplicate data sets, Brinqa Integration+ connectors designate specific attributes as identifiers for data consolidation.

An identifier is a piece of information that uniquely identifies the host, regardless of the data source that provides the information. For example, the following screenshot showcases the Host data model in a test system. The data models listed under Sources are SDM created by the Brinqa Platform after importing from those data sources, and the identifiers are those used to consolidate the source data sets. The number displayed next to each identifier represents the count of SDM using this identifier. You can click the identifier to see which SDM is using it.

Consolidation list of identifiers

When importing data from each source, Brinqa follows a top-to-bottom order of precedence for evaluating identifiers. It stops the evaluation after a match is found. Referring to the screenshot above, Brinqa starts by evaluating the Instance ID identifier.

  • If the instance ID of the incoming record matches that of an existing host, Brinqa consolidates the incoming record with the existing host and proceeds to the next record.
  • If the instance ID doesn't match, Brinqa creates a new host record.
  • If the instance ID is empty or missing from the incoming record, Brinqa then assesses the MAC addresses identifier in a similar manner, followed by the Public IP address, and so on. If none of the identifiers yield a match, Brinqa creates a new host record.

For multi-value attributes such as MAC addresses, Brinqa goes through each value in the incoming record one by one, looking for a match in an existing host. Again, the evaluation stops at the first match. If none of the values match, Brinqa proceeds to the next record.
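The evaluation order described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Brinqa's implementation: the identifier names in `IDENTIFIER_ORDER`, the record layout, and the helper functions are all hypothetical.

```python
from typing import Any, Optional

# Hypothetical identifier order, highest precedence first (names are illustrative).
IDENTIFIER_ORDER = ["instance_id", "mac_addresses", "public_ip", "hostname", "uid"]


def as_values(value: Any) -> list:
    """Normalize a single value, a list of values, or None to a list of non-empty values."""
    if value is None or value == "":
        return []
    if isinstance(value, (list, tuple)):
        return [v for v in value if v not in (None, "")]
    return [value]


def find_match(record: dict, hosts: list, order: list) -> Optional[dict]:
    """Return the existing host the record consolidates with, or None for a new host."""
    for identifier in order:
        incoming = as_values(record.get(identifier))
        if not incoming:
            continue  # empty or missing: fall through to the next identifier
        for value in incoming:  # multi-value attributes are checked value by value
            for host in hosts:
                if value in as_values(host.get(identifier)):
                    return host  # stop at the first match
        return None  # identifier has a value but nothing matched: new host record
    return None  # no identifier had a value


def consolidate(record: dict, hosts: list, order: list = IDENTIFIER_ORDER) -> dict:
    """Merge the record into a matching host, or append a new host to the list."""
    host = find_match(record, hosts, order)
    if host is None:
        host = {"sources": []}
        hosts.append(host)
    host["sources"].append(record["source"])
    for identifier in order:
        merged = as_values(host.get(identifier))
        for value in as_values(record.get(identifier)):
            if value not in merged:
                merged.append(value)
        host[identifier] = merged
    return host


hosts = []
consolidate({"source": "Qualys VM", "mac_addresses": ["aa:bb:cc:00:00:01"],
             "hostname": "web01", "uid": "q-1"}, hosts)
consolidate({"source": "CrowdStrike", "uid": "c-9",
             "mac_addresses": ["aa:bb:cc:00:00:02", "aa:bb:cc:00:00:01"]}, hosts)
print(len(hosts), hosts[0]["sources"])  # 1 ['Qualys VM', 'CrowdStrike']
```

Neither record carries an instance ID, so the sketch falls through to the MAC addresses identifier, where the second value of the CrowdStrike record matches the existing host.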

tip

Brinqa recommends that you always put UID last in the identifier list as a catch-all mechanism, because UID values tend to be unique to each source.

Why are there still duplicates sometimes?

While Brinqa offers a predefined list of identifiers and their prioritized order, it may not align perfectly with the reliability and credibility of your specific data sources. Consequently, this misalignment can occasionally lead to duplication in the data set. For example, if your machines can have multiple IP addresses but possess unique hostnames, using Brinqa's default order would result in duplicates because Public IP address is consolidated before Hostname, as shown in the previous screenshot. To address this issue, one possible solution is to move Hostname higher in the identifiers list.
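To make the effect of identifier order concrete, the following sketch consolidates the same two records twice, once with Public IP address ahead of Hostname and once with the order reversed. It is a simplified model rather than Brinqa's code: it assumes single-valued attributes, and the source names, field names, and the `consolidate_all` function are hypothetical.

```python
def consolidate_all(records, order):
    """Group records into hosts; each host is the list of records merged into it."""
    hosts = []
    for record in records:
        target = None
        for identifier in order:
            value = record.get(identifier)
            if not value:
                continue  # missing value: try the next identifier
            # The first identifier that has a value decides the outcome.
            target = next(
                (h for h in hosts if any(r.get(identifier) == value for r in h)),
                None,
            )
            break
        if target is None:
            hosts.append([record])  # no match: new host record
        else:
            target.append(record)  # match: consolidate into the existing host
    return hosts


# The same machine reported by two hypothetical sources with different public IPs.
records = [
    {"source": "A", "public_ip": "203.0.113.10", "hostname": "web01"},
    {"source": "B", "public_ip": "203.0.113.25", "hostname": "web01"},
]

ip_first = consolidate_all(records, ["public_ip", "hostname"])
hostname_first = consolidate_all(records, ["hostname", "public_ip"])
print(len(ip_first), len(hostname_first))  # 2 1
```

With Public IP address first, the second record has an IP value that matches nothing, so it becomes a duplicate host. With Hostname first, both records land in one host.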

Similarly, when multiple sources contain the same attribute, such as MAC address, there can be significant variations in the reliability and credibility of the source data. By default, Brinqa arranges the source attributes for each identifier according to the order of data integration creation. If CrowdStrike is consolidated before other sources, for example, but the data from CrowdStrike tends to be less reliable, your consolidated data may also become unreliable. To address this issue, you can move CrowdStrike lower in the attribute list for the MAC_ADDRESSES identifier or even remove it from the list entirely. This adjustment can help ensure the consolidation is based on more dependable data sources.

Lastly, duplicate entries may arise from the quality of your data. For example, if a significant number of your host records lack hostnames, they will be consolidated together based on the Hostname identifier. To address this situation, either populate the hostname for your records or move the Hostname identifier lower in the list. This adjustment can help mitigate the occurrence of duplicate entries.

Enrich data through attribute mapping

In addition to eliminating duplicates in data sets, consolidation also helps to create a comprehensive and integrated view of the information provided by the data sources. Given that the attributes in your data sources may differ from the predefined data models in the Brinqa Platform, the consolidation feature enables you to establish mappings between source attributes and target attributes.

The Brinqa Integration+ connectors provide a default mapping between source attributes and target attributes. You can view the attribute mapping on the Consolidation page of the data model and make modifications if necessary. In case the default mapping fails to capture the desired attributes, you can add your own mapping to the list. Furthermore, if you have introduced custom attributes for the data model, you must add the appropriate mapping for those attributes.

The following screenshot shows the mapping of the Hostname attribute within the Host data model in a test system. The Order precedence criterion dictates that the Hostname attribute stores the first non-empty value encountered, following a top-to-bottom evaluation.

Consolidation identifier attribute mapping

tip

Brinqa recommends ordering the source attributes according to their reliability and credibility for optimal results.

In addition to Order precedence, Brinqa also supports Collection, Max, and Min as mapping criteria. The following table provides further information for each criterion:

| Criteria | Description | Example |
| --- | --- | --- |
| Collection | Combine all distinct values from the source attributes and store them in the target attribute. The target attribute must be of a multivalued type to accommodate the combined values. | The mapping of the Tags attribute in the Host data model uses the Collection criterion, ensuring that the tags from all the sources are combined and stored. |
| Max | Find the maximum value among the attributes and save the corresponding source attribute in the target attribute. If a comparison attribute is provided, the sorting takes place based on it; otherwise, it's performed on the source attribute. | The mapping of the Last assessed attribute in the Host data model uses the Max criterion, ensuring that the most recent date from all the sources is stored. To store the hostname of the last-assessed machine in the target Hostname attribute, select Hostname in Source attribute and Last assessed in Comparison attribute. |
| Min | Find the minimum value among the attributes and save the corresponding source attribute in the target attribute. If a comparison attribute is provided, the sorting takes place based on it; otherwise, it's performed on the source attribute. | The mapping of the First seen attribute in the Host data model uses the Min criterion, ensuring that the earliest date from all the sources is stored. To store the hostname of the first-seen machine in the target Hostname attribute, select Hostname in Source attribute and First seen in Comparison attribute. |
| Order precedence | Evaluate the source attributes from top to bottom and save the first non-empty value in the target attribute. Brinqa recommends ordering the source attributes according to their reliability and credibility for optimal results. | The mapping of the Operating system attribute in the Host data model uses the Order precedence criterion, ensuring that the most accurate value from all the sources is stored. |
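The four mapping criteria can be approximated with the short Python sketch below. It only illustrates the selection logic described in the table. The function names, the row layout, and the sample values are assumptions made for this example and are not part of the Brinqa Platform.

```python
from datetime import date
from typing import Any, Optional


def order_precedence(values: list) -> Any:
    """First non-empty value, scanning the ordered source values top to bottom."""
    return next((v for v in values if v not in (None, "", [])), None)


def collection(values: list) -> list:
    """All distinct values across sources, preserving first-seen order."""
    merged = []
    for value in values:
        items = value if isinstance(value, list) else [value]
        for item in items:
            if item not in (None, "") and item not in merged:
                merged.append(item)
    return merged


def pick(rows: list, source_attr: str, comparison_attr: Optional[str] = None,
         use_max: bool = True) -> Any:
    """Max/Min: return source_attr from the row with the largest/smallest comparison value.

    Without a comparison attribute, the source attribute itself is compared.
    """
    key = comparison_attr or source_attr
    candidates = [r for r in rows
                  if r.get(key) is not None and r.get(source_attr) is not None]
    if not candidates:
        return None
    chooser = max if use_max else min
    return chooser(candidates, key=lambda r: r[key])[source_attr]


# Hypothetical source rows for one host, listed in mapping order.
rows = [
    {"source": "CrowdStrike", "hostname": "web01.corp", "tags": ["prod"],
     "last_assessed": date(2024, 3, 1), "os": ""},
    {"source": "Qualys VM", "hostname": "WEB01", "tags": ["prod", "linux"],
     "last_assessed": date(2024, 5, 12), "os": "Ubuntu 22.04"},
]

print(order_precedence([r["os"] for r in rows]))   # Ubuntu 22.04
print(collection([r["tags"] for r in rows]))       # ['prod', 'linux']
print(pick(rows, "last_assessed"))                 # 2024-05-12
print(pick(rows, "hostname", "last_assessed"))     # WEB01
```

The last call mirrors the comparison-attribute example in the table: the hostname is taken from whichever source row has the most recent Last assessed date.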

What happens with manual entries?

If you have added or modified records manually, Brinqa adds an SDM named Manual entry to the list of sources for consolidation. By default, Manual entry takes precedence over other SDM for both identifier designation and attribute mapping. In other words, Manual entry is listed at the top of each drop-down list under Identifiers and Attribute mappings. You can adjust the order if this doesn't align with your preference.

Not all data models support consolidation

Not all data models support consolidation out of the box. Parent data models, such as Asset, Finding, Finding Definition, or Ticket, lack the specificity to build the attack surface or comprehend the cybersecurity posture. Therefore, they don't support consolidation by default.

The following data models support consolidation out of the box:

  • Data models extending Asset: Account, Application, Certification, Cloud resource, Code project, Code repository, Container, Container image, Device, Host, Host image, IP range, Network segment, Package, Service, Site, Site certificate, and Subnet.

  • Data models extending Entity model: Assessment, Attack vector, Business service, Business unit, Company, CPE record, CVE record, Installed package, Threat intelligence, and Weakness.

  • Data models extending Finding: Dynamic code finding, Manual finding, Open source finding, Pentest finding, Static code finding, Violation, and Vulnerability.

  • Data models extending Finding definition: Dynamic code finding definition, Manual finding definition, Open source finding definition, Pentest finding definition, Static code finding definition, Violation definition, and Vulnerability definition.

  • Data models extending Ticket: Dynamic code ticket, Manual ticket, Open source ticket, Pentest ticket, Static code ticket, Violation ticket, and Vulnerability ticket.

To find out if the data model supports consolidation, follow these steps:

  1. Navigate to Administration > Data > Models.

  2. Locate the data model.

  3. On the Overview page, check if the Supports consolidation option is selected.

    If this option has been enabled, you should see Consolidation in the left menu.