HEBench
Simple Set Intersection Workload

Operation Description

Summary

This operation is defined as:

Z = op(X, Y)

where

op(X, Y) = X ⋂ Y

and operation is the simple set intersection.

Operation Parameters

Input: 2 parameters

Parameter Description
0 X is a dataset containing n items.
1 Y is a dataset containing m items.

An item is a vector of k >= 1 elements.

Result:

Output: 1 output

Output Description
0 Z is a set with, at most min(n, m) items, where every item in Z is present in both X and Y.

If the number of items in Z is less than min(n, m), backend must pad the remaining items with 0. Items in Z have no defined ordering: ordering is up to the backend implementation.

Details

If A is a set, and a_i is an item in A, then, the standard simple set intersection operation is defined as:

Z = { z_i; where z_i ∈ X and z_i ∈ Y }

Workloads

This document applies to the following workloads:

Workload Parameters

Required workload parameters: 3

Index Name Type Description
0 n uint64_t Size of dataset X (number of items in set).
1 m uint64_t Size of dataset Y (number of items in set).
2 k uint64_t Number of elements in an item of a dataset. Must be greater than 0.

A backend must specify, at least, a set of default arguments for these parameters.

Backends can require extra parameters beyond the base requirements. If a backend requires extra parameters, these must have default values in every set of default arguments for the workload parameters.

Categories

This workload supports the following categories:

Category Parameters

Latency

See hebench::APIBridge::CategoryParams::latency.

Offline

See hebench::APIBridge::CategoryParams::offline.

Value ranges for elements in CategoryParams::offline::data_count. Default value is used when the backend implementation sets the data_count for the corresponding operand to 0, but user specified 0 or no value at run-time.

Parameter Lower bound Upper bound Default
0 1 none 5
1 1 none 5

Data Type

This workload is defined for the following data types:

@ Float64
64 bits IEEE 754 standard floating point real numbers.
Definition: types.h:306
@ Int64
64 bits signed integers.
Definition: types.h:304
@ Float32
32 bits IEEE 754 standard floating point real numbers.
Definition: types.h:382
@ Int32
32 bits signed integers.
Definition: types.h:380

Data Layout

1. Sets

All the items in a set are vectors with elements of type T (where T is any of the supported numeric types). Every element of an item lies contiguous in memory, and every item also lies contiguous in memory.

For example, given the following set X with n = 3, k = 2 and a set being represented by a vector components:

X = { [ x_00, x_01], [x_10, x_11], [x_20, x21] }

The elements will be stored in memory as:

Offset: 0 1 2 3 4 5
X x_00 x_01 x_10 x_11 x_20 x21

For this case, backends should expect this layout for their raw, clear text inputs, and must generate this layout for their decoded outputs.

Dataset

Supported modes:

Generate External
yes yes

Data Generation

Data generation for sets used as input for this workload occurs during workload initialization by Test Harness. Ground truths are pre-computed during data generation. There is no standard dataset.

During data generation, all set elements are extracted from a pseudo-random uniform distribution between -16384 and 16384: u(-16384, 16384).

External Datasets

From external datasets, if an item is shorter than k, it will be padded. If an item is longer than k it will be truncated and a warning will be displayed.