Compute Functions
Datum class
- class Datum
Variant type for various Arrow C++ data structures.
Public Functions
- Datum()
Empty datum, to be populated elsewhere.
- ValueDescr descr() const
Return the shape (array or scalar) and type for supported kinds (ARRAY, CHUNKED_ARRAY, and SCALAR). Debug asserts otherwise.
- ValueDescr::Shape shape() const
Return the shape (array or scalar) for supported kinds (ARRAY, CHUNKED_ARRAY, and SCALAR). Debug asserts otherwise.
- std::shared_ptr<DataType> type() const
The value type of the variant, if any.
Returns: nullptr if no type.
- std::shared_ptr<Schema> schema() const
The schema of the variant, if any.
Returns: nullptr if no schema.
- int64_t length() const
The value length of the variant, if any.
Returns: kUnknownLength if no type.
- ArrayVector chunks() const
The array chunks of the variant, if any.
Returns: empty if not array-like.
Abstract Function classes
- struct FunctionOptions
#include <arrow/compute/function.h>
Base class for specifying options configuring a function’s behavior, such as error handling.
Subclassed by arrow::compute::ArithmeticOptions, arrow::compute::ArraySortOptions, arrow::compute::CastOptions, arrow::compute::CompareOptions, arrow::compute::CountOptions, arrow::compute::DictionaryEncodeOptions, arrow::compute::FilterOptions, arrow::compute::MatchSubstringOptions, arrow::compute::MinMaxOptions, arrow::compute::ModeOptions, arrow::compute::PartitionNthOptions, arrow::compute::ProjectOptions, arrow::compute::QuantileOptions, arrow::compute::ReplaceSubstringOptions, arrow::compute::SetLookupOptions, arrow::compute::SortOptions, arrow::compute::SplitOptions, arrow::compute::StrptimeOptions, arrow::compute::TDigestOptions, arrow::compute::TakeOptions, arrow::compute::TrimOptions, arrow::compute::VarianceOptions
- struct Arity
#include <arrow/compute/function.h>
Contains the number of required arguments for the function.
Naming conventions taken from https://en.wikipedia.org/wiki/Arity.
Public Members
- int num_args
The number of required arguments (or the minimum number for varargs functions).
- bool is_varargs = false
If true, then num_args is the minimum number of required arguments.
Public Static Functions
- int
- struct FunctionDoc
#include <arrow/compute/function.h>
Public Members
- std::string summary
A one-line summary of the function, using a verb.
For example, “Add two numeric arrays or scalars”.
- std::string description
A detailed description of the function, meant to follow the summary.
- std::vector<std::string> arg_names
Symbolic names (identifiers) for the function arguments.
Some bindings may use this to generate nicer function signatures.
- std::string options_class
Name of the options class, if any.
- std::string
- class Function
#include <arrow/compute/function.h>
Base class for compute functions.
Function implementations contain a collection of “kernels” which are implementations of the function for specific argument types. Selecting a viable kernel for executing a function is referred to as “dispatching”.
Subclassed by arrow::compute::detail::FunctionImpl< VectorKernel >, arrow::compute::detail::FunctionImpl< ScalarKernel >, arrow::compute::detail::FunctionImpl< ScalarAggregateKernel >, arrow::compute::detail::FunctionImpl< HashAggregateKernel >, arrow::compute::MetaFunction, arrow::compute::detail::FunctionImpl< KernelType >
Public Types
- enum Kind
The kind of function, which indicates in what contexts it is valid for use.
Values:
A function that performs scalar data operations on whole arrays of data. Can generally process Array or Scalar values. The size of the output will be the same as the size (or broadcasted size, in the case of mixing Array and Scalar inputs) of the input.
A function with array input and output whose behavior depends on the values of the entire arrays passed, rather than on each value independently.
A function that computes scalar summary statistics from array input.
A function that computes grouped summary statistics from array input and an array of group identifiers.
A function that dispatches to other functions and does not contain its own kernels.
Public Functions
- const std::string &name() const
The name of the function. The registry enforces uniqueness of names.
- Function::Kind kind() const
The kind of function, which indicates in what contexts it is valid for use.
- const Arity &arity() const
The number of arguments the function requires, and whether it accepts a variable number of arguments.
- const FunctionDoc &doc() const
Return the function documentation.
- virtual int num_kernels() const = 0
Returns the number of registered kernels for this function.
- virtual Result<const Kernel *> DispatchExact(const std::vector<ValueDescr> &values) const
Return a kernel that can execute the function given the exact argument types (without implicit type casts or scalar->array promotions).
NB: This function is overridden in CastFunction.
- virtual Result<const Kernel *> DispatchBest(std::vector<ValueDescr> *values) const
Return a best-match kernel that can execute the function given the argument types, after implicit casts are applied.
Parameters:
[inout] values: Argument types. An element may be modified to indicate that the returned kernel only approximately matches the input value descriptors; callers are responsible for casting inputs to the type and shape required by the kernel.
- virtual Result<Datum> Execute(const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx) const
Execute the function eagerly with the passed input arguments, with kernel dispatch, batch iteration, and memory allocation details taken care of.
If the options pointer is null, then default_options() will be used. This function can be overridden in subclasses.
- const FunctionOptions *default_options() const
Returns the default options for this function.
Whatever option semantics a Function has, implementations must guarantee that default_options() is valid to pass to Execute as options.
-
- class ScalarFunction : public arrow::compute::detail::FunctionImpl<ScalarKernel>
#include <arrow/compute/function.h>
A function that executes elementwise operations on arrays or scalars, and therefore whose results generally do not depend on the order of the values in the arguments.
Accepts and returns arrays that are all of the same size. These functions roughly correspond to the functions used in SQL expressions.
Subclassed by arrow::compute::CastFunction
Public Functions
- Status AddKernel(std::vector<InputType> in_types, OutputType out_type, ArrayKernelExec exec, KernelInit init = NULLPTR)
Add a kernel with the given input/output types, no required state initialization, preallocation for fixed-width types, and default null handling (intersect validity bitmaps of inputs).
- Status
- class VectorFunction : public arrow::compute::detail::FunctionImpl<VectorKernel>
#include <arrow/compute/function.h>
A function that executes general array operations that may yield outputs of different sizes or have results that depend on the whole array contents.
These functions roughly correspond to the functions found in non-SQL array languages like APL and its derivatives.
- class ScalarAggregateFunction : public arrow::compute::detail::FunctionImpl<ScalarAggregateKernel>
#include <arrow/compute/function.h>
- class HashAggregateFunction : public arrow::compute::detail::FunctionImpl<HashAggregateKernel>
#include <arrow/compute/function.h>
- class MetaFunction : public arrow::compute::Function
#include <arrow/compute/function.h>
A function that dispatches to other functions.
Subclasses must implement MetaFunction::ExecuteImpl.
For Array, ChunkedArray, and Scalar Datum kinds, it may rely on the execution of concrete Function types, but it must handle other Datum kinds on its own.
Public Functions
- int num_kernels() const
Returns the number of registered kernels for this function.
- Result<Datum> Execute(const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx) const
Execute the function eagerly with the passed input arguments, with kernel dispatch, batch iteration, and memory allocation details taken care of.
If the options pointer is null, then default_options() will be used. This function can be overridden in subclasses.
- int
Function registry
- class FunctionRegistry
A mutable central function registry for built-in functions as well as user-defined functions.
Functions are implementations of arrow::compute::Function.
Generally, each function contains kernels which are implementations of a function for a specific argument signature. After looking up a function in the registry, one can either execute it eagerly with Function::Execute or use one of the function’s dispatch methods to pick a suitable kernel for lower-level function execution.
Public Functions
Add a new function to the registry.
Returns Status::KeyError if a function with the same name is already registered.
- Status AddAlias(const std::string &target_name, const std::string &source_name)
Add aliases for the given function name.
Returns Status::KeyError if the function with the given name is not registered.
- Result<std::shared_ptr<Function>> GetFunction(const std::string &name) const
Retrieve a function by name from the registry.
- std::vector<std::string> GetFunctionNames() const
Return a vector of all entry names in the registry.
Helpful for displaying a manifest of available functions.
- int num_functions() const
The number of currently registered functions.
Public Static Functions
- static std::unique_ptr<FunctionRegistry> Make()
Construct a new registry.
Most users only need to use the global registry.
- FunctionRegistry *arrow::compute::GetFunctionRegistry()
Return the process-global function registry.
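The registry contract above (unique names, aliases, lookup by name) can be modeled in a few lines of standard C++. This is a hypothetical sketch, not Arrow's implementation: `sketch::Registry`, `sketch::Function`, and `sketch::StatusCode` are illustrative stand-ins, `GetFunction` returns `nullptr` where the real method returns an error `Result`, and treating `source_name` as the existing function and `target_name` as the new alias is an assumption.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of a name -> function registry; not Arrow's class.
namespace sketch {

enum class StatusCode { kOk, kKeyError };

struct Function {
  std::string name;
};

class Registry {
 public:
  // Fails with kKeyError if a function with the same name is already registered.
  StatusCode AddFunction(std::shared_ptr<Function> func) {
    const std::string name = func->name;
    const bool inserted = by_name_.emplace(name, std::move(func)).second;
    return inserted ? StatusCode::kOk : StatusCode::kKeyError;
  }

  // Assumption: source_name is the already-registered function and
  // target_name is the new alias that resolves to it.
  StatusCode AddAlias(const std::string& target_name,
                      const std::string& source_name) {
    auto it = by_name_.find(source_name);
    if (it == by_name_.end()) return StatusCode::kKeyError;
    by_name_[target_name] = it->second;
    return StatusCode::kOk;
  }

  // Simplification: returns nullptr instead of an error Result on a miss.
  std::shared_ptr<Function> GetFunction(const std::string& name) const {
    auto it = by_name_.find(name);
    return it == by_name_.end() ? nullptr : it->second;
  }

  // Aliases are listed alongside the original names, in sorted order.
  std::vector<std::string> GetFunctionNames() const {
    std::vector<std::string> names;
    for (const auto& entry : by_name_) names.push_back(entry.first);
    return names;
  }

  int num_functions() const { return static_cast<int>(by_name_.size()); }

 private:
  std::map<std::string, std::shared_ptr<Function>> by_name_;
};

}  // namespace sketch
```

Storing an alias as an extra map entry that shares the same `shared_ptr` keeps lookup a single map access in this model.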
Convenience functions
- Result<Datum> arrow::compute::CallFunction(const std::string &func_name, const std::vector<Datum> &args, const FunctionOptions *options, ExecContext *ctx = NULLPTR)
One-shot invoker for all types of functions.
Does kernel dispatch, argument checking, iteration of ChunkedArray inputs, and wrapping of outputs.
- Result<Datum> arrow::compute::CallFunction(const std::string &func_name, const std::vector<Datum> &args, ExecContext *ctx = NULLPTR)
Variant of CallFunction which uses a function’s default options.
NB: Some functions require that FunctionOptions be provided.
Concrete options classes
- enum CompareOperator
Values:
- enum SortOrder
Values:
- struct CountOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>
Control Count kernel behavior.
By default, all non-null values are counted.
Public Types
- enum Mode
Values:
Count all non-null values.
Count all null values.
Public Functions
- CountOptions(enum Mode count_mode = COUNT_NON_NULL)
Public Members
- Mode count_mode
Public Static Functions
- static CountOptions Defaults()
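The two counting modes can be illustrated with a hypothetical stand-alone function that treats `std::nullopt` as a null slot. The names `sketch::Count` and `sketch::CountMode` (with `kNonNull` standing in for the documented `COUNT_NON_NULL` and `kNull` for the null-counting mode) are assumptions for illustration, not Arrow API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the two counting modes; not Arrow's kernel.
namespace sketch {

enum class CountMode { kNonNull, kNull };

// std::nullopt stands in for a null slot.
template <typename T>
int64_t Count(const std::vector<std::optional<T>>& values,
              CountMode mode = CountMode::kNonNull) {
  int64_t non_null = 0;
  for (const auto& v : values) {
    if (v.has_value()) ++non_null;
  }
  const auto total = static_cast<int64_t>(values.size());
  return mode == CountMode::kNonNull ? non_null : total - non_null;
}

}  // namespace sketch
```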
-
- struct MinMaxOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>
Control MinMax kernel behavior.
By default, null values are ignored.
Public Types
- enum Mode
Values:
Skip null values.
Any nulls will result in null output.
Public Functions
- MinMaxOptions(enum Mode null_handling = SKIP)
Public Members
- Mode null_handling
Public Static Functions
- static MinMaxOptions Defaults()
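A minimal sketch of the two null-handling modes, assuming `std::nullopt` models a null input and a result with both fields empty models a null output. `sketch::MinMax` and `sketch::NullHandling` are hypothetical; only the `SKIP` default is named in the text above, and `kEmitNull` is a stand-in for the second mode.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of MinMax null handling; not Arrow's kernel.
namespace sketch {

enum class NullHandling { kSkip, kEmitNull };

template <typename T>
struct MinMaxResult {
  std::optional<T> min;
  std::optional<T> max;
};

template <typename T>
MinMaxResult<T> MinMax(const std::vector<std::optional<T>>& values,
                       NullHandling mode = NullHandling::kSkip) {
  MinMaxResult<T> out;
  for (const auto& v : values) {
    if (!v.has_value()) {
      if (mode == NullHandling::kEmitNull) return {};  // any null -> null output
      continue;                                       // skip mode: ignore nulls
    }
    if (!out.min || *v < *out.min) out.min = *v;
    if (!out.max || *v > *out.max) out.max = *v;
  }
  return out;
}

}  // namespace sketch
```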
-
- struct ModeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>
Control Mode kernel behavior.
Returns the n most common values and their counts. By default, returns the most common value and its count.
Public Functions
- ModeOptions(int64_t n = 1)
Public Members
- int64_t n = 1
Public Static Functions
- static ModeOptions Defaults()
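The top-n behavior can be sketched with a frequency map followed by a stable sort on counts. `sketch::Mode` is hypothetical, it takes plain (non-null) values, and breaking ties toward the smaller value is an assumption rather than documented Arrow behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

namespace sketch {

// Returns up to n (value, count) pairs, most frequent first. Ties are broken
// by the smaller value; that tie-break rule is an assumption of this sketch.
template <typename T>
std::vector<std::pair<T, int64_t>> Mode(const std::vector<T>& values,
                                        int64_t n = 1) {
  assert(n >= 0);
  std::map<T, int64_t> counts;  // ordered by value, ascending
  for (const T& v : values) ++counts[v];
  std::vector<std::pair<T, int64_t>> ranked(counts.begin(), counts.end());
  // stable_sort keeps the ascending-value order among equal counts.
  std::stable_sort(ranked.begin(), ranked.end(),
                   [](const std::pair<T, int64_t>& a,
                      const std::pair<T, int64_t>& b) { return a.second > b.second; });
  if (static_cast<int64_t>(ranked.size()) > n) {
    ranked.erase(ranked.begin() + n, ranked.end());
  }
  return ranked;
}

}  // namespace sketch
```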
-
- struct VarianceOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>
Control the Delta Degrees of Freedom (ddof) of the Variance and Stddev kernels.
The divisor used in calculations is N - ddof, where N is the number of elements. By default, ddof is zero, and the population variance or stddev is returned.
Public Functions
- VarianceOptions(int ddof = 0)
Public Members
- int ddof = 0
Public Static Functions
- static VarianceOptions Defaults()
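The divisor rule N - ddof can be checked with a small two-pass implementation. `sketch::Variance` and `sketch::Stddev` are hypothetical helpers; the point is only how `ddof` changes the divisor, not how Arrow's kernel accumulates its sums.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

namespace sketch {

// Two-pass variance with divisor N - ddof (ddof = 0: population variance).
double Variance(const std::vector<double>& xs, int ddof = 0) {
  const double n = static_cast<double>(xs.size());
  assert(n - ddof > 0);  // the divisor must be positive
  double mean = 0.0;
  for (double x : xs) mean += x;
  mean /= n;
  double sum_sq = 0.0;
  for (double x : xs) sum_sq += (x - mean) * (x - mean);
  return sum_sq / (n - ddof);
}

double Stddev(const std::vector<double>& xs, int ddof = 0) {
  return std::sqrt(Variance(xs, ddof));
}

}  // namespace sketch
```

For the values {2, 4, 4, 4, 5, 5, 7, 9} the squared deviations sum to 32, so ddof = 0 gives 32 / 8 = 4 (stddev 2), while ddof = 1 gives the sample variance 32 / 7.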
-
- struct QuantileOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>
Control Quantile kernel behavior.
By default, returns the median value.
Public Types
- enum Interpolation
Interpolation method to use when the quantile lies between two data points.
Values:
Public Functions
- QuantileOptions(double q = 0.5, enum Interpolation interpolation = LINEAR)
- QuantileOptions(std::vector<double> q, enum Interpolation interpolation = LINEAR)
Public Members
- std::vector<double> q
Quantiles to compute; each must be between 0 and 1 inclusive.
- Interpolation interpolation
Public Static Functions
- static QuantileOptions Defaults()
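The text above does not list the `Interpolation` values, so this sketch models only the default, `LINEAR`. It uses the rank q * (N - 1) and interpolates between the two neighboring sorted values; that is the convention of NumPy's default `linear` method, and assuming Arrow's `LINEAR` matches it is this sketch's assumption. `sketch::QuantileLinear` is hypothetical and handles a single q.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch {

// Linear interpolation between the two nearest ranks, using rank = q * (N - 1).
// Takes the input by value because it sorts its own copy.
double QuantileLinear(std::vector<double> xs, double q) {
  assert(!xs.empty());
  assert(q >= 0.0 && q <= 1.0);
  std::sort(xs.begin(), xs.end());
  const double rank = q * static_cast<double>(xs.size() - 1);
  const auto lo = static_cast<std::size_t>(std::floor(rank));
  const auto hi = static_cast<std::size_t>(std::ceil(rank));
  const double fraction = rank - static_cast<double>(lo);
  return xs[lo] + (xs[hi] - xs[lo]) * fraction;
}

}  // namespace sketch
```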
-
- struct TDigestOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_aggregate.h>
Control TDigest approximate quantile kernel behavior.
By default, returns the median value.
Public Functions
- TDigestOptions(double q = 0.5, uint32_t delta = 100, uint32_t buffer_size = 500)
- TDigestOptions(std::vector<double> q, uint32_t delta = 100, uint32_t buffer_size = 500)
Public Members
- std::vector<double> q
Quantiles to compute; each must be between 0 and 1 inclusive.
- uint32_t delta
Compression parameter (default 100).
- uint32_t buffer_size
Input buffer size (default 500).
Public Static Functions
- static TDigestOptions Defaults()
-
- struct ArithmeticOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
Public Functions
- ArithmeticOptions()
Public Members
- bool check_overflow
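The documentation does not describe `check_overflow`; the sketch below assumes it selects between an addition that reports signed overflow as an error and one that wraps around. `sketch::Add` is hypothetical, and `std::nullopt` stands in for the error status a checked kernel would return.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

namespace sketch {

// Returns std::nullopt to signal the error a checked kernel would report.
std::optional<int32_t> Add(int32_t a, int32_t b, bool check_overflow) {
  if (check_overflow) {
    if ((b > 0 && a > std::numeric_limits<int32_t>::max() - b) ||
        (b < 0 && a < std::numeric_limits<int32_t>::min() - b)) {
      return std::nullopt;
    }
    return a + b;
  }
  // Unchecked path, assumed to wrap around: add in uint32_t (where wraparound
  // is well defined), then convert back (modular since C++20, and on all
  // mainstream compilers before that).
  return static_cast<int32_t>(static_cast<uint32_t>(a) +
                              static_cast<uint32_t>(b));
}

}  // namespace sketch
```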
-
- struct MatchSubstringOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
Public Functions
- MatchSubstringOptions(std::string pattern)
Public Members
- std::string pattern
The exact substring (or regex, depending on the kernel) to look for inside input values.
-
- struct SplitOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
Subclassed by arrow::compute::SplitPatternOptions
Public Functions
- SplitOptions(int64_t max_splits = -1, bool reverse = false)
-
- struct SplitPatternOptions : public arrow::compute::SplitOptions
#include <arrow/compute/api_scalar.h>
Public Functions
- SplitPatternOptions(std::string pattern, int64_t max_splits = -1, bool reverse = false)
Public Members
- std::string pattern
The exact substring to look for inside input values.
-
- struct ReplaceSubstringOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
Public Functions
- ReplaceSubstringOptions(std::string pattern, std::string replacement, int64_t max_replacements = -1)
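A sketch of literal (non-regex) replacement, assuming matches are found left to right without overlap and that a negative `max_replacements`, such as the default -1, means "no limit"; neither assumption is stated above. `sketch::ReplaceSubstring` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace sketch {

// Replaces non-overlapping occurrences of pattern, scanning left to right.
// max_replacements < 0 is assumed to mean "no limit".
std::string ReplaceSubstring(const std::string& input, const std::string& pattern,
                             const std::string& replacement,
                             int64_t max_replacements = -1) {
  assert(!pattern.empty());
  std::string out;
  std::size_t pos = 0;
  int64_t done = 0;
  while (max_replacements < 0 || done < max_replacements) {
    const std::size_t hit = input.find(pattern, pos);
    if (hit == std::string::npos) break;
    out.append(input, pos, hit - pos);  // copy the text before the match
    out += replacement;
    pos = hit + pattern.size();
    ++done;
  }
  out.append(input, pos, std::string::npos);  // copy the remaining tail
  return out;
}

}  // namespace sketch
```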
-
- struct SetLookupOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
Options for IsIn and IndexIn functions.
Public Members
- bool skip_nulls
Whether nulls in value_set count for lookup.
If true, any null in value_set is ignored and nulls in the input produce null (IndexIn) or false (IsIn) values in the output. If false, any null in value_set is successfully matched in the input.
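One reading of the `skip_nulls` rules for IsIn, written as a hypothetical stand-alone function with `std::nullopt` as null: with `skip_nulls` true a null input yields false, and with it false a null input matches only when `value_set` itself contains a null. `sketch::IsIn` is an illustration, not Arrow's kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

namespace sketch {

// IsIn-style lookup; std::nullopt models null in both input and value_set.
template <typename T>
std::vector<bool> IsIn(const std::vector<std::optional<T>>& input,
                       const std::vector<std::optional<T>>& value_set,
                       bool skip_nulls) {
  const bool set_has_null =
      std::any_of(value_set.begin(), value_set.end(),
                  [](const std::optional<T>& v) { return !v.has_value(); });
  std::vector<bool> out;
  out.reserve(input.size());
  for (const auto& x : input) {
    if (!x.has_value()) {
      // skip_nulls: nulls in value_set are ignored, so a null input never matches.
      out.push_back(!skip_nulls && set_has_null);
      continue;
    }
    out.push_back(std::find(value_set.begin(), value_set.end(), x) != value_set.end());
  }
  return out;
}

}  // namespace sketch
```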
-
bool
- struct StrptimeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
- struct TrimOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
Public Functions
- TrimOptions(std::string characters)
Public Members
- std::string characters
The individual characters that can be trimmed from the string.
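A byte-wise sketch of trimming a character set from both ends of a string. `sketch::Trim` is hypothetical; it treats `characters` as a set of single bytes, which matches ASCII input but not multi-byte UTF-8 characters.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Strips any byte found in `characters` from both ends of `input`.
std::string Trim(const std::string& input, const std::string& characters) {
  const std::size_t first = input.find_first_not_of(characters);
  if (first == std::string::npos) return "";  // every byte was trimmable
  const std::size_t last = input.find_last_not_of(characters);
  return input.substr(first, last - first + 1);
}

}  // namespace sketch
```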
-
- struct CompareOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
Public Functions
- CompareOptions(CompareOperator op)
Public Members
- CompareOperator op
-
- struct ProjectOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_scalar.h>
- struct FilterOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>
Public Types
- enum NullSelectionBehavior
Configure the action taken when a slot of the selection mask is null.
Values:
The corresponding filtered value will be removed in the output.
The corresponding filtered value will be null in the output.
Public Functions
- FilterOptions(NullSelectionBehavior null_selection = DROP)
Public Members
- NullSelectionBehavior null_selection_behavior = DROP
Public Static Functions
- static FilterOptions Defaults()
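The two `NullSelectionBehavior` outcomes can be modeled with `std::optional<bool>` as a nullable mask. `sketch::Filter` and `sketch::NullSelection` are hypothetical; `kEmitNull` stands in for the second enumerator, whose name is not shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

namespace sketch {

enum class NullSelection { kDrop, kEmitNull };

// mask: true keeps the slot, false removes it, nullopt is handled per `behavior`.
template <typename T>
std::vector<std::optional<T>> Filter(const std::vector<std::optional<T>>& values,
                                     const std::vector<std::optional<bool>>& mask,
                                     NullSelection behavior = NullSelection::kDrop) {
  assert(values.size() == mask.size());
  std::vector<std::optional<T>> out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!mask[i].has_value()) {
      if (behavior == NullSelection::kEmitNull) out.push_back(std::nullopt);
      continue;  // kDrop: the slot is removed
    }
    if (*mask[i]) out.push_back(values[i]);
  }
  return out;
}

}  // namespace sketch
```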
-
- struct TakeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>
Public Functions
- TakeOptions(bool boundscheck = true)
Public Members
- bool boundscheck = true
Public Static Functions
- static TakeOptions BoundsCheck()
- static TakeOptions NoBoundsCheck()
- static TakeOptions Defaults()
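A sketch of what the `boundscheck` flag trades off: with it, an out-of-range index makes the whole call fail (modeled as `std::nullopt`); without it, the caller must guarantee valid indices. `sketch::Take` is hypothetical and handles only non-null values and indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

namespace sketch {

// Gathers values[indices[i]] for each i.
template <typename T>
std::optional<std::vector<T>> Take(const std::vector<T>& values,
                                   const std::vector<int64_t>& indices,
                                   bool boundscheck = true) {
  std::vector<T> out;
  out.reserve(indices.size());
  for (int64_t idx : indices) {
    if (boundscheck && (idx < 0 || idx >= static_cast<int64_t>(values.size()))) {
      return std::nullopt;  // models the error a bounds-checked take reports
    }
    // Without boundscheck, an invalid index here is undefined behavior.
    out.push_back(values[static_cast<std::size_t>(idx)]);
  }
  return out;
}

}  // namespace sketch
```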
-
- struct DictionaryEncodeOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>
Options for the dictionary encode function.
Public Types
- enum NullEncodingBehavior
Configure how null values will be encoded.
Values:
The null value will be added to the dictionary with a proper index.
The null value will be masked in the indices array.
Public Functions
- DictionaryEncodeOptions(NullEncodingBehavior null_encoding = MASK)
Public Members
- NullEncodingBehavior null_encoding_behavior = MASK
Public Static Functions
- static DictionaryEncodeOptions Defaults()
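Both null encodings can be modeled with a value-to-index map. `sketch::DictionaryEncode`, `sketch::NullEncoding`, and the `Encoded` struct are hypothetical; `kEncode` stands in for the second enumerator, and assigning indices in first-seen order is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

namespace sketch {

enum class NullEncoding { kMask, kEncode };

template <typename T>
struct Encoded {
  std::vector<std::optional<T>> dictionary;      // distinct values, first-seen order
  std::vector<std::optional<int32_t>> indices;   // nullopt = masked (null) index
};

template <typename T>
Encoded<T> DictionaryEncode(const std::vector<std::optional<T>>& values,
                            NullEncoding mode = NullEncoding::kMask) {
  Encoded<T> out;
  std::map<std::optional<T>, int32_t> seen;
  for (const auto& v : values) {
    if (!v.has_value() && mode == NullEncoding::kMask) {
      out.indices.push_back(std::nullopt);  // mask: null index, no dictionary entry
      continue;
    }
    auto it = seen.find(v);
    if (it == seen.end()) {
      it = seen.emplace(v, static_cast<int32_t>(out.dictionary.size())).first;
      out.dictionary.push_back(v);  // encode: null gets its own dictionary slot
    }
    out.indices.push_back(it->second);
  }
  return out;
}

}  // namespace sketch
```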
-
- struct SortKey
#include <arrow/compute/api_vector.h>
One sort key for PartitionNthIndices (TODO) and SortIndices.
Public Functions
- SortKey(std::string name, SortOrder order = SortOrder::Ascending)
-
- struct ArraySortOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>
Public Functions
- ArraySortOptions(SortOrder order = SortOrder::Ascending)
Public Members
- SortOrder order
Public Static Functions
- static ArraySortOptions Defaults()
-
- struct SortOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>
Public Static Functions
- static SortOptions Defaults()
- static SortOptions
- struct PartitionNthOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/api_vector.h>
Partitioning options for NthToIndices.
Public Functions
- PartitionNthOptions(int64_t pivot)
Public Members
- int64_t pivot
The index into the equivalent sorted array of the partition pivot element.
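A sketch of the `pivot` semantics using `std::nth_element` on a permutation of indices: after the call, position `pivot` refers to the element that a full sort would place there, earlier positions refer to values no greater than it, and later ones to values no smaller. `sketch::PartitionNthIndices` is hypothetical and does not handle nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

namespace sketch {

// Returns a permutation of 0..N-1 partitioned around the pivot position.
template <typename T>
std::vector<int64_t> PartitionNthIndices(const std::vector<T>& values,
                                         int64_t pivot) {
  assert(pivot >= 0 && pivot < static_cast<int64_t>(values.size()));
  std::vector<int64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), int64_t{0});
  std::nth_element(indices.begin(), indices.begin() + pivot, indices.end(),
                   [&values](int64_t a, int64_t b) {
                     return values[static_cast<std::size_t>(a)] <
                            values[static_cast<std::size_t>(b)];
                   });
  return indices;
}

}  // namespace sketch
```

Only the pivot position is fully determined; the order within each side is unspecified, which is what makes partitioning cheaper than a full sort.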
-
- struct CastOptions : public arrow::compute::FunctionOptions
#include <arrow/compute/cast.h>
Public Functions
- CastOptions(bool safe = true)
Public Members
- bool allow_int_overflow
- bool allow_time_truncate
- bool allow_time_overflow
- bool allow_decimal_truncate
- bool allow_float_truncate
- bool allow_invalid_utf8
Public Static Functions
-