Arrow Flight SQL¶
Arrow Flight SQL is a protocol for clients and servers to communicate with SQL-like semantics over Arrow Flight. While Arrow Flight provides basic request flows and data type definitions, it leaves several points up to the application. This document describes request flows and Protobuf message definitions to enable clients to retrieve metadata about and execute SQL-like queries against compliant Arrow Flight services.
All RPC calls defined here can be found in Flight.proto. All Protobuf definitions can be found in FlightSql.proto.
Note
This specification is still experimental and subject to change.
Metadata Commands¶
Servers expose metadata about the Arrow format, SQL syntax/features, and available tables/resources over Flight’s DoAction endpoint. A client should call DoAction with an action type specified below. If applicable, the action body should be the serialized form of the specified Protobuf message. The server should respond with an appropriate number of Flight Result values, where the Result body is the serialized form of the specified Protobuf message.
Message fields which are specified as a “filter pattern” can contain %
and _
wildcards to match multiple values.
These wildcards function as in SQL LIKE clauses.
Action Type |
Description |
Request Message |
Result Message |
Number of results |
---|---|---|---|---|
GetSqlInfo |
Describe the server version and supported SQL features. |
N/A |
ActionGetSqlInfoResult |
Exactly one |
GetCatalogs |
List available catalogs. |
ActionGetCatalogsRequest |
ActionGetCatalogResult |
Zero or more |
GetSchemas |
List available schemas. |
ActionGetSchemasRequest |
ActionGetSchemasResult |
Zero or more |
GetTables |
List available tables. |
ActionGetTablesRequest |
ActionGetTablesResult |
Zero or more |
GetTableTypes |
List table types. |
N/A |
ActionGetTableTypesResult |
Zero or more |
Query Execution¶
Queries can be executed directly, or as prepared statements; they can also be executed either as result-returning statements, or as updates.
Transaction semantics are not currently defined. All queries are treated as though they run in their own transaction, with an implementation-defined isolation level, and auto-commit enabled.
Statements which return more than one result set, e.g. semicolon-delimited queries, are not supported.
Request Flows¶
Query Statements¶
This allows clients to execute ad-hoc queries which return tabular results.
To retrieve query results:
The client calls GetFlightInfo with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandStatementQuery message.
The server returns a FlightInfo with one or more FlightEndpoints, each with an implementation-defined Ticket payload. The FlightInfo may optionally be populated the Arrow schema of the returned data.
To retrieve the query results: for each returned endpoint, the client calls DoGet with the ticket of that endpoint. If there are multiple endpoints, the full result set is the concatentation of the indvididual result sets from each endpoint, in the order given by the server.
To instead retrieve the query schema:
The client calls GetSchema with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandStatementQuery message.
The server returns a SchemaResult with the serialized Arrow schema.
Insert/Update Statements¶
This allows clients to execute ad-hoc queries which do not return tabular results (e.g. DDL statements, insert/update queries).
The client calls DoPut with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandStatementUpdate message. The client also supplies the schema of the parameter values, if needed.
During the DoPut call, the client supplies parameter values as batches of Arrow data. Each row is taken to be a set of parameter values and used to execute the query.
The server returns a PutResult message containing the serialization of a DoPutUpdateResult message. This contains the number of affected rows, which can be zero, or -1 if the number is unknown.
Prepared Statements¶
This allows clients to create prepared statements for more efficient query execution.
To create a prepared statement:
The client calls DoAction with type equal to GetPreparedStatement and with body containing a serialized ActionGetPreparedStatementRequest.
The server returns a Result with body containing a serialized ActionGetPreparedStatementResult.
To close a prepared statement:
The client calls DoAction with type equal to ClosePreparedStatement and with body containing a serialized ActionClosePreparedStatementRequest.
Clients must close prepared statements after they are done using the statement. Prepared statements may also time out after an implementation-defined duration. It is an error to close a prepared statement while a query is ongoing.
To use a prepared statement to query values:
Optionally, to bind values to the statement:
The client calls DoPut with a command-type FlightDescriptor. The descriptor command contains the serialization of a CommandPreparedStatementQuery message. The client_execution_handle is a client-chosen value. The prepared_statement_handle must come from a prior GetPreparedStatement call.
The client supplies parameter values as batches of Arrow data. Each row is taken to be a set of parameter values.
The client calls GetFlightInfo with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandPreparedStatementQuery message. The client_execution_handle is a client-chosen value, consistent with a prior DoPut call if there are parameters. The prepared_statement_handle must come from a prior GetPreparedStatement call.
The server returns a FlightInfo with one or more FlightEndpoints, each with an implementation-defined Ticket payload. The FlightInfo may optionally be populated the Arrow schema of the returned data.
To retrieve the query results: for each returned endpoint, the client calls DoGet with the ticket of that endpoint. If there are multiple endpoints, the full result set is the concatentation of the indvididual result sets from each endpoint, in the order given by the server.
To use a prepared statement to insert/update values:
The client calls GetFlightInfo with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandPreparedStatementQuery message. The client_execution_handle is a client-chosen value. The prepared_statement_handle must come from a prior GetPreparedStatement call.
During the DoPut call, the client supplies parameter values as batches of Arrow data. Each row is taken to be a set of parameter values and used to execute the query.
The server returns a PutResult message containing the serialization of a DoPutUpdateResult message. This contains the number of affected rows, which can be zero, or -1 if the number is unknown.
client_execution_handle values may not be resued. They need not be distinct between distinct connections, but also cannot be reused across distinct connections. They need not be distinct between distinct prepared_statement_handle values. It is an error to use a prepared statement that is parameterized, but for which parameter values have not been bound.
Error Handling¶
Error Case |
Applicable RPC Calls |
Error Code |
---|---|---|
Query syntax is unrecognized/invalid. |
|
INVALID_ARGUMENT |
Ticket is unrecognized/invalid. |
|
INVALID_ARGUMENT |
The prepared statement handle is unrecognized/invalid. |
|
NOT_FOUND |
Too many or too few parameter values have been bound to the given client_execution_handle prior to execution. |
|
INVALID_ARGUMENT |
Prepared statement is in use and cannot be closed. |
|
INVALID_ARGUMENT |
The client_execution_handle value was already used for the given prepared_statement_handle value. |
|
ALREADY_EXISTS |
Edge cases:
It is not an error if the server returns a different schema between GetFlightInfo and DoGet, or between a GetPreparedStatement action and DoGet.
Protocol Buffer Definitions¶
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 | /*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
* <p>
* http://www.apache.org/licenses/LICENSE-2.0
* <p>
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
syntax = "proto3";
option java_package = "org.apache.arrow.flight.sql.impl";
package arrow.flight.protocol.sql;
/*
* FlightSQL: a protocol for exposing SQL-like functionality over Arrow Flight.
*/
// Metadata Actions
//
// All messages here are for use with Flight DoAction.
// Serialized ActionXXXRequest messages are used as the "body" field of a Action.
// Serialized ActionXXXResult messages are used as the "body" field of a Result.
// GetSqlInfo: Request metadata about the SQL dialect implemented by the server.
//
// The Action type must be "GetSqlInfo".
/*
* The result of a "GetSqlInfo" action.
*
* Keys and values are modeled after ODBC's SQLGetInfo() function. This information is intended
* to provide FlightSQL clients with basic information about supported SQL syntax and features.
*
* This is not an exhaustive list and more information may be added in future releases, including
* more syntax, supported functions, type conversion support, etc.
*
* Some values are provided at the top level instead, as direct message fields.
*
* FlightSQL supports the following information types:
* 1. Server Information
* 2. SQL Syntax Information
* 3. Custom Information
*/
message ActionGetSqlInfoResult {
// 1. Server: basic information about the server.
ServerInfo server_info = 1;
/*
* 2. SQL Syntax: information about supported SQL syntax and features.
*
* Values are encoded in a map. Implementations are not required to return any of the keys
* below. They may not return implementation-defined keys, but clients should prepared to
* communicate with a server that implements newer or older revisions of FlightSQL and hence
* may return deprecated or unknown keys. The keys and expected value types are as follows:
*
* SQL_DDL_CATALOG : integer_bitmask
* Indicates whether the server supports CREATE (flag SQL_DI_CREATE) and DROP (flag
* SQL_DI_DROP) of catalogs.
* In a SQL environment, a catalog is a collection of schemas.
* This property does not have a direct ODBC equivalent.
*
* SQL_DDL_SCHEMA : integer_bitmask
* Indicates whether the server supports CREATE (flag SQL_DI_CREATE) and DROP (flag
* SQL_DI_DROP) of schemas.
* In a SQL environment, a schema is a collection of tables, views, indexes etc.
* This property does not have a direct ODBC equivalent.
*
* SQL_DDL_TABLE : integer_bitmask
* Indicates whether the server supports CREATE (flag SQL_DI_CREATE) and DROP (flag
* SQL_DI_DROP) of tables.
* In a SQL environment, a table is a collection of rows of information. Each row of information
* may have one or more columns of data.
* This property does not have a direct ODBC equivalent.
*
* SQL_IDENTIFIER_CASE : integer_value
* Indicates the case sensitivity of catalog, table and schema names. Value must be one of the
* constants SQL_IC_UPPER, SQL_IC_LOWER, SQL_IC_SENSITIVE, SQL_IC_MIXED.
*
* SQL_IDENTIFIER_QUOTE_CHAR : string_value
* Indicates the supported character(s) used to surround a delimited identifier.
*
* SQL_QUOTED_IDENTIFIER_CASE : integer_value
* Indicates case sensitivity of quoted identifiers. Value must be one of the
* constants SQL_IC_UPPER, SQL_IC_LOWER, SQL_IC_SENSITIVE, SQL_IC_MIXED.
*/
map<string, GetSqlInfoValue> flight_sql_info = 2;
// 3. Custom Information: implementation-defined properties.
//
// Implementations should prefix keys with an identifier to avoid collisions with other
// implementations.
map<string, GetSqlInfoValue> custom_info = 3;
}
// Basic information about the server.
message ServerInfo {
// The name of the server.
string name = 1;
// The native version of the server.
string version = 2;
// The Arrow format version of the server.
string arrow_format_version = 3;
// The Arrow library version of the server.
string arrow_library_version = 4;
// Indicates whether the server is read only.
bool read_only = 5;
}
/*
* Wrapper for values returned in ActionGetSqlInfoResult.
*/
message GetSqlInfoValue {
oneof value {
string string_value = 1;
int32 integer_value = 2;
int32 integer_bitmask = 3;
}
}
// Enums used for ServerInfo values. These enums are not directly referenced; instead their
// values are encoded as bitfields in the integer_bitmask field in GetSqlInfoValue, or as
// integers in the integer_value field.
enum SqlInfoDdl {
SQL_DI_NONE = 0;
SQL_DI_CREATE = 1;
SQL_DI_DROP = 2;
// Next value is 4
}
enum SqlInfoIdentifierCase {
SQL_IC_UPPER = 0;
SQL_IC_LOWER = 1;
SQL_IC_SENSITIVE = 2;
SQL_IC_MIXED = 3;
}
/*
* Request message for the "GetCatalogs" action.
*
* Requests a list of catalogs available in the server.
*/
message ActionGetCatalogsRequest {
/*
* Specifies the order of result values.
*/
ResultsOrder order = 1;
}
/*
* The result of a "GetCatalogs" action.
*/
message ActionGetCatalogsResult {
repeated string catalog = 1;
}
/*
* Request message for the "GetSchemas" action.
*
* Requests a list of schemas available in the server.
*/
message ActionGetSchemasRequest {
/*
* Specifies the order of result values with precedence:
* - catalog
* - schema
*/
ResultsOrder order = 1;
/*
* Specifies the Catalog to search for schemas.
* If omitted, then schemas for all catalogs are searched.
*/
string catalog = 2;
/*
* Specifies a filter pattern for schemas to search for.
* When no schema_filter_pattern is provided, the pattern will not be used to narrow the search.
* In the pattern string, two special characters can be used to denote matching rules:
* - "%" means to match any substring with 0 or more characters.
* - "_" means to match any one character.
*/
string schema_filter_pattern = 3;
}
/*
* The result of a "GetSchemas" action.
*/
message ActionGetSchemasResult {
string catalog = 1;
string schema = 2;
}
/*
* Request message for the "GetTables" action.
*
* Requests a list of tables available in the server.
*/
message ActionGetTablesRequest {
/*
* Specifies the order of result values with precedence:
* - catalog
* - schema
* - table_type
* - table
*/
ResultsOrder order = 1;
// Specifies the Catalog to search for schemas.
string catalog = 2;
/* Specifies a filter pattern for schemas to search for.
* When no schema_filter_pattern is provided, the pattern will not be used to narrow the search.
* In the pattern string, two special characters can be used to denote matching rules:
* - "%" means to match any substring with 0 or more characters.
* - "_" means to match any one character.
*/
string schema_filter_pattern = 3;
/* Specifies a filter pattern for tables to search for.
* When no schema_filter_pattern is provided, the pattern will not be used to narrow the search.
* In the pattern string, two special characters can be used to denote matching rules:
* - "%" means to match any substring with 0 or more characters.
* - "_" means to match any one character.
*/
string table_name_filter_pattern = 4;
// Specifies literal table types to search for.
//
// When no types are provided, tables of all types will be returned. Otherwise, tables of any
// given type will be returned.
repeated string table_types = 5;
// Specifies if the schema should be returned for found tables.
bool include_schema = 6;
}
/*
* The result of a "GetTables" action.
*/
message ActionGetTablesResult {
string catalog = 1;
string schema = 2;
string table = 3;
string table_type = 4;
// If include_schema was set in the request, this is the Arrow schema of the table to be
// returned, as described in Schema.fbs::Schema, serialized as an IPC message.
bytes table_schema = 5;
}
/*
* The result of a "GetTableTypes" action.
*/
message ActionGetTableTypesResult {
/*
* A possible table type. E.g. table (regular data table), view, system table etc.
*
* The possible values are implementation-defined.
*/
repeated string table_type = 1;
}
// SQL Execution Actions
//
// All messages here are for use with Flight DoAction.
// Serialized ActionXXXRequest messages are used as the "body" field of a Action.
// Serialized ActionXXXResult messages are used as the "body" field of a Result.
/*
* Request message for the "GetPreparedStatement" action.
*
* Creates a new prepared statement server-side. Once the application is done with the prepared
* statement, it must follow up with a ClosePreparedStatement action.
*/
message ActionGetPreparedStatementRequest {
// The valid SQL string to get a prepared statement for.
string query = 1;
}
/*
* The result of a "GetPreparedStatement" action.
*/
message ActionGetPreparedStatementResult {
// Opaque implementation-defined handle for the prepared statement on the server.
bytes prepared_statement_handle = 1;
// If the given query generates a result set, this is the Arrow schema of the results to be
// returned, as described in Schema.fbs::Schema, serialized as an IPC message.
bytes dataset_schema = 2;
// If the given query contained parameters, this is the Arrow schema of the expected
// parameters, as described in Schema.fbs::Schema, serialized as an IPC message.
bytes parameter_schema = 3;
}
/*
* Request message for the "ClosePreparedStatement" action.
*
* Closes server resources associated with the prepared statement handle.
*/
message ActionClosePreparedStatementRequest {
// Opaque handle for the prepared statement on the server.
string prepared_statement_handle = 1;
}
// SQL Execution Messages
//
// Serialized CommandXXX messages are wrapped in a Command message, then used as the "cmd" field of a FlightDescriptor.
/*
* Execute an ad-hoc SQL query that returns a table of results.
*
* Valid for the following RPC calls:
* - GetSchema: return the schema of the query.
* - GetFlightInfo: execute the query. In this case, the application should follow with a DoGet call.
*/
message CommandStatementQuery {
// The SQL query.
string query = 1;
}
/*
* Execute a prepared statement that returns a table of results.
*
* Must be preceded with a GetPreparedStatement action.
*
* Valid for the following RPC calls:
* - DoPut: bind parameter values.
* - GetFlightInfo: execute the prepared statement instance.
*
* The sequence of requests must be as follows:
* 1. DoAction with a GetPreparedStatement payload
* 2. DoPut with a CommandPreparedStatementQuery payload, with the prepared_statement_handle
* returned from the server previously
* 3. GetFlightInfo with the same payload.
* 4. DoGet for each endpoint returned from the previous step.
*/
message CommandPreparedStatementQuery {
// Unique identifier for the instance of the prepared statement to execute.
bytes client_execution_handle = 1;
// Opaque handle for the prepared statement on the server.
bytes prepared_statement_handle = 2;
}
/*
* Execute an ad-hoc SQL query that does not return a table of results.
*
* Valid for the following RPC calls:
* - DoPut: execute the query. In this case, the server should return exactly one DoPutUpdateResult message.
*/
message CommandStatementUpdate {
// The SQL query.
string query = 1;
}
/*
* Execute a prepared statement that does not return a table of results.
*
* Must be preceded with a GetPreparedStatement action.
*
* Valid for the following RPC calls:
* - DoPut: bind parameter values and execute the query.
*/
message CommandPreparedStatementUpdate {
// Unique identifier for the instance of the prepared statement to execute.
bytes client_execution_handle = 1;
// Opaque handle for the prepared statement on the server.
bytes prepared_statement_handle = 2;
}
/*
* A top-level variant that wraps all the possible commands.
*/
message Command {
oneof command {
CommandStatementQuery statement_query = 1;
CommandPreparedStatementQuery prepared_statement_query = 2;
CommandStatementUpdate statement_update = 3;
CommandPreparedStatementUpdate prepared_statement_update = 4;
}
}
// Miscellaneous Messages
/*
* If the client calls DoPut after CommandStatementUpdate or CommandPreparedStatementUpdate, this
* message is used as the "app_metadata" field of the returned PutResult.
*/
message DoPutUpdateResult {
// The number of records updated. A return value of -1 represents
// an unknown updated record count.
int64 record_count = 1;
}
/*
* Helper message for clients to specify a result ordering in metadata actions.
*/
message ResultsOrder {
enum Order {
// Protobuf pattern, not used.
ORDER_UNKNOWN = 0;
// No ordering enforcement.
ORDER_UNORDERED = 1;
// Order results by ascending value order.
ORDER_ASCENDING = 2;
// Order results by descending value order.
ORDER_DESCENDING = 3;
}
Order order = 1;
}
|