Arrow Flight SQL

Arrow Flight SQL is a protocol for clients and servers to communicate with SQL-like semantics over Arrow Flight. While Arrow Flight provides basic request flows and data type definitions, it leaves several points up to the application. This document describes request flows and Protobuf message definitions to enable clients to retrieve metadata about and execute SQL-like queries against compliant Arrow Flight services.

All RPC calls defined here can be found in Flight.proto. All Protobuf definitions can be found in FlightSql.proto.

Note

This specification is still experimental and subject to change.

Metadata Commands

Servers expose metadata about the Arrow format, SQL syntax/features, and available tables/resources over Flight’s DoAction endpoint. A client should call DoAction with an action type specified below. If applicable, the action body should be the serialized form of the specified Protobuf message. The server should respond with an appropriate number of Flight Result values, where the Result body is the serialized form of the specified Protobuf message.

Message fields which are specified as a “filter pattern” can contain % and _ wildcards to match multiple values. These wildcards function as in SQL LIKE clauses.

Action Type

Description

Request Message

Result Message

Number of results

GetSqlInfo

Describe the server version and supported SQL features.

N/A

ActionGetSqlInfoResult

Exactly one

GetCatalogs

List available catalogs.

ActionGetCatalogsRequest

ActionGetCatalogResult

Zero or more

GetSchemas

List available schemas.

ActionGetSchemasRequest

ActionGetSchemasResult

Zero or more

GetTables

List available tables.

ActionGetTablesRequest

ActionGetTablesResult

Zero or more

GetTableTypes

List table types.

N/A

ActionGetTableTypesResult

Zero or more

Query Execution

Queries can be executed directly, or as prepared statements; they can also be executed either as result-returning statements, or as updates.

Transaction semantics are not currently defined. All queries are treated as though they run in their own transaction, with an implementation-defined isolation level, and auto-commit enabled.

Statements which return more than one result set, e.g. semicolon-delimited queries, are not supported.

Request Flows

Query Statements

This allows clients to execute ad-hoc queries which return tabular results.

To retrieve query results:

  1. The client calls GetFlightInfo with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandStatementQuery message.

  2. The server returns a FlightInfo with one or more FlightEndpoints, each with an implementation-defined Ticket payload. The FlightInfo may optionally be populated the Arrow schema of the returned data.

  3. To retrieve the query results: for each returned endpoint, the client calls DoGet with the ticket of that endpoint. If there are multiple endpoints, the full result set is the concatentation of the indvididual result sets from each endpoint, in the order given by the server.

To instead retrieve the query schema:

  1. The client calls GetSchema with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandStatementQuery message.

  2. The server returns a SchemaResult with the serialized Arrow schema.

Insert/Update Statements

This allows clients to execute ad-hoc queries which do not return tabular results (e.g. DDL statements, insert/update queries).

  1. The client calls DoPut with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandStatementUpdate message. The client also supplies the schema of the parameter values, if needed.

  2. During the DoPut call, the client supplies parameter values as batches of Arrow data. Each row is taken to be a set of parameter values and used to execute the query.

  3. The server returns a PutResult message containing the serialization of a DoPutUpdateResult message. This contains the number of affected rows, which can be zero, or -1 if the number is unknown.

Prepared Statements

This allows clients to create prepared statements for more efficient query execution.

To create a prepared statement:

  1. The client calls DoAction with type equal to GetPreparedStatement and with body containing a serialized ActionGetPreparedStatementRequest.

  2. The server returns a Result with body containing a serialized ActionGetPreparedStatementResult.

To close a prepared statement:

  1. The client calls DoAction with type equal to ClosePreparedStatement and with body containing a serialized ActionClosePreparedStatementRequest.

Clients must close prepared statements after they are done using the statement. Prepared statements may also time out after an implementation-defined duration. It is an error to close a prepared statement while a query is ongoing.

To use a prepared statement to query values:

  1. Optionally, to bind values to the statement:

    1. The client calls DoPut with a command-type FlightDescriptor. The descriptor command contains the serialization of a CommandPreparedStatementQuery message. The client_execution_handle is a client-chosen value. The prepared_statement_handle must come from a prior GetPreparedStatement call.

    2. The client supplies parameter values as batches of Arrow data. Each row is taken to be a set of parameter values.

  2. The client calls GetFlightInfo with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandPreparedStatementQuery message. The client_execution_handle is a client-chosen value, consistent with a prior DoPut call if there are parameters. The prepared_statement_handle must come from a prior GetPreparedStatement call.

  3. The server returns a FlightInfo with one or more FlightEndpoints, each with an implementation-defined Ticket payload. The FlightInfo may optionally be populated the Arrow schema of the returned data.

  4. To retrieve the query results: for each returned endpoint, the client calls DoGet with the ticket of that endpoint. If there are multiple endpoints, the full result set is the concatentation of the indvididual result sets from each endpoint, in the order given by the server.

To use a prepared statement to insert/update values:

  1. The client calls GetFlightInfo with a command-type FlightDescriptor. The descriptor command contains the serialization of the CommandPreparedStatementQuery message. The client_execution_handle is a client-chosen value. The prepared_statement_handle must come from a prior GetPreparedStatement call.

  2. During the DoPut call, the client supplies parameter values as batches of Arrow data. Each row is taken to be a set of parameter values and used to execute the query.

  3. The server returns a PutResult message containing the serialization of a DoPutUpdateResult message. This contains the number of affected rows, which can be zero, or -1 if the number is unknown.

client_execution_handle values may not be resued. They need not be distinct between distinct connections, but also cannot be reused across distinct connections. They need not be distinct between distinct prepared_statement_handle values. It is an error to use a prepared statement that is parameterized, but for which parameter values have not been bound.

Error Handling

Error Case

Applicable RPC Calls

Error Code

Query syntax is unrecognized/invalid.

  • DoAction(GetPreparedStatement)

  • DoPut

  • GetFlightInfo

  • GetSchema

INVALID_ARGUMENT

Ticket is unrecognized/invalid.

  • DoGet

INVALID_ARGUMENT

The prepared statement handle is unrecognized/invalid.

  • DoAction(ClosePreparedstatement)

  • DoPut

  • GetFlightInfo

  • GetSchema

NOT_FOUND

Too many or too few parameter values have been bound to the given client_execution_handle prior to execution.

  • GetFlightInfo

  • DoPut

INVALID_ARGUMENT

Prepared statement is in use and cannot be closed.

  • DoAction(ClosePreparedstatement)

INVALID_ARGUMENT

The client_execution_handle value was already used for the given prepared_statement_handle value.

  • DoPut

  • GetFlightInfo

ALREADY_EXISTS

Edge cases:

  • It is not an error if the server returns a different schema between GetFlightInfo and DoGet, or between a GetPreparedStatement action and DoGet.

Protocol Buffer Definitions

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 * <p>
 * http://www.apache.org/licenses/LICENSE-2.0
 * <p>
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

syntax = "proto3";

option java_package = "org.apache.arrow.flight.sql.impl";
package arrow.flight.protocol.sql;

/*
 * FlightSQL: a protocol for exposing SQL-like functionality over Arrow Flight.
 */

// Metadata Actions
//
// All messages here are for use with Flight DoAction.
// Serialized ActionXXXRequest messages are used as the "body" field of a Action.
// Serialized ActionXXXResult messages are used as the "body" field of a Result.

// GetSqlInfo: Request metadata about the SQL dialect implemented by the server.
//
// The Action type must be "GetSqlInfo".

/*
 * The result of a "GetSqlInfo" action.
 *
 * Keys and values are modeled after ODBC's SQLGetInfo() function. This information is intended
 * to provide FlightSQL clients with basic information about supported SQL syntax and features.
 *
 * This is not an exhaustive list and more information may be added in future releases, including
 * more syntax, supported functions, type conversion support, etc.
 *
 * Some values are provided at the top level instead, as direct message fields.
 *
 * FlightSQL supports the following information types:
 * 1. Server Information
 * 2. SQL Syntax Information
 * 3. Custom Information
 */
message ActionGetSqlInfoResult {
  // 1. Server: basic information about the server.
  ServerInfo server_info = 1;

  /*
   * 2. SQL Syntax: information about supported SQL syntax and features.
   *
   * Values are encoded in a map. Implementations are not required to return any of the keys
   * below. They may not return implementation-defined keys, but clients should prepared to
   * communicate with a server that implements newer or older revisions of FlightSQL and hence
   * may return deprecated or unknown keys. The keys and expected value types are as follows:
   *
   * SQL_DDL_CATALOG : integer_bitmask
   * Indicates whether the server supports CREATE (flag SQL_DI_CREATE) and DROP (flag
   * SQL_DI_DROP) of catalogs.
   * In a SQL environment, a catalog is a collection of schemas.
   * This property does not have a direct ODBC equivalent.
   *
   * SQL_DDL_SCHEMA : integer_bitmask
   * Indicates whether the server supports CREATE (flag SQL_DI_CREATE) and DROP (flag
   * SQL_DI_DROP) of schemas.
   * In a SQL environment, a schema is a collection of tables, views, indexes etc.
   * This property does not have a direct ODBC equivalent.
   *
   * SQL_DDL_TABLE : integer_bitmask
   * Indicates whether the server supports CREATE (flag SQL_DI_CREATE) and DROP (flag
   * SQL_DI_DROP) of tables.
   * In a SQL environment, a table is a collection of rows of information. Each row of information
   * may have one or more columns of data.
   * This property does not have a direct ODBC equivalent.
   *
   * SQL_IDENTIFIER_CASE : integer_value
   * Indicates the case sensitivity of catalog, table and schema names. Value must be one of the
   * constants SQL_IC_UPPER, SQL_IC_LOWER, SQL_IC_SENSITIVE, SQL_IC_MIXED.
   *
   * SQL_IDENTIFIER_QUOTE_CHAR : string_value
   * Indicates the supported character(s) used to surround a delimited identifier.
   *
   * SQL_QUOTED_IDENTIFIER_CASE : integer_value
   * Indicates case sensitivity of quoted identifiers. Value must be one of the
   * constants SQL_IC_UPPER, SQL_IC_LOWER, SQL_IC_SENSITIVE, SQL_IC_MIXED.
   */
  map<string, GetSqlInfoValue> flight_sql_info = 2;

  // 3. Custom Information: implementation-defined properties.
  //
  // Implementations should prefix keys with an identifier to avoid collisions with other
  // implementations.
  map<string, GetSqlInfoValue> custom_info = 3;
}

// Basic information about the server.
message ServerInfo {
  // The name of the server.
  string name = 1;
  // The native version of the server.
  string version = 2;
  // The Arrow format version of the server.
  string arrow_format_version = 3;
  // The Arrow library version of the server.
  string arrow_library_version = 4;
  // Indicates whether the server is read only.
  bool read_only = 5;
}

/*
 * Wrapper for values returned in ActionGetSqlInfoResult.
 */
message GetSqlInfoValue {
  oneof value {
    string string_value = 1;
    int32 integer_value = 2;
    int32 integer_bitmask = 3;
  }
}

// Enums used for ServerInfo values. These enums are not directly referenced; instead their
// values are encoded as bitfields in the integer_bitmask field in GetSqlInfoValue, or as
// integers in the integer_value field.
enum SqlInfoDdl {
  SQL_DI_NONE = 0;
  SQL_DI_CREATE = 1;
  SQL_DI_DROP = 2;
  // Next value is 4
}
enum SqlInfoIdentifierCase {
  SQL_IC_UPPER = 0;
  SQL_IC_LOWER = 1;
  SQL_IC_SENSITIVE = 2;
  SQL_IC_MIXED = 3;
}

/*
 * Request message for the "GetCatalogs" action.
 *
 * Requests a list of catalogs available in the server.
 */
message ActionGetCatalogsRequest {
  /*
   * Specifies the order of result values.
   */
  ResultsOrder order = 1;
}

/*
 * The result of a "GetCatalogs" action.
 */
message ActionGetCatalogsResult {
  repeated string catalog = 1;
}

/*
 * Request message for the "GetSchemas" action.
 *
 * Requests a list of schemas available in the server.
 */
message ActionGetSchemasRequest {
  /*
   * Specifies the order of result values with precedence:
   *  - catalog
   *  - schema
   */
  ResultsOrder order = 1;

  /*
   * Specifies the Catalog to search for schemas.
   * If omitted, then schemas for all catalogs are searched.
   */
  string catalog = 2;

  /*
   * Specifies a filter pattern for schemas to search for.
   * When no schema_filter_pattern is provided, the pattern will not be used to narrow the search.
   * In the pattern string, two special characters can be used to denote matching rules:
   *    - "%" means to match any substring with 0 or more characters.
   *    - "_" means to match any one character.
   */
  string schema_filter_pattern = 3;
}

/*
 * The result of a "GetSchemas" action.
 */
message ActionGetSchemasResult {
  string catalog = 1;
  string schema = 2;
}

/*
 * Request message for the "GetTables" action.
 *
 * Requests a list of tables available in the server.
 */
message ActionGetTablesRequest {
  /*
   * Specifies the order of result values with precedence:
   *  - catalog
   *  - schema
   *  - table_type
   *  - table
   */
  ResultsOrder order = 1;

  // Specifies the Catalog to search for schemas.
  string catalog = 2;

  /* Specifies a filter pattern for schemas to search for.
   * When no schema_filter_pattern is provided, the pattern will not be used to narrow the search.
   * In the pattern string, two special characters can be used to denote matching rules:
   *    - "%" means to match any substring with 0 or more characters.
   *    - "_" means to match any one character.
   */
  string schema_filter_pattern = 3;

  /* Specifies a filter pattern for tables to search for.
   * When no schema_filter_pattern is provided, the pattern will not be used to narrow the search.
   * In the pattern string, two special characters can be used to denote matching rules:
   *    - "%" means to match any substring with 0 or more characters.
   *    - "_" means to match any one character.
   */
  string table_name_filter_pattern = 4;

  // Specifies literal table types to search for.
  //
  // When no types are provided, tables of all types will be returned. Otherwise, tables of any
  // given type will be returned.
  repeated string table_types = 5;

  // Specifies if the schema should be returned for found tables.
  bool include_schema = 6;
}

/*
 * The result of a "GetTables" action.
 */
message ActionGetTablesResult {
  string catalog = 1;
  string schema = 2;
  string table = 3;
  string table_type = 4;

  // If include_schema was set in the request, this is the Arrow schema of the table to be
  // returned, as described in Schema.fbs::Schema, serialized as an IPC message.
  bytes table_schema = 5;
}

/*
 * The result of a "GetTableTypes" action.
 */
message ActionGetTableTypesResult {
  /*
   * A possible table type. E.g. table (regular data table), view, system table etc.
   *
   * The possible values are implementation-defined.
   */
  repeated string table_type = 1;
}

// SQL Execution Actions
//
// All messages here are for use with Flight DoAction.
// Serialized ActionXXXRequest messages are used as the "body" field of a Action.
// Serialized ActionXXXResult messages are used as the "body" field of a Result.

/*
 * Request message for the "GetPreparedStatement" action.
 *
 * Creates a new prepared statement server-side. Once the application is done with the prepared
 * statement, it must follow up with a ClosePreparedStatement action.
 */
message ActionGetPreparedStatementRequest {
  // The valid SQL string to get a prepared statement for.
  string query = 1;
}

/*
 * The result of a "GetPreparedStatement" action.
 */
message ActionGetPreparedStatementResult {
  // Opaque implementation-defined handle for the prepared statement on the server.
  bytes prepared_statement_handle = 1;

  // If the given query generates a result set, this is the Arrow schema of the results to be
  // returned, as described in Schema.fbs::Schema, serialized as an IPC message.
  bytes dataset_schema = 2;

  // If the given query contained parameters, this is the Arrow schema of the expected
  // parameters, as described in Schema.fbs::Schema, serialized as an IPC message.
  bytes parameter_schema = 3;
}

/*
 * Request message for the "ClosePreparedStatement" action.
 *
 * Closes server resources associated with the prepared statement handle.
 */
message ActionClosePreparedStatementRequest {
  // Opaque handle for the prepared statement on the server.
  string prepared_statement_handle = 1;
}


// SQL Execution Messages
//
// Serialized CommandXXX messages are wrapped in a Command message, then used as the "cmd" field of a FlightDescriptor.

/*
 * Execute an ad-hoc SQL query that returns a table of results.
 *
 * Valid for the following RPC calls:
 *  - GetSchema: return the schema of the query.
 *  - GetFlightInfo: execute the query. In this case, the application should follow with a DoGet call.
 */
message CommandStatementQuery {
  // The SQL query.
  string query = 1;
}

/*
 * Execute a prepared statement that returns a table of results.
 *
 * Must be preceded with a GetPreparedStatement action.
 *
 * Valid for the following RPC calls:
 *  - DoPut: bind parameter values.
 *  - GetFlightInfo: execute the prepared statement instance.
 *
 * The sequence of requests must be as follows:
 * 1. DoAction with a GetPreparedStatement payload
 * 2. DoPut with a CommandPreparedStatementQuery payload, with the prepared_statement_handle
 *    returned from the server previously
 * 3. GetFlightInfo with the same payload.
 * 4. DoGet for each endpoint returned from the previous step.
 */
message CommandPreparedStatementQuery {
  // Unique identifier for the instance of the prepared statement to execute.
  bytes client_execution_handle = 1;
  // Opaque handle for the prepared statement on the server.
  bytes prepared_statement_handle = 2;
}

/*
 * Execute an ad-hoc SQL query that does not return a table of results.
 *
 * Valid for the following RPC calls:
 *  - DoPut: execute the query. In this case, the server should return exactly one DoPutUpdateResult message.
 */
message CommandStatementUpdate {
  // The SQL query.
  string query = 1;
}

/*
 * Execute a prepared statement that does not return a table of results.
 *
 * Must be preceded with a GetPreparedStatement action.
 *
 * Valid for the following RPC calls:
 *  - DoPut: bind parameter values and execute the query.
 */
message CommandPreparedStatementUpdate {
  // Unique identifier for the instance of the prepared statement to execute.
  bytes client_execution_handle = 1;
  // Opaque handle for the prepared statement on the server.
  bytes prepared_statement_handle = 2;
}

/*
 * A top-level variant that wraps all the possible commands.
 */
message Command {
  oneof command {
    CommandStatementQuery statement_query = 1;
    CommandPreparedStatementQuery prepared_statement_query = 2;
    CommandStatementUpdate statement_update = 3;
    CommandPreparedStatementUpdate prepared_statement_update = 4;
  }
}

// Miscellaneous Messages

/*
 * If the client calls DoPut after CommandStatementUpdate or CommandPreparedStatementUpdate, this
 * message is used as the "app_metadata" field of the returned PutResult.
 */
message DoPutUpdateResult {
  // The number of records updated. A return value of -1 represents
  // an unknown updated record count.
  int64 record_count = 1;
}

/*
 * Helper message for clients to specify a result ordering in metadata actions.
 */
message ResultsOrder {
  enum Order {
    // Protobuf pattern, not used.
    ORDER_UNKNOWN = 0;

    // No ordering enforcement.
    ORDER_UNORDERED = 1;

    // Order results by ascending value order.
    ORDER_ASCENDING = 2;

    // Order results by descending value order.
    ORDER_DESCENDING = 3;
  }

  Order order = 1;
}