Key Joins -- Interactive Demo

7 Specification

The Specification realises the Definition as an SQL operator with compile-time verification. Its subsections cover:

Operand surfaces and proof facts (Section 7.1): an informative model for carrying the proof obligations through a query.
Syntax (Section 7.2): a syntax for declaring a key join, naming the referencing and referenced sides and their column lists.
Compile-time verification (Section 7.3): verification rules that accept a key join only when the DBMS can prove the three conditions of the Definition against the join's intermediate result. One sub-subsection addresses each condition; a fourth covers cases where no proof can be formed.
Propagation through derived tables (Section 7.4): rules that carry the proof obligations through views, common table expressions (CTEs), and subqueries.
Constraint requirements (Section 7.5): the schema-level declarations (PRIMARY KEY, UNIQUE, FOREIGN KEY, NOT NULL) the Specification relies on.
Schema changes (Section 7.6): how dependent objects, especially views, interact with constraint changes that would invalidate a key join.
Soundness (Section 7.7): why every key join accepted by the verification rules satisfies the Definition.

7.1 Operand Surfaces and Proof Facts

This subsection is informative. It describes the model for carrying the proof obligations through a query.

A "surface" is a table reference considered at a particular point in a FROM clause: a base table reference, derived table, CTE reference, view reference, joined table, TABLESAMPLE, function, or other FROM-clause item. Under the rules in this paper, base tables, subqueries, CTEs, and join trees can expose key-join proof facts. Other surfaces expose none unless an implementation defines additional proof rules for them.

A key join is validated against the two surfaces supplied to that operator. These are its "operand surfaces". This locality follows from the Definition: the proof obligation is about the intermediate result at the key-join point, not about the final result of the whole query.

A surface may expose proof facts derived from schema declarations and recognized query structure. If the DBMS cannot derive a fact, that fact is absent. The useful fact kinds are uniqueness, not-nullness, containment, and row coverage.

Unique fact. A set of columns known to identify at most one row for each all-non-null tuple of values. A surface's unique facts are the set of such column sets known for that surface.

Not null fact. A single column known not to contain nulls at this point in the query. A surface's not-null facts are the set of such columns known for that surface.

Containment and coverage fact. Both are based on the same predicate between two surfaces S and S' on column lists C and C' with the same arity. Let P_C(R) denote the projection of row R onto the columns in C:

For every row R in S, either P_C(R) contains a null, or there exists a row R' in S' such that P_C(R) ⊆_pointwise P_C'(R').

For key joins, this is equality of the corresponding non-null key values. The containment notation matches the broader vocabulary used elsewhere in the standard.

A containment fact lives on S (the referencing side). A coverage fact is the same predicate indexed on S' (the referenced side): it says that S' covers the values that can be required by S. This is not a claim that every row of S' is used by S.

A catalog FK from A(C) to B(C') yields a containment fact on A for the A(C) values contained in B(C'). A usable key on B(C') yields the referenced-side proof facts on B: uniqueness and row coverage for B(C'). An implementation may also materialize row coverage for A(C) when it harvests the FK. That row-coverage fact is about A's own all-non-null FK values and is not the coverage-side index of the A(C) to B(C') containment predicate.

The two differ in how they handle filters:

Filtering S preserves containment (fewer rows, still a subset), so the fact stays active. The filter may be recorded as part of the surface's proof evidence.
Filtering S' can break coverage: removed rows might have been the only match. Filters must therefore be recorded as proof evidence; unrecognized filters deactivate the fact.

When both sides are filtered, the proof remaps the coverage-side filter through the FK column pairing and checks that the containment side recorded a matching filter. This is matched filter evidence, which is part of proving Condition 2 when both sides carry filters.

7.2 Syntax

Consider a schema for an organization:

SQL

setup

A query that retrieves an employee's department using a key join:

SQL

The arrow <- points from the FK table (employees) to the PK table (departments). This declares: "join departments to employees along the referential constraint from employees.dept_id to departments.dept_id."

The arrow can point either way. A query starting from departments:

SQL

Here -> points from the FK table (employees, being joined) to the PK table (departments, already in the FROM clause). This returns all employees in department 12. The direction always reflects the referential constraint, regardless of which table appears first in the query.

A key join is written as a join specification, alongside ON and USING:

<table reference> [ <join type> ]
    JOIN right_rel FOR KEY (right_col [, ...]) -> left_rel (left_col [, ...])

<table reference> [ <join type> ]
    JOIN right_rel FOR KEY (right_col [, ...]) <- left_rel (left_col [, ...])

The column list immediately after FOR KEY names columns exposed by the syntactic right operand of the join. The relation name after the arrow identifies a relation visible from the left operand. The -> form is used when the right operand is the FK side; the <- form is used when the right operand is the PK side. Because a key join is a join specification, it can appear anywhere a joined table can appear, including query expressions used within insert statements, merge statements, declare cursor, and subqueries.

The columns in each list are matched positionally: the first column on the left corresponds to the first column on the right, and so on. The order need not match the order in the constraint definition; only the pairwise correspondence between the two lists matters.

A key join is a join specification, so it does not apply to cross join, which has no join specification.

A key join determines the join condition but does not merge output columns. Its output columns are the same as for JOIN ... ON, not the reduced column set produced by JOIN ... USING. A USING (...) form with a join-local FILTER could be defined mechanically, but this proposal keeps the column mapping explicit by naming both the referenced and referencing columns.

Key joins may be combined with LEFT, RIGHT, and FULL join types. The key join specifies the join condition (which referential constraint is being followed and which columns correspond) while the join type controls the result shape. These are orthogonal, just as ON and join type are orthogonal in existing SQL.

When used with an outer join, the null-extension rules apply as usual. A LEFT key join preserves all rows from the left-hand table, null-extending columns from the right-hand table when no match is found. The DBMS still verifies the referential constraint at compile time; what changes is only the treatment of unmatched rows in the result.

The most common use of an outer key join involves nullable FK columns. A referential constraint permits NULL values in the FK columns: a NULL FK value does not violate the constraint, but it does not match any PK row. An inner key join excludes these rows; a left key join preserves them:

SQL

7.2.1 Multiple Foreign Keys Between the Same Tables

When two tables are related by more than one referential constraint, the column lists disambiguate which relationship the join follows:

SQL

The column lists make it unambiguous which referential constraint is being followed.

7.2.2 Self-Joins

A table may reference itself. In the schema above, employees.manager_id references employees.emp_id. A key join can follow this self-referencing constraint by aliasing the table twice:

SQL

The arrow points from e (the FK side, via manager_id) to m (the PK side, via emp_id). The two aliases form separate operand surfaces, even though they resolve to the same base table.

7.2.3 Join-Local FILTER

SQL developers commonly add predicates to the ON clause of a join beyond the equijoin condition:

SQL

This filters which rows qualify as a match during the join. For a LEFT JOIN, non-matching rows are null-extended rather than eliminated, which is semantically different from filtering in WHERE.

USING and FOR KEY both replace the ON clause with a more specialized join specification. That leaves no ON-clause position for additional match-time predicates. For inner joins, this is often not a problem: the predicate can move to WHERE with identical semantics:

SQL

For outer joins, WHERE changes the meaning. Moving the predicate out of ON converts a LEFT JOIN into an effective inner join:

SQL

A derived table over active departments does not help for this example. The filter removes referenced rows on a column outside the key mapping, so the key join is rejected:

SQL

error

To address this, FOR KEY joins accept an optional join-local filter clause, using the existing production defined for aggregate functions (Subclause 10.9, "aggregate function"):

<filter clause> ::= FILTER <left paren> WHERE <search condition> <right paren>

When appended to a key join, the filter is ANDed with the equijoin condition derived from the referential constraint. The key join is still proven before the filter is applied; the filter narrows which rows qualify as a match at runtime:

SQL

This preserves all employees. Departments that exist but are not active are null-extended, exactly as with the traditional ON-clause form. The compile-time verification is unchanged; the filter adds a data condition without weakening it.

The join-local filter is not proof evidence for the key join. It cannot make a nullable FK column known not null, and it cannot repair a PK-side derived table that removed rows before the join. Those facts must be proven from the two operand surfaces before the join-local filter is considered.

The filter clause is permitted with all join types. The join-local filter is applied after the key join has been proven valid. For inner joins, it can discard rows in the same way as an ordinary WHERE predicate; for outer joins, it acts as match-time filtering. Each key join in a query can carry its own filter clause:

SQL

Each FILTER applies only to its own join.

7.3 Compile-Time Verification

The Specification is sound: the DBMS accepts a key join only when all three conditions of the Definition can be proven against the intermediate result at the point the join is evaluated. The DBMS rejects whenever it cannot discharge the proof obligation for at least one condition, whether by direct refutation, absent declaration, or opacity of a structural construct.

The next three subsections verify each condition in turn; the fourth covers cases where no proof can be formed.

7.3.1 Verifying Condition 1 (uniqueness)

Condition 1 requires the referenced multiset to have no duplicates. The check is against the intermediate result at the join point, not against the base table alone: a prior join may have duplicated the referenced rows, so a column that is unique in its base table need not remain unique downstream.

Suppose a query author wants per-order totals across the order's items and payments, combined in a single query. A direct ON-joins formulation is a fan trap, joining the referenced orders table to two of its referencing tables and silently multiplying rows:

SQL

setup

SQL

wrong

The query runs without complaint and produces a Cartesian product between order_items and payments within each order. An order with two items and two payments appears four times, and both aggregates are doubled. GROUP BY at the outer level cannot un-double the SUMs: the duplication has already inflated each group's contribution before the grouping happens.

This same aggregate fan-trap pitfall appears in a PostgreSQL mailing-list thread [^1].

Written with key joins, the same query is rejected at compile time:

SQL

error

Both key joins reach orders on the PK side. After the first key join, each orders row has been duplicated once per matching order_items row, so o.id is no longer unique in the intermediate result. The second key join is rejected against that intermediate result, not against the base table's own primary key.

The repair is to pre-aggregate each FK-side table first, then join the pre-aggregated results to orders:

SQL

Each subquery produces at most one row per order_id, so neither LEFT JOIN can duplicate the orders row. Each key join sees a PK side whose id remains unique, both joins are accepted, and the result has one row per order with expected, non-duplicated totals.

7.3.2 Verifying Condition 2 (containment)

Condition 2 requires each distinct all-non-null element of the referencing multiset to appear in the referenced multiset. The DBMS verifies this in two parts: the named columns must match a declared referential constraint between the two base tables; and any derivation on the PK side must provide row coverage for the PK values that the FK side may need. The second part is satisfied when the PK side applies no row-removing filter to the relevant key values, or when every direct key-equality filter on the PK side has a matching direct key-equality filter on the corresponding FK columns. Extra filters on the FK side are allowed; they only narrow the rows that need to be matched.

A schema designer working on a multi-tenant application declares two views, intending each query to see only the current tenant's rows. The schema uses composite primary keys with a leading tenant_id column matched against CURRENT_USER:

SQL

setup

A query author writes the key join through the two views:

SQL

error

tenant_customers restricts to the current tenant's customers; tenant_orders contains every tenant's orders. The non-null (tenant_id, customer_id) values in tenant_orders do not all appear among the (tenant_id, id) values in tenant_customers: an order whose tenant_id is not the current tenant's would have no match in tenant_customers, and an inner equijoin would silently drop it. The DBMS therefore cannot prove Condition 2.

The repair is to give tenant_orders the same filter:

SQL

setup

Both views now apply matching direct equality predicates on the tenant_id prefix of the foreign key. Each view is restricted to the current tenant's rows, and within that subset every order's (tenant_id, customer_id) has a matching tenant_customers value (tenant_id, id). The key join is accepted:

SQL

The example shows Condition 2 from two sides. The first configuration leaves containment unproven: the PK side is filtered on its key columns without matching direct key-equality evidence on the FK side, so containment is not guaranteed. The second supplies the missing evidence: when every row-removing key filter on the PK side is matched on the FK columns, containment holds within the filtered subset and the key join is accepted. A join-local FILTER (WHERE ...) clause would not repair the first query, because that filter is evaluated as part of the join and is not operand-surface proof evidence. A schema designer migrating an application to key joins surfaces inconsistencies of this kind at compile time, where an ON-join would have left them silent.

7.3.3 Verifying Condition 3 (null handling)

Condition 3 requires that either the referencing multiset contains no null values (3a), or the join type preserves the referencing table (3b). The DBMS discharges 3a from NOT NULL constraints on the referencing columns, accounting for null-extension introduced by upstream outer joins in the intermediate result. It discharges 3b by inspecting the join type. If neither holds, the key join is rejected.

For the examples below, assume orders has a nullable customer_id that references customers (id), and customers has a NOT NULL customer_type_id that references customer_types (id).

An inner key join from orders to customers fails Condition 3:

SQL

setup

SQL

error

3a fails because customer_id is nullable; 3b fails because the join type is inner. An order with NULL customer_id would be silently dropped by the equijoin. The repair is to use an outer join that preserves orders:

SQL

The LEFT JOIN preserves orders, satisfying 3b. Orders without a customer are returned with the customer columns null-extended.

The check applies at every key join in a chain, against the intermediate result at that point. A query that joins orders, customers, and customer_types in a chain can fail Condition 3 at the second join even when every column is properly declared:

SQL

error

This same nullable-side inner-join pitfall appears in a PostgreSQL mailing-list thread [^2].

Although customers.customer_type_id is declared NOT NULL in the base table, the LEFT JOIN to customers null-extends it for orders without a matching customer. The second key join sees a referencing column that is no longer known not null in the intermediate result. 3a fails again because of the upstream null-extension; 3b fails because the second join is inner. The simplest repair is to make the second join outer too:

SQL

An alternative is to group customers and customer_types into a parenthesized join first, so that the LEFT JOIN to orders null-extends a fully resolved intermediate; the inner key join can then remain inner. Both repairs satisfy Condition 3 at every join.

The propagation of not-nullness through chains, including its loss when null-extension can introduce NULLs, is treated more fully in Section 7.4.

7.3.4 When the DBMS Cannot Form a Proof

Some queries fail verification not because they violate a condition outright, but because the DBMS cannot form a proof at all. A key join is accepted only when all three conditions can be proven, and these queries leave at least one condition unproven. Three causes arise: the named columns may not match any declared referential constraint, the columns may not trace to a single base table, or a structural construct may make the proof obligation untrackable. Column traceability and structural opacity are handled by the propagation rules of Section 7.4. The example below shows the first cause: a referential constraint exists between the base tables, but the named columns do not match it.

Consider a hotel booking system:

SQL

setup

The referential constraint from reservations to rooms is composite: (hotel_id, room_number). A developer who overlooks this writes:

SQL

During development and testing, there is only one hotel. The query returns one row and appears correct. In production, multiple hotels share room numbers, and the query silently returns duplicates:

 room_type | rate   | guest_name
-----------+--------+-----------
 Single    | 100.00 | John Doe
 Suite     | 250.00 | John Doe
(2 rows)

Rewriting the join as a key join exposes the bug at compile time:

SQL

error

The named columns (room_number) alone do not match the declared referential constraint between reservations and rooms. The DBMS has no declaration to start validation, and rejects the key join. The error leads the developer to discover the composite key and write:

SQL

7.4 Propagation Through Derived Tables

A query author should not need to know whether they are joining a base table or a view. If a schema designer creates a view over a base table that participates in a referential constraint, the view should be usable in a key join, provided it preserves the properties that make the join valid.

This reflects a separation of duties. The schema designer is responsible for constructing views that preserve the required referential properties. The query author uses them without needing to inspect how they are built.

7.4.1 Validity in Derived Tables

For a key join through a derived table, the columns named on each side must trace to base table columns, and a matching referential constraint must exist between those base tables. On the PK side, the derived table must expose a unique PK value for each all-non-null FK value that can reach the key join.

A key join is an equijoin and NULL never equals anything. The PK-side derived table therefore does not have to reproduce base table row multiplicities. Instead, it supplies Condition 2's value-containment evidence. Uniqueness separately ensures that each matching PK value appears at most once.

This requires that each column named in the key join traces to exactly one base table column ("column traceability" via Subclause 7.16, "query specification"). Expressions, CASE, COALESCE, and casts break traceability. All columns on each side must trace through the same qualifying table to the same base table instance.

Given traceability, the PK-side proof decomposes into two properties that the DBMS can verify at compile time and propagate through derived tables:

Row coverage (nothing needed is missing). The PK side must expose every PK value that an all-non-null FK value can require. This is a value-containment proof, not a statement about base table multiplicity. A WHERE clause, HAVING clause, TABLESAMPLE, OFFSET, FETCH FIRST, or set operation can remove rows and break row coverage. Recursive CTEs, LATERAL derived tables, set functions in the select list, subtable/supertable references, and joins that do not use FOR KEY are conservative proof barriers. A FOR KEY inner join with known not null FK columns preserves row coverage for the FK side: the constraint guarantees a match for every row, so no rows are lost from that side. Row coverage can also be conditional on matched filters. A PK-side derived table that applies a WHERE filter on direct key columns exposes conditional row coverage for that filter, provided the FK-side derived table applies every corresponding direct key-equality filter on the FK columns. Additional conjuncts on the FK side are allowed; they only narrow the FK side further. A join-local FILTER (WHERE ...) clause is not such evidence, because it applies to the join result rather than to the operand surface. This is illustrated in Section 7.3.2. Projection and FOR KEY outer joins that preserve the relevant side preserve row coverage. GROUP BY and DISTINCT preserve an existing row-coverage fact when the row-collapse key covers the same key positions under the same equality identity. Extra grouping or distinct columns do not invalidate the proof. ROLLUP and CUBE are analyzed through their GROUPING SETS transformations. For grouping sets, at least one resulting grouping set must contain all key positions. Set operations, OFFSET, and FETCH FIRST are not used as row-coverage proof.
Uniqueness (nothing duplicated). The PK columns must still be unique in the derived table. This is tracked with BUNC-sets, described below. Joins can break uniqueness by duplicating rows, but GROUP BY on the PK columns can restore it. A key join can also preserve a referenced-side uniqueness fact when the referencing side is proven unique on the corresponding FK columns; otherwise possible fanout inactivates that uniqueness fact. For uniqueness, ROLLUP, CUBE, and GROUPING SETS are treated conservatively because the same values can appear at multiple aggregation levels.

Null-extension on the PK side is inert. An outer join can introduce rows where the PK columns are NULL, but NULL never equals anything in an equijoin, so those rows cannot match an FK row and cannot break either property above. Instead of exposing a positive null-extension fact, the DBMS tracks not-nullness compositionally through derived tables; a null-extending join removes not-null evidence for the affected columns. That evidence is consulted on the FK side, not the PK side.

On the FK side, column traceability is required. The detailed treatment of nullable FK columns appeared in Section 7.3 under Condition 3; the same rules apply when the FK side is a derived table. Filtering, grouping, and deduplication on the FK side do not invalidate the referential relationship; they merely reduce the set of rows to be matched.

7.4.2 BUNC-Sets

A key join needs the referenced columns to identify at most one matching referenced row for each all-non-null referencing key. It does not need the referenced columns themselves to be known not null, because NULL on the referenced side cannot match an equijoin comparison.

This is why the proposal introduces BUNC-sets: base-table unique nullable constraint sets. A BUNC-set records uniqueness of the non-null values that can participate in a key join. A BUC-set is the stronger case: a BUNC-set whose columns are also known not null.

This separation keeps the two proof obligations distinct. BUNC-sets supply the PK-side uniqueness proof. Not-nullness is tracked separately, and matters on the FK side when an inner key join would otherwise drop rows whose referencing key is NULL.

7.4.3 How Properties Compose Through Layers

These properties are defined recursively. If view V1 wraps a base table and preserves all required properties, and view V2 wraps V1 and also preserves them, then V2 is valid for use in a key join. The "underlying column" concept (Subclause 7.16, "query specification") and "counterpart" concept (Subclause 4.27.2, "General rules and definitions") already chain through arbitrary layers of derivation. Row coverage, BUNC-sets, and propagated not-nullness are new, with null-extending joins removing not-null evidence where they can introduce NULLs, but they are defined to compose in the same way.

A schema designer can therefore build a stack of views, each adding projections, joins, or groupings, and as long as each layer preserves the required properties, the top-level view is usable in a key join. The query author sees only the top-level view and treats it as a base table.

7.4.4 A View That Works

The examples that follow use the employees/departments schema introduced in Section 7.2. A schema designer creates a view for use by department managers. The view projects columns but does not filter rows:

SQL

setup

A query author can use this view in a key join exactly as if it were the base table:

SQL

This works because department_directory supplies the PK-side proof facts required by this key join: row coverage and uniqueness.

7.4.5 A CTE That Works

Key joins work through common table expressions in the same way. A query author can define a derived table inline and use it in a key join:

SQL

The same rules apply: dept_id in dept_summary traces back to departments.dept_id via its underlying column, and the DBMS verifies the referential constraint against the base table. The query author could replace the CTE with the department_directory view from the previous example, or vice versa, with no change in validity.

Recursive common table expressions (WITH RECURSIVE) are not supported in key joins. The standard already treats recursive CTEs as opaque for BPK/BUC/BUNC-sets and known functional dependencies (Subclause 4.27.16, "Known functional dependencies in a query expression"), making their properties implementation-defined. Key joins follow the same approach.

7.4.6 LATERAL Derived Tables

A LATERAL derived table is a table subquery that can reference columns from preceding tables in the same FROM clause. Key-join proof facts are not exposed through LATERAL derived tables. Their cardinality and containment properties may depend on the current row of an outer table, so a single operand-surface fact is not enough to prove a key join through them.

7.4.7 Breaking Uniqueness

A view that joins the PK table to another table can duplicate PK rows. Suppose we add a table for projects that are assigned to a department:

SQL

setup

A reporting view joins departments to their assigned projects:

SQL

setup

The view definition is accepted. The key join inside the view is valid at its own join point because assigned_projects.dept_id is a not nullable foreign key. But the output does not have a uniqueness fact for dept_id: a department with more than one assigned project appears once per project.

The employees table also has a foreign key to departments.dept_id. A query that tries to use department_projects as the referenced side of that key join is rejected, since the referenced columns are not unique:

SQL

error

7.4.8 Restoring Uniqueness with GROUP BY

The repair is to group the referencing table before joining it to departments. The grouped derived table has at most one row per dept_id, so the subsequent key join cannot duplicate department rows. The LEFT JOIN preserves departments with zero assigned projects, maintaining department row coverage:

SQL

setup

The GROUP BY collapses the duplicated department rows back to one row per dept_id. Because department_projects preserved all department rows, the grouped view also preserves row coverage for departments.dept_id. The grouped dept_id column is therefore usable as the referenced side of another key join:

SQL

Had the aggregate view used a join that did not preserve departments with zero projects, row coverage would be lost. The DBMS would reject the later key join even though uniqueness was satisfied.

7.4.9 Null-extension

An outer join can null-extend an FK column: where the FK column was NOT NULL in the base table, the derived table now has rows where the column is NULL. These null-extended rows cannot match anything in a downstream equijoin, so an inner key join on a null-extended FK column silently drops them and breaks row-preservation of the FK-side derived table.

Consider a schema with regions, stores, and sales:

SQL

setup

A reporting view uses LEFT JOINs to list every region together with its sales, keeping regions with no stores and stores with no sales:

SQL

setup

region_sales.store_id is null-extended for regions with no stores and stores with no sales, even though sales.store_id is declared NOT NULL in the base table.

A downstream view cannot use region_sales on the FK side of an inner key join to stores:

SQL

error

The DBMS rejects the key join at view definition time. The null-extended rows would be dropped by the equijoin, silently omitting regions with no stores and stores with no sales, and the downstream query author would see inflated or deflated totals with no error.

The repair is to preserve region_sales across the downstream key join with an outer join type:

SQL

setup

7.4.10 Nullable UNIQUE Constraints

A referential constraint can reference a UNIQUE constraint whose columns are nullable. Key joins support this. Consider a procurement system where suppliers are registered before their DUNS number (a standard business identifier) is verified:

SQL

setup

SQL

Suppliers with NULL duns_number are unreachable via the equijoin (NULL is not equal to anything), so they do not affect the result. The UNIQUE constraint guarantees that each non-null DUNS number identifies exactly one supplier.

A view that projects columns without filtering also works:

SQL

setup

The view supplies the PK-side proof facts required by this key join: row coverage and uniqueness. The base table NULLs remain in the view.

Null-extension is a distinct concern from base table nullability. NULLs from the base table are real data (the DUNS number has not been assigned yet); NULLs from outer join null-extension are artifacts that do not represent any row. Both kinds of NULL are inert on the PK side: the equijoin cannot match NULL, so neither a base-table NULL nor a null-extended NULL can participate in a key join result.

The following key join is accepted even though supplier_by_region null-extends duns_number for regions with no suppliers:

SQL

setup

Rows of supplier_by_region with NULL duns_number, whether they came from a base-table NULL or from null-extension, cannot match any purchase_orders.duns_number, so they do not affect the result.

7.4.11 Inner Key Joins with Known Not Nullable Referencing Columns

An inner join generally breaks row-preservation: unmatched rows are eliminated. But an inner key join where every FK column is known not nullable is lossless on the referencing side. The constraint guarantees a match for every row, so no rows can be lost.

Using the regions, stores, and sales tables from the null-extension example above, a view that enriches sales with store and region information via two inner key joins preserves row coverage for sales.id:

SQL

setup

Each inner key join preserves the relevant referencing rows independently: the FK columns are known not null, and the referential constraint guarantees a match, so no rows are lost. This also preserves the row-coverage fact. The first join preserves row coverage for sales.id; the second sees a table that still covers sales.id and preserves that fact again. The view is valid as the PK side of a downstream key join.

If any FK column in the chain were nullable, that inner join could eliminate referencing rows, and the row-coverage fact would be lost from that point onward.

7.4.12 Set Operations

Set operations cannot appear on either side of a key join. A set operation produces result columns whose underlying columns come from both operands (Subclause 7.17, "query expression", Syntax Rule 23d). The key join requirement that each column traces to exactly one underlying base table column rejects this before any other check applies.

Key-join proof facts are not exposed through set operations. A future extension could add specific rules for particular set operations, but the rules here treat them as conservative barriers for key-join proof.

7.4.13 Filtered Views and the PK Side

A key join through a filtered view on the PK side is rejected unless the FK side supplies matching direct key-equality filter evidence; see Section 7.3.2. This differs from a query that filters the base table with a WHERE clause after the join:

For the examples below, assume orders has a NOT NULL customer_id that references customers (id).

SQL

setup

SQL

error

Both queries, if accepted, would produce the same result rows, but would differ from the key join's perspective. In the first query, the join is against the full customers table: every order is guaranteed to find its matching customer, and the WHERE clause filters the result after the join has been evaluated. In the second query, active_customers is the PK side, and it is missing inactive customers. An order that references an inactive customer will not find a match.

This asymmetry is inherent in row coverage. A key join guarantees that the join will behave as a proper foreign key traversal, producing exactly one match for each FK row. A filtered PK side breaks this guarantee unless the missing PK values are matched by filters that remove the corresponding FK rows before the join.

7.4.14 Safe View Composition

A schema designer can build layered helper views that pre-join a central table to its related tables. Query authors then write against these views instead of repeating the joins in every query.

Without key joins, this layering is fragile. A schema designer who builds a view on top of another view must manually verify that the underlying view preserves the right properties. If the underlying view changes, a column that behaved like a primary key may start producing duplicates or losing rows, and every dependent view silently returns wrong results. With key joins, the DBMS verifies these properties automatically at view definition time and blocks schema changes that would break them.

The example uses an order-processing schema with a central orders table referencing customers, order_statuses, shipping_methods, payment_methods, currencies, order_priorities, tax_codes, and discount_types; customers references countries and customer_types; and order_items references orders and products. A schema designer creates three layered views. The first enriches customers with their country and type:

SQL

setup

DROP TABLE IF EXISTS countries, customer_types, addresses,
    customers, order_statuses, shipping_methods, payment_methods, currencies,
    order_priorities, tax_codes, discount_types, products, orders,
    order_items, payments CASCADE;

INSERT INTO countries VALUES (1, 'Sweden'), (2, 'Germany');
INSERT INTO customer_types VALUES (1, 'Retail'), (2, 'Wholesale');
INSERT INTO addresses VALUES (1, '123 Main St', 'New York'), (2, '456 Oak Ave', 'London'), (3, '789 Broadway', 'New York'), (4, '10 Downing St', 'London');
INSERT INTO customers VALUES
    (1, 'Acme Corp', 1, 2, 1, TRUE),
    (2, 'Jane Doe', 2, 1, 2, FALSE),
    (3, 'Widgets Ltd', 1, 1, 3, TRUE),
    (4, 'Bob Smith', 2, 1, 4, TRUE);
INSERT INTO order_statuses VALUES (1, 'Shipped'), (2, 'Pending');
INSERT INTO shipping_methods VALUES (1, 'Express', TRUE), (2, 'Standard', FALSE);
INSERT INTO payment_methods VALUES (1, 'Credit Card'), (2, 'Invoice');
INSERT INTO currencies VALUES (1, 'SEK'), (2, 'EUR');
INSERT INTO order_priorities VALUES (1, 'High'), (2, 'Normal');
INSERT INTO tax_codes VALUES (1, 'VAT25'), (2, 'VAT12');
INSERT INTO discount_types VALUES (1, 'Loyalty');
INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO orders (id, order_date, amount, customer_id, order_status_id, shipping_method_id, payment_method_id, currency_id, order_priority_id, tax_code_id, discount_type_id) VALUES
    (100, '2025-01-15', 250.00, 1, 1, 1, 1, 1, 1, 1, 1),
    (101, '2025-02-20', 89.99, 2, 2, 2, 2, 2, 2, 2, NULL),
    (102, '2025-03-10', 175.50, 1, 1, 1, 2, 1, 1, 1, NULL);
INSERT INTO order_items VALUES (1, 100, 1, 29.99), (2, 100, 2, 49.99), (3, 101, 1, 29.99);
INSERT INTO payments VALUES (1, 100, 79.98), (2, 101, 29.99);

CREATE VIEW customer_details AS
    SELECT c.id AS customer_id, c.name, co.country_name,
           ct.type_name AS customer_type
    FROM customers          AS c
    JOIN countries          AS co FOR KEY (id) <- c (country_id)
    JOIN customer_types     AS ct FOR KEY (id) <- c (customer_type_id);

The second enriches orders with customer details and lookup tables:

SQL

setup

The key join on customer_details guarantees that each order's customer_id matches exactly one row in customer_details. No order rows are lost and no order rows are duplicated, so order_details contains exactly one row per order. A schema designer can safely build further views on top of it (e.g., an orders_by_country summary) with the same guarantee.

The third view enriches order items with order details and product information:

SQL

setup

A query author who needs customer information writes against customer_details. One who needs order-level data uses order_details. One who needs line-item data uses order_item_details. If a schema change would break a property that a downstream view depends on, the DBMS rejects the change and alerts the schema designer to the problem.

7.5 Constraint Requirements

A key join requires each proof source to be ENFORCED and NOT DEFERRABLE. This applies to the referential constraint, to the primary key or unique constraint used to prove PK-side uniqueness, and to any NOT NULL constraint used to prove that an inner key join cannot drop rows with null referencing key values.

The compile-time guarantees of a key join depend on these constraints actually being maintained. A NOT ENFORCED constraint provides no data integrity guarantee; the data may violate the constraint at any time. A DEFERRABLE constraint permits checking to be postponed within a transaction. A key join evaluated while such a postponement is in effect could produce wrong results even though the constraint will eventually be checked.

If a key join depends on a constraint that is NOT ENFORCED or DEFERRABLE, the query is rejected at compile time. If the query compiles, the join is valid under the declarations used by the proof.

7.5.1 Exclusion of Temporal Referential Constraints

A key join does not reference a referential constraint that includes a referencing period specification (i.e., a temporal foreign key as defined in Feature T181, "Application-time period tables"). Temporal referential constraints have range-containment semantics: the FK row's application-time period must be a subset of the union of matching PK rows' periods. A single FK row can match multiple PK rows whose periods collectively cover it, breaking the 1:1 cardinality guarantee. Supporting temporal constraints would require purpose-built syntax beyond the scope of this proposal; the exclusion is forward-compatible with a future extension.

7.6 Schema Changes and Dependent Objects

A stored definition that contains a key join is dependent on every schema object used to establish the proof. This includes the referential constraint, supporting primary key or unique constraints, consumed NOT NULL constraints, view definitions that expose proof facts, and routines used in proof predicates. This uses the same dependency mechanisms already defined in the standard, including the rules for dropping constraints (Subclause 11.26, "drop table constraint definition").

Stored definitions include views and SQL functions with stored SQL bodies. A view that merely exposes traceable columns need not record those proof-source dependencies until a stored key join consumes that view as a proof surface.

The consequences are:

RESTRICT: Dropping a proof source that a stored key join depends on is prohibited. The DROP fails.
CASCADE: Dropping the proof source causes all dependent stored definitions to be dropped or otherwise invalidated, recursively.

If a constraint that a stored key join depends on is altered so that it is NOT ENFORCED or becomes DEFERRABLE, the alteration must be blocked unless dependent objects are handled by the applicable cascading operation. The constraint no longer satisfies the conditions under which the key join was defined.

The VIEW_CONSTRAINT_USAGE view added by the Schemata part of this proposal makes the table-constraint subset of these dependencies visible to schema designers. For a view containing a key join, this includes the referential constraint, the supporting primary key or unique constraint, and any NOT NULL constraints consumed as proof sources. Routine and user-defined ordering dependencies remain handled by the existing dependency mechanisms for those objects.

Prepared statements containing a key join are compiled against the current schema. If any proof source used by the key join is dropped or altered before the statement is executed, the statement is re-prepared or fails, following the existing rules for prepared statement invalidation.

7.6.1 Real-World Consequences

The preceding examples were simplified to illustrate individual properties. The scenarios that follow show how the same mechanisms prevent silent data corruption in more realistic settings. Both use the following schema, where each customer has one address:

SQL

setup

A reporting view joins customers to their addresses and is used as the PK side of a key join in a downstream view:

SQL

setup

This works because customer_details supplies the PK-side proof facts required for the customer_id key: row coverage and uniqueness. The next two sub-subsections show how the dependency mechanism reacts when each fact is broken in turn: the first by a schema migration that breaks uniqueness, the second by a data maintenance change that breaks row coverage.

7.6.2 Schema Migration Breaking Uniqueness

A later migration adds support for multiple addresses per customer. The schema designer moves the foreign key from customers to addresses and drops the existing views (via CASCADE) to redefine them:

SQL

setup

The recreated customer_details view is accepted. The LEFT JOIN preserves customer row coverage, but customer_id is no longer unique in the view output: a customer with multiple addresses appears once per address. When the schema designer attempts to recreate the downstream revenue view, the DBMS rejects it:

SQL

error

Without key joins, the join would silently count each order once per address, inflating revenue figures, and the inflated totals would go undetected until someone noticed the numbers were wrong.

7.6.3 Data Maintenance Breaking Row-Preservation

Privacy regulations require anonymization of inactive users. The schema designer modifies the customers table to allow NULL addresses:

SQL

setup

SQL

error

The customer_details view uses an inner key join from customers to addresses, with address_id as the FK column. That stored key-join proof depends directly on the NOT NULL constraint on address_id: without it, the inner join could drop customers whose address is unknown. The DROP NOT NULL command is rejected while customer_details depends on that proof. The schema designer must explicitly drop or replace the view before allowing nullable addresses.

The schema designer then recreates customer_details with a LEFT JOIN, preserving anonymized customers with NULL address fields:

SQL

setup

Without key joins, the schema designer could drop the NOT NULL constraint, run a nightly anonymization job that sets address_id to NULL, and never notice that the inner join in the original customer_details silently excludes those customers. A downstream revenue_by_city report over customer_details would then lose orders for anonymized customers, producing deflated totals with no error.

7.6.4 Error Messages and Privilege Scoping

When a key join fails because the required referential constraint does not exist, or because a derived table does not satisfy the required properties, the DBMS raises an error condition. That error condition may populate diagnostic area fields such as TABLE_NAME, COLUMN_NAME, and CONSTRAINT_NAME with identifiers of the underlying base tables and constraints.

If the query author does not hold privileges on those underlying objects, revealing their names would be an information leak. The standard already addresses this: the diagnostic area suppression rules in Subclause 23.1, "get diagnostics statement" require that TABLE_NAME, COLUMN_NAME, and constraint identifiers be replaced with zero-length character strings when no relevant privilege is held by the current authorization identifiers. Key join errors are subject to these existing rules.

In practice, a query author with privileges on the underlying tables might see an error identifying the missing constraint. A query author who accesses the same tables only through views would see a generic error such as "key join cannot be satisfied," with no object names disclosed.

7.7 Soundness of the Specification

The Specification is sound because it is accept-by-proof: the DBMS accepts a key join only when recognized proof rules establish all three conditions of the Definition against the intermediate result at the point the join is evaluated. Unknown or opaque constructs do not need special cases; by default they expose no row-coverage, uniqueness, foreign-key containment, or not-nullness facts that can discharge a key-join obligation, so no proof is formed. The propagation rules of Section 7.4 carry established facts through derived tables compositionally, so the same argument applies whether the operands are base tables or derived tables.

The Specification is not complete. Some queries that would in fact satisfy the Definition for a given database state are still rejected, because the proof obligations are discharged from schema declarations and query structure alone, not from inspection of the data. This is intentional: a sound-but-incomplete validator rejects some queries for which a stronger proof might exist, but never accepts a key join without the proof facts required by the Definition.

A DBMS may also require the compared key positions to have the same declared type, collation, and equality semantics for proof purposes. The mathematical definition only requires an equijoin comparison, but the compile-time proof must be able to determine from declarations and query structure that the relevant uniqueness and row-coverage facts, and the join comparisons, use the same equality semantics. If that cannot be established, the key join is rejected.

8 Design Rationale

8.1 Compile-Time Verification

A key join could be verified either at compile time (during query preparation) or at runtime (during query execution). This proposal requires compile-time verification.

8.1.1 The Query Author's Contract

A key join is a declaration: "this join follows a foreign key." Compile-time verification provides a hard guarantee: if the query prepares successfully, the referential constraint exists and the join is semantically valid. This does not depend on the contents of the tables.

8.1.2 Runtime Checks Are Insufficient

Runtime verification has several problems.

Errors depend on data, not on validity. A query that joins on the wrong columns may produce correct-looking results for months, as long as the data does not expose the mismatch. The hotel scenario in Section 7.3.4 is exactly this case: the query worked in testing because the test dataset had only one hotel. A runtime check would only catch the error when problematic data actually flows through the join, which may be long after deployment.

Errors are intermittent and hard to reproduce. If a runtime check fires only when certain rows are present, the error may appear in production but not in development or staging environments.

Failures may arrive only after substantial work. For long-running analytical queries, a runtime check may not encounter the offending row until the query has scanned large tables, consumed significant resources, or produced part of a result. The failure is then late and operationally expensive. Compile-time verification rejects the invalid key join before execution begins.

Deferred checks cannot prevent wrong results. By the time a runtime check fires, the query is already executing. Wrong results may have already been returned to the application, written to other tables, or used in downstream computations.

Runtime checks cannot validate stored definitions. A prepared statement or view definition is compiled once and executed many times, potentially against changing schemas. Compile-time verification catches problems when the statement is prepared or the view is created.

8.1.3 Key Joins Aid Testing

Constructing test data that exercises every join path in a complex query is difficult. The hotel scenario is instructive: the join on room_number alone is wrong, but a test dataset needs multiple hotels with overlapping room numbers to reveal the bug. A tester who does not independently verify the foreign key structure will not think to construct that specific dataset. The bug hides not because the tester is unskilled, but because the test data must be specifically crafted to expose a structural error that has nothing to do with the query's business logic.

Key joins eliminate this class of structural errors at compile time. This frees the tester to focus on verifying that the query produces correct results for the intended business logic, with confidence that the joins are structurally sound.

8.1.4 Compile-Time Verification Enables Composition

The derived table support described earlier relies on compile-time analysis of view definitions to determine whether referential properties are preserved through layers of derivation. This analysis examines the structure of the query, not the data. A runtime approach could not provide these guarantees without executing the view's query first.

The separation of duties between schema designer and query author also depends on compile-time verification. The schema designer constructs views that preserve the required referential properties; the query author uses them. This contract works only if the DBMS can verify the view's properties when the query is compiled, not when it happens to be run with particular data.

8.2 Why Arrow Syntax

The authors originally considered TO and FROM as keywords for indicating direction, but FROM in a join clause causes confusion with the FROM clause itself. Testing with focus groups confirmed that arrows are clearer and less ambiguous. The choice also resembles SQL:2023 SQL/PGQ, whose graph pattern syntax likewise uses ASCII arrows [^3].

8.3 Why Arrow Direction Is Fixed by the FK, Not Chosen by the Author

The fixed arrow direction realises the local-determinability part of the Intention. The arrow direction in a key join encodes the side of the newly introduced table. <- means the newly introduced table is the PK side; -> means it is the FK side. The reader tells the side from the arrow alone, without reading the aliases on either side.

An alternative design would let the query author place either column list first and flip the arrow to match, so the same key join could be written either way. Under that alternative, the arrow no longer fixes the side of the newly introduced table.

The fixed convention keeps the all-<- scan described in Section 8.9 a single pass. A reader can tell at a glance whether every join uses <-, and therefore whether the first table in the FROM clause is preserved end-to-end. Under the alternative, the reader would have to parse each key join individually, reading the arrow and checking which alias sits on which side, to decide whether that key join preserves or duplicates the running row set.

8.4 Why Column Lists Instead of Constraint Names

An alternative design would name the referential constraint directly (e.g., JOIN ... FOR KEY fk_books_author). This has two problems. First, when the same base table appears multiple times in a query, the constraint name alone cannot distinguish which instance is intended. Second, referential constraints are defined between base tables; derived tables do not have constraint names. Since key joins are designed to work through views and CTEs, the syntax must use column lists, which exist at every level of derivation. Constraint names do not.

8.5 Why the Joined Table's Column List Takes No Alias

In a key join, the column list immediately after FOR KEY is scoped by the grammar to the newly introduced table. Naming the newly introduced table again before that column list would normally add no information.

The column list after the arrow is different. It refers into the already formed left operand of the join, which may contain many table references. The syntax therefore names the relation whose columns are being used. That name selects the operand surface against which the key-join proof is formed.

The remaining edge case is when the newly introduced table is a parenthesized join. Even there, an alias before the first column list would only be needed if several tables in it expose columns with the same name. We do not expect that case to be common enough to justify adding the alias position to ordinary key joins today. The syntax leaves room for it: a future revision could add an optional alias before the first column list if usage shows that the edge case matters in practice.

8.6 Why Explicit Columns Instead of Inference

An inference-based design, where the DBMS determines the join columns automatically from the constraint, has been tried in practice. SQL Anywhere (Sybase, in releases since the late 1990s) provides a KEY JOIN syntax that infers the foreign key relationship without requiring the query author to name any columns:

SELECT rl.id, rl.amount * c.rate AS usd_amount, r.receipt_date
FROM receipt_lines AS rl
KEY JOIN receipts AS r
KEY JOIN currencies AS c;

This works until the schema evolves. If a currency column with a foreign key to currencies is later added to receipts, the KEY JOIN with currencies silently re-resolves to the new foreign key, joining through receipts instead of through receipt_lines. The query returns different results with no error.

Explicit column lists prevent this: the query author declares which columns the join follows, and schema changes that invalidate the declaration produce an error rather than silently changing the query's meaning.

8.7 Indirect Foreign Keys Are Not Supported

A key join follows a single referential constraint. It does not support transitive ("indirect") foreign key paths, even when the intermediate column is the same column. For example, time_cards.person_id can reference employees.person_id. The same employees.person_id column can also be a foreign key to people.person_id. A query author cannot key join time_cards directly to people. The FK -> PK = FK -> PK path must still be explicit:

SQL

setup

SQL

An indirect key join would require the DBMS to compose referential constraints across intermediate tables, introducing ambiguity (multiple paths may exist), hiding the intermediate table from the query, and preventing independent verification of each join. Requiring explicit joins through each table keeps the query's intent visible.

8.8 Benefits

The preceding subsections have justified individual design choices. The benefits below summarise what those choices buy in aggregate.

Queries document their intent. A key join states which referential constraint the join follows. The direction of the arrow shows which is the FK table and which is the PK table.
Schema changes surface as errors, not as wrong results. A stored definition that contains a key join is dependent on the proof sources used to validate that key join, including the referential constraint, supporting uniqueness constraints, consumed NOT NULL constraints, proof-exposing view definitions, and routines used in proof predicates. Stored definitions include views and SQL functions with stored SQL bodies. Dropping any of these with RESTRICT is blocked; dropping with CASCADE drops or invalidates the dependent stored definitions. This uses the same dependency mechanism already defined in the standard.
Views compose safely. A schema designer can build layered views that pre-join related tables. Key joins verify the required properties at each layer, and dependency tracking prevents schema changes from silently breaking downstream views (see Section 7.4.14).
Missing join columns are caught. Composite foreign keys are a common source of bugs: developers forget one column and the join looks correct in simple test data. A key join catches this immediately, as shown in the hotel booking example in Section 7.3.4.
Fan traps are caught at compile time. Joining a referenced table to two of its referencing tables silently multiplies rows and inflates aggregates in traditional equijoins. A key join rejects this pattern at compile time (Section 7.3.1).
No same-name requirement. Unlike USING, key joins work when FK and PK columns have different names (e.g., orders.customer_id referencing customers.id). And unlike NATURAL or USING, key joins are unambiguous when multiple tables in the query share column names.
Immune to column additions. A USING clause can break when an unrelated schema change adds a column with the same name to another table in the join. Key joins name the specific table reference for each column, so they are immune to this problem.

8.9 Readability at Scale

The local-readability part of the Intention shows up most concretely in larger queries. Consider a typical order-processing query that joins a central orders table to ten related tables. The query author wrote the joins in the order they came to mind, arbitrarily starting the FROM clause with customers:

SQL

setup

DROP TABLE IF EXISTS countries, customer_types, addresses, customers, order_statuses, shipping_methods, payment_methods, currencies, order_priorities, tax_codes, discount_types, products, orders, order_items, payments CASCADE;

CREATE TABLE countries (id INTEGER PRIMARY KEY, country_name VARCHAR(100));
CREATE TABLE customer_types (id INTEGER PRIMARY KEY, type_name VARCHAR(50));
CREATE TABLE addresses (id INTEGER PRIMARY KEY, street CHARACTER VARYING(200), city CHARACTER VARYING(100));
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    country_id INTEGER NOT NULL REFERENCES countries (id),
    customer_type_id INTEGER NOT NULL REFERENCES customer_types (id),
    address_id INTEGER NOT NULL REFERENCES addresses (id),
    active BOOLEAN DEFAULT TRUE NOT NULL
);
CREATE TABLE order_statuses (id INTEGER PRIMARY KEY, status_name VARCHAR(50));
CREATE TABLE shipping_methods (id INTEGER PRIMARY KEY, method_name VARCHAR(50), available BOOLEAN DEFAULT TRUE NOT NULL);
CREATE TABLE payment_methods (id INTEGER PRIMARY KEY, method_name VARCHAR(50));
CREATE TABLE currencies (id INTEGER PRIMARY KEY, currency_code CHAR(3));
CREATE TABLE order_priorities (id INTEGER PRIMARY KEY, priority_name VARCHAR(50));
CREATE TABLE tax_codes (id INTEGER PRIMARY KEY, code_name VARCHAR(50));
CREATE TABLE discount_types (id INTEGER PRIMARY KEY, type_name VARCHAR(50));
CREATE TABLE products (id INTEGER PRIMARY KEY, name CHARACTER VARYING(100));
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    order_date DATE,
    amount DECIMAL(10,2),
    customer_id INTEGER NOT NULL REFERENCES customers (id),
    order_status_id INTEGER NOT NULL REFERENCES order_statuses (id),
    shipping_method_id INTEGER NOT NULL REFERENCES shipping_methods (id),
    payment_method_id INTEGER NOT NULL REFERENCES payment_methods (id),
    currency_id INTEGER NOT NULL REFERENCES currencies (id),
    order_priority_id INTEGER NOT NULL REFERENCES order_priorities (id),
    tax_code_id INTEGER NOT NULL REFERENCES tax_codes (id),
    discount_type_id INTEGER REFERENCES discount_types (id)
);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders (id),
    product_id INTEGER NOT NULL REFERENCES products (id),
    amount DECIMAL(10,2)
);
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders (id),
    amount DECIMAL(10,2)
);
INSERT INTO countries VALUES (1, 'Sweden'), (2, 'Germany');
INSERT INTO customer_types VALUES (1, 'Retail'), (2, 'Wholesale');
INSERT INTO addresses VALUES (1, '123 Main St', 'New York'), (2, '456 Oak Ave', 'London'), (3, '789 Broadway', 'New York'), (4, '10 Downing St', 'London');
INSERT INTO customers VALUES
    (1, 'Acme Corp', 1, 2, 1, TRUE),
    (2, 'Jane Doe', 2, 1, 2, FALSE),
    (3, 'Widgets Ltd', 1, 1, 3, TRUE),
    (4, 'Bob Smith', 2, 1, 4, TRUE);
INSERT INTO order_statuses VALUES (1, 'Shipped'), (2, 'Pending');
INSERT INTO shipping_methods VALUES (1, 'Express', TRUE), (2, 'Standard', FALSE);
INSERT INTO payment_methods VALUES (1, 'Credit Card'), (2, 'Invoice');
INSERT INTO currencies VALUES (1, 'SEK'), (2, 'EUR');
INSERT INTO order_priorities VALUES (1, 'High'), (2, 'Normal');
INSERT INTO tax_codes VALUES (1, 'VAT25'), (2, 'VAT12');
INSERT INTO discount_types VALUES (1, 'Loyalty');
INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO orders (id, order_date, amount, customer_id, order_status_id, shipping_method_id, payment_method_id, currency_id, order_priority_id, tax_code_id, discount_type_id) VALUES
    (100, '2025-01-15', 250.00, 1, 1, 1, 1, 1, 1, 1, 1),
    (101, '2025-02-20', 89.99, 2, 2, 2, 2, 2, 2, 2, NULL),
    (102, '2025-03-10', 175.50, 1, 1, 1, 2, 1, 1, 1, NULL);
INSERT INTO order_items VALUES (1, 100, 1, 29.99), (2, 100, 2, 49.99), (3, 101, 1, 29.99);
INSERT INTO payments VALUES (1, 100, 79.98), (2, 101, 29.99);

SQL

Every ON clause looks the same: two columns compared for equality. Nothing in the syntax reveals which table references which, or whether the join follows a foreign key at all. The reader must cross-reference the schema to verify each condition.

The first refactoring step is mechanical: rewrite each ON clause as a FOR KEY clause, leaving the join order untouched. The arrow direction is fixed by the schema, not chosen by the query author (see Section 8.3).

SQL

SELECT o.id, o.order_date, oi.amount,
       c.name AS customer, co.country_name,
       os.status_name, sm.method_name AS shipping,
       pm.method_name AS payment, cur.currency_code,
       op.priority_name, tc.code_name AS tax_code,
       dt.type_name AS discount
FROM customers           AS c
JOIN countries           AS co  FOR KEY (id)          <- c (country_id)
JOIN customer_types      AS ct  FOR KEY (id)          <- c (customer_type_id)
JOIN orders              AS o   FOR KEY (customer_id) -> c (id)
JOIN order_statuses      AS os  FOR KEY (id)          <- o (order_status_id)
LEFT JOIN discount_types AS dt  FOR KEY (id)          <- o (discount_type_id)
JOIN shipping_methods    AS sm  FOR KEY (id)          <- o (shipping_method_id)
JOIN payment_methods     AS pm  FOR KEY (id)          <- o (payment_method_id)
JOIN order_items         AS oi  FOR KEY (order_id)    -> o (id)
JOIN currencies          AS cur FOR KEY (id)          <- o (currency_id)
JOIN order_priorities    AS op  FOR KEY (id)          <- o (order_priority_id)
JOIN tax_codes           AS tc  FOR KEY (id)          <- o (tax_code_id)
WHERE o.id = 100;

Both arrow directions appear. Nine joins use <-: the joined table is the PK side of a lookup from a table already in scope. Two use ->: orders is reached from customers, and order_items is reached from orders. Each -> is a point where the query's row count can grow, the mechanism that makes fan traps possible (Section 7.3.1). Key join validation tracks this against the intermediate result.

The second refactoring step is optional. A query author who wants the result to map one-to-one to a specific table's rows can place that table first in the FROM clause and rearrange the joins so every arrow becomes <-. Reaching this shape requires a root table in the foreign key graph (one that no other table references in the query); only a root can sit at the start with every other table joined as the PK side of a <-. Placing order_items first:

SQL

SELECT o.id, o.order_date, oi.amount,
       c.name AS customer, co.country_name,
       os.status_name, sm.method_name AS shipping,
       pm.method_name AS payment, cur.currency_code,
       op.priority_name, tc.code_name AS tax_code,
       dt.type_name AS discount
FROM order_items         AS oi
JOIN orders              AS o   FOR KEY (id) <- oi (order_id)
JOIN customers           AS c   FOR KEY (id) <- o (customer_id)
JOIN countries           AS co  FOR KEY (id) <- c (country_id)
JOIN customer_types      AS ct  FOR KEY (id) <- c (customer_type_id)
JOIN order_statuses      AS os  FOR KEY (id) <- o (order_status_id)
JOIN shipping_methods    AS sm  FOR KEY (id) <- o (shipping_method_id)
JOIN payment_methods     AS pm  FOR KEY (id) <- o (payment_method_id)
JOIN currencies          AS cur FOR KEY (id) <- o (currency_id)
JOIN order_priorities    AS op  FOR KEY (id) <- o (order_priority_id)
JOIN tax_codes           AS tc  FOR KEY (id) <- o (tax_code_id)
LEFT JOIN discount_types AS dt  FOR KEY (id) <- o (discount_type_id)
WHERE o.id = 100;

Every join uses <-: each newly introduced table is the referenced (PK) side, guaranteed to be preserved by the key join. No join can duplicate order_items rows. The result maps one-to-one to a subset of the order_items rows (a subset because the WHERE clause filters). This guarantee is structural, verified at compile time, and requires no runtime checks.

8.10 Why match type Does Not Affect Key Joins

A referential constraint's match type controls how rows with null values in the referencing columns are checked. It does not change the column correspondence declared by the referential constraint, and it does not change the rule for all-non-null referencing keys: for MATCH SIMPLE, MATCH FULL, and MATCH PARTIAL, every all-non-null referencing key must have a referenced row with equal referenced column values.

This all-non-null containment fact is the only containment fact that a key join consumes from the referential constraint. The join condition produced by FOR KEY is always the ordinary equijoin over the named columns:

PKC1 = FKC1 AND ... AND PKCn = FKCn

Under SQL equality, a null value in any compared column does not match. Rows whose referencing key contains a null value are therefore handled by the key-join null rule: an inner key join is accepted only when the referencing columns have not-null proof; otherwise the join type must preserve the referencing side.

MATCH PARTIAL may allow a composite referencing row with some null key components to satisfy the referential constraint by considering only its non-null components. A key join deliberately does not turn that case into a partial-column join. Such a join would have row-dependent join columns and could match multiple referenced rows. For FOR KEY, those rows remain ordinary null-key rows: they do not match the equijoin and must either be impossible by not-null proof or preserved by an outer join.

8.11 Why Direction Is Not Inferred from Nullability

The arrow direction is fixed by the referential constraint (Section 8.3), but the query author still writes it. We considered inferring it instead: in the common ORM pattern, the referencing table is preserved and the referenced table is null-extended, so the null-extended side appears to identify the referenced side.

BI-style aggregates show this is not always the case. They can preserve the referenced side and null-extend the referencing side, the reverse arrangement:

SQL

Here customers is the referenced side, yet customers is preserved and orders is null-extended. The null-extended side can be either end of the constraint, so it cannot signal which side is referenced.

8.12 Why Do We Need To Write The Arrow, If All-<- Is A Good Idea

Section 8.9 presents the all-<- form as an option. We considered requiring it: every key join introduces the PK side, so every arrow points the same way. We rejected this because it fixes the order in which the tables must be written.

Reordering or inserting a join then forces rewriting the rest of the chain to keep the arrows aligned, which raises the cost of refactoring. It also has performance consequences: a planner cannot explore every possible join order, so some databases fall back on the order in which the user wrote the relations, and mandating that order takes the lever away from the developer. On top of that, only one order can ever be forced: if some other feature wants a different one, only one of them can win. Furthermore it limits the expressiveness available to the developer.

[Foundation]	ISO/IEC 9075-2, SQL/Foundation (IWD)
[Schemata]	ISO/IEC 9075-11, SQL/Schemata (IWD)