Logic and databases, relational algebra

Ex. 1

Let \(R,S\) be respectively \(n+m\)- i \(m\)-ary relational signature symbols.
Consider a new operation \(\div\) in relational algebra:

\([\![R\div S]\!]=\{\langle a_1,\dots,a_n\rangle|\) for each \(\langle b_1,\dots,b_m\rangle\in[\![S]\!]\ (\langle a_1,\dots,a_n,b_1,\dots,b_m\rangle\in[\![R]\!])\}\).

Show that the operator \(\div\) is expressible in terms of the other operators of relational algebra. Give an expression of relational algebra that defines \(R\div S\).


Definitions

Non-uniform class \(AC^0\) contains functions \(f:\{0,1\}^*\to\{0,1\}^*\) such that there is a polynomial \(p(n)\) and a constant \(c\) such that for every \(n\) there is a logical circuit \(C\) that computes \(f(x)\) for all \(x\in\{0,1\}^n\), which:

  1. has \(n\) input gates and \(m\) output gates
  2. is built from logical gates AND, OR and NOT
  3. input and logical gates can have arbitrarily many outputs
  4. gates AND and OR can have arbitrarily many inputs, NOT gates - only one input
  5. the number of gates in the circuit does not exceed \(p(n)\)
  6. the depth of the circuit does not exceed \(c\)

Non-uniform class \(NC^1\) contains functions \(f:\{0,1\}^*\to\{0,1\}\) such that there is a polynomial \(p(n)\) and a constant \(c\) such that for every \(n\) there is a logical circuit \(C\) that computes \(f(x)\) for all \(x\in\{0,1\}^n\), which:

  1. has \(n\) input gates and one output gate
  2. is built from logical gates AND, OR and NOT
  3. input and logical gates can have arbitrarily many outputs
  4. gates AND and OR must have exactly 2 inputs, NOT gates - only one input
  5. the number of gates in the circuit does not exceed \(p(n)\)
  6. the depth of the circuit does not exceed \(c\)

Ex. 2

Prove that with a suitable (natural) encoding of finite structures \(\mathfrak{A}\) as binary strings \(code(\mathfrak{A})\), for every sentence \(\varphi\) of first order logic, the function
\(f_\varphi:code(\mathfrak{A})\mapsto\) if \(\mathfrak{A}\models\varphi\) then \(1\) else \(0\)
belongs to non-uniform \(AC^0\).

Ex. 3

Prove that \(f_\varphi\) of the previous exercise, belongs to \(NC^1\).

Information In databases, an important optimization rule for relational algebra expressions tries to minimize the size of intermediate results obtained during query evaluation.

Ex. 4

Prove that the following problem is undecidable:

Data: a relational algebra expression \(E\) over a signature with a single binary symbol \(R\) and perhaps some other symbols.
Question: is \(|[\![E]\!]|<|[\![R]\!]|^2\) for every model?

Ex. 5

Relational algebra with total order can be constructed in two ways. Assume that \(\leq\) is a total order on all elements that appear in tuples, and there are no constants in the signature.

The first way is to introduce, to every database, an additional table (formally, a view) \(LEQ\) with two columns, containing all tuples
\(\langle a,b\rangle\) for \(a,b\) in the active domain, such that \(a\leq b\). Then ordinary expressions of relational algebra can use \(LEQ\)
as any other table. Then we treat \(LEQ\) as part of the query syntax, and not as an ordinary table.

Another way is to keep the tables as they are, but extend the syntax and in the condition \(\theta\) of selection \(\sigma_\theta(E)\) allow inequalities of the form \(i\leq j\) for \(i,j\) not larger than the number of columns in \(E\). Semantics is obvious, e.g. \([\![\sigma_{i\leq j}(E)]\!]=\{\vec{a}\in [\![E]\!]:a_i\leq a_j\}.\)

Prove that the queries expressible in both formalisms are the same.


Definition

The semijoin operator \(\gg\) of relational algebra has the following semantics (caution: I use a nonstandard symbol, as I could not produce the standard one with the available markers here):

\([\![R\gg_\theta S]\!]=\{\vec{a}\in [\![R]\!]~:~\exists \vec{b}\in[\![S]\!] \theta(\vec{a},\vec{b})\}\),
where \(\theta\) is a set of equalities between the columns of \(R\) and \(S\), numbered jointly.

For example, for binary \(R\) and \(S\)
\([\![R\gg_{1=3, 2=4} S]\!]=\{\langle a_1,a_2\rangle\in [\![R]\!]~:~\exists \langle b_1,b_2\rangle\in[\![S]\!]a_1=b_1\land a_2=b_2\}.\)

Ex. 6

Prove that semijoin is expressible in relational algebra, and define it using the other operators.

Ex. 7

Prove that expressions built of selection, projection, sum, difference and semijoin do not express all queries definable in relational algebra.

Ex. 8

Prove that every expression built of selection, projection, sum, difference and semijoin in a database where elements are natural numbers, can be computed in time \(O(n\log n),\) where \(n\) is the maximal number of tuples in relations.

Ex. 9

Prove that expressions built of projection, product, sum, difference and semijoin express all queries definable in relational algebra, provided there there are no constants in the signature.

Ex. 10

Prove that there is an expression built of selection, projection, sum, difference and semijoin, than defines the interestion of two relations.

Ex. 11

Prove that the following relational algebra operators are not expressible in terms of the others:

  • Sum is not expressible by selection, projection, product and difference.
  • Difference is not expressible by selection, projection, product and sum.
  • The other cases are very easy.

Ex. 12

Consider two tables \(A\) i \(B\) with two columns each. The first one has \(n_A\) rows.

Design an algorithm to calculate the relational algebra expression

\(A-\pi_{1,2}(\sigma_{1=3}(A\times B)).\)

You should assume that the active domain consist of integers, and table rows are indexed by nonnegative integers.

The algorithm should use exactly one table of two columns and \(n_A\) wierszy, and a few additional variables. Do not use pointers, recursion and other methods of implicit memory allocation.

Tables \(A\) and \(B\) are read-only, and you may assume that they are sorted.

Ex. 13

Let \(k\in\mathbb{N}\) be a constant. Prove that if \(E\) is a relational algebra expression and the maximal number of columns in subexpressions of
\(E\) is \(k\), then there is an agorithm to calculate the value of \([\![E]\!]\) in every structure \(\mathfrak{A}\) running in time \(O(n^{k}\log n),\) where \(n\) the size of the active domain of \(\mathfrak{A}\).