śr., 10/03/2012 - 12:31 — fmurlak
### Ex. 1

Teacher's hint

Definitions

### Ex. 2

### Ex. 3

Teacher's hint

### Ex. 4

Teacher's hint

### Ex. 5

Solution

Definition

### Ex. 6

### Ex. 7

### Ex. 8

### Ex. 9

### Ex. 10

### Ex. 11

### Ex. 12

### Ex. 13

Let \(R,S\) be respectively \(n+m\)- i \(m\)-ary relational signature symbols.

Consider a new operation \(\div\) in relational algebra:

\([\![R\div S]\!]=\{\langle a_1,\dots,a_n\rangle|\) for each \(\langle b_1,\dots,b_m\rangle\in[\![S]\!]\ (\langle a_1,\dots,a_n,b_1,\dots,b_m\rangle\in[\![R]\!])\}\).

Show that the operator \(\div\) is expressible in terms of the other operators of relational algebra. Give an expression of relational algebra that defines \(R\div S\).

The ides is to show how the equivalence of first-order logic and relational algebra can be helpful. The exercise in the form "give an expression" is marked with two exclamation marks in Ullman & Widom's database textbook.

*Non-uniform class \(AC^0\) * contains functions \(f:\{0,1\}^*\to\{0,1\}^*\) such that there is a polynomial \(p(n)\) and a constant \(c\) such that for every \(n\) there is a logical circuit \(C\) that computes \(f(x)\) for all \(x\in\{0,1\}^n\), which:

- has \(n\) input gates and \(m\) output gates
- is built from logical gates AND, OR and NOT
- input and logical gates can have arbitrarily many outputs
- gates AND and OR can have arbitrarily many inputs, NOT gates - only one input
- the number of gates in the circuit does not exceed \(p(n)\)
- the depth of the circuit does not exceed \(c\)

*Non-uniform class \(NC^1\) * contains functions \(f:\{0,1\}^*\to\{0,1\}\) such that there is a polynomial \(p(n)\) and a constant \(c\) such that for every \(n\) there is a logical circuit \(C\) that computes \(f(x)\) for all \(x\in\{0,1\}^n\), which:

- has \(n\) input gates and one output gate
- is built from logical gates AND, OR and NOT
- input and logical gates can have arbitrarily many outputs
- gates AND and OR must have exactly 2 inputs, NOT gates - only one input
- the number of gates in the circuit does not exceed \(p(n)\)
- the depth of the circuit does not exceed \(c\)

Prove that with a suitable (natural) encoding of finite structures \(\mathfrak{A}\) as binary strings \(code(\mathfrak{A})\), for every sentence \(\varphi\) of first order logic, the function

\(f_\varphi:code(\mathfrak{A})\mapsto\) **if** \(\mathfrak{A}\models\varphi\) **then** \(1\) **else** \(0\)

belongs to non-uniform \(AC^0\).

Prove that \(f_\varphi\) of the previous exercise, belongs to \(NC^1\).

In fact \(AC^0\subseteq NC^1\) and this is the mathematical purpose of this exercise.

Morally, \(NC^1\) is much more realistic than \(AC^0\) and the idea is that SQL (roughly equivalent to first order logic) can be efficiently computed in parallel architectures.

Morally, \(NC^1\) is much more realistic than \(AC^0\) and the idea is that SQL (roughly equivalent to first order logic) can be efficiently computed in parallel architectures.

**Information** In databases, an important optimization rule for relational algebra expressions tries to minimize the size of intermediate results obtained during query evaluation.

Prove that the following problem is undecidable:

Data: a relational algebra expression \(E\) over a signature with a single binary symbol \(R\) and perhaps some other symbols.

Question: is \(|[\![E]\!]|<|[\![R]\!]|^2\) for every model?

This will be done after Trakhtenbrot's theorem about undecidability of first order logic over finite structures. Use it together with Codd's theorem.

The purpose is to show undecidability of some questions related to query optimization in databases.

The exercise is sligthly (but only slightly) cheating, because the "other" symbols can generate huge intermediate results, and we only ask about the end result. One way out ot this is to assume that the size of \(R\) is dominating (say, exponential) compared to the size of other relations.

The purpose is to show undecidability of some questions related to query optimization in databases.

The exercise is sligthly (but only slightly) cheating, because the "other" symbols can generate huge intermediate results, and we only ask about the end result. One way out ot this is to assume that the size of \(R\) is dominating (say, exponential) compared to the size of other relations.

Relational algebra with total order can be constructed in two ways. Assume that \(\leq\) is a total order on all elements that appear in tuples, and there are no constants in the signature.

The first way is to introduce, to every database, an additional table (formally, a view) \(LEQ\) with two columns, containing all tuples

\(\langle a,b\rangle\) for \(a,b\) in the active domain, such that \(a\leq b\). Then ordinary expressions of relational algebra can use \(LEQ\)

as any other table. Then we treat \(LEQ\) as part of the query syntax, and not as an ordinary table.

Another way is to keep the tables as they are, but extend the syntax and in the condition \(\theta\) of selection \(\sigma_\theta(E)\) allow inequalities of the form \(i\leq j\) for \(i,j\) not larger than the number of columns in \(E\). Semantics is obvious, e.g. \([\![\sigma_{i\leq j}(E)]\!]=\{\vec{a}\in [\![E]\!]:a_i\leq a_j\}.\)

Prove that the queries expressible in both formalisms are the same.

Every expression \(\sigma_{\theta}(E)\), gdzie \(\theta\) with inequalities can be equivalently presented as

\(\sigma_{\theta_1}(\sigma_{\theta_2}(\ldots\sigma_{\theta_k}(E)\ldots))\), where \(\theta_i\) are single equalities or inequalities. Therefore if we express a single selection\(\sigma_{i\leq j}(E)\) using \(LEQ\), we are done.

This is done as follows: \(\pi_{1,2,\ldots,n}(\sigma_{i=n+1,j=n+2}(E\times LEQ))\), where \(n\) is the number of columns in \(LEQ\).

\(\sigma_{\theta_1}(\sigma_{\theta_2}(\ldots\sigma_{\theta_k}(E)\ldots))\), where \(\theta_i\) are single equalities or inequalities. Therefore if we express a single selection\(\sigma_{i\leq j}(E)\) using \(LEQ\), we are done.

This is done as follows: \(\pi_{1,2,\ldots,n}(\sigma_{i=n+1,j=n+2}(E\times LEQ))\), where \(n\) is the number of columns in \(LEQ\).

For the other direction, let \(AD\) define the active domain (as in the lecture notes). Then \(LEQ=\sigma_{1\leq 2}(AD\times AD)\).

The *semijoin* operator \(\gg\) of relational algebra has the following semantics (caution: I use a nonstandard symbol, as I could not produce the standard one with the available markers here):

\([\![R\gg_\theta S]\!]=\{\vec{a}\in [\![R]\!]~:~\exists \vec{b}\in[\![S]\!] \theta(\vec{a},\vec{b})\}\),

where \(\theta\) is a set of equalities between the columns of \(R\) and \(S\), numbered jointly.

For example, for binary \(R\) and \(S\)

\([\![R\gg_{1=3, 2=4} S]\!]=\{\langle a_1,a_2\rangle\in [\![R]\!]~:~\exists \langle b_1,b_2\rangle\in[\![S]\!]a_1=b_1\land a_2=b_2\}.\)

Prove that semijoin is expressible in relational algebra, and define it using the other operators.

Prove that expressions built of selection, projection, sum, difference and semijoin do not express all queries definable in relational algebra.

Prove that every expression built of selection, projection, sum, difference and semijoin in a database where elements are natural numbers, can be computed in time \(O(n\log n),\) where \(n\) is the maximal number of tuples in relations.

Prove that expressions built of projection, product, sum, difference and semijoin express all queries definable in relational algebra, provided there there are no constants in the signature.

Prove that there is an expression built of selection, projection, sum, difference and semijoin, than defines the interestion of two relations.

Prove that the following relational algebra operators are not expressible in terms of the others:

- Sum is not expressible by selection, projection, product and difference.
- Difference is not expressible by selection, projection, product and sum.
- The other cases are very easy.

Consider two tables \(A\) i \(B\) with two columns each. The first one has \(n_A\) rows.

Design an algorithm to calculate the relational algebra expression

\(A-\pi_{1,2}(\sigma_{1=3}(A\times B)).\)

You should assume that the active domain consist of integers, and table rows are indexed by nonnegative integers.

The algorithm should use exactly one table of two columns and \(n_A\) wierszy, and a few additional variables. Do not use pointers, recursion and other methods of implicit memory allocation.

Tables \(A\) and \(B\) are read-only, and you may assume that they are sorted.

Let \(k\in\mathbb{N}\) be a constant. Prove that if \(E\) is a relational algebra expression and the maximal number of columns in subexpressions of

\(E\) is \(k\), then there is an agorithm to calculate the value of \([\![E]\!]\) in every structure \(\mathfrak{A}\) running in time \(O(n^{k}\log n),\) where \(n\) the size of the active domain of \(\mathfrak{A}\).