Friday, December 5, 2008

Proof iteration vs Proof recursion

So I learned some flavors of induction, it is time to apply them into something familiar. Iteration and recursion.

Recall when designing a iteration part of a program, I worry about if the result is what I want, i worry about if the loop enters an infinite loop. When trying to prove an iteration correct, we need to prove that every time the loop goes through, it is partially correct. In addition, we need to prove that the loop terminates.
To prove partial correctness, it is convenient to prove a loop invariant, which is always right after each iterating. I should take notice that a loop invariant can be any condition. Like the 2nd question in test2, where the loop invariant is simply that n is always a natural number. To prove termination, recall PWO which lead to a theorem which says that every decreasing sequence of natural numbers is finite. So I have to find the decreasing sequence. This may or may not be the same as the loop invariant, but it has to be decreasing, and consists of all natural numbers.

This is a rush exam period, so I have to remind myself not to get confused with the prove of recursive programs. In recursion we do not usually have an invariant, since every time the program calls itself, it has no record of how many steps it actually went though.(Unless we set a stack or other way.) However, recall when I was learning writing recursion program in CSC148, the first thing I was told was that a recursion has to have a base case, otherwise the recursion never terminates. So recursive program is actually an inductively defined program. Then proving recursion correctness is in the similar way as we write induction for inductive defined predicates. We first prove that the base case is correct, then using some flavor of induction to prove that P(n) holds for all n. The if statements are good signs of separating cases.

PSI, PCI, PWO, Structural Induction.

The first segment of this course was about various flavors of induction.
PSI is the most familiar one, since I have learned this technique since high school. It is to prove the basis first, then prove that if case n is true, case n+1 is true as well. This technique is widely used not only in computer science.
PCI was first seen in MAT137 but not until this course that I extensively used this technique. Using PCI facilitated many proofs that was seen impossible, for you can assume all the case before n is true, so you can use the property of those.
This is my first course learning PWO. Intuitively it makes perfect sense, that every non-empty set of positive integers contains a smallest element. However, it is worth noticing that when applying this technique, I should not forget to prove that all the elements of the set are natural numbers, and they are in a non-increasing order. (I lost a few marks for this in test 2.) A good example is proving the loop invariant for looping correctness. For problems that seems not so comfortable with PSI or PCI, I should try to build such a set and see if it is better to apply PWO.
It can be proved that PWO, PCI, and PSI are equivalent.
Structural induction is best used when the predicate to prove is based on a inductive definition. The basis is the cases of the definition's basis. We then prove according to the inductive part of the definition.

A comparison between structural induction and PCI: Q3 in A3, every regex without * denotes a finite language. (only skeleton provided)

Structural induction: (based on the definition)
P(r): The language denoted by a regex r over a finite alphabet
without * is finite.
Claim: \forall finite alphabet, \forall r \in RE, P(r).
Basis: the basis of regex without * are \empty, \epsilon, a (\for every a \in \sigma). These all denotes finite languages.
Induction Steps: Assume P(r1), P(r2), then by definition, r1r2, r1+r2 \in RE. Then prove they both denotes finite languages.
Thus \forall r \in RE, P(r).

Complete induction: (based on the number of operations in the regex, n)
P(n): The language denoted by an n-operator regex R over a finite alphabet
without * is finite.
Claim: \forall finite alphabet, \forall n \in N, P(n).
Proof: Let \sigma be an arbitrary finite alphabet.
Base Case: if n = 0, that means R has no Alternation or Concatenation. In this case, P(0) holds.
Induction Step: Assume n is an arbitrary N. Assume P(m) is true for all 0 <= m <= n. Then any R of (n+1) Step is Concatenation or Alternation of R1, R2 with n-1, 1 operation respectively. By IH they both holds, then prove R1R2, R1+R2 both satisfy the predicate.
Therefore By PCI, \forall n \in N, P(n).
So \forall finite alphabet, \forall n \in N, P(n).

Thursday, December 4, 2008

Problem Solving Episode

Question: PS5.
Prove that if language L has a finite number of strings, then there is some regular expression that denotes L.

1.UNDERSTANDING THE PROBLEM
The first thing in solving any problem is to fully understand what the problem to prove. In this case, we are proving that a language with finite NUMBER of strings can be denoted by a regex. It is easy to confuse "number of strings" with "length of strings", where I made a mistake. And we have to understand what does Language refer to. It means all the subsets of the superset of the alphabet. In other words, a language is a set of strings over an alphabet. This means, a language does not have to follow any rule to compose. (although usually they are).
2.DEVISING A PLAN
Intuitively thinking, if L has a finite number of strings, a regex is possible to build, at least the one with all element separated by alternation operation. To have a more rigorous proof, we have to consider that languages can be based on different alphabets, and finite number can be infinitely large, so it is not possible to prove in an individual case. However, consider the difference between cases. The cases can only be identified by the number of strings the language is carrying; the number of string is always a natural number. For this kind of questions, induction is a routine way to sort them out. To use induction we have to find the basis, which is provided by empty language and language with one string. Then we just choose a flavor of induction and finish this proof.
CARRYING OUT THE PLAN
The first thing to do in writing an inductive proof: define predicate.
P(n): If language L has |L| = n (that is, it consists of n strings), then L has a regular expression that denotes it.
Basis: As stated before, n = 0 or n = 1. If 0, then \empty is a regex. if 1, then \epsilon for empty string language, or a \in \sigma for L(a).
Induction Step: ( Here complete induction is used.)
Assume n is a natural number, and that P(0)/\.../\P(n-1) is true. Assume L is a language of n strings, i.e. |L|=n. n = 0 or 1 is proved in basis. Otherwise, split L into one set with n-1 strings, and another set with 1 string. By IH, P(1) and P(n-1) hold, Then regex R1 and R2 denote them respectively. L(R1)+L(R2) = L(R1+R2), so R1+R2 is a regex denotes L.
Therefore forall n \in N, P(n). So every finite language can be denoted by a regex.
LOOKING BACK
I spent quite some time on PS5 in this SLOG. Not only I lost considerable mark on it so it is quite impressive, it also provides a sound example of how to formulate a question, how to grasp the variants within it, and how to evantually conduct an induction to solve it.

DFSA

I have wrote about NFSA, Cartesian Product stuff already. Since DFSA is the basis of all the automaton machines, I need to show some understanding in DFSA.
Here I wrote the proof of state invariants, for question 4 in A3:

The state invariants of the DFSA are:
\delta(s,x) = (q(\epsilon), q(0)') if x = \epsilon
(q(0), q(0)') if x = 0 mod 5, the length of x is even
(q(0), q(1)') if x = 0 mod 5, the length of x is odd
(q(1), q(0)') if x = 1 mod 5, the length of x is even
(q(1), q(1)') if x = 1 mod 5, the length of x is odd
(q(2), q(0)') if x = 2 mod 5, the length of x is even
(q(2), q(1)') if x = 2 mod 5, the length of x is odd
(q(3), q(0)') if x = 3 mod 5, the length of x is even
(q(3), q(1)') if x = 3 mod 5, the length of x is odd
(q(4), q(0)') if x = 4 mod 5, the length of x is even
(q(4), q(1)') if x = 4 mod 5, the length of x is odd

Proof:
Base case: x = \epsilon. In this case, \delta(q(\epsilon),x) = (q(\epsilon), q(0)'). All other states are vacuously true.
Induction Step: Assume P(x'), x' being some string of {0,1}*. And let x = x'a. Then either a = 0 or a = 1.
Case a = 0: Then \delta*(s,x) = \delta*((q(\epsilon), q(0)'), x'0) = \delta((\delta*(q(\epsilon), q(0)'), x')), 0)
= \delta(q(\epsilon), q(0)'), 0) if x' = \epsilon # By IH, P(x')
\delta((q(0), q(0)'), 0) if x' = 0 mod 5, the length of x is even
\delta((q(0), q(1)'), 0) if x' = 0 mod 5, the length of x is odd
\delta((q(1), q(0)'), 0) if x' = 1 mod 5, the length of x is even
\delta((q(1), q(1)'), 0) if x' = 1 mod 5, the length of x is odd
\delta((q(2), q(0)'), 0) if x' = 2 mod 5, the length of x is even
\delta((q(2), q(1)'), 0) if x' = 2 mod 5, the length of x is odd
\delta((q(3), q(0)'), 0) if x' = 3 mod 5, the length of x is even
\delta((q(3), q(1)'), 0) if x' = 3 mod 5, the length of x is odd
\delta((q(4), q(0)'), 0) if x' = 4 mod 5, the length of x is even
\delta((q(4), q(1)'), 0) if x' = 4 mod 5, the length of x is odd
= (q(\epsilon), q(0)') if x = \epsilon
(q(0), q(0)') if x = 0 mod 5, the length of x is even
(q(0), q(1)') if x = 0 mod 5, the length of x is odd
(q(1), q(0)') if x = 1 mod 5, the length of x is even
(q(1), q(1)') if x = 1 mod 5, the length of x is odd
(q(2), q(0)') if x = 2 mod 5, the length of x is even
(q(2), q(1)') if x = 2 mod 5, the length of x is odd
(q(3), q(0)') if x = 3 mod 5, the length of x is even
(q(3), q(1)') if x = 3 mod 5, the length of x is odd
(q(4), q(0)') if x = 4 mod 5, the length of x is even
(q(4), q(1)') if x = 4 mod 5, the length of x is odd .

Case a = 1: Then \delta*(s,x) = \delta*((q(\epsilon), q(0)'), x'1) = \delta((\delta*(q(\epsilon), q(0)'), x')), 1)
= \delta(q(\epsilon), q(0)'), 1) if x' = \epsilon # By IH, P(x')
\delta((q(0), q(0)'), 1) if x' = 0 mod 5, the length of x is even
\delta((q(0), q(1)'), 1) if x' = 0 mod 5, the length of x is odd
\delta((q(1), q(0)'), 1) if x' = 1 mod 5, the length of x is even
\delta((q(1), q(1)'), 1) if x' = 1 mod 5, the length of x is odd
\delta((q(2), q(0)'), 1) if x' = 2 mod 5, the length of x is even
\delta((q(2), q(1)'), 1) if x' = 2 mod 5, the length of x is odd
\delta((q(3), q(0)'), 1) if x' = 3 mod 5, the length of x is even
\delta((q(3), q(1)'), 1) if x' = 3 mod 5, the length of x is odd
\delta((q(4), q(0)'), 1) if x' = 4 mod 5, the length of x is even
\delta((q(4), q(1)'), 1) if x' = 4 mod 5, the length of x is odd
= (q(\epsilon), q(0)') if x = \epsilon
(q(0), q(0)') if x = 0 mod 5, the length of x is even
(q(0), q(1)') if x = 0 mod 5, the length of x is odd
(q(1), q(0)') if x = 1 mod 5, the length of x is even
(q(1), q(1)') if x = 1 mod 5, the length of x is odd
(q(2), q(0)') if x = 2 mod 5, the length of x is even
(q(2), q(1)') if x = 2 mod 5, the length of x is odd
(q(3), q(0)') if x = 3 mod 5, the length of x is even
(q(3), q(1)') if x = 3 mod 5, the length of x is odd
(q(4), q(0)') if x = 4 mod 5, the length of x is even
(q(4), q(1)') if x = 4 mod 5, the length of x is odd
So forall string x', forall a \in {0,1}, x = x'a, P(x') -> P(x).

Therefore \forall x \in {0,1}*, P(x).

FSA, regex, regular language and pumping lemma

Every regex has an equivalent NFSA, and every NFSA has an equivalent DFSA.

Cartesian Product:
A direct approach of union,
Qu = Q1 * Q2 (Cartesian product: all possible combo of states)
\sigmau: all same.
\deltau((q1, q2), a) = (\delta1(q1,a), \delta(q2,a))
su = (s1,s2)
Fu = (F1, F2).
I ound Cartesian Product is actually very straightforward, since it is combining two seperated DFSAs. The only problem is that we have to carry off some tedious writing when clean out the solution...

Since regex, NFSA and DFSA are in the same level of expressive power, it should be named a class. Actually, there is. A language is called Regular iff it is denoted by some regex, or is accepted by a FSA.

How to prove if the language is regular or not? This question naturally introduces another idea of "pumping lemma". That says that if L is a regular language, then every x \in L that has length greater than some n can be splited into uvw, |uv| <= n, uv^kw \in L forall k \in N.

To show this, we have an example, consider language L that accepts binary strings with equal number of 0s and 1s. Try to use pumping lemma and find contradiction.

obviously, assume x starts with a series of 0s. then by pumping lemma, x = uvw, so uv = 0^k. Then uvw, w being 1, either more 1 than 0 if k = 0, or more 0 than 1 if k >0. Therefore contradict.

Wednesday, December 3, 2008

NFSA revisit.

NFSA is not yet covered in any assignment or problem set, yet it is among the test and exam material. So here i do a little review:

Intuitively, DFSA is the sort of FSA that if the current state and the next character in the string are known, we know there is one and only one state to go next. In the contrary, for NFSA each state and corresponding next character combination has multiple choices to the next state. Whats more state transition can even go without the next character, called /epsilon transition. Obviously, for a specific string, there will be a number of state transition routes, each might lead to different end state. To deal with this, if any of these routes eventually lead to a accepting state, the string is accepted by this NFSA. Clearly, designing a NFSA is more intuitive when coping with the patterns with "+" operation.

*The mathematical definition of the NFSA is similar to those of DFSA, that M = (Q, /Sigma, /delta, s, F). Only difference being the transition function /delta, where each element will have a set of "destinations". If the NFSA includes /epsilon transition, /delta also includes transition function for that. Conversely, DFSA is a special case of NFSA.

Equivalence of NFSA and DFSA:

For an NFSA M = (Q, \sigma, \delta, s, F), converting into a DFSA M' = (Q', \sigma', \delta', s', F'):
Subset Construction: each state of M' is a set of states of M.
Q' = P(Q) (all possible subsets of the powerset of Q.)
\sigma' = \sigma (Alphabet remains same..of course so!)
s' = \epsilon(s) (states from s with \epsilon transition)
F' = {q'\in Q'|Q \intersect F not empty}
\epsilon' : the transition functions that maps the element in Q' to element in Q' with a transition.

Ex: if Q = {q0, q1}, F = q1,
then Q' can be {q'(@), q'0, q'1, q'01}, F' = {q'1,q'01}, do some combination with \epsilon'.

Tuesday, December 2, 2008

More on PS5

I ve just seen the sample solution for PS5.

P(n): If language L has |L| = n (that is, it consists of n strings), then L has a regular expression that denotes it.

Base cases: If n = 0, then L is the empty language, and it is denoted by \emptyset.

If n = 1, then L consists of a single string. If the single string is the empty string, then the symbol \epsilon denotes L. Otherwise, we may argue that the sequence of symbols in L's single string constitutes a regular expression for L (perhaps by adding suitable parentheses). A really rigorous proof uses induction on the length of the single string.

Induction step: Assume n is a natural number, and that P(0)/\.../\P(n-1) is true. Suppose L is a language with |L|=n. If n is either 0 or 1, we're done, since the base cases show that L has a regular expression that denotes it. Otherwise, partition L into one set with n-1 strings, and another set with 1 string. By P(1) and P(n-1), these two sets have regular expressions that denote them, R1 and R2. Since L is the union of the languages denoted by these two sets, it is denoted by (R1+R2).

I think a simple induction do the job as well. The base case is the same.
IH: Assume P(n) is true. Suppose L is a language with |L| = n+1. Arbitrary pick an element s from L, then there is a subset of L that does not include s. This set L' is a language that has n string, so by IH there is a regex R' that denotes L'. By one more alternation operation, we have R = R'+s that denotes L. Therefore, any finite language can be denoted by a regex.