Propositional Logic/Calculus as a formal language.

Last time I talked about Why we need formal languages and what are they? which laid the foundations for the need and a rudimentary understanding of them, this time I will focus on how we turn the idea of a formal language into proper logic and through it, formalizes our notation of deductive logic in a simple way that machines can handle. Many of the readers will be familiar with the concepts of truth value, but it is important to understand that as a formal language, Propositional Calculus do not deal with true or false value of anything, it is just a system to manipulate symbols that coincide with our intuitive notion of how truth value works.

Remember that we have two components to a formal language

  1. Symbols
  2. Rules of composition

In propositional logic we subdivide these further, our symbols is divided into

  • Operators
  • Variables

And for Rules of composition we do not subdivide it further, we rename it mostly into Well-formed formulas.

Ontop of it we add a system of inference consisting of

  • Axioms
  • Substitution
  • Inference rules

Symbols

The collection of symbols can be said to be finite or infinite depending on how one chose it but in general, Operators are always finitely many and variables are infinitely many. Be careful now that while we call these groups of symbols for operators and variables, they are still just symbols and our choice of name is picked to fit how we decide to use them in rules of composition later. Generally we also include “(” and “)” as symbols to assist us to disambiguate things, but we equally create additional rules to minimize the usage of (). Classical example is ab+c=a\cdot b+c=(a\cdot b)+c.

To make variables finite we may add an operator symbol of ‘ and a single variable which we may denote A, and we implement a rule of that we can place as many ‘ we want after A, so we get A, A’, A”, A”’,…. so we get different variables through that. Of course that is very difficult to read and cumbersome for humans so in general, we use different, still capital, letters and assume that the variables are infinite but formally we can say that both sets are finite, equally do we use for ease of our reading and understanding subscripts of numbers to expand it further. From a formal point of view, some might complain on the fact I use the notion “set” here, which is something to be defined later, which might end up being circular. Formally however I have not used a set, I am using our human concept of how a set should behave to verbally communicate these things and as such, avoid having it being circular.

Operators tends to use special symbols for them and for the propositional logic of our choice, the standard one, we will only have two symbols, ¬ and ⇒. Once again we are picking these symbols because by the rules of composition they will behave in ways that coincide with our intuitive feeling of what those symbols usually represents, but without adhering to the concept of “truth” which is a slippery concept. An important property for each operator is that we assign them an arity, which is a natural number and can be any. For the general case of an operator we use the letter O for it and superscript the arity, so \text{O}^n is a general operator of arity n. For ¬ we give the arity of 1, and for ⇒ we give the arity of 2.

Again here, one might complain that there is a level of circularity, if we use formal language to define things in mathematics and we then use it to define numbers, but use numbers to define a property in a formal language? This is again not circular just because we are not using the properties of numbers, but using them to intuitively describe a property of things here that merely tells us the number of variables an operator accepts. This can be suspended entirely and give instead each operator a rule of composition where we have to use substitution to go around. This is of course again a lot of extra needless work so we use this intuitive approach for convenience.


Well-formed formulas

For the well-formed formulas, we build them inductively like this.

  1. Atomic formulas, just variables, are wff’s
  2. For any given operator \text{O}^n and wff’s, w_1,w_2,\ldots,w_n, we have that \text{O}^n(w_1,w_2,\ldots,w_n) is a wff.

On #2 I add to the reader that we are using a lot of shorthand and conveniences for us humans there. If one is very strict the ellipses should be replaced by the appropriate string and so would one in most instances. However this is needlessly complicated and cumbersome to read so we suspend doing it like that for the convenience of readability as we can easily understand what it means. Similarly for human reading convenience, if an operator have arity of two, we may do infix notation so instead of *(P,Q) we may very well write P*Q, and similarly for unary operators we may use prefix or suffix writing instead as we see fit.


System of inference

This part is what separates the logic systems from many other formal languages, it allows us to create theorems where we can emulate how we humans reason, that given certain things we conclude another one. System of inference consists of the 3 parts of Axiom, Substitution and Inference Rules.

Axioms is simply then a collection of wff’s that are declared to be within the language and can be of any kind. In the standard one we have these axioms

  • P \implies (Q\implies P)
  • (P \implies (Q\implies R))\implies ((P\implies Q)\implies(P\implies R))
  • (\neg Q \implies \neg P)\implies (P\implies Q)

As with all axioms, these are merely given and said to be valid. However for any potential variable, we would need to rewrite these and that would require a lot of work, unless we want it to be extremely specific but we want it to be valid always, that is where substitution comes from.

Substitution in a wff is when when we replace one variable with another. To define it more rigorously we do it recursively. We write w[u/P] to say that in the wff w, we replace all P with u, where u is a wff. Notice that u has to be a valid wff  for the substitution to give a valid wff.

  1. if w is atomic then
    1. if w = P, then w[u/P]=u
    2. if w≠P, then w[u/P]=w
  2. if w=O^n(w_1,\ldots,w_n) then w[u/P]=O^n(w_1[u/P],\ldots,w_n[u/P])

With this we can have our set of axioms being quite small, relatively speaking, but still derive many more valid ones that we can use for other purposes, it primarily makes it more concise to deal with our language without having to deal with enormous sets of axioms.

Inference rules is when we use the method of substitution and pre-existing valid, including the axioms, wff’s to derive new ones. These are rules that usually come in the form of w_1,w_2,\ldots,w_n\vdash u, which is usually read as “Given w1,w2,…,wn infer u”. Important to note is that here, we extend our substitution slightly by having

(w_1,w_2,\ldots,w_n\vdash u)[r/P]=(w_1[r/P],\ldots,w_n[r/P]\vdash u[r/P])

which says that in an inference rule, if we substitute equally on all wff’s involved, the inference rule is valid.

The ordinary standard logic uses one inference rule, namely

A,A\implies B \vdash B

That is all it uses. When we can show that given w_1,\ldots, w_n that one infers W, then we say that w_1,\ldots, w_n \vdash W is a theorem of our language. What it means to be a theorem is that if we are given w_1,\ldots, w_n and have the theorem w_1,\ldots, w_n \vdash W, then w_1,\ldots, w_n, W are available to be used in to show another theorem. In other words a theorem in propositional logic makes additional wff available for us to use in future theorems.

A simple example is if a theorem is A \implies (\neg B \land Q), then if  we are given A in a theorem,A , P \vdash W, then it is equivalent to A , P, \neg B \land Q \vdash W. In a sense one can view it as theorems assist making the necessary parts of other theorems shorter. The proof of a theorem is a sequence of WFFs, where the first ones are the ones given in the theorem, the ones after are the WFFs we infer based on the inference rules, and the last one is the one that the theorem says we infer. An example is for the theorem A,B,A\implies(B\implies C)\vdash C

  1. A
  2. B
  3. A\implies(B\implies C)
  4. B\implies C, MP[B\implies C/B](1,3)
  5. C, MP[C/B][B/A](2,4)

We say two theorems are circular if both are used in each others proof, this may be hidden through various means.


To summerize ordinary propositional calculus and it’s components

  • Operators:
    • \neg^1, written as prefix
    • \implies^2, written as infix
  • Axioms:
    1. P \implies (Q\implies P)
    2. (P \implies (Q\implies R))\implies ((P\implies Q)\implies(P\implies R))
    3. (\neg Q \implies \neg P)\implies (P\implies Q)
  • Inference rules:
    • A,A\implies B \vdash B, called Modus Ponens

All else is the usual for propositional calculus. In the next post we will see that this system will coincide with our regular notion of logic and reasoning while being entirely mechanical in nature.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s