i and inconsistent functions and properties

Since early days, well, maybe grade 9 or later, we all encounter the mystical number “i”. We get familiar with real numbers fairly easily, as they are mostly intuitive (although there is a lot of unintuitive stuff), but i eludes a lot of people. What is it? How does it relate to real numbers? How do we compare them? Why do functions get so much more complicated with i?

I will describe how complex numbers are constructed at a later date, but for now we agree that complex numbers are of the form $a+bi$ with a and b being real numbers, and leave out the strict formalities here.

A lot of cranks claim this is inconsistent because $i=\sqrt{-1}=\sqrt{\frac{1}{-1}}=\frac{\sqrt{1}}{\sqrt{-1}}=\frac{1}{i}=-i$, which is absurd, as a non-zero number cannot equal its own negative, so it is inconsistent and wrong!

Seems convincing, no? There are a whole lot of errors here that I will work through. First and foremost, we need to get rid of the notion that $i=\sqrt{-1}$ is the DEFINITION, which it isn’t. We define i by the relation $i^2=-1$. This might seem nitpicky, and it is, but for a very good reason that I will return to soon. The most glaring issue is the step $\sqrt{\frac{1}{-1}}=\frac{\sqrt{1}}{\sqrt{-1}}$. This step is where it all breaks down: they use $\sqrt{x}$ as if it were still the square root on the real numbers. Let me explain in more detail.

When we write $\sqrt{x}$, or any equivalent notation, we mean a specific function with a specific domain, that is, the set of all inputs it is defined for. We often omit writing the domain because context can salvage most of it, and in most cases it just doesn’t matter, but here it does. Normally we have $\sqrt{}:\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$, that is, the square root is defined as a function from the non-negative real numbers to the non-negative real numbers. Why is that so? Because within the real numbers we cannot make sense of the square root of a negative number; it is simply nonsensical. This is what complex numbers attempt to fix, but that is not relevant right now. To distinguish the two, I will write ${}^+\sqrt{}$ for $\sqrt{}:\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$. This function has certain properties we can prove, like the familiar ${}^+\sqrt{xy}={}^+\sqrt{x}\,{}^+\sqrt{y}$, which, in its quotient form, is what they use.

The astute reader will note we are still working with real numbers. What happens when we input a complex number into our ${}^+\sqrt{}$? NOTHING! We cannot do it! It has no meaning for complex numbers. What we need to do is EXPAND our function ${}^+\sqrt{}$ so that it can take complex numbers. But notice that once we expand it, it is no longer the same function! We can expand it in infinitely many ways, but it makes intuitive sense, and aligns with our desires, that the expansion should coincide with our original ${}^+\sqrt{}$ whenever we put in only the non-negative real numbers. In mathematics we would write something like $\sqrt{}|_{\mathbb{R}_{\geq 0}}={}^+\sqrt{}$, which means that when we restrict the domain to the non-negative real numbers, we cannot tell the two functions apart because they always yield the same thing.

There is a natural way to do this, and we do define it as such. I won’t go into the details here, but we get $\sqrt{}:\mathbb{C}\to\mathbb{C}$ such that the two coincide, and it gives us a meaningful way to get $\sqrt{-1}=i$. This is what I talked about before: we cannot define i in that manner, because to put -1 into our square root we must expand its domain, and to do that we need the complex numbers, so it becomes circular if one tries to define i that way. If we define it through the square, the problem vanishes. Now, when we expand a function, a natural question arises: do the properties we had the luxury of before remain? Are they preserved? The general answer is no, they are not. Some properties may survive in certain expansions, but in general you cannot assume it. It is something you must PROVE; you cannot just ASSUME it. And that is the issue: they assume the square root still distributes over products and quotients, and as a result what they prove is not what they set out to prove. They want to argue like this

1. Assume complex numbers work
2. Show that taking square roots gives a contradiction
3. Therefore our assumption of complex numbers was false.

When in fact, what they end up proving is only

1. Assume the square root retains its former properties
2. Show it leads to a contradiction
3. Therefore the assumption about the square root is false.

This is a key difference, because the complex numbers as a field have only addition and multiplication, and with those they are perfectly consistent. The square root is not a “native” function of the complex numbers; it is one we add on top to perform computations and solve things, but it is not innate. You cannot make algebraic structures with it being “native” because most often it requires a very restricted image or domain to be workable. To demonstrate that complex numbers as a field are inconsistent or wrong, you have to use addition and multiplication, nothing else. Any proof involving an additional function will only invalidate that single function’s definition.
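The lost property can be checked numerically. The following is a minimal sketch that assumes Python's `math.sqrt` as a stand-in for ${}^+\sqrt{}$ and `cmath.sqrt` (the principal complex square root) as one possible expansion; the variable names are only illustrative.

```python
import cmath
import math

# The real square root is only defined on the non-negative reals:
# asking for the root of a negative number is rejected outright.
try:
    math.sqrt(-1.0)
    real_sqrt_accepts_negative = True
except ValueError:
    real_sqrt_accepts_negative = False

# On the old domain, the expanded function agrees with the original one.
samples = [0.0, 1.0, 2.25, 4.0]
agrees_on_old_domain = all(cmath.sqrt(x) == math.sqrt(x) for x in samples)

# The step used in the "proof": sqrt(1 / -1) versus sqrt(1) / sqrt(-1).
lhs = cmath.sqrt(1 / -1)               # sqrt(-1), which equals 1j
rhs = cmath.sqrt(1) / cmath.sqrt(-1)   # 1 / 1j, which equals -1j

print(real_sqrt_accepts_negative)  # False
print(agrees_on_old_domain)        # True
print(lhs == rhs)                  # False: the quotient rule did not survive
```

Nothing about addition or multiplication of complex numbers goes wrong here; only the assumed quotient rule for the expanded square root fails.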

Always be careful with the domain of a function and with how changing it changes the function. When the functions overlap and one is an expansion of the other, we mathematicians, out of sloth, omit the domain because we assume the knowledgeable reader can salvage it.

Peano’s Axioms

Natural numbers are what most humans learn to count with, and in mathematics we axiomatize them to create a proper, firm foundation. Giuseppe Peano is the one credited with the modern axiomatization of the natural numbers. What he wrote is, however, different from what we use today, so I will go through the modern version and demonstrate how this axiomatization can be realized within ZFC as well.

These are the axioms. We let $\mathbb{N}$ be the set we work with, call it the set of natural numbers, and call its elements natural numbers.

1. $\exists 0\in\mathbb{N}$
2. $\exists \sigma:\mathbb{N}\hookrightarrow\mathbb{N}\setminus\{0\}$
3. $\forall S\subseteq\mathbb{N}\,((0\in S \land \forall x\,(x\in S\implies \sigma(x)\in S))\implies S=\mathbb{N})$

This is among the most succinct ways of writing Peano’s axioms. It can be read out as

1. There exists one element we call 0 in our natural numbers
2. We have a function $\sigma$ from $\mathbb{N}$ to $\mathbb{N}$ that is injective but not surjective, since 0 is not the successor of any element. We call this function the successor function.
3. The induction principle which is that if a subset of $\mathbb{N}$ contains 0 and all the successors of its elements, then it is equal to the natural numbers.

It should be noted that some authors choose to start at 1. I generally prefer starting at 0 because it makes the natural numbers into a semiring. We label the elements according to our intuition: we know that “after 0 comes 1”, so we label $1=\sigma(0)$, then $2=\sigma(1)=\sigma^2(0)$, and so on. Some might question my usage of $\sigma^2$, where I technically use 2 before I have even defined it, but again, this uses our intuitive understanding to convey what is meant. I could just as well write $\sigma(\sigma(0))$ or $(\sigma\circ\sigma)(0)$, but for the long chains we will need later that is cumbersome and entirely pointless, so we may very well use the intuitive shorthand to make it easier for our human brains.

From these axioms we can show that the structure $(\mathbb{N},+,\cdot)$ has all the properties we expect of it, such as associativity and commutativity, provided we define addition and multiplication as follows.

We define addition recursively as follows

• $a+0=a$
• $a+\sigma(b)=\sigma(a+b)$

What this means is that we define our object 0 to be the neutral element for addition on the right side. Notice we do not define it to be neutral on the left side, so a priori it might not be; however, we will see that addition is commutative, so it is neutral on the left as well. After that, we say that the sum of an element and the successor of another element is the successor of the sum of the two elements.

Multiplication

For multiplication, we also define it recursively as

• $a\cdot 0 = 0$
• $a\cdot\sigma(b)=a+a\cdot b$

The recursive step $a\cdot\sigma(b)=a+a\cdot b$ is the one that captures our intuition of what multiplication of natural numbers is: repeated addition. It turns any product into a long sequence of additions that always terminates at 0. A quick computation shows that

$a\cdot\sigma(0)=a+a\cdot 0 = a+0=a$

That is, the successor of 0 is the (right) identity of multiplication.

With this we can now show, for example, that addition is associative, that is $a+(b+c)=(a+b)+c$. The sources below demonstrate how it is done, and the crucial thing is that the proofs all rely on the induction principle. Going through it all
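As one illustration of how the induction principle enters, here is a sketch of associativity by induction on c, using only the two defining clauses of addition.

Base case, $c=0$:

$a+(b+0)=a+b=(a+b)+0$

Inductive step, assuming $a+(b+c)=(a+b)+c$:

$a+(b+\sigma(c))=a+\sigma(b+c)=\sigma(a+(b+c))=\sigma((a+b)+c)=(a+b)+\sigma(c)$

So the set of all c for which the identity holds for every a and b contains 0 and is closed under $\sigma$, and by the induction principle it is all of $\mathbb{N}$.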

ZFC

Peano’s axioms can be embedded into ZFC by using what it provides us. The definitions of addition and multiplication give us functions built on the successor function, so the successor function and the existence of a set that satisfies the desired qualities are all we need to focus on. Luckily for us, this is not difficult.

We define the successor function as

$\sigma(x)=x\cup\{x\}$

A set where this works is provided by the Axiom of Infinity, namely

$\exists S\forall x (\emptyset\in S\land(x\in S\implies x\cup\{x\}\in S))$

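This set has the successor function built into it and gives us a natural zero element, namely the empty set, which is the element we label zero in Peano’s axioms. One caveat: the axiom only guarantees some such set S, which may contain extra elements, so for the induction principle to hold we take $\mathbb{N}$ to be the smallest such set, that is, the intersection of all subsets of S that contain $\emptyset$ and are closed under $\sigma$. With that $\mathbb{N}$ and $0=\emptyset$ we have $\sigma:\mathbb{N}\hookrightarrow \mathbb{N}\setminus\{\emptyset\}$, and from there the rest follows naturally from ZFC. This illustrates the power of ZFC: given an axiomatization of the natural numbers, we can use ZFC to construct a model of it instead.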
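The first few of these sets are small enough to build by hand. The sketch below assumes Python's `frozenset` as a stand-in for hereditarily finite sets; it only illustrates the successor $x\cup\{x\}$ on finitely many elements and says nothing about the infinite set itself.

```python
EMPTY = frozenset()

def succ(x: frozenset) -> frozenset:
    """The successor sigma(x) = x ∪ {x}."""
    return x | frozenset({x})

zero = EMPTY
one = succ(zero)      # {∅}
two = succ(one)       # {∅, {∅}}
three = succ(two)     # {∅, {∅}, {∅, {∅}}}

# Each number is the set of all smaller numbers.
print(three == frozenset({zero, one, two}))   # True
print(len(three))                             # 3

# The empty set is never a successor, because succ(x) always contains x.
print(all(succ(n) != EMPTY for n in (zero, one, two, three)))  # True
```

On this encoding, the number of elements of each set matches the label we give it, which is a convenient sanity check but not something the axioms require.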

An important thing to pay attention to here is that we are not saying numbers are sets, but that we can use sets in ZFC to form a structure that has the characteristics we expect the natural numbers to have.

Sources

Propositional Logic/Calculus as a formal language.

Last time I talked about Why we need formal languages and what are they?, which laid out the need for them and gave a rudimentary understanding of what they are. This time I will focus on how we turn the idea of a formal language into proper logic and, through it, formalize our notion of deductive logic in a simple way that machines can handle. Many readers will be familiar with the concept of truth values, but it is important to understand that, as a formal language, Propositional Calculus does not deal with the truth or falsity of anything; it is just a system for manipulating symbols that happens to coincide with our intuitive notion of how truth values work.

Remember that we have two components to a formal language

1. Symbols
2. Rules of composition

In propositional logic we subdivide these further. Our symbols are divided into

• Operators
• Variables

The Rules of composition we do not subdivide further; we mostly just rename them Well-formed formulas.

On top of this we add a system of inference consisting of

• Axioms
• Substitution
• Inference rules

Symbols

The collection of symbols can be finite or infinite depending on how one chooses it, but in general there are always finitely many operators and infinitely many variables. Be careful: while we call these groups of symbols operators and variables, they are still just symbols, and the names are picked to fit how we decide to use them in the rules of composition later. Generally we also include “(” and “)” as symbols to help us disambiguate, and we likewise create additional rules to minimize the use of parentheses. A classical example is $ab+c=a\cdot b+c=(a\cdot b)+c$.

To make the set of symbols for variables finite, we may add an operator symbol ‘ and a single variable, which we may denote A, together with a rule that we can place as many ‘ as we want after A. We then get A, A’, A”, A”’, … and obtain different variables that way. Of course that is difficult to read and cumbersome for humans, so in general we use different (still capital) letters and assume that the variables are infinitely many, even though formally we can say that both sets of symbols are finite. For ease of reading we likewise use numeric subscripts to expand the supply further. From a formal point of view, some might complain that I use the notion of a “set” here, which is something to be defined later, and that this might end up being circular. Formally, however, I have not used a set; I am using our human concept of how a set should behave to communicate these things verbally, and as such avoid circularity.

Operators tend to get special symbols, and for the propositional logic of our choice, the standard one, we will only have two: ¬ and ⇒. Once again, we pick these symbols because, by the rules of composition, they will behave in ways that coincide with our intuition of what those symbols usually represent, but without adhering to the concept of “truth”, which is a slippery concept. An important property of each operator is its arity, which is a natural number and can be any natural number. For a general operator we use the letter O and superscript the arity, so $\text{O}^n$ is a general operator of arity n. We give ¬ the arity 1 and ⇒ the arity 2.

Here again one might complain about circularity: if we use formal languages to define things in mathematics, and then use them to define numbers, how can we use numbers to define a property in a formal language? This is not circular, because we are not using the properties of numbers; we are using them to describe intuitively how many arguments an operator accepts. We can dispense with arity entirely and instead give each operator its own rule of composition, using substitution to get around it. That is, of course, a lot of needless extra work, so we use the intuitive approach for convenience.

Well-formed formulas

For the well-formed formulas, we build them inductively like this.

1. Atomic formulas, just variables, are wff’s
2. For any given operator $\text{O}^n$ and wff’s, $w_1,w_2,\ldots,w_n$, we have that $\text{O}^n(w_1,w_2,\ldots,w_n)$ is a wff.

On #2, I should add that we are using a lot of shorthand and conveniences for us humans. If one is very strict, the ellipses should be replaced by the appropriate explicit string, and in most formal treatments one would do so. However, this is needlessly complicated and cumbersome to read, so we skip it for the sake of readability, as we can easily understand what is meant. Similarly, for human reading convenience, if an operator has arity two we may use infix notation, so instead of $*(P,Q)$ we may very well write $P*Q$, and for unary operators we may use prefix or suffix notation as we see fit.
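The inductive definition maps directly onto a small tree data structure. The following is a sketch, not a standard library: the class names `Var` and `Op`, the `ARITY` table and the functions `is_wff` and `show` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Var:
    """An atomic formula: just a variable symbol."""
    name: str

@dataclass(frozen=True)
class Op:
    """An operator symbol applied to a tuple of argument formulas."""
    symbol: str
    args: Tuple["Formula", ...]

Formula = Union[Var, Op]

# Arity of the two operators of the standard system.
ARITY = {"¬": 1, "⇒": 2}

def is_wff(f) -> bool:
    """Rule 1: variables are wffs. Rule 2: O^n applied to n wffs is a wff."""
    if isinstance(f, Var):
        return True
    if isinstance(f, Op):
        return (ARITY.get(f.symbol) == len(f.args)
                and all(is_wff(a) for a in f.args))
    return False

def show(f) -> str:
    """Human-friendly rendering of a wff: prefix for ¬, infix for ⇒."""
    if isinstance(f, Var):
        return f.name
    if f.symbol == "¬":
        return "¬" + show(f.args[0])
    left, right = f.args
    return f"({show(left)} ⇒ {show(right)})"

P, Q = Var("P"), Var("Q")
formula = Op("⇒", (P, Op("¬", (Q,))))
print(is_wff(formula))          # True
print(show(formula))            # (P ⇒ ¬Q)
print(is_wff(Op("⇒", (P,))))    # False: ⇒ needs two arguments
```

The tree form corresponds to the strict notation $\text{O}^n(w_1,\ldots,w_n)$, while `show` produces the infix and prefix shorthand meant for human readers.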

System of inference

This part is what separates logic systems from many other formal languages: it allows us to create theorems with which we can emulate how we humans reason, namely that given certain things we conclude another. The system of inference consists of three parts: Axioms, Substitution and Inference rules.

Axioms are simply a collection of wff’s that are declared to be within the language, and they can be of any kind. In the standard system we have these axioms

• $P \implies (Q\implies P)$
• $(P \implies (Q\implies R))\implies ((P\implies Q)\implies(P\implies R))$
• $(\neg Q \implies \neg P)\implies (P\implies Q)$

As with all axioms, these are merely given and said to be valid. As written, however, they only speak about the specific variables P, Q and R, and restating them for every other choice of variables would require a lot of work. We want them to be valid always, and that is where substitution comes in.

Substitution in a wff is when we replace a variable with a wff (in the simplest case, with another variable). To define it rigorously we do it recursively. We write w[u/P] to say that in the wff w we replace every occurrence of P with u, where u is a wff. Notice that u has to be a valid wff for the substitution to give a valid wff.

1. if w is atomic then
   1. if w = P, then w[u/P] = u
   2. if w ≠ P, then w[u/P] = w
2. if $w=O^n(w_1,\ldots,w_n)$ then $w[u/P]=O^n(w_1[u/P],\ldots,w_n[u/P])$
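The recursive definition can be written out as a short function. This sketch assumes an illustrative representation in which a variable is a plain string and a compound wff is a tuple `(operator, argument, ...)`; the helper names `imp` and `neg` are hypothetical.

```python
def substitute(w, u, p):
    """Return w[u/p]: every occurrence of the variable p in w replaced by the wff u."""
    if isinstance(w, str):          # case 1: w is atomic
        return u if w == p else w
    op, *args = w                   # case 2: w = O^n(w_1, ..., w_n)
    return (op, *(substitute(a, u, p) for a in args))

def imp(a, b):
    """Build the wff a ⇒ b."""
    return ("⇒", a, b)

def neg(a):
    """Build the wff ¬a."""
    return ("¬", a)

# First axiom: P ⇒ (Q ⇒ P)
axiom1 = imp("P", imp("Q", "P"))

# Substituting ¬A for P gives the instance ¬A ⇒ (Q ⇒ ¬A).
instance = substitute(axiom1, neg("A"), "P")
print(instance == imp(neg("A"), imp("Q", neg("A"))))   # True
```

Both occurrences of P are replaced at once, which is what makes a single axiom stand for all of its instances.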

With this, our set of axioms can be quite small, relatively speaking, while we can still derive many more valid wff’s to use for other purposes. Primarily it makes the language more concise to work with, since we do not have to deal with enormous sets of axioms.

Inference rules are how we use substitution together with already valid wff’s, including the axioms, to derive new ones. These rules usually come in the form $w_1,w_2,\ldots,w_n\vdash u$, which is usually read as “Given $w_1,w_2,\ldots,w_n$, infer u”. It is important to note that here we extend substitution slightly by setting

$(w_1,w_2,\ldots,w_n\vdash u)[r/P]=(w_1[r/P],\ldots,w_n[r/P]\vdash u[r/P])$

which says that if we apply the same substitution to all wff’s in an inference rule, the resulting inference rule is also valid.

The ordinary standard logic uses one inference rule, namely

$A,A\implies B \vdash B$

That is all it uses. When we can show that, given $w_1,\ldots, w_n$, one infers $W$, we say that $w_1,\ldots, w_n \vdash W$ is a theorem of our language. What it means to be a theorem is that if we are given $w_1,\ldots, w_n$ and have the theorem $w_1,\ldots, w_n \vdash W$, then $w_1,\ldots, w_n, W$ are all available for use in showing another theorem. In other words, a theorem in propositional logic makes additional wff’s available for us to use in future theorems.

A simple example: if $A \implies (\neg B \land Q)$ is a theorem, then whenever we are given A in a theorem, say $A , P \vdash W$, that theorem is equivalent to $A , P, \neg B \land Q \vdash W$. (The symbol $\land$ is not one of our two operators; here it should be read as an abbreviation, where, as is common, $X\land Y$ stands for $\neg(X\implies\neg Y)$.) In a sense, one can view theorems as a help in making the necessary parts of other theorems shorter. The proof of a theorem is a sequence of wff’s, where the first ones are those given in the theorem, the following ones are wff’s we infer by the inference rules, and the last one is the one the theorem says we infer. An example is the proof of the theorem $A,B,A\implies(B\implies C)\vdash C$

1. A
2. B
3. $A\implies(B\implies C)$
4. $B\implies C$, MP[$B\implies C$/B](1,3)
5. C, MP[C/B][B/A](2,4)
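Because a proof is just a sequence of wff’s with a justification per line, checking one is entirely mechanical. The sketch below assumes variables are strings and an implication is a tuple `("⇒", a, b)`; the function names `follows_by_mp` and `check_proof` and the justification format are hypothetical.

```python
def imp(a, b):
    """Build the wff a ⇒ b."""
    return ("⇒", a, b)

def follows_by_mp(premise, implication, conclusion):
    """Modus Ponens: from `premise` and `premise ⇒ conclusion`, infer `conclusion`."""
    return implication == imp(premise, conclusion)

def check_proof(givens, steps):
    """Check a proof given as a list of (wff, justification) pairs.

    A justification is either the string "given" or a pair (i, j) of
    earlier 1-based line numbers, meaning Modus Ponens on lines i and j.
    """
    lines = []
    for formula, why in steps:
        if why == "given":
            ok = formula in givens
        else:
            i, j = why
            ok = (1 <= i <= len(lines) and 1 <= j <= len(lines)
                  and follows_by_mp(lines[i - 1], lines[j - 1], formula))
        if not ok:
            return False
        lines.append(formula)
    return True

givens = ["A", "B", imp("A", imp("B", "C"))]
proof = [
    ("A", "given"),
    ("B", "given"),
    (imp("A", imp("B", "C")), "given"),
    (imp("B", "C"), (1, 3)),    # MP on lines 1 and 3
    ("C", (2, 4)),              # MP on lines 2 and 4
]
print(check_proof(givens, proof))   # True
```

The explicit substitutions written next to MP in the proof above are handled implicitly here: comparing the implication against `imp(premise, conclusion)` amounts to finding the instance of Modus Ponens that fits.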

We say two theorems are circular if each is used in the other’s proof; this may be hidden through various means.

To summarize ordinary propositional calculus and its components

• Operators:
  • $\neg^1$, written as prefix
  • $\implies^2$, written as infix
• Axioms:
  1. $P \implies (Q\implies P)$
  2. $(P \implies (Q\implies R))\implies ((P\implies Q)\implies(P\implies R))$
  3. $(\neg Q \implies \neg P)\implies (P\implies Q)$
• Inference rules:
  • $A,A\implies B \vdash B$, called Modus Ponens

All else is the usual for propositional calculus. In the next post we will see that this system coincides with our regular notion of logic and reasoning while being entirely mechanical in nature.

Why we need formal languages and what they are

Many consider our use of language to be one of the primary features that make humanity unique within the kingdom Animalia. Its capability to offer infinite variety with a finite set of sounds and words makes it quite the marvel of evolution and of human capability.

However, within it and us there are two conflicting goals that cannot be reconciled.

1. Precision of language, where we want to supply precise, exact, unambiguous information to our listener.
2. Economy of language, where we want to minimize the number of syllables, sounds and everything while providing information.

Having one completely means sacrificing the other completely. We humans compromise and take a bit of both, because in most instances where ambiguity is introduced, the context is enough to recover the precise meaning. In the few instances when it is not, and we wrongly assumed it was, the other bloke asks a clarifying question, an answer is given and the conversation continues.

In normal speech this is adequate for virtually all cases. In text, however, we tend to lean more toward Precision of language. This is because, while the text gives a certain level of context, it is not nearly as much as if we were talking face to face. More importantly, when an ambiguity arises there is no way for the reader to ask for clarification, and they are stuck working through multiple interpretations of what the author might mean, hoping they find the correct one.

And still, no matter how much one might try, the precision is never enough to eliminate this issue, because a text that precise in natural language would be unwieldy, difficult to read and, quite frankly, exceptionally boring, as it would not feel like the product of a human being: it goes against everything we know about how a human expresses themselves, even in the most formal manner. The way natural language works is so ingrained in our minds that we essentially require it to express ourselves and be comfortable.

In most cases this ambiguity is not an issue. After all, the texts written by the ancient Greeks were highly ambiguous, much more so than anything we write today, and could take five to ten readings before they were understood. However, when tiny changes in nuance have fundamental effects on the meaning and on everything that follows, this minuscule ambiguity will cause issues. In mathematics this is exceptionally important because of subclauses. A classical example is

One morning I shot an elephant in my pyjamas

Of course, knowing what an elephant is, we know from context that it is not the elephant wearing my pyjamas. But if you are not familiar with what an elephant is, the pyjamas might very well be on the elephant, because there is no reason it should not be able to wear them when you do not know what the word refers to. In mathematics we often use the same type of objects within the same discussion, so this kind of subclause is highly ambiguous, but again, if we do not use them in a natural manner, the text will sound inhuman and feel unreadable.

This is the problem formal languages exist to solve. Formal languages are not languages per se, because they contain nothing we humans would recognise as a language: no sounds, nothing to speak or the like, and they lack semantic meaning. In ordinary language, the meanings of words depend on each other in a circular sense and meaning is assigned through usage, but a formal language lacks all forms of meaning as we recognise it.

The reason they are nevertheless called languages is that they share some properties with a language.

• Collection of symbols – This can be said to be equivalent to a natural language’s words, but unlike those, these have no meaning and are just symbols, which can be represented as marks on paper, binary sequences, strings of sound or whatnot.
• Rules of composition – Not every sequence of symbols is allowed; most sequences are forbidden and only some are allowed. This can be said to be equivalent to a natural language’s grammar. Unlike in a natural language, because no meaning is assigned, any string that does not violate these rules is valid. The rules are strict and clear, and every sequence can be determined to be valid or not.

The last one is important because in English we have for example

Colourless green ideas sleep furiously.

which is a legitimate string under English grammar and vocabulary, that is, under its Rules of composition and Collection of symbols, but anyone who knows English would reject this sentence outright because it is semantic nonsense. One might consider adding this restriction as a rule to our Rules of composition, but because it is semantically grounded, such a rule cannot be precise. As speakers of English, or of any language with many dialects, can affirm, what is semantic nonsense in one dialect might be semantically valid in another, despite the grammatical rules being identical, so we get ambiguity in the rules as everyone uses different rules of composition.

One important thing to notice in all of this is that a formal language is ultimately just symbols with some rules on how to put them together. We humans, however, still use natural languages, and as such we often assign words and ways of saying things to the formal language. That is not a property OF the formal language, but a necessity for us humans so that we can EXPRESS a formal language in a way that suits us. Most formal languages have rules so rigid that even a computer is capable of determining validity. We might use ambiguous words to represent the symbols, their order and everything else within the language, but the words we choose and their inherent meanings have no effect or impact on the formal language. We often choose words that fit the general role of the symbol within the formal language and what we would call it in natural language, but that choice is ultimately entirely arbitrary.
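To make the claim about computers concrete, here is a sketch of a validity checker for a small toy language. The language itself is an assumption made up for this illustration: its symbols are `A`, `'`, `¬`, `⇒`, `(` and `)`, and its rules of composition are that a valid string is either `A` followed by any number of `'`, or `¬` followed by a valid string, or `(`, a valid string, `⇒`, a valid string, `)`.

```python
def parse_wff(s: str, i: int = 0):
    """Try to read one valid string starting at index i.

    Returns the index just past it, or None if no valid string starts at i.
    """
    if i >= len(s):
        return None
    if s[i] == "A":                       # a variable: A followed by primes
        i += 1
        while i < len(s) and s[i] == "'":
            i += 1
        return i
    if s[i] == "¬":                       # ¬ followed by a valid string
        return parse_wff(s, i + 1)
    if s[i] == "(":                       # ( valid ⇒ valid )
        j = parse_wff(s, i + 1)
        if j is None or j >= len(s) or s[j] != "⇒":
            return None
        k = parse_wff(s, j + 1)
        if k is None or k >= len(s) or s[k] != ")":
            return None
        return k + 1
    return None

def is_valid(s: str) -> bool:
    """A string is valid when exactly one valid string spans all of it."""
    return parse_wff(s) == len(s)

print(is_valid("(A⇒¬A')"))   # True
print(is_valid("A⇒A"))       # False: the parentheses are required
print(is_valid("⇒A("))       # False
```

The checker never asks what any symbol means; it only follows the rules of composition, which is why the decision is always a clear yes or no.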

More advanced formal languages add additional rules and properties, but those two properties are fundamentally what define them.