Graduate Studies

I want to thank all my readers for visiting Math ∩ Programming as often as you do, and doubly thank those who are kind enough to leave a comment.

UIC's east side campus. My office building is unseen, to the left of this picture.

Unfortunately, over the next few weeks I may not have time to work on this blog as much as I have in the past two months. After driving 3,000 miles across the U.S. (detouring to Seattle, WA), I’ve arrived in wonderful Chicago, and I’m presently going through TA and graduate student orientation at the University of Illinois at Chicago. I just took my master’s examination, and I believe I did well on it. On Monday, I start with a full load of five courses and six hours a week of teaching Calculus.

For those interested, my classes include Measure Theory, Algebraic Topology, Probability Theory, and A Second Course in Algebra, the last of which puts me on course for my first preliminary examination (I believe my second will be in logic).

Alongside the pure mathematics classes I plan to take a number of computer science courses which are very relevant to this blog. Some of these include supercomputing, codes and cryptography, computational geometry, computer algorithms 2, error-correcting codes, computational complexity, mathematical theory of artificial intelligence, analytic symbolic computation, mathematical theory of databases, and others.

Each of these courses offers a boat-load of good content for this blog, and I plan to revel in the knowledge (by blogging about it and making awesome animations!). So while the next few weeks might have a content “slump” (I still want to add to the proof gallery, a much less intensive task than the main content posts), the best is surely yet to come.

So thanks again for reading. Until next time!


The Square Root of 2 is Irrational (Geometric Proof)

Problem: Show that \sqrt{2} is an irrational number (can’t be expressed as a fraction of integers).

Solution: Suppose to the contrary that \sqrt{2} = a/b for integers a,b, and that this representation is fully reduced, so that \textup{gcd}(a,b) = 1. Consider the isosceles right triangle with side length b and hypotenuse length a, as in the picture on the left. Indeed, by the Pythagorean theorem, the length of the hypotenuse is \sqrt{b^2 + b^2} = b \sqrt{2} = a, since \sqrt{2} = a/b.


Swinging a b-leg to the hypotenuse, as shown, we see that the hypotenuse can be split into parts of lengths b and a-b; since a and b are integers, so is a-b. Call the point where the b and a-b parts meet P. If we extend a perpendicular line from P to the other leg, as shown, we get a second, smaller isosceles right triangle. Since the segments PQ and QR are symmetrically aligned (they are tangents to the same circle from the same point), they too have length equal to a-b. Finally, we may write the hypotenuse of the smaller triangle as b-(a-b) = 2b-a, which is also an integer.

So the lengths of the sides of the smaller triangle are integers, but by triangle similarity, the hypotenuse to side-length ratios are equal: \sqrt{2} = a/b = (2b-a)/(a-b), and since b < a < 2b, the latter numerator and denominator are positive integers strictly smaller than a and b, respectively. Hence, a/b was not in lowest terms, a contradiction. This implies that \sqrt{2} cannot be rational. \square
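For readers who prefer algebra to pictures, the equality of these two ratios can also be checked directly from the assumption a = b\sqrt{2}:

\displaystyle \frac{2b-a}{a-b} = \frac{2b - b\sqrt{2}}{b\sqrt{2} - b} = \frac{\sqrt{2}\,b(\sqrt{2}-1)}{b(\sqrt{2}-1)} = \sqrt{2}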

This proof is a prime example of the cooperation of two different fields of mathematics. We just translated a purely number-theoretic problem into a problem about triangle similarity, and used our result there to solve our original problem. This technique is widely used all over higher-level mathematics, even between things as seemingly unrelated as topological curves and groups. Finally, we leave it as an exercise to the reader to extend this proof to show that whenever k is not a perfect square, \sqrt{k} is irrational. The proof is quite similar, but strays from nice isosceles right triangles.

The Perceptron, and All the Things it Can’t Perceive

This post assumes some basic familiarity with Euclidean geometry and linear algebra. Though we do not assume so much knowledge as is contained in our primer on inner product spaces, we will be working with the real Euclidean inner product. For the purpose of this post, it suffices to know about the “dot product” of two vectors.

The General Problem

One of the main problems in machine learning is to classify data. Rigorously, we have a set of points called a training set X = \left \{ \mathbf{p_i} \right \} \subset \mathbb{R}^n generated by some unknown distribution D. These points are often numerical representations of real-world data. For instance, one dimension could represent age, while another could represent cholesterol level. Then each point \mathbf{p_i} would be a tuple containing the relevant numerical data for a patient. Additionally, each point has an associated label l_i = \pm 1, which represents the class that piece of data belongs to. Continuing our cholesterol example, the labels here could represent whether the patient has heart disease. For now, we stick to two classes, though iterated techniques extend any binary classification method to a multiple-class problem.

Given such a training set, we seek a decision function f : \mathbb{R}^n \to \left \{ \pm 1 \right \} which is consistent with our training set (i.e. f(\mathbf{p_i}) = l_i for each i) and generalizes well to all data generated by D (i.e. new data points not part of the training set). We want our decision function to treat future heart-disease patients as well as correctly classify those from the past.

With no restrictions on f, one would imagine such a function is wild! In fact, if the distribution D has no patterns in it (e.g. random noise), then no such f exists. However, for many distributions one can discover functions which are good approximations. They don’t even have to be consistent with the training set, as long as they work reliably in general.

Arguably the simplest non-trivial example of such a decision function is a line in the plane which separates a training set X \subset \mathbb{R}^2 into two pieces, one for each class label. This is precisely what the perceptron model does, and it generalizes to separating hyperplanes in \mathbb{R}^n.

The Dawn of Machine-Kind: the Perceptron

“Now, consider the following: you were admitted to this robot asylum. Therefore, you must be a robot. Diagnosis complete.” ― Dr. Perceptron to Fry, Futurama.

One of the very first classification algorithms was invented in 1957 by Frank Rosenblatt, and is called the perceptron. The perceptron is a type of artificial neural network, which is a mathematical object argued to be a simplification of the human brain. While at first the model was imagined to have powerful capabilities, after some scrutiny it has been proven to be rather weak by itself. We will formulate it in its entirety here.

Most readers will be familiar with the equation of a line. For instance, y = -2x is a popular line. We rewrite this in an interesting way that generalizes to higher dimensions.

First, rewrite the equation in normal form: 2x + y = 0. Then, we notice that the coefficients of the two variables form a vector (2,1), so we may rewrite it with linear algebra as \left \langle (2,1),(x,y) \right \rangle = 0, where the angle bracket notation represents the standard Euclidean inner product, also known as the dot product. We note that by manipulating the values of the coefficient vector, we can get all possible lines that pass through the origin.

We may extend this to all lines that pass through any point by introducing a bias weight, a constant term added to the inner product which shifts the line away from the origin. In this case, we might use -1 to get \left \langle (2,1),(x,y) \right \rangle - 1 = 0. The term “bias” is standard in the neural network literature. It might be better to simply call it the “constant weight,” but alas, we will follow the crowd.

We give the coefficient vector and the bias special variable names, \mathbf{w} (for the common alternative name weight vector) and b, respectively. Finally, the vector of variables (x,y) will be henceforth denoted \mathbf{x} = (x_1, x_2, \dots , x_n). With these new naming conventions, we rewrite the line equation (one final time) as \left \langle \mathbf{w}, \mathbf{x} \right \rangle + b = 0.

Now we let the dimension of \mathbb{R}^n vary. In three dimensions, our \left \langle \mathbf{w}, \mathbf{x} \right \rangle + b = 0 becomes w_1x_1 + w_2x_2 + w_3x_3 + b = 0, the equation of a plane. As n increases, the number of variables does as well, making our “line” equation generalize to a hyperplane, which in the parlance of geometry is just an affine subspace of dimension n-1. In any case, it suffices to picture a line in the plane, or a plane in three dimensions; the computations there are identical to an arbitrary \mathbb{R}^n.

The usefulness of this rewritten form is in its evaluation of points not on the hyperplane. In particular, we define f : \mathbb{R}^n \to \mathbb{R} by f(\mathbf{x}) = \left \langle \mathbf{w}, \mathbf{x} \right \rangle + b. Then taking two points \mathbf{x,y} on opposite sides of the hyperplane, we get that f(\mathbf{x}) < 0 < f(\mathbf{y}) or f(\mathbf{y}) < 0 < f(\mathbf{x}).

So if we can find the right coefficient vector and bias weight, such that the resulting hyperplane separates the points in our training set, then our decision function could just be which side of the line they fall on, i.e. \textup{sign}(f(\mathbf{x})). For instance, here is a bunch of randomly generated data in the unit square in \mathbb{R}^2, and a separating hyperplane (here, a line).

A line separating the blue data points (-1) from the red data points (+1). The diagonal boundary of the blue shaded area is the separating line, and the blue shaded area represents the set of points the corresponding classification function would deem to be in the red class (+1).

The blue region is the region within the unit square where f(\mathbf{x}) \geq 0, and so it includes the decision boundary. If we receive any new points, we could easily classify them by which side of this hyperplane they fall on.
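For instance, in Python one might encode this decision rule as follows (a small illustrative sketch; the actual code accompanying this post is in Mathematica, and the function names here are made up for exposition):

def dot(u, v):
   # standard Euclidean inner product of two equal-length sequences
   return sum(ui * vi for ui, vi in zip(u, v))

def classify(w, b, x):
   # which side of the hyperplane <w, x> + b = 0 does the point x fall on?
   return 1 if dot(w, x) + b >= 0 else -1

# example: the line x + y - 1 = 0, i.e. w = (1, 1) and b = -1
classify((1, 1), -1, (0.9, 0.8))   # evaluates to +1
classify((1, 1), -1, (0.1, 0.2))   # evaluates to -1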

Before we stop to think about whether such a hyperplane exists for every data set (dun dun dun!) let’s go ahead and construct an algorithm to find it. There is a very straightforward way to proceed: as long as the separating hyperplane makes mistakes, update the coefficient vector and bias to remove a mistake. Provided the data can be separated by some hyperplane at all, only finitely many such updates are ever needed. Here’s a bit of pseudocode implementing that idea:

hyperplane = [0, 0, ..., 0], bias = 0
while some misclassification is made in the training set:
   for each training example (point, label):
      if label * (<hyperplane, point> + bias) <= 0:
         hyperplane += label * point
         bias += label

Upon encountering an error, we simply push the coefficients and bias in the direction of the failing point. Of course, the convergence of such an algorithm must be proven, but it is beyond the scope of this post to do so. The interested reader should see these “do-it-yourself” notes. The proof basically boils down to shuffling around inequalities, squeezing things, and applying the Cauchy-Schwarz inequality. The result is that the number of mistakes made by the algorithm before it converges is bounded by a quantity proportional to the square of the largest norm among the training examples, and inversely proportional to the square of the smallest distance from the separating hyperplane to any training point.
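For readers who prefer Python to pseudocode, here is a rough, self-contained sketch of the same training loop (illustrative names only; we also add an iteration cap so the loop terminates even if the data turns out not to be separable):

def perceptron(training_set, max_rounds=1000):
   # training_set is a list of (point, label) pairs, where label is +1 or -1
   n = len(training_set[0][0])
   w, b = [0.0] * n, 0.0
   for _ in range(max_rounds):
      mistakes = 0
      for point, label in training_set:
         activation = sum(wi * xi for wi, xi in zip(w, point)) + b
         if label * activation <= 0:
            # a mistake: push the coefficients and bias toward the failing point
            w = [wi + label * xi for wi, xi in zip(w, point)]
            b = b + label
            mistakes = mistakes + 1
      if mistakes == 0:
         break   # no misclassifications remain; w, b separate the training set
   return w, b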

Implementation (and Pictures!)

As usual, we implemented this algorithm in Mathematica. Albeit with a number of helper functions for managing training sets in a readable way, and a translation of the pseudocode algorithm into functional programming, the core of the implementation is about as many lines as the pseudocode above. Of course, we include the entire notebook on this blog’s Github page. Here are our results:

We generate a set of 25 data points in the unit square, with points above the line y = 1-x in one class, and those below in the other. This animation shows the updates to the hyperplane at every step of the inner loop, stopping when it finds a separating line.

First, we note that this algorithm does not find “the best” separating line by any means. By “the best,” consider the following three pictures:

Clearly, the third picture achieves the “best” separation, because it has a larger distance from the training points than the other two. In other words, if we compute the training set margin \gamma = \textup{min}_i \; l_i f(\mathbf{p_i}), we claim a large \gamma implies a better separation. While the original perceptron algorithm presented here does not achieve a particularly large \gamma in general, we will soon (in a future post) modify it to always achieve the maximum margin among all separating hyperplanes. This will turn out to take a bit more work, because it becomes a convex quadratic optimization problem. In other words, finding the best separating hyperplane is conceptually and computationally more difficult than finding any separating hyperplane.
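One can also measure this notion of quality directly. Here is a small sketch (same illustrative conventions as the code above) that computes the geometric margin, dividing by the length of \mathbf{w} so that the result is an honest distance:

import math

def geometric_margin(w, b, training_set):
   # smallest signed distance from a training point to the hyperplane <w, x> + b = 0;
   # the result is positive only if every point lies on its correct side
   norm = math.sqrt(sum(wi * wi for wi in w))
   return min(
      label * (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm
      for point, label in training_set
   )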

Finally, we test its generalization on a new set of 1000 generated points. We see that even with as few as 25 training samples, we get an accuracy of about 92 percent!

92.2% generalization accuracy. Not too shabby!

Here the blue region is the region of generated data in class +1, the red region (small sliver in the lower right corner) is the region that the perceptron falsely claims is in class +1, while the purple area is the overlap of the perceptron’s perceived +1 region and the true +1 region.

For a few measly lines of pseudocode, this algorithm packs a punch!

The Problem With Lines

As eagerly as we’d like to apply the perceptron algorithm to solve the problems of the world, we must take a step back and analyze the acceptable problems. In particular (and though this sounds obvious), the perceptron can only find hyperplanes separating things which can be separated by hyperplanes! We call such a training set linearly separable, and we admit that not every training set is so.

The smallest possible such example is three points on a line, where one point in class +1 separates two points in class -1. Historically, however, the first confounding counterexample presented was the exclusive “or” (XOR) function. Specifically, we have four points of the unit square arranged as follows:

(0,0), (1,1) -> +1
(1,0), (0,1) -> -1

The reader can verify that the perceptron loops infinitely on either of the given training sets, oscillating between the same hyperplanes over and over again.
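In fact, a short computation shows that no hyperplane whatsoever can separate the XOR points. Any candidate f(\mathbf{x}) = \left \langle \mathbf{w}, \mathbf{x} \right \rangle + b would have to satisfy

\displaystyle f(0,0) = b \geq 0, \quad f(1,1) = w_1 + w_2 + b \geq 0, \quad f(1,0) = w_1 + b < 0, \quad f(0,1) = w_2 + b < 0

Adding the first two inequalities gives w_1 + w_2 + 2b \geq 0, while adding the last two gives w_1 + w_2 + 2b < 0, a contradiction.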

Even though we may not have a linearly separable training set, we could still approximate a separation boundary with a hyperplane. Thus, we’d want to minimize the number of misclassifications of a hyperplane with respect to that training set. Amazingly, this problem is NP-complete. In other words, it is widely believed that the problem can’t be solved in polynomial time, so we can forget about finding a useful algorithm that works on large data sets. For a more complete discussion of NP-completeness, see this blog’s primer on the subject.

The need for linearly separable training data sets is a crippling problem for the perceptron. Most real-world distributions tend to be non-linear, and so anything which cannot deal with them is effectively a mathematical curiosity. In fact, for about twenty years after this flaw was discovered, the world lost interest in neural networks entirely. In our future posts, we will investigate the various ways researchers overcame this. But for now, we look at alternate forms of the perceptron algorithm.

The Dual Representation

We first note that the initial coefficient vector and bias weight for the perceptron were zero. At each step, we added or subtracted the training points to the coefficient vector. In particular, our final coefficient vector was simply a linear combination of the training points:

\displaystyle \mathbf{w} = \sum \limits_{i=1}^k \alpha_i l_i \mathbf{p_i}

Here the \alpha_i are directly proportional (in general) to the number of times \mathbf{p_i} was misclassified by the algorithm. In other words, points which cause fewer mistakes, or those which are “easy” to classify, have smaller \alpha_i. Yet another view is that the points with higher \alpha_i have greater information content; we can learn more about our distribution by studying them.

We may think of the \mathbf{\alpha} vector as a dual system of coordinates by which we may represent our hyperplane. Instead of this vector having the dimension of the Euclidean space we’re working in, it has dimension equal to the number of training examples. For problems in very high dimension (or perhaps infinite dimensions), this shift in perspective is the only way one can make any computational progress. Indeed, the dual problem is the crux of methods like the so-called support vector machine.

Furthermore, once we realize that the hyperplane’s coordinate vector can be written in terms of the training points, we may rewrite our decision function as follows:

\displaystyle f(\mathbf{x}) = \textup{sign} \left ( \left \langle \sum \limits_{i=1}^k \alpha_i l_i \mathbf{p_i}, \mathbf{x} \right \rangle + b \right )

By the linearity of an inner product, this becomes

\displaystyle f(\mathbf{x}) = \textup{sign} \left ( \sum \limits_{i=1}^k \alpha_i l_i \left \langle \mathbf{p_i}, \mathbf{x} \right \rangle + b \right )
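In a Python-flavored sketch (again with made-up names; points, labels, and alpha are parallel lists holding the training points, their labels, and the mistake counts), this dual decision function reads:

def dual_decision(x, points, labels, alpha, b):
   # f(x) = sign( sum_i alpha_i * l_i * <p_i, x> + b )
   total = b
   for p, l, a in zip(points, labels, alpha):
      total = total + a * l * sum(pi * xi for pi, xi in zip(p, x))
   return 1 if total >= 0 else -1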

And the perceptron algorithm follows suit:

alpha = [0, 0, ..., 0], b = 0
while some misclassification is made in the training set:
   for each training example (i, point, label):
      if label * f(point) <= 0:
         alpha[i] += 1
         b += label

So in addition to finding a separating hyperplane (should one exist), we have a method for describing information content. Since we did not implement this particular version of the perceptron algorithm in our Mathematica notebook, we challenge the reader to do so. Once again, the code is available on this blog’s Github page, and feel free to leave a comment with your implementation.

Until next time!

A Dash of Python

We will orient our dash of Python around the first and simplest problem from ProjectEuler.net.

Installing Python

To get Python on your computer, go to Python’s website and follow the instructions for downloading and installing the interpreter. Most Windows users can simply click here to download an installer, Mac OS 10.6 – 10.7 users can click here to get their installer, and Linux users can (and should) fend for themselves.

For non-terminal buffs, you can use Python’s official text editor, called IDLE, for editing and running Python programs. It comes packaged with the download links above. For terminal junkies, or people wishing to learn the terminal, you can use your favorite terminal editor (nano for beginners, vim or emacs for serious coders) to write the code, then type

$> python source.py

at a command prompt to run the code, which we assume here is called “source.py” and is in the present working directory. To learn more about the Python interpreter, see the official Python tutorial. On to the puzzle.

Python

Problem: Find the sum of all positive multiples of 3 or 5 below 1000.

To spoil the fun and allow the reader to verify correctness, the answer is 233168. Of course, this problem can be solved quite easily by using nice formulas for arithmetic progressions, but for the sake of learning Python we will do it the programmer’s way.
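For the curious, the arithmetic-progression route is a one-line inclusion-exclusion: add the multiples of 3 and the multiples of 5 below 1000, then subtract the multiples of 15, which were counted twice.

\displaystyle 3 \cdot \frac{333 \cdot 334}{2} + 5 \cdot \frac{199 \cdot 200}{2} - 15 \cdot \frac{66 \cdot 67}{2} = 166833 + 99500 - 33165 = 233168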

The first step in solving this problem is to figure out how we want to represent our data. Like most languages, Python has a built-in type for integers. A built-in type is simply a type of data that is natively supported by a language, be it a number, a character of text, a list of things, or a more abstract type like, say, a shopping cart (not likely, but possible). So one program we might start with is

3+5+6+9

which is also known as the sum of all multiples of 3 or 5 less than 10. This evaluates to 23, and we may pat ourselves on the back for a successful first program! Python understands most simple arithmetic expressions we’d hope to use, and a complete list is very googleable.

Unfortunately, typing out all such numbers up to 1000 would be idiotic. The whole point of a computer is that it does the hard work for us! So we identify a few key goals:

  • We want a test for “divisible by n”
  • We want to be able to keep track of those numbers which have the desired divisibility property
  • We want to apply our test to all numbers less than 1000 in a non-repetitive way

The first is quite easy. Looking at our list of operators, we find the “remainder” operator %. In particular, "a % b", when performed on two integers a,b, gives a \mod b, the remainder when a is divided by b. So if "a % 3" is zero, then a is divisible by 3, and similarly for 5. Again looking at our list of operators, we have an == equality operator (we will see plain old = later for variable assignment) and an "or" boolean operator. So our test for divisibility by 3 or 5 is simply

x % 3 == 0 or x % 5 == 0

Typing this into the python interpreter (with “x” replaced with an actual number) will evaluate to either “True” or “False”, the built-in types for truth and falsity. We will use the result of this test shortly.

Once we find a number that we want to use, we should be able to save it for later. For this we use variables. A variable is exactly what it is in mathematics: a placeholder for some value. It differs from mathematics in that the value of the variable is always known at any instant in time. A variable can be named anything you want, as long as it fits with a few rules. To assign a value to a variable, we use the = operator, and then we may use them later. For example,

x = 33
x % 3 == 0 or x % 5 == 0

This program evaluates to True, but it brings up one interesting issue. Instead of a single expression, here we have a sequence of expressions on different lines, and as the program executes it keeps track of the contents of all the variables. For now this is a simple idea to swallow, but later we will see that it has a lot of important implications.

Once we can test whether a number is divisible by stuff, we want to do something special when that test results in True. We need a new language form called an “if statement.” It is again intuitive, but it has one caveat that is unique to Python. An if statement has the form:

if test1:
   body1
elif test2:
   body2
... elif testK:
   bodyK
else: 
   bodyElse

The "test#" expressions evaluate to either True or False. The entire block evaluates in logical order: if "test1" evaluates to true, evaluate the sequence of statements in "body1" and ignore the rest; if "test1" evaluates to false, do the same for "test2" and "body2". Continuing in this fashion, if all tests fail, execute the sequence of statements in the "bodyElse" block.

The quirk is that whitespace matters here, and Python will growl at the programmer if he doesn’t have consistent spacing. Specifically, the body of any if statement (a sequence of statements in itself) must be indented consistently for each line. Any nested if statements within that must be further indented, and so on. An indentation can be a tab, or a fixed number of spaces. As long as each line in an indented block follows the same indentation rule, it doesn’t matter how far the indentation is. We use three spaces from here on. This unambiguously denotes which lines are to be evaluated as the body of an if statement. While it may seem confusing at first, indenting blocks of code will soon become second nature, and it is necessary for rigor.

For instance, the following tricky program was written by a sloppy coder, and it will confuse the Python interpreter (and other coders).

x = 1

if False:
   y = 4
  x = x + 1

x

If one tries to run this code, Python will spit out an error like:

IndentationError: unindent does not match any outer indentation level

In other words, Python doesn’t know whether the programmer meant to put "x = x + 1" into the body of the if statement or not. Rather than silently resolve the problem by picking one, Python refuses to continue. For beginners, this will save many headaches, and good programmers almost always have consistent indenting rules as a matter of tidiness and style.

Combining variables, our divisibility test, and if statements, we can construct a near-solution to our problem:

theSum = 0
numberToCheck = 1

if numberToCheck % 3 == 0 or numberToCheck % 5 == 0:
    theSum = theSum + numberToCheck

numberToCheck = 2

if numberToCheck % 3 == 0 or numberToCheck % 5 == 0:
    theSum = theSum + numberToCheck

numberToCheck = 3
...

While this works, and we could type this block of code once for each of the 1000 numbers, this again is a colossal waste of our time! Certainly there must be a way to avoid repeating all this code while searching for divisible numbers. For this we look to loops. Loops do exactly what we want: run a block of code many times, altering a small piece of it each time through.

The simplest kind of loop is a while loop. It has the form:

while testIsTrue:
   body

The while loop is almost self-explanatory: check if the test is true, evaluate the body if it is, repeat. We just need to make sure that the test eventually becomes false when we’re done, so that the loop terminates.

Incorporating this into our problem, we have the following program

theSum = 0
numberToCheck = 1

while numberToCheck < 1000:
   if numberToCheck % 3 == 0 or numberToCheck % 5 == 0:
      theSum = theSum + numberToCheck

   numberToCheck = numberToCheck + 1

print(theSum)

Notice that the indentation makes it clear when the nested if body ends, and when the while body itself ends.

We add the extra “print” statement to allow the user to see the result of the computation. The “print” function has quite a lot of detail associated with how to use it, but in its simplest form it prints out the value of its argument on a line by itself.

Running this code, we see that we get the correct value, and we applaud ourselves for a great second program.

Diving Deeper for Lists

While we could end here, we want to give the reader a taste for what else Python can do. For instance, what if we wanted to do something with the numbers we found instead of just adding them up? What if we, say, wanted to find the median value or the average value of the numbers? Of course, this has no apparent use, but we still want to know how to do something besides adding.

What we really want to do is save all the numbers for later computation. To do this, we investigate Python’s native list type. Here are some examples of explicitly constructed lists:

list1 = []
list2 = [2,3,4,5]
list3 = range(1,1000)
list4 = ['a', 7, "hello!", [1,2,3,4]]

Obviously, Python’s lists are comma-separated lists of things enclosed in square brackets. The things inside the list need not be homogeneous, and can even include other lists. Finally, the “range(a,b)” function gives a list with the integers contained in the interval [a,b), where a,b are integers. One must be slightly careful in naming lists: “list” is not a reserved keyword, but it is the name of Python’s built-in list type, so using it as a variable name shadows the type and is best avoided.

Lists have quite amazing functionality; they are a more powerful sort of built-in type than integers. Specifically, lists have a whole bunch of named operations, which we call methods. Instead of invoking these operations with a familiar infix symbol like +, we do so with the dot operator, and then the name of the method. For instance, the following program appends the number 5 to the list [1,2,3]:

list1 = [1,2,3]
list1.append(5)

The "append" method is a native part of all Python lists, and we apply it to its argument with the function notation. In other words, this code might in some other language look like "append(list1, 5)", but since we recognize that the "list1" object is the "owner" of the append operation (you can only append something to a list), it deserves a special place to the left of the operation name.

Applying this new data type to our original program, we get the following code:

goodNumbers = []
numberToCheck = 1

while numberToCheck < 1000:
   if numberToCheck % 3 == 0 or numberToCheck % 5 == 0:
      goodNumbers.append(numberToCheck)

   numberToCheck = numberToCheck + 1

print(sum(goodNumbers))

The “sum” function is a special function in Python (a method of no object, so we call it global) which sums the elements in a list, provided that addition is defined for those objects (if one continues her Python education past this blog entry, she will see such odd uses of the symbol +).

Now, to get back at the values in the list, we use the index operation, which has the syntax:

list1[index]

and returns the element of "list1" at the specified index. In Python, as with most programming languages, lists are indexed from zero, so the first element of a list is "list1[0]". To find the median value of our list of goodNumbers, we could use the following code:

length = len(goodNumbers)

if length % 2 == 0:
   twoMids = (goodNumbers[length//2] + goodNumbers[length//2 - 1])
   median = twoMids / 2.0
else:
   median = goodNumbers[length//2]

print(median)

The global “len” function gives the length of a number of different objects (lists, strings, etc). A good exercise for the reader is to walk through the above code line by line to understand why it works. Run it on lists of a number of different sizes, not just the results of the Project Euler problem. Better yet, try to break it! Even this simple program has a small bug in it, if we use the appropriate list instead of “goodNumbers”. If one finds the bug, he will immediately be confronted with the question of what to do when such input shows up. Should one warn the user or return some placeholder value? This question is a common one in computer science and software engineering, and companies like Microsoft and Google give it quite a lot of thought.

Finally, one may want to “save” this bit of code for later. As we already implied, we may use this code to find the median of any list of numbers, given the appropriate input. Mathematically, we want to define the “median” function. The idea of a function is a powerful one both in mathematics and in programming, so naturally there is a language form for it:

def functionName(args...):
   body
   return(returnValue)

A few notes: the entire body of a function definition needs to be indented appropriately; the definition of a function must come before the function is ever used; and not every function requires a return statement.

The easiest way to learn functions is to convert our median-finding code into a median function. Here it is:

def median(myList):
   length = len(myList)
   
   if length % 2 == 0:
      twoMids = (myList[length//2] + myList[length//2 - 1])
      medianValue = twoMids / 2.0
   else:
      medianValue = myList[length//2]

   return(medianValue)

And now, to use this function, we do what we expect:

print(median([1,2,3,4,5]))
print(median(range(1,1000)))
print(median(goodNumbers))

In the same way that we used loops to reuse small bits of code with slight changes, here we reuse longer bits of code with more complicated changes! Functions are extremely useful both for readability and extensibility, and we will revisit them again and again in the future.

So here we’ve covered a few basic constructs in Python: numbers, conditional execution with if statements, looping with whiles, the basics of list manipulation, and simple function definition. After this small bit of feet-wetting, the reader should be able to pick up (and not be bewildered by) a beginner’s book on programming in Python, and there are a number of good free online sources for doing so.

Feel free to ask me any Python questions here, and I’ll do my best to answer them. Until next time!