*For those who aren’t regular readers: as a followup to this post, there are four posts detailing the basic four methods of proof, with intentions to detail some more advanced proof techniques in the future. You can find them on this blog’s primers page.*

## Do you really want to get better at mathematics?

Remember when you first learned how to program? I do. I spent two years experimenting with Java programs on my own in high school. Those two years collectively contain the worst and most embarrassing code I have ever written. My programs absolutely reeked of programming no-nos. Hundred-line functions and even thousand-line classes, magic numbers, unreachable blocks of code, ridiculous code comments, a complete disregard for sensible object orientation, negligence of nearly all logic, and type-coercion that would make your skin crawl. I committed every naive mistake in the book, and for all my obvious shortcomings I considered myself a hot-shot programmer! At least I was learning a lot, and I was a hot-shot programmer in a crowd of high-school students interested in game programming.

Even after my first exposure and my commitment to get a programming degree in college, it was another year before I knew what a stack frame or a register was, two before I was anywhere near competent with a terminal, three before I learned to appreciate functional programming, and to this day I *still* have an irrational fear of networking and systems programming (the first time I manually edited the call stack I couldn’t stop shivering with apprehension and disgust at what I was doing).

In a class on C++ programming I was programming a Checkers game, and my task at the moment was to generate a list of all possible jump-moves that could be made on a given board. This naturally involved a depth-first search and a couple of recursive function calls, and once I had something I was pleased with, I compiled it and ran it on my first non-trivial example. Lo and behold (even having followed test-driven development!), I was hit hard in the face by a segmentation fault. It took hundreds of test cases and more than twenty hours of confusion before I found the error: I was passing a reference when I should have been passing a pointer. This was not a bug in syntax or semantics (I understood pointers and references well enough) but a *design* error. And the aggravating part, as most programmers know, was that the fix required the change of about four characters. Twenty hours of work for four characters! Once I begrudgingly verified it worked (of course it worked, it was so obvious in hindsight), I promptly took the rest of the day off to play Starcraft.

Of course, as every code-savvy reader will agree, all of this drama is part of the process of becoming a strong programmer. One must study the topics incrementally, make plentiful mistakes and learn from them, and spend uncountably many hours in a state of stuporous befuddlement before one can be considered an experienced coder. This gives rise to all sorts of programmer culture, unix jokes, and reverence for the masters of C that make the programming community so lovely to be a part of. It’s like a secret club where you know all the handshakes. And should you forget one, a crafty use of awk and sed will suffice.

Now imagine someone comes along and says,

“I’m really interested in learning to code, but I don’t plan to write any programs and I absolutely abhor tracing program execution. I just want to use applications that others have written, like Chrome and iTunes.”

You would laugh at them! And the first thing that would pass through your mind is either, “This person would give up programming after the first twenty minutes,” or “I would be doing the world a favor by preventing this person from ever writing a program. This person belongs in some other profession.” This lies in stark opposition to the common chorus that everyone should learn programming. After all, it’s a constructive way to think about problem solving and a highly employable skill. In today’s increasingly technological world, it literally pays to know your computer better than a web browser. (Ironically, I’m writing this on my Chromebook, but in my defense it has a terminal with ssh. Perhaps more ironically, all of my *real* work is done with paper and pencil.)

Unfortunately this sentiment is mirrored among most programmers who claim to be interested in mathematics. Mathematics is fascinating and useful and doing it makes you smarter and better at problem solving. But a lot of programmers think they want to do mathematics, and they either don’t know what “doing mathematics” means, or they don’t really mean they want to do mathematics. The appropriate translation of the above quote for mathematics is:

“Mathematics is useful and I want to be better at it, but I won’t write any original proofs and I absolutely abhor reading other people’s proofs. I just want to use the theorems others have proved, like Fermat’s Last Theorem and the undecidability of the Halting Problem.”

Of course no non-mathematician is really going to understand the current proof of Fermat’s Last Theorem, just as no fledgling programmer is going to attempt to write a (quality) web browser. The point is that the sentiment is in the wrong place. Mathematics is cousin to programming in terms of the learning curve, obscure culture, and the amount of time one spends confused. And mathematics is as much about writing proofs as software development is about writing programs (it’s not *everything*, but without it you can’t do anything). Honestly, it sounds ridiculously obvious to say it directly like this, but the fact remains that people feel like they can understand the content of mathematics without being able to write or read proofs.

I want to devote the rest of this post to exploring some of the reasons why this misconception exists. My main argument is that the reasons have more to do with the culture of mathematics than the actual difficulty of the subject. Unfortunately, as of the time of this writing I don’t have a proposed “solution.” All I can claim is that the problem exists: programmers can have mistaken views of what mathematics involves. I don’t propose a way to make mathematics easier for programmers, although I do try to make the content on my blog as clear as possible (within reason). I honestly do believe that the struggle and confusion build mathematical character, just as the arduous bug-hunt builds programming character. If you want to be good at mathematics, there is no other way.

All I want to do with this article is to detail *why* mathematics can be so hard for beginners, to explain a few of the secret handshakes, and hopefully to bring an outsider a step closer to becoming an insider. And I want to stress that this is not a call for all programmers to learn mathematics. Far from it! I just happen to notice that, for good reason, the proportion of programmers who are interested in mathematics is larger than in most professions. And as a member of both communities, I want to shed light on why mathematics can be difficult for an otherwise smart and motivated software engineer.

So read on, and welcome to the community.

## Travelling far and wide

Perhaps one of the most prominent objections to devoting a lot of time to mathematics is that it can be years before you ever apply mathematics to writing programs. On one hand, this is an extremely valid concern. If you love writing programs and designing software, then mathematics is nothing more than a tool to help you write better programs.

But on the other hand, the very *nature* of mathematics is what makes it so applicable, and the only way to experience nature is to ditch the city entirely. Indeed, I provide an extended example of this in my journalesque post on introducing graph theory to high school students: the point of the whole exercise is to filter out the worldly details and distill the problem into a pristine mathematical form. Only then can we see its beauty and wide applicability.

Here is a more concrete example. Suppose you were trying to encrypt the contents of a message so that nobody could read it even if they intercepted the message in transit. Your first ideas would doubtlessly be the same as those of our civilization’s past: substitution ciphers, Vigenere ciphers, the Enigma machine, etc. Regardless of what method you come up with, your first thought would most certainly *not* be, “prime numbers so big they’ll make your pants fall down.” Of course, the majority of encryption methods today rely on very deep facts (or rather, conjectures) about prime numbers, elliptic curves, and other mathematical objects (“group presentations so complicated they’ll orient your Mobius band,” anyone?). But it took hundreds of years of number theory to get there, and countless deviations into other fields and dead-ends. It’s not that the methods themselves are particularly complicated, but the way they’re often presented (and this is unavoidable if you’re interested in *new* mathematical breakthroughs) is in the form of classical mathematical literature.

Of course there are other examples much closer to contemporary fashionable programming techniques. One such example is boosting. While we have yet to investigate boosting on this blog [update: yes we have], the basic idea is that one can combine a bunch of algorithms which perform just barely better than 50% accuracy, and collectively they will be arbitrarily close to perfect. In a field dominated by practical applications, this result is purely the product of mathematical analysis.

And of course boosting in turn relies on the mathematics of probability theory, which in turn relies on set theory and measure theory, which in turn relies on real analysis, and so on. One could get lost for a lifetime in this mathematical landscape! And indeed, the best way to get a good view of it all is to start at the bottom. To learn mathematics from scratch. The working programmer simply doesn’t have time for that.

## What is it really, that people have such a hard time learning?

Most of the complaints about mathematics come understandably from notation and abstraction. And while I’ll have more to say on that below, I’m fairly certain that the main obstacle is a lack of familiarity with the basic methods of proof.

While methods of proof are semantical by nature, in practice they form a scaffolding for all of mathematics, and as such one could better characterize them as syntactical. I’m talking, of course, about the four basics: direct implication, proof by contradiction, contrapositive, and induction. These are the loops, if statements, pointers, and structs of rigorous argument, and there is simply no way to understand the mathematics without a native fluency in this language.
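To give the unacquainted reader a taste, here is a standard classroom example (my choice, not drawn from any particular text) of the contrapositive method at work on a trivial fact:

```latex
\textbf{Claim.} If $n^2$ is even, then $n$ is even.

\textbf{Proof.} We prove the contrapositive: if $n$ is odd, then $n^2$ is odd.
Write $n = 2k + 1$ for some integer $k$. Then
\[
  n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1,
\]
which is one more than an even number, hence odd. $\blacksquare$
```

The entire argument is a mechanical application of the method plus the definition of “odd,” which is exactly the sense in which the basic four behave like syntax.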

So much of mathematics is built up by chaining together a multitude of absolutely trivial statements which are amenable to proof by the basic four. I’m not kidding when I say they are absolutely trivial. A professor of mine once said,

If it’s not completely trivial, then it’s probably not true.

I can’t agree more with this statement. Of course, there are many sophisticated proofs in mathematics, but an overwhelming majority of (very important) facts fall in the trivial category. That being said, trivial can sometimes be relative to one’s familiarity with a subject, but that doesn’t make the sentiment any less right. Drawing up a shopping list is trivial once you’re comfortable with a pencil and paper and you know how to write (and you know what the words mean). There are certainly works of writing that require a lot more than what it takes to write a shopping list. Likewise, when we say something is trivial in mathematics, it’s because there’s no content to the proof outside of using definitions and a typical application of the basic four methods of proof. This is the “holding a pencil” part of writing a shopping list.

And as you probably know, there are many many more methods of proof than just the basic four. Proof by construction, by exhaustion, case analysis, and even picture proofs have a place in all fields of mathematics. More relevantly for programmers, there are algorithm termination proofs, probabilistic proofs, loop invariants to design and monitor, and the ubiquitous NP-hardness proofs (I’m talking about you, Travelling Salesman Problem!). There are many books dedicated to showcasing such techniques, and rightly so. Clever proofs are what mathematicians strive for above all else, and once a clever proof is discovered, the immediate first step is to try to turn it into a general method for proving other facts. Fully fleshing out such a process (over many years, showcasing many applications and extensions) is what makes one a world-class mathematician.

Another difficulty faced by programmers new to mathematics is the inability to check your proof absolutely. With a program, you can always write test cases and run them to ensure they all pass. If your tests are solid and plentiful, the computer will catch your mistakes and you can go fix them.

There is no corresponding “proof checker” for mathematics. There is no compiler to tell you that it’s nonsensical to construct the set of all sets, or that it’s a type error to quotient a set by something that’s not an equivalence relation. The only way to get feedback is to seek out other people who do mathematics and ask their opinion. Done solo, mathematics involves a lot of backtracking, revising mistaken assumptions, and stretching an idea to its breaking point to see that it didn’t even make sense to begin with. This is “bug hunting” in mathematics, and it can often completely destroy a proof and make one start over from scratch. It feels like writing a few hundred lines of code only to have the final program run “rm -rf *” on the directory containing it. It can be really. really. depressing.

It is an interesting pedagogical question in my mind whether there is a way to introduce proofs and the language of mature mathematics in a way that stays within a stone’s throw of computer programs. It seems like a worthwhile effort, but I can’t think of anyone who has sought to replace a classical mathematics education entirely with one based on computation.

## Mathematical syntax

Another major reason programmers are unwilling to give mathematics an honest effort is the culture of mathematical syntax: it’s ambiguous, and there’s usually nobody around to explain it to you. Let me start with an example of why this is not a problem in programming. Let’s say we’re reading a Python program and we see an expression like this:

```python
foo[2]
```

The nature of (most) programming languages dictates that there are a small number of ways to interpret what’s going on here:

- foo could be a list/tuple, and we’re accessing the third element in it.
- foo could be a dictionary, and we’re looking up the value associated with the key 2.
- foo could be a string, and we’re extracting the third character.
- foo could be a custom-defined object, whose __getitem__ method is defined somewhere else and we can look there to see exactly what it does.

There are probably other times this notation can occur (although I’d be surprised if the fourth case didn’t by default capture all possible uses), but the point is that any programmer reading this program knows enough to intuit that square brackets mean “accessing an item inside foo with identifier 2.” Part of the reason programs can be very easy to read is precisely because someone had to write a parser for the programming language, and so they had to literally enumerate all possible uses of any expression form.
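To make the enumeration concrete, here is a minimal sketch (the class name Squares is my own invention) of the same expression dispatching to all four behaviors:

```python
# The same expression foo[2] means four different things,
# depending entirely on the type of foo.
foo = [10, 20, 30]
assert foo[2] == 30        # list: third element

foo = {2: "two", 7: "seven"}
assert foo[2] == "two"     # dict: value associated with the key 2

foo = "hello"
assert foo[2] == "l"       # string: third character

class Squares:
    # A custom object: __getitem__ decides what brackets mean.
    def __getitem__(self, i):
        return i * i

foo = Squares()
assert foo[2] == 4         # custom: whatever __getitem__ says
```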

The other extreme is the syntax of mathematics. The daunting fact is that there is no bound to what mathematical notation can represent, and much of mathematical notation is inherently ad hoc. For instance, if you’re reading a math paper and you come across an expression that looks like this:

$ \delta_i^j$

The possibilities of what this could represent are literally endless. Just to give the unmathematical reader a taste: $ \delta_i$ could be an entry of a sequence of numbers of which we’re taking $ j^\textup{th}$ powers. The use of the letter delta could signify a slightly nonstandard way to write the Kronecker delta function, for which $ \delta_i^j$ is one precisely when $ i=j$ and zero otherwise. The superscript $ j$ could represent *dimension*. Indeed, I’m currently writing an article in which I use $ \delta^k_n$ to represent $ k$-dimensional simplex numbers, specifically because I’m relating the numbers to geometric objects called simplices, and the letter for those is a capital $ \Delta$. The fact is that using notation in a slightly non-standard way does *not* invalidate a proof in the way that it can easily invalidate a program’s correctness.

What’s worse is that once mathematicians get comfortable with a particular notation, they will often “naturally extend” or even *silently drop* things like subscripts and assume their reader understands and agrees with the convenience! For example, here is a common difficulty that beginners face in reading math that involves use of the summation operator. Say that I have a finite set of numbers whose sum I’m interested in. The most rigorous way to express this is not far off from programming:

Let $ S = \left \{ x_1, \dots, x_n \right \}$ be a finite set of things. Then their sum is finite:

$ \displaystyle \sum_{i=1}^n x_i$

The programmer would say “great!” Assuming I know what “+” means for these things, I can start by adding $ x_1 + x_2$, add the result to $ x_3$, and keep going until I have the whole sum. This is really just a left fold of the plus operator over the list $ S$.
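As a sketch in Python (the example values are my own), that left fold is literally functools.reduce, and the commutativity and associativity of + are what will later let the order vary:

```python
from functools import reduce
import operator

S = [2, 3, 5, 7]

# sum_{i=1}^{n} x_i as a left fold: (((2 + 3) + 5) + 7)
total = reduce(operator.add, S)
assert total == 17

# Because + is commutative and associative, any fold order agrees.
assert reduce(operator.add, reversed(S)) == total
```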

But for mathematicians, the notation is *far* more flexible. For instance, I could say

Let $ S$ be finite. Then $ \sum_{x \in S} x$ is finite.

Things are now more vague. We need to remember that the $ \in$ symbol means “in.” We have to realize that the strict syntax of having an iteration variable $ i$ is no longer in effect. Moreover, the order in which the things are summed (which for a left fold is strictly prescribed) is arbitrary. If you asked any mathematician, they’d say “well of course it’s arbitrary, in an abelian group addition is commutative so the order doesn’t matter.” But realize, this is yet another fact that the reader must be aware of to be comfortable with the expression.

But it *still* gets worse.

In the case of the capital Sigma, there is nothing syntactically stopping a mathematician from writing

$ \displaystyle \sum_{\sigma \in \Sigma} f_{\Sigma}(\sigma)$

Though experienced readers may chuckle, they will have no trouble understanding what is meant here. That is, syntactically this expression is unambiguous enough to avoid an outcry: $ \Sigma$ just happens to also be a set, and saying $ f_{\Sigma}$ means that the function $ f$ is constructed in a way that depends on the choice of the set $ \Sigma$. This often shows up in computer science literature, as $ \Sigma$ is a standard letter to denote an alphabet (such as the binary alphabet $ \left \{ 0,1 \right \}$).

One can even take it a step further and leave out the set we’re iterating over, as in

$ \displaystyle \sum_{\sigma} f_{\Sigma}(\sigma)$

since it’s understood that the lowercase letter ($ \sigma$) is usually an element of the set denoted by the corresponding uppercase letter ($ \Sigma$). If you don’t know Greek and haven’t seen that coincidence enough times to recognize it, you would quickly get lost. But programmers must realize: this is just the mathematician’s secret handshake. A mathematician would be just as bewildered and confused upon seeing some of the pointer arithmetic hacks C programmers invent, or the always awkward infinite for loop, if they had not had enough experience dealing with the syntax of standard for loops.

```c
for (;;) { ; }
```

In fact, a mathematician would look at this in disgust! The fact that the C programmer has need for something as pointless as an “empty statement” should be viewed as a clumsy inelegance in the syntax of the programming language (says the mathematician). Since mathematicians have the power to change their syntax at will, they would argue there’s no good reason *not* to change it, if it were a mathematical expression, to something simpler.

And once the paper you’re reading is over, and you start reading a new paper, chances are their conventions and notation will be ever-so-slightly different, and you have to keep straight what means what. It’s as if the syntax of a programming language changed *depending on who was writing the program*!

Perhaps understandably, the details that mathematicians find frustrating to track across different papers and books are collectively dismissed as “technicalities.” And the more advanced the mathematics becomes, the more the ability to fluidly transition between high-level intuition and those technical details is simply assumed.

The upshot of this whole conversation is that the reader of a mathematical proof must hold in mind a vastly larger body of absorbed (and often frivolous) knowledge than the reader of a computer program.

At this point you might see all of this as my complaining, but in truth I’m saying this notational flexibility and ambiguity is a *benefit*. Once you get used to doing mathematics, you realize that technical syntax can make something which is essentially simple seem much more difficult than it is. In other words, we absolutely *must* have a way to make things completely rigorous, but in developing and presenting proofs the most important part is to make the audience understand the big picture, see the intuition behind the symbols, and believe the proofs. For better or worse, mathematical syntax is just a means to that end, and the more abstract the mathematics becomes, the more flexibility mathematicians need to keep themselves afloat in a tumultuous sea of notation.

## You’re on your own, unless you’re around mathematicians

That brings me to my last point: reading mathematics is much more difficult than *conversing* about mathematics in person. The reason for this is once again cultural.

Imagine you’re reading someone else’s program, and they’ve defined a number of functions like this (pardon the single-letter variable names; as long as one is willing to be vague, I prefer single letters to “foo/bar/baz”).

```python
def splice(L): ...

def join(*args): ...

def flip(x, y): ...
```

There are two parts to understanding how these functions work. The first part is that someone (or a code comment) explains to you at a high level what they do to an input. The second part is to work out the finer details. These “finer details” are usually completely spelled out by the documentation, but it’s still a good practice to experiment with them yourself (there is always the possibility of bugs or unexpected features, of course).

In mathematics there is no unified documentation, just a collective understanding, scattered references, and spoken folklore. You’re lucky if a textbook has a table of notation in the appendix. You are expected to derive the finer details and catch the errors yourself. Even if you are told the end result of a proposition, it is often followed by, “The proof is trivial.” This is the mathematician’s version of piping output to /dev/null, and literally translates to, “You’re expected to be able to write the proof yourself, and if you can’t then maybe you’re not ready to continue.”

Indeed, analogous problems are familiar to a beginning programmer when they aren’t in a group of active programmers. Why is it that people give up or don’t enjoy programming? Is it because they have a hard time getting honest help from rudely abrupt moderators on help websites like stackoverflow? Is it because often when one wants to learn the basics, they are overloaded with the entirety of the documentation and the overwhelming resources of the internet and all its inhabitants? Is it because compiler errors are nonsensically exact, but very rarely helpful? Is it because when you learn it alone, you are bombarded with contradicting messages about what you should be doing and why (and often for the wrong reasons)?

All of these issues definitely occur, and I see them contribute to my students’ confusion in my introductory Python class all the time. They try to look on the web for information about how to solve a very basic problem, and they come back to me saying they were told it’s more secure to do it this way, or more efficient to do it this way, or that they need to import something called the “heapq module.” When really the goal is not to solve the problem in the best way possible or in the shortest amount of code, but to show them how to use the tools they already know about to construct a program that works. Without a guiding mentor it’s extremely easy to get lost in the jungle of people who think they know what’s best.

As far as I know there is no solution to this problem faced by the solo programming student (or the solo anything student). And so it stands for mathematics: without others doing mathematics with you, it’s very hard to identify your issues and see how to fix them.

## Proofs, Syntax, and Community

For the programmer who is truly interested in improving their mathematical skills, the first line of attack should now be obvious. Become an expert at applying the basic methods of proof. Second, spend as much time as it takes to clear up what mathematical syntax means before you attempt to interpret the semantics. And finally, find others who are interested in seriously learning some mathematics, and work on exercises (perhaps a weekly set) with them. Start with something basic like set theory, and write your own proofs and discuss each others’ proofs. Treat the sessions like code review sessions, and be the compiler to your partner’s program. Test their arguments to the extreme, and question anything that isn’t obvious or trivial. It’s not uncommon for easy questions with simple answers and trivial proofs to create long and drawn out discussions before everyone agrees it’s obvious. Embrace this and use it to improve.

Short of returning to your childhood and spending more time doing recreational mathematics, that is the best advice I can give.

Until next time!

Thanks for this post. I’m an experienced programmer who switched domains three years ago from web programming to “interactive art” which, in practice, is a combination of computer vision, machine learning, computational geometry, and computer graphics. With this switch I found a need (and an excitement), for the first time, to engage with the computer science literature. So much of the research progress in these fields has failed to transfer into accessible code and comprehensible texts, but remains locked up in research papers. I found very quickly that engaging with this research meant having to dust off and dramatically improve the minimal math I learned as an undergrad.

So in some ways I resemble that character with which you began this post: the programmer who just wants to understand the techniques that result from this mathematical work and finds the proofs and their forest of notation to be an inconvenience. To some degree I’ve been trying to get past that bias. And I greatly appreciate the advice you give here for notation reading and proof survival.

However, I’m not sure I completely agree with the formula for learning you recommend: to start with the basic techniques of proof and work your way up. I teach programming to grad students many of whom come from the arts and design and have little to no technical background. Our curriculum tried to bring along their grasp of the fundamentals of programming while simultaneously ensuring that, at every step of the way, they can make something that engages and rewards the interests that brought them to programming in the first place. If we just started with variables, if-statements, loops, etc without a graphical and interactive environment (Processing) in which they were, from the very first day, making work that visibly related to their interests, I don’t think many of them would stick with it long and even fewer would pick up the passionate motivation it takes to pursue one of those 20 hour-long bug hunting sessions you describe. The only reason they do that is that they can taste what the visual result will be if they can just solve this one problem.

I wish there was an equivalent medium for learning math. An environment that let me apply the result of various mathematical ideas to the objects that motivate my concern in the first place: images, 3d meshes, etc. and then let me open them up and explore them in more detail as my understanding deepened.

Or, if not a system, I guess I wish for more mathematical writing that was sympathetic to this mode of engagement. Ironically, given what you argue in this post, your blog here is actually one of the best examples of this kind of writing I’ve come across, and this is exactly what I enjoy about it: how you root your explorations of a mathematical topic in a practical problem or application that is exciting enough to make it worth wrestling with the mathematical details. When a grasp of those details then also unlocks other exciting applications, that’s when you get some real forward momentum going.

Anyway, apologies for the long comment, but I just thought it would be worth putting in a word for those of us who learn top-down rather than bottom up!

This is fascinating to me. I’ve heard of Processing and always wanted to check it out, but I sort of cast it into the bin of “toy” programming languages like Scratch: something I might introduce to my children one day but wouldn’t seriously work with myself. I’ll definitely need to take a look at how I can use Processing in data visualization, and how well it would be suited to a first course in programming (as opposed to say, python, what I teach now). I’ve been sticking to the mentality that my students will go on to write code professionally, and so the focus is entirely on problem solving and testing (as you can probably tell is how I learned it). Until I become a professor, however, I am stuck teaching to that assumption. But I certainly agree with your sentiment that programming lessons need to hug very closely to the learner’s interests, as it did in my own experiences.

I think I should clarify at least that I don’t think an entire traversal of the mathematical literature is a good idea for programmers. Measure theory, for instance, is the foundation of probability theory (and hence machine learning), but the nature of discrete computation makes most naive probability theories suffice, and simply the knowledge that measure theory is a thing to make continuous probability rigorous is enough to get through most applications.

On the other hand, the “four basics” I describe are really just for interacting with mature mathematical papers and textbooks, which is what I assume most programmers are looking at when they try to learn more about machine learning and such. But for someone who just wants to learn mathematics to see what it’s all about, I think the real gem that can keep them interested is geometry (which is more or less a mathematical joke as it’s taught in elementary and secondary schools in the US today). So, for instance, if there was an art and design major who was interested in mathematics, I would still emphasize proofs above all else, but the proofs would be in the family of the “geometric method.” For a good introduction to this (which requires no background in mathematics at all), see this excellent book by Paul Lockhart. His view is essentially that mathematics can be done simply because it’s fun and proofs (esp geometric ones) are beautiful.

Thanks for the comment!

Processing is definitely more than a toy. While it has limited “production” uses outside of visual arts and design, you can use any Java libraries you want in it. Combining that with how easy Processing makes it to create graphics and do interactivity is really powerful. For example, I taught a class at NYU’s Interactive Telecommunications Program last semester, called Makematics, where I tried to help students apply particular areas of CS research to their own interactive work. The syllabus is here: http://makematics.com/syllabus/2012-fall/ We covered marching squares, linear regression, SVMs, PCA (where I did Eigenfaces based significantly on your write-up here!), Dynamic Programming, and Bayes Rule. For each topic, I tried to cover the mathematical/CS ideas behind the technique in a basic way and then also show applications in different domains from computer vision to text processing to digital fabrication. It was incredibly helpful to be working in Processing since I was able to easily create libraries that implemented each technique, usually by wrapping code that already existed. And then, since it was Processing, I had immediate access to all kinds of cool interaction possibilities, from Kinect to image processing to generating files for laser cutting, etc.

I’m trying to do something similar for computational geometry right now…

What a coincidence, I’ve had a few computational geometry posts on the backlog for a while now. Computational geometry is, in my opinion, the most difficult area of computer science research out there.

Why do you think it’s the most difficult? I’ve found that the algorithms are relatively easy to follow and implement (though this may just be because the book (by Mark de Berg) I have is particularly good). However I’ve also found that it’s hard to generalize: understanding one topic doesn’t seem to apply or really help with others.

I learned from de Berg as well, and I’ve found that a lot of his pseudocode oversimplifies the technical details involved in implementing the damn things. In particular, I’ve found that computational geometry involves a lot of complicated routes to efficiently doing very simple operations (querying regions, finding intersections) and each one is a whole subfield of comparing tradeoffs between space and time, mostly involving detailed analysis of custom data structures.

For instance, it is known that a simple polygon can be triangulated in linear time. See this paper of Chazelle. However, the algorithm presented is so technical and complicated that most researchers have given up trying to actually implement it! It’s “hopelessly complicated.”

Hey Greg, I’m currently an undergrad doing CS and have done a lot of interactive art in my spare time. I want to look into it as a career but I have no idea how to “break into” the scene, get myself known, find jobs/opportunities related to it, etc. etc. Could you give me some pointers on how I might actually get started? Thanks!

People may quite reasonably want to use the code of others and learn how to do it better, but some “don’t plan to write any programs and I absolutely abhor tracing program execution. I just want to use applications that others have written, like Chrome and iTunes.” We just don’t call that part “learning how to code”, but there is a lot of complex and very useful stuff you can do by using, configuring and combining code without ever writing (or knowing how to write) a single line of code, and many, many people successfully do just that.

Similarly, people really may want to only “use” the proofs of others – and it currently seems very hard. So, is it fixable? I mean, not by fixing the “wanting” part as you describe, but fixing the “ease of use” part?

If someone wants to know if X is [provably] true, or if an item X always has property Y (or it just happens to have it in most cases) then a yes/no answer would suffice to be practically useful, and it doesn’t really matter how it was proven. Similarly, in CompSci it is very useful to be able to find out that algorithm X has a worst case/average case complexity of Y, without the proof.

What seems to be needed for this use-case of math is being able to quickly check against a knowledge base whether some statement X is proven to be true or false or undecidable. AFAIK, we don’t really have such a knowledge base. And even if we had such a knowledge base with the exact proof that “X really is true,” we don’t really have a standardized way for another user to define X so that it would match the X in that proof – as you describe, notation is quite open to interpretation.

But the corollary isn’t “math shouldn’t be used that way” – the corollary is “hey math, here’s a todo-list for you to become more useful and usable. Anybody fancy digging in that direction?”

As to your first comment, I do think it’s sort of frowned upon to combine others’ code without knowing how it works. Take for instance the “big data” community. There have recently been many outcries that these so-called data professionals are good at combining and running statistical tests, but very poor at interpreting them or adapting when a certain test gives an unexpected result. These are the coders who can call library functions, but not implement their own.

It is certainly true that people do use the theorems of others without a full understanding of the content, but that is often because someone has paraphrased the statement of a theorem into terms that one can understand. For instance, I might say that there are theorems in topology that let you determine the “shape” of a data set, and I can even give you an algorithm which produces some output that I claim records the shape of the data set (if two data sets have the same output, then they have the same shape). But to expect anyone to have only this amount of knowledge and to do something useful is quite ludicrous.

Your point about algorithm runtime is well taken. I think once someone has proved a big-theta bound on the runtime of an algorithm, it doesn’t need to be reviewed except for interest’s sake. That being said, the unfortunate reality is that almost every question you ask a mathematician is not going to have a known answer. Even if the question has been asked, lacking an exact answer often leads to all kinds of approximate answers, which can vary widely in the way that they approximate something. And so you see, once you start specifying exactly how things get approximated and how they differ, you’re already doing mathematics. Even the process of laying out exactly the statement of the problem is by its nature mathematics. Problem specification alone is what requires dense notation, because in order to gain any mathematical ground on a problem one needs to completely isolate it from the concerns of the real world. One prominent example of this is the time/space hierarchy of complexity theory. Very few algorithms have precisely known run times, so the best we can do is create a huge database of partially-known results and wait until things get better classified (with little hope that every question will one day be answered). And indeed, these databases do exist (they’re just often written for people who intend to establish results, not people looking for existing results).

“There is no corresponding “proof checker” for mathematics” <- Yes there is. A major example would be: http://en.wikipedia.org/wiki/Coq . Of course such systems demand complete formality – i.e. much more detail than a normal mathematical proof – and are often biased towards constructive reasoning, but they're a very interesting area.

I think the mention of Coq actually supports my ideas more than it opposes them. Have you ever seen a proof of a nontrivial theorem written in Coq? They’re immensely complicated! Here’s an example of a proof of Markov’s inequality written in Coq (which has a one-line mathematical proof). The reader of the actual proof of Markov’s inequality must passively understand, or be able to flesh out on demand, all of the little “lemmas” in the Coq program. And the restrictive syntax shows you that one cannot efficiently do mathematics without bending syntax enough to express your point.

And the real point is that there’s no proof checker for a proof written in a spoken language. Part of the reason is because there’s so much implicit background information and ambiguity in that kind of syntax.

Oh, I wasn’t intending to oppose your ideas – I think it was a very interesting article! You’re absolutely right that non-trivial Coq proofs can be complicated (I have the 4 colour theorem proof somewhere on my hard drive 😉 But the whole area of computer-assisted proof is steadily progressing and improving and it is an interesting entry vector into a more mathematical world if (like me) your background is more on the IT / computer science side. Likewise dependently typed programming languages such as Agda and Idris (and, to a lesser extent, Haskell).

Math notation is not a spoken language, though. I’ve seen dozens of proofs and explanations that are trivial when described in spoken language, but are completely impenetrable when written in “rigorous” form. If we *still* need something like Coq to *actually* be rigorous, what does the half-rigor of math notation really buy us? Several seconds saved on typing/writing extra parentheses? Or the fuzzy feeling of being an elite?

I don’t care what notation you use when working on some problem, but I definitely do care about what people use in books, research papers and lectures. All of those things are meant to explain things to other people. Yet mathematicians use a notation that is actually worse than Perl one-liners (because at least those can be unambiguously executed). Can you imagine someone teaching programming by example using exclusively stuff like this: $h{$F[2]}.=”$F[0] “;END{$h{$_}=~/ ./&&print”$_: $h{$_}”for keys%h} ? Can you imagine someone defending such practice by implying that people who don’t understand the line above are simply not worthy?

Part of the problem (which I am very cognizant of) is that most mathematics research is written for other mathematics researchers with the same level of experience. The argument goes that it would be a waste of time to give background that everyone reading the paper already has. Another issue is that mathematics is really the art of argument, and so if someone hears your proof and sees it (or hears the beginning and sees how to proceed in an “obvious” fashion), that usually is good enough. And so the half-rigor (although any mathematician would claim to be able to flesh out any details needed upon request) walks the balance between too rigorous and bloated with details.

This is part of the reason I keep a blog: to keep myself from falling into that trap. Or at least to know when it’s appropriate and when it’s not.

I think probably the most valuable thing a programmer gets out of studying mathematics is that elusive quality called “mathematical maturity”. I think that’s what this blog post is getting at too; it was just a bit surprising to me not to see that phrase used. Being able to read and write proofs is part of this, of course (and my main suggestion for that would be: go slowly! You cannot read a proof at the same speed you read prose, or a program! Do not think you can gloss over any part of it, until you get really practiced at it.) But applying mathematics in day-to-day programming might not involve as much reading and writing of proofs as recognizing where proof methods are applicable. Invariants are very valuable in software engineering, and an invariant is basically a universally quantified statement: under all conditions C, some property p(C) is true. That sets it apart from a (necessarily finite) set of tests, and requires, well, a proof, instead. For a practiced programmer with mathematical maturity, much of the proving of an invariant might go on just in their head. It should ideally be documented at some point as well, of course, to communicate it to others who must maintain the code. But it might not make sense for that documentation to take the form of a traditional mathematical proof, even though it should, ideally, be as rigorous as possible.
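To make the contrast concrete, here is a hypothetical sketch in Python (the function names are invented for illustration): the invariant is “inserting into a sorted list yields a sorted list,” a claim about all inputs, while a test suite, even a randomized one, only ever checks finitely many conditions C.

```python
import random

def insert_sorted(xs, x):
    """Insert x into the sorted list xs, returning a new sorted list."""
    i = 0
    while i < len(xs) and xs[i] < x:
        i += 1
    return xs[:i] + [x] + xs[i:]

def is_sorted(xs):
    """The property p: consecutive elements are in order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

# A finite test suite checks p(C) only for the conditions C we wrote down:
assert insert_sorted([1, 3, 5], 4) == [1, 3, 4, 5]

# Randomized checking samples many more conditions, but still finitely many.
# Only a proof (here, an easy induction on the loop) covers them all.
for _ in range(1000):
    xs = sorted(random.sample(range(100), 5))
    assert is_sorted(insert_sorted(xs, random.randrange(100)))
```

The random loop raises confidence, but it never becomes the universally quantified statement; that step happens in the programmer’s head (or on paper).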

A question regarding “There is no corresponding “proof checker” for mathematics”: what do you think of the potential for adapting the automated theorem prover technology for the purpose of helping programmers learn mathematics? It seems imaginable that in a few decades, we could have systems that let you compose a proof on a computer — formally, but in a non-burdensome way — and the computer would be able to check it for you. And that such tools could be used in mathematics education. Of course, there may be some pitfalls, as a student might come out of such an education with the impression that the mathematical syntax (and foundational axioms) used by the system they learned on is the “right” syntax (and axioms) for mathematics… but hopefully that would not be a difficult misapprehension to dispel.

Good point. “Mathematical maturity” is a good way to sum up the goals of learning mathematics for programming.

As far as I know, the world of automated theorem proving (or alternatively automated theorem checking; two different things) is limited at best, and not helpful for learning. For instance, a very large class of proofs in geometry can be proven by a computer, but why and how this works essentially boils down to a few deep theorems of algebraic geometry (approaching graduate-student level). But on the other hand, the “right” way to teach geometry is by visual proofs: argue by symmetry and find clever manipulations of geometric shapes to make things line up. It is certainly not to pose it as a computational problem.

But there is still much to think about on that. I think if there were any way to teach mature mathematics in terms of computation, it would be in the form of computational category theory. I’m working on it 😉

There’s talk that Homotopy Type Theory would allow a better formalization of math for proof assistants. https://en.wikipedia.org/wiki/Homotopy_type_theory

I am curious to hear thoughts about it.

Why overlook the engineering mindset (not to mention the scientific one)? Engineers use math and science to construct artifacts. Their mindset helps them to build better artifacts. Surely you realize this optimal (for engineering) mindset is not a mathematical mindset, such as yours. Not everyone needs to be a mathematician.

Look at any text on advanced engineering mathematics. Where are the proofs? Very few. IMHO, the reason is because many engineers feel like I do. The proof of a useful (to an engineer) theorem often involves tricks and techniques orthogonal to the understanding of what the proof means. Intuitively knowing a theorem must be true and actually proving it true are often two vastly different things. So in most cases, why bother with a proof?

Computers are becoming more and more important to all fields of engineering. It seems reasonable that understanding the capabilities and limits of computer programming would be of benefit to most any engineer. Even if the engineer never writes a single line of code.

I have to disagree, because historically most of classical mathematics (the kind that gets used by physicists and engineers) comes directly from their applications. As such, the reasons why the theorems are true have much more to do with the fact that it’s the most elegant way to describe a physical system than any particular mathematical trick. This holds true for calculus as well as machine learning. Of course there will be scaffolding, just as there are diagrams in engineering which don’t by themselves isolate the heart of a design. Knowing why a theorem is true gives you insight in how to apply it to a particular situation or to modify it or, better yet, to critique and improve upon it. I’m not saying that everyone needs to be a mathematician, but if you want to be better at mathematics and applying it to your work, you need to do more than just read the statements of theorems. And additionally, reading the theorems requires knowledge of the definitions, which is in itself a large undertaking.

Let me ask you a question. As an engineer, are you more interested in the capabilities and limits of a particular computer your simulation is running on? Or of an algorithm regardless of what machine it’s in? Or in what’s impossible (more than just infeasible) to do with a computer program?

The whole point is that in applied-math-heavy fields such as machine learning and physical modeling with numerical analysis, you can be much more productive by spending a year learning how to better apply the already-known mathematical tools to your field rather than spending that time on the critique and improvement of those tools (which you really wouldn’t be able to do at all just a year after starting).

Furthermore, in these areas there tend to be unproven rules of thumb that matter a lot for your simulations – where it’s clear that finding out why it is so would require far more time than the entire project where this is used.

It’s practical to specialize – expect that other people, the mathematicians will tackle the why question for the whole class of problems, and you will apply it to your narrow problem if it ever is solved. The math skills, however, are needed to find out if it is solved, say, last year and figure out how to apply it.

As for your question to George – for an engineer, the CompSci difference between impossible and infeasible doesn’t matter much if at all; but the feasibility itself is critical. What would matter is the dependencies between various parameters – what and how much do you need to restrict in your problem to turn it from an infeasible one into a feasible one. And sometimes theoretical compsci doesn’t help at all in algorithm comparison if the often-ignored constant factors get large enough – machine learning application papers are full of purely empirical performance “proofs” by testing and benchmarking, and that is good enough. The proof of the pudding is in the eating.

As someone who studied both math and computer science, I loved your blog post and your entire blog in general. Great stuff

I missed out on university due to illness, so I’m lacking in “advanced” math skills, but that has in no way hindered my ability to make a living through programming. In my experience, most working programmers require very little math. For instance, while modern encryption may be based on large primes, most programmers will never implement an encryption library to begin with.

And while mathematical maturity is a trait that I find impressive and desirable, it’s in the same category as skills such as drawing, playing the piano or being able to do five consecutive back flips. More hypothetically useful and admirable than something that will be practically applicable in my life.

However, I have recently found a project which would benefit from some knowledge of geometry, particularly mesh transformations. Which gets to the crux of the issue for me: without a project and/or a social environment to compel me to learn a complex new skill, I quickly lose motivation. Or at least, never get to the point of mathematical maturity.

Is this an artifact of my personality? Or is it fair to generalize this situation to all programmers? All humans?

Are we unwilling to put in the required energy to learn a new skill without a concrete idea of where this energy expenditure is taking us? All those hours of mental energy exerted, slowly doing work upon the neurons in your brain in order to reconfigure them in service to some hypothetical goal. That requires an incredible imagination or a constant reminder of the applicability of the goal to your future self.

Anyways, one final question. For a programmer interested in geometry (particularly as it applies to computer graphics) would you be able to recommend any books that would put me on the path that you prescribed in this blog post?

I certainly agree with you about finding motivation, and I do struggle with this myself. I was lucky enough to begin my late-start mathematical career around a group of (mostly) motivated and like-minded individuals. Or at least they all got my math jokes 🙂 I would extend this difficulty much more quickly to all human beings than to programmers. I have the opinion that a programmer who is dedicated enough to debug code is more motivated to do something difficult than the majority of non-programmers.

As to your reference request, if you’re interested in computation and geometry, you may be interested in computational geometry. While it’s highly technical and difficult, most of the difficulty actually arises from the use of complicated data structures like heaps and quad-trees, but a seasoned programmer should have no difficulty understanding how they work and contribute to efficient runtime. The book I learned computational geometry from is that of de Berg, Cheong, et al. While I do disagree with some of the ways they present certain topics, it definitely hews closely to the programming side of things, and assumes you understand how algorithm termination proofs generally go. I should warn you, though, the text is somewhat dense, and even I skipped a number of sections where I could tell the direction was taking me away from my goals.

If you’re interested in graphics programming in particular (that text does have some, but covers a lot of other things, too), then from my understanding linear algebra is important. However, unless you’re going for more advanced graphics techniques (the kind that discretize differential equations to, say, model cloth billowing in the wind) you won’t need too much theory. Unfortunately I don’t know of any references specifically geared toward that.

I’m torn here, because I love math (I actually started out as a programmer and in the past few years have mostly given it up in favor of studying pure math), but at the same time there are obvious counterexamples to your claim that practitioners of Non-Pure-Math-Field X need to intensively study proof-based mathematics in order to understand math. I don’t want to discourage anyone from studying math theory (on the contrary, I find it endlessly fascinating), but, for example, Feynman was famously derisive of rigorous math, which doesn’t seem to have handicapped him at all. As another example, Oliver Heaviside was an engineer who developed his operational calculus of the differential operator in a wholly unrigorous manner. His attitude of “I do not refuse my dinner simply because I do not understand the process of digestion” is exemplified by other comments in this thread, and I think there’s certainly a place for that.

On the other hand, for practitioners in specific fields (like, say, machine learning) who say that it is sufficient to learn only the “relevant” theorems of their area, I have to wonder: how did anyone discover that these theorems and areas of math were relevant to machine learning or whatever in the first place? Surely the person who discovered their usage had to have understood the math theory in question before they could realize it had applications. There’s also the danger that if you don’t understand the theory behind your field in a deep way, you will never be able to truly advance it (because the open problems are pretty much by definition ones that the current bag of tricks is largely powerless against).

Physicists (Feynman and Heaviside in particular) have historically tended to abuse mathematics, and actually pave the way for new mathematical theories which have since gone on to give remarkable applications.

The simplest example I know of is in measure theory. Physicists invented the so-called “delta function,” which was not a function, but they pretended it was, and they pretended it could be integrated. They happened to get some awesome results out of that, and out of this (among other reasons) the field of measure theory was born, which properly defined the delta function as a “tempered distribution,” or a generalized function. Measure theory then went on to form the basis of modern probability theory, which in turn gave theorists like John Nash an adequate framework to prove their theorems (this also includes the developers of PAC-learning theory, but perhaps to a slightly lesser extent than Nash).
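Concretely (stating the folklore informally, not quoting any particular text), the physicists’ rule was that integrating against the delta evaluates a function at zero:

```latex
\int_{-\infty}^{\infty} \delta(x)\, f(x)\, dx = f(0)
```

No genuine function satisfies this identity, but the linear functional $f \mapsto f(0)$ on a space of test functions does, and that is how the delta is rigorously defined as a distribution (equivalently, as integration against the Dirac measure concentrated at $0$).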

The history of it all is fascinating and quite convoluted, but in my opinion the physicists were really doing math, and just not spending enough time to iron out the details (and rightly so, it’s a mess that only pure mathematicians should have to deal with). But what I propose people do to understand mathematics is not to study all of proof-based mathematics, but to know enough about proofs so as to be able to follow the simple arguments that abound in mathematics. I’m talking about the things that literally every mathematician cannot do without, and which are used in every paper a mathematician writes. The only reason I call it intense is because it’s intense for a beginner, in the same way that manipulating stack frames by hand is intense for someone who doesn’t work in security or operating system design.

I don’t agree that Feynman was derisive of rigorous math. He was certainly more than capable of it and he did not feel it was unnecessary for his work. But he was a physicist and what he wanted was answers not general proofs. He would skip steps and take shortcuts when doing calculations because he had a deep intuitive understanding of mathematics that let him know which parts were important and where he could use a trick or an approximation. I think he was actually an exemplar of “mathematical maturity”. Sort of a case of “you have to understand the rules before you can break them”.

He gave a great lecture about this called: The Relation of Mathematics and Physics: http://www.youtube.com/watch?v=uEVLHd8voRs

I particularly like this bit towards the end where he talks about approaches to mathematics that mathematicians and physicists have. And it has a funny twist at the end.

http://youtu.be/B09Ny_jp8ak?t=6m1s

What he certainly did disdain was rigorous, formulaic teaching of mathematics. This is a short video where he gives a good explanation of his feelings on that: http://www.youtube.com/watch?feature=player_embedded&v=5ZED4gITL28

For some reason wordpress changed the url for the middle video and took off the start time. The part I’m referring to starts at 6:00. I’ll see if I can paste a different version here that WP will be kinder to: http://www.youtube.com/watch?v=B09Ny_jp8ak&t=6m1s

Great post!

I dropped out of a math degree in my earlier years to become a software developer… and many years later I went back to college because I realized that finishing a math major would make a great foundation for really understanding machine learning theory and the “why, how and when this works” of the algorithms.

I can say that being a programmer helped me out when I started writing rigorous proofs, especially in being strict about “declaring” what kind of object I’m manipulating and what properties it has.

But being able to write and understand proofs, and especially being able to think hard about a statement for a long period, trying to find out all the details until finally you have that moment when you think “ah! Now I see this is a trivial consequence of this and this” – all of this mental exercise made me a way better programmer, as now I think hard about whether every code statement I write makes sense, whether it is understandable, whether it is a beautiful succession of logical steps, just like the way I try to write a proof.

Wonderfully said 🙂

+1 for the book recommendation alone. I finally see a book on set theory that explicitly states that the term “set” by itself is undefined.

Reading the first chapter preview at Amazon, I already see the need for a human to explain something to me:

Two sentences from the book [1]: “A possible relation between sets, more elementary than belonging, is equality.” and “Two sets are equal if and only if they have the same elements”. These already seem to contradict each other – how can equality be more basic than belonging if the former is defined in terms of the latter?

While I agree with the author on the difficulty I face when learning advanced mathematics (notation, generally tacitly assumed by the authors), one thing that also bugs me is use of human language (!) as a part of the proofs. How do I know that the proof is correct and is not impacted by something fundamental about the human language itself? Merely via inclusion of a human language in the proof, a lot more axioms “may” have been included than what meets the eye. Probably not, but I often find it hard to convince myself of this.

[1] http://www.amazon.com/Naive-Theory-Undergraduate-Texts-Mathematics/dp/0387900926

This is an issue that is usually rigorously solved by logicians, and ignored by working mathematicians. Part of the point is that mathematics is the art of argument, and if your point gets across then that’s good enough. The other part of the point is that mathematicians don’t all agree on the best way to create a completely rigorous foundation for mathematics, but pure mathematics basically has conceded that category theory is the best one we know of so far.

I say you should do enough examples with sets to make yourself comfortable with how they work, and worry more about proving basic facts about sets before you wonder about their foundations. You can always look up the dry formalization of ZFC set theory later, but the basics of proof are essentially required to understand how the formalization works in the first place. (and proof techniques don’t require set theory, it’s just that sets happen to be a simple and convenient vehicle to teach it with).
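To make that concrete, here is the kind of basic set proof I mean, the standard double-inclusion pattern (a stock example, not one drawn from Halmos specifically): to show two sets are equal, show each contains the other.

```latex
\textbf{Claim.}\ A \cap (A \cup B) = A.\\
\textbf{Proof.}\ (\subseteq)\ \text{If } x \in A \cap (A \cup B), \text{ then in particular } x \in A.\\
(\supseteq)\ \text{If } x \in A, \text{ then } x \in A \cup B, \text{ and so } x \in A \cap (A \cup B). \qquad \blacksquare
```

Notice the proof leans on the extensionality principle quoted above: equality of sets is checked element by element.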

>> Part of the point is that mathematics is the art of argument, and if your point gets across then that’s good enough.

This clearly does not sound like “mathematics” (which is about formal proofs rather than merely convincing) unless, again, this is the culture amongst mathematicians, which you know better.

>> category theory is the best one we know of so far

I have heard “category theory” before, but had never known the above. I’ll definitely read up more on this. If you have any suggestions on books/papers, please let me know.

I am generally already familiar with set theory basics, though had gotten stuck on diagonalization which someone on Hacker News (Colin Wright) helped me with. Now I can continue reading towards understanding Godel’s proofs.

This is very much a part of the culture. Convincing geometric arguments and picture proofs abound in topology, and they are often quite far from completely rigorous.

I’m going to start a category theory series soon which, in the spirit of this post, remains intertwined with programming for as much of it as possible.

Also, while Godel’s proofs are interesting and a subject of popular fascination, I personally think there is not much content in them. But once you do understand them, there is another class of Godel-type incompleteness theorems, one of which I presented on this blog in my post on Kolmogorov complexity. I like the elegance in those proofs, whereas Godel’s original incompleteness proof is a bit belabored (it’s a constructive proof, whereas the one in my post is only probabilistically so). And after that, if you want to see more about what modern mathematics is like, you can start reading about model theory (but it will quickly get denser).

One area of logic that was fascinating to me was Presburger arithmetic. It turns out there’s a natural algorithm to “eliminate quantifiers,” and it provides a computable procedure to decide whether any logical formula written in that language is true or false. The only issue is that the problem has a doubly exponential lower bound on its complexity, and as far as I know it’s the only naturally occurring problem with such a big lower bound. I wanted to write a post on it (I still have a draft sitting somewhere) and I had a bit of Racket code to perform the algorithm, but my attention was displaced when it came time to write up the mathematical details.
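To give the flavor of quantifier elimination (a standard toy example, not the one from my draft): once the language of Presburger arithmetic is extended with divisibility predicates, every quantified formula is equivalent to a quantifier-free one, for instance (with variables ranging over the nonnegative integers):

```latex
\exists x\, (y = x + x) \ \Longleftrightarrow\ 2 \mid y \\
\exists x\, (x + x < y) \ \Longleftrightarrow\ 0 < y
```

Eliminating quantifiers from the inside out eventually leaves a sentence with no variables at all, which can be evaluated mechanically; that is the decision procedure.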

Thanks. Those are enough pointers for me to be busy for a few days! Will anxiously wait for your posts on category theory.

> It’s as if the syntax of a programming language changed depending on who was writing the program!

Oh, so it’s Perl, then!

There are a bunch of algorithms which by their very nature resist a full mathematical understanding – typically heuristic algorithms meant to do multi-objective, gradient-free optimization.

In fact, even more typical techniques that were eventually given a rigorous mathematical footing start out not having one. I think your boosting example was one, although I think they used methods from analysis, and then game theory. It’s been a while since I looked. Neural networks are another, from the point of view of Gaussian processes.

Point being: although a strong mathematical background certainly suffices, you absolutely do not need one to invent a viable learning algorithm – only to understand why it works. And sometimes even that’s not good enough (the behaviour of schemata in genetic algorithms is still argued over, for example).

Boosting was purely a child of learning theory, and is formulated in the PAC model of learning (again, purely theoretical).

Even with heuristics, the question remains: when should you try for a heuristic as opposed to an exact solution? (when the problem is NP-hard or worse) It’s somewhat amazing, though, that even if the best algorithm is provably a very simple one, it won’t be accepted as good without some theoretical justification. Seeing it’s good in practice is fine and dandy, but if you don’t know for sure that it will be sensible, you won’t (or shouldn’t) risk a lot of money on it. These very kinds of uses without understanding have recently caused financial crises, if you’ll recall…

Are you certain it wasn’t justified after the fact? I have a memory of reading that boosting was actually originally intended as a filtering algorithm. And I know that AdaBoost followed boosting, which followed bagging, the last of which is very simple. For a while people could not explain the counterintuitive behaviour of ensembles beyond handwaving about smoothing and variance. Also, an aside: nowadays boosting is often explained in a game-theoretic setting.

As for NP-hard (NP-hard is the “or worse”; NP-complete is what you are looking for?), sometimes you just have no choice because you are doing optimization over a non-continuous domain, or one that is not differentiable.

The financial crisis was done on purpose not out of ignorance =D

Lots of methods are used without a full understanding. Learning Classifier Systems, for one. Neural networks and random forests, I argue, are not as well understood mathematically as support vector machines, although they work very well empirically.

I’ll point out that I am only arguing that theorem-guided algorithms are sufficient, not necessary; machine learning is more like physics, where mathematics and experimentation motivate insight, not theorems and proofs. Those clarify insight.

I know the weighting system used in boosting was known ahead of time, but even the idea that one could use weak learners to produce a strong learner was quite novel and not at all obvious. It’s definitely not just a matter of smoothing, but that the training algorithm picks “special” distributions of the input data with which to provide the weak learning algorithms. Because, you see, in PAC learning an algorithm can only learn something if it can learn it regardless of the distribution of the data. I believe the original breakthrough was in a paper of Schapire, but I might be remembering that wrong. IMO its best interpretation is in terms of game theory.
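The distribution-reweighting idea is easy to miss in prose, so here is a minimal sketch of the AdaBoost-style update in Python. The toy data and decision-stump “weak learners” in the usage below are hypothetical, chosen only to illustrate how mistakes get reweighted; this follows the standard textbook update, not Schapire’s original construction.

```python
import math

def adaboost(points, labels, stumps, rounds=3):
    """Toy AdaBoost: labels are +1/-1, stumps are functions x -> +1/-1."""
    n = len(points)
    weights = [1.0 / n] * n          # the "distribution" over training examples
    ensemble = []                    # (alpha, stump) pairs

    for _ in range(rounds):
        # weighted error of a stump under the current distribution
        def error(h):
            return sum(w for w, x, y in zip(weights, points, labels) if h(x) != y)

        best = min(stumps, key=error)
        err = error(best)
        if err >= 0.5:               # no stump beats random guessing: give up
            break
        if err == 0:                 # a perfect stump: use it alone
            ensemble.append((1.0, best))
            break
        alpha = 0.5 * math.log((1 - err) / err)
        # reweight: examples the stump got wrong become heavier, so the next
        # weak learner is trained on a "special" distribution focused on them
        weights = [w * math.exp(alpha if best(x) != y else -alpha)
                   for w, x, y in zip(weights, points, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((alpha, best))

    # the strong learner is a weighted vote of the weak ones
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

The list `weights` is exactly the “special distribution” mentioned above: after each round, the examples the last stump misclassified carry more weight, forcing the next weak learner to focus on them.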

NP-hard is worse than NP-complete (in the sense that NP-complete is contained in NP and NP-hard is not, so many NP-hard problems are not even verifiable in polynomial time). There is still a lot of research actively going into both neural networks and support vector machines, so I would say they are both poorly understood. But in truth the only way to understand why they work is mathematically (sure neural networks have biological interpretations, but cognitive psychologists don’t believe that in any seriousness; I know because I’m working with one right now 🙂 ).

Of course using things is good enough for many. My point is more that if you truly want a full or even a better understanding, you have to dig into the proofs. My issue in this post is simply with those who claim to want deep insights, and what stops them from getting them.

I agree completely that you need a strong mathematical understanding to really know what is going on. I even believe the tools used in machine learning are inadequate. Algebraic geometry, topology, statistics, and group theory are all giving highly simplifying insights, bringing together a lot of things that were once deemed separate.

Up above you said “NP-hard or harder”; I was pointing out that “NP-hard or harder” is redundant, and asking if you meant “NP-complete or harder.”

I’ve implemented AdaBoost, so I understand the distribution meant to scale difficulty. It was in an improvement called FilterBoost (a scalability and generalization improvement) that I read boosting was not originally a learning technique.

I agree that SVMs and ANNs are continually being better understood, but I think SVMs are better understood due to their increased mathematical tractability: the optimization is convex with a global optimum, and regularization is built in.

I completely agree that artificial neural networks have very little to do with actual neural networks. At the most basic level, neurons are more like complex automata than simple functions.

There are still complexity classes that are “harder” than NP-hard, and many problems which are known to be in those classes and yet not known to be NP-hard. And of course there are always undecidable problems 🙂

I am a fledgling programmer who is banging my head against the wall (well, actually I have just completed a course in ActionScript… which I dunno if it is considered a real programming language?). But I really liked this piece somehow; it kind of got to me, how mathematics and programming differ.

Reblogged this on syndax vuzz.

My experience has been in learning Math and Development not from the collegiate level of instruction, but from the school of hard knocks. I wouldn’t consider myself a master of either (though many I work with constantly place me in that area) but instead as a person who can understand both well enough to bridge the gap.

I find that Math and Programming go together like an Acid and a Base for a Chemist. If you only know the powers of one you will argue its use to the end without acknowledgement of its dangers. Once you understand the basics (and some of the advanced topics) of each you realize that the power comes from the combination of both, not in the application of either.

I’ve sat in many a room with many smart educated people, and I commonly find myself taking the exact opposite stance as anyone else.

If they are arguing what proof for a given formula is best, I suggest using a program sandbox to execute all solutions concurrently with a basic formula and visualization over the top to view where each formula stands out.

If they are arguing that a given algorithmic approach is best I find myself asking for the mathematical proof of each and using that to help choose an appropriate approach. Thus forcing them to think about the outcome, not just the edge case they are claiming best fit for.

Long story short; I think it takes strong individuals in each area to come to the best solution (math gives you a proof, development provides usability), but it also takes an idiot like me sitting in the middle playing Devils Advocate for both parties to find and use a solution.

– Jeremy

So, when has a practical programmer actually ever been called upon to use Fermat’s Last Theorem or a proof of the Halting Problem? I’ve been programming for almost 25 years and have never come upon any real-world programming problem that requires anything more sophisticated than a quicksort — not even a hash table, unless you count the stuff under the hood of the STL — let alone anything approaching serious mathematics — and this in a career consisting almost entirely of engineering and science, where you’d think the heaviest math would be. Okay, in my last job a couple of people I worked with/for developed some pretty heavy math stuff — but it almost never percolated down to *me*, even to implement, let alone to *develop*.

Also, I’m almost done with a Master’s degree in Computer Science, and overall I’d say about 85% of what I’ve been taught either has no practical relevance to me or is something I already knew. Only the remaining 15% is moderately new to me and of even *possible* practical utility. I really should have majored in Software Engineering — but I foolishly merely picked up where I’d left off with my Bachelor’s (1988) when there was no such distinction (and almost all they taught was what would today be considered pretty rudimentary programming; or maybe that was just *my* school).

On one hand, you’re right. I’m not here to say the practical programmer should use mathematics for anything. Considering that the vast majority of code is written to shuffle around credit card information and do basic arithmetic, I’d say that the vast majority of programmers use no mathematics at all. I’m only explaining why those programmers who do want to learn more mathematics have a hard time. And I don’t think it’s a surprise that programmers find mathematics so interesting: not because they need it to do their work, but because one can do so many awesome things with it. (Indeed, the whole point of this blog is to explore those awesome things and how to do them.)

On the other hand, more and more algorithms today rely on mathematical analysis. Without getting into too many details, many “simple” algorithms in randomized/approximation algorithms have extremely sophisticated analyses. This crops up in the analysis of large networks, data mining, and sub-linear streaming algorithms, and these problems are becoming more and more profitable to solve. For the everyday programmer, it’s probably just going to come in the form of someone else telling him the algorithm to code, not explaining why it works, and asking for empirical results. That doesn’t mean the mathematics isn’t important, but that they’re paying someone else to do it and paying the everyday programmer to integrate the ideas into a potentially complex codebase.

Great post!

I’m coming from a pure programming background and have been able to achieve some of my life’s programming goals, but my weak math background has hindered my ability for graphical simulations and solving game theory problems, which I really, really want to get into. I’m now catching up on the math, but so far I’ve really only gained some good symbolic technique, which is easy to learn solo because programs like Mathematica will check your work.

Programming was very easy for me to learn solo because the IDE’s compiler/linker would tell me exactly what was going wrong. I could then isolate the error and ask other programmers on forums and IRC to help me. With an isolated test case, it was very easy to advance.

I’m learning to prove theorems while I study the construction of the real numbers in preparation for my quest to conquer Apostol’s Real Analysis, and I’m so frustrated that I don’t have a compiler/linker to go over my proofs. It’s harder to get help on these often trivial proofs because when I’m learning solo I sometimes don’t even know how to orient myself when proving basic theorems.

I do feel at the level of writing and verifying proofs, it absolutely feels like I’m writing a computer program, and I do appreciate my years of programming experience; I often don’t get that feeling of coding when I’m integrating or using algebraic techniques; I feel like I’m blindly brute forcing something, where I’ll try one technique, and if that technique doesn’t work, I’ll see if there’s any partial success to build from and if not, I’ll just have to try a different technique.

I found this post while searching for “programming language to check number theory proofs”, and from what I can gather, I’m out of luck. I’ll be investigating Lisp, Coq, Metamath, and some other programming languages further for help with my proofs, but I think I’m just going to have to compare my proofs to the book author’s, and/or pray someone in #math or Physics Forums can tell me I’m headed in the right direction.

Thanks again for blogging about this, you have a new follower for life.

I agree that programming skills do contribute to math skills a great deal! Thanks for reading 🙂

I have to say that if anything, my programming skills have hindered my math skills. I’ve written a fair amount of code (pretty good code, anyway), but most code isn’t really subtle, unless you’re doing funny lock-free stuff or template metaprogramming or whatever. When you zoom out it all becomes a very complex system… The fact that a lot of programmers write code after a few beers is a bit of an indicator that for some code you don’t have to always completely be there. You can relax and try things… which you definitely do with math, but you have to be cognizant enough to catch yourself. The computer is the final judge, and if it works and (as you say) is thoroughly covered with tests, you’ll know where you went wrong.

I’ve had to break these bad assumptions and habits while learning to become a mathematician/physicist. I’m in some sense writing a program, but really I’m instructing someone how to imagine something. Trying to brute force a proof rarely if ever works for me. It takes a bit of careful sitting down and thinking about what you have. Brute forcing requires that you have a well defined function to tell you when you’re wrong… and if that function was so well defined you wouldn’t have to brute force in the first place. Anyhow. I felt hindered because I’m used to rushing through problems with my hand held by the computer and an arguably very simple framework. Mathematical abstraction results in swinging huge swords and doing a lot of damage if you’re not careful. Good computational abstraction makes it feel like you’re still swinging a small sword, I feel.

Nicely said. I feel the same way, and I think after gaining some mathematical maturity it can go in reverse if you’re not careful too. You can spend too much time on a program that shouldn’t need that much contemplation, and you feel the need to justify steps that don’t need justification (or at least, non-mathematicians don’t need justification).

Re: if it’s not completely trivial its probably not true

Cue one of the 2 math jokes I know: (the other one starts ‘there are 3 kinds of mathematicians…’)

Student is working on his assignment and gets stuck part way through a proof.

Eventually, he decides to try working backwards from what he needs to prove.

He makes better progress, but unfortunately he soon gets stuck again.

Desperate, he writes => in the space between his forward and backward progress (after all, the professor might not notice!) and passes the damn thing in.

The following week, he gets back the graded assignment. At the top of the page, there are two large marks: a crossed-out D and a big, underlined A+.

In the middle of the page, pointing at the spurious => are two large question marks, also crossed out. There was also, in an almost indecipherable scrawl, the following:

This implication seemed to be totally unsupported, but after several hours, I realized it was completely trivial.

I think a big part of the problem is the way we do math education these days. Most students don’t even encounter proofs in high school, even though it is possible to give a very thorough introduction to proofs using only basic algebra. Just look at Daniel Velleman’s excellent book How To Prove It. I think if we could somehow incorporate a book like this into the high school curriculum, we’d see a renewal of interest in mathematics and students would be far less apprehensive about deciphering mathematical notation later in their educational careers.

Another good option for such a text would be Lockhart’s “Measurement.” It was written specifically to address this concern.

Really interesting connections between proofs and programming, including both the uncanny similarities and the extremely polar differences. The “distinction” between trivial and nontrivial though can be really misleading, but you seem to be very aware of this. Technically all theorems are a sequence of trivial steps, but the art lies in knowing which steps to use and what order to use them in. And that part is highly nontrivial.

Anyway, I am sort of approaching this problem from the opposite perspective—my mathematical training is much stronger than that in computer science or programming. I agree that for coding, a solo session can be extremely productive because we can get immediate feedback on our code. But for mathematics, doing proofs alone can seem daunting at first because there is no easy way to check the validity of a proof. I think it takes a while for one to become accustomed to writing proofs, and after that point it becomes much more natural.

Thanks for the article!

Just commenting to say thanks.

I’m afraid I’ll soon get addicted to this site.

This post hit the nail on the head so hard.

Thank you very much for writing these articles!

Incredible post. I really enjoyed reading it. The section about syntax in mathematical proofs was intriguing. I would like to see a standard in mathematical proofs that is like Python, or even like Java with Javadocs. I hate referencing an article that I don’t remember the URL for (thanks to StumbleUpon), but I saw a similar article discussing how archaic the standard mathematical proof is. In an age of computers, a new proof-driven language is needed to make writing, sharing, and error-checking proofs less laborious. While some proofs can be simple, the most prolific of proofs can take so long to “debug” and “decode” that they become useless to the non-mathematician, or the amateur. The programming world and the hardcore mathematicians need to merge and develop a Python-like proof engine.

Jeremy, great post. I am no mathematician, but I’m interested in the subject. A couple of questions:

1. I just finished reading Devlin’s http://goo.gl/KlRBfG and it does talk about the bare bones of terminology and proofs. Not sure if you are familiar with the book, but is this the general direction you recommend?

2. When I started reading your post I thought you would go in the direction of experimental math http://goo.gl/4sepn as an initial bridge between math and programming. Again, I’m no mathematician, but I got the impression that experimental math would provide me with the tools to at least break complicated mathematical statements into smaller pieces, test them one at a time, and then try to bring them into a cohesive numerical demonstration (not proof). Is that the case?

1. It is a good introduction. I actually participated (lightly) as a course TA for Keith’s online course, which used that book. Another book I’d recommend if you’re interested in mathematics with a CS flavor is Sipser’s Introduction to the Theory of Computation. The big proofs in that book are all prefaced with a “proof idea” section, which helps a lot in parsing which mathematical details are important and which are just necessary scaffolding.

2. This is a technique that all mathematicians use every day, but for checking the steps of proofs. In fact, I will often write computer programs to verify properties of things (say, by exhaustion) before I try to go and prove them. But for me these are always prerequisites to a formal proof, and the statement “we verified this by computer search” is a bit taboo, even in the theoretical computer science community, because it doesn’t explain why something is the case.
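A minimal Python sketch of this “verify by exhaustion before proving” habit, using Fermat’s little theorem as a stand-in conjecture (my toy example, not one from the discussion):

```python
def is_prime(n):
    """Trial division; fine for the small bounds used in a sanity check."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def check_fermat(bound):
    """Search for a counterexample to a^p = a (mod p), p prime, p < bound."""
    for p in (n for n in range(2, bound) if is_prime(n)):
        for a in range(1, p):
            if pow(a, p, p) != a % p:
                return (a, p)        # a counterexample would refute the claim
    return None                      # no counterexample below the bound

print(check_fermat(200))  # → None, consistent with the theorem being true
```

Of course, as the comment says, a `None` here proves nothing by itself; it just builds confidence that a proof is worth attempting.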

That being said, as you get deeper into mathematics your “experiments” become more abstract. Instead of saying something like “Let’s try this when n=4” you say, “let’s try this for any prime which is 1 mod 4” or worse, “let’s try this statement about general fields over an algebraically closed field of characteristic not equal to 2, where everything is nice.”

Great post and discussion here. I just wanted to suggest that one approach a programmer might take to both further motivate and aid their learning of mathematics would be to use their algorithmic chops to try to create meaningful visualizations of their subject of study. (Consider the work of Jason Davies or Steven Wittens.) Exercise: reproduce and adapt some of the visualizations found in Hilbert and Cohn-Vossen’s _Geometry and the Imagination_.

Also, Motion Planning and Optimal Transport problems provide a nice context for the application of concepts that typically fall under the rubric of “higher math” … serious algorithmic challenges only now receiving mathematical illumination.

I really enjoyed this post. In all seriousness, you should consider writing a book specifically tailored to this subject. You seem to have a natural sympathy for the coder, which most math guys lack.

I think the difference between the disciplines is rooted in the environment in which they occur as much as the culture surrounding them. Mathematics is and probably always was an art practiced in academic venues. Knowledge is stacked atop existing knowledge, which allows the initiated to follow along.

While computer science certainly has roots in the university, it grows and mutates in the wild. A coder can learn more in trying to hack a piece of open source software than they can writing thousands of academic examples. Unlike the academic world, knowledge doesn’t require the same pyramid structure underneath. You can get by with understanding only what you need to, and tailor your learning to the specific problem you are trying to solve.

Like one of the comments above suggested, there is an immediate payoff to your learning. You have a working program, which now does what you wanted it to do. Not what you were supposed to learn, but what you wanted to learn. This creates the confidence to pull out the tin snips and sledgehammer to go at another coding project.

With each victory, your knowledge increases in jagged chunks, which can be crudely summoned and at some point refined. You don’t have to be clever to learn to code, but if you keep at it long enough, cleverness might sneak up on you.

You can follow this sleazy path through the back alleys of the intellectual metropolis, until someone asks you to provide Big O notation for something. Within moments of reading that damned algorithms book for the tenth time, you realize that your bluff has been called. You are in fact too stupid to be reading this book. You understood the first 20 pages, then they started slipping the nonsense in.

So you do what has always worked before. You try to learn just enough to move forward. Sadly, all of the math on Wikipedia is written in math, so you are still screwed. There are many math sites, but they are all intended for college students studying math. If you try to ask a question, they will respond in math. Unlike programming, there just don’t seem to be primitive elements, which scale to practical use.

Mr. Krummel, nicely put. A programmer would like some path to the symbolic abstractions via the concreteness of code … a graduated semantic ascent like that offered by the Little Schemer, culminating in formal notation. Of course, proofs need not be avoided along the way, just refuted/generalized along the lines of Lakatos’ Proofs and Refutations.

I have been thinking about a book. I said in a recent post that I had long term plans to cover certain topics. A book would be super-duper long term by comparison. I don’t think I could start until after completing my PhD.

I do think mathematical knowledge is jagged as well. Mathematicians tend to specialize heavily in a subfield of a subfield of a field. I think part of my point is that the baselines are very different in the two fields. In programming you need to know how to use a text editor before you can write a program, and you should be familiar with a handful of languages and tools if you intend to be a software developer. In mathematics, the baseline is following, questioning, and generating logical arguments. I think that if the general educated public had a solid mathematical foundation there wouldn’t be such a perception of the pyramid of knowledge (or at least, that wouldn’t be considered the main obstacle). As you can see, for something like group theory it only takes me a post or two to make the gist of these coveted secrets public. It’s not research level mathematics, but it’s enough to go out and implement RSA.

Internalizing intuition about groups to the level required by a mathematician takes just as much practice as programming, and I think that’s the kind of culture that turns programmers off (and turns mathematicians off to programming!). Mathematicians expect this intuition to be developed by exercise, proof, and counterexample. Most non-mathematicians just want to know how things work without all of that.

Here’s the thing: I have spent seventeen years studying mathematics. I began at around four years old, then continued throughout my formal education until I did my degree in computer science, during which I took some modules in discrete mathematics. There is no doubt that I have spent far, far more of my life engaged in deliberate study of mathematics than any aspect of programming, so why do I feel so completely lost when trying to research and apply mathematics while programming?

It’s frustrating to read posts like this that suggest both that I don’t understand the basics of mathematics, and that I’m not prepared to expend the effort to learn. The way I see it, I, and just about everyone who went through the formal schooling system, has had an extraordinary amount of mathematical education. That it was apparently useless in preparing us to tackle any kind of advanced topic in mathematics, and we’re still “beginners”, is pretty sick-making. It’s like learning to read and write all throughout school, only to get out and find that everything anyone writes is in a different, completely incomprehensible language.

The sad truth is that little of what goes on in secondary school education is “mathematics” in the sense that you want it to be. Despite that formulas are often seen as the end result of mathematical work, mathematics is not about applications or formulas. It’s about the reasoning taken to get to that end. And so I ask: how many of those seventeen years did you spend trying to solve problems for which you weren’t given an algorithm ahead of time? The problem is that you’re never taught in school to play with mathematical concepts in order to gain a better intuitive understanding of them. The analogy is that you spend 17 years learning to read and write, but all you really do is practice spelling and grammar; you never open a novel or write a short story (no matter whether it’s bad literature, you’re never given the chance). Or you learn to write sheet music but never play an instrument. You might be under the impression you spent 17 years studying music, but can you really expect to have any proficiency at all playing the piano?

When you see a new feature in a programming language, what do you do to learn about it? The best answer is: you write little programs that use it in progressively more complicated ways. This is the same attitude you need to have about mathematics when you encounter something you don’t understand. You’re *supposed* to feel lost, more so in mathematics than programming (all mathematicians do!). Most people expect mathematical understanding to come to them WAY faster than is realistic. When I encounter a new definition (forget the big theorems!) I spend tons of time tinkering with examples and trying to prove simple facts before moving on. Even then I rarely get the intuition I’m looking for, and similarly every little program you try to write with your new feature might have compiler errors. But when it comes to mathematics a lot of people, perhaps due to the duplicitous nature of their schooling, expect to be told what to do and how to do it, and they never get any real intuition about things. It’s like you’re writing a program and you demand to be told what to type (not what language features or data structures are appropriate, not whether such and such belongs as a separate function, but what order to type which keys).

I don’t mean to say that you do this, but it’s an extreme form of the same attitude (I’ve actually had programming students ask me what to type in their programs). The point is that mathematical thinking skills are largely unrelated to knowledge of mathematical facts. The central mathematical skill is being able to form and answer questions, to gain traction in the face of being completely lost. Because I am “completely lost” every day I go into work, and by the end of the day I am a tiny bit less completely clueless.

And I don’t mean to say programmers can’t or don’t want to learn mathematics. I was originally a programmer who had never done math. I’m saying that I see a lot of genuinely interested programmers try to learn math, and ultimately lose hope for some or all of these reasons.

Is there a curriculum list you would recommend for someone starting out?

There are similarities between programming and mathematics. Both involve abstract modeling of a real-world manifestation, except the language is different. Programmers conversant in multiple distinct languages, e.g. C++ and Java, can decide how to write a given program in either language. Math is just another way to “solve” a problem, just way more concise. For example, the Fibonacci sequence in math is a few characters, but translates to a few lines in Java.

I like to think of it a continuum from abstract math (e.g group theory) to basic math to programming languages. Just choose the right abstraction and iteratively refine it. (aka top-down design)
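The Fibonacci comparison above can be made concrete. Here is a sketch in Python (rather than the Java the comment mentions, but the point about concision is the same): the recurrence F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1, is a dozen characters of math, while even a direct translation takes several lines.

```python
def fib(n):
    """Iterative translation of the one-line recurrence."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib(n) for n in range(8)])  # → [0, 1, 1, 2, 3, 5, 8, 13]
```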

Another of Jeremy’s articles along the same lines that I’d recommend is http://j2kun.svbtle.com/programming-is-not-math-huh A pretty great read; he argues that math isn’t as formal as a programming language, but is more formal than human language; it sits in between them.

“… it was… three [years] before I learned to appreciate functional programming…”

Considering your formal training in mathematics, I am rather surprised to read this!

The math came after the programming.

Ah, okay. That makes sense 🙂

There is definitely something “wrong” with current mathematical practice (Greek letters, stacked subscripts/superscripts, untraceable context for implicit notations, etc.), all of this probably inherited from the “paper and pencil” constraints on terseness, which makes it unnecessarily awkward. But it goes further than that.

There is a deep rooted “culture clash” with programming and only a few people can partake in both cultures, well summarized by Newcomb Greenleaf (a mathematician).

https://www.researchgate.net/publication/225112269_Bringing_mathematics_education_into_the_algorithmic_age

What’s wrong with «greek letters, subscripts/superscripts…»?

What’s wrong with all this is that these are not *mnemonic* (and cannot be…) because their meanings change according to context, and the context itself is hard to track, i.e. if you don’t recall EXACTLY at which place you are in the text and which conventions apply, you are irremediably lost!

You cannot “search” for the last time, say, δ was defined; you have to restart from the beginning of the paper, chapter, whatever. This is UNACCEPTABLE!

It means you cannot cursorily browse a large corpus, which is the opposite of software best practices.

The 5000 or so developers of the Linux kernel could not maintain consistency over the 20 million+ lines of code if all names were ‘i’, ‘j’, ‘k’, ‘δ’, or such. Actually, this very mangling of identifiers is what is used in code obfuscation contests. 🙂

Okay, first I have to say I am not a mathematician but a programmer.

The problem with the oscillating meaning of letters is that one can’t reliably call a variable «foo» either, because it could easily be a multiplication of variables, «f × o × o». In general I do agree; in programming, one-letter variable names are only used in contexts a few lines long, so that the definition and usage are easily seen. I just want to note that this is not the problem of «greek letters, subscripts, etc.» per se, but rather of the naming convention.

Every letter is usually defined once (as far as I’ve seen at least), so it is enough to find the first usage of the letter. The problem with broken search seems to me to be that mathematicians are using bad instruments, which usually encode formulas as images instead of Unicode. There are good solutions: for example, math.stackexchange uses «MathJax», which encodes formulas as text. Entering Unicode letters (to use the search) is not a problem nowadays. For example, I am using a Compose key with a config from GitHub, so entering a Greek δ is simply «Compose + * + d».

Why doesn’t Jeremy-kun use MathJax by the way? ☺

Reblogged this on The Order of SQL.

I found your blog via Quora. As a programmer recently trying to get into maths, I can’t help but agree. The syntax of mathematics makes it very hard for me to teach myself. A good way to curtail this would be for book authors to agree on a particular notation. The sigma summation symbol is one of the primary reasons why I could not go further with some textbooks. This is the best thing I have read this year.

Your post has made me understand the reasons behind the way things are in math, BUT WOULD IT KILL THEM TO AT LEAST USE BRACKETS? Not to say anything about your superscript example: what should one do when there IS a need to distinguish between superscript-as-power and superscript-as-index/argument?

I mean, your SUM example gives me a thought: why not just write “i belongs to natural numbers; i belongs to [1,10]”? All this talk about simplifying makes me think about 1+2=3+3=6.