Saturday, May 29, 2010

The Resurgence of Parallelism

Parallelism is not new; the realization that it is essential for continued progress in high-performance computing is. — The Resurgence of Parallelism, CACM
This must be the best quote countering the "faddish rallying cries" for in-language parallelism. Really, unless you're doing HPC, parallelism seems to be a red herring.

For scalable storage, for example, I'd much rather extend shared-nothing to the per-core level.

Linus sez:

I'm ranting against the idiots who continue to talk about how software must be more highly threaded, and how multi-core is the inevitable future.

Those people are morons. It's not the inevitable future at all. Software programmers should not think that they have to write multi-threaded programs. It's too hard, and the payoff is too little. And it absolutely isn't where the industry is even going.

Friday, May 28, 2010

Type Constructor Polymorphism

I'm starting to dig type systems, which I have ignored for years. I'm still not able to make it through a single page of TAPL in one sitting, so my learning here is purely intuitive.

I think generics in Java are quite nice, surface issues aside. (In theory, that is. In practice, "Java seems to have been designed to strike a careful balance between making the type system as obstructive as possible while minimizing any actual guarantees of correctness". I think this is mostly due to Java's lack of RTTI for generics. C# doesn't have the same problems, except in places where they intentionally copied Java.)

My first nice discovery was that there's a beautifully simple theory behind generics: F-bounded polymorphism. (Reassuringly, it wasn't developed in academia, but by practitioners at HP Labs.)

In ordinary bounded polymorphism, you can give bounds to your type parameters: A <: Foo. The bound here is Foo, which means that all actual As must be subtypes of Foo.
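
To make that concrete, here's a little Java sketch (my own example, nothing official):

import java.util.List;

class OrdinaryBounds {
    // A <: Foo in Java spelling: T extends Number.
    // The bound is what licenses calling doubleValue() on a T.
    static <T extends Number> double sum(List<T> xs) {
        double total = 0;
        for (T x : xs) total += x.doubleValue();
        return total;
    }
}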

F-bounded polymorphism is a simple extension of ordinary bounded polymorphism, in which the bound may recursively contain the type it bounds: B <: Bar<B>. It turns out that with this simple (but at first somewhat confusing) concept, we can statically typecheck the expression problem, which is a standard measure of the worth of a type system.
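
The Java incarnation of an F-bound that everybody has already used is Comparable; a sketch (the max helper is mine):

class FBound {
    // B <: Bar<B> in Java spelling: C extends Comparable<C>.
    // The recursive bound says: a C knows how to compare itself to other Cs.
    static <C extends Comparable<C>> C max(C a, C b) {
        return a.compareTo(b) >= 0 ? a : b;
    }
}

String implements Comparable<String>, so max("a", "z") typechecks without a single cast. Note how the bound mentions the very type it bounds; that's all the "F" means.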

Type constructor polymorphism is another beautifully simple theory. Here, type parameters may range not only over concrete types, but also over type constructors, that is, functions from types to types. In a bound like C <: F<D>, F is such a type-level function, and C must be a subtype of the result of applying F to D. (Type constructor polymorphism also works naturally with F-bounded polymorphism.) It turns out that with this simple concept, we can statically typecheck GADTs, an advanced issue in typeful programming.
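
Java proper can't abstract over type constructors, so I can't show this one directly; but to give a feel for the idea, here's the usual defunctionalization trick (a sketch in the style of the "lightweight higher-kinded polymorphism" encoding; all the names here are made up):

import java.util.function.Function;

// App<F, A> stands for the application F<A>; F itself is just a tag.
interface App<F, A> { }

// Functor abstracts over the type constructor F, not over a concrete type:
// one map signature that works for any container F tags.
interface Functor<F> {
    <A, B> App<F, B> map(Function<A, B> f, App<F, A> fa);
}

In a language with native type constructor polymorphism (Scala, Haskell), the App indirection disappears and you write F<A> directly.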

My take-away: so far studying type systems has been a lot of fun, though extremely challenging, and the concepts are simpler and more profound than I thought. As long as we keep type systems optional, and never let them get in the way, I think we should strive to understand and apply them in new languages.

Wednesday, May 19, 2010

Revisiting Multimethod Dispatch

I've been bitten twice by CLOS's multimethods, so badly that I became totally anti-MMD. The problem is that in CLOS, modifications to your class hierarchy can have unpredictable and unintuitive effects on which method gets called.

The problem is neatly described in the Diesel specification (page 33), by Craig Chambers (now at Google, like his PhD student, Jeff Dean):
Multiple dispatching introduces a ... potential ambiguity even in the absence of multiple inheritance, since two methods with differing argument specializers could both be applicable but neither be uniformly more specific than the other. Consequently, the key distinguishing characteristic of method lookup in a language with multiple inheritance and/or multiple dispatching is how exactly this ambiguity problem is resolved.

Some languages resolve all ambiguities automatically. For example, Flavors [Moon 86] linearizes the class hierarchy, producing a total ordering on classes, derived from each class’ local left-to-right ordering of superclasses, that can be searched without ambiguity just as in the single inheritance case. However, linearization can produce unexpected method lookup results, especially if the program contains errors [Snyder 86]. CommonLoops [Bobrow et al. 86] and CLOS extend this linearization approach to multi-methods, totally ordering multi-methods by prioritizing argument position, with earlier argument positions completely dominating later argument positions. Again, this removes the possibility of run-time ambiguities, at the cost of automatically resolving ambiguities that may be the result of programming errors.
So, the problem with CLOS is that it linearizes classes, and then also makes the function arguments to the left more important than those to the right.
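
The ambiguity itself is easy to reproduce; even Java's static overload resolution hits the same "neither uniformly more specific" wall (a toy example of mine):

class Ambiguity {
    interface A { }
    interface B { }
    static class AB implements A, B { }

    static String m(A x, B y) { return "A,B"; }
    static String m(B x, A y) { return "B,A"; }

    public static void main(String[] args) {
        // javac rejects the next line with "reference to m is ambiguous":
        // neither overload is more specific for the argument pair (AB, AB).
        // m(new AB(), new AB());
    }
}

Java punts with a compile-time error, CLOS silently picks a winner by prioritizing the leftmost argument, and Diesel (below) makes the programmer decide.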

Diesel's solution, which I don't fully understand yet, instead warns programmers of such ambiguities and requires them to be resolved manually. That sounds good, and may be a reason for me to revisit MMD.

Multimethods do rock, after all.

Type systems for dynamic languages

In the past days I've studied type systems for OOP, with an eye towards making them work in a Lisp.

Cecil and its successor Diesel already do this. There, type systems are purely optional:
Diesel supports the view that static type checking is a useful tool for programmers willing to add extra annotations to their programs, but that all static efficiently-decidable checking techniques are ultimately limited in power, and programmers should not be constrained by the inherent limitations of static type checking. ... Accordingly, error reports do not prevent the user from executing the suspect code; users are free to ignore any type checking errors reported by the system, relying instead on dynamic type checks. Static type checking is a useful tool, not a complete solution.
I think that's a good spirit.

Both languages support F-bounded polymorphism, which is a fancy way of saying that you allow recursive types, like T extends Foo<T>. With this simple mechanism, the expression problem can be typed, and that's something.
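
Besides the expression problem, the everyday face of such recursive types is the self-type idiom, e.g. for fluent builders (a sketch, all names mine):

// The F-bound lets inherited methods return the *concrete* builder type.
abstract class Builder<T extends Builder<T>> {
    private boolean verbose;

    @SuppressWarnings("unchecked")
    protected T self() { return (T) this; }  // safe when subclasses tie the knot: X extends Builder<X>

    T verbose() { this.verbose = true; return self(); }
}

class ServerBuilder extends Builder<ServerBuilder> {
    private int port;
    ServerBuilder port(int p) { this.port = p; return this; }
}

// new ServerBuilder().verbose().port(8080) -- verbose() keeps the concrete type.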

(Scala and O'Caml each extend this in their own way (here and here), but I think they're much too complicated for what I have in mind.)

Saturday, May 15, 2010

Typeful Dynamic Programming, or, Java/C# Generics and what does F-bounded polymorphism have to do with it?

On LtU, Anton van Straaten repeatedly and vehemently made the point that users of dynamically type-checked languages don't program without types. Instead, in my interpretation, they do the type checking and inference in their heads.

Now, Java can in fact be used as a dynamically type-checked language: just declare all your variables, parameters, and return values as Object, and use instanceof to dispatch on types, or reflection to invoke methods. Runtime type information (RTTI) saves the day. But Java with generics also offers a quite potent type system. (Even if it's hampered by type erasure for backwards compatibility. C# keeps RTTI about polymorphic type variables around (the actual class of T in List<T>).)
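
A caricature of that dynamically type-checked style, just to be explicit (my example):

class DynamicJava {
    // Everything is Object; "dispatch" happens via instanceof at runtime.
    static Object describe(Object x) {
        if (x instanceof Integer) return "the number " + x;
        if (x instanceof String)  return "the string " + x;
        throw new IllegalArgumentException("no applicable method for " + x.getClass());
    }
}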

And Java's generics are in turn based on F-bounded polymorphism for object-orientation, a nice and rather simple way of type-checking parameterized classes. F-bounded polymorphism can do cool stuff like solve the expression problem (if heavy-handedly), and type families of mutually recursive types.

* * *

I recently had the case of a Listener that's parameterized over its Broadcaster, which in turn is parameterized over the listener. But I didn't know Java generics well back then, and just gave up on typing it. With F-bounded polymorphism, the two types could be defined like this (my after-the-fact sketch; the method names are invented):
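
// Each bound mentions both type parameters: mutually recursive F-bounds.
interface Listener<L extends Listener<L, B>, B extends Broadcaster<L, B>> {
    void receive(B broadcaster);   // sees the concrete broadcaster type
}

interface Broadcaster<L extends Listener<L, B>, B extends Broadcaster<L, B>> {
    void subscribe(L listener);    // accepts only the matching listener type
}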


It's weird, but I'm developing a bit of intuition about it. And I have to say, the expressivity is nice, even if the syntax is bad. But my hope is that macros can help factor out some common use cases of F-bounded polymorphism.

* * *

One of the goals of type systems like O'Caml's is to never have to store runtime type information (RTTI). But in a Lisp or other dynamic languages, users are already prepared to pay for RTTI anyway, so type systems can be layered on top of the completely dynamically type-checked language. What does this say to us as people looking for the ultimate dynamic language?

For one, there's a vista that an advanced type system like F-bounded polymorphism and a language that is dynamically type-checked at its core can work well together. (Cecil does exactly this.) O'Caml and Haskell programmers routinely write down the exact types of variables and functions. And anyone who has ever experienced these languages knows about the value of their type systems (no pun intended). Let's try out advanced types in dynamic languages!

Second, in a completely dynamically type-checked language augmented with an advanced type system, compile-time type errors become warnings about potential runtime type errors. Why? Because the dynamically type-checking core language guarantees you'll get a runtime error for type violations anyhow. This means the compiler's error listing changes into a dynamic console of current warnings about expressions that may yield a runtime type error, if executed. But the compiler lets you start your program anyway. Until you hit the first actual error.

Tuesday, May 11, 2010

Note to you

I'm writing this blog because I think a lot about PLs and because sometimes I like to write about my thoughts.

If I seem to diss a PL, don't take it too seriously. This is just my personal outlet, and I'm basically writing for myself.

Mucho thanks for all the interesting and thought-provoking comments so far, and keep 'em coming! (They may have even changed my stance on Clojure a bit ;))

—Manuel

Monday, May 10, 2010

The Lisp Ethos of Total Dynamicity

All true Lisps have a very important feature: you can enter any expression any time. (Subject to some static rules of course. E.g. you can't use a GO outside of an enclosing TAGBODY.) Heck, you can redefine a class while you're in the debugger that's waiting for you to continue with a restart.

So, I was quite shocked to learn that Python, heralded as a dynamic language, doesn't let you redefine classes the way CLOS does: you can rebind the name, but existing instances keep pointing at the old class. Of course, redefining classes isn't something you do all the time, but in Lisp implementations, supporting stuff like this is simply a question of honor.

Thank god Slava is kicking everybody's asses left and right outside of Lisp proper. I hope this will lead other dynamic language implementors to adopt the same ethos.

In my own Lisp implementation adventures, I was (of course!) seduced into thinking that maybe such total dynamicity could be sacrificed. After all, I can just restart the app I wrote to have the changes take effect. But in the end, it just wouldn't be a true Lisp. And it would add a load of bollocks to the language and its specification.

Concurrency FUD

A recent comment — and a golden Czech pilsner — puts me in the right mood to comment on one of my pet peeves: the fallacy of an urgent need for in-language concurrency:
few people realize that stateful objects are dead in the water right now
Yeah, if you use them from multiple threads. Sure. But you don't have to do that, unless you're very special, or you're writing very special programs. Like say, you're a scientific programmer raised on a diet of FORTRAN and C. Then of course, poor you need all the language support for concurrency you can get.

For the kind of programming I'm interested in (internet apps and servers), do like DJB told ya and spawn a friggin' process. Really, for servers, creating many more threads than you have hyperthreads is a design mistake. That's my current thinking, YMMV. Linux offers such great new tools (like eventfd, signalfd, and timerfd) that you don't have to leave the cosy confines of your epoll loop ever again.

Should all programs be event-driven and written as small Unix processes? Of course not. But the paradigm is applicable to so many server-style programs. And it forces you to communicate via protocols, which may lead you to a better design to begin with. And it also enables managing and upgrading clusters by simply restarting processes (or starting new versions). All of which in-language concurrency doesn't do.

I'm not against threads or whatever in the language, they have their uses. I just think that most of the time, another solution is actually better. If I were into non-event-driven design, I'd probably look at goroutines, which communicate via channels.

All in all, I don't think we have to redesign our languages for manycore. Keep on using them good old objects in your (hopefully) dynamic PL, and just spawn more of your apps. Quite likely, your architecture will become better for it.

Sunday, May 9, 2010

No New Ideas, Please

Nothing bothers me more than a flawed language feature for which there's a well-known, working solution in another language. Really, I think PL design is so hard that if you design something new (like some abstraction or concept), there's a 98% chance that it will suck. Hard. So hard.

So, one of my goals with my new Lisp is to faithfully reuse large, modular parts of functionality developed by others, and proven in the field. Preferably for decades. In a way, this changes PL design from invention to curation, and I'm very happy with it.

Here's what Ell is made of:
  • evaluation and expansion: R6RS Scheme (but as a Lisp-2)
  • control flow: Common Lisp
  • micromodule system: Chez Scheme
  • compilation and phase separation: PLT Scheme
  • macros: SRFI 72
  • object system: proper subset of CLOS
  • inline C: Goo
  • condition system: Dylan
  • dynamic variables: ISLISP
All of these parts are copied in what I hope is a faithful way, one that exactly preserves the original design, or a subset of it. That's the goal, at least. Of course there's some need for editing these features so that they fit well together, but that shouldn't change the original intent and abstractions.

Like Peter van Roy, I think that programming-in-the-small is a solved problem. Really, innovating in language design microfeatures, like say, some new way to pass parameters, is so boring, and will with great likelihood make the language landscape even more trite than it is now. Much better to reuse something that already exists, and adapt it faithfully.

All of this applies only, of course, if you're building a language that follows some common plan, like say "object-oriented dynamic language". In that case, please reuse. If you're doing something totally new and/or crazy (like, say, a functional reactive PL), you're of course free to invent what you want.

Ell Kernel Language

I'm nearing completion of the kernel language of my new Lisp, the Executable and Linkable Lisp, ell(1).

It's basically what I want from a modern Lisp, as simple as possible, but not simpler. The language will be translated to straightforward C (or C++, don't know yet), and enable true Lisp dynamicity and very deep Unix integration.

The major goal of the language is to make programming delightful. Of the "right" kinds of programs that is, since a language can't be delightful for everything.

Dave Moon's Programming Language for Old Timers is also proving a great inspiration. Am I an old-timer already? Dunno.

Specifically, Moon gives a no-nonsense description of hygiene, something which the Schemers have slight trouble with, as they seem to be a bit high on their own kleverness.

Thursday, May 6, 2010

My Next Lisp

Unix won.

The next Lisp has to rename all occurrences of foreign (and even worse, alien) to native. This alone will be a major step forward for Lisp. And thus, humanity. The next Lisp should also use all abstractions directly as they appear in POSIX: file descriptors, sockets, etc.

Ruby won.

Of course, Ruby does it wrong, but at least it's trying hard. And I think Ruby's semi-stateful, semi-functional programming model is the right way forward. For most internet apps, that is. We'll be programming crash-only (err...), event-driven servers most of the time anyway. For that kind of programming, Lisp is just right.

The original "LISP genotype" is junk DNA.

McCarthy's original LISP, you know, the one built from CONS, CAR, CDR, QUOTE, etc., has nothing interesting to tell us today. Clinging to or idolizing this model is the sign of a misunderstanding. Modern Lisps have nothing to do with this (flawed) historical model.

The cons is a relic.

I repeat, relic.

Think of the experts.

Instead of accommodating n00bs, the next Lisp should follow CL's model and be a language for experts, one that you grow into over the years.

Objects won.

For the type of programming done usually in Lisp, Smalltalk-style object-orientation is the jolly good icing on the cake. While that programming is arguably stone-age compared to Haskell, there's no need for Haskell envy. Haskell hasn't yet developed to the point where Haskell is the world's finest Lisp.

More: Steps towards an Acceptable Lisp.

Next Lisps

I'm a sucker for articles mentioning "the next Lisp". I just read Mark Tarver's, and I thought I'd jot down some quick notes. Not to be taken too seriously, but still.

Quotes are from The Next Lisp: Back to the Future.

Recursion as the primary means of expressing procedure calling.

Recursion is highly overrated. When I call MAP, I don't care whether that's implemented using recursion or gotos. Also, recursion will blow the stack. :P And for some good old appeal to authority: Guy Steele says that we have to get rid of the accumulator idiom, and I heartily agree.

Heterogeneous lists as the way of composing and dissecting complex data.

I'd much rather use real data structures, like java.util's List, Set, Map, and TreeMap. They seem to work great for about 99% of my data structure needs. The cons is a relic.

Programs as data.

Code is more than data.

The drive was to make [Common Lisp] compatible with the previous dialects and the easiest way was to union these features together.

Agreed. CL's kernel could be a bit smaller, without losing any cool.

The Common Lisp community has tried to deal with this through libraries. But these libraries are unsupported, and either undocumented or poorly so.

Agreed. The next Lisp has to attach itself to another platform. I root for GNU/Linux. Using the JVM means mixing business and pleasure way too much.

It’s a great language but it doesn’t look great.

I don't think Lisp can be made to look great, and I don't think it matters. Clojure seems to try this, and it's even more butt-ugly than CL.

For example, nearly every other newbie who hits CL for the first time complains about the brackets. And when they do they are told to shut up because they don't understand Lisp.

Until we have generalized 2D hypercode, the parentheses are the only sane, rational thing to do, in the ugly reality of a universe built on plain-text files.

Python is not new; it's just a sugaring on the Lisp genotype in favour of a syntax that people feel comfortable with. And look how far that idea went.

%#$^#@$@$^#$%#@^@^!!!???

IMO, Python is farther away from the Lisp genotype than Java. At least Java has post-1980's scoping rules.

Also, except for popularity, Python didn't go anywhere as a language.

Tuesday, May 4, 2010

Doing the Right Thing

As I've said before, the great thing about Lisp is its age. Like a good wine, Lisp has had a long time to age.

I think one of the best examples of this ripeness is R6RS' expansion process. It's a very good specification of a quite complex process, namely the processing of a number of expressions by the compiler.

The complexity stems from the fact that Lisp is at its core a multi-phase language: source code contains runtime values, such as variables and functions, as well as compile-time values, such as macros.

In a Lisp source file, one expression may be for execution at runtime, while the next is for execution at compile-time.

The intermingling of these different phases in a single source file seems to be somewhat of a historical accident, as one day we'll all be programming in a hypercode environment, but well, that's how it is.

Anyway, if you're writing a Lisp, study R6RS' expansion process and enjoy! :)

Understanding Hygiene (part 1)

Because macroexpansion is performed before compilation, the compiler receives only core language expressions with known semantics (LAMBDA, DEFINE, SET, IF, ...).

What's important though is that the expressions received by the compiler contain a mixture of user- and macro-written code. For example, given the following hygienic OR macro:
(define-syntax or (form1 form2)
  `(let ((tmp ,form1))
     (if tmp
         tmp
         ,form2)))
The code
(or foo bar)
is translated to:
(let ((tmp foo))
  (if tmp
      tmp
      bar))
The blue parts (FOO and BAR) are user-written; the white parts (the LET/IF skeleton and TMP) are macro-written.

The important thing is that parts with different colors do not interact.

So even if we use the variable name TMP, the code still works as expected because a white TMP doesn't interact with a blue TMP.
(or foo tmp)
is translated to:
(let ((tmp foo))  ; macro-written (white) TMP, bound to the user's FOO
  (if tmp
      tmp
      tmp))       ; this last TMP is the user-written (blue) one
Without the colors, the code would always return the value of FOO.

This is one part of the equation: the macro-written white TMP doesn't shadow the user-written blue TMP.

The other part is that user-written code mustn't shadow macro-written variables. If you have a (stupid) macro:
(define-syntax my-list (&rest args)
  `(list ,@args))
and a user uses it thusly:
(let ((list 12))
  (my-list 1 2 3))
This expands to:
(let ((list 12))   ; user-written (blue) LIST binding
  (list 1 2 3))    ; macro-written (white) reference to the LIST function
Again, this causes no problem, because different colors don't interact.

Every time you create a piece of code with quasiquote, the system generates a new, unique color.