Wednesday, March 17, 2010

Polyvariadic PrimeFib problem

begin Numbers.lhs

Some time ago, I claimed in a reddit comment, that it's possible to write a function that accepts only prime number of arguments, that are all strings, but occurrences of integers at the indices that are Fibonacci numbers. Sounds cool and a little bit retarded, doesn't it? kalven asked about it, so now I have to prove, that I wasn't bluffing.

But why would I even say such a thing? Well, it started with this post about polyvariadic functions in Haskell. The author claimed, that most modern languages have support for functions taking variable number of arguments. Let's take a look at Java: this "support" consists of hidden array creation and passing that array as a single argument. Other languages work in a similar way, with the exception of C, where everything is allocated on the stack, and it works thanks to grit, spit and a whole lotta duct tape. This approach isn't as glamorous as it sounds, and there are disadvantages. All arguments have to be of the same type, even in dynamically typed languages - they only have one universal type. To use arguments that have different types, they have to be upcasted at the call site (e.g. to Object in Java) and later downcasted in such vararg function. It is unsafe and forces all arguments to have the same size or use autoboxing. In Haskell, thanks to better syntax and boiler-plat free list creation, it's easy to pass many arguments to a function, just pass a list. Compare:

Apologies for all those underlined words, it seems that my blog software doesn't like certain bad words.

class Test {

public static void main(String[] args) {
int x = foo(1, 2, 3, 4);
System.out.println(x);
}

static int foo(Object... args) {
return args.length;
}
}

With this:

main = do {
x <- foo[1, 2, 3, 4];
print x;
}

foo args = do {
return (length args);
}

The kind of parens is the only difference in syntax or semantics (of vararg function call). Of course, being of the same type means something different in Java (sub-typing), than in Haskell, but that's not the point of this post. The point is, that Haskell is the only usable language*, that truly allows polyvariadic functions, because it's possible to write functions that take variable number of variably typed arguments, which is not possible to fake with a hidden list wrapping. As usual, now is the time to write as usual, Oleg already did it. Some polyvariadic functions are easy to implement (e.g. one that takes even number of arguments, with alternating integer and string arguments, do try to implement it), some could actually be useful.

* usable language is one, that makes it possible to download pics from the internet.

Others are just a useless, contrived example, but can prove that anything is possible in Haskell. The problem with the function of prime number of arguments, where Fibonacci indices are integers, while the rest are strings has been solved by cdsmith. Nevertheless, I want to present another two solutions - one is easier to implement, but full of type hacks, and the other needed some thinking specific to this problem, but works much better - no boiler-plate required. Solutions are independent, so don't get discouraged by the first one, you can still be able to enjoy the second.

This post, like the previous one, needs ghc-6.12.2, 6.10 or 6.12.1 with hacked HList library. It's splitted across 4 files for modular reasons, the code is available at the blog repo.

Both solutions rely on prime and Fibonacci numbers, so we need type predicates for deciding if the number is prime/fib. I'm lazy and I didn't feel like translating regular algorithms to the type level, so I've assumed, that there are no prime/fib numbers bigger then 20, that way, it's possible to enumerate the positives, and choose all the other numbers as negatives.

Both predicates are similar, type-level function, that results in HTrue/HFalse, method has default bottom definition, because it will be only used at the type-level. Implementation uses classic trick, but as usual, I prefer equality constraints to TypeCasts wherever possible.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , TypeFamilies
> , UndecidableInstances
> , TypeSynonymInstances
> , OverlappingInstances
> #-}

> module Numbers where
> import Data.HList
> type S n = HSucc n
> type Z = HZero
> class IsPrime n result | n -> result where
> isPrime :: n -> result
> isPrime = undefined
> instance IsPrime(S(S Z)) HTrue
> instance IsPrime(S(S(S Z))) HTrue
> instance IsPrime(S(S(S(S(S Z))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S Z))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S Z))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))))))))) HTrue
> instance result ~ HFalse => IsPrime a result
> class IsFib n result | n -> result where
> isFib :: n -> result
> isFib = undefined
> instance IsFib(Z) HTrue
> instance IsFib(S(Z)) HTrue
> instance IsFib(S(S Z)) HTrue
> instance IsFib(S(S(S Z))) HTrue
> instance IsFib(S(S(S(S(S Z))))) HTrue
> instance IsFib(S(S(S(S(S(S(S(S Z)))))))) HTrue
> instance IsFib(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))) HTrue
> instance result ~ HFalse => IsFib n result

begin HListExtras.lhs

Before we get to the first solution, we need two list functions, that are missing from HList library. hPartition and hAll are type-level equivalents of partition and all functions from Data.List module. The implementation follows closely the style of the other, higher-order functions from HList, so there aren't any explanations, you can always read the excellent HList paper. If you want more explanations, please ask in comments/email/irc, I could explain it some more if there's a need.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , UndecidableInstances
> #-}

> module HListExtras where
> import Data.HList
> class HPartition p l r1 r2 | p l -> r1 r2 where
> hPartition :: p -> l -> (r1,r2)
> class HPartition' flag p l r1 r2 | flag p l -> r1 r2 where
> hPartition' :: flag -> p -> l -> (r1,r2)
> instance HPartition p HNil HNil HNil where
> hPartition _ _ = (HNil, HNil)
> instance ( Apply p x flag
> , HPartition' flag p (HCons x xs) r1 r2
> ) => HPartition p (HCons x xs) r1 r2 where
> hPartition p l@(HCons x _) = hPartition' (apply p x) p l
> instance HPartition p xs r1 r2 => HPartition' HTrue p (HCons x xs) (HCons x r1) r2 where
> hPartition' _ p (HCons x xs) = let (r1, r2) = hPartition p xs
> in (HCons x r1, r2)
> instance HPartition p xs r1 r2 => HPartition' HFalse p (HCons x xs) r1 (HCons x r2) where
> hPartition' _ p (HCons x xs) = let (r1, r2) = hPartition p xs
> in (r1, HCons x r2)
> class HAll p l r | p l -> r where
> hAll :: p -> l -> r
> instance HAll p HNil HTrue where
> hAll _ HNil = hTrue
> instance (Apply p x b1, HAll p xs b2, HAnd b1 b2 b) => HAll p (HCons x xs) b where
> hAll p (HCons x xs) = apply p x `hAnd` hAll p xs

begin Solution1.lhs

Before we can write a function that meets the specification, we have to decide what this function should actually do:) It's funny, if a type-blind person would read the problem, the only conclusion would be, that it's possible to write a function in Haskell - truly amazing. The easiest thing to do (beside ignoring arguments and returning unit value), is to return a pair of lists, one will collect integers, and the other strings.

So what is the first solution about? It's based on a simple idea, we start with a function that accepts any number of integer and string arguments, in any configuration, and then we write a constrained wrapper.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , UndecidableInstances
> , TypeFamilies
> , TypeSynonymInstances
> , NoMonomorphismRestriction
> , OverlappingInstances
> #-}

> module Solution1 where
> import Data.HList hiding (TypeEq, typeEq)
> import Data.HList.TypeEqGeneric2
> import HListExtras
> import Numbers
> type X = ([Int], [String])

Function foo is defined inductively, very similar to c function from introduction of Oleg's polyvariadic page, nothing fancy here:

> class Foo result where
> foo :: X -> result
> instance Foo X where
> foo (ints,strings) = (reverse ints, reverse strings)
> instance Foo r => Foo (Int -> r) where
> foo (ints,strings) = \k -> foo (k:ints,strings)
> instance Foo r => Foo (String -> r) where
> foo (ints,strings) = \s -> foo (ints,s:strings)
> testFoo = foo ([],[]) (1::Int) "2" "3" (4::Int)

We have to specify Int types of every argument, because otherwise it's just a type variable. Strings are already monomorphic. When we decide that foo has enough, we force it to be of final X type.

*Solution1> :t testFoo
testFoo :: (Foo t) => t
*Solution1> testFoo :: X
([1,4],["2","3"])

Now, if we could get a hold of the final type of particular usage of foo function, extract from that the list of argument types, it would be very easy to verify correct usage.

Turning type of function into a heterogeneous list of types (values are bottoms) of arguments is performed by the following, type-level function. It's very similar to ResultType from my previous post, but it returns every type but the last one, which is the type of the result. It relies on the same trick to overcome problems with functional dependencies and overlapping instances.

> class FunctionArguments f r | f -> r where
> functionArguments :: f -> r

Base instance, if the type is not a function, return empty list, we don't care about result type:

> instance result ~ HNil => FunctionArguments x result where
> functionArguments _ = HNil

Recursive case - if argument is of arrow type, put the argument into list head and recurse down the rest of the type. Value of this type is of course bottom, it's not possible to come up with anything meaningful, but it's not needed fortunately, we only care about the types. Similarly, the rest of the type is a result of applying dummy bottom value to our input function.

> instance ( FunctionArguments rest others
> , result ~ (HCons x others)
> ) => FunctionArguments (x -> rest) result where
> functionArguments f = HCons undefined $ functionArguments $ f undefined

In Haskell it's easy to get a hold of value of the result type, and use it inside its definition, thanks to the following trick:

primeFib = let r = foo ([],[])
_ = ....
in r

In that dotted block, we can operate on the final value. The problem with let-expressions is, that they are polymorphic, and this together with other very polymorphic, type-level code can result in ambiguities, that can be resolved only with explicit type signatures and usage of ScopedTypeVariables. But, I don't want to write down that type, it's very complicated (over 10 lines!), because it records the whole algorithm. Fortunately, we can choose the case-expression, which has monomorphic bindings, and let ghc figure out the type:

> primeFib = case foo ([],[]) of
> r -> let _ = checkList $ functionArguments r
> in r

It's easy to see, that this block doesn't do anything, but type-checker still takes it under consideration, so it's possible for it to constrain the resulting type. The only thing left, is to define checkList function, that will do all those calculations.

There's one more type-level (defined in terms of classes) piece of code needed: equivalent of zip [1..] value-level function, that will pair every element of the list with its index, which will allow to check its primality and fibonality.

NumberList takes a counter, that represents the first free index number, list, and returns list of numbered pairs:

> class NumberList n l result | n l -> result where
> numberList' :: n -> l -> result

Base case is trivial, there's nothing to number, return empty list:

> instance NumberList n HNil HNil where
> numberList' _ _ = HNil

Inductive case, tag the head value of the input list with the current counter, and recurse down the list with incremented counter:

> instance (HNat n, NumberList (S n) xs ys) => NumberList n (HCons x xs) (HCons (n,x) ys) where
> numberList' n (HCons x xs) = HCons (n,x) $ numberList' (hSucc n) xs

Numbering starts with one:

> numberList = numberList' (hSucc hZero)

checkList function, like any code operating on a list in a functional language, will use higher-order functions, if you remember explanations from my previous post, this requires defunctionalization. There are 4 HOFs used, and their arguments have to be adjusted accordingly:

  1. map snd
> data Snd = Snd
> instance Apply Snd (a, b) b
  1. partition fibIndex partitions list of pairs, according to predicate that checks fibonality of the index.
> data FibIndex = FibIndex
> instance IsFib n r => Apply FibIndex (n,x) r
  1. (and 4.) there's all isInt and all isString, this could be implemented as two different functions, but it's possible to defunctionalize partial application of type-level equivalent of ==, TypeEq. This creates closures, that have to be reifed as data types in normal circumstances, but at the type level we use type parameter, and since we don't really care about value level in these predicates, we don't need any constructors, but I'm not a fan of EmptyDataDecls. This makes Is a phantom type.
> data Is x = Is
> instance TypeEq x y r => Apply (Is x) y r

I think, that checkList's definition doesn't need any explanations, we already have all the hard type-level parts defined, the rest is just functional programming:

> checkList l = intsOk `hAnd` stringsOk `hAnd` isPrime len :: HTrue
> where (fibIndices, nonFibIndices) = hPartition FibIndex $ numberList l
> removeIndices = hMap Snd
> isInt = Is :: Is Int
> isString = Is :: Is String
> intsOk = hAll isInt $ removeIndices fibIndices
> stringsOk = hAll isString $ removeIndices nonFibIndices
> len = hLength l
> testPrimeFib1 = primeFib (1::Int) (2::Int) (3::Int) "4" (5::Int) -- correct
> testPrimeFib2 = primeFib (1::Int) "2" -- string at fib index
> testPrimeFib3 = primeFib (1::Int) (2::Int) (3::Int) "4" -- not prime number of args
*Solution1> testPrimeFib1 :: X
([1,2,3,5],["4"])
*Solution1> testPrimeFib2 :: X
Top level:
Couldn't match expected type `HTrue' against inferred type `HFalse'
When using functional dependencies to combine
HAnd HFalse HTrue HFalse,
(...)
*Solution1> testPrimeFib3 :: X
Top level:
Couldn't match expected type `HTrue' against inferred type `HFalse'

*Solution1> :t testPrimeFib1
testPrimeFib1
:: (HMap Snd r2 ys2,
HMap Snd r11 ys1,
HAnd r r1 t'',
IsPrime (HSucc (HSucc (HSucc (HSucc (HSucc n))))) result,
HAnd t'' result HTrue,
HAnd HTrue b23 r1,
HAll (Is String) ys2 b23,
HAnd HTrue b2 r,
HAnd HTrue b21 b2,
HAnd HTrue b22 b21,
HAll (Is Int) ys1 b211,
HAnd HTrue b211 b22,
HPartition FibIndex ys r11 r2,
NumberList
(S (HSucc (HSucc (HSucc (HSucc (HSucc HZero)))))) others ys,
HLength others n,
FunctionArguments t others,
Foo t) =>
t

It works. Pros of this approach:

  • easy implementation:
    • many things available in HList
    • others (like FunctionArguments) are very general and are usable with other, non variadic problems
    • problem boils down to list manipulation
  • it scales to other contrived examples - only checkList has to be modified

Cons:

  • a lot of boiler-plate at call-site
  • huge types of intermediate expressions
  • unreadable type errors

begin Solution2.lhs

This time, we'll analyze this specific problem, and come up with much better and easier solution. What's the first problem with primeFib from my Solution1, and test from cdsmith's solution? Asking ghci about the type of both those functions results in x - simple type variable. Sure, there's also entire algorithm hidden in the type context, that could hide in the wardrobe and scare little children at night. But, we could tell more about that type, because we understand the problem. The function has to take only prime number of arguments, and the smallest prime number is 2. 1 and 2 are Fibonacci numbers, so clearly the type of bare primeFib function should be Int -> Int -> r, that much we can tell without choosing what to do later.

The second solution will be more aggressive, instead of checking the type at the end, when the only possible outcome is a type error, this approach will try to generate the correct type. It won't use any type-level tricks, in fact there will be three classes without functional dependencies - I almost forgot how to use those.

Here's how the algorithm will work. Let's assume that we are in the middle of computation and our functions has the following type:

primeFib :: a1 -> ... -> an -> r

and we want to generate the r part. The first thing to check, is to decide if n is a prime number. If so, we have two choices (let's call this situation overlap), we can either stop and return the accumulator (did I mention, that we are carrying an accumulator?), because we already have accepted prime number of arguments, or we can generate a bigger solution (this boils down to the other primality branch). If n isn't prime, it's clear that we must accept more arguments (this situation will be called generate), so r will be a function type an+1 -> r'. Now we have to decide the type of an+1: if n+1 is a Fibonacci number, an+1 will be Int, otherwise it is String. What about r' ? Well, we repeat the process with the successor of n.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , FlexibleContexts
> , UndecidableInstances
> , TypeSynonymInstances
> , NoMonomorphismRestriction
> , OverlappingInstances
> #-}

> import Data.HList
> import Numbers
> type X = ([Int], [String])

Foo will be the entry point to the algorithm. n is our counter - how many arguments have been already eaten. foo method has to carry our accumulator. Notice that there are no functional dependencies - we want it to have many instances (as many as there are prime numbers, which is a finite number, when there are no primes bigger than 20 :)

> class Foo n result where
> foo :: n -> X -> result

Foo has only one instance, which means that it's a class synonym, and it's possible to write it as a function. I prefer a class here, because it will be used in two places, function approach would mean writing down twice this synonym, class approach can use type simplifier to do this rewriting.

This instance calls the next, auxiliary class, with an additional knowledge of the primality of n. It will be calculated as soon as n is known, because IsPrime is a type-level function.

> instance (IsPrime n flag, Foo' flag n result) => Foo n result where
> foo n acc = foo' (isPrime n) n acc

Foo' class can now dispatch on the primality of the counter (flag argument):

> class Foo' flag n result where
> foo' :: flag -> n -> X -> result

So, n isn't prime, we forward to the Generate class, but we also provide Generate class the result of IsFib on the successor of n, so it can dispatch on that fact.

> instance (HNat n, IsFib (HSucc n) flag, Generate flag n result) => Foo' HFalse n result where
> foo' _ n acc = generate (isFib $ hSucc n) n acc

Generate class has a functional dependency, if we know that n+1 is a Fibonacci number, we know which instance to choose - thus we know part or the resulting type.

> class Generate flag n result | flag -> result where
> generate :: flag -> n -> X -> result

n+1 isn't a Fibonacci number, generate a function from String and continue the recursive process by calling Foo with a bigger counter. At the value level, return a function that will insert eaten argument to the accumulator.

> instance (HNat n, Foo (S n) result) => Generate HFalse n (String -> result) where
> generate _ n (ints,strings) = \s -> foo (hSucc n) (ints, s:strings)

The other case, similar to the previous one, but for integers:

> instance (HNat n, Foo (S n) result) => Generate HTrue n (Int -> result) where
> generate _ n (ints,strings) = \k -> foo (hSucc n) (k:ints, strings)

Now the other branch of our first if, if counter was a prime number, go to the Overlap situation:

> instance Overlap n result => Foo' HTrue n result where
> foo' _ n acc = overlap n acc

This class has two overlapping instances - this is what stops the process, because it cannot decide which one to choose without further knowledge.

> class Overlap n result where
> overlap :: n -> X -> result

Final case - our result type has been forced to X, return reversed accumulators:

> instance Overlap n X where
> overlap _ (ints, strings) = (reverse ints, reverse strings)

Otherwise, go to the Overlap case. There's no need to decide if n+1 is a Fibonacci number, we can reuse Foo' HFalse instance, which does just that:

> instance (Foo' HFalse n result) => Overlap n result where
> overlap n acc = foo' hFalse n acc

Obviously, we start with empty accumulators and counter value of zero - we haven't accepted any arguments yet.

> primeFib = foo hZero ([],[])

This solution always generates the best approximation of the final type:

*Main> :t primeFib
primeFib :: (Overlap (HSucc (HSucc HZero)) result) => Int -> Int -> result
*Main> :t \a b c d -> primeFib a b c d
\a b c d -> primeFib a b c d
:: (Overlap (HSucc (HSucc (HSucc (HSucc (HSucc HZero))))) result) =>
Int -> Int -> Int -> String -> Int -> result

This results in less boiler-plate required at the call-site:

> testPrimeFib1 = primeFib 1 2 3 "4" 5
*Main> :t testPrimeFib1
testPrimeFib1 :: (Overlap (HSucc (HSucc (HSucc (HSucc (HSucc HZero))))) result) => result
*Main> testPrimeFib1 :: X
([1,2,3,5],["4"])

And better error messages:

*Main> primeFib "1"
:1:9:
Couldn't match expected type `Int' against inferred type `[Char]'
In the first argument of `primeFib', namely `"1"'

*Main> primeFib 1 2 3 "4" :: X
Top level:
Couldn't match expected type `([Int], [String])'
against inferred type `Int -> result'

The only disadvantage of this approach is the fact, that it required thinking specific to this problem, it could be much harder (impossible?) in different, contrived problems.

That is all, thanks for reading, comments are welcome.

1 comment: