serious stuff
www.cubbi.com
personal
programming
fibonacci numbers
forth examples
postscript examples
muf examples
joy examples
j examples
scheme examples
hope examples
ocaml examples
haskell examples
prolog examples
c++ examples
java examples
assembly language examples
fortran examples
c examples
sh examples
awk examples
perl examples
tcl examples
asmix
hacker test
resume
science
martial arts
fun stuff
www.cubbi.org

Fibonacci numbers are defined as follows
F(0) = 0
F(1) = 1
F(n) = F(n1) + F(n2)
for negative n: F(n) = (1)^{n+1}F(n)
Note that examples and benchmarks made before september 2009 on this page use
the older, less general, definition, with F(0) = 1.
September 2009 UPDATE:
UPDATE IN PROGRESS: Five years since I wrote this page, I've decided to rewrite
all examples to be more useful, the subroutines will be as general as
possible (no more 2x2 matrix multiplication), all code will be commented,
each example will now take its arguments from command line (where possible)
April 2014 UPDATE:
This effort was abandoned somewhere in 2010.
ALGORITHMS AND BENCHMARKS SUMMARY
1. EXPONENTIAL COMPLEXITY 
1A  naive binary recursion execution time: T(n)=t_{o}*φ^{n} 
asm java C 
t_{o}=2.8 ns t_{o}=3.0 ns t_{o}=3.1 ns 
2. LINEAR COMPLEXITY 
1B  cached binary recursion / memoization 
pending 2009/2010 update 
2A  cached linear recursion / infinite lazyevaluated list 
pending 2009/2010 update 
2B  linear recursion with accumulator 
pending 2009/2010 update 
2C  imperative loop with mutable variables 
pending 2009/2010 update 
3. LOGARITHMIC COMPLEXITY 
3A  matrix multiplication 
pending 2009/2010 update 
3B  fast recursion 
pending 2009/2010 update 
3C  Binet's formula with rounding 
pending 2009/2010 update 
EXAMPLES
ALGORITHMS AND BENCHMARKS IN DETAIL
Please remember that these benchmarks do not show strength or weakness of any
programming language. All they show is how efficient specific language
translators are at implementing the featured algorithms. It is currently
(as of 2009) in the transitional stage from OLD to NEW benchmarks using modern
hardware and better written examples.
Benchmarked on Intel Pentium Core i7 2.66GHz running Linux 2.6.31 with 12GB RAM, in x86_64 mode
OLD benchmarks: Intel Pentium 4 2.20GHz running Linux 2.5.75 with 640M RAM
NEW RAW BENCHMARK DATA available if you want to see them, as well as
OLD RAW BENCHMARK DATA.
ALGORITHM 1A: NAIVE BINARY RECURSION 
 If n equals 0, return 0
 If n equals 1, return 1
 Recursively calculate F(n1)
 Recursively calculate F(n2)
 Return the sum of the results from steps 2 and 3
Complexity (fixed precision): O(exp n) speed, O(exp n) space
The number of additions grows exponentially with n, and each recursive call
requires memorization of state (since neither call is tailrecursive), which
makes space complexity also exponential.
This algorithm becomes uselessly slow for F(n) with n>50 and all measured
results easily fit in 64bit integer data type. With the duration of
the integer arithmetic operations independent of n, the time to execute these
programs is almost perfectly t_{0}φ^{n},
which is why the plots of the logarithm of execution time vs. n are
parallel straight lines.
This algorithm was implemented directly in all languages featured on this web
page, since all of them allow recursion. Only Fortran did not allow recursion
in the past, but it is supported since the 1990 standard.
Curiously, Java outperforms C++ on this test, but it's one of the very few
artifical tests
where it does so

Language  Translator [*]  T_{0}  T(46), seconds  
Assembly 
GNU as 2.19.1 [c] 
2.8 ns 
11.63 
log(Execution time, seconds) vs. Fibonacci number calculated

Java 
Sun Java SE 1.6.0_16b01 / HotSpot 64Bit VM 14.2b01 [b] 
3.0 ns 
12.34 
C 
GNU gcc 4.4.1 [c] 
3.1 ns 
12.57 
C++ 
GNU g++ 4.4.1 [c] 
3.1 ns 
12.61 
OCaml 
INRIA OCaml 3.11.1 [c] 
3.4 ns 
13.98 
Fortran 
GNU gfortran 4.4.1 [i] 
4.3 ns 
17.62 
Forth 
GNU gforth 0.7.0 [i] 
15.3 ns 
61.91 
Scheme 
MIT Scheme 7.7.90 20090107 [b] 
22.6 ns 
98.04 
Haskell 
GHC 6.8.2 [c] 
31.3 ns 
129.1 
Joy 
Joy1 by Manfred von Thun, 07May03 [i] 
199 ns 
813.5 
PostScript 
GPL Ghostscript 8.70 [i] 
284 ns 
1147 
awk 
GNU Awk 3.1.7 [i] 
417 ns 
1729 
MUF 
FuzzBall MUCK 6.09 [b] 
716 ns 
2939 
perl 
perl 5.8.8 [i] 
755 ns 
3101 
J 
Jsoftware J64 6.02 [i] 
1.08 μs 
4386 
Hope 
Hope by Ross Paterson, 08Dec03 [i] 
1.52 μs 
6035 
Prolog 
SWI Prolog 5.6.64 [b] 
2.17 μs 
8900 
Tcl 
Tcl 8.5.7 [i] 
3.37 μs 
13700 
sh 
bash 4.0.33(2) [i] 
101 μs 
415000 




[*]  [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1) except for MUF, where an internal time function was called from the program.
T(46) was measured directly only when it was less than 2000 seconds, otherwise it was extrapolated from execution times for smaller inputs.
In all cases I tried to pick the fastest one of the language translators that are readily available and
offer unlimited precision integer arithmetics (it becomes important in further tests).
ALGORITHM 1B: CACHED BINARY RECURSION / MEMOIZATION 
 If n equals 0, return 0
 If n equals 1, return 1
 Read F(n1) from cache or recursively calculate if not cached yet
 Read F(n2) from cache or recursively calculate if not cached yet
 Return the sum of the results from steps 2 and 3 and save it in the cache
Complexity (unlimited precision): O(n^{2}) speed, O(exp n) space
The common optimization technique applied to binary recursions is memoization:
for any given set of arguments, the recursive function is calculated only once,
and the result is stored in memory. Further calls to the function with the same
arguments return the previously calculated result.
Since no function result is calculated twice, the number of additions
required to calculate F(n) is reduced to n. The number of cache lookups
is instead exponential but, assuming it is implemented as a suitable
randomaccess data structure (array or hash table), the time of each
cache lookup is O(1).
Space requirements for this algorithm are still exponential because of
binary recursion.
In the programming languages that have builtin support for memoization
(such as J or Factor), algorithm 1A can be turned into 1B with just one
keyword. For other languages, I manually implement a suitably generic data
structure.

Language  Translator [*]  t_{0}  T(1,000,000), seconds  




sqrt(Execution time, seconds) vs. Fibonacci number calculated





[*]  [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
ALGORITHM 2A: CACHED LINEAR RECURSION 
When previously calculated Fibonacci numbers are stored in a data structure,
there is no need for binary recursion, the entire structure can be filled
with only n iterations.
 infinite lazy evaluated list.
 Construct an infinite list F with F[0] equal 0 and F[1] equal 1
 Define F[i] as the sum of F[n1] and F[n2].
 Return the nth element of the list
In the languages that support lazy evaluation, either implicitly (like Hope
or Haskell) or explicitly (like OCaml or Scheme), it is possible to
construct the data structure containing all Fibonacci numbers and simply
return the nth element of the structure.
 randomaccess container.
 Create a container F with the elements F[0] equal 0 and F[1] equal 1
 Recursively (or iteratively) calculate the elements F[2..n]
according to the formula F[i]=F[i1]+F[i2]
 Return F[n]
In the languages that only support eager evaluation, the entire data
structure must be filled with an explicit loop (or a tailrecursive function).
If the container permits random access with O(1) complexity, the total
execution complexity is O(n), assuming fixed precision. For large precision
numbers, the complexity of addition grows with n as well.
 Complexity (fixed precision): O(n) speed, O(n) space
Complexity (unlimited precision): O(n^{2}) speed, O(n^{2}) space.

Language  Translator [*]  t_{0}  T(1,000,000), seconds  




sqrt(Execution time, seconds) vs. Fibonacci number calculated





[*]  [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
ALGORITHM 2B: LINEAR RECURSION WITH ACCUMULATOR 
 Define a recursive function λ(n_{1},n_{2},n) such as
 λ(n_{1},n_{2},n) returns 0, if n=0 and 1 if n=1
 λ(n_{1},n_{2},n) return the result of
λ(n_{2},n_{1}+n_{2},n1), if n>1
 Return the result of λ(0,1,n)
Complexity (fixed precision): O(n) speed, O(1) space
Complexity (unlimited precision): O(n^{2}) speed, O(n) space
These algorithms execute n additions to calculate F(n), and store the intermediate
results in the recursive function's additional arguments, which serve as
accumulators. The time to execute is, therefore n * time to execute
one addition and one recursive call. The addition of large
numbers if O(n), and recursive calls are typically constant time and never greater
than O(n). The time to execute the algorithm is T(n)=t_{0}n^{2}.
which is a straight line in the plot of sqrt(T) vs. n.
This recursive functions is easily written as tail recursion, which means
it does not contribute to space complexity, but space is needed to store
the accumulated sum, which grows as O(n). In the languages that do not support
tail recursion, such as C/C++, space grows as O(n^{2}), and the
algorithm quickly becomes useless. Such languages are more suited for 2C (iteration) anyway.

Language  Translator [*]  t_{0}  T(1,000,000), seconds  
Haskell 
GHC 6.8.2 [c] 
12.4 fs 
12.42 
sqrt(Execution time, seconds) vs. Fibonacci number calculated

Prolog 
SWI Prolog 5.7.15 [b] with GMP 4.3.1 
21.3 fs 
22.22 
OCaml 
INRIA OCaml 3.11.1 [c] 
32.2 fs 
32.14 
Java 
Sun Java SE 1.6.0_16b01 / HotSpot 64Bit VM 14.2b01 [b] 
34.1 fs 
31.27 
Scheme 
MIT Scheme 7.7.90 20090107 [b] 
48.8 fs 
47.88 
Forth 
GNU gforth 0.7.0 [i] 
 
 
C++ 
GNU g++ 4.4.1 [c] with GMP 4.3.1 
 
 
J 
Jsoftware J64 6.02 [i] 
 
 




[*]  [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
J reports stack overflow when trying to calculate numbers over ~3000,
and I could not find a way to increase available stack size in jconsole
C++ linked with GMP consumes memory at a very large pace, exceeding 20Gb when calculating
500,000th number
Forth (Gforth linked with GMP) also consumes memory at a very large pace, running into tens
of gigabytes about the 500,000th number
ALGORITHM 2C: IMPERATIVE LOOP WITH MUTABLE VARIABLES 
 Assign value 0 to variable n_{1} and value 1 to variable n_{2}
 Assign the value of n_{1}+n_{2} to variable n_{2} and assign the old value of n_{2} to n_{1}
 Repeat step 2 n times.
 Return the value of variable n_{1}
Complexity (fixed precision): O(n) speed, O(1) space
Complexity (unlimited precision): O(n^{2}) speed, O(n) space
These algorithms execute n additions to calculate F(n), and store the intermediate
results in mutable variables. Time to execute is, therefore n * time to execute
one addition and two variable read/store operations. The addition of large
numbers if O(n), and store/retrieve is typically constant time and never greater
than O(n). The time to execute the algorithm is T(n)=t_{0}n^{2}.
which is a straight line in the plot of sqrt(T) vs. n.

Language  Translator [*]  t_{0}  T(1,000,000), seconds  
Forth 
GNU gforth 0.7.0 [i] with GMP 4.3.1 
11.1 fs 
11.13 
sqrt(Execution time, seconds) vs. Fibonacci number calculated

C++ 
GNU g++ 4.4.1 [c] with GMP 4.3.1 
23.2 fs 
22.62 
Java 
Sun Java SE 1.6.0_16b01 / HotSpot 64Bit VM 14.2b01 [b] 
31.6 fs 
29.52 
OCaml 
INRIA OCaml 3.11.1 [c] 
35.9 fs 
34.52 
Scheme 
MIT Scheme 7.7.90 20090107 [b] 
50.8 fs 
48.46 
J 
Jsoftware J64 6.02 [i] 
413 fs 
403.6 
Prolog 
SWI Prolog 5.7.15 [b] with GMP 4.3.1 
73.8 ns 
 
Haskell 
GHC 6.8.2 [c] 
 
 
Hope 
Hope by Ross Paterson, 08Dec03 [i] 
 
 




[*]  [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
Prolog usually does not have mutable variables, but some implementations do.
Haskell does not have the concept of mutable variables.
Hope does not have the concept of mutable variables.
ALGORITHM 3A: MATRIX MULTIPLICATION 
This is the most mathematically elegant way to calculate a Fibonacci number.
Complexity (fixed precision): O(log n) speed, O(1) space
Complexity (unlimited precision): O(n log^{2} n log log n) speed, O(n) space
Raising a matrix to the power of n requires only O(log n) matrix
multiplications, if n is a natural number and the repeated squaring algorithm
is used. Space requirements for repeated squaring include the product
accumulator and the last calculated square, which makes it O(1) (in fixedprecision case)
unless nontailrecursive implementation is used.
In case of unlimited precision, execution time of each 2x2 matrix multiplication
is proportionate to the execution time of one integer multiplication,
which has the complexity of O(n log n log log n) and space requirements to store
one 2x2 matrix grow as O(n).
The repeated squaring algorithm for calculation of x^{n}:
 start with product accumulator a=1 and last square accumulator s=x
 if n is odd, increase the accumulated product, a=a*s
 calculate the next square of s, s=s^{2}
 divide n by 2, n = n div 2
 repeat until n=0, at which time the final value of a is the answer
In languages providing bit operations, this algorithm can examine the bits
in the binary representation of n, proceeding from leastsignificant
to mostsignificat bit, instead of repeating division by 2. The example in J
makes good use of this approach.

Language  Translator [*]  t_{0}  T(1,000,000), seconds  




sqrt(Execution time, seconds) vs. Fibonacci number calculated





[*]  [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
ALGORITHM 3B: FAST RECURSION 
This algorithm can be derived either from from the fact that, from definition,
F(n+m)=F(n)F(m) + F(n1)F(m1), or by rewriting the matrix equation using the
fact that A^{n+m}=A^{n}A^{m}:
This algorithm is faster than than 3A because it does not repeat calculations
of the same numbers (top right and bottom left corners of the matrix in 3A),
but it cannot be easily implemented as a tailrecursive function, resulting
in O(log n) space complexity.
I implement it this way:
 Define a recursive function λ(n) such as
 λ(n) returns a pair of numbers (0,0) if n equals 0
 λ(n) returns a pair of numbers (1,0) if n equals 1
 recursively calculate λ(n div 2) (integer division,
(n1)/2 for odd n) to obtain a pair of numbers (k_{2},k_{1})
 for even n, return a pair of numbers
(2k_{1}+k_{2})*k_{2},
k_{1}^{2}+k_{2}^{2})
 for odd n, return a pair of numbers
((k_{1}+k_{2})^{2} ,
(2k_{1}+k_{2})*k_{2})
 The end result is the first element of the result of λ(n)
Complexity (fixed precision): O(log n) speed, O(log n) space
Complexity (unlimited precision): O(n log^{2} n log log n) speed, O(n log n) space

Language  Translator [*]  t_{0}  T(1,000,000), seconds  




sqrt(Execution time, seconds) vs. Fibonacci number calculated





[*]  [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
ALGORITHM 3C: BINET'S FORMULA WITH ROUNDING 
Eigendecomposition of the matrix equation from 1A gives a formula,
published by Abraham de Moivre in 1730, Euler in 1765, and by Jacques Philippe
Marie Binet in 1843
Where φ is the golden ratio, (1+sqrt(5))/2.
Because (1phi^{n}) / sqrt(5) is less than 1/2 for all positive n, this
formula can be simplified to nearest integer to φ^{n}/sqrt(5) or,
using the floor function,
Complexity (fixed precision): O(log n) speed, O(1) space
Complexity (unlimited precision): O(n log^{2} n log log n) speed, O(n) space
Evaluation of this formula, just as evaluation of the formula in 3A, is thus
reduced to taking an integer power of a number, only this number is irrational
and must be represented as a floatingpoint value with sufficient precision to
produce the correct fibonacci number. Not many languages support arbitrary
precision floating point values, I use GNU MP library where possible.
The computational and space complexity of this algorithm are the same as 1A,
and are the same as the complexity of the repeated squaring algorithm with one
addition: time to calculate the square root of five with sufficient precision.
Where not provided by the compiler/library, I iteratively calculate inverse
square root via y(n+1) = 0.5*y(n)*[3  a*y(n)^2] because this recurrent
expression has no division.

Language  Translator [*]  t_{0}  T(1,000,000), seconds  




sqrt(Execution time, seconds) vs. Fibonacci number calculated





[*]  [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
LINKS
Wolfram's MathWorld entry
OEIS entry
