HANA Project

Summary

Inspired by .NET, we propose and initiate a new system of library code for science. We emphasize that class names in the library should use the same terminology found in our textbooks. The project adopts C#, the language developed by Microsoft. We suggest that all programmers keep two idioms in mind: Divide and conquer, and Handle all and all. As a first step, we present library code for several subjects. We clearly distinguish between the model-independent and model-dependent parts, and we use familiar terms such as LinearEquation, Integration, Mass, Acceleration, Force, and NewtonEquation as class names.

Code

Science

Prologue

In February 2002, Microsoft launched a new programming language named C#, which is strongly object-oriented. C# is said to have three characteristic features: 1. it is as elegant as Java, 2. it is as powerful as C++, and 3. it is as productive as Visual Basic. Although it is still an open question whether C# is useful for scientific computing in academia, we adopt it here because of its strongly object-oriented character.

Reusability is always one of the main concerns in software, and scientific computing should take it just as seriously. We note many efforts to improve reusability. The web site managed by Troyer is remarkable. The library code for the density matrix renormalization group method invented by White is also well known. These libraries are written in C++. There are also many Fortran codes in netlib and CPC. There is a clear general trend of transition from non-object-oriented to object-oriented languages.

The main advantage of C# over C++ is the efficiency with which Windows applications can be written. Since most computer users are accustomed to the Windows operating system, user-friendly programs tend to be Windows applications. As long as end users need more user-friendly programs, C# can be useful. One encouraging point is that the Mono Project is developing a C# compiler for Linux. If the Mono Project succeeds, we expect C# to become more popular.

The purpose of this project is to propose a library system in C#, in preparation for the competition between C# and C++ that we expect to take place soon. At present, C++ is the more powerful of the two. However, once someone provides easy access to parallel computing on loosely connected Windows machines, we expect that C# can compete with C++ even in scientific computing.

Rules of Thumb

To improve the power of scientific computing, we have to consider three factors: CPU time, memory (RAM), and coding time. For some complicated problems, the dominant factor is coding time. In that situation one wants to use standardized, reusable components. When we say Hamiltonian, everyone in the physics community understands what is meant. Hence there must be a class called Hamiltonian. This shows how useful it is to use the same terminology as in our textbooks. All library code must be organized like our textbooks and our libraries. The library should not be arranged in simple alphabetical order but by subject, like books in a library. This subject-based integration leads to strong correlations between codes; in other words, it demands a high degree of coherence between them. That is why we need rules of thumb for building library code. Just as we write a book and shelve books in a library, we write the codes for a subject coherently. Accordingly, this library code is called Science. As in a real library, we can divide Science into Biology, Chemistry, ElectricalEngineering, Mathematics, MechanicalEngineering, Physics, etc. In a slight departure from .NET, some class names appear in more than one place in Science. For example, the class Momentum will be written in both GeneralPhysics and ClassicalMechanics, possibly with different implementations.
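A minimal sketch of how one class name can live in two subjects is given below. Only the names Science, Physics, GeneralPhysics, ClassicalMechanics, and Momentum come from the text; the namespace nesting, the members Value, X, Y, Z, and Magnitude, and the class MomentumDemo are assumptions made for illustration. For brevity the constructors take plain double values, which sets aside the class-type-input rule stated later in this section.

```csharp
using System;

namespace Science.Physics.GeneralPhysics
{
    // Introductory treatment: momentum along a single axis, p = m * v.
    public sealed class Momentum
    {
        public double Value { get; }   // kg*m/s

        public Momentum(double massInKilograms, double velocityInMetersPerSecond)
        {
            Value = massInKilograms * velocityInMetersPerSecond;
        }
    }
}

namespace Science.Physics.ClassicalMechanics
{
    // More advanced treatment: momentum as a three-component vector.
    public sealed class Momentum
    {
        public double X { get; }
        public double Y { get; }
        public double Z { get; }

        public Momentum(double x, double y, double z) { X = x; Y = y; Z = z; }

        public double Magnitude => Math.Sqrt(X * X + Y * Y + Z * Z);
    }
}

public static class MomentumDemo
{
    public static void Main()
    {
        // The namespace, like the title of a textbook, tells the two classes apart.
        var simple = new Science.Physics.GeneralPhysics.Momentum(2.0, 3.0);
        var vector = new Science.Physics.ClassicalMechanics.Momentum(3.0, 4.0, 0.0);
        Console.WriteLine(simple.Value);
        Console.WriteLine(vector.Magnitude);
    }
}
```

The fully qualified name plays the role of a book title plus an index entry, so the two Momentum classes never collide even though they are coded differently.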

To increase reusability, we state two idioms that reflect common-sense conventions in system design. The first is Divide and conquer. This idiom is well established in algorithms. Clearly, we must not mingle separate concepts. The second is Handle all and all. It is better to distinguish the model-independent and model-dependent parts. The model-independent parts are properly organized and will be upgraded in the future. Numerical workers should be familiar with the whole structure of the model-independent parts, and should handle the model-independent and model-dependent parts separately. To emphasize this way of coding, we use the name HANA, an abbreviation of Handle All aNd All, where the first All means the model-independent parts and the second All means the model-dependent parts. Thus there are two folders, Science and ScienceTest. Explicit numbers related to a specific model are given in the model-dependent part, ScienceTest. We also define the HANA project as the effort to build Science.
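The split between Science and ScienceTest can be sketched as follows. The class names Mass, Acceleration, Force, and NewtonEquation appear in the Summary; everything else (the factory methods FromKilograms and FromMetersPerSecondSquared, the method Solve, the class FallingApple, and the rounded value 10.0 for gravity) is a hypothetical choice for this illustration. The text does not say how raw numbers enter the library, so the factory methods are only one possible answer.

```csharp
using System;

// Model-independent part: would live in the Science folder.
namespace Science.Physics.GeneralPhysics
{
    public sealed class Mass
    {
        public double Kilograms { get; }
        private Mass(double kilograms) { Kilograms = kilograms; }
        public static Mass FromKilograms(double value) => new Mass(value);
    }

    public sealed class Acceleration
    {
        public double MetersPerSecondSquared { get; }
        private Acceleration(double value) { MetersPerSecondSquared = value; }
        public static Acceleration FromMetersPerSecondSquared(double value) => new Acceleration(value);
    }

    public sealed class Force
    {
        public double Newtons { get; }
        internal Force(double newtons) { Newtons = newtons; }
    }

    // Newton's second law, F = m a, with no model-specific numbers.
    public static class NewtonEquation
    {
        public static Force Solve(Mass m, Acceleration a)
            => new Force(m.Kilograms * a.MetersPerSecondSquared);
    }
}

// Model-dependent part: would live in the ScienceTest folder.
namespace ScienceTest
{
    using Science.Physics.GeneralPhysics;

    public static class FallingApple
    {
        public static void Main()
        {
            // Explicit numbers for one specific model appear only here.
            Mass apple = Mass.FromKilograms(0.5);
            Acceleration gravity = Acceleration.FromMetersPerSecondSquared(10.0); // rounded, illustrative
            Force weight = NewtonEquation.Solve(apple, gravity);
            Console.WriteLine(weight.Newtons);
        }
    }
}
```

In this arrangement the Science part can be upgraded without touching any model, and a new model only adds a file under ScienceTest.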

In scientific computing, programmers always want to see results as quickly as possible. Because of this habit of rushing, program codes become less structured, and reusability is lost. To enhance reusability, we strongly propose that scientists write scientific codes following a machine-checkable rule: physical inputs and outputs in a self-closed structure. The three parts of this rule are explained below.

  1. Give proper and full names to public classes, properties, and method outputs. Be careful in choosing namespace names and class names. Namespace names should be the names of real books. An easy way to choose class names is to use the index at the end of a textbook. Allow no spaces in a full name; instead, capitalize the first letter of each word, for instance ProteinFolding, SpecialFunction, QuantumMechanics, etc. This convention is used in the textbook by Deitel, C# How to Program. Since the method names of Numerical Recipes are cryptic, it is better to hide those codes. Thus, we use the Numerical Recipes routines, from addint to zroots, but only as private methods or private classes.
  2. Use class-type inputs for public constructor, method, and delegate arguments. All input and output names must be physical quantities. This means that we cannot use the usual int or double parameters as input arguments of methods or constructors. All input arguments must be class types, for example Position and Momentum in AngularMomentum(Position r, Momentum p). As an exception, we allow ourselves to use primitive types in mathematical classes.
  3. Make a self-closed structure. All classes used as inputs and outputs must be self-closed on top of the .NET Framework class library. We expect that dependencies between classes in different directories would confuse future users. Thus, put all the codes in a single directory. As an exception, we allow ourselves to use the classes of Mathematics in other subjects. The whole structure of Science Code .NET is shown in the accompanying figure.
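The three rules can be illustrated with the AngularMomentum(Position r, Momentum p) constructor mentioned in rule 2. This is a sketch under assumptions, not the library's actual code: the class Vector, the property Components, the private helper Cross, and the class AngularMomentumDemo are hypothetical names introduced here. Vector is placed in Mathematics because mathematical classes are the stated exception that may take primitive types.

```csharp
using System;

namespace Science.Mathematics
{
    // Mathematical classes are the stated exception: primitives are allowed here.
    public sealed class Vector
    {
        public double X { get; }
        public double Y { get; }
        public double Z { get; }
        public Vector(double x, double y, double z) { X = x; Y = y; Z = z; }
    }
}

namespace Science.Physics.ClassicalMechanics
{
    using Science.Mathematics;   // rule 3 exception: Mathematics may be used elsewhere

    public sealed class Position
    {
        public Vector Components { get; }
        public Position(Vector components) { Components = components; }
    }

    public sealed class Momentum
    {
        public Vector Components { get; }
        public Momentum(Vector components) { Components = components; }
    }

    // Rule 1: full textbook name. Rule 2: class-type inputs only.
    public sealed class AngularMomentum
    {
        public Vector Components { get; }

        public AngularMomentum(Position r, Momentum p)
        {
            Components = Cross(r.Components, p.Components);   // L = r x p
        }

        // Numerical helper kept private, so its name never reaches users.
        private static Vector Cross(Vector a, Vector b)
        {
            return new Vector(
                a.Y * b.Z - a.Z * b.Y,
                a.Z * b.X - a.X * b.Z,
                a.X * b.Y - a.Y * b.X);
        }
    }

    public static class AngularMomentumDemo
    {
        public static void Main()
        {
            var r = new Position(new Vector(1.0, 0.0, 0.0));
            var p = new Momentum(new Vector(0.0, 2.0, 0.0));
            var l = new AngularMomentum(r, p);
            Console.WriteLine(l.Components.Z);
        }
    }
}
```

A caller can only pass a Position and a Momentum, in that order, so swapping the two arguments becomes a compile-time error rather than a silent sign change.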

If we wish to add more rules in the future, we must consider whether a computer can verify that the rules are obeyed. This consideration is essential for an internet-based posting system.
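As one possible way a computer could check the class-type-input rule, the sketch below uses .NET reflection to reject any public constructor or method that takes a primitive parameter. The text does not describe the project's own checker, so RuleChecker, UsesOnlyClassTypeInputs, and BadEnergy are hypothetical names, and the empty Position, Momentum, and AngularMomentum classes are stand-ins. The check is deliberately simplified: it ignores the exception for mathematical classes, and it would also flag the setter of a public property whose type is primitive.

```csharp
using System;
using System.Linq;
using System.Reflection;

public sealed class Position { }
public sealed class Momentum { }

// Obeys the rule: every public input is a class type.
public sealed class AngularMomentum
{
    public AngularMomentum(Position r, Momentum p) { }
}

// Breaks the rule: a raw double appears in a public constructor.
public sealed class BadEnergy
{
    public BadEnergy(double joules) { }
}

public static class RuleChecker
{
    // True when no public constructor or declared public method
    // of the given type takes a primitive parameter (int, double, ...).
    public static bool UsesOnlyClassTypeInputs(Type type)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance
                                 | BindingFlags.Static | BindingFlags.DeclaredOnly;

        var members = type.GetConstructors(flags).Cast<MethodBase>()
                          .Concat(type.GetMethods(flags));

        return members.SelectMany(m => m.GetParameters())
                      .All(p => !p.ParameterType.IsPrimitive);
    }

    public static void Main()
    {
        Console.WriteLine(UsesOnlyClassTypeInputs(typeof(AngularMomentum))); // True
        Console.WriteLine(UsesOnlyClassTypeInputs(typeof(BadEnergy)));       // False
    }
}
```

A posting system could run such a check on every submitted assembly and refuse codes that fail it before an editor ever reads them.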

Just as a library shelves books by classification, any code library that satisfies the above rules will be accumulated into Science. As the library grows, we will have to upgrade the version of each code.

Physics

Following the above rules, we construct Physics. All files related to physics reside in a single directory named Physics, which contains many subdirectories. Usually, the namespaces are named after textbooks, while the class names are taken from their indexes. We make the subdirectories ClassicalMechanics, QuantumMechanics, Electromagnetism, StatisticalMechanics, GeneralPhysics, etc.

One research-oriented code for Physics is ExactDiagonalization. This object-oriented code is designed mainly for exact diagonalization. We expect that Monte Carlo and Density Functional Theory will be included later. Hence an upgraded Physics should appear soon.

Epilogue

We have proposed a library code system, emphasizing that class names should be the same as those found in textbooks. Alongside the well-known algorithmic idiom Divide and conquer, we introduced Handle all and all for this library system. The first all means the library codes and the second all means our specific project codes. Hence there are only two folders: one is the library and the other is our specific model-dependent project. As the library grows, we expect each class to become strongly correlated with the others. Changing codes is then not an easy task, because of the coherence between codes. Thus we need experts to act as editors who can handle the task of upgrading versions.

One example of the general procedure for upgrading is the following. Someone downloads the code of Science or ExactDiagonalization. If they feel that the class Symmetry should be included or can be improved, they change the code and post it. If the change is worthwhile for everyone, the corresponding editor reflects it in the next version, with clear acknowledgment of the contributors. In this way, the library code Science will be upgraded to higher versions.

We know that many theoretical physicists nowadays spend much of their time on computing that is complicated but conceptually trivial. We agree that physicists need to do more philosophical and conceptual thinking. The author likes to imagine that a higher version of Science will reduce the computing time of all scientists.

There is a proverb in Korea: for a new wine, use a new filter. If someone wants to build a new library code system based on new concrete rules, then they should use a new computer language, C#. There is another proverb in Korea: to start is to finish half. Since we have made GeneralPhysics, we have finished half of Science. However, coding Science will never end.

Acknowledgment

The author learned the C# language while staying at Yale University. The author gratefully thanks Professor Paul Hudak, who did an excellent job in his C# class.