Tuesday, February 26, 2008

DataObjects.Net 4.0 and Silverlight

I want to publish some info on our intention to support Silverlight in DataObjects.Net v4.0.

Facts:
- Silverlight seems to be the future of interactive web applications. It should replace Flash in some (not very predictable, but ;) ) future. One of its greatest features is that it integrates the core part of the .Net Framework - so Microsoft finally got a good motivation to produce at least a feature-limited, but portable version of it. Silverlight v1.1 alpha runs on Macs right now.
- The bad thing is that for now there is only quite rough information on what exactly they're going to support. Ok, the CLR... But which assemblies will be available? The Silverlight installer is just ~ 4Mb, so probably there will be a really limited set of assemblies. Only quite brief information on this is available so far (btw, any help here is quite appreciated). The good thing is that they already support all .Net Framework 2.0 MSIL features - i.e. generics, etc. will be working. They have also declared that LINQ will be integrated, and that means a lot.


Let's compare this with what we have now: ok, we may expect some problems with delivering a Silverlight-based version of DO 4, and the most complex of them are:

1. We use Reflection.Emit in some cases - it looks like it won't be supported. But our code is now much more modular, so this only means we'll have to provide a less efficient implementation of Tuples & some generated delegates we're using. Basically, there are just several classes in Xtensive.Core affected (the DelegateHelper type & a few Tuple-related classes).

2. There is some threading-related stuff. Probably it will also need a replacement (not sure if System.Threading will be supported).

3. Xtensive.Core.Diagnostics uses log4net & NUnit. Again, at worst we can simply turn this off. Our logging is quite flexible from the point of view of the underlying implementation - i.e. we never expose log4net loggers, but use them only inside our own ones.

4. Finally, we use PostSharp. The PostSharp.Public assembly is small (i.e. it probably needs minimal fixes to run on Silverlight), but it depends on serialization (btw, so do we). So hopefully they'll support something here - otherwise we'll have to work seriously on handling this.


Everything else seems fine. In any case, I think we should support Silverlight - the new DO can actually play a much more important role here: its support for non-SQL storages (i.e. a set of built-in ones, such as in-memory and in-file) seems to be exactly what people will need: Silverlight provides access to isolated storage, where the client may cache its own files. So let's give a Silverlight application the ability to work with entities in absolutely the same way as on the desktop \ server, but cache the data in memory or in isolated storage - this makes it much easier to build smart clients (since they share the same BLL+DAL). The ability to synchronize storages & transparently download everything that isn't available locally is one more feature that may resolve lots of problems.

Finally, I really believe Silverlight is the future of the interactive web (AJAX, Flash, etc. should lose ground in a relatively short timeframe - probably 2..5 years).

So there is simply no choice ;)

Monday, February 25, 2008

FiXml

The free FiXml tool is ready. You can download it here.

Brief summary:

FiXml is a post-processor for the XML documentation produced by the C# \ Visual Basic.Net compilers. It addresses some of the most annoying issues related to writing XML documentation in these languages:
- No support for inheriting documentation from a base class or interface. I.e. the documentation for any overridden member must be written from scratch, although normally it’s quite desirable to inherit at least a part of it.
- No support for inserting commonly used documentation templates, such as “This type is singleton - use its <see cref="Instance" /> property to get the only instance of it.”, or even “Initializes a new instance of <CurrentType> class.”

To solve the mentioned issues, the following additional tags are provided (illustrated in the sketch below):
- the <inheritdoc />, <inherited /> tags
- the copy="..." attribute for the <see cref="..." /> tag.
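
A minimal sketch of how these tags are intended to look in code (the types below are made up for illustration):

// Inheriting documentation from a base interface member:
public interface IRepository
{
  /// <summary>Persists the specified entity.</summary>
  /// <param name="entity">The entity to persist.</param>
  void Save(object entity);
}

public class FileRepository : IRepository
{
  // FiXml copies the Save() documentation from IRepository into the output here:
  /// <inheritdoc />
  public void Save(object entity) { /* ... */ }
}

// Reusing a commonly used documentation template:
/// <summary>Initializes a new instance of this class.</summary>
public class ClassDocTemplate { }

public class Person
{
  // FiXml replaces this <see> tag with the documentation copied from ClassDocTemplate:
  /// <see cref="ClassDocTemplate" copy="true" />
  public Person() { }
}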

Please see its documentation for further information.

P.S. A really short summary: FiXml eliminates the two most frequent cases leading to copy-pasting parts of XML documentation - judging by our own experience, of course.

Saturday, February 16, 2008

Still no CTP...

So... We've failed to deliver the DataObjects.Net 4.0 CTP yesterday. It was actually predictable from mid-January - I planned to spend these 1.5 months after New Year on completing the Xtensive.Storage assembly, but almost all this time was spent on refactoring Xtensive.Core. There is some info on our current achievements below. Concerning the delivery date \ development: we're taking 3 more weeks, so the next checkpoint is March 7. Not sure if it will be enough, but as I see, a quite limited timeframe helps to increase development speed. In any case, I'll publish some info here each week on what we're working on \ have worked on.

Other news:
- We're launching a new web site at the end of the next week (most likely - on the weekend). Hopefully you'll like it ;)
- Today \ tomorrow I'll publish a nice tool, FiXml - an XML documentation post-processor (MSBuild task). It handles XML comment inheritance (<inheritdoc> tag processing), supports overriding inherited documentation parts and allows using parts of other XML comments (<see cref="..." copy="..." />). It is quite helpful if your project has a large set of classes, but a relatively small set of common base classes and interfaces (exactly our case ;) ) - so basically you should describe just their methods, rather than the methods of all the descendants \ implementors (probably 70% less work). Reusable comment parts are also a nice addition - e.g. normally any constructor is described as "Initializes a new instance of CurrentType class". Although short, this description is always slightly different (type names differ from class to class). We use <see cref="ClassDocTemplate" copy="true" /> for this now. P.S. We've just started to use it internally, so probably it isn't rock solid stable yet.

Now some info on our current achievements: as I've mentioned, we were refactoring Xtensive.Core (now we're doing the same with Xtensive.Indexing). In particular, we updated:

1. The associate provider framework. It lets us associate late-bound "matching" handlers with any data type or pair of types and access them quite efficiently. The background is:
- We need our own Comparers (the built-in ones don't work for all data types - funny, but .Net doesn't implement IEquatable<T> e.g. for enums); moreover, we need configurable comparers utilizing provided comparison rules (~ a set of directions and cultures), but IComparable<T> doesn't imply any possibility to configure it.
- Converters - they're used e.g. to automatically convert item values to measure values.
- Arithmetics - instances of these classes handle arithmetic operations for data types. They're used to calculate measures as well.
- Hashers - long hash calculators for Bloom filters. A single built-in Int32 hash allows maintaining just a quite inefficient 512Mb Bloom filter working with 4G keys at max, which is not acceptable. Currently we can generate an arbitrary number of long hashes - this allows us to maintain Bloom filters with any desirable (i.e. much higher) efficiency.
- Serializers - strongly typed serializers handling serialization much better. Although random access to the tree doesn't require fast serialization (desktop HDD seek time is ~ 8..12ms), index merging (which implies sequential read \ write) and SSDs (solid state drives) make us think about this.

So basically, we've provided an approach to locate a related handler for any data type (or combination of types). Examples:

Getting comparers:
- ComparerProvider.Default.GetComparer<int>(); // Returns AdvancedComparer<T>
- ComparerProvider.Default.GetComparer<int?>();
- ComparerProvider.Default.GetComparer<Direction>(); // comparer for enum
- ComparerProvider.Default.GetComparer<int[]>(); // works ;)
- ComparerProvider.Default.GetComparer<List<int>>(); // works ;)
- ComparerProvider.Default.GetComparer<IEnumerable<int>>(); // works ;)
- ComparerProvider.Default.GetComparer<IEnumerable<Pair<int[],int[]>>>(); // even this ;)

Getting an asymmetric comparer comparing IEntire<Tuple> and Tuple:
- Func<IEntire<Tuple>, Tuple, int> assymmetricCompare = ComparerProvider.Default.GetComparer<IEntire<Tuple>>().GetAssymmetric<Tuple>();

Getting system comparers (based on IEquatable<T> & IComparable<T> impl.):
- ComparerProvider.System.GetComparer<int?>();

Getting comparers with applied rules:
- ComparerProvider.Default.GetComparer<Pair<int,string>>().ApplyRules(
    new ComparisonRules(Direction.Negative, // Default (common) rule
      Direction.Positive, // rule for int
      new ComparisonRule(Direction.Positive, new CultureInfo("Ru-ru")))); // rule for string
Gets a comparer for pairs of (int, string) inverting the common direction to negative; int values are compared as usual; strings are compared in the "Ru-ru" culture.

Hashers:
- HasherProvider.Default.GetHasher<int>();
- HasherProvider.Default.GetHasher<int?[]>(); // ~ the same kind of type sequences as for comparers works here as well

Converters:
- ConverterProvider.Default.GetConverter<int, double>();
- ConverterProvider.Default.GetConverter<int?, string>();
// our AdvancedConverter has an IsRough property (meaning a loss of precision may occur on conversion)
- ConverterProvider.Default.GetConverter<double?, int?>().IsRough==true

Notes:
- You always get a strongly typed handler - i.e. no boxing \ casts will occur on its usage. So this is the fastest handler you may expect.
- Subsequent calls to GetXxx<T, ...> methods are resolved from a cache. So getting the first instance of some typed handler can be relatively slow (as reflection is), but subsequent requests for it are extremely fast (we use our own TypeBasedDictionary for this - it resolves its items by just one virtual method call reading a static field, i.e. it is faster than Dictionary + thread-safe; see the sketch below).
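
Our TypeBasedDictionary differs in its details, but the general idea of per-type caching can be sketched roughly like this (all the names below are invented for illustration):

using System.Collections.Generic;

// A rough illustration of per-type handler caching: the slow (reflection-like)
// resolution runs once per T; afterwards getting the handler is just a static
// field read, and the CLR initializes the field in a thread-safe way.
internal static class HandlerCache<T>
{
  public static readonly IComparer<T> Instance = Resolve();

  private static IComparer<T> Resolve()
  {
    // Slow path: in the real framework this is where the provider search logic would run.
    return Comparer<T>.Default;
  }
}

// Usage: HandlerCache<int>.Instance.Compare(1, 2) hits only the cached field.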

As you may expect, everything is quite flexible - i.e. you can write your own data types with your own handlers. Moreover, you can use your own XxxProvider instances (i.e. not the default ones) with overridden search logic. Why is this important? Well, this means we don't bind ourselves to supporting just certain data types. We want to be able to support generally any type further; moreover, we want to work with any type natively - i.e. without boxing.

The performance we have here now:
- Simple comparers (e.g. for int) handle up to 100M comparisons per second. A Tuple \ IEntire comparer can do ~ 2M to 4M comparisons per second (2M when equal; 4M when mostly different) for 5-field tuples on a single CPU thread on a 2GHz Intel Core 2 Duo. It relies on other comparers (field comparers are provided for it by the provider it is bound to), so this result means the other comparers are also fast enough (actually they are normally as fast as the built-in .Net comparers, but understand comparison rules). This allows us to do up to ~ 300K index seeks per second on a dual-core CPU for a 1M item index, and 200K for a ~ 1G item index (implying the data is in the RAM cache), which seems quite good.
- Hashers, serializers and arithmetics behave the same (50M...100M operations per second for simple data types; ~ 10-20 times less for Tuples).

Why we care so much about this: as you see, we use a somewhat different approach compared to other databases: we store everything natively in RAM, i.e. not as serialized byte[] sequences or strings. This allows us to:
- Pass the data to the higher layers "as is" - i.e. without any transforms and intermediate objects in the heap. As Tuples. Less RAM \ fewer intermediates = better speed.
- Avoid conversion to a binary \ string form on seeks as well - i.e. if you keep pages in binary form, you normally must convert the value you're looking for into the same form, and use byte-by-byte comparison. In our case we don't need this conversion; the comparison is more complex to implement, although more flexible as well (rules can be handled with ease).

2. Ray & Range refactoring. We've added a new data type: IEntire<T>. It describes a typed n-dimensional vector with the possibility to have infinities (positive or negative) as values of its dimensions, or values with a positive \ negative infinitesimal shift. Its implementors (Entire<T> for struct data types and TupleEntire<T> for tuples) are provided in nearly the same fashion as e.g. comparers (by Entire<T>.Create(...) methods). They allow pointing to an index item range much more precisely - earlier we didn't handle infinitesimal shifts (i.e. ~ it was possible to say ">=", but not ">" - at least for the value of the desirable dimension).
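
To make the infinitesimal shift idea a bit more concrete, here is a purely conceptual sketch - it is not the actual Entire<T> \ IEntire<T> implementation, just an illustration of the comparison trick:

using System;

// Why an infinitesimal shift lets a range bound express ">" in addition to ">=".
public enum ShiftSketch { None = 0, Positive = 1, Negative = -1 }

public struct EntireSketch<T> where T : IComparable<T>
{
  public readonly T Value;
  public readonly ShiftSketch Shift;

  public EntireSketch(T value, ShiftSketch shift)
  {
    Value = value;
    Shift = shift;
  }

  // Equal values are disambiguated by the shift: (5, Positive) is "just above 5",
  // so a seek starting from such a bound skips the keys equal to 5.
  public int CompareTo(T key)
  {
    int result = Value.CompareTo(key);
    return result != 0 ? result : (int) Shift;
  }
}

// new EntireSketch<int>(5, ShiftSketch.None)     ~ seek from "key >= 5"
// new EntireSketch<int>(5, ShiftSketch.Positive) ~ seek from "key > 5"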

3. We've changed the indexing interfaces & their simple implementations in Xtensive.Core (SortedIndex, etc.). Earlier the unique and non-unique index interfaces were more different. Now it's implied that there will be just one non-unique index implementation - NonUniqueIndex, acting as a wrapper over an internal unique index. The big mistake was that our BPlusTree implementation earlier supported both interfaces natively - such an implementation doesn't allow handling big non-unique tree serialization \ merging well (i.e. without a large amount of RAM) when the amount of non-unique values is high.

4. We've completely rewritten Tuples. The current implementation is much better from the point of view of the interface, and simply better in terms of memory usage \ speed.
- No more field names in Tuples. We noticed we never use them anyway and access fields by index rather than by name (since names & mapping are available on the higher layers of abstraction anyway, e.g. in RecordSets \ Records). But names were forcing us to provide different classes for tuples with the same structure but different names, which isn't good, and now they're gone.
- Earlier we were able to compress (i.e. transparently store a set of smaller fields in a single "compressing" field of the underlying Tuple class) only the boolean fields and null \ available flags. Now we compress all the fields with a size smaller than the size of int (i.e. bool, byte, sbyte, short, etc.) - a sketch of the idea follows below.
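
Just to illustrate the packing idea (this is only a sketch, not the generated Tuple code):

// Two bool flags and a short packed into a single int-sized backing field.
public struct PackedFieldsSketch
{
  private int compressed; // bits 0-1: two bool flags, bits 2-17: a short value

  public bool Flag0
  {
    get { return (compressed & 1) != 0; }
    set { compressed = value ? compressed | 1 : compressed & ~1; }
  }

  public bool Flag1
  {
    get { return (compressed & 2) != 0; }
    set { compressed = value ? compressed | 2 : compressed & ~2; }
  }

  public short ShortField
  {
    get { return (short) ((compressed >> 2) & 0xFFFF); }
    set { compressed = (compressed & ~(0xFFFF << 2)) | ((value & 0xFFFF) << 2); }
  }
}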

Some results (DummyTuple<T> emulates the old DataObjects.Net behavior (array-based storage)):

Creation, Operations\s. = Creations\s. for 10-field Tuples:
- Tuple memory usage (1000000): Time: 449,0441ms, Memory: 91999,408kb, Operations: 2,227 M/s.
- DummyTuple memory usage (1000000): Time: 1686,0642ms, Memory: 136001,304kb, Operations: 593,097 K/s.

So Tuple wins ~ 30% even on 10-field objects. Most secondary indexes need just 2-field Tuples. In this case they take 60% less memory - e.g. we now need 20 bytes for each old collection-to-collection relationship item in the index (8 bytes for 2 integers, 4 bytes for flags; 8 more bytes is the standard .Net heap object header); using an array for the same purpose implies it will take 12+16*2 = 44 bytes; moreover, it will work slower:

Accessing integer field (typed, i.e. w\o boxing):
- Tuple.SetValue<T>: Operations: 58,104 M/s.
- Tuple.GetValue<T>: Operations: 34,321 M/s.
- DummyTuple.SetValue<T>: Operations: 8,523 M/s.
- DummyTuple.GetValue<T>: Operations: 22,109 M/s.
// Fastest way to deal with Tuples:
- ITupleFieldAccessor<T>.SetValue: Operations: 138,634 M/s.
- ITupleFieldAccessor<T>.GetValueOrDefault: Operations: 166,350 M/s.

Note: a non-virtual empty method call runs at 230 M/s, a virtual empty method call at 180 M/s. So the performance we have here seems very good as well.

5. We've made a set of interface changes related to performance.

We discovered that calling a method through a delegate is ~ 2 times faster than calling an interface method. So in the vital parts of the framework (comparers, hashers, serializers, arithmetics) we migrated to a "delegate representation of interface": e.g. when you ask for a comparer, you get AdvancedComparer<T>, which provides a set of read-only delegate fields mapped to the methods of the underlying interface (IAdvancedComparer<T>). Funny, but this increases the speed of comparison for simple types by ~ 20%. There is AdvancedComparerStruct<T> as well - if some object needs to store an AdvancedComparer, it's the best data type to use for this. It is nearly the same as AdvancedComparer<T>, but a struct (there is an implicit conversion to it from AdvancedComparer<T>), so it requires no additional reference resolution.
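
A simplified sketch of the pattern (illustrative names, not the actual AdvancedComparer<T> code):

using System;
using System.Collections.Generic;

// The handle exposes a read-only delegate field bound to the method of the
// underlying implementation; invoking the delegate avoids interface dispatch.
public sealed class ComparerHandle<T>
{
  public readonly Func<T, T, int> Compare;

  public ComparerHandle(IComparer<T> implementation)
  {
    Compare = implementation.Compare; // bound once, called many times
  }
}

// Usage: new ComparerHandle<int>(Comparer<int>.Default).Compare(1, 2);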

Switching from IAdvancedComparer<T> \ IComparer<T> \ IEqualityComparer<T> to AdvancedComparer<T> is actually a noticeable internal interface change. The same is true for hashers (i.e. there are IHasher<T>, Hasher<T> and HasherStruct<T>), etc.

We also use delegates to call generic methods faster (e.g. generic methods for comparing Tuples - like TupleComparer.CompareStep<TFieldType>(...)). A virtual generic method call is the slowest one - it is nearly 3 times slower than a call to an interface method, since it needs type-to-address resolution (~ some internal dictionary lookup). But when it is bound to a delegate, it is already resolved, and thus is called ~ as fast as a regular method; see the sketch below.
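
A sketch of the binding itself (again, illustrative names; the real TupleComparer code is more involved, but the trick is the same):

using System;
using System.Reflection;

public static class CompareStepSketch
{
  // The generic "step" method we want to call repeatedly:
  private static int CompareStep<TFieldType>(TFieldType x, TFieldType y)
    where TFieldType : IComparable<TFieldType>
  {
    return x.CompareTo(y);
  }

  // Close the generic method over int and bind it to a delegate once;
  // all further calls go through the delegate and skip the per-call resolution.
  private static readonly Func<int, int, int> compareIntStep =
    (Func<int, int, int>) Delegate.CreateDelegate(
      typeof (Func<int, int, int>),
      typeof (CompareStepSketch)
        .GetMethod("CompareStep", BindingFlags.NonPublic | BindingFlags.Static)
        .MakeGenericMethod(typeof (int)));

  public static int CompareInts(int x, int y)
  {
    return compareIntStep(x, y);
  }
}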


Ok. It seems that's enough for now (for Russians it takes more time to write in English ;) ). Next week I'm going to briefly describe why we think the RDBMS engine we're building will be better than the famous existing ones. A transparent DataObjects.Net-based interface for it is one of its benefits, but probably not the best one ;) Or at least, I'll point out the design mistakes most of the existing databases have now, the effect of their presence, and how we're going to handle them.

Finally, FiXml will be published shortly.