Tuesday, April 15, 2008

New web site: launched

It's done - we've launched our new web site. Any comments, suggestions and bug reports are welcome.

Saturday, April 12, 2008

New web site: live testing

We're testing our new web site live this weekend. It will be launched officially tomorrow.

P.S. Please don't register on it during this period - the database will be recreated tomorrow. There are still some bugs as well - we know about most of them, and they'll be fixed before the launch.

Friday, April 04, 2008

Xtensive.Indexing architecture: pros and cons, part 2

Pros (continued):

Let's cover two more:

3. More efficient page placement
4. Faster random lookups for large indexes


Regular index:
- Can be fragmented, i.e. its pages aren't always stored in sequential order. So a logically sequential page read isn't always physically sequential - random HDD seeks can easily occur.
- Has a less than 100% fill ratio: usually it varies from 75 to 85%.
- Has a fixed record size, and thus fixed-size pages - usually 4 to 16Kb. This makes compression problematic; moreover, the use of such small pages implies there will be more than 3 levels in the B+tree. Later I'll explain why this isn't good.

Our index slice:
- Isn't fragmented: its pages are always written in sequential order, from the leftmost page to the rightmost one. This means sequential index reading is always fast, although usually we have to read several slices.
- Always has a 100% fill ratio: when B+trees are serialized to disk or merged, a special page builder is used. Basically, we can build a B+tree-like (I'll explain later why "like") structure of arbitrary size in a stream with almost zero RAM usage, given an ordered enumerable providing its content (it acts as the data source - such ordered enumerables can be built e.g. for slice merge operations; see the sketch after this list). "Like" - because all the pages of this structure, except the ones on the right "edge" of the tree, will be 100% filled; pages on the right "edge" will be 0...100% filled (unpredictable), which is actually a violation of the B+tree definition. From the performance point of view it is even better, though, since the number of such pages is 3 in our worst case.
- May have an arbitrary page size. We're targeting ~16...32Kb compressed page size, which in turn means an uncompressed page will be ~64...128Kb at least and will hold ~1000...2000 nodes.
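To illustrate the page builder idea from the fill ratio point above, here's a minimal sketch of streaming page construction. All the names here (PageBuilderSketch, BuildLeafPages, writePage) are hypothetical illustration code, not our actual API:

using System;
using System.Collections.Generic;

// A sketch of streaming page building: leaf pages are filled to 100% and written
// strictly left to right; only the page currently being built is kept in RAM,
// so memory usage doesn't depend on the index size.
public static class PageBuilderSketch
{
  public static void BuildLeafPages<T>(
    IEnumerable<T> orderedItems,     // the data source, e.g. a slice merge enumerable
    int pageSize,                    // nodes per page, e.g. ~1000...2000
    Action<List<T>> writePage)       // appends a serialized page to the output stream
  {
    var page = new List<T>(pageSize);
    foreach (var item in orderedItems) {
      page.Add(item);
      if (page.Count == pageSize) {  // the page is 100% filled - flush it
        writePage(page);
        page.Clear();
      }
    }
    if (page.Count > 0)              // the rightmost page may be partially filled -
      writePage(page);               // exactly the "right edge" exception described above
  }
}

An upper level can be built the same way from the (first key, page offset) pairs of the level just written, so the whole tree streams out level by level.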

Let's discuss why we're going to use bigger pages.

Assumptions:
- Random read from HDD is quite slow: ~100...200 read operations / sec.
- There should be almost no difference between random reads of 4Kb blocks and 64Kb blocks if the total number of blocks is huge (i.e. a big index): the cache doesn't help, and any random read means a seek (~5...10ms) + a read. HDD rotation speed is 7500...15000 RPM now, so one rotation takes at least 4ms; thus, to get some random byte on a track after positioning the head over it, we need ~2ms on average (rotating ~half of the track). But how many bytes are on a track? ~500Kb. So e.g. 32Kb of data is just 6% of a track, while on average we rotate over 50% of it anyway. In short, it doesn't matter much whether we read 4Kb blocks or 32Kb blocks at random - the difference in reads per second is just ~10% (see the calculation sketch after this list).
- In case of sequential reads the page size doesn't matter much at all.
- If the index is large, most page fetch requests will target almost random pages. Since on-disk indexes are read-only, subsequent accesses to the same pages will almost always hit the cache.
- Page decompression & deserialization time doesn't matter much, since we read just ~100-200 pages per second anyway in the mostly-random-access case. Even if we fetch 1000+ pages per second (taking into account that there are some sequential readers), 1ms still seems more than enough to decompress and deserialize 16Kb of data.
- Compression seems like a simple "must have", since normally the data in a single page is quite similar - all the keys there are close to each other by value. So we expect a ~4x compression factor on average. Google reports a compression factor of ~10 for their indexes.
- We shouldn't care much about write behavior, since, as I've mentioned, any write operation implies writing a big file (i.e. an index slice) sequentially.
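A back-of-the-envelope model behind the "difference is just ~10%" claim. The numbers below are exactly the assumptions listed above (average seek, half-rotation latency, ~500Kb per track), not measurements:

using System;

// Models a random HDD read as seek + half a rotation + transfer of the block itself.
public static class RandomReadModel
{
  public static void Main()
  {
    const double seekMs = 7.5;         // average seek, ~5...10 ms
    const double halfRotationMs = 2.0; // ~half a track rotation on average
    const double rotationMs = 4.0;     // one full rotation at 15000 RPM
    const double trackKb = 500;        // ~500Kb of data per track

    foreach (double blockKb in new[] { 4.0, 32.0, 64.0 }) {
      double transferMs = rotationMs * (blockKb / trackKb); // fraction of one rotation
      double totalMs = seekMs + halfRotationMs + transferMs;
      Console.WriteLine("{0}Kb block: {1:F2} ms/read, ~{2:F0} reads/sec",
        blockKb, totalMs, 1000 / totalMs);
    }
    // Output: ~105 reads/sec for 4Kb blocks vs ~100 for 64Kb blocks - the block
    // size changes the read rate by just a few percent, within the ~10% bound above.
  }
}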

Conclusions:
- Having bigger pages (16...32Kb) seems cheap (costs just ~10% of random read performance).
- With such pages we can reduce the number of seeks per random index lookup to just one. How? If we maintain 2K nodes in a single page, a 3-level B+tree may store up to 8G items. At 40 bytes per item that's a 320Gb index - enough to fill a single HDD. But we can easily keep 1 + 2K pages (the root + all level-2 pages) in the memory cache, so in this case we really need just one page read at most to access any index item.

Now imagine the same for a regular index with 4Kb pages: there are just ~100 items on each page; we can cache only the 3 upper levels of such an index in RAM; but we need a 5-level tree to store a comparable number of items (~10G). Thus accessing a random item requires 2 page fetches from HDD, i.e. approximately twice the time. The sketch below repeats this arithmetic.
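The fan-out and item size figures are the assumed ones from above:

using System;

// Fan-out arithmetic behind the "one seek per lookup" claim.
public static class TreeDepthModel
{
  public static void Main()
  {
    // Big pages: ~2000 nodes per page, 3 levels.
    double bigTreeItems = Math.Pow(2000, 3);   // = 8e9 items (~8G)
    Console.WriteLine("3 levels, 2K fan-out: {0:0.#E+0} items, {1}Gb at 40 bytes/item",
      bigTreeItems, bigTreeItems * 40 / 1e9);
    // Caching the root + all 2000 level-2 pages => any lookup costs 1 disk read.

    // Small pages: ~100 nodes per page, 5 levels for a comparable item count.
    double smallTreeItems = Math.Pow(100, 5);  // = 1e10 items (~10G)
    Console.WriteLine("5 levels, 100 fan-out: {0:0.#E+0} items", smallTreeItems);
    // Caching the 3 upper levels still leaves 2 levels on disk => 2 reads per lookup.
  }
}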

[To be continued...]

Thursday, April 03, 2008

Status update

1. We're launching our new web site next week. New features include the announced ones, as well as a few new ones. The most useful is global search: you can search our support forum, news (blogs) and the web site itself using a single search form on the web site. We'll probably add search over our Online help later.

2. The part of our team that has been working on it for the last 4 weeks is now returning to v4.0. We're targeting CTP delivery for the beginning of May now - the web site has taken 1.5 additional weeks.

3. Funny or not, the new web site development has led to the appearance of a few new projects, which will become publicly available ~ with the release of DataObjects.Net 4.0:

Xtensive.Web

These are web crawler and e-mail sending components. Nothing very special.

As you might expect, the whole search feature depends on this component: we simply crawl everything we need into the built-in database.

Xtensive.Fulltext

Here we have implemented a text analyzer (tokenizer & token processing chain) for full-text indexing & querying, a highlight region selector and the highlighter itself.

The usual question: why? The answer: although DO offers full-text indexing & search, it doesn't offer text highlighting (selecting the "best matching" part of the text and highlighting the found words in it, independently of their forms). Moreover, since we offer our own indexing engine in v4.0, the decision to build our own full-text indexing model on top of it (based just on regular indexes - i.e. on our own ones as well) was made long before this moment, so Xtensive.Fulltext will anyway help to solve some problems we'll face further on this path. But for now this part is used just for selecting the best matching text fragments and highlighting the found words there.

Certainly we evaluated the Lucene.Net highlighter, but found it simply unacceptable (I don't remember exactly why now, but the reasons were serious - at the very least, it couldn't handle Russian word forms and offered quite a limited approach to best matching fragment detection).

The token processing chain (default tokens are word, number, symbol, URL, e-mail, etc.) may include arbitrary token filters, such as a "stop words" eliminator, a sentence detector and, finally, a stemmer (a converter of words to their base forms); a sketch of such a chain follows below. Initially we tried to take the Snowball stemmers from Lucene.Net, but found them worse than bad - most likely because they're generated for Java by the Snowball grammar compiler, and then ported to .Net by an automatic tool (like the whole Lucene.Net project). The Russian stemmer doesn't work there at all because of a buggy encoding conversion, and all the stemmers work extremely inefficiently from the point of view of .Net code optimization rules.
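To make the chain idea concrete, here's a minimal sketch of such a pipeline. Token, ITokenFilter and the trivial filters below are illustration-only types, not our actual API (and a real stemmer is obviously far more complex than suffix stripping):

using System.Collections.Generic;
using System.Linq;

// A token as the tokenizer might emit it.
public class Token
{
  public string Value;
  public string Kind; // "word", "number", "symbol", "url", "e-mail", ...
}

// A single stage of the token processing chain.
public interface ITokenFilter
{
  IEnumerable<Token> Apply(IEnumerable<Token> tokens);
}

// "Stop words" eliminator: drops common words that carry no search value.
public class StopWordFilter : ITokenFilter
{
  private static readonly HashSet<string> StopWords =
    new HashSet<string> { "a", "an", "the", "of", "and" };

  public IEnumerable<Token> Apply(IEnumerable<Token> tokens)
  {
    return tokens.Where(t => t.Kind != "word"
      || !StopWords.Contains(t.Value.ToLowerInvariant()));
  }
}

// Stemmer: converts words to their base forms (trivial suffix stripping here).
public class NaiveStemmer : ITokenFilter
{
  public IEnumerable<Token> Apply(IEnumerable<Token> tokens)
  {
    foreach (var t in tokens) {
      if (t.Kind == "word" && t.Value.EndsWith("ing"))
        t.Value = t.Value.Substring(0, t.Value.Length - 3);
      yield return t;
    }
  }
}

// The chain itself: filters are applied one after another to the tokenizer output.
public static class TokenChain
{
  public static IEnumerable<Token> Process(
    IEnumerable<Token> tokens, params ITokenFilter[] filters)
  {
    return filters.Aggregate(tokens, (current, filter) => filter.Apply(current));
  }
}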

So for now we're using a different implementation of the Porter stemmer for English, and we're finishing our own implementation of a Russian stemmer based on the Snowball stemming algorithm for the Russian language. Further plans (not "right now", of course) include implementing either a Snowball grammar to .Net compiler, or simply a set of stemmers for the most common European languages.

Xtensive.Core.Serialization (namespace) and Xtensive.Xml (assembly)

It is our own serialization layer. Its differences from what .Net offers are:
- Strongly typed: .Net serialization uses boxing even for serializing values of such simple types as Int32.
- Identifier compression (used by binary serialization only): every identifier is serialized as-is just once (on its first appearance); afterwards a reference to it is serialized. This significantly reduces the binary serializer's output size (almost any type name / property name takes just 4 bytes) - see the sketch after this list.
- Arbitrary serialization formats - currently binary (fast) and XML (human readable & writable).
- Two selectable serialization strategies: quick (no queue) for small graphs & robust (with queue) for large ones. Moreover, the serialization strategy can be selected per property by the object serializer itself.
- An arbitrary set of value serializers (e.g. you can write a native value serializer for your own Vector type, so it is serialized not as an object in the graph, but as the value of some property).
- Well-known objects support. The .Net serialization layer requires type substitution for this (to an IObjectReference implementor), which is certainly not good. We don't - i.e. we always allow a particular object serializer to decide whether a new object should be created or an existing one should be used.
- Custom serialization queues support. This allows serializing graphs which simply can't fit in RAM - probably one of the worst limitations of the .Net serialization layer.
- Etc.
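To illustrate the identifier compression point, here's a minimal sketch of the idea (hypothetical illustration code, not our actual serializer):

using System.Collections.Generic;
using System.IO;

// An identifier is written as-is only on its first appearance; every later
// occurrence is written as a 4-byte reference to the already-known string.
public class IdentifierWriter
{
  private readonly Dictionary<string, int> ids = new Dictionary<string, int>();
  private readonly BinaryWriter writer;

  public IdentifierWriter(BinaryWriter writer)
  {
    this.writer = writer;
  }

  public void Write(string identifier)
  {
    int id;
    if (ids.TryGetValue(identifier, out id)) {
      writer.Write(id);           // seen before: just a 4-byte reference
    }
    else {
      ids.Add(identifier, ids.Count);
      writer.Write(-1);           // marker: a new identifier follows
      writer.Write(identifier);   // written as-is on the first appearance
    }
  }
}

The reader rebuilds the same dictionary in reverse, so the stream stays fully self-contained.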

As I wrote before, we need this to get the "default" content (common dictionaries, such as Currency, Sex, etc.) imported into the web site database, plus to handle download / product definitions (Product, Edition, Version, etc. in the web site model). Btw, for now just deserialization works (i.e. is tested) - for now we need just that; serialization will certainly be added shortly.