1. We're launching our new web site next week. New features include the announced ones, as well as few new ones. The most useful is global search: you can search on our support forum, news (blogs) and on the web site itself using a single search form on web site. Probably we'll add search in our Online help later.
2. The part of our team working on it during last 4 weeks is returning back to v4.0 now. We're targeting to deliver CTP in the beginning of May now - web site has taken 1.5 additional weeks.
3. Funny or not, but new web site development lead to appearance of few new projects, which will be publicly available ~ with release of DataObjects.Net 4.0:
Xtensive.Web
There are web crawler and e-mail sending components. Nothing very special.
So as you might expect, the whole search feature depends on this component. We simply crawl all we need to built-in database.
Xtensive.Fulltext
Here we have implemented a text analyzer (tokenizer & token processing chain) for full-text indexing & querying, highlight region selector and highlighter itself.
Usual question: why? The answer: although DO offers full-text indexing & search, it doesn't offer text highlighting (selection of "best matching" part of the text and highlighting found words in - independently of their forms). Finally, since we offer our own indexing engine in v4.0, the decision to built our own full-text indexing model there (based just on regular indexes - i.e. on our owns as well) was made far before this moment, so Xtensive.Fulltext will anyway help to solve some problems we'll get on this path further. But for now this part is used just for selecting best matching text parts and highlighting found words there.
Certainly we evaluated Lucene.Net highlighter, but found it simply not acceptable (don't remember why exactly now, but the reasons were serious - at least it couldn't handle Russian word forms and offers quite limited best matching fragment detection approach).
Token processing chain (default tokens are word, number, symbol, url, e-mail, etc.) may include arbitrary token filters, such as "stop words" eliminator, sentence detector and finally, stemmer (words to their base forms converter). Initially we tried to take Snowball stemmers from Lucene.Net, but found them worse then bad - most likely because they're generated for Java by some Snowball grammar compiler, and then - ported to .Net by some automatic tool (as the whole Lucene.Net project). Russian stemmer doesn't work at all there because of buggy encoding conversion; all stemmers work extremely inefficiently from the point of .Net code optimization rules.
So for now we're using just different implementation of Porter stemmer for English, and finishing our own implementation of Russian stemmer based on Snowball stemming algorithm for Russian language. Further plans (not "right now", of course) include implementation of either Snowball grammar to .Net compiler, or simply a set of stemmers for most common European languages.
Xtensive.Core.Serialization (namespace) and Xtensive.Xml (assembly)
It is our own serialization layer. Differences to what's offered by .Net are:
- Strongly typed. .Net serialization uses boxing even for serializing such simple type of values as Int32.
- Identifier compression (used by binary serialization only): every identifier is serialized as-is just once (on its first appearance); further a reference to it is serialized. This significantly reduces binary serializer's output size (almost any type name \ property name takes just 4 bytes).
- Arbitrary serialization formats - currently binary (fast) and XML (human readable & writeable).
- Two selectable serialization strategies: quick (no queue) for small graphs & robust (with queue) for large ones. Moreover, property serialization strategy can be selected by object serializer itself.
- Arbitrary set of value serializers (e.g. you can write native value serializer for your own Vector type - so serialize it not as an object in graph, but as a value of some property)
- Well-known objects support. .Net serialization layer requires type substitution for this (to IObjectReference implementor), which is certainly not good. We do not - i.e. we always allow a particular object serializer to make a decision if a new object should be created, or an old one should be used.
- Custom serialization queues support. This allows to serialize graphs which simply can't fit in RAM. That's probably one of the worst limitations of .Net serialization layer.
- Etc.
As I wrote before, we need this to get the "default" content (common dictionaries, such as Currency, Sex, etc...) imported to a web site database, plus handle download \ product definitions (Product, Edition, Version, etc. in web site model). Btw, for now just deserialization works (i.e. tested) - for now we need just it; but serialization will be certainly added shortly.
Thursday, April 03, 2008
Status update
Subscribe to:
Post Comments (Atom)
6 comments: