Mar
7
2008

Indexing and searching business entities using Lucene.Net Framework, part 3

Designing, with generics and reflection, a search engine that indexes and searches the content of your business entities without being intrusive.

Parts 1 and 2 are available at the following links:

  1. Indexing and searching business entities using Lucene.Net Framework, part 1
  2. Indexing and searching business entities using Lucene.Net Framework, part 2

Solution’s architecture

The main idea is to be able to declare which properties of a business entity must be indexed when that entity is saved or updated in the chosen persistence system.

To be as unintrusive as possible in our model, we quickly come to the idea of extending our business entities with meta-data. The issue then is that, at runtime, the Framework needs to know which meta-data to look for on the entity in order to index the content of the decorated properties.

As one of the goals is to have a Framework that manages the indexing and searching of any business entity, we could have written a simple class inheriting from System.Attribute in an assembly separate from our domain. That would have the drawback of being very intrusive in our domain, which would then have to reference that assembly. Another solution was needed.

As we have seen, the Framework needs to know the meta-data so that it can index the content of the property at runtime. This means that at development time it is entirely possible to generalize this information by using the generics of the .NET Framework 2.0. As we are talking about meta-data, the only constraint is that our class inherits from System.Attribute.

We therefore chose to define, in the domain assembly, a utility class inheriting from System.Attribute that serves as a decorator for the entity properties that need to be indexed.

The following picture shows an example domain for an application, to which we have added our SearchableAttribute attribute, used to decorate the Post and Page classes:

The Visual Studio solution is organized as a Domain Driven Design solution:

We have thus defined the new SearchableAttribute attribute in the innoveo.Blog.Domain assembly.
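
As a minimal sketch of what such a marker attribute and its use might look like: only the names SearchableAttribute and Post come from this post. The properties Id, Title and Body, and the choice to restrict the attribute to properties, are assumptions made for illustration.

```csharp
using System;

// Marker attribute meant to live in the domain assembly itself, so that the
// domain needs no reference to the indexing Framework.
// Assumption: restricting it to properties is a choice made for this sketch.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public sealed class SearchableAttribute : Attribute
{
}

// Hypothetical shape of a business entity. The property names are invented
// for illustration; the real Post class is only shown as a diagram.
public class Post
{
    // Not decorated, so it would not be indexed.
    public int Id { get; set; }

    // Decorated string properties are the ones whose content gets indexed.
    [Searchable]
    public string Title { get; set; }

    [Searchable]
    public string Body { get; set; }
}
```

The attribute carries no behavior of its own. It only marks properties, and the entity does not inherit from any technical class.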

The solution is organized as follows:

  • innoveo.Blog.DAL: Data access layer using the Euss O/R mapping tool
  • innoveo.Blog.Domain: Assembly containing our domain business entities
  • innoveo.Blog.Services: Layer exposing the different business services
  • innoveo.Blog.Web: Web presentation & web services layer
  • Blog: The web application

So much for the solution that will use our business entity indexing Framework. Let’s now have a closer look at the Framework itself!

Indexing Framework

First, here is the class diagram:

The role of each class of our Framework is as follows:

  • EntityIndexer manages an index and indexes the business entities
  • EntitySearcher lets you search business entities
  • EntityDocument is used by the EntityIndexer class to manage Lucene.Net Document instances
  • IndexPath is a utility class used to specify the location of the index

As you can see on the diagram, we use the generics of the .NET Framework 2.0, both to be able to look for any attribute decorating our business entities and to have a Framework that does not depend on any particular entity. This brings good flexibility at usage time, as it lets you index any property of type string of any business entity. All of this is done without being intrusive in our model.
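
The combination of generics and reflection described above can be sketched as follows. This is not the Framework's actual code, which is not shown in this post: the SearchableProperties helper, the IndexedAttribute attribute and the Article entity are all hypothetical names, and the sketch stops at collecting the values that an indexer would then hand over to Lucene.Net.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical marker attribute, defined by the application and not by the helper.
[AttributeUsage(AttributeTargets.Property)]
public sealed class IndexedAttribute : Attribute { }

// Hypothetical business entity used only to exercise the helper.
public class Article
{
    public int Id { get; set; }                      // not a string: ignored
    [Indexed] public string Title { get; set; }
    [Indexed] public string Body { get; set; }
    public string Notes { get; set; }                // not decorated: ignored
}

public static class SearchableProperties
{
    // TAttribute is supplied by the caller, so this code depends neither on
    // a specific attribute class nor on a specific entity type.
    public static IDictionary<string, string> Extract<TAttribute>(object entity)
        where TAttribute : Attribute
    {
        if (entity == null) throw new ArgumentNullException("entity");

        Dictionary<string, string> values = new Dictionary<string, string>();
        PropertyInfo[] properties =
            entity.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (PropertyInfo property in properties)
        {
            // Keep only readable, non-indexer string properties carrying TAttribute.
            if (property.PropertyType != typeof(string)) continue;
            if (!property.CanRead || property.GetIndexParameters().Length > 0) continue;
            if (!Attribute.IsDefined(property, typeof(TAttribute))) continue;

            string value = (string)property.GetValue(entity, null);
            if (value != null) values[property.Name] = value;
        }
        return values;
    }
}
```

Calling SearchableProperties.Extract<IndexedAttribute>(article) would return the Title and Body values keyed by property name. The attribute type is the only thing the caller has to provide, which is what keeps the domain free of any reference to the indexing code.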

Now that we know the architecture of our Framework, it is time to look more closely at the details of the implementation.

This post is cross-posted on innoveo blog and in French on my .NET community portal Tech Head Brothers.

Mar
7
2008

Indexing and searching business entities using Lucene.Net Framework, part 2

Designing, with generics and reflection, a search engine that indexes and searches the content of your business entities without being intrusive.

Part 1 is available at this link: Indexing and searching business entities using Lucene.Net Framework, part 1

Lucene.Net presentation

Lucene.Net is an open source project that comes from the Java world and is currently incubating at the Apache Software Foundation (ASF). It is a source code port of the Java Lucene indexing and searching engine to the .NET platform, written in C# and done class by class, API by API.

Apache Lucene is an efficient indexing and searching engine for text data. However, it offers no built-in support for documents such as Office Word or PDF files; you need to use extensions that extract the text content of a document before it can be indexed. The same is required for markup documents such as HTML.

Lucene.Net scrupulously follows the APIs defined in the classes of the original Java version of Lucene. API names and class names are preserved, adapted only to follow the naming guidelines of the C# language. For example, the method Hits.length() of the Java implementation is written Hits.Length() in the C# version.

Just like the APIs and classes, the algorithms of the Java version of Lucene are ported to the C# version. This means that an index created using the Java version of Lucene is 100% compatible with the C# version for reading, writing and updating. Therefore two processes, one written in Java and the other in C#, could run concurrent searches against the same index.

You can consult the documentation of the latest stable version, version 2.0, on the following page. To download the latest stable version, browse to this page. To get more information about Lucene, I recommend the pages dedicated to the Java version of Lucene, which are much more comprehensive.

Lucene.Net Architecture

The lowest layer is the storage layer (Storage). Above it sits the layer that accesses the index files (data access). This layer is used by both the indexing system and the searching system. On top of those we find a search layer and a search request parser layer, used by the searching part of Lucene.Net. Likewise, we find a parser layer and a document layer used by the indexing part of Lucene.Net.

To get more information about Lucene, I recommend reading the presentation on the Lucene website.

Now that we have a better view of what Lucene.Net is about, we will see in the next part how to use it to index the properties of our business entities.

Nov
16
2007

Indexing and searching business entities using Lucene.Net Framework, part 1

Designing, with generics and reflection, a search engine that indexes and searches the content of your business entities without being intrusive.

Introduction

Today, one feature that almost every web site implements is a way to index its content and give users the possibility to search that content across its pages. It is one of the simplest ways to improve the user experience on your web site.

Blogs brought categories and tags, which make it possible to label information. However, this useful method isn’t always sufficient, and it is then advisable to use real content indexing.

In this series of posts I propose to take a look at the indexing and searching method I implemented on the web site of innoveo solutions, my new company. I also hope to bring this system to my web site Tech Head Brothers soon.

Both web sites, innoveo solutions and Tech Head Brothers, were developed using Domain Driven Design. So, we started by defining a domain model with our business entities. In this layer we do not concentrate on technical aspects such as persistence; instead we concentrate on the domain we want to address.

One of the main ideas is to avoid being intrusive in the domain model: no inheritance from technical classes, and no link between this layer and any technical framework.

To achieve this goal we will use an O/R mapping tool (Euss) for the persistence of the business entities, as well as the Lucene.Net framework for the indexing part.

After quite a few discussions (thanks Didier ;) in which we asked ourselves whether we would be better off using a service offered by one of the big search players on the Internet, we finally decided to keep control of our search tool.

Wanting to be independent of any database and of services such as Full-Text indexing or Indexing Services, we decided to use Lucene.Net to avoid having to re-implement everything from scratch.

In the following posts, I will give an introduction to Lucene.Net, then present the architecture I chose for the indexing and searching framework, the implementation details of that framework, and finally an example of integration into a data access layer.

Jun
22
2007

Legacy code integration using Windows Communication Foundation (WCF) and Java Axis in a Service Oriented Architecture

What are the options when you need to integrate Windows legacy code in a heterogeneous Service Oriented Architecture (SOA)?

The problem was to expose a set of Windows C++ DLLs to a global SOA platform written in Java. Those DLLs would then be exposed as backend computation services.

One of the options used in some past projects was the Microsoft SOAP Toolkit, but your C++ DLLs need to be COM objects for that. I have used it and still have production code running with it. It works quite well, even with surprisingly complex data structures, but today it is definitely not the way to go.

The service oriented world has evolved rapidly over the last few years, and using such an old, deprecated technology (Microsoft SOAP Toolkit) is not a sound choice for a new project.

So at last it was time to check .NET in this entire Java world ;-)

The general idea was to define a layered solution. From bottom up I first defined an interoperability layer using .NET Interop to be able to call the C++ DLLs from .NET. Then on top of this first layer I added another layer exposing the whole as a web service.

Now that I had the backend web service working I had to call it from Java. So are all those promises of web service interoperability just working out of the box?

Yes and no. You first need to take a real look at the different framework stacks you want to use. For example, when you are working with Java Axis you are tied to SOAP 1.1, so you have to make sure that the other side can understand this version of the SOAP standard.

So what do I have? First, a client that must be written in Java using Axis 1.2-1.4 and SOAP 1.1; then a C++ Windows DLL backend that must be exposed in an interoperable way. Luckily, I was free to choose the technology used on that backend. Both ASP.NET web services (ASMX) and WCF can interoperate using the SOAP 1.1 standard, so I decided to go with C# and WCF.

I approached the problem iteratively: first a proof of concept integrating the C++ layer with the .NET layer, then the integration of this upper layer with WCF and a .NET client, and finally a Java client.

Because the interface exposed by the web service is simple, I did not write it using a data-contract-first design. In more complex scenarios I would definitely do so, because it leads to better success with interoperability.

So I finally adopted the following solution.

Adopted solution

On the backend, a Windows Communication Foundation web service written in C# using the .NET Framework 3.0. The web service layer uses .NET interop to call the legacy C++ DLLs.

On the client side, a Java 1.5 proxy/stub class generated using WSDL2Java from the WSDL exposed by the service.

Hosting

The last question concerning the web service was how to host it.

There were two possibilities in my case: a Windows Service or Internet Information Services (IIS).

After running some load tests on the web service I realized that it was leaking memory, so I finally went with the IIS solution because it provides all the services needed to recycle a buggy process. It was also easier because some parts of the deployment process for web services hosted in IIS were already implemented and tested.

Problems encountered

The first minor thing I went through was to check which versions of WCF and Axis are compatible.

Then the C++ interop was the main work because it is not such an easy task to marshal between the different worlds in a correct way.

And finally, the last one was that the legacy C++ DLLs were using ifstream to load configuration ini files, without any possibility of specifying the full path of those files. As the web application was running in the Application Pool process (w3wp.exe), the ini files were being loaded from the base path of w3wp.exe: C:\WINDOWS\System32\inetsrv\. That would have forced me to deploy the ini files in that folder. What an ugly solution. Adding the bin folder of the application to the PATH did not do the trick either. And as we couldn’t change the C++ DLLs, I had to find something else.

The solution I came to, thanks to David Wang (yes you also Richard ;-), was to use this little method before calling into the legacy Dlls, changing the current directory:

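        // Requires: using System.IO; using System.Web.Hosting;

        /// <summary>
        /// Sets the process's current directory to the application's bin folder,
        /// so that relative paths used by the legacy DLLs resolve against it.
        /// </summary>
        private static void SetCurrentDirectory()
        {
            string binpath = Path.Combine(HostingEnvironment.ApplicationPhysicalPath, "bin");
            Directory.SetCurrentDirectory(binpath);
        }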

Conclusion

I really like the way WCF handles the separation between the service description, the implementation and the configuration that defines how you expose the service to the world.

The other thing I still enjoy seeing is complex systems built on such different technologies talking to each other. Here we had a caller written in Java running on a Linux server, calling a web service written in C#, which in turn called multiple C++ DLLs running on a Windows server.

I am still amazed by those little things!!!

About Laurent

Laurent Kempé

Laurent Kempé is the editor, founder, and primary contributor of Tech Head Brothers, a French portal about Microsoft .NET technologies.

He has been employed by Innoveo Solutions since 10/2007 as a Senior Solution Architect and certified Scrum Master.

Founder, owner and Managing Partner of Jobping, which provides a unique and efficient platform for connecting Microsoft skilled job seekers with employers using Microsoft technologies.

Laurent was awarded Most Valuable Professional (MVP) by Microsoft from April 2002 to April 2012.

JetBrains Academy Member
Certified ScrumMaster