I'm currently researching the design of an application that needs to search a large number of documents (at least hundreds of millions) of various types (PDF, Word, HTML, etc.), and I'm looking for the best way to architect the Full-Text Indexing components.
The current database schema contains 10 relations that store documents, and each relation has a full-text index defined. A search must consider all documents, and therefore every relation and index, and it must be exhaustive. In other words, it must return all documents that contain the word or phrase x.
This pattern is currently implemented as a series of SELECT ... CONTAINSTABLE queries (one per relation) combined with the UNION operator.
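For discussion, here is a minimal sketch of that pattern. The table names (dbo.PdfDocuments, dbo.WordDocuments, dbo.HtmlDocuments), the key column DocumentID and the indexed column Content are hypothetical, and only three of the ten branches are shown. It uses UNION ALL rather than UNION: because each row is tagged with its source table, duplicate elimination is unnecessary and would only add a distinct/sort step over a potentially large result.

```sql
-- Hypothetical schema: each document table has a unique, non-nullable key
-- column DocumentID (used as the full-text KEY INDEX) and a full-text
-- indexed column Content.
DECLARE @search NVARCHAR(4000) = N'"full text"';  -- phrase x

SELECT N'PDF' AS SourceTable, d.DocumentID, ft.[RANK]
FROM CONTAINSTABLE(dbo.PdfDocuments, Content, @search) AS ft
JOIN dbo.PdfDocuments AS d ON d.DocumentID = ft.[KEY]
UNION ALL
SELECT N'Word', d.DocumentID, ft.[RANK]
FROM CONTAINSTABLE(dbo.WordDocuments, Content, @search) AS ft
JOIN dbo.WordDocuments AS d ON d.DocumentID = ft.[KEY]
UNION ALL
SELECT N'HTML', d.DocumentID, ft.[RANK]
FROM CONTAINSTABLE(dbo.HtmlDocuments, Content, @search) AS ft
JOIN dbo.HtmlDocuments AS d ON d.DocumentID = ft.[KEY]
-- ...one branch per remaining document table...
ORDER BY [RANK] DESC;  -- applies to the combined result
```

One caveat worth keeping in mind: RANK is computed separately by each full-text index, so ordering the combined result by RANK is at best approximate across tables.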
My question, then: is this the most suitable design for such a scenario?
Alternatively, what if I were to denormalize the various document relations into a single relation, so that a single full-text index could be built on it? Would that result in faster searches? On the other hand, keeping separate relations might be more advantageous if scale-out becomes a future need. What are your thoughts and experiences on such matters?
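As a rough sketch of the denormalized alternative, assuming a hypothetical consolidated table dbo.Documents and catalog DocumentsCatalog: raw file bytes go into one VARBINARY(MAX) column, and a TYPE COLUMN holding the file extension tells the full-text engine which filter to apply per row. This assumes the relevant IFilters (for example a PDF filter) are installed on the server.

```sql
-- Hypothetical consolidated table: one row per document of any type.
CREATE TABLE dbo.Documents
(
    DocumentID   BIGINT IDENTITY(1,1) NOT NULL,
    DocumentType NVARCHAR(8)    NOT NULL,  -- file extension, e.g. N'.pdf', N'.docx', N'.html'
    Content      VARBINARY(MAX) NOT NULL,  -- raw document bytes
    CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (DocumentID)
);
GO

CREATE FULLTEXT CATALOG DocumentsCatalog;
GO

-- TYPE COLUMN selects the filter (IFilter) used to extract text from each row.
CREATE FULLTEXT INDEX ON dbo.Documents
(
    Content TYPE COLUMN DocumentType LANGUAGE 1033
)
KEY INDEX PK_Documents
ON DocumentsCatalog
WITH CHANGE_TRACKING AUTO;
GO

-- One full-text query now covers every document. Population is asynchronous,
-- so results appear only once the index has been populated.
SELECT d.DocumentID, d.DocumentType, ft.[RANK]
FROM CONTAINSTABLE(dbo.Documents, Content, N'"full text"') AS ft
JOIN dbo.Documents AS d ON d.DocumentID = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```

With this layout, attributes specific to one document type could, as one possible design, stay in narrow side tables keyed by DocumentID, so that only the searchable content is consolidated.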
If I can provide any more details to enrich the discussion, please let me know. Your advice and guidance are, as always, very much appreciated!
John Sansom | SQL Server DBA Blog | @SQLBrit on Twitter | SQLBrit Community Forum - A place to share all the "other stuff" there is to being a Data Professional