Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

full-text search sql server 2005

I have got hold of a SQL Server 2008 Developer Edition, and this is my data:

    if exists (select * from dbo.sysobjects where id = object_id(N'test') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
        drop table test;

    create table test
    (
        Id INT IDENTITY NOT NULL primary key,
        data NVARCHAR(255) not null
    );

    insert into test (data) values (N'Hello world');
    insert into test (data) values (N'Hello j-world');

I would like to find all rows which contain j-world whilst avoiding LIKE for efficiency reasons.

If I try:

    select *
    from test
    where freetext(*, N'j-world');

I get all rows, which is incorrect. Do I have to implement my own word breaker or something? Can I actually use iFTS in this situation at all?

Thanks.

Christian

PS:

Let me cast my question more generically: how can I find hyphenated words using FTS (j-world is just an example)?


1 Answer


I do not quite understand why you want FTS. If you want an exact match, simply use LIKE:

  • SELECT * FROM test
    WHERE data LIKE '% world%'
    -- returns only: Hello world
  • SELECT * FROM test
    WHERE data LIKE '%j-world%'
    -- returns only: Hello j-world

If you want to play with FTS, create your own (custom) full-text stoplist and attach it to the full-text index.

  • I do not have SQL Server 2005, but I checked that this works in 2008.
    The docs say it is possible only at database compatibility level 100 (i.e. in SQL Server 2008),
    though you can still try it in 2005.

In SSMS, go to Databases\YourDatabaseName\Storage\Full Text Stoplists, right-click and choose "New Full-text StopList...". I named it vgvStoplist and made sure that the default "Create an empty stoplist" radio button was checked.

In SSMS, right-click table dbo.test --> Full-text index --> Properties --> Select a page: General --> Full-text Index Stoplist --> enter the name of the empty stoplist you created (I entered vgvStoplist).

Now the query

    select * from test where contains(data, N'"j-world"');

returns only 'Hello j-world' (without 'Hello world').

This can also be done through T-SQL; see the MSDN documentation.
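A rough T-SQL equivalent of the SSMS steps above, as a sketch for SQL Server 2008 or later. The catalog name ftCatalog is hypothetical and LANGUAGE 1033 (US English) is an assumption. Because the question's primary key is unnamed, the script looks up its system-generated name in sys.indexes before building the CREATE FULLTEXT INDEX statement, since KEY INDEX needs an index name.

```sql
-- Sketch only: ftCatalog is a hypothetical name; vgvStoplist matches the SSMS steps.
-- 1. Empty stoplist (same as "Create an empty stoplist" in SSMS).
CREATE FULLTEXT STOPLIST vgvStoplist;
GO

-- 2. Full-text catalog to hold the index.
CREATE FULLTEXT CATALOG ftCatalog;
GO

-- 3. Full-text index on dbo.test(data) that uses the empty stoplist.
--    KEY INDEX needs the name of a unique index; the PRIMARY KEY in the
--    question is unnamed, so fetch its system-generated name first.
DECLARE @pk sysname, @sql nvarchar(max);

SELECT @pk = name
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.test') AND is_primary_key = 1;

SET @sql = N'CREATE FULLTEXT INDEX ON dbo.test (data LANGUAGE 1033) '
         + N'KEY INDEX ' + QUOTENAME(@pk) + N' ON ftCatalog '
         + N'WITH STOPLIST = vgvStoplist;';
EXEC sp_executesql @sql;
GO

-- If the full-text index already exists, attach the stoplist instead:
-- ALTER FULLTEXT INDEX ON dbo.test SET STOPLIST vgvStoplist;

-- 4. After population finishes, the phrase query should match only the hyphenated row.
SELECT * FROM dbo.test WHERE CONTAINS(data, N'"j-world"');
```

Population runs in the background, so the final query may return nothing for a moment right after the index is created.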


==== Update:
Well, your question showed that the notion of noise is subjective.

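It worked because 'j' is a system stopword (cf. searching the scripted system stoplist (*) for the string 'j', 3 characters including the quotes; see also (**)) and '-' is, apparently, a word breaker. So 'j-world' is split into 'j' and 'world'; with the system stoplist, 'j' is discarded as noise, leaving just 'world', which both rows contain.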
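To see the word breaker and the stoplist at work without building an index, SQL Server 2008 and later provide sys.dm_fts_parser. A sketch, assuming US English (LCID 1033); the function requires sysadmin membership. Compare the rows returned with the system stoplist (stoplist_id 0) and with no stoplist (NULL): with the system stoplist, 'j' should be flagged as a noise word in the special_term column.

```sql
-- How the English word breaker splits "j-world", using the system stoplist (0).
-- Arguments: query string, LCID, stoplist_id, accent sensitivity (0 = insensitive).
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"j-world"', 1033, 0, 0);

-- Same string with no stoplist (NULL): 'j' is no longer treated as noise.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"j-world"', 1033, NULL, 0);
```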

I am not suggesting that you use an empty stoplist; I just illustrated the "how to" with minimal effort on my part.
Working out the technique that suits you is up to you. I am not enough of an expert in this domain to give advice; I just answered from common sense.

Create your own full-text stoplist and fill it with your content.
You might want to reuse the system stoplist content.
For this, you may want to create

  • (*) a separate script of the system stoplist,
    by creating one more full-text stoplist marked with "Create from the system stoplist" and then scripting it (to "File..." or to "New Query Editor Window"),

then create your own script by editing a copy of (*), using find-and-replace and/or copy & paste from (*).
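Instead of scripting the system stoplist and editing the script by hand, SQL Server 2008 can copy it directly with CREATE FULLTEXT STOPLIST ... FROM SYSTEM STOPLIST, after which individual words can be dropped. A sketch: myStoplist is a hypothetical name, and it assumes the column is indexed as 'English' (LCID 1033) and that 'j' is present in that language's system list.

```sql
-- Hypothetical stoplist name; copies all system stopwords for all languages.
CREATE FULLTEXT STOPLIST myStoplist FROM SYSTEM STOPLIST;
GO

-- Stop treating 'j' as noise for English (LCID 1033) text.
ALTER FULLTEXT STOPLIST myStoplist DROP 'j' LANGUAGE 'English';
GO

-- Switch the existing full-text index to the new stoplist.
-- Unless WITH NO POPULATION is specified, this repopulates the index.
ALTER FULLTEXT INDEX ON dbo.test SET STOPLIST myStoplist;
GO
```

Unlike the empty stoplist, this keeps common English noise words out of the index while still indexing 'j'.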

(**) Here is an excerpt from the scripted copy of the system full-text stoplist, which I named vgv_sys_copy:

ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'French';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Italian';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Japanese';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Dutch';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Russian';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Swedish';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'Simplified Chinese';
ALTER FULLTEXT STOPLIST [vgv_sys_copy] ADD 'j' LANGUAGE 'British English';
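The same information can be read from the catalog views instead of a scripted copy. A sketch (SQL Server 2008 or later) that lists every language whose system stoplist contains 'j':

```sql
-- System stopwords are stored per language (LCID); join to get readable names.
SELECT l.name AS language_name, sw.stopword
FROM sys.fulltext_system_stopwords AS sw
JOIN sys.fulltext_languages AS l
    ON l.lcid = sw.language_id
WHERE sw.stopword = N'j'
ORDER BY l.name;
```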

==== Update 2:
I posted a sub-question, "Performace gains of searching with FTS over it with LIKE on indexed colum(s)?"

I also noticed that my answer relies on features that are not available in SQL Server 2005.
There, noise words are kept in files such as MSSQL\FTData\noiseENG.txt, and I liked the answers to "Noise Words in Sql Server 2005 Full Text Search".

I would remove 'j'. In fact, if I were you, I would create noiseENG.txt from scratch. But it is your decision, depending on your context and many factors unknown to me.
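In SQL Server 2005 there are no stoplists, and rows that are already indexed are not re-tokenized just because the noise file changed; the index has to be repopulated. A sketch of that last step after editing noiseENG.txt, assuming a full-text index already exists on dbo.test in a catalog called ftCatalog (hypothetical name):

```sql
-- Re-index every row so the edited noise-word list takes effect.
ALTER FULLTEXT INDEX ON dbo.test START FULL POPULATION;

-- Poll the catalog; PopulateStatus 0 means idle (population finished).
SELECT FULLTEXTCATALOGPROPERTY(N'ftCatalog', 'PopulateStatus') AS populate_status;
```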

I believe you should post this as a separate question. I have already been banned multiple times on Stack Exchange sites (and still am on SF) for discussions. This is not a forum or discussion board; cf. the FAQ.



...