database design - What tag schema(s) are the most efficient/effective?


1 Answer

深蓝 · Answer 1 · 2021-10-23T18:28:26+0000

It all depends on the data volume and on the content-to-tag distribution and density ratios.

If you have a low tag distribution and density ratio (typical of human-generated data), you can simply generate a unique id or hash for each distinct collection of tags in use by the data, then associate that 'tag collection' id with each data instance that carries those tags.

This can work surprisingly well for many forms of human-generated data.
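
As a rough illustration of the "id or hash per tag collection" idea, here is a minimal Python sketch. The function name, the use of SHA-256, the truncation to 16 hex characters, and the lowercase normalization are all assumptions made for the example, not part of the answer above; a truncated hash trades a small collision risk for shorter keys.

```python
import hashlib

def tag_collection_id(tags):
    """Return a stable id for a set of tags, independent of order, case and duplicates."""
    # Normalize: strip whitespace, lowercase, drop empties and duplicates, then sort.
    canonical = sorted({t.strip().lower() for t in tags if t.strip()})
    # Join with a separator that is assumed never to appear inside a tag name.
    joined = "\x1f".join(canonical)
    # Hash the canonical form; 16 hex characters is an arbitrary choice for this sketch.
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]
```

Two data instances tagged `["sql", "Database-Design"]` and `["database-design", "sql"]` get the same id, so they share one 'tag collection'.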

e.g. Stack Overflow has ~500,000 questions and ~20,000 tags (too many dupe-ish tags!). Most questions have fewer than five tags. In the worst case you will have 500,000 'tag collection' ids to associate, but more realistically you will have several thousand.

You will also need either instance tracking or garbage collection on the 'tag collection' collection, as specific combinations of tags fall out of use.
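
One way to do the instance tracking is plain reference counting. The sketch below keeps it in memory; the class and method names are hypothetical, and in a real database the count would live in a column on the tag collection row.

```python
class TagCollectionRegistry:
    """Tracks how many data instances use each tag collection id."""

    def __init__(self):
        # tag collection id -> number of data instances currently using it
        self.instance_count = {}

    def acquire(self, collection_id):
        # Call when a data instance starts using this tag collection.
        self.instance_count[collection_id] = self.instance_count.get(collection_id, 0) + 1

    def release(self, collection_id):
        # Call when a data instance is deleted or retagged.
        remaining = self.instance_count[collection_id] - 1
        if remaining == 0:
            # Nothing uses this combination any more: collect it.
            del self.instance_count[collection_id]
        else:
            self.instance_count[collection_id] = remaining
```

Counting on every write keeps cleanup immediate; the alternative is a periodic sweep that deletes tag collections no data row references.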

e.g.

Tag: id, tagName
TagCollection: id, instanceCount
TagCollectionTag: tagCollectionId, tagId
Data: id, title, content, tagCollectionId

Inserting tags is fast if a hash is used (a hash over all tags in the collection). Otherwise you have to search the TagCollection and TagCollectionTag collections, but these should not be too large anyway.

Searching is fast: search TagCollectionTag for the tag collections containing the specific set of tags, and then find the Data rows with any of those tagCollectionIds.
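
The schema, insert path and search path described above can be sketched end to end with Python's built-in sqlite3 module. The table and column names follow the answer; the choice of SQLite, the column types, the SHA-256 based id and the helper names `collection_id`, `insert_data` and `search` are assumptions made for illustration only.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE Tag (id INTEGER PRIMARY KEY, tagName TEXT UNIQUE NOT NULL);
CREATE TABLE TagCollection (id TEXT PRIMARY KEY, instanceCount INTEGER NOT NULL);
CREATE TABLE TagCollectionTag (
    tagCollectionId TEXT NOT NULL REFERENCES TagCollection(id),
    tagId INTEGER NOT NULL REFERENCES Tag(id),
    PRIMARY KEY (tagCollectionId, tagId)
);
CREATE TABLE Data (
    id INTEGER PRIMARY KEY, title TEXT, content TEXT,
    tagCollectionId TEXT REFERENCES TagCollection(id)
);
"""

def collection_id(tags):
    # Hash of the sorted, de-duplicated tag names (hypothetical id scheme).
    joined = "\x1f".join(sorted(set(tags)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]

def insert_data(conn, title, content, tags):
    cid = collection_id(tags)
    exists = conn.execute("SELECT 1 FROM TagCollection WHERE id = ?", (cid,)).fetchone()
    if exists is None:
        # First use of this tag combination: create it and link its tags.
        conn.execute("INSERT INTO TagCollection (id, instanceCount) VALUES (?, 0)", (cid,))
        for name in set(tags):
            conn.execute("INSERT OR IGNORE INTO Tag (tagName) VALUES (?)", (name,))
            tag_id = conn.execute("SELECT id FROM Tag WHERE tagName = ?", (name,)).fetchone()[0]
            conn.execute("INSERT INTO TagCollectionTag VALUES (?, ?)", (cid, tag_id))
    # Instance tracking: one more data row uses this collection.
    conn.execute("UPDATE TagCollection SET instanceCount = instanceCount + 1 WHERE id = ?", (cid,))
    conn.execute(
        "INSERT INTO Data (title, content, tagCollectionId) VALUES (?, ?, ?)",
        (title, content, cid),
    )

def search(conn, tags):
    """Return titles of data rows whose tag collection contains all given tags."""
    wanted = sorted(set(tags))
    marks = ",".join("?" * len(wanted))
    rows = conn.execute(
        f"""
        SELECT d.title FROM Data d
        WHERE d.tagCollectionId IN (
            SELECT tct.tagCollectionId
            FROM TagCollectionTag tct JOIN Tag t ON t.id = tct.tagId
            WHERE t.tagName IN ({marks})
            GROUP BY tct.tagCollectionId
            HAVING COUNT(DISTINCT t.id) = ?
        )
        ORDER BY d.id
        """,
        (*wanted, len(wanted)),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
insert_data(conn, "Q1", "body", ["sql", "tagging"])
insert_data(conn, "Q2", "body", ["sql", "tagging"])
insert_data(conn, "Q3", "body", ["sql", "schema"])
```

Here three data rows share only two tag collections, so the inner query in `search` scans the small TagCollectionTag table rather than a per-row tag mapping.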

Hope that wasn't too confusing :-)
