I have program that loads a (sometimes) large CSV file into an array. The data cannot be sorted, and I do not know if the data is text or numbers. This is up to customers.
Example could be
1;JOHN;DOE
2;JANE;DOE;
3;BOBBY;NOTABLES
but it could also be strings
MB9384HJ;TEST1
B9284918;TEST2
The number of lines could be up to a few million.
I would like to seach for a specific value in a column (which one is known ahead of time, this is my "key index column"). Assume this is unique. Key is to find which row this column is in.
Currently the code is traversing from 1..n and comparing. This is obviously slower towards the end.
I am considering these options:
- a memory SQLite database with key index value and record number
- a TStringDictionary with key, record as the pairs
- a Hashed stringlist
My idea is: instead of traversing the array, I query the index for the key (client provides item to search for, it must be random-access). Then I immediately get the rownumber of the array, and I can fetch the data.
Which of the these (or other, if any) would be a better path to follow ?
question from:
https://stackoverflow.com/questions/65856825/index-type-for-searching-large-unsorted-array 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…