Today’s topic is a short one, inspired by this question in the forums: “Comparing methods for storing/searching through large list of words“.
Update (22 June 2014) – When I originally wrote this article, I unwittingly used two different data sources for my word lists. This has been corrected. If you previously read this article, please scan ahead to see the corrected data.
You want to make a word game (in Corona) and are wondering what the best way is to store and retrieve/search for words. Once you determine what your options are, you will want to ask these questions:
- Memory Efficiency – Of the options, which has the smallest memory footprint?
- Note: This is actually a two part question. You must account for persistent storage (the data on ‘disk’) and the in-memory storage (when you load the data for use.)
- Speed – Which is faster for lookups?
- Versatility – Which is more versatile?
Then, armed with some data, you will have to choose an option based on your design needs and trade-offs between efficiency vs. speed vs. versatility.
The two primary options available to you are:
- SQLite 3 Database (DB) – Although you may create a DB at launch time, most people pre-encode the DB and load it from a file, using the “sqlite3” Lua library.
- Table – Another choice you have is to load all of your words into a massive Lua table.
- Custom – Beyond these two basic options, you have myriad other choices available to you (depending upon your coding skills.) However, I will only be dealing with SQLite and Tables in this article.
So, let’s examine our two choices based on the aforementioned questions.
- Memory Efficiency – SQL databases are very efficient when used to store plain text.
- Persistent Storage – In my testing, the database file encoded to about half the size of the table version.
- In-Memory Storage – For lists of any reasonable length, SQL will always win this comparison. Why? Because, for SQL databases, only a very small part of the database is loaded into memory. The rest is kept in persistent storage and accessed as needed.
- Speed – SQL databases are pretty fast, especially on iOS devices. However, in some cases table lookups are 10’s if not 100’s of times faster. I’ll elaborate below.
- Versatility – SQL databases are highly versatile. If we only think about searches, we can still see that SQL allows us to do partial word searches, regular expressions, and to search for multiple matches. Although it is ‘technically’ possible to do this with tables, it is not easy.
- Memory Efficiency – Tables get no efficiency bonus. At best, the cost of storing a word indexed list of words will be two bytes per letter per word. In all likelihood, the storage size will actually be higher than this by a little bit.
- Speed – As I hinted above, if we restrict ourselves to exact word searches, tables are massively faster than SQL, hands-down.
- Versatility – As I said above, anything other than an exact word look-up using tables will be slow and technically challenging to code.
Numbers To Back Up My Claims
Everybody loves numbers, so here are some interesting ones to back up my memory and speed remarks above. Note: This is the part I corrected when I updated the article. Now, both the DB and the table use a list of 81,520 words.
- Memory Efficiency
- Persistent: 1.349 MB
- In-Memory: ~0.28 KB
- Persistent: 1.09 MB
- In Memory: ~7865.71 KB
- Speed – Search the ‘list’ for N iterations. On each iteration randomly select one of ten known to be in the list words as the search item.
- SQLite 3 – 1000 iterations == 16,249 milliseconds
- Table – 1000 iterations == <1 millisecond
- Table – 100,000 iterations == 58 milliseconds
The code from my research is too long to list, but you can get it here to try your own tests.
- SQLite Database Brower – This tool can be used to encode data into SQLite3 loadable databases.
- 12 Dicts Word Lists – Great word lists for your games.