To get a clear understanding of the on-disk layout, I had to patch the SASI source code to introduce debugging points throughout the whole life-cycle of On Disk Index building and dump the content to a file. The reason is that the source code is quite abstract (frequent use of generics and polymorphism to mutualise code, which is very good) and very low level (usage of bit operators for performance). Future updates to the SASI source code may require manual merging when applying this patch.

The Data Block contains the indexed data with matching token value(s) and offset(s). Each entry in the Non SPARSE Term Block is composed of a Partial Bit, which tells whether the current term represents the original term or is one of its suffixes. The term itself is then written, followed by a 0x0 byte and then a Token Tree offset. Padding fills the remainder of the block's fixed size. An excerpt of the debug output:

Terms Count : 20, Offsets [0, 9, 21, 34, 43, 54, 63, 73, 85, 99, 109, 125, 133, 143, 151, 164, 179, 193, 204, 215]
Data Term (partial ? …

When searching, SASI navigates from the Root Pointer Level down to the last Pointer Level. From this last Pointer Level, SASI knows in which Data Block it should look for the actual matched value, if any, because the Pointer Term keeps a reference to the Data Block index. If the index mode is CONTAINS and the user issues a prefix or equality search, SASI will only use stored terms that have their Partial Bit = false.

When the index build finishes, it triggers the stitching of index segments together, if there is more than one segment. The stitching phase is necessary because the terms are sorted within each segment but not globally. This is expected, because SASI index files follow the SSTable life-cycle.

Generally speaking, there is a limit on the number of returned rows above which querying with SASI (or any secondary index) is slower than a full table scan using ALLOW FILTERING and paging. The reason is that reading the index files into memory has a cost, and this cost only increases as the returned result set grows. But above such a LIMIT threshold, adding more predicates is beneficial because you reduce the number of returned rows and thus limit Cassandra's sequential scans. You'll still be able, for example, to search for users whose account has been created within a wide range of dates.
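The Non SPARSE Term Block entry layout described above (Partial Bit, the term bytes, a 0x0 separator, then a Token Tree offset) can be sketched as a toy encoder. The one-byte flag and 4-byte big-endian offset widths here are illustrative assumptions, not SASI's actual on-disk encoding:

```python
import struct

def encode_term_entry(is_partial: bool, term: bytes, token_tree_offset: int) -> bytes:
    """Toy encoding of a Non SPARSE Term Block entry: a partial bit,
    the term itself, a 0x0 separator byte, then a Token Tree offset.
    Field widths are assumptions for illustration only."""
    partial_bit = b"\x01" if is_partial else b"\x00"
    return partial_bit + term + b"\x00" + struct.pack(">I", token_tree_offset)

# An original (non-partial) term pointing at Token Tree offset 164:
entry = encode_term_entry(False, b"roger", 164)
print(entry.hex())  # → 00726f67657200000000a4
```

A suffix entry would carry a partial bit of 0x01, which is exactly what lets a CONTAINS-mode index skip partial terms when serving a prefix or equality search.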
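The descent from the Root Pointer Level down to a Data Block amounts to a binary search over sorted boundary terms. In this sketch the last Pointer Level is modeled as one flat sorted list of boundary terms — a deliberate simplification of SASI's block-structured pointer levels:

```python
from bisect import bisect_right

def locate_data_block(boundary_terms: list, term: str) -> int:
    """Return the index of the Data Block that may contain `term`,
    given the sorted boundary terms of the last Pointer Level.
    The actual match (if any) must then be checked inside that block."""
    return bisect_right(boundary_terms, term)

# Boundary terms partition the term space into four Data Blocks:
print(locate_data_block(["f", "m", "s"], "kiwi"))   # → 1
print(locate_data_block(["f", "m", "s"], "apple"))  # → 0
```

This mirrors the text above: the pointer levels only narrow the search down to a Data Block index; the block itself still has to be read to find the actual matched value, if any.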
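The stitching phase — merging segments whose terms are sorted locally but not globally — is essentially a k-way merge. A minimal sketch of the idea (not the actual SASI implementation):

```python
import heapq

def stitch_segments(segments):
    """Merge per-segment sorted term lists into one globally sorted,
    de-duplicated stream, as the stitching phase must do when an
    index was built from more than one segment."""
    merged, last = [], None
    for term in heapq.merge(*segments):
        if term != last:  # the same term may appear in several segments
            merged.append(term)
            last = term
    return merged

print(stitch_segments([["apple", "kiwi"], ["banana", "kiwi", "pear"]]))
# → ['apple', 'banana', 'kiwi', 'pear']
```

Because each input is already sorted, the merge streams through the segments without re-sorting, which is why stitching at flush time is cheap compared to a global sort.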