Go4Hive

Improving Hive’s semantic search performance

BY: @blocktrades | CREATED: Aug. 1, 2025, 9:58 p.m. | VOTES: 869 | PAYOUT: $166.57

[IMAGE: https://images.hive.blog/DQmSihw8Kz4U7TuCQa98DDdCzqbqPFRumuVWAbareiYZW1Z/blocktrades%20update.png]

HiveSense is a HAF app that creates semantic embeddings for Hive posts. These embeddings will be used by the new HiveSense API calls to find posts that are similar to each other and to find posts that match a user’s search request.

A while back, one of our devs posted about our early work on HiveSense, a semantic search API server for Hive posts.

Today’s post describes the follow-on work in the past 2.5 months as we prepare for an official release of HiveSense as part of the standard HAF API stack, so this post assumes you’ve previously read the original post about HiveSense.

This post is even more technical than the previous one, so the primary audience is devs interested in practical considerations associated with using semantic search in their apps or who would like to contribute to HiveSense development in the future.

Performance Optimizations

Use Ollama’s batch API to generate multiple embeddings in a single call
Add support for, and default to, 16-bit floating-point vectors. This reduced the 768-dimension embeddings table size from 100GB to 50GB.
Use PySBD for sentence detection instead of spaCy (smaller)
Dramatically reduced size of docker images (many changes here, but biggest win was removal of pgai from default config)
Allow HiveSense to process data while hivemind is still in massive sync. Originally HiveSense could only be synced after hivemind reached livesync, and hivemind takes about 2.5 days to sync on a very fast system. Since HiveSense syncs faster than hivemind and can now sync in parallel with it (except for the time needed to create the HNSW index), HiveSense now adds only 1.5 hours to the overall time to sync a HAF API node from scratch.
Redesigned worker thread implementation
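
To illustrate the batching item above, here is a minimal sketch of requesting several embeddings in one call. It assumes Ollama's /api/embed endpoint, which accepts a list of strings under "input", and a server on the default local port. The model name is left to the caller, and the helper names are made up for this example rather than taken from HiveSense's code.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/embed"


def build_batch_request(model: str, texts: list[str]) -> urllib.request.Request:
    """Build one POST request that asks for embeddings of all texts at once."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_batch(model: str, texts: list[str]) -> list[list[float]]:
    """Send the batch and return one embedding per input text, in input order."""
    with urllib.request.urlopen(build_batch_request(model, texts)) as response:
        return json.load(response)["embeddings"]
```

Sending many chunks per request avoids paying the HTTP round trip once per chunk.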

Recall Optimizations (“recall” here basically refers to the quality of the search results)

Chunk long posts on sentence boundaries to capture semantic meaning better in the chunks
Chunk all of a post instead of limiting a post to a max of 3 chunks. This creates more embeddings, but allows for better matching of long posts, especially if they switch topics.
Target chunk size based on number of tokens rather than raw character count
Prepend the title to the post so the title will be used when calculating the embedding of the post’s first chunk
Change the minimum word count for posts to a minimum token count, make the token count configurable, and default the minimum to 75 tokens
Add many options to ease switching to better embedding models in the future and to examine tradeoffs between models
Use a query prefix to improve embeddings generated for search queries
Don’t discard non-ASCII characters (these were being removed for embedding calculations)
Improve filtering of HTML
Increase the HNSW index parameters m and ef_construction to improve recall quality, especially for “small queries” that sometimes got poor search results.
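
The chunking items above (sentence boundaries, token-based target size, title prepended to the first chunk) can be sketched roughly as follows. This is only an illustration: the regex splitter stands in for PySBD, the whitespace count stands in for the embedding model's tokenizer, and all function names are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter; a stand-in for a real detector such as PySBD."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_tokens(text: str) -> int:
    """Whitespace token count; a stand-in for the embedding model's tokenizer."""
    return len(text.split())


def chunk_post(title: str, body: str, target_tokens: int) -> list[str]:
    """Pack whole sentences into chunks of roughly target_tokens tokens."""
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    # Prepending the title lets it influence the first chunk's embedding.
    for sentence in split_sentences(f"{title}. {body}"):
        n = count_tokens(sentence)
        # Start a new chunk rather than splitting a sentence in the middle.
        if current and current_tokens + n > target_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because every sentence lands in some chunk, a long post produces as many chunks as it needs instead of being cut off after a fixed number.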

Miscellaneous changes

Track the number of tokens in each post and order the embeddings generated for each post
Allow filtering out shorter posts at search time (previously this could only be done at index time)
Change API results to make paging deterministic (in progress)

Redesign of embeddings tables and indexing methodology

Originally we just used a single table with 768-dimension vectors and built an HNSW index on this table. Both the table and the index were originally 100GB each in size (i.e. 200GB total storage required by HiveSense). Our first optimization was to use 16-bit precision numbers instead of 32-bit, which cut storage requirements in half (to 100GB in total, which seemed like a reasonable amount of storage).
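
For a feel of what the switch to 16-bit costs and saves, here is a small standard-library sketch that packs one made-up 768-dimension vector both ways. It only illustrates the size and rounding tradeoff and is not how HiveSense stores its vectors.

```python
import struct


def pack_vector(values: list[float], fmt_char: str) -> bytes:
    """Pack a vector as little-endian floats: 'f' is 32-bit, 'e' is 16-bit."""
    return struct.pack(f"<{len(values)}{fmt_char}", *values)


def unpack_half(data: bytes) -> list[float]:
    """Decode a 16-bit packed vector back into Python floats."""
    return list(struct.unpack(f"<{len(data) // 2}e", data))


# Made-up embedding with components between -0.3 and 0.3.
vector = [0.1 * (i % 7) - 0.3 for i in range(768)]

full = pack_vector(vector, "f")
half = pack_vector(vector, "e")
restored = unpack_half(half)

# Largest rounding error introduced by storing the vector at half precision.
max_error = max(abs(a - b) for a, b in zip(vector, restored))

print(len(full), len(half))  # 3072 1536
print(max_error)
```

Each vector shrinks from 3072 to 1536 bytes, which matches the halving of the table size, at the price of a small rounding error per component.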

But another problem we found was that it was very time consuming to create an HNSW index this large. On systems without a LOT of memory, it would take quite a few days. On systems with 128GB of RAM installed, this time could be cut down to around 8.5 hours (the current code that computes this index really favors having sufficient memory for the index creation), but this seemed a steep requirement for most API servers (we have internal servers that have this much memory, but the cloud servers we rent only have 64GB).

The solution we arrived at was to create a secondary, smaller embeddings table and build an HNSW index on that smaller table, then use the larger embeddings table for the final similarity computation.
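
In outline, this two-stage lookup works like the sketch below. A brute-force scan over the small vectors stands in for the HNSW index, cosine similarity is an assumed metric, and the function and parameter names are invented for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(query_small, query_full, small_table, full_table, candidates, k):
    """Shortlist with the small vectors, then rerank the shortlist with full ones."""
    # Stage 1: cheap pass over the reduced vectors (the index lookup in practice).
    shortlist = sorted(
        small_table,
        key=lambda pid: cosine(query_small, small_table[pid]),
        reverse=True,
    )[:candidates]
    # Stage 2: exact similarity on full-size embeddings, shortlist only.
    reranked = sorted(
        shortlist,
        key=lambda pid: cosine(query_full, full_table[pid]),
        reverse=True,
    )
    return reranked[:k]
```

The shortlist needs to be larger than k so that posts ranked slightly too low by the small vectors can still be recovered in the second stage.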

HiveSense uses principal component analysis (PCA) to generate a second embeddings table with much smaller 128-dimension vectors (table size 9GB) and a much smaller HNSW index on this table (the new HNSW index is only 16GB). This new index only takes 1.5 hours to build and only requires 28GB of memory (it can be built in 4.5 hours on systems with less memory).
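
For readers unfamiliar with PCA, the idea is to find the directions in which the vectors vary most and keep only the coordinates along those directions. Below is a toy pure-Python sketch using power iteration with deflation. It is meant for tiny inputs only, a real pipeline would more likely use a numerical library, and nothing here reflects how HiveSense actually computes its projection.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every row."""
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
    return [[r[d] - means[d] for d in range(dims)] for r in rows]


def top_component(rows, iterations=200, seed=0):
    """Estimate the leading principal direction by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in rows[0]]
    for _ in range(iterations):
        # Multiply v by X^T X without materialising the covariance matrix.
        scores = [dot(r, v) for r in rows]
        v = [sum(s * r[d] for s, r in zip(scores, rows)) for d in range(len(v))]
        norm = math.sqrt(dot(v, v))
        if norm == 0.0:
            break  # no variance left to explain
        v = [x / norm for x in v]
    return v


def pca_reduce(rows, n_components):
    """Project rows onto their first n_components principal directions."""
    centered = center(rows)
    residual = centered
    components = []
    for _ in range(n_components):
        v = top_component(residual)
        components.append(v)
        # Deflate: remove the found direction before looking for the next one.
        residual = [[x - dot(r, v) * c for x, c in zip(r, v)] for r in residual]
    return [[dot(r, c) for c in components] for r in centered]
```

Calling pca_reduce(rows, 128) on 768-dimension rows would give 128 numbers per row, although at HiveSense's scale this loop would be far too slow to be practical.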

Storage-wise, with all the optimizations, we reduced total storage usage from 100+100=200GB down to 50+9+16=75GB.

This approach also dramatically speeds up API query time as we’re searching a much smaller index, but we don’t have full statistics for this yet (our guess is somewhere between 3x and 10x faster).

Of course, we did have one concern about this approach: we needed to ensure it didn’t negatively affect recall. To check this, we ran various queries both as a full brute-force search of the embeddings and as a search using the new index, and confirmed the results didn’t change significantly.
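
One common way to put a number on this kind of comparison is recall@k: the share of the brute-force top-k results that the indexed search also returns. The sketch below shows that metric. It is an assumed way of scoring the comparison, not necessarily the exact procedure we used.

```python
def recall_at_k(exact_ids: list[str], approx_ids: list[str], k: int) -> float:
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact_top & set(approx_ids[:k])) / len(exact_top)


def mean_recall(pairs, k: int) -> float:
    """Average recall@k over (exact_ids, approx_ids) pairs, one pair per query."""
    return sum(recall_at_k(exact, approx, k) for exact, approx in pairs) / len(pairs)
```

A mean close to 1.0 across a varied set of queries indicates the smaller index is returning essentially the same posts as the exhaustive search.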

New Sync Mode for HiveSense

A normal CPU is sufficient to generate embeddings for short text phrases like those used for search queries, but generating semantic embeddings for posts is too computationally intensive, so a GPU is required to generate them at a reasonable speed.

We didn’t want to force API node operators to have a GPU, so HiveSense can be configured to operate in two different modes: independent mode and sync mode.

In independent mode, HiveSense expects to have access to one or more Ollama servers with GPUs providing computation power.

In sync mode, the embeddings for posts are fetched from another HiveSense server, so the local HiveSense server only needs to compute embeddings for user search queries (which can be computed with an Ollama server powered by just a reasonable CPU).

As we don’t expect most current API nodes to have access to a GPU (our primary API node, api.hive.blog, doesn’t), we expect most API node operators will configure HiveSense to operate in sync mode, sparing their server from repeating the expensive computations required for computing post embeddings.

What’s next for HiveSense?

We need to change the API to stabilize paging of search results based on our new approach: we will return 1000 results, with the first 20 results including permlink + summary results for the post, and the remaining results just providing permlinks. Client-side apps will need to fetch further post summaries if the user pages beyond the first page.
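
As a sketch of how a client might work with such a response, consider the following. The final API is still in progress, so the SearchHit type, its field names, and both helper functions are hypothetical. Only the numbers (20 results with summaries, the rest as permlinks) come from the plan above.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 20  # results that arrive with summaries, per the plan above


@dataclass
class SearchHit:
    permlink: str
    summary: Optional[str] = None  # None means "fetch this summary on demand"


def shape_response(ranked, summaries, with_summary=PAGE_SIZE):
    """Attach summaries to the first page only; later hits carry just permlinks."""
    return [
        SearchHit(p, summaries.get(p) if i < with_summary else None)
        for i, p in enumerate(ranked)
    ]


def page_needing_summaries(hits, page, page_size=PAGE_SIZE):
    """Return permlinks on a 0-based page whose summaries still have to be fetched."""
    start = page * page_size
    return [h.permlink for h in hits[start:start + page_size] if h.summary is None]
```

Since the client holds the full ranked list after one call, moving between pages is just slicing that fixed list, so the order cannot shift between page requests.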

We need to update our app testing API server, api.syncad.com, with the new stack so that Hive apps can add support for HiveSense and perform “real-world” testing.

Finally, we need to officially release HiveSense alongside the other updated HAF apps. Currently I expect that to happen near the end of this quarter (sometime in September).

TAGS: [ #HiveDevs ] [ #hive ] [ #blockchain ] [ #software ] [ #blocktrades ]

Replies

@hatoto | Aug. 1, 2025, 10:19 p.m. | Votes: 0

That is a really important update. Thanks a lot for working on it!

@holoz0r | Aug. 2, 2025, 12:28 a.m. | Votes: 0

Great news. Hive search can really use these improvements for content discoverability.

@celeste413 | Aug. 2, 2025, 3:26 a.m. | Votes: 0

I don't fully understand it, but it sounds impressive 👍

@theguruasia | Aug. 2, 2025, 8:14 a.m. | Votes: 0

$WINE

@wine.bot | Aug. 2, 2025, 8:15 a.m. | Votes: 0

Congratulations, @theguruasia You Successfully Shared 0.300 WINEX With @blocktrades.
You Earned 0.300 WINEX As Curation Reward.
You Utilized 3/5 Successful Calls.


@latinowinner | Aug. 2, 2025, 8:49 a.m. | Votes: 0

very technical information

@spiritabsolute | Aug. 2, 2025, 2:58 p.m. | Votes: 0

What would we do without you? It's good that you exist! Well done!

@amazing23 | Aug. 2, 2025, 9:18 p.m. | Votes: 0

This is a kind of encouragement for Hive communities

@mahirv | Aug. 2, 2025, 11:22 p.m. | Votes: 0

This is a very important update. Thank you very much for the hard work on this.

@tallyban70 | Aug. 3, 2025, 3:04 a.m. | Votes: 0

I can't wait for September.
Thank you for taking this initiative.
This is highly creative.

@marpasifico | Aug. 3, 2025, 5:58 a.m. | Votes: 0

Excellent, thorough explanation that provides a glimpse into how to navigate successfully.

@chewsk1 | Aug. 9, 2025, 5:30 p.m. | Votes: 0

Why is blocktrades.us no longer live?

@blocktrades | Aug. 11, 2025, 6:59 a.m. | Votes: 0

Details here: https://hive.blog/blocktrades/@blocktrades/blocktrades-ending-its-cryptocurrency-trading-service-as-of-june-30th-2023-today

@raymondelaparra | Aug. 13, 2025, 3:26 p.m. | Votes: 1

@blocktrades excellent information

@hivebuzz | Aug. 16, 2025, 7:45 a.m. | Votes: 0

Congratulations @blocktrades! You have completed the following achievement on the Hive blockchain And have been rewarded with New badge(s)

You distributed more than 200000 upvotes. Your next target is to reach 210000 upvotes.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

@baby1 | Aug. 18, 2025, 11:40 p.m. | Votes: 0

Indeed the hivesense project is a welcome development. I can't wait to see more and more good developments on Hive.

@jazlove | Aug. 30, 2025, 7:49 a.m. | Votes: 0

Very interesting updates. It’s really cool to see all the changes and improvements.

@vanny.vvvlog | Aug. 31, 2025, 3:56 a.m. | Votes: 0

Great news! Indeed, it's nonstop innovation for more efficient discovery!

@nsanwalji | Sept. 19, 2025, 12:54 a.m. | Votes: 0

Very important information.

@salmeron-sw | Sept. 24, 2025, 2:48 a.m. | Votes: 0

Interesting information, it will surely be very useful for the ecosystem.

@dorothy35 | Oct. 8, 2025, 3:04 a.m. | Votes: 0

Great update! I really appreciate how you keep improving Hive's performance and making things run smoother for everyone. The changes in the embeddings tables and indexing may be technical, but they show how much dedication goes into keeping Hive fast and reliable. Thanks for all your hard work!

@bilpcoinbpc | Oct. 14, 2025, 6:32 p.m. | Votes: 1

https://peakd.com/hive-124838/@meno/re-ackza-t43f2y

@veosine | Oct. 23, 2025, 5:09 a.m. | Votes: 0

“Stock exchanges India, Hong Kong, and Australia Feel the Impact of the US–China Trade War”
