
Aug 15, 2024


Nick Zadrozny

Best Practices

3 min read

Search 101: Event Queue, Streaming and Buffering Best Practices for OpenSearch and Elasticsearch

In enterprise and production-grade search applications, event queueing and buffering are crucial for maintaining data integrity and ensuring seamless operations. Kafka, a distributed event streaming platform, plays an important role here: by buffering writes, it lets you perform maintenance and upgrades with essentially no downtime.

The Key Uses of Kafka in Enterprise Search

  1. Search Engines vs. Databases: Your search engine is not your database, and treating it as one risks losing data, because a search index does not offer the durability guarantees of a system of record. A durable log like Kafka retains write events so they can be replayed after a failure.
  2. Buffering for Maintenance and Upgrades: Using Kafka, you can manage Elasticsearch indices without downtime. Buffer incoming writes into Kafka; this allows maintenance or version upgrades on your Elasticsearch cluster without stopping data ingestion.
  3. Multiple Index Management: Kafka shines when you need to update or maintain multiple search indices concurrently. Maintaining more than one index is essential for tweaking search algorithms, and Kafka lets you write the same stream to multiple indexes in parallel, so you can test and roll out changes without affecting the user experience (see the sketch after the note below).
  4. Robust Error Handling: Kafka’s ability to manage multiple consumers from the same stream improves your error handling strategies. This setup provides a fail-safe mechanism, allowing for retries or delayed processing in case of transient failures, which helps in maintaining the integrity of your search operations.
  5. Scalability and Cost Efficiency: Considering Kafka's architecture, it’s not only scalable but also cost-efficient for handling large data streams with multiple consumers. Whether you’re dealing with billions of records or multi-tenant environments, Kafka provides the backbone for data handling without incurring significant overheads.

Note: Step 2 is essential for ensuring that your search applications remain online and responsive, regardless of backend operations. For enterprise and production-grade search deployments, a queueing system that allows maintenance and upgrades without any search read downtime is table stakes.
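
Here is a minimal sketch of points 3 and 4 above, assuming kafka-python and opensearch-py as the client libraries; the topic, consumer group ids, index names, and connection details are placeholders, not anything Bonsai-specific. Because each Kafka consumer group tracks its own offsets, the same write stream can feed a live index and a candidate index in parallel.

```python
import json

from kafka import KafkaConsumer
from opensearchpy import OpenSearch

# Placeholder connection details for a hosted cluster.
client = OpenSearch(hosts=["https://user:pass@example.bonsaisearch.net:443"])

def run_indexer(group_id: str, index_name: str) -> None:
    """Consume the shared write stream and index into one target index."""
    consumer = KafkaConsumer(
        "search-writes",                     # shared topic of write operations
        bootstrap_servers="localhost:9092",
        group_id=group_id,                   # each group tracks its own offsets
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    for message in consumer:
        doc = message.value
        client.index(index=index_name, id=doc["id"], body=doc)

# Run one indexer per index, e.g. in separate processes:
#   run_indexer("indexer-products-v1", "products-v1")  # index serving live traffic
#   run_indexer("indexer-products-v2", "products-v2")  # candidate index under test
```

Because the two groups are independent, the candidate index can lag, be rebuilt from the retained stream, or be dropped entirely without affecting the index serving live traffic.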

Implementing Queueing and Buffering with Kafka

  1. Set Up Kafka:
    • Install and configure Kafka, creating a topic specifically for write operations.
  2. Modify Your Application:
    • Publish write operations to the Kafka topic instead of directly to OpenSearch/Elasticsearch.
  3. Create a Kafka Consumer:
    • Develop a consumer application that reads messages from the Kafka topic and writes them to your Bonsai.io hosted OpenSearch/Elasticsearch cluster.
  4. Handle Errors Gracefully:
    • Implement error handling to manage temporary failures, such as re-queuing failed operations or logging errors for further analysis.
  5. Monitor and Scale:
    • Use Kafka monitoring tools to track the performance of your queues. Adjust the number of partitions and consumers to handle increased load.
  6. Batch Processing:
    • Utilize batch processing to optimize write operations. Process messages in batches to reduce the load on your search cluster (steps 2–6 are sketched below).
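
The following is a combined sketch of steps 2 through 6, again assuming kafka-python and opensearch-py; the topic, index, group id, and host names are placeholders.

```python
import json
import logging

from kafka import KafkaConsumer, KafkaProducer
from opensearchpy import OpenSearch, helpers

TOPIC = "search-writes"                      # placeholder topic for write operations
BOOTSTRAP = "localhost:9092"
client = OpenSearch(hosts=["https://user:pass@example.bonsaisearch.net:443"])

# Step 2: the application publishes writes to Kafka, not directly to the cluster.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def enqueue_write(doc: dict) -> None:
    # Keying by document id keeps updates to the same document ordered within a partition.
    producer.send(TOPIC, key=doc["id"].encode("utf-8"), value=doc)

# Steps 3, 4 and 6: a consumer that bulk-indexes batches and handles failures.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    group_id="opensearch-indexer",
    enable_auto_commit=False,                # commit offsets only after a successful write
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

def run_indexer() -> None:
    while True:
        polled = consumer.poll(timeout_ms=1000, max_records=500)
        records = [r for batch in polled.values() for r in batch]
        if not records:
            continue
        actions = [
            {"_op_type": "index", "_index": "products", "_id": r.value["id"], "_source": r.value}
            for r in records
        ]
        try:
            helpers.bulk(client, actions)    # step 6: one bulk request per batch
            consumer.commit()                # offsets advance only on success
        except Exception:
            # Step 4: offsets stay uncommitted, so the batch is re-read after a
            # consumer restart or rebalance; log it for analysis instead of dropping it.
            logging.exception("bulk indexing failed; batch will be retried")
```

For documents that fail repeatedly, re-queuing them onto a separate retry or dead-letter topic (as step 4 suggests) keeps the main stream moving while preserving the failed writes for inspection.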

Best Practices

  • Idempotency: Ensure write operations are idempotent to avoid duplicate processing (a small sketch follows this list).
  • Buffering: Use buffering to group multiple operations, enhancing efficiency.
  • Monitoring: Monitor Kafka and adjust configurations as needed to maintain performance.
  • Ingestion Optimization: Tune batch sizes and consumer parallelism to get the most out of your ingestion pipeline.
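
One simple way to get idempotency, sketched below with illustrative field names and an assumed id scheme: derive a stable document _id from the source record, so a replayed Kafka message overwrites the same document instead of creating a duplicate.

```python
import hashlib

def stable_doc_id(record: dict) -> str:
    # A natural key (here, a source name plus primary key) hashed into a short,
    # uniform _id; both field names are illustrative.
    natural_key = f"{record['source']}:{record['pk']}"
    return hashlib.sha1(natural_key.encode("utf-8")).hexdigest()

# client.index(index="products", id=stable_doc_id(record), body=record)
# Replaying the same message produces the same _id, so the write is an
# overwrite (a version bump) rather than a new document.
```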

Recommended Managed Kafka Services

  • AWS Managed Kafka Service (MSK): AWS MSK is a fully managed service that makes it easy to build and run applications that use Kafka to process streaming data. MSK provides secure and highly available data streams, ideal for mission-critical applications. Bonsai runs tens of billions of queries through this service, and it performs admirably.
  • Heroku's Multitenant Kafka Managed Service: Heroku offers a multitenant Kafka service that scales with your needs, providing a cost-effective solution without requiring management of a full Kafka cluster. This service is particularly beneficial for startups and midsize companies looking for flexibility and ease of use. Heroku's offering scales very cheaply for smaller-throughput applications. Bonsai's managed service is also available on the Heroku Addon Marketplace.

Alternative Queueing with Redis

Redis offers a simpler queueing solution than Kafka, especially for applications with less complex or smaller datasets. Using Redis lists, you can manage queues efficiently by storing document IDs or brief data snippets. Redis can be a good fit for simpler, high-speed applications, while Kafka is better suited to large-scale, durable message processing. However, Redis has more limitations than Kafka around durability and availability.
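
A minimal sketch of the Redis approach, assuming redis-py; the queue name, index name, and fetch_document helper are placeholders. Producers push document ids onto a list, and a worker blocks on the other end, loading each document and indexing it.

```python
import redis
from opensearchpy import OpenSearch

r = redis.Redis(host="localhost", port=6379)
client = OpenSearch(hosts=["https://user:pass@example.bonsaisearch.net:443"])

def enqueue(doc_id: str) -> None:
    r.lpush("search-writes", doc_id)              # producer side: store just the id

def run_worker() -> None:
    while True:
        _, raw_id = r.brpop("search-writes")      # blocks until an id is available
        doc = fetch_document(raw_id.decode("utf-8"))  # hypothetical loader from your database
        client.index(index="products", id=doc["id"], body=doc)
```

Note the durability trade-off mentioned above: once BRPOP removes an item, a worker crash mid-index loses that write unless you add your own acknowledgement scheme, whereas Kafka retains the event until offsets are committed.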

Find out how we can help you.

Schedule a free consultation to see how we can create a customized plan to meet your search needs.