AI-Powered Log Management
While AI has taken over the world since the release of ChatGPT, the log management space remains largely unchanged. Vendors like Elastic, Splunk, and Datadog still rely on the same paradigms and offerings: full-text search, field search, basic aggregations, and visualizations. The only AI-related "breakthrough" is AI-driven anomaly detection, which isn't even related to the LLM revolution.
Once you find an anomaly, you'll want to investigate it using your huge dataset of logs, but in traditional tools that capability is extremely limited. This is where Obics comes in. We offer the capability you would expect in a post-ChatGPT world: ask questions in natural language and get insights. The question might relate to an anomaly, like "Does the error spike correlate with increased CPU usage?" or "Do these errors happen when the new feature flag is on?" Or it might have nothing to do with anomalies, like "Show me a time chart of user conversion rates in the last week in 1-hour intervals." Because our system is trained on your own logs and metrics, it knows how to build the right query to answer the question.
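To make the translation concrete, here is the kind of SQL the last question might map to. This is an illustrative sketch: the `logs` table and its `timestamp`, `event`, and `user_id` columns are hypothetical, and the real generated query depends on your schema.

```sql
-- Prompt: "Show me a time chart of user conversion rates
--          in the last week in 1-hour intervals."
-- Assumes a hypothetical logs(timestamp, event, user_id) table.
SELECT
    toStartOfHour(timestamp) AS hour,
    uniqExactIf(user_id, event = 'purchase')
        / uniqExact(user_id) AS conversion_rate
FROM logs
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
```

A query like this is easy to eyeball: the grouping interval, the time window, and the definition of "conversion" are all visible and adjustable.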
Why It Works
Obics AI uses your data and your prompt to build an SQL query that answers your question. The distinction between building a query and giving you an answer directly matters: an AI is never 100% correct, and if it hands you an answer, you can never be sure it's right. A query is different, because a human engineer can validate and correct it. Once validated, you can be certain the query returns the correct result.
We use ClickHouse under the hood: an extremely fast columnar database with data-warehouse-class analytics capabilities. It supports complex joins and OLAP queries, even on petabytes of data, and it's fast enough to answer them in real time. That's something traditional log analysis solutions don't offer: they don't support joins (beyond basic lookups), full OLAP, subqueries, or multiple aggregations, so they simply cannot answer complex questions that require these capabilities.
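For example, the anomaly question "Does the error spike correlate with increased CPU usage?" requires joining two aggregated subqueries — exactly the kind of query traditional tools can't express. A sketch, assuming hypothetical `logs(timestamp, level, message)` and `metrics(timestamp, name, value)` tables:

```sql
-- Per-minute error counts joined against per-minute average CPU usage.
SELECT
    e.minute,
    e.error_count,
    m.avg_cpu
FROM
(
    SELECT toStartOfMinute(timestamp) AS minute, count() AS error_count
    FROM logs
    WHERE level = 'ERROR'
    GROUP BY minute
) AS e
INNER JOIN
(
    SELECT toStartOfMinute(timestamp) AS minute, avg(value) AS avg_cpu
    FROM metrics
    WHERE name = 'cpu_usage'
    GROUP BY minute
) AS m USING (minute)
ORDER BY e.error_count DESC
LIMIT 20;
```

Two subqueries, two aggregations, and a join — trivial for a columnar OLAP engine, out of reach for a search-index-based log tool.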
Training Using Your Data
Modern LLMs already generate good SQL without any additional training — that's one advantage of using a popular query language like SQL. But to tailor the model to your data, we do additional processing: we take one of the newer OpenAI models as the base and fine-tune a dedicated model on anywhere from dozens to thousands of prompt-answer pairs. You can assist this process by adding new queries to the training data.
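A prompt-answer pair is simply a natural-language question mapped to a validated query over your schema. An illustrative example (the `feature_flag_enabled` column and `logs` table are hypothetical):

```sql
-- Training-pair example: the comment below is the input prompt,
-- the query is the expected completion.
-- Prompt: "Do these errors happen when the new feature flag is on?"
SELECT
    feature_flag_enabled,
    countIf(level = 'ERROR') AS errors,
    count() AS total,
    errors / total AS error_rate
FROM logs
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY feature_flag_enabled;
```

Every query you validate and save becomes another pair like this, so the dedicated model keeps improving on your schema over time.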
When you submit a prompt, we use vector search to add the most relevant logs and metrics to it as context. This combines the fine-tuned model with a runtime sample of the most relevant data — data from as recently as 10 seconds ago.
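The retrieval step itself can be expressed as a nearest-neighbor query. A minimal sketch in ClickHouse, assuming a hypothetical `log_embeddings(log_id, message, embedding)` table and a query parameter holding the embedding of the user's prompt:

```sql
-- Fetch the 10 log messages closest to the prompt's embedding vector.
SELECT message
FROM log_embeddings
ORDER BY cosineDistance(embedding, {prompt_embedding:Array(Float32)}) ASC
LIMIT 10;
```

The retrieved messages are then prepended to the prompt as context before the model generates the final query.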
All our trained models are private per customer. We don't use your data for any other training.