Training for Better Answers
A database schema is crucial for understanding the structure of your database, but it may not fully convey the meaning and business logic behind the data. The AI therefore sometimes needs additional "Background Knowledge" to answer your questions accurately.
Suppose you run a social network and ask:
What's the total number of active users last week?
Without specific context, the AI cannot know how to define "active users". The term might refer to users who logged in, made a purchase, or visited the website last week.
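To see why the definition matters, here is a minimal sketch in which two plausible readings of "active users" give different counts for the same week. The "login" and "purchase" tables, their columns, and the dates are hypothetical, since no schema is given for the social network.

```python
import sqlite3

# Hypothetical schema and data: the text names no tables for the social network.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE login (user_id INTEGER, login_time TEXT);
    CREATE TABLE purchase (user_id INTEGER, purchase_time TEXT);
    INSERT INTO login VALUES (1, '2024-01-02'), (2, '2024-01-03'), (3, '2024-01-04');
    INSERT INTO purchase VALUES (1, '2024-01-03');
""")

# "Last week" is modelled as a half-open date range (assumed dates).
week_start, week_end = "2024-01-01", "2024-01-08"


def count_active(table, time_column):
    """Count distinct users with at least one row in the given week."""
    # Table and column names are fixed literals passed below, never user input.
    sql = (
        f"SELECT COUNT(DISTINCT user_id) FROM {table} "
        f"WHERE {time_column} >= ? AND {time_column} < ?"
    )
    return conn.execute(sql, (week_start, week_end)).fetchone()[0]


active_by_login = count_active("login", "login_time")           # users who logged in
active_by_purchase = count_active("purchase", "purchase_time")  # users who bought something
print(active_by_login, active_by_purchase)
```

Both queries are valid answers to the question as asked. Only background knowledge tells the AI which one you mean.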
Think of AI as a new intern: incredibly intelligent but needing guidance to grasp the full picture.
As another example, suppose you manage an ad platform similar to Google Ads, where advertisers pay per click. Your "Click" table has the columns "user_id", "ad_id", "cost", and "click_time", and the "cost" column records the revenue from each click.
Asking:
What's the average revenue per user last month?
might stump the AI unless it knows that "cost" represents the revenue from each ad click.
AI requires sufficient "Background Knowledge" and clear examples to provide accurate answers when the schema alone is insufficient.
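Once the AI knows that "cost" is revenue, the ad platform question could translate into a query along the following lines. This is a sketch only: the table and columns come from the description above, while the sample rows, the dates, and the choice to sum revenue per user before averaging are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Click (user_id INTEGER, ad_id INTEGER, cost REAL, click_time TEXT);
    -- Sample rows are made up for illustration.
    INSERT INTO Click VALUES
        (1, 10, 0.50, '2024-05-03 09:00:00'),
        (1, 11, 1.50, '2024-05-10 12:30:00'),
        (2, 10, 1.00, '2024-05-21 18:45:00'),
        (2, 12, 9.00, '2024-06-01 00:00:00');
""")

# Sum "cost" per user for the month, then average those per-user totals.
avg_revenue_sql = """
    SELECT AVG(user_revenue)
    FROM (
        SELECT user_id, SUM(cost) AS user_revenue
        FROM Click
        WHERE click_time >= ? AND click_time < ?
        GROUP BY user_id
    )
"""

# "Last month" is modelled as May 2024 (assumed); the June click is excluded.
avg_revenue = conn.execute(avg_revenue_sql, ("2024-05-01", "2024-06-01")).fetchone()[0]
print(avg_revenue)
```

Without the hint about "cost", nothing in the column names tells the AI that this is the column to sum.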
If the answers are unsatisfactory, adding more specific training data can improve them. Notably, GPT-4 has proven more effective than GPT-3.5 at making use of training data.
If training with GPT-3.5 does not yield the expected improvement, we recommend switching to GPT-4.
Documentation and SQL Examples
The training data is divided into two parts: documentation and SQL examples.
To add new training data, click the Train button in the top right corner of the Desktop App.
Documentation
Documentation consists of a list of documents that provide background knowledge of the database.
For an ad platform database, for example, you might provide the following documentation:
- This database relates to an Ad platform. The "Click" table records all ad clicks, with the "cost" column reflecting the revenue generated by each click.
- Users own multiple applications, with the Click table linking to the Application table via the "application_id" column.
- Consequently, each user's total revenue is the sum of revenues generated from clicks across their applications.
Each piece of documentation should be concise and clear, and no longer than 400 characters. To add more documentation, click the "Add" button.
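If you draft documentation outside the app, a quick length check can catch entries that exceed the limit before you paste them in. The helper below is a hypothetical convenience, not part of the app, and it assumes the 400-character limit counts characters the same way Python's len() does.

```python
MAX_DOC_CHARS = 400  # limit stated in the text above

# The three documentation entries from the ad platform example.
docs = [
    'This database relates to an Ad platform. The "Click" table records all ad clicks, '
    'with the "cost" column reflecting the revenue generated by each click.',
    'Users own multiple applications, with the Click table linking to the Application '
    'table via the "application_id" column.',
    "Consequently, each user's total revenue is the sum of revenues generated from "
    "clicks across their applications.",
]


def too_long(entries, limit=MAX_DOC_CHARS):
    """Return the entries whose length exceeds the limit."""
    return [entry for entry in entries if len(entry) > limit]


oversized = too_long(docs)
print(oversized)
```

Splitting an oversized entry into several shorter ones, each stating one fact, also keeps the documentation concise.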
SQL Examples
SQL Examples comprise a list of SQL queries for the AI to learn from, effectively pairing "Questions" with "Answers".
For instance, you might add an example that pairs a question about each user's total revenue with the SQL query that answers it.
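A question and answer pair of this kind might look like the sketch below. It follows the ad platform documentation above, but the wording of the question, the "id" and "user_id" columns of the Application table, and the sample rows are all assumptions made for illustration. The query runs against an in-memory SQLite database only to show that it is valid.

```python
import sqlite3

# Hypothetical training example: a natural-language question and its SQL answer.
example = {
    "question": "What is each user's total revenue?",
    "sql": """
        SELECT a.user_id, SUM(c.cost) AS total_revenue
        FROM Click AS c
        JOIN Application AS a ON a.id = c.application_id
        GROUP BY a.user_id
        ORDER BY total_revenue DESC
    """,
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Application.id and Application.user_id are assumed column names.
    CREATE TABLE Application (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE Click (application_id INTEGER, cost REAL, click_time TEXT);
    INSERT INTO Application VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO Click VALUES
        (1, 0.5, '2024-05-01'),
        (2, 1.5, '2024-05-01'),
        (3, 1.0, '2024-05-02');
""")

# User 100 owns applications 1 and 2, so their clicks are summed together.
rows = conn.execute(example["sql"]).fetchall()
print(rows)
```

The join through the Application table is the part the AI is least likely to infer from the schema alone, which is why it is worth capturing in an example.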
This approach helps the AI to accurately respond to similar queries, such as "Who had the highest revenue yesterday?".
Think of SQL examples as "Quick actions" for frequently asked questions. Instead of specifying tables and columns each time, adding examples allows you to simply ask the question.
For example, after adding a revenue-related example, there's no need to manually define tables and columns for similar future inquiries.
Where is the Training Data Stored?
Training data is stored and accessed solely on your local machine, ensuring privacy and security.
An import & export feature is under development to facilitate sharing of training data.
For non-technical users, consider requesting assistance from your technical team to add training data.