Preparing datasets
Learn how to prepare and format datasets for fine-tuning large language models with Second
Second supports a variety of dataset formats for fine-tuning large language models, including both conversational and completion datasets. This guide will walk you through the process of preparing your data in supported formats such as CSV, JSON, and JSONL.
Before you begin, it’s important to note that not all models are optimized for all formats. Be sure to consult the specific model documentation to verify the supported dataset formats for your chosen model.
Preparing data
Check out Magic Datasets for an easy way to generate high-quality datasets for fine-tuning large language models.
You can prepare your dataset in various formats, depending on the type of data you have and the model you plan to fine-tune. Here are some common dataset formats supported by Second, shown in JSONL (JSON Lines) format:
Uploading data
You can upload your prepared dataset directly to Second using the web interface or API. Follow these steps to upload your dataset:
- Log in to your Second account at console.usesecond.com
- Navigate to the Datasets section
- Click on “Create New Dataset”
- Follow the prompts to upload your data and configure your dataset
Fine-tuning with your dataset
Once your dataset is uploaded, you can use it to fine-tune your chosen model. Follow these steps to start fine-tuning:
- Select the model you want to fine-tune
- Choose your uploaded dataset as the training data
- Configure the fine-tuning parameters
- Start the fine-tuning process