Preparing datasets

Second supports a variety of dataset formats for fine-tuning large language models, including both conversational and completion datasets. This guide will walk you through the process of preparing your data in supported formats such as CSV, JSON, and JSONL.

Before you begin, note that not all models are optimized for all formats. Consult the documentation for your chosen model to verify which dataset formats it supports.

Preparing data

Check out Magic Datasets for an easy way to generate high-quality datasets for fine-tuning large language models.

You can prepare your dataset in various formats, depending on the type of data you have and the model you plan to fine-tune. The example below shows a conversational dataset in JSONL (JSON Lines) format: one JSON object per line, with each conversation stored in a text field using <|im_start|> and <|im_end|> turn markers:

{"text": "<|im_start|>user\nHello, how are you?\n<|im_end|>\n<|im_start|>assistant\nI'm doing well, thank you.\n<|im_end|>"}
{"text": "<|im_start|>user\nWhat's the weather like today?\n<|im_end|>\n<|im_start|>assistant\nIt's sunny, with a high of 75 degrees.\n<|im_end|>"}

Uploading data

You can upload your prepared dataset directly to Second using the web interface or the API. To upload through the web interface:

1. Log in to your Second account at console.usesecond.com
2. Navigate to the Datasets section
3. Click on “Create New Dataset”
4. Follow the prompts to upload your data and configure your dataset
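
Before uploading, it can help to check a JSONL file locally so that formatting problems surface early. The sketch below is one possible check, not Second's own validation: the function names are hypothetical, and the rule that every record needs a non-empty text field is an assumption based on the example records in this guide.

```python
import json


def validate_jsonl_lines(lines, required_key="text"):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        value = record.get(required_key)
        # Assumption: each record carries a non-empty string under required_key.
        if not isinstance(value, str) or not value.strip():
            problems.append((number, f"missing or empty '{required_key}' field"))
    return problems


def validate_jsonl_file(path, required_key="text"):
    """Run the same checks against a file on disk."""
    with open(path, encoding="utf-8") as f:
        return validate_jsonl_lines(f.read().splitlines(), required_key)


sample = [
    '{"text": "<|im_start|>user\\nHi\\n<|im_end|>"}',
    '{"prompt": "no text field"}',
    '{not json}',
]

for number, problem in validate_jsonl_lines(sample):
    print(f"line {number}: {problem}")
```

For a dataset that uses different field names, pass a different required_key or extend the checks to match the format your model's documentation describes.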

Fine-tuning with your dataset

Once your dataset is uploaded, you can use it to fine-tune your chosen model. Follow these steps to start fine-tuning:

1. Select the model you want to fine-tune
2. Choose your uploaded dataset as the training data
3. Configure the fine-tuning parameters
4. Start the fine-tuning process
