We use a combination of the following data sources:
- Publicly available and open-source datasets, primarily for research and development of new capabilities.
- Internally curated and annotated datasets created by trained specialists to improve model quality and reliability.
- User data, including information about product feature usage and quality signals, which helps AI models learn what types of outputs users prefer.
- Synthetically generated data, including model-generated examples and controlled system interaction data, which is used to improve specific capabilities for which annotated datasets are expensive or difficult to create.