General Questions
- What kinds of data are used to train your models, and what is the purpose of each type of data?
- How much data do you use?
- What types of data points do you use (such as labels or other characteristics)?
- Does your training data include any data protected by copyright, trademark, or patent, or is it entirely in the public domain?
- How was your training data acquired?
- Does the training data include personal or sensitive information?
- Does the training data include aggregate consumer information (as defined by California Civil Code Section 1798.140(b))?
- How is the training data processed or modified before use?
- When was the training data collected?
- When were the training datasets first used for training?
- How do you use synthetic data?