The details depend on the particular capability being targeted, but we typically filter training data based on automatic measures of writing quality, such as grammaticality, fluency, and clarity. These sorts of filtering help to increase the writing quality of text generated by our models. We have also implemented an automatic pseudonymization process for standard sensitive personal data fields in the data before training, which makes it less likely that our AI models will output sensitive fields verbatim from the training data.
Comments
0 comments
Please sign in to leave a comment.