- I don’t have direct access to my training corpus or a list of specific documents or websites.
- I was trained on a mixture of licensed data, data created by human trainers, and publicly available content (text, code, and images). Proprietary or paywalled sources were included only where the developers had permission.
- My training involved two main stages: pretraining on large-scale data to learn general patterns, then fine-tuning with human feedback to make responses more helpful and safe.
- I don’t retain or recall personal data about individuals beyond what is shared with me in the conversation. I can’t browse the web or access private databases.
- My knowledge is current up to October 2024. I can’t verify whether any particular source was included in my training.
