Covers: theory of Autoregressive Sequence Generation
Estimated time needed to finish: 30 minutes
Questions this item addresses:
How to train an autoregressive model on source code?
How to use this item?
This is a repository containing the code for training the GPT-J model, written in JAX.
Training from scratch is computationally heavy, but you can fine-tune GPT-J instead, since a subset of its training data comes from GitHub. Follow this link for more info.
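Before diving into the repo, it helps to see what "autoregressive training on source code" means at the data level. The sketch below (illustrative only, not the GPT-J repo's actual code; the toy whitespace tokenizer and `make_autoregressive_pairs` helper are assumptions for the example) shows the core idea: inputs and targets are the same token sequence shifted by one position, so the model learns to predict each token from everything before it.

```python
def make_autoregressive_pairs(token_ids):
    """Return (inputs, targets) where targets[i] is the token after inputs[i]."""
    inputs = token_ids[:-1]   # every token except the last
    targets = token_ids[1:]   # every token except the first (shifted by one)
    return inputs, targets

# Toy "tokenization" of one line of source code by whitespace.
code = "def add ( a , b ) : return a + b"
vocab = {tok: i for i, tok in enumerate(sorted(set(code.split())))}
ids = [vocab[tok] for tok in code.split()]

inputs, targets = make_autoregressive_pairs(ids)
assert len(inputs) == len(targets) == len(ids) - 1
assert inputs[1:] == targets[:-1]  # the shift-by-one relationship
```

A real pipeline replaces the whitespace tokenizer with a learned one (e.g. BPE) and feeds these pairs into a cross-entropy loss, but the shifted-sequence structure is the same.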
Data Collection, Training, and Evaluation for Large Scale Code Generation (Copilot)
Total time needed: ~6 hours
This recipe walks you through the steps necessary to build a pipeline that generates source code. It is an analysis of the GitHub / OpenAI "Copilot" service: it will help you understand how that service works and provides the steps needed to reproduce it.
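At inference time, a Copilot-style pipeline runs an autoregressive generation loop: feed the current context to the model, append the predicted next token, and repeat. A minimal sketch of greedy decoding is below; `next_token_logits` is a hypothetical stand-in for a trained model such as GPT-J, not a real API.

```python
def next_token_logits(context):
    # Toy stand-in "model": deterministically scores (last token + 1) mod 10 highest.
    scores = [0.0] * 10
    scores[(context[-1] + 1) % 10] = 1.0
    return scores

def generate(context, n_tokens):
    """Greedy autoregressive decoding: repeatedly append the argmax token."""
    out = list(context)
    for _ in range(n_tokens):
        logits = next_token_logits(out)
        out.append(max(range(len(logits)), key=logits.__getitem__))  # argmax
    return out

print(generate([3], 4))  # → [3, 4, 5, 6, 7]
```

Production systems usually replace the argmax with temperature or nucleus sampling to get diverse completions, but the loop structure is identical.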