Leave Copilot Aside. Build Your Own Code Generation Web Playground

Paolo Rechia on 2023-02-18

A weekend project coming to life

Photo by Clastr Cloud Gaming on Unsplash


A few months ago, I wondered whether I could easily build an open-source Copilot clone using CodeGen. The short answer: yes, it’s definitely possible. Will it have the same quality as the original? Certainly not.

Nonetheless, the bright side is that it doesn’t require much effort, as surprising as it may be.

In 2022, Salesforce released CodeGen, an open-source language model trained to auto-complete code.

The models come in several pre-trained sizes, and one family is specialized in Python. The Python models tend to perform better, according to the authors. The best part is that these models are available directly on Hugging Face.

I’ve decided to pick the smallest model fine-tuned for Python (350M-Mono) because I wanted this to operate on a CPU.

Why? For two reasons:

  1. First, there’s already a Copilot clone that requires a GPU: fauxpilot
  2. Second, maybe one day I might deploy this to a serverless environment

Developing our own VSCode extension would be time-consuming and pointless, since fauxpilot already exists. Instead, we can build a web app with a framework like React that calls our API.

Here’s a screenshot of what our playground will look like, showing a function generated by the model.

Building the Backend

To get started, I downloaded the CodeGen model and wrapped it into a very simple HTTP API using FastAPI and Docker.

Our service will be simple; we just need one endpoint that receives a prompt (some code from somewhere) and feeds it into the model. Here’s how this endpoint might look in our web service:

@app.post("/prompt")  # route path matches the URL the frontend calls
async def prompt_handler(request: request_types.PromptRequest):
    """Prompt handler: runs a prompt through the model"""
    result = model.infer(request.prompt_text, request.max_response_length)
    return {"model_response": result}
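The handler expects a request type with two fields, prompt_text and max_response_length. In a FastAPI service this would normally be a pydantic model; the sketch below uses a standard-library dataclass as a stand-in so it runs without dependencies. The validation rules and the parse_prompt_request helper are assumptions for illustration, not code from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptRequest:
    """Stand-in for request_types.PromptRequest (field names taken from the handler)."""

    prompt_text: str
    max_response_length: int

    def __post_init__(self) -> None:
        # Reject payloads the model cannot use before they reach inference.
        if not isinstance(self.prompt_text, str) or not self.prompt_text:
            raise ValueError("prompt_text must be a non-empty string")
        if isinstance(self.max_response_length, bool) or not isinstance(
            self.max_response_length, int
        ):
            raise ValueError("max_response_length must be an integer")
        if self.max_response_length <= 0:
            raise ValueError("max_response_length must be positive")


def parse_prompt_request(payload: dict) -> PromptRequest:
    """Build a PromptRequest from a decoded JSON body (hypothetical helper)."""
    try:
        return PromptRequest(
            prompt_text=payload["prompt_text"],
            max_response_length=payload["max_response_length"],
        )
    except KeyError as missing:
        raise ValueError(f"missing field: {missing.args[0]}") from None
```

With pydantic, the same two annotated fields on a BaseModel subclass would give FastAPI enough information to validate the JSON body automatically.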

And here is the main code for using the model:

import logging

from transformers import AutoModelForCausalLM, AutoTokenizer

logger = logging.getLogger(__name__)


class SalesforceCodeGen:
    """Base Salesforce CodeGen class.

    Defines the behavior but does not instruct how to load the model.
    """

    def __init__(self) -> None:
        self.tokenizer: AutoTokenizer = None
        self.model: AutoModelForCausalLM = None
        raise NotImplementedError(
            "This is an abstract class, cannot be instantiated directly"
        )

    def _tokenize(self, input_: str):
        # pylint: disable-next=not-callable
        return self.tokenizer(input_, return_tensors="pt").input_ids

    def _decode(self, sample):
        # sample is the batch of generated token ids; decode the first sequence
        return self.tokenizer.decode(sample[0], skip_special_tokens=True)

    def infer(self, input_: str, max_length: int):
        """Infer method, takes a text prompt and returns the model response."""
        logger.info("Tokenizing... %s", input_)
        inputs = self._tokenize(input_)
        logger.info("Generating... ")
        # CodeGenModel.MAX_LENGTH is not shown in this excerpt
        min_ = min(max_length, CodeGenModel.MAX_LENGTH)
        sample = self.model.generate(inputs, max_length=min_)
        return self._decode(sample)

As one can see, it’s very simple, and most of this code is lifted from the model’s official documentation.

Now, all we need is a way to load this model. I was feeling a bit classy that day, so I defined a subclass to specify how to load the model:

import os

models_root_dir = os.environ["OPEN_CODE_GEN_API_MODEL_PATH"]

class ModelsPathMapping:
    """Available models."""
    SALESFORCE_350M_MONO = "Salesforce/codegen-350M-mono"
    SALESFORCE_2B_MONO = "Salesforce/codegen-2B-mono"

class SalesforceCodeGen350M(SalesforceCodeGen):
    """Loads the 350M parameters mono (Python only) model."""
    def __init__(self) -> None:
        # Deliberately skips super().__init__(), which raises NotImplementedError
        logger.info("Loading model...")
        self.tokenizer = AutoTokenizer.from_pretrained(
            ModelsPathMapping.SALESFORCE_350M_MONO, cache_dir=models_root_dir
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            ModelsPathMapping.SALESFORCE_350M_MONO, cache_dir=models_root_dir
        )

Not the best code I’ve ever written, but it works.
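If raising NotImplementedError from the base constructor feels awkward, one alternative is Python’s abc module, which refuses to instantiate a class that still has unimplemented abstract methods. The sketch below only illustrates that pattern: CodeModel, EchoModel, the load method, and the MAX_LENGTH value of 2048 are all hypothetical, and no real model is loaded.

```python
from abc import ABC, abstractmethod


class CodeModel(ABC):
    """Hypothetical base class: shared inference flow, loading left to subclasses."""

    MAX_LENGTH = 2048  # assumed cap, not taken from the original project

    def __init__(self) -> None:
        self.backend = self.load()

    @abstractmethod
    def load(self):
        """Return a callable that takes (prompt, limit) and returns text."""

    def infer(self, prompt: str, max_length: int) -> str:
        # Clamp the requested length to the cap before generating.
        return self.backend(prompt, min(max_length, self.MAX_LENGTH))


class EchoModel(CodeModel):
    """Toy subclass used only to show the pattern; it loads no real model."""

    def load(self):
        # Echo the prompt back, truncated to the allowed length.
        return lambda prompt, limit: prompt[:limit]
```

With this layout, a subclass only has to implement load, and trying to instantiate the base class directly fails with a TypeError.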

And then, finally, we just need a Dockerfile to build this container:

FROM python:3.10.9-bullseye
COPY models /models
COPY requirements.txt /
RUN ["pip", "install", "-r", "/requirements.txt"]
COPY src /src
ENTRYPOINT [ "uvicorn", "open_code_gen_api.main:app", "--host", ""]

You can find the complete version of this code in this repository.

Building this container is a bit tricky, and some people have already complained about it. Before running the docker build command, you need to manually create the models directory locally and download the required model into it. I might add a script for this at some point.
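Until such a script exists, one possible version is sketched below. It assumes the transformers package is installed locally, that the models directory sits next to the Dockerfile, and that OPEN_CODE_GEN_API_MODEL_PATH points at /models inside the container so the cache layout matches. The function names are hypothetical.

```python
from pathlib import Path

MODEL_NAME = "Salesforce/codegen-350M-mono"


def ensure_models_dir(path: str = "models") -> Path:
    """Create the local models directory if needed and return it."""
    models_dir = Path(path)
    models_dir.mkdir(parents=True, exist_ok=True)
    return models_dir


def download_model(models_dir: Path, model_name: str = MODEL_NAME) -> None:
    """Fetch tokenizer and weights into models_dir (needs network access)."""
    # Imported lazily so ensure_models_dir works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # cache_dir mirrors the argument the service passes when it loads the model.
    AutoTokenizer.from_pretrained(model_name, cache_dir=str(models_dir))
    AutoModelForCausalLM.from_pretrained(model_name, cache_dir=str(models_dir))
```

Calling download_model(ensure_models_dir()) from the project root before docker build should leave the files where COPY models /models expects them.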

If you don’t want to build but want to try the API, you can pull the container directly from the Docker site.
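Once the container is running, a quick way to try the API is a small Python client. This is a minimal sketch using only the standard library; it assumes the service is reachable at http://localhost:8000 and exposes a POST /prompt endpoint that takes prompt_text and max_response_length. The helper names are made up for this example.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # assumed: container port published locally


def build_prompt_request(prompt: str, max_length: int = 128) -> urllib.request.Request:
    """Prepare a POST to /prompt without sending it."""
    body = json.dumps(
        {"prompt_text": prompt, "max_response_length": max_length}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_prompt(prompt: str, max_length: int = 128, timeout: float = 60.0) -> str:
    """Send the prompt to a running service and return its model_response."""
    request = build_prompt_request(prompt, max_length)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["model_response"]
```

For example, print(send_prompt("def fibonacci(n):")) would print whatever the model returns for that prompt. CPU inference can be slow, hence the generous default timeout.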

Building the Frontend

First, we would ideally use an IDE as our frontend. However, I wanted to keep this project really small, so I searched for web alternatives. It turns out Microsoft maintains the Monaco Editor, the code editor that powers VSCode, and it runs in the browser. Here’s the link to learn more.

One can also find an easy-to-use integration with React at this link.

OK, we can build a React App that uses the Monaco Editor and interacts with our backend. Let’s start with the simplest block of code, our function to call our backend:

const API_URL = "http://localhost:8000"
export async function sendPromptQuery(prompt) {
    const promptUrl = API_URL + "/prompt"
    const response = await fetch(promptUrl, {
        headers: {
            "Content-Type": "application/json"
        },
        method: "POST",
        body: JSON.stringify({
            "prompt_text": prompt,
            "max_response_length": 128
        })
    })
    const json_response = await response.json()
    console.log("Response: ", json_response)
    return json_response["model_response"]
}

And then we can build our App.js. I’ll skip the boring parts and focus on the glue with the Monaco Editor. You can find the original code at this link (it’s only about 100 lines of code).

First, of course, we import the Monaco Editor library using import Editor from "@monaco-editor/react";.

Then we need to make sure it’s being rendered inside our App. We can do that with this code:

// Inside render
<div className="editor-inner-wrapper">
  <Editor defaultValue="# some comment" onMount={handleEditorDidMount} />
</div>

It uses a handleEditorDidMount. Let’s take a look at that:

function handleEditorDidMount(editor, monaco) {
    editorRef.current = editor;
    // onDidChangeModelContent is a method: pass the listener instead of assigning to it
    editor.onDidChangeModelContent(handleEditorChange)
}

So basically, when we mount the React component, we get a reference to the editor and can store it. We can also use this reference to register additional event handlers. I had to dig a bit through Monaco’s documentation to find onDidChangeModelContent. You can learn more at this link.

This means every change to the editor triggers an event, which now calls our new function, handleEditorChange:

function handleEditorChange() {
    const text = editorRef.current.getValue()
    setEditorValue(text)
}

So, this is also not complex. It just reads the current value in the editor and sets the state in React. Of course, in our app, we defined these before:

function App() {
  const editorRef = useRef(null);
  const [editorValue, setEditorValue] = useState("");
  // ... handlers and render follow
}

This solves the part of propagating the state of the Monaco Editor to our React App state, which we can watch and control. Now, we need to trigger our backend and react to it.

So I created a button and hooked it up to this function, which calls the backend and then appends the result to the editor in a very rudimentary way:

async function sendPromptAction() {
    const result = await sendPromptQuery(editorValue)
    editorRef.current.setValue(editorValue + "\n" + result)
    handleEditorChange() // my original code also makes this call, but
    // it looks redundant to me as I review this
}

That’s it! You now have a very simple playground for developing Copilot-like tooling for the web.

Sadly, when I tested this code, I noticed a bug: the API returns both the original prompt and the newly generated code. This means the app duplicates the original code whenever we prompt the backend.
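One way to address this on the backend is to drop the prompt from the response before returning it. The sketch below shows two variants with hypothetical helper names: a string-level one that removes the prompt when the response starts with it, and a token-level one that uses plain lists as stand-ins for the id tensors. Neither is part of the original code.

```python
def strip_prompt(prompt: str, model_response: str) -> str:
    """Return only the newly generated text from a prompt-plus-completion string."""
    if model_response.startswith(prompt):
        return model_response[len(prompt):]
    # Decoding may not reproduce the prompt exactly (for example whitespace),
    # so fall back to returning the response unchanged rather than guessing.
    return model_response


def new_token_ids(input_ids: list, output_ids: list) -> list:
    """Token-level variant: drop the prompt tokens that lead the output."""
    return output_ids[len(input_ids):]
```

The token-level variant relies on decoder-only models returning the input ids followed by the new ones. In the infer method, that would probably mean slicing the generated sequence by the prompt length before decoding it.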

Wrapping It Up and the Next Steps

I hope you enjoyed this short guide. I kept it simple, as I didn’t go deep into this project (this was a weekend project). It’s worth mentioning some things you might want to explore:

  1. Deploying this API to the cloud, maybe as a serverless service
  2. Exploring a bigger model using GPU
  3. Developing a proper frontend, maybe as a VSCode extension, by forking fauxpilot to work with your CPU model’s backend
  4. Creating your own code completion inside the Monaco Editor, which calls your model