Ollama
Overview
Run LLMs locally - ollama.ai
Install
Download from https://ollama.ai and install. Easy peasy.
Or you could use curl https://ollama.ai/install.sh | sh, but I don’t love the idea of blindly pulling and executing shell scripts.
Running the first time
Running ollama is a bit confusing in that it is super simple and requires almost no setup. If you’re just starting, I would do the following to get familiar:
- ollama run llama2: this will pull down and then run the base llama model. When it’s finished, you will be presented with a prompt resembling the standard python prompt (i.e. >>>), which is a chat interface
- ctrl-c to break out of that when you’re done playing around
- ollama pull <modelname> a few times to get some more models and have ollama make them available. Ollama has its own model catalogue
- ollama serve, which will likely throw an error because it’s already running and listening on a port; however, this is the quickest way I know to get it to tell me which port is being used[1].
Making your ollama service available in this way could pose a security risk. Make sure you are only doing this on a trusted network, or use a firewall to restrict connections to specific, trusted hosts.
Notes:
- No interface, runs completely from the command-line
- Note: on Mac, there is a menu bar icon used for shutting it down
- a good way to run LLMs for use with an #agent like AutoGen
- can run multiple models at the same time (actually: they run sequentially, but all are “available”)
- Provides an #OpenAI compatible API
- Ollama listens on 11434/tcp as the default port (see the quick check just below)
- Can handle multiple, simultaneous queries
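As a quick check that the service is up and listening on the default port, hitting the root endpoint should return “Ollama is running”:
curl http://localhost:11434/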
Working with Ollama
Command-line options
- --model specifies the model that ollama will use. It will attempt to download it if it’s not there, e.g. --model ollama/codellama
- --pull will download the specified model
- --api-base can be used to set the IP and port on which the service is listening.
Note: these look like flags for tools that wrap Ollama (e.g. #litellm, which uses the ollama/model naming); the ollama binary itself works through subcommands like ollama run, ollama pull, and ollama serve.
Models location
Ollama looks for models in the following places, and also places models here when using ollama pull:
- MacOS: most likely this is in ~/.ollama/models if you execute ollama as a user
- Linux: /usr/share/ollama/.ollama/models/
Using Modelfiles
The Modelfile is a configuration file for a model run by Ollama; it can be used to set things like the prompt format the model expects to receive, temperature, system message, etc. Its format is a bit reminiscent of a Dockerfile for #Docker.
First, check out the modelfile documentation.
At minimum, the Modelfile requires the following:
- a FROM directive specifying the source model
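Other directives such as PARAMETER and SYSTEM are optional. A minimal sketch (the base model name here is just an example, use any model you have pulled):
```
# Minimal Modelfile: a base model plus a couple of optional settings
FROM llama2
PARAMETER temperature 0.7
SYSTEM """You are a concise, helpful assistant."""
```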
Change the context window
First, obtain the current modelfile with
ollama show --modelfile modelname:tag > modelname-custom.Modelfile
Next, add the following line, adjusting the number to fit your needs
PARAMETER num_ctx 256000
Finally, create the new model with the following command:
ollama create new-model-name -f modelname-custom.Modelfile
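For reference, the custom Modelfile is just the generated one with the extra line added; trimmed down, it looks something like this (names are illustrative):
```
# Trimmed sketch of modelname-custom.Modelfile; the generated file will also
# contain TEMPLATE and other PARAMETER lines copied from the original model
FROM modelname:tag
PARAMETER num_ctx 256000
```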
Accessing via the API
OpenAI-compatible API
As of v0.1.24 (2024-02-08), Ollama has an OpenAI-compatible API interface. There is no need to invoke it separately; the API is accessible whenever Ollama is serving up LLMs. It nests the endpoints under /v1, so use v1 at the tail-end of wherever you define the base url for the API, e.g. http://127.0.0.1:11434/v1, whereas if LM Studio is hosting the model, you would just put the server:port combo.
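For example, a quick sanity check with the official openai python client might look like this sketch (it assumes llama2 has already been pulled and Ollama is on its default port; the api_key value is required by the client library but ignored by Ollama):
```python
# Talk to a local Ollama model through its OpenAI-compatible endpoint
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:11434/v1",  # note the trailing /v1
    api_key="ollama",                      # required by the client, ignored by Ollama
)

resp = client.chat.completions.create(
    model="llama2",  # any model you have pulled locally
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```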
Allowing Inbound Network Connections
Windows
- For modifying the firewall to allow other hosts on the local network, see Allow HTTP services incoming
- For using Ollama on a Windows host from WSL distributions, see Allow connections to Windows host from WSL2
Mac
On a Mac, Ollama will listen only on localhost by default. To make it accessible to other hosts on your network, perform the following:
- Set the following two variables
launchctl setenv OLLAMA_HOST 0.0.0.0:11434
launchctl setenv OLLAMA_ORIGINS "*"
- Restart Ollama from the menu bar icon
- Restart your terminal
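If you want to confirm the values are set, launchctl can read them back:
launchctl getenv OLLAMA_HOST
launchctl getenv OLLAMA_ORIGINS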
Linux
For systemd, add the environment variable to the startup definitions:
- Edit the service file:
sudo systemctl edit ollama.service
- Uncomment or add the following under a [Service] section:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
- Restart the service:
sudo systemctl daemon-reload && sudo systemctl restart ollama
On the off-chance that the changes to the Linux systemd service configuration do not take effect, double-check the location of the configuration file using systemctl status ollama.service; in the output will be a field called Loaded: which points to the service configuration file.
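Once the service is listening on 0.0.0.0, a quick way to verify from another machine is to ask the API for its list of local models (replace the hostname with that of your Ollama box):
curl http://ollama-host:11434/api/tags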
Outside the box
This section goes beyond the capabilities natively provided by Ollama.
Running other models
While Ollama has an extensive model library, maybe there is a flashy new model on HuggingFace that you just have to incorporate into your tool. Otmane Boughaba has a nice [article on how to use custom LLMs from #HuggingFace locally with Ollama](https://otmaneboughaba.com/posts/local-llm-ollama-huggingface/).
When running arbitrary models, we’ll need to create and use a modelfile and then create an ollama-compatible (ollama-aware?) version of the model so that it’s ready to be served up by the Ollama service.
Procedure
- Grab the model you want to use. You can use HuggingFace#the CLI to do this.
- Create a modelfile specifying attributes of the model and how you want it to behave (see the sketch after this list)
- Build the model: ollama create <name> -f Modelfile
- Run ollama list and you should see <name> as a model available for use by ollama
- Test with ollama run <name>.
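As a sketch of the modelfile in step 2, assuming you grabbed a GGUF file from #HuggingFace (the filename and chat template below are illustrative, adjust them for your model):
```
# Modelfile pointing at a locally downloaded GGUF; path and template are examples
FROM ./mistral-7b-instruct-v0.2.Q4_K_M.gguf
PARAMETER temperature 0.7
TEMPLATE """[INST] {{ .Prompt }} [/INST]"""
```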
litellm API wrapper
#litellm is a python module providing an #OpenAI compatible API interface on top of many backends, including a locally running Ollama service; tools that expect the OpenAI API can use it to talk to local models.
Install it with python -m pip install litellm
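A minimal sketch of calling a local Ollama model through litellm (the model name is illustrative and assumes it has already been pulled):
```python
# Route an OpenAI-style completion call to a local Ollama model via litellm
from litellm import completion

resp = completion(
    model="ollama/llama2",              # the "ollama/" prefix selects the Ollama backend
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    api_base="http://127.0.0.1:11434",  # where the Ollama service is listening
)
print(resp.choices[0].message.content)
```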
[1] This can be specified by setting the OLLAMA_HOST environment variable. ↩︎