Using offline AI models for free in your Python scripts with Ollama
Unleash the power of AI models in your apps without monetary or data-safety concerns
The Ollama project allows us to download and use AI models offline, running on our own computer's resources. This lets us experiment with AI in our Python projects at no cost and test many models to find the ideal choice for our project. It's awesome.
Installation of Ollama
The installation of Ollama on a Linux machine (for macOS and Windows, check the Ollama GitHub page) is very, very easy. Just run this command in a terminal:
curl -fsSL https://ollama.com/install.sh | sh
After a long wait, Ollama will be fully installed and configured.
Download a model
Once installed, we can download any model to our computer for offline usage. The Ollama library page lists all the available models.
For example, to download gemma2 with 2 billion parameters, the command is:
ollama pull gemma2:2b
The model will be downloaded to the /usr/share/ollama/.ollama/models local folder, if you are curious (as I am).
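We can also confirm the download from Python: the ollama client exposes a list() call that returns every model already pulled to the machine. A minimal sketch (the exact output format may vary between versions of the ollama package):

from ollama import Client

# Connect to the local Ollama server (the default port is 11434)
client = Client(host='http://localhost:11434')

# client.list() returns the models already pulled to this machine
for model in client.list()['models']:
    print(model)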
Use the downloaded model in Python
Now, we can use the downloaded Gemma model just like any cloud model:
from ollama import Client, ResponseError

try:
    # Connect to the local Ollama server
    client = Client(
        host='http://localhost:11434',
        headers={}
    )
    # Send a single-turn chat request to the local gemma2 model
    response = client.chat(
        model='gemma2:2b',
        messages=[{
            'role': 'user',
            'content': 'Describe why Ollama is useful',
        }]
    )
    print(response['message']['content'])
except ResponseError as e:
    print('Error:', e.error)
The program will print the answer we asked for. Wonderful!
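By default the whole answer arrives at once. If we prefer to see the text as it is being generated, the same chat() call accepts a stream=True flag and returns an iterator of partial responses. A minimal sketch of the same request, streamed:

from ollama import Client

client = Client(host='http://localhost:11434')

# With stream=True, chat() yields partial responses as they are generated
stream = client.chat(
    model='gemma2:2b',
    messages=[{'role': 'user', 'content': 'Describe why Ollama is useful'}],
    stream=True,
)

# Print each fragment as soon as the model produces it
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)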
A real example: check this article about Ollama... with Ollama
We can program a very simple article checker with Ollama and Python, like this:
from ollama import Client, ResponseError

try:
    client = Client(
        host='http://localhost:11434',
        headers={}
    )
    # Build the prompt: the instructions plus the article itself
    prompt = "I am a Spanish writer who is learning how to "
    prompt += "write in English. Please review whether this article "
    prompt += "is well written. Thank you!\n\n"
    with open('article.md') as f:
        prompt += f.read()
    response = client.chat(
        model='gemma2:2b',
        messages=[{
            'role': 'user',
            'content': prompt,
        }]
    )
    print(response['message']['content'])
except ResponseError as e:
    print('Error:', e.error)
When executed, Gemma will give us a detailed analysis of this article, with advice for improvement.
Awesome! The possibilities are limitless!
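And because pulling extra models is free, we could even run the same prompt against several of them and compare the answers. A minimal sketch (the model names below are just examples and must be pulled first with ollama pull):

from ollama import Client, ResponseError

client = Client(host='http://localhost:11434')

# Example model names: each one must already be pulled locally
models = ['gemma2:2b', 'llama3.2:1b']

question = 'Summarize why offline AI models are useful, in one sentence.'

for name in models:
    try:
        response = client.chat(
            model=name,
            messages=[{'role': 'user', 'content': question}],
        )
        print(f'--- {name} ---')
        print(response['message']['content'])
    except ResponseError as e:
        print(f'Error with {name}:', e.error)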
A lot to learn, a lot of fun
Ollama allows us to test different models with our most precious data, without any privacy concerns. It also lets us save costs in the initial stages of developing an AI-powered application.
And you? What kind of projects will you develop with the help of Ollama?
Happy coding!
About the list
Alongside the Python and Docker posts, I will also write about other related topics, such as:
Software architecture
Programming environments
Linux operating system
Etc.
If you find an interesting technology, programming language, or anything else, please let me know! I'm always open to learning something new!
About the author
I'm Andrés, a full-stack software developer based in Palma, on a personal journey to improve my coding skills. I'm also a self-published fantasy writer with four published novels to my name. Feel free to ask me anything!