Experimenting with LLMs

Using the llama.cpp inference runtime with small GGUF transformer models on dusty old hardware.

[Photo: dusty old home lab hardware]
[Image: LLM joke]

Hardware

The goal was to run a local LLM on whatever was already sitting around — no GPU, no cloud, no new hardware.

Machine: Lenovo desktop
CPU: Intel Core i3-6100T @ 3.20 GHz
RAM: 3.8 GB
OS: Debian 12
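
The figures above can be reproduced with standard Linux tools; a quick sketch (exact output format varies by distro):

```shell
# CPU model string, total memory, and OS name, via standard Linux tools
lscpu | awk -F': +' '/Model name/ {print "CPU:", $2}'
free -h | awk '/^Mem:/ {print "RAM:", $2}'
. /etc/os-release && echo "OS: $PRETTY_NAME"
```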

Install llama.cpp

llama.cpp is a C++ inference runtime for transformer models in GGUF format. It runs on CPU with no GPU dependencies, making it the right fit for constrained hardware.
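
A GGUF file is a single self-contained binary holding the model's tensors plus metadata, and it begins with the 4-byte ASCII magic "GGUF". That makes a quick download sanity check easy; a small sketch (the model path is illustrative):

```shell
# Every valid GGUF file starts with the 4-byte magic "GGUF".
# Anything else usually means a truncated or HTML-error download.
check_gguf() {
  if [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]; then
    echo "looks like GGUF"
  else
    echo "not a GGUF file (or missing)"
  fi
}
check_gguf models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```

A failed Hugging Face download often leaves an HTML error page with a .gguf filename; this catches that immediately.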

Clone the repository:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

Build without GPU support:

Build · CPU only
mkdir build
cd build
# CUDA disabled explicitly (newer llama.cpp revisions use -DGGML_CUDA=OFF instead)
cmake .. -DLLAMA_CUBLAS=OFF
# compile; expect a slow build on a two-core CPU
cmake --build . --config Release -j

Model Selection

With under 4 GB of RAM, model selection matters. One of the smallest viable chat models is tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf: a 1.1B-parameter model quantized to 4 bits, which fits within the memory budget with room to spare.

Model weights: ~0.8 GB
KV cache (1024 ctx): ~0.5–0.8 GB
Overhead: ~0.3 GB
Total: ~1.6–1.9 GB
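
The weights line can be sanity-checked from the parameter count. Q4_K_M averages roughly 4.5 bits per weight (an approximation; some tensors are kept at higher precision), so:

```shell
# 1.1e9 weights * ~4.5 bits/weight, converted to MiB
awk 'BEGIN { printf "%.0f MiB\n", 1.1e9 * 4.5 / 8 / 1048576 }'   # ≈ 590 MiB
```

That lands near 0.6 GiB for the pure 4-bit estimate; the higher-precision tensors and runtime alignment account for the gap up to the ~0.8 GB figure.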

For reference, this is a 1.1B-parameter model. GPT-4, the model behind ChatGPT, has been unofficially estimated at around 1.7 trillion parameters, roughly 1,500× larger.
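
That multiplier is just division on the two estimates:

```shell
# Estimated GPT-scale parameter count divided by TinyLlama's
echo $(( 1700000000000 / 1100000000 ))   # prints 1545, i.e. roughly 1,500x
```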

Running the Interface

llama-cli · launch command
# -m: model file · -c: context window (tokens) · -n: max tokens to generate
# -t: threads (one per physical core on the i3-6100T)
./build/bin/llama-cli \
  -m models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  -c 1024 \
  -n 256 \
  -t 2
[Screenshot: llama-cli running in a terminal]
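
The -t 2 matches the i3-6100T's two physical cores; the chip exposes four logical CPUs through hyper-threading, but CPU-bound token generation usually gains little from the extra threads. To see what a machine reports:

```shell
# Logical CPUs visible to the OS (hyperthread siblings included)
nproc
# The same count via POSIX getconf
getconf _NPROCESSORS_ONLN
```

On the i3-6100T both report 4; the physical core count (2 here) appears under "Core(s) per socket" in lscpu.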

Prompt & Output

[Screenshot: llama.cpp prompt and output]

The model runs and responds. Output is coherent, but the examples it generates are incorrect, which is expected at this scale: a 1.1B-parameter model has absorbed general language structure but offers limited factual reliability.

What This Demonstrates

A 4-bit-quantized 1.1B-parameter model can run and hold a conversation on a dual-core desktop with under 4 GB of RAM, no GPU, and no cloud dependency. The limiting factor at this scale is factual reliability, not feasibility.

Further reading: Interesting reads on AI