### Overview
[LLM Farm](https://apps.apple.com/us/app/llm-farm/id6461209867) is an iOS application for chatting with a variety of large language models directly on your device. It offers a broad set of settings for tuning inference and text generation to your specific needs.
### Getting Started
1. **Launch the App**: Open the LLM Farm app on your iOS device.
2. **Edit Chat**: Tap on ‘Edit Chat’ to configure your language model settings.
### Model Configuration
Select the language model that best suits your task from an extensive list, including options like GPT-2, LLaMA, and others.
#### Model Selection
* Navigate to the “Inference” field to choose your desired model architecture.
* Choose from pre-existing models like `gpt2`, `llama`, `replit`, and more.
### Prediction Options
Configure how the model generates predictions.
#### Context Settings
* **Context Size**: Determine how many previous tokens the model should consider (e.g., 1024 tokens).
* **N\_Batch**: Set how many prompt tokens are processed per batch when evaluating the prompt.
#### Sampling Methods
Choose how the model will sample predictions.
* **Greedy**: Selects the most likely next word.
* **Temperature**: Introduces randomness into the selection process based on a set temperature value.
* **Mirostat**: An adaptive sampling method that steers output toward a target entropy.
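The difference between greedy and temperature-based selection can be sketched in a few lines of Python (the logit values and function name are illustrative, not LLM Farm's actual code):

```python
import math
import random

def sample(logits, temperature=1.0, rng=random.Random(0)):
    """Pick a token index from raw logits.

    temperature == 0 falls back to greedy (argmax); higher values
    flatten the distribution and add randomness.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    weights = [math.exp(l - m) for l in scaled]  # numerically stable softmax
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5, -1.0]
greedy_choice = sample(logits, temperature=0)  # always the most likely token
```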
#### Penalties
Adjust penalties to control repetition and enhance output quality.
* **Repeat Penalty**: Penalize repeated sequences to ensure varied responses.
* **Frequency Penalty**: Discourage frequent use of the same word to maintain diversity.
* **Presence Penalty**: Avoid reusing words that have already been mentioned.
### Advanced Settings
Delve deeper into customization with advanced settings.
#### numberOfThreads
* **Description**: Specifies the number of threads the model uses for parallel processing.
* **Acceptable Values**: Any positive integer or 0 for the maximum available.
* **Impact Example**: Increasing the thread count can speed up processing, but too many threads may cause contention and diminishing returns on mobile hardware.
#### context
* **Description**: Determines the number of tokens from the previous text the model uses to generate the next word.
* **Acceptable Values**: Typically, a power of 2, such as 512, 1024, or 2048.
* **Impact Example**: A larger context size allows the model to consider more of the previous conversation, potentially improving coherence over longer interactions.
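The effect of a fixed context window can be sketched as follows (`trim_to_context` is a hypothetical helper, not part of LLM Farm):

```python
def trim_to_context(token_ids, context_size=1024):
    """Keep only the most recent tokens that fit in the context window.

    Older tokens are dropped from the front; the model never sees them.
    (A simplification -- real implementations may also reserve room
    for the tokens about to be generated.)
    """
    return token_ids[-context_size:]

history = list(range(3000))             # pretend conversation of 3000 tokens
window = trim_to_context(history, 1024)
# window holds the 1024 most recent tokens; the first 1976 are forgotten
```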
#### n\_batch
* **Description**: The batch size for prompt processing, i.e., how many prompt tokens are evaluated at once.
* **Acceptable Values**: Any positive integer, commonly a power of 2 like 16, 32, 64, etc.
* **Impact Example**: A higher batch size can speed up prompt evaluation but requires more memory.
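Assuming `n_batch` counts prompt tokens per batch (the convention used by llama.cpp-based apps), prompt batching can be sketched as:

```python
def batches(prompt_tokens, n_batch=32):
    """Split a token list into chunks of at most n_batch tokens,
    the unit in which the prompt is fed to the model."""
    return [prompt_tokens[i:i + n_batch]
            for i in range(0, len(prompt_tokens), n_batch)]

chunks = batches(list(range(100)), n_batch=32)
# a 100-token prompt becomes three batches of 32 and one of 4
```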
### Sampling Options
#### temp (Temperature)
* **Description**: Controls the randomness in the prediction distribution.
* **Acceptable Values**: A positive decimal, commonly between 0.1 and 2.0. Values above 1.0 increase randomness, while values below 1.0 make predictions more deterministic.
* **Impact Example**: Setting `temp` to 0.5 might result in more predictable text, whereas a value of 1.5 could generate more creative and diverse outputs.
#### top\_k
* **Description**: Limits the next word predictions to the top k most likely words.
* **Acceptable Values**: Any non-negative integer. Setting to 0 disables this feature.
* **Impact Example**: Setting `top_k` to 20 restricts the model to consider only the top 20 most probable next words, which can help maintain coherence.
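A minimal sketch of top-k filtering over raw logits (illustrative only):

```python
def top_k_filter(logits, k):
    """Keep only the k highest logits; mask the rest to -inf.

    k == 0 disables the filter, matching the convention described above.
    """
    if k <= 0 or k >= len(logits):
        return list(logits)
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [l if l >= cutoff else float('-inf') for l in logits]

filtered = top_k_filter([3.0, 1.0, 2.0, 0.5], k=2)
# [3.0, -inf, 2.0, -inf] -- only the two most likely tokens survive
```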
#### top\_p
* **Description**: Nucleus sampling parameter that chooses from the smallest set of words whose cumulative probability exceeds the threshold p.
* **Acceptable Values**: A decimal typically between 0 and 1.
* **Impact Example**: A `top_p` value of 0.9 might generate text with good variability while still being relevant to the prompt.
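Nucleus sampling can be sketched as follows, operating on an already-normalized probability list (illustrative, not LLM Farm's implementation):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p; zero out the rest, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    kept = [pr if i in keep else 0.0 for i, pr in enumerate(probs)]
    total = sum(kept)
    return [pr / total for pr in kept]

probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(probs, p=0.9)
# tokens 0-2 (0.5 + 0.3 + 0.15 = 0.95 >= 0.9) survive; token 3 is dropped
```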
#### tfs\_z (Tail Free Sampling)
* **Description**: Adjusts the ‘sharpness’ of the probability distribution tail, controlling how much attention is given to less likely words.
* **Acceptable Values**: A decimal between 0 and 1; a value of 1.0 disables the filter. Higher values keep more of the tail, giving more chance to less likely words.
* **Impact Example**: Increasing `tfs_z` may result in more diverse language generation, possibly at the cost of coherence.
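A rough sketch of the tail-free idea, which locates the start of the flat low-probability tail via the (absolute, normalized) second derivative of the sorted distribution (a simplification of the published algorithm):

```python
def tail_free_filter(probs, z):
    """Drop the flat low-probability tail of the distribution.

    z close to 1.0 keeps nearly everything; smaller z trims more.
    Simplified sketch -- real implementations work on logits.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    p = [probs[i] for i in order]
    # absolute second derivative of the sorted probabilities
    d2 = [abs(p[i] - 2 * p[i + 1] + p[i + 2]) for i in range(len(p) - 2)]
    total = sum(d2) or 1.0
    weights = [d / total for d in d2]
    cum, cut = 0.0, len(p)
    for i, w in enumerate(weights):
        cum += w
        if cum > z:
            cut = i + 1
            break
    keep = set(order[:cut])
    kept = [pr if i in keep else 0.0 for i, pr in enumerate(probs)]
    s = sum(kept)
    return [pr / s for pr in kept]

result = tail_free_filter([0.4, 0.3, 0.15, 0.1, 0.05], z=0.95)
```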
#### typical\_p
* **Description**: Locally typical sampling: keeps words whose information content is close to the expected (typical) value for the distribution, filtering out words that are either too surprising or too predictable.
* **Acceptable Values**: A decimal between 0 and 1, usually close to 1; 1.0 disables the filter.
* **Impact Example**: Setting `typical_p` to a high value like 0.95 ensures most generated words are within a typical range for the context.
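The typical-sampling idea can be sketched like this (a simplification; the real algorithm operates on model logits):

```python
import math

def typical_filter(probs, p):
    """Rank tokens by how close their surprisal (-log prob) is to the
    distribution's entropy, then keep the most 'typical' tokens until
    their probability mass reaches p."""
    entropy = -sum(pr * math.log(pr) for pr in probs if pr > 0)
    score = [abs(-math.log(pr) - entropy) if pr > 0 else float('inf')
             for pr in probs]
    order = sorted(range(len(probs)), key=lambda i: score[i])
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    kept = [pr if i in keep else 0.0 for i, pr in enumerate(probs)]
    s = sum(kept)
    return [pr / s for pr in kept]

typical = typical_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
# token 3, whose surprisal is farthest from the entropy, is dropped
```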
### Penalty Settings
#### repeat\_penalty
* **Description**: Applies a penalty for repeating the same word or phrase to discourage redundancy.
* **Acceptable Values**: A decimal typically greater than 1.0.
* **Impact Example**: A `repeat_penalty` of 1.2 might reduce the repetition of previous words in the output.
#### repeat\_last\_n
* **Description**: Defines the number of last tokens to check for repetition.
* **Acceptable Values**: Any positive integer.
* **Impact Example**: With `repeat_last_n` set to 50, the model will look at the last 50 tokens to avoid repetition, which could help in producing more varied content.
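Combining the two settings above, a llama.cpp-style repetition penalty can be sketched as (illustrative, not LLM Farm's exact code):

```python
def apply_repeat_penalty(logits, recent_tokens, penalty=1.2, repeat_last_n=64):
    """For each token seen in the last `repeat_last_n` positions, shrink
    a positive logit by dividing, and push a negative one further down
    by multiplying -- both make the token less likely to repeat."""
    out = list(logits)
    for tok in set(recent_tokens[-repeat_last_n:]):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

penalized = apply_repeat_penalty([2.0, 1.0, -0.5],
                                 recent_tokens=[0, 2], penalty=1.2)
# token 0: 2.0 -> ~1.67, token 2: -0.5 -> -0.6, token 1 unchanged
```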
#### frequency\_penalty
* **Description**: Decreases the likelihood of repeatedly using the same word in generation.
* **Acceptable Values**: A decimal typically between 0 and 1.
* **Impact Example**: If `frequency_penalty` is set to 0.5, the model is less likely to repeat words it has already used, encouraging more diverse vocabulary.
#### presence\_penalty
* **Description**: Discourages the model from using words that have already appeared.
* **Acceptable Values**: A decimal typically between 0 and 1.
* **Impact Example**: With a `presence_penalty` of 0.1, there is a slight discouragement from reusing words, promoting lexical diversity without significant impact on content relevance.
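Frequency and presence penalties are often combined into a single logit adjustment, as in the OpenAI-style formula below (a sketch; LLM Farm's exact formula may differ):

```python
from collections import Counter

def apply_freq_presence(logits, generated_tokens,
                        frequency_penalty=0.5, presence_penalty=0.1):
    """logit -= count * frequency_penalty + presence_penalty (if seen).

    Frequently used tokens are penalized in proportion to their count,
    and any already-seen token takes a flat presence hit."""
    counts = Counter(generated_tokens)
    out = list(logits)
    for tok, count in counts.items():
        out[tok] -= count * frequency_penalty + presence_penalty
    return out

adjusted = apply_freq_presence([1.0, 1.0, 1.0], generated_tokens=[0, 0, 1])
# token 0 (seen twice): 1.0 - (2 * 0.5 + 0.1) = -0.1
# token 1 (seen once):  1.0 - (1 * 0.5 + 0.1) =  0.4
```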
### Advanced Sampling
#### mirostat
* **Description**: Mirostat sampling, which adapts word selection to hold the output entropy near a target value.
* **Acceptable Values**: 0 (disabled), 1 (Mirostat), or 2 (Mirostat 2.0).
* **Impact Example**: Enabling `mirostat` adjusts the word selection process to maintain a set level of entropy, which can yield more consistent output quality over long generations.
#### mirostat\_tau
* **Description**: The target entropy value when `mirostat` sampling is enabled.
* **Acceptable Values**: Positive decimals, where higher values aim for higher entropy in the selection process.
* **Impact Example**: A higher `mirostat_tau`, like 5.0, encourages the model to produce more diverse and less deterministic outputs, which can be useful for creative writing tasks.
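One update step of a Mirostat-2-style sampler can be sketched as follows (simplified; `eta`, the learning rate, is an assumption not exposed in the settings above):

```python
import math
import random

def mirostat_step(probs, mu, tau=5.0, eta=0.1, rng=random.Random(0)):
    """One step of a simplified Mirostat-2-style sampler.

    Tokens whose surprisal (-log2 prob) exceeds mu are excluded, a token
    is drawn from the remainder, and mu is nudged so that the observed
    surprisal stays near the target entropy tau."""
    candidates = [i for i, pr in enumerate(probs)
                  if pr > 0 and -math.log2(pr) <= mu]
    if not candidates:  # fall back to the single most likely token
        candidates = [max(range(len(probs)), key=lambda i: probs[i])]
    r = rng.random() * sum(probs[i] for i in candidates)
    choice = candidates[-1]
    for i in candidates:
        r -= probs[i]
        if r <= 0:
            choice = i
            break
    surprise = -math.log2(probs[choice])
    mu -= eta * (surprise - tau)   # feedback keeps entropy near tau
    return choice, mu

choice, new_mu = mirostat_step([0.5, 0.3, 0.2], mu=10.0)
```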
### Examples of Settings Impact
* **Lower `temperature` for factual queries**: When asking the model to provide information, a lower temperature (e.g., 0.7) can help produce more precise and relevant answers.
* **Higher `temperature` for creative tasks**: For creative writing or brainstorming, a higher temperature (e.g., 1.2) can lead to more innovative and varied ideas.
* **Moderate `top_k` for balanced output**: A mid-range `top_k` (e.g., 40) can offer a balance between creativity and relevance, useful for tasks like article writing.
* **Tailored `top_p` for nuanced control**: Adjusting `top_p` (e.g., 0.9) allows for dynamic control over the diversity of the text, suitable for scenarios where the model needs to adapt to a wide range of topics.
* **Penalty adjustments for content quality**: Fine-tuning `repeat_penalty`, `frequency_penalty`, and `presence_penalty` can drastically improve the readability and uniqueness of the content, especially in longer text generations.
#### Hardware Options
* **Metal**: Toggle to enable Metal for GPU-accelerated inference on supported devices.
* **MMAP**: Enable memory mapping (`mmap`) so model weights are read from disk on demand rather than loaded fully into memory.
#### Saving and Templates
* **Save as new template**: Save your current configuration as a new template for future use.
### Executing a Chat
1. **Enter Prompt**: Type your prompt in the designated area.
2. **Send**: Tap ‘Send’ to receive the model’s response based on the configured settings.
### Additional Features
* **Clear Chat History**: Erase the conversation to start fresh.
* **Special Tokens**: Use ‘BOS’ (Beginning of Sequence) and ‘EOS’ (End of Sequence) tokens to mark the start and end of your input.