# Sampling Config

For the OpenAI protocol, **top_p** and **temperature** are specified in the outer protocol, while other sampling parameters are specified in **extra_configs**. Example:

```JSON
{
  "model": "Qwen_14B_pressure_test",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, what's the weather like in Hangzhou today"
    }
  ],
  "stream": true,
  "temperature": 1,
  "max_tokens": 1024,
  "top_p": 0.8,
  "extra_configs": {
    "top_k": 1
  }
}
```

The raw protocol specifies sampling parameters through **generate_config**. Example:

```JSON
{
  "model": "m6-13b-v1",
  "prompt": "Human: write a list for trip\n\nAssistant:",
  "generate_config": {
    "top_k": 1,
    "top_p": 0
  }
}
```

## Basic Control Parameters

| Parameter Name | Core Function |
|----------------|---------------|
| temperature | Controls sampling randomness:<br>→ 0: Deterministic mode<br>→ 1: Standard random mode |
| top_k | Candidate set truncation strategy:<br>→ 0: Disabled<br>→ N: Keep the N highest-probability tokens |
| top_p | Nucleus sampling strategy:<br>→ 0.95: Keep the smallest candidate set whose cumulative probability reaches 95% |
| max_new_tokens | Maximum generation length; the total sequence length is capped at<br>MIN(input_length + max_new_tokens, MAX_SEQ_LEN) |
| min_new_tokens | Enforces minimum generation length |

## Advanced Control Parameters

| Parameter Name | Function Description | Default |
|----------------|----------------------|---------|
| repetition_penalty | Repetition suppression factor:<br>→ `> 1.0` suppresses repetition<br>→ `< 1.0` encourages repetition | 1.0 |
| frequency_penalty | Discourages the model from repeating the same words or phrases too often. The value is subtracted from the log-probability of a token each time that token has already occurred in the generated text, so a higher value makes the model more conservative about reusing tokens. | 0.0 |
| presence_penalty | Encourages the model to use a more diverse range of tokens. The value is subtracted from the log-probability of a token that has already been generated, so a higher value makes the model more likely to produce tokens that have not yet appeared. | 0.0 |
| stop_words_list | Token ID stop words (better performance):<br>`[[20490,25],[1024]]` | |
| stop_words_str | String stop words (better compatibility):<br>`["","\nObservation"]` | |
| random_seed | Random seed control:<br>→ None: Random (non-reproducible)<br>→ Fixed value: Reproducible generation | |

```python
# Stop Words Configuration Example
{
    "stop_words_str": ["<|im_end|>", "\nObservation:"],
    "stop_words_list": [[20490, 25], [50256]]
}
```

## Return Control Parameters

| Parameter Name | Effect | Use Case |
|----------------|--------|----------|
| return_logits | Returns the logits matrix for each position | Output analysis/post-processing |
| return_hidden_states | Returns hidden states of the transformer layers | Model debugging/feature extraction |
| return_input_ids | Returns the encoded input sequence | Input validation |
| return_output_ids | Returns the encoded output sequence | Output decoding validation |

## Special Mode Parameters

| Mode Name | Control Parameter | Function Description |
|-----------|-------------------|----------------------|
| Thinking Mode | in_think_mode=True | Agent-specific scenario:<br>Control thinking phase length with max_thinking_tokens |
| Streaming Output | yield_generator=True | Enable chunked return mechanism |
| Parallel Decoding | pd_separation=True | Enable parallel decoding optimization (hardware support required) |

## Environment Variable Description

```bash
# Force override stop words configuration
export FORCE_STOP_WORDS=true
export STOP_WORDS_STR="[\"\",\"\\n\"]"

# Hybrid mode (default)
export FORCE_STOP_WORDS=false  # union of environment variables, model defaults, and configuration parameters
```

## Sampling Strategy Description

### Combined Strategy Example

```python
{
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
    "top_p_decay": 0.9,  # 10% decay per token
    "top_p_min": 0.5     # Minimum decay threshold
}
```

### Strategy Recommended Values

| Scenario | temperature | top_p | top_k |
|------------------|-------------|-------|-------|
| Code Generation | 0.2-0.4 | 0.9 | 40 |
| Creative Writing | 0.7-1.0 | 0.95 | 100 |
| Factual Q&A | 0.1-0.3 | 0.8 | 20 |

## Parameter Usage Notes

1. **Stop Words Selection Principles**
   - Performance priority: Use `stop_words_list` for high-frequency triggering scenarios
   - Compatibility priority: Use `stop_words_str` for complex pattern matching

2. **Length Limit Coordination**

   ```math
   \text{actual maximum length} = \min(\text{input\_token\_len} + \text{max\_new\_tokens},\ \text{MAX\_SEQ\_LEN})
   ```

3. **Thinking Mode Special Constraints**

   ```python
   # These parameters must be configured together
   {
       "in_think_mode": True,
       "max_thinking_tokens": 512,  # Control thinking phase length
       "max_new_tokens": 2048       # Control total output length
   }
   ```

4. **Streaming Output Limitations**
   - Streaming requires `yield_generator=True` to be enabled
   - `return_logits` only returns complete data in non-streaming mode
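
To tie the notes above together, here is a minimal sketch of how a client might assemble an OpenAI-protocol request so that `temperature` and `top_p` stay at the top level and every other sampling parameter goes into `extra_configs`, along with the length cap from note 2. The helper names `build_openai_request` and `effective_max_length` are hypothetical and exist only for illustration; they are not part of the service's API.

```python
import json

# Per the section above, only these two sampling parameters sit in the outer protocol.
OUTER_KEYS = {"temperature", "top_p"}


def build_openai_request(model, messages, sampling, max_tokens=None, stream=False):
    """Hypothetical helper: split sampling parameters between the outer
    protocol and extra_configs."""
    request = {"model": model, "messages": messages, "stream": stream}
    if max_tokens is not None:
        request["max_tokens"] = max_tokens
    extra = {}
    for key, value in sampling.items():
        if key in OUTER_KEYS:
            request[key] = value   # temperature / top_p go at the top level
        else:
            extra[key] = value     # everything else goes into extra_configs
    if extra:
        request["extra_configs"] = extra
    return request


def effective_max_length(input_token_len, max_new_tokens, max_seq_len):
    """Hypothetical helper: length cap from the 'Length Limit Coordination' note."""
    return min(input_token_len + max_new_tokens, max_seq_len)


if __name__ == "__main__":
    req = build_openai_request(
        model="Qwen_14B_pressure_test",
        messages=[{"role": "user", "content": "Hello"}],
        sampling={"temperature": 0.7, "top_p": 0.95, "top_k": 50, "repetition_penalty": 1.2},
        max_tokens=1024,
    )
    print(json.dumps(req, indent=2))
    # 3000 input tokens + 2048 new tokens exceeds an assumed 4096-token limit.
    print(effective_max_length(3000, 2048, 4096))
```

Keeping the split in one place means callers can pass a flat dictionary of sampling parameters without having to remember which ones belong in `extra_configs`.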