Inflection AI™ for Developers

Inflection API gateway documentation

As a reminder, you may not (i) use the Inflection APIs in a way that infringes, misappropriates or violates any person’s rights; (ii) reverse assemble, reverse compile, decompile, translate or otherwise attempt to discover the source code or underlying components of models, algorithms, and systems of the Inflection APIs (except to the extent such restrictions are contrary to applicable law); (iii) use output from the Inflection APIs to develop models that compete with Inflection; (iv) represent that output from the Inflection APIs was human-generated; (v) buy, sell, or transfer API keys without Inflection’s prior consent; or (vi) send Inflection any “personal information”, “personal data”, or any other information that is regulated by applicable laws.

API structure

POST https://layercake.pubwestus3.inf7ks8.com/external/api/inference

This endpoint generates a completion for a provided conversation. It requires the following header, where <token> is the API credential provided to you by Inflection AI.

Authorization: Bearer <token>

The API expects a JSON body, which may include the following fields:

  • context (list, required): The past turns of conversation, as a list. Each entry in the list should be an object with the following fields:
    • type (string, required): Either "AI", "Human", or "Instruction", representing the entity responsible for this message text.
    • text (string, required): The text of the message.
    • ts (number, optional): Seconds since the Unix epoch when the message was sent.
  • config (string, required): The model configuration to call; currently, the accepted values are "inflection_3_pi" and "inflection_3_productivity".
  • temperature (float, optional, default 1): Temperature is a value ranging from 0 to 1 inclusive which controls the randomness of the response. Higher values will make the output more random, which may be preferable when you wish to encourage creativity. Lower values will make the output more deterministic, which may be preferable in a setting where output is highly structured such as code generation. Note that this is an alternative to controlling sampling with top_p; as such we recommend modifying only one of these two parameters.
  • stop_tokens (list[string], optional, default null): A list of strings used to control the AI's text generation. When one of these strings is generated by the AI, text generation halts.
  • max_tokens (integer, optional, default 1024): The value of max_tokens must be an integer greater than 0 and denotes the maximum number of tokens that the AI will generate in a single turn. This parameter is helpful if you need to control costs.
  • top_p (float, optional, default 0.95): The value of top_p ranges from 0 to 1 inclusive and controls the sampling mechanism used by the AI. When set, the AI will consider the tokens with the top_p proportion of probability mass for eventual generation. Note that this is an alternative to controlling sampling with temperature; as such we recommend modifying only one of these two parameters.
  • web_search (boolean, optional, default true): By default, the AI response may include information derived from web searches when deemed necessary. To prevent the use of information retrieved from the web, set this parameter to false.
  • metadata (object, optional): If specified, optional information about the user that the AI can utilize when constructing a response. It may have the following fields:
    • user_firstname (string, optional): The user's first name.
    • user_timezone (string, optional): The user's timezone, e.g. "America/Los_Angeles".
    • user_country (string, optional): The user's country, e.g. "United States of America".
    • user_region (string, optional): The user's region within their country, e.g. "CA" for California.
    • user_city (string, optional): The user's city, e.g. "San Francisco".

The response will consist of a JSON object containing two fields:

  • created (number): The (floating point) number of seconds since the Unix epoch when the response was generated.
  • text (string): The AI's text generation in response to the context.

Note that this endpoint is rate limited at 100 requests/min per API key. If you need a higher rate limit, please contact us.
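The request described above can be sketched in Python using only the standard library. The endpoint URL and Authorization header come from this page; the helper names `build_payload` and `complete` are hypothetical, chosen for illustration:

```python
import json
import urllib.request

API_URL = "https://layercake.pubwestus3.inf7ks8.com/external/api/inference"

def build_payload(context, config="inflection_3_pi", **options):
    """Assemble the JSON body described above. Optional fields
    (temperature, top_p, max_tokens, stop_tokens, web_search, metadata)
    are included only when supplied."""
    payload = {"config": config, "context": context}
    payload.update(options)
    return payload

def complete(token, context, **options):
    """POST the payload with a Bearer token and return the parsed
    JSON response ({"created": ..., "text": ...})."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(context, **options)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (requires a real API key):
# reply = complete("<token>",
#                  [{"type": "Human", "text": "What's up?"}],
#                  temperature=0.8, max_tokens=256)
# print(reply["text"])
```

Given the 100 requests/min rate limit, production callers should also be prepared to back off and retry on rate-limit errors.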

Request

POST /external/api/inference HTTP/1.1
Authorization: Bearer <REDACTED>
Content-Type: application/json
Host: layercake.pubwestus3.inf7ks8.com

{
    "config": "inflection_3_pi",
    "temperature": 0.8,
    "max_tokens": 256,
    "top_p": 0.85,
    "stop_tokens": ["STOP", "finished"],
    "web_search": true,
    "context": [
        {
            "type": "Human",
            "text": "What's up?",
            "ts": 1708414958.66
        },
        {
            "type": "AI",
            "text": "Not much, chilling in cyberspace! How about you?"
        },
        {
            "type": "Human",
            "text": "Going to get some coffee! Have any recommendations for me?"
        }
    ],
    "metadata": {
        "user_firstname": "Alice",
        "user_timezone": "America/Los_Angeles",
        "user_country": "United States of America",
        "user_region": "CA",
        "user_city": "San Francisco"
    }
}

Response

HTTP/1.1 200 OK
Content-Type: application/json

{
    "created": 1714688002.0557644,
    "text": "Ooh, sounds like a plan! 😎 Here are some coffee spots in San Francisco that I think you'll love:\n\n- Philz Coffee: Known for their unique blends and pour-over method. They offer a wide range of flavors, so you're sure to find something you love!\n- Blue Bottle Coffee: A local favorite that takes pride in their fresh, high-quality coffee beans. Their pour-over coffees are especially delicious!\n- Sightglass Coffee: With a focus on sustainability and ethical sourcing, Sightglass offers a variety of single-origin coffees that are sure to please.\n\nDo any of these sound good to you?"
}

Streaming Request

POST https://layercake.pubwestus3.inf7ks8.com/external/api/inference/streaming

This endpoint is a streaming version of /external/api/inference. The request headers and body are exactly the same.

The response will consist of streamed messages containing JSON objects with the following fields:

  • created (number): The (floating point) number of seconds since the Unix epoch when this chunk was generated.
  • idx (number): The index of this chunk in the stream of messages.
  • text (string): A chunked portion of the AI's text generation.

POST /external/api/inference/streaming HTTP/1.1
Authorization: Bearer <REDACTED>
Content-Type: application/json
Host: layercake.pubwestus3.inf7ks8.com

{
    "config": "inflection_3_pi",
    "temperature": 0.8,
    "max_tokens": 256,
    "top_p": 0.85,
    "stop_tokens": ["STOP", "finished"],
    "web_search": true,
    "context": [
        {
            "type": "Human",
            "text": "What's up?"
        },
        {
            "type": "AI",
            "text": "Not much, chilling in cyberspace! How about you?"
        },
        {
            "type": "Human",
            "text": "Going to get some coffee! Have any recommendations for me?"
        }
    ],
    "metadata": {
        "user_firstname": "Alice",
        "user_timezone": "America/Los_Angeles",
        "user_country": "United States of America",
        "user_region": "CA",
        "user_city": "San Francisco"
    }
}

Streaming Response

HTTP/1.1 200 OK
Content-Type: application/json

{"created": 1728094708.2514212, "idx": 0, "text": " Ooh, nice"}

{"created": 1728094708.5789802, "idx": 1, "text": "! What kind of coffee mood are you in today?"}

{"created": 1728094709.2364252, "idx": 2, "text": " Do you want something classic like a latte or are you feeling adventurous and wanting to try a new coffee creation? 😋"}

{"created": 1728094709.4067984, "idx": 3, "text": "☕"}
Version: 76be51c · © 2024 Inflection AI
Terms of service