
# Create Evaluation

> Create a new model evaluation using an existing benchmark.


This endpoint starts a model evaluation task using a benchmark from your library. It supports both single-model evaluation and a comparison mode in which two models are evaluated side by side.


**Request Body:**

- `model_id` (required): Primary model to evaluate
- `benchmark_id` (required): ID of the benchmark to use for evaluation
- `model_id_2` (optional): Second model ID for comparison mode
- `custom_metrics` (optional): Custom evaluation metrics configuration. The schema below types this field as an object or `null`.
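
The field list above can be sketched as a small payload builder. This is a minimal illustration, not an official client: the function name `build_evaluation_payload` is hypothetical, and `custom_metrics` is passed through untouched because its exact shape is not specified here beyond "object or `null`".

```python
from typing import Any, Optional


def build_evaluation_payload(
    model_id: str,
    benchmark_id: str,
    model_id_2: Optional[str] = None,
    custom_metrics: Optional[Any] = None,
) -> dict:
    """Assemble the JSON body for POST /api/v3/evaluations (hypothetical helper)."""
    # model_id and benchmark_id are the two required fields.
    if not model_id or not benchmark_id:
        raise ValueError("model_id and benchmark_id are required")

    payload = {"model_id": model_id, "benchmark_id": benchmark_id}
    # Optional fields are left out of the body entirely when unset.
    if model_id_2 is not None:
        payload["model_id_2"] = model_id_2  # switches the request to comparison mode
    if custom_metrics is not None:
        payload["custom_metrics"] = custom_metrics
    return payload
```

Calling `build_evaluation_payload("nugen-flash-instruct", "task-abc123", model_id_2="aligned-model-01kmqm4nrn9fw6r")` yields the comparison-mode body, while omitting `model_id_2` yields the single-model body.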


**Returns:**

- `evaluation_id`: Unique identifier for tracking the evaluation
- `model_id`: The primary model being evaluated
- `status`: Initial status (always `PROCESSING`)
- `benchmark_id`: The benchmark being used
- `created_at`: Timestamp when evaluation was created
- `message`: Confirmation message


**Example Request (Single Model):**

```http
POST /api/v3/evaluations
Authorization: Bearer <api_key>
Content-Type: application/json

{
  "model_id": "aligned-model-01kmqm4nrn9fw6r",
  "benchmark_id": "task-abc123",
  "custom_metrics": ["accuracy", "relevance"]
}
```
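
The same request could be issued from Python using only the standard library. This is a sketch under stated assumptions: the base URL `https://api.nugen.in` is inferred from the location of the OpenAPI document (the spec's `servers` list is empty), and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: host inferred from the OpenAPI document URL; confirm before use.
BASE_URL = "https://api.nugen.in"


def build_create_evaluation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates an evaluation."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v3/evaluations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_evaluation(api_key: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_create_evaluation_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as 422.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating request construction from sending keeps the header and body logic easy to inspect without touching the network.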


**Example Request (Comparison Mode):**

```http
POST /api/v3/evaluations
Authorization: Bearer <api_key>
Content-Type: application/json

{
  "model_id": "nugen-flash-instruct",
  "model_id_2": "aligned-model-01kmqm4nrn9fw6r",
  "benchmark_id": "task-abc123"
}
```


**Example Response:**

```json
{
  "evaluation_id": "eval-xyz789",
  "model_id": "aligned-model-01kmqm4nrn9fw6r",
  "status": "PROCESSING",
  "benchmark_id": "task-abc123",
  "created_at": "2024-01-15T10:30:00Z",
  "message": "Evaluation created successfully and queued for execution"
}
```
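
A client might convert this response into a typed record before storing the `evaluation_id`. The sketch below assumes the six fields listed under **Returns** are always present, as the response schema marks them required; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EvaluationCreated:
    evaluation_id: str
    model_id: str
    status: str
    benchmark_id: str
    created_at: datetime
    message: str


def parse_evaluation_response(body: dict) -> EvaluationCreated:
    """Convert the decoded JSON response into a typed record."""
    return EvaluationCreated(
        evaluation_id=body["evaluation_id"],
        model_id=body["model_id"],
        status=body["status"],
        benchmark_id=body["benchmark_id"],
        # Swap the trailing "Z" for an explicit UTC offset so fromisoformat()
        # also accepts the value on Python versions before 3.11.
        created_at=datetime.fromisoformat(body["created_at"].replace("Z", "+00:00")),
        message=body["message"],
    )
```

A missing field raises `KeyError`, which surfaces a response that does not match the documented schema instead of silently continuing.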


**Notes:**

- Single model mode: Evaluates one model against the benchmark
- Comparison mode: Provide `model_id_2` to compare two models side-by-side
- Evaluations run asynchronously. Use the returned `evaluation_id` with `/evaluations/{evaluation_id}/status` to track progress
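
Because the evaluation runs asynchronously, a caller typically polls until the status changes. The loop below is a minimal sketch: the status endpoint's response format is not described on this page, so the lookup is injected as a `fetch_status` callable, and treating any value other than `PROCESSING` as terminal is an assumption.

```python
import time
from typing import Callable


def wait_for_evaluation(
    evaluation_id: str,
    fetch_status: Callable[[str], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll until the status is no longer PROCESSING, then return it."""
    deadline = time.monotonic() + timeout
    while True:
        # fetch_status is caller-supplied; it would call the status endpoint
        # and return the status string for this evaluation.
        status = fetch_status(evaluation_id)
        # Assumption: anything other than PROCESSING means the run has ended.
        if status != "PROCESSING":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"evaluation {evaluation_id} still PROCESSING after {timeout}s"
            )
        time.sleep(poll_interval)
```

Injecting the fetcher keeps the loop independent of the HTTP details and makes it straightforward to exercise with a stub.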


## OpenAPI

````yaml https://api.nugen.in/openapi-public.json post /api/v3/evaluations
openapi: 3.1.0
info:
  title: Nugen Intelligence API
  description: 'Nugen Intelligence : Powering Specialized Intelligence At Scale'
  version: 25.4.20
servers: []
security: []
paths:
  /api/v3/evaluations:
    post:
      tags:
        - Evaluations
      summary: Create Evaluation
      description: >-
        Create a new model evaluation using an existing benchmark.


        This endpoint initiates a model evaluation task using a benchmark from
        your library. Supports both single model evaluation and comparison mode
        where two models are evaluated side-by-side.


        **Request Body:**


        - `model_id`: Primary model to evaluate (required)

        - `benchmark_id`: ID of the benchmark to use for evaluation (required)

        - `model_id_2` (optional): Second model ID for comparison mode

        - `custom_metrics` (optional): Custom evaluation metrics configuration


        **Returns:**


        - `evaluation_id`: Unique identifier for tracking the evaluation

        - `model_id`: The primary model being evaluated

        - `status`: Initial status (always `PROCESSING`)

        - `benchmark_id`: The benchmark being used

        - `created_at`: Timestamp when evaluation was created

        - `message`: Confirmation message


        **Example Request (Single Model):**


        ```json

        POST /api/v3/evaluations

        Headers: {"Authorization": "Bearer <api_key>"}


        {
          "model_id": "aligned-model-01kmqm4nrn9fw6r",
          "benchmark_id": "task-abc123",
          "custom_metrics": ["accuracy", "relevance"]
        }

        ```


        **Example Request (Comparison Mode):**


        ```json

        POST /api/v3/evaluations

        Headers: {"Authorization": "Bearer <api_key>"}


        {
          "model_id": "nugen-flash-instruct",
          "model_id_2": "aligned-model-01kmqm4nrn9fw6r",
          "benchmark_id": "task-abc123"
        }

        ```


        **Example Response:**


        ```json

        {
          "evaluation_id": "eval-xyz789",
          "model_id": "aligned-model-01kmqm4nrn9fw6r",
          "status": "PROCESSING",
          "benchmark_id": "task-abc123",
          "created_at": "2024-01-15T10:30:00Z",
          "message": "Evaluation created successfully and queued for execution"
        }

        ```


        **Notes:**


        - Single model mode: Evaluates one model against the benchmark

        - Comparison mode: Provide `model_id_2` to compare two models
        side-by-side

        - Evaluation runs asynchronously - use the returned `evaluation_id` to
        check status

        - Use `/evaluations/{evaluation_id}/status` to track progress
      operationId: create_evaluation
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateEvaluationRequest'
      responses:
        '200':
          description: >-
            Returns a unique identifier for the initiated evaluation along with
            the initial evaluation status. This endpoint starts an asynchronous
            evaluation process using a specified benchmark and model(s),
            allowing users to track progress and retrieve results once
            completed.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/EvaluationResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
      security:
        - HTTPBearer: []
components:
  schemas:
    CreateEvaluationRequest:
      properties:
        model_id:
          type: string
          title: Model Id
          description: ID of the model to evaluate
          example: model-xyz789
        benchmark_id:
          type: string
          title: Benchmark Id
          description: ID of existing benchmark from BenchmarkTask table
          example: benchmark-abc123
        custom_metrics:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Custom Metrics
          description: Custom metrics configuration
        model_id_2:
          anyOf:
            - type: string
            - type: 'null'
          title: Model Id 2
          description: ID of second model for comparison mode (eval-compare)
          example: model-def456
      type: object
      required:
        - model_id
        - benchmark_id
      title: CreateEvaluationRequest
    EvaluationResponse:
      properties:
        evaluation_id:
          type: string
          title: Evaluation Id
          description: Unique identifier for the evaluation
          example: eval-abc123
        model_id:
          type: string
          title: Model Id
          description: ID of the model being evaluated
          example: model-xyz789
        status:
          type: string
          title: Status
          description: Current status of the evaluation
          example: PROCESSING
        benchmark_id:
          type: string
          title: Benchmark Id
          description: Benchmark ID used
          example: benchmark-abc123
        created_at:
          type: string
          title: Created At
          description: Timestamp of evaluation creation
          example: '2024-02-24T10:00:00Z'
        message:
          type: string
          title: Message
          description: Status message
          example: Evaluation created successfully
      type: object
      required:
        - evaluation_id
        - model_id
        - status
        - benchmark_id
        - created_at
        - message
      title: EvaluationResponse
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
  securitySchemes:
    HTTPBearer:
      type: http
      scheme: bearer

````
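
The `422` response follows the `HTTPValidationError` schema above: a `detail` array whose items carry `loc`, `msg`, and `type`. A client could flatten it into readable messages as sketched below; the function name is hypothetical.

```python
def format_validation_errors(error_body: dict) -> list:
    """Turn a 422 HTTPValidationError body into readable messages."""
    messages = []
    for item in error_body.get("detail", []):
        # `loc` may mix strings and integers (field names and array indexes).
        location = ".".join(str(part) for part in item["loc"])
        messages.append(f"{location}: {item['msg']} ({item['type']})")
    return messages
```

For an illustrative body such as `{"detail": [{"loc": ["body", "benchmark_id"], "msg": "Field required", "type": "missing"}]}`, the function returns `["body.benchmark_id: Field required (missing)"]`. The sample values are made up for demonstration; only the field names come from the schema.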