# Get Evaluation Results

> Retrieve the complete results of a finished evaluation.


This endpoint returns detailed metrics and scores for a completed evaluation. It works for both single-model and comparison-mode evaluations.


**Path Parameters:**

- `evaluation_id`: Unique evaluation identifier


**Returns:**

- `evaluation_id`: The evaluation identifier
- `model_id`: Primary model that was evaluated
- `benchmark_id`: Benchmark that was used
- `status`: Evaluation status (expected to be `READY`, since results are returned only for completed evaluations)
- `raw_answers_count`: Number of raw answers generated during evaluation
- `completed_at`: ISO timestamp when evaluation finished
- `method` (optional): Evaluation method (`eval` for single model, `eval-compare` for comparison)
- `metrics` (optional): Evaluation metrics and scores (single model only)
- `model_id_2` (optional): Second model ID (comparison mode only)
- `base_model` (optional): Base model results (comparison mode only)
- `eval_model` (optional): Eval model results (comparison mode only)
- `comparison` (optional): Comparison results between models (comparison mode only)
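
The field list above can be mirrored in a small typed container. The following is a minimal sketch, not part of any official SDK: the `EvaluationResults` class and its `from_dict` helper are hypothetical names, and the nested objects are kept as plain dictionaries because the schema allows arbitrary keys inside them.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Fields listed without "(optional)" in the Returns section.
REQUIRED_FIELDS = (
    "evaluation_id",
    "model_id",
    "benchmark_id",
    "status",
    "raw_answers_count",
    "completed_at",
)


@dataclass
class EvaluationResults:
    """Hypothetical container for the response body of this endpoint."""

    evaluation_id: str
    model_id: str
    benchmark_id: str
    status: str
    raw_answers_count: int
    completed_at: str
    method: Optional[str] = None
    metrics: Optional[dict] = None       # single model only
    model_id_2: Optional[str] = None     # comparison mode only
    base_model: Optional[dict] = None    # comparison mode only
    eval_model: Optional[dict] = None    # comparison mode only
    comparison: Optional[dict] = None    # comparison mode only

    @classmethod
    def from_dict(cls, payload: dict) -> "EvaluationResults":
        """Build an instance from decoded JSON, ignoring unknown keys."""
        missing = [name for name in REQUIRED_FIELDS if name not in payload]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})
```

Ignoring unknown keys keeps the sketch tolerant of fields the API might add later; a stricter client could raise on them instead.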


**Raises:**

- `404`: If the evaluation is not found or doesn't belong to the authenticated user
- `400`: If the evaluation is not yet completed

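Because a `400` means the evaluation has not finished yet, a client may want to retry until results are available and give up immediately on a `404`. The sketch below illustrates one way to do that. It assumes a hypothetical `fetch` callable that returns a `(status_code, body)` pair; the retry count and delay are arbitrary illustrative values, not recommendations from the API.

```python
import time
from typing import Callable, Tuple


class EvaluationNotFound(Exception):
    """Raised when the endpoint answers 404 (hypothetical helper class)."""


def wait_for_results(
    fetch: Callable[[], Tuple[int, dict]],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until it returns 200, retrying while it returns 400."""
    for attempt in range(attempts):
        status_code, body = fetch()
        if status_code == 200:
            return body
        if status_code == 404:
            # Not found, or owned by a different user: retrying will not help.
            raise EvaluationNotFound("evaluation not found for this user")
        if status_code != 400:
            raise RuntimeError(f"unexpected status code: {status_code}")
        # 400: evaluation not yet completed, so wait before the next attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError("evaluation did not complete within the retry budget")
```

Passing `sleep` as a parameter makes the loop easy to exercise in tests without real delays.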

**Example Request:**

```http
GET /api/v3/evaluations/eval-xyz789/results
Authorization: Bearer <api_key>
```
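
The same request could be issued from Python using only the standard library. This is a minimal sketch: the base URL `https://api.nugen.in` is an assumption inferred from the location of the OpenAPI file (the spec itself lists no servers), and `build_results_request` and `fetch_results` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: host inferred from the OpenAPI file URL, not stated by the spec.
BASE_URL = "https://api.nugen.in"


def build_results_request(
    evaluation_id: str, api_key: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the GET request for an evaluation's results."""
    safe_id = urllib.parse.quote(evaluation_id, safe="")
    url = f"{base_url}/api/v3/evaluations/{safe_id}/results"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_results(
    evaluation_id: str, api_key: str, base_url: str = BASE_URL
) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx responses such as 400 or 404.
    """
    request = build_results_request(evaluation_id, api_key, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call would look like `fetch_results("eval-xyz789", "<api_key>")`, with a real API key in place of the placeholder.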


**Example Response (Single Model):**

```json
{
  "evaluation_id": "eval-xyz789",
  "model_id": "aligned-model-01kmqm4nrn9fw6r",
  "benchmark_id": "task-abc123",
  "status": "READY",
  "method": "eval",
  "raw_answers_count": 10,
  "completed_at": "2024-01-15T10:45:00Z",
  "metrics": {
    "accuracy": 0.92,
    "relevance": 0.88,
    "average_score": 0.90,
    "total_questions": 10,
    "correct_answers": 9
  }
}
```


**Example Response (Comparison Mode):**

```json
{
  "evaluation_id": "eval-xyz789",
  "model_id": "nugen-flash-instruct",
  "model_id_2": "aligned-model-01kmqm4nrn9fw6r",
  "benchmark_id": "task-abc123",
  "status": "READY",
  "method": "eval-compare",
  "raw_answers_count": 20,
  "completed_at": "2024-01-15T10:45:00Z",
  "base_model": {
    "model_id": "nugen-flash-instruct",
    "average_score": 0.92,
    "total_questions": 10
  },
  "eval_model": {
    "model_id": "aligned-model-01kmqm4nrn9fw6r",
    "average_score": 0.85,
    "total_questions": 10
  },
  "comparison": {
    "winner": "nugen-flash-instruct",
    "score_difference": 0.07,
    "statistical_significance": true
  }
}
```
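
Since the two response shapes differ, client code typically branches on `method` before reading scores. The sketch below shows one way to do so. The `summarize` function is a hypothetical helper, and the nested keys it reads (`average_score`, `winner`) appear only in the examples above, not in the schema, so they are accessed defensively.

```python
def summarize(results: dict) -> str:
    """Return a one-line summary for either response shape."""
    if results.get("method") == "eval-compare":
        # Comparison mode: scores live under base_model / eval_model.
        base = results.get("base_model") or {}
        candidate = results.get("eval_model") or {}
        comparison = results.get("comparison") or {}
        return (
            f"compare: base={base.get('average_score')} "
            f"eval={candidate.get('average_score')} "
            f"winner={comparison.get('winner')}"
        )
    # Single-model mode (or method omitted): scores live under metrics.
    metrics = results.get("metrics") or {}
    return f"single: average_score={metrics.get('average_score')}"
```

Using `.get` with an empty-dict fallback avoids a `KeyError` when an optional object is missing or `null`.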


## OpenAPI

````yaml https://api.nugen.in/openapi-public.json get /api/v3/evaluations/{evaluation_id}/results
openapi: 3.1.0
info:
  title: Nugen Intelligence API
  description: 'Nugen Intelligence : Powering Specialized Intelligence At Scale'
  version: 25.4.20
servers: []
security: []
paths:
  /api/v3/evaluations/{evaluation_id}/results:
    get:
      tags:
        - Evaluations
      summary: Get Evaluation Results
      description: >-
        Retrieve the complete results of a finished evaluation.


        This endpoint returns detailed metrics and scores for a completed
        evaluation. It works for both single-model and comparison-mode
        evaluations.


        **Path Parameters:**


        - `evaluation_id`: Unique evaluation identifier


        **Returns:**


        - `evaluation_id`: The evaluation identifier

        - `model_id`: Primary model that was evaluated

        - `benchmark_id`: Benchmark that was used

        - `status`: Evaluation status (should be `READY`)

        - `raw_answers_count`: Number of raw answers generated during evaluation

        - `completed_at`: ISO timestamp when evaluation finished

        - `method` (optional): Evaluation method (`eval` for single model,
        `eval-compare` for comparison)

        - `metrics` (optional): Evaluation metrics and scores (single model
        only)

        - `model_id_2` (optional): Second model ID (comparison mode only)

        - `base_model` (optional): Base model results (comparison mode only)

        - `eval_model` (optional): Eval model results (comparison mode only)

        - `comparison` (optional): Comparison results between models (comparison
        mode only)


        **Raises:**


        - `404`: If the evaluation is not found or doesn't belong to the
        authenticated user

        - `400`: If the evaluation is not yet completed


        **Example Request:**


        ```http

        GET /api/v3/evaluations/eval-xyz789/results

        Authorization: Bearer <api_key>

        ```


        **Example Response (Single Model):**


        ```json

        {
          "evaluation_id": "eval-xyz789",
          "model_id": "aligned-model-01kmqm4nrn9fw6r",
          "benchmark_id": "task-abc123",
          "status": "READY",
          "method": "eval",
          "raw_answers_count": 10,
          "completed_at": "2024-01-15T10:45:00Z",
          "metrics": {
            "accuracy": 0.92,
            "relevance": 0.88,
            "average_score": 0.90,
            "total_questions": 10,
            "correct_answers": 9
          }
        }

        ```


        **Example Response (Comparison Mode):**


        ```json

        {
          "evaluation_id": "eval-xyz789",
          "model_id": "nugen-flash-instruct",
          "model_id_2": "aligned-model-01kmqm4nrn9fw6r",
          "benchmark_id": "task-abc123",
          "status": "READY",
          "method": "eval-compare",
          "raw_answers_count": 20,
          "completed_at": "2024-01-15T10:45:00Z",
          "base_model": {
            "model_id": "nugen-flash-instruct",
            "average_score": 0.92,
            "total_questions": 10
          },
          "eval_model": {
            "model_id": "aligned-model-01kmqm4nrn9fw6r",
            "average_score": 0.85,
            "total_questions": 10
          },
          "comparison": {
            "winner": "nugen-flash-instruct",
            "score_difference": 0.07,
            "statistical_significance": true
          }
        }

        ```
      operationId: get_evaluation_results
      parameters:
        - name: evaluation_id
          in: path
          required: true
          schema:
            type: string
            title: Evaluation Id
      responses:
        '200':
          description: >-
            Returns detailed evaluation metrics and scores for a completed
            evaluation. This endpoint provides comprehensive results for a
            finished evaluation, including all relevant metrics, scores, and
            comparison data if applicable. Use this to analyze the performance
            of the evaluated model(s) against the benchmark once the evaluation
            is complete.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/EvaluationResultsResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
      security:
        - HTTPBearer: []
components:
  schemas:
    EvaluationResultsResponse:
      properties:
        evaluation_id:
          type: string
          title: Evaluation Id
          description: Unique identifier for the evaluation
          example: eval-abc123
        model_id:
          type: string
          title: Model Id
          description: ID of the model that was evaluated
          example: model-xyz789
        benchmark_id:
          type: string
          title: Benchmark Id
          description: Benchmark ID used
          example: benchmark-abc123
        status:
          type: string
          title: Status
          description: Evaluation status
          example: READY
        metrics:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Metrics
          description: Evaluation metrics and scores (single model)
        raw_answers_count:
          type: integer
          title: Raw Answers Count
          description: Number of raw answers generated
          example: 100
        completed_at:
          type: string
          title: Completed At
          description: ISO timestamp when evaluation completed
          example: '2024-02-24T10:05:00Z'
        method:
          anyOf:
            - type: string
            - type: 'null'
          title: Method
          description: 'Evaluation method: ''eval'' or ''eval-compare'''
          example: eval-compare
        model_id_2:
          anyOf:
            - type: string
            - type: 'null'
          title: Model Id 2
          description: ID of second model (for comparison)
          example: model-def456
        base_model:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Base Model
          description: Base model results (comparison mode)
          example:
            model_id: nugen-flash-instruct
        eval_model:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Eval Model
          description: Eval model results (comparison mode)
          example:
            accuracy: 0.85
        comparison:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Comparison
          description: Comparison results between models
          example:
            accuracy_difference: 0.05
      type: object
      required:
        - evaluation_id
        - model_id
        - benchmark_id
        - status
        - raw_answers_count
        - completed_at
      title: EvaluationResultsResponse
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
  securitySchemes:
    HTTPBearer:
      type: http
      scheme: bearer

````