# LabKit Field Standardization

## Overview

The LabKit Field Validator detects when your code uses deprecated logging field names and helps you migrate to standardized fields. This supports the [Observability Field Standardisation initiative](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/observability_field_standardisation/).

**Goal:** Standardize logging field names across GitLab so logs are queryable and actionable across all systems.

**How it works:** The validator intercepts logging calls during development and testing, detects deprecated fields, and compares them against a frozen baseline. New offenses fail CI; known offenses are tracked in `.labkit_logging_todo.yml`. The validator is **not** active in production environments.

For the architectural decision and rationale, see [ADR: Dynamic Runtime Linting](./architecture/decisions/001_field_standardization_dynamic_runtime_linting.md).
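
The detection step described in this overview can be pictured with a minimal sketch. It is not the validator's actual implementation: `DEPRECATED_FIELDS` and `deprecated_fields_in` are hypothetical names, and the mapping contains only the two field pairs that appear elsewhere in this document.

```ruby
# Hypothetical mapping from deprecated keys to the standard constant names.
# Only the two pairs shown in this document are included.
DEPRECATED_FIELDS = {
  "user_id" => "Labkit::Fields::GL_USER_ID",
  "project_id" => "Labkit::Fields::GL_PROJECT_ID"
}.freeze

# Returns the deprecated keys found in a log payload.
# A String message carries no field names, so it yields nothing.
def deprecated_fields_in(payload)
  return [] unless payload.is_a?(Hash)

  payload.keys.map(&:to_s).select { |key| DEPRECATED_FIELDS.key?(key) }
end

deprecated_fields_in({ user_id: 1, message: "ok" }) # => ["user_id"]
deprecated_fields_in("user_id=1")                   # => []
```

The second call illustrates why structured (Hash) payloads matter: a field name embedded in a String message cannot be inspected this way.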

## Key Concepts

**Offense**
- A unique combination of [File Path] + [Deprecated Field] + [Logger Class]
- Multiple log calls in the same file using the same deprecated field = 1 offense
- Offenses exist until the deprecated field is entirely removed from the file
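
As a rough illustration of the uniqueness rule above, the sketch below deduplicates log calls by the three-part key. The `Offense` struct and `unique_offenses` helper are hypothetical and exist only for this example.

```ruby
require "set"

# Hypothetical value object; the three members form the offense key.
Offense = Struct.new(:callsite, :deprecated_field, :logger_class)

# Collapses individual log calls into unique offenses.
# Struct instances with equal members are equal, so the Set drops repeats.
def unique_offenses(log_calls)
  log_calls.map { |call| Offense.new(*call) }.to_set
end

logger = "Labkit::Logging::JsonLogger"
log_calls = [
  ["app/services/user_service.rb", "user_id", logger],    # first call
  ["app/services/user_service.rb", "user_id", logger],    # same file, same field
  ["app/services/user_service.rb", "project_id", logger]  # different field
]

unique_offenses(log_calls).size # => 2
```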

**TODO Baseline**
- A list of known offenses tracked in `.labkit_logging_todo.yml`
- Existing offenses in this baseline are allowed
- Any new offenses detected during development raise an error
- Prevents regression while allowing incremental cleanup
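
Conceptually, comparing a test run against the baseline is a pair of set differences. The following sketch assumes offenses are represented as `[callsite, deprecated_field, logger_class]` arrays; `compare_with_baseline` is a hypothetical helper, not part of LabKit.

```ruby
require "set"

# Splits offenses into new, known, and fixed relative to the baseline.
def compare_with_baseline(detected, baseline)
  detected = detected.to_set
  baseline = baseline.to_set

  {
    new: (detected - baseline).to_a,   # not in the baseline: raises an error
    known: (detected & baseline).to_a, # already tracked: allowed
    fixed: (baseline - detected).to_a  # tracked but no longer seen
  }
end

logger = "Labkit::Logging::JsonLogger"
baseline = [["app/models/project.rb", "project_id", logger]]
detected = [["app/services/user_service.rb", "user_id", logger]]

result = compare_with_baseline(detected, baseline)
result[:new].size   # => 1
result[:fixed].size # => 1
```

Note that in this sketch "fixed" only means the offense was not observed; an offense whose code path was not exercised by the run would look the same.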

## Quick Start

### First-Time Setup

1. **Initialize the todo file:**

   ```bash
   bundle exec labkit-logging init
   ```

   This creates `.labkit_logging_todo.yml` with `skip_ci_failure: true`, which allows CI to pass while collecting the initial baseline.

2. **Commit and push:**

   ```bash
   git add .labkit_logging_todo.yml
   git commit -m "Add LabKit logging todo baseline"
   git push
   ```

3. **Wait for CI to complete**, then fetch the baseline:

   ```bash
   bundle exec labkit-logging fetch <project> <pipeline_id>
   ```

   For example:

   ```bash
   bundle exec labkit-logging fetch gitlab-org/gitlab 12345
   ```

   This fetches all detected offenses from the CI pipeline logs and populates the todo file. The `skip_ci_failure` flag is automatically removed.

4. **Commit the populated baseline:**

   ```bash
   git add .labkit_logging_todo.yml
   git commit -m "Populate LabKit logging todo baseline"
   git push
   ```

Future CI runs now enforce this baseline: any new offense fails the pipeline.

## Developer Workflow

### Fixing Offenses (Recommended)

Replace deprecated fields with standard constants:

```ruby
# Before
logger.info(user_id: current_user.id)

# After
logger.info(Labkit::Fields::GL_USER_ID => current_user.id)
```

After you fix an offense, remove it from the baseline by running your tests locally with `LABKIT_LOGGING_TODO_UPDATE=true`, which updates `.labkit_logging_todo.yml` based on that test run:

```bash
LABKIT_LOGGING_TODO_UPDATE=true bundle exec rspec
```

### Adding Offenses Temporarily

If you can't fix an offense immediately, add it to the baseline:

```bash
LABKIT_LOGGING_TODO_UPDATE=true bundle exec rspec
```

This updates `.labkit_logging_todo.yml` with any new offenses found during the test run. Commit the updated file with your changes.

**Note:** Justify in your MR why you can't fix immediately. Keep the baseline as small as possible.

### Regenerating the Baseline

To regenerate the entire baseline from scratch:

```bash
rm .labkit_logging_todo.yml
bundle exec labkit-logging init
# Commit and push the new file, wait for CI to complete, then:
bundle exec labkit-logging fetch <project> <pipeline_id>
```

## CI Behavior

### Baseline Generation Mode

When `skip_ci_failure: true` is set in the todo file:

- CI passes even when deprecated fields are detected
- Offenses are logged for collection via `labkit-logging fetch`
- Use this mode only during initial setup

### Enforcement Mode

When `skip_ci_failure` is not set (normal operation):

- **New offenses fail the pipeline** with a detailed error message
- **Known offenses** (in the baseline) are allowed
- **Fixed offenses** are automatically detected and can be removed from the baseline

Example CI failure output:

```text
================================================================================
LabKit Logging Field Standardization: New Offenses Detected
================================================================================

app/services/user_service.rb:42: 'user_id' is deprecated. Use 'Labkit::Fields::GL_USER_ID' instead.
app/models/project.rb:15: 'project_id' is deprecated. Use 'Labkit::Fields::GL_PROJECT_ID' instead.

================================================================================
Total: 2 new offense(s) in 2 file(s)
================================================================================
```
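
The two CI modes described in this section reduce to a small decision. The sketch below is an assumption-laden summary: `ci_exit_status` is a hypothetical helper, `todo` stands for the parsed todo file as a Hash, and the exit codes `0` and `1` are illustrative rather than taken from the validator.

```ruby
# Returns the exit status a CI job would use in each mode.
def ci_exit_status(todo, new_offenses)
  # Baseline generation mode: offenses are logged but never fail the job.
  return 0 if todo["skip_ci_failure"] == true

  # Enforcement mode: only offenses missing from the baseline fail the job.
  new_offenses.empty? ? 0 : 1
end

ci_exit_status({ "skip_ci_failure" => true }, ["user_id"]) # => 0
ci_exit_status({ "offenses" => [] }, ["user_id"])          # => 1
ci_exit_status({ "offenses" => [] }, [])                   # => 0
```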

### When Offenses Are Fixed

When you fix offenses that were in the baseline, you'll see a message indicating which offenses were resolved. Update the baseline locally to remove them:

```bash
LABKIT_LOGGING_TODO_UPDATE=true bundle exec rspec
git add .labkit_logging_todo.yml
git commit -m "Remove fixed logging offenses from baseline"
```

## CLI Reference

The `labkit-logging` command provides subcommands for managing the field validator.

```bash
bundle exec labkit-logging <command> [options]
```

### labkit-logging init

Creates a new `.labkit_logging_todo.yml` file with `skip_ci_failure: true`.

```bash
bundle exec labkit-logging init
```

### labkit-logging fetch

Fetches offense logs from a GitLab CI pipeline and updates the todo file.

```bash
bundle exec labkit-logging fetch <project> <pipeline_id>
```

**Arguments:**
- `project` - GitLab project ID or path (e.g., `278964` or `gitlab-org/gitlab`)
- `pipeline_id` - CI pipeline ID number

**Environment Variables:**
- `GITLAB_TOKEN` - GitLab API token (required)
- `CI_API_V4_URL` - GitLab API URL (default: `https://gitlab.com/api/v4`)

**Examples:**

```bash
# Using project path
bundle exec labkit-logging fetch gitlab-org/gitlab 12345

# Using project ID
bundle exec labkit-logging fetch 278964 12345
```
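
Both argument forms identify the same project in the GitLab REST API, where a project path must be URL-encoded. This document does not describe how `labkit-logging fetch` calls the API internally, so the sketch below only shows how the pipeline jobs endpoint could be addressed under that assumption; `pipeline_jobs_url` is a hypothetical helper.

```ruby
require "uri"

DEFAULT_API_URL = "https://gitlab.com/api/v4"

# Builds the jobs endpoint for a pipeline.
# "gitlab-org/gitlab" becomes "gitlab-org%2Fgitlab"; numeric IDs pass through.
def pipeline_jobs_url(project, pipeline_id, env = ENV)
  base = env.fetch("CI_API_V4_URL", DEFAULT_API_URL)
  encoded_project = URI.encode_www_form_component(project.to_s)

  "#{base}/projects/#{encoded_project}/pipelines/#{pipeline_id}/jobs"
end

pipeline_jobs_url("gitlab-org/gitlab", 12345, {})
# => "https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab/pipelines/12345/jobs"
```

Passing the environment as an argument keeps the helper easy to test; by default it reads `CI_API_V4_URL` from the real environment.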

## Environment Variables

| Variable | Description |
|----------|-------------|
| `LABKIT_LOGGING_TODO_UPDATE=true` | Update the baseline with new offenses (local development) |
| `GITLAB_TOKEN` | GitLab API token for fetching CI logs |
| `CI_API_V4_URL` | GitLab API URL (defaults to gitlab.com) |

## Troubleshooting

**"New Offenses Detected" in CI**
- Fix the deprecated fields in your code (recommended), or
- If you can't fix them yet, update the baseline locally with `LABKIT_LOGGING_TODO_UPDATE=true bundle exec rspec` and commit the updated `.labkit_logging_todo.yml`

**Offenses not detected**
- Ensure `.labkit_logging_todo.yml` exists in your project root
- Verify you're using `Labkit::Logging::JsonLogger`
- Check that you're logging with a Hash, not a String
- Verify the code path is executed during tests

**Pipeline not found when fetching**
- Verify the project path/ID is correct
- Ensure the pipeline has completed (not still running)
- Check that your `GITLAB_TOKEN` has read access to the project

**No offenses found in pipeline**
- Ensure `skip_ci_failure: true` was set during the CI run
- Verify the pipeline ran tests that exercise the logging code
- Check job logs are accessible with your token

## TODO File Format

```yaml
# LabKit Logging Field Standardization TODO
# AUTO-GENERATED FILE. DO NOT EDIT MANUALLY.

offenses:
  - logger_class: "Labkit::Logging::JsonLogger"
    callsite: "app/services/user_service.rb"
    deprecated_field: "user_id"
    standard_field: "Labkit::Fields::GL_USER_ID"
  - logger_class: "Labkit::Logging::JsonLogger"
    callsite: "app/models/project.rb"
    deprecated_field: "project_id"
    standard_field: "Labkit::Fields::GL_PROJECT_ID"
```
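
Although the file should not be edited by hand, reading it can help you plan cleanup work. The sketch below groups baseline entries by file; it assumes the key names shown above, and `deprecated_fields_by_file` and `SAMPLE_TODO` are hypothetical names used only here.

```ruby
require "yaml"

# Inline sample using the same keys as the format above.
SAMPLE_TODO = <<~YAML
  offenses:
    - logger_class: "Labkit::Logging::JsonLogger"
      callsite: "app/services/user_service.rb"
      deprecated_field: "user_id"
      standard_field: "Labkit::Fields::GL_USER_ID"
YAML

# Maps each callsite to the deprecated fields still tracked for it.
def deprecated_fields_by_file(yaml_text)
  data = YAML.safe_load(yaml_text) || {}

  data.fetch("offenses", [])
      .group_by { |offense| offense["callsite"] }
      .transform_values { |entries| entries.map { |o| o["deprecated_field"] } }
end

deprecated_fields_by_file(SAMPLE_TODO).each do |file, fields|
  puts "#{file}: #{fields.join(', ')}"
end
```

To inspect a real baseline, pass `File.read(".labkit_logging_todo.yml")` instead of the inline sample.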

## References

- [ADR: Dynamic Runtime Linting](./architecture/decisions/001_field_standardization_dynamic_runtime_linting.md)
- [Observability Field Standardisation](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/observability_field_standardisation/)
- [Quality Epic](https://gitlab.com/groups/gitlab-org/quality/-/epics/235)
