# drop — Full Reference for AI Agents

## Project

- **Name**: drop
- **Language**: Go 1.26.0
- **Module**: github.com/corewire/drop
- **API Group**: drop.corewire.io/v1alpha1
- **Scope**: All CRDs cluster-scoped
- **License**: MIT
- **Framework**: Kubebuilder / controller-runtime

## CRD Field Reference

### CachedImage

CachedImage ensures a single container image is pre-cached on cluster nodes.

Controller: internal/controller/cachedimage_controller.go | Test: internal/controller/cachedimage_controller_test.go

#### Spec

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Image | `image` | `string` | ✓ | | Image is the fully qualified image reference without tag or digest. Example: "docker.io/library/nginx", "registry.example.com/team/app" |
| Tag | `tag` | `string` | — | | Tag to pull. Mutually exclusive with Digest. Example: "1.25-alpine", "v2.4.1", "latest" |
| Digest | `digest` | `string` | — | | Digest to pull as an immutable reference. Mutually exclusive with Tag. Use this for reproducible deployments where the exact image layer matters. Example: "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" |
| ImagePullPolicy | `imagePullPolicy` | `corev1.PullPolicy` | — | `Always` | ImagePullPolicy controls when kubelet pulls the image on each node. Always (default): check the registry for a newer digest even if the tag exists locally. IfNotPresent: skip the registry check when the tag already exists on the node. Never: never pull (only useful for pre-loaded images). Enum: `Always`, `IfNotPresent`, `Never` |
| ImagePullSecrets | `imagePullSecrets` | `[]corev1.LocalObjectReference` | — | | ImagePullSecrets are references to Secrets in the namespace where Drop creates pull Pods. The default namespace is "drop-system" unless the controller is started with a different --pod-namespace. The Secret must contain a .dockerconfigjson key. Example: [{name: "ghcr-creds"}, {name: "ecr-creds"}] |
| NodeSelector | `nodeSelector` | `map[string]string` | — | | NodeSelector restricts which nodes to cache the image on. Only nodes matching ALL key-value pairs will be targeted. Example: {"node-role.kubernetes.io/build": "true"} |
| Tolerations | `tolerations` | `[]corev1.Toleration` | — | | Tolerations allow the pull pod to be scheduled on tainted nodes. Example: [{key: "node-role.kubernetes.io/build", operator: "Exists", effect: "NoSchedule"}] |
| Priority | `priority` | `*int32` | — | | Priority is a pull ordering hint. Lower values are pulled first. Images with the same priority are pulled in alphabetical order. Default: 0 (no priority). Example: 10 (low priority), -10 (high priority) |
| PolicyRef | `policyRef` | `*PolicyReference` | — | | PolicyRef references a PullPolicy resource that controls pacing (concurrency, backoff, delays). If unset, the operator uses built-in defaults (1 concurrent node, 10s delay, 30s initial backoff). Example: {name: "conservative"} |

#### Status

| Field | JSON | Type | Description |
|-------|------|------|-------------|
| ObservedGeneration | `observedGeneration` | `int64` | ObservedGeneration is the last generation reconciled. |
| Phase | `phase` | `string` | Phase summarizes the overall state. |
| Ready | `ready` | `string` | Ready is a human-readable "nodesReady/nodesTargeted" fraction for display. |
| ResolvedDigest | `resolvedDigest` | `string` | ResolvedDigest is the sha256 digest of the image as reported by the container runtime after pull. |
| NodesTargeted | `nodesTargeted` | `int32` | NodesTargeted is the number of nodes that should have this image. |
| NodesReady | `nodesReady` | `int32` | NodesReady is the number of nodes that have successfully pulled the image. |
| NodesPulling | `nodesPulling` | `int32` | NodesPulling is the number of nodes currently pulling the image. |
| CachedNodes | `cachedNodes` | `[]string` | CachedNodes is the list of node names that have successfully cached the image. |
| ConsecutiveFailures | `consecutiveFailures` | `int32` | ConsecutiveFailures counts sequential reconcile failures for backoff calculation. |
| LastPulledAt | `lastPulledAt` | `*metav1.Time` | LastPulledAt is the timestamp of the most recent successful pull. |
| LastAttemptedAt | `lastAttemptedAt` | `*metav1.Time` | LastAttemptedAt is the timestamp of the most recent pull attempt (success or failure). |
| Conditions | `conditions` | `[]metav1.Condition` | Conditions represent the latest available observations. Condition types: Ready, PullProgress. |

### CachedImageSet

CachedImageSet manages a group of images to cache, optionally backed by a DiscoveryPolicy.

Controller: internal/controller/cachedimageset_controller.go | Test: internal/controller/cachedimageset_controller_test.go

#### Spec

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| PolicyRef | `policyRef` | `*PolicyReference` | — | | PolicyRef references a PullPolicy for pacing controls. Propagated to all child CachedImages. Example: {name: "conservative"} |
| DiscoveryPolicyRef | `discoveryPolicyRef` | `*DiscoveryPolicyReference` | — | | DiscoveryPolicyRef references a DiscoveryPolicy that provides a dynamic image list. When set, the operator reads status.discoveredImages from the referenced DiscoveryPolicy and creates/deletes child CachedImages accordingly. Can be combined with static images. Example: {name: "popular-build-images"} |
| ImagePullPolicy | `imagePullPolicy` | `corev1.PullPolicy` | — | `Always` | ImagePullPolicy controls when kubelet pulls images. Propagated to all child CachedImages. Default: "Always". See CachedImage.spec.imagePullPolicy for details. Enum: `Always`, `IfNotPresent`, `Never` |
| ImagePullSecrets | `imagePullSecrets` | `[]corev1.LocalObjectReference` | — | | ImagePullSecrets for private registries. Propagated to all child CachedImages. Secrets must exist in the namespace where Drop creates pull Pods (default: "drop-system"). Example: [{name: "ghcr-creds"}] |
| NodeSelector | `nodeSelector` | `map[string]string` | — | | NodeSelector restricts which nodes to cache images on. Propagated to all child CachedImages. Example: {"node-role.kubernetes.io/build": "true"} |
| Tolerations | `tolerations` | `[]corev1.Toleration` | — | | Tolerations for tainted nodes. Propagated to all child CachedImages. Example: [{key: "node-role.kubernetes.io/build", operator: "Exists", effect: "NoSchedule"}] |
| Images | `images` | `[]ImageEntry` | — | | Images is a static list of images to cache. Each entry creates one child CachedImage. Can be used alone or combined with discoveryPolicyRef (both lists are merged). |

#### Status

| Field | JSON | Type | Description |
|-------|------|------|-------------|
| ObservedGeneration | `observedGeneration` | `int64` | ObservedGeneration is the last generation reconciled. |
| Phase | `phase` | `string` | Phase summarizes the overall state. |
| ImagesManaged | `imagesManaged` | `int32` | ImagesManaged is the number of CachedImage children managed by this set. |
| ImagesReady | `imagesReady` | `int32` | ImagesReady is the number of children in Ready phase. |
| Conditions | `conditions` | `[]metav1.Condition` | Conditions represent the latest available observations. |

### DiscoveryPolicy

DiscoveryPolicy automatically discovers images from registries or Prometheus metrics.

Controller: internal/controller/discoverypolicy_controller.go | Test: internal/controller/discoverypolicy_controller_test.go

#### Spec

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Sources | `sources` | `[]DiscoverySource` | ✓ | | Sources is the list of discovery backends to query. At least one source is required. Multiple sources are merged and ranked together before maxImages is applied. |
| ImageFilter | `imageFilter` | `string` | — | | ImageFilter is a regex applied to discovered image references. Only matching images are kept. Example: "registry.example.com/team/.*" (only keep images from that registry path) |
| SyncInterval | `syncInterval` | `metav1.Duration` | — | `30m` | SyncInterval is how often the operator re-queries all sources and updates status.discoveredImages. Default: "30m". Example: "1h", "15m" |
| MaxImages | `maxImages` | `int32` | — | `50` | MaxImages caps the total number of images stored in status.discoveredImages. Images are ranked by score; lowest-scoring images are dropped when the cap is exceeded. Default: 50. Example: 30, 100 |

#### Status

| Field | JSON | Type | Description |
|-------|------|------|-------------|
| LastSyncTime | `lastSyncTime` | `*metav1.Time` | LastSyncTime is the timestamp of the last successful sync. |
| DiscoveredImages | `discoveredImages` | `[]DiscoveredImage` | DiscoveredImages is the list of discovered images from all sources. |
| ImageCount | `imageCount` | `int32` | ImageCount is the number of discovered images. |
| SourceCount | `sourceCount` | `int32` | SourceCount is the number of configured sources. |
| Conditions | `conditions` | `[]metav1.Condition` | Conditions represent the latest available observations. |

### PullPolicy

PullPolicy controls the pacing and retry behavior for image pulls across cluster nodes. It is a configuration-only resource with no status.
#### Spec

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| MaxConcurrentNodes | `maxConcurrentNodes` | `int32` | — | `1` | MaxConcurrentNodes is the maximum number of nodes pulling simultaneously for images that reference this policy. Increase for large clusters; keep low for bandwidth-constrained nodes. Default: 1. Example: 3 (pull on up to 3 nodes at once) |
| MinDelayBetweenPulls | `minDelayBetweenPulls` | `metav1.Duration` | — | `10s` | MinDelayBetweenPulls is the minimum wait time between starting a pull on one node and starting the next pull on another node. Prevents burst traffic to the registry. Default: "10s". Example: "30s", "1m" |
| FailureBackoff | `failureBackoff` | `*BackoffConfig` | — | | FailureBackoff configures exponential retry delays when a pull fails. If unset, defaults to initial=30s, max=5m. |
| RepullInterval | `repullInterval` | `*metav1.Duration` | — | | RepullInterval defines how often to re-pull already-cached images to pick up digest changes. Unset or zero means never re-pull (rely on imagePullPolicy=Always on the CachedImage instead). Example: "24h" (re-pull daily), "6h" |
| NodeSelector | `nodeSelector` | `map[string]string` | — | | NodeSelector scopes this policy to a specific node pool. Only relevant when the same PullPolicy should only pace pulls on a subset of nodes. Example: {"node-role.kubernetes.io/build": "true"} |
| Tolerations | `tolerations` | `[]corev1.Toleration` | — | | Tolerations allow the pull pods created under this policy to schedule on tainted nodes. Example: [{key: "dedicated", value: "ci", effect: "NoSchedule"}] |

## Helper Types

### BackoffConfig

BackoffConfig defines exponential retry backoff behavior for failed pulls.


| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Initial | `initial` | `metav1.Duration` | — | `30s` | Initial delay before the first retry attempt after a failure. Default: "30s". Example: "1m" |
| Max | `max` | `metav1.Duration` | — | `5m` | Max is the upper bound on backoff delay. Retries will never wait longer than this. Default: "5m". Example: "10m" |

### DiscoveredImage

DiscoveredImage represents a single discovered image with metadata.

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Image | `image` | `string` | ✓ | | Image is the fully qualified image reference. |
| Score | `score` | `int64` | ✓ | | Score is the ranking score from the source (higher = more relevant). |
| Source | `source` | `string` | ✓ | | Source identifies which discovery source produced this image. |

### DiscoveryPolicyReference

DiscoveryPolicyReference is a reference to a DiscoveryPolicy resource.

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Name | `name` | `string` | ✓ | | Name of the DiscoveryPolicy resource. |

### DiscoverySource

DiscoverySource defines a single discovery backend.

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Type | `type` | `string` | ✓ | | Type identifies the discovery backend. Must be "prometheus" or "registry". Enum: `prometheus`, `registry` |
| Prometheus | `prometheus` | `*PrometheusSource` | — | | Prometheus contains the configuration when type=prometheus. |
| Registry | `registry` | `*RegistrySource` | — | | Registry contains the configuration when type=registry. |
| SecretRef | `secretRef` | `*corev1.LocalObjectReference` | — | | SecretRef references a Secret in the namespace where Drop creates pull Pods. The default namespace is "drop-system" unless the controller is started with a different --pod-namespace. Supported Secret keys: token, username, password, ca.crt, tls.crt, tls.key, headers.. Example: {name: "prometheus-creds"} |

### ImageEntry

ImageEntry defines a single image to include in a set.

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Image | `image` | `string` | ✓ | | Image is the fully qualified image reference without tag or digest. Example: "docker.io/library/nginx", "registry.example.com/team/app" |
| Tag | `tag` | `string` | — | | Tag to pull. Mutually exclusive with Digest. Example: "1.25-alpine", "v2.4.1" |
| Digest | `digest` | `string` | — | | Digest to pull as an immutable reference. Mutually exclusive with Tag. Example: "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" |

### PolicyReference

PolicyReference is a reference to a PullPolicy resource.

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Name | `name` | `string` | ✓ | | Name of the PullPolicy resource. |

### PrometheusSource

PrometheusSource defines Prometheus query configuration for image discovery.

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| Endpoint | `endpoint` | `string` | ✓ | | Endpoint is the Prometheus-compatible API URL (Prometheus, Thanos, Mimir, VictoriaMetrics). Example: "http://prometheus.monitoring.svc:9090", "https://mimir.example.com" |
| Query | `query` | `string` | ✓ | | Query is the PromQL expression. It MUST return results with an "image" label — that label value is used as the discovered image reference. The query result value is used as the ranking score (higher = more relevant). Example: count(container_memory_working_set_bytes{container!="",container!="POD",namespace="gitlab-runner"}) by (image) |
| Lookback | `lookback` | `*metav1.Duration` | — | | Lookback is the time window for aggregation. When set, the operator uses query_range (start=now-lookback, end=now) and sums all returned values per image to produce a score. When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score. Example: "168h" (7 days), "24h", "72h" |
| Step | `step` | `string` | — | `5m` | Step is the resolution step for range queries (only used when lookback is set). Smaller steps = more data points = more accurate sums but higher Prometheus load. Default: "5m". Example: "1m", "15m" |

### RegistrySource

RegistrySource defines OCI registry tag listing configuration for image discovery.

| Field | JSON | Type | Required | Default | Description |
|-------|------|------|----------|---------|-------------|
| URL | `url` | `string` | ✓ | | URL is the registry base URL (without repository path). Example: "https://registry.example.com", "https://ghcr.io" |
| Repositories | `repositories` | `[]string` | ✓ | | Repositories is the list of repository paths to list tags from. Example: ["team/app", "team/worker", "infra/tools"] |
| TagFilter | `tagFilter` | `string` | — | | TagFilter is a regex applied to tag names. Only matching tags are discovered. Example: "^v[0-9]+\\." (semver tags only), "^main-" (main branch builds) |
| TopX | `topX` | `int32` | — | | TopX limits the number of tags kept per repository after tagFilter is applied. The registry API does not provide creation timestamps here; Drop keeps the last N tags returned by the registry. Example: 3 (keep the last 3 matching tags returned per repo) |
| ImageTemplate | `imageTemplate` | `string` | — | | ImageTemplate is a Go text/template for constructing the full image reference from discovered tags. Available variables: {{.Registry}}, {{.Repository}}, {{.Tag}}. Default (when unset): "{{.Registry}}/{{.Repository}}:{{.Tag}}". Example: "{{.Registry}}/{{.Repository}}@{{.Tag}}" (if tags are actually digests) |

## Relationships

```mermaid
graph LR
  CachedImage -->|references| PullPolicy
  CachedImageSet -->|references| PullPolicy
  CachedImageSet -->|references| DiscoveryPolicy
```

## Status Conditions & Error Reasons

| Reason | Controller | Meaning | Troubleshooting |
|--------|-----------|---------|-----------------|
| Cached | CachedImage | Image cached on all N target nodes | |
| Complete | CachedImage | All pulls complete | |
| Idle | CachedImage | Waiting to start pulls | |
| InProgress | CachedImage | N/N nodes ready | |
| PullFailed | CachedImage | N/N nodes ready | |
| Pulling | CachedImage | Actively pulling on N node(s), N/N complete | |
| Stalled | CachedImage | Pull stalled: N/N nodes ready, retrying with backoff | |
| Degraded | CachedImageSet | N/N images cached, failing: N | |
| Progressing | CachedImageSet | N/N images cached | |
| Ready | CachedImageSet | All N images are cached | |
| AllSourcesHealthy | DiscoveryPolicy | All discovery sources responded successfully | |
| ConnectionRefused | DiscoveryPolicy | | |
| DNSError | DiscoveryPolicy | | |
| PartiallyFailed | DiscoveryPolicy | Discovered N images, but some sources failed: N | |
| SourceError | DiscoveryPolicy | One or more sources failed to respond | |
| SyncFailed | DiscoveryPolicy | | |
| Synced | DiscoveryPolicy | Discovered N images | |

## Metrics

| Name | Type | Description |
|------|------|-------------|
| `drop_images_cached_total` | counter | Total number of images successfully cached on nodes. |
| `drop_pull_duration_seconds` | histogram | Duration of image pull operations in seconds. |
| `drop_pull_errors_total` | counter | Total number of failed image pull attempts. |
| `drop_discovery_images_found` | gauge | Number of images found by a discovery policy. |
| `drop_active_pulls` | gauge | Current number of active image pull Pods. |
| `drop_reconcile_total` | counter | Total number of reconciliation attempts. |
| `drop_discovery_source_health` | gauge | Whether a discovery source is reachable and queryable (1=healthy, 0=unhealthy). |
| `drop_discovery_source_latency_seconds` | histogram | Latency of discovery source queries in seconds. |
| `drop_nodes_targeted` | gauge | Number of nodes targeted by each CachedImage resource. |
| `drop_nodes_cached` | gauge | Number of nodes where the image is successfully cached. |
| `drop_consecutive_failures` | gauge | Current number of consecutive pull failures for a CachedImage. |

## Sample CRs

```yaml
# Dev samples: deployed by Tilt for interactive testing
---
# === PullPolicy ===
apiVersion: drop.corewire.io/v1alpha1
kind: PullPolicy
metadata:
  name: dev-conservative
spec:
  maxConcurrentNodes: 1
  minDelayBetweenPulls: 5s
  repullInterval: 1h
  failureBackoff:
    initial: 30s
    max: 5m
---
# === CachedImage: healthy ===
apiVersion: drop.corewire.io/v1alpha1
kind: CachedImage
metadata:
  name: dev-nginx
spec:
  image: docker.io/library/nginx
  tag: "1.25-alpine"
  policyRef:
    name: dev-conservative
---
apiVersion: drop.corewire.io/v1alpha1
kind: CachedImage
metadata:
  name: dev-redis
spec:
  image: docker.io/library/redis
  tag: "7-alpine"
  policyRef:
    name: dev-conservative
---
# === CachedImage: broken (DNS failure → ImagePullBackOff) ===
apiVersion: drop.corewire.io/v1alpha1
kind: CachedImage
metadata:
  name: test-invalid-image
spec:
  image: registry.invalid.local:9999/does-not-exist
  tag: "nope"
  policyRef:
    name: dev-conservative
---
# === CachedImageSet: healthy (static images) ===
apiVersion: drop.corewire.io/v1alpha1
kind: CachedImageSet
metadata:
  name: dev-set
spec:
  policyRef:
    name: dev-conservative
  images:
    - image: docker.io/library/alpine
      tag: "3.19"
    - image: docker.io/library/busybox
      tag: "1.36"
---
# === CachedImageSet: dynamic (backed by DiscoveryPolicy) ===
apiVersion: drop.corewire.io/v1alpha1
kind: CachedImageSet
metadata:
  name: dev-set-discovered
spec:
  policyRef:
    name: dev-conservative
  discoveryPolicyRef:
    name: dev-registry
---
# === DiscoveryPolicy: healthy (Prometheus range query) ===
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: dev-prometheus
spec:
  sources:
    - type: prometheus
      prometheus:
        endpoint: "http://prometheus.e2e-infra.svc.cluster.local:9090"
        query: 'count(container_memory_working_set_bytes{container!="", container!="POD", namespace="build-stuff", pod=~"runner-.*"}) by (image)'
        lookback: 24h
        step: 5m
  syncInterval: 30s
  maxImages: 10
---
# === DiscoveryPolicy: healthy (registry tag listing) ===
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: dev-registry
spec:
  sources:
    - type: registry
      registry:
        url: "http://registry.e2e-infra.svc.cluster.local:5000"
        repositories:
          - "test/myapp"
        topX: 3
  syncInterval: 30s
  maxImages: 10
---
# === DiscoveryPolicy: broken (DNS error → DNSError) ===
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: test-broken-prom
spec:
  sources:
    - type: prometheus
      prometheus:
        endpoint: "http://nonexistent-prometheus:9090"
        query: "up{}"
  syncInterval: 30m
  maxImages: 10
---
# === DiscoveryPolicy: broken (DNS error → DNSError) ===
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: test-broken-registry
spec:
  sources:
    - type: registry
      registry:
        url: "http://nonexistent-registry:5000"
        repositories:
          - "test/nope"
  syncInterval: 30m
  maxImages: 10
---
# === DiscoveryPolicy: broken (repo doesn't exist → NotFound) ===
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: test-notfound-repo
spec:
  sources:
    - type: registry
      registry:
        url: "http://registry.e2e-infra.svc.cluster.local:5000"
        repositories:
          - "this/does-not-exist"
  syncInterval: 30m
  maxImages: 10
```

## Build & Test

```
make help            # Display this help.
make build           # Build manager binary.
make run             # Run controller from your host.
make fmt             # Run go fmt.
make vet             # Run go vet.
make lint            # Run golangci-lint.
make lint-fix        # Run golangci-lint with auto-fix.
make generate        # Generate DeepCopy methods.
make manifests       # Generate CRD and RBAC manifests.
make sync-crds       # Sync generated CRDs into Helm chart templates.
make codegen         # Run all code generation (deepcopy + CRDs + docs).
make test            # Run unit tests.
make test-e2e        # Run Chainsaw E2E tests (requires kind cluster).
make kind-create     # Create kind cluster for development.
make kind-delete     # Delete the kind cluster.
make install         # Install CRDs into cluster.
make uninstall       # Uninstall CRDs from cluster.
make e2e-infra       # Deploy Prometheus + Registry for E2E/dev.
make docker-build    # Build docker image.
make docker-push     # Push docker image.
make kind-load       # Build and load image into kind.
make helm-lint       # Lint the Helm chart.
make helm-template   # Render Helm templates locally.
make docs-serve      # Serve Hugo docs locally.
make docs-gen        # Regenerate AI agent docs (llms.txt, instructions, etc.) from source.
make docs-gen-check  # Verify generated AI docs are up to date.
make tools           # Install local tooling and check optional docs/chart binaries.
```
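
## Sketch: Rendering imageTemplate

RegistrySource documents `imageTemplate` as a Go text/template with the variables `{{.Registry}}`, `{{.Repository}}`, and `{{.Tag}}`. The sketch below shows how such a template could be rendered with the standard library. `RenderImage` and `tagData` are hypothetical names, not part of the drop codebase. The sketch also assumes the registry value is passed as a host without a URL scheme, which the reference does not specify.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// defaultImageTemplate is the documented default used when imageTemplate is unset.
const defaultImageTemplate = "{{.Registry}}/{{.Repository}}:{{.Tag}}"

// tagData carries the three documented template variables.
type tagData struct {
	Registry   string
	Repository string
	Tag        string
}

// RenderImage is a hypothetical helper that renders an image reference for one
// discovered tag. An empty tmpl falls back to the documented default.
func RenderImage(tmpl, registry, repository, tag string) (string, error) {
	if tmpl == "" {
		tmpl = defaultImageTemplate
	}
	t, err := template.New("image").Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parse imageTemplate: %w", err)
	}
	var sb strings.Builder
	data := tagData{Registry: registry, Repository: repository, Tag: tag}
	if err := t.Execute(&sb, data); err != nil {
		return "", fmt.Errorf("render imageTemplate: %w", err)
	}
	return sb.String(), nil
}

func main() {
	img, err := RenderImage("", "registry.example.com", "team/app", "v2.4.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(img) // registry.example.com/team/app:v2.4.1
}
```

Because text/template rejects unknown struct fields at execution time, a template that references anything other than the three documented variables returns an error instead of producing a malformed image reference.
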