# REF-062: W3C PROV - The PROV Data Model

## Citation

W3C (2013). PROV-DM: The PROV Data Model. W3C Recommendation 30 April 2013.

**W3C Recommendation**: https://www.w3.org/TR/prov-dm/

**Overview**: https://www.w3.org/TR/prov-overview/

**Primer**: https://www.w3.org/TR/prov-primer/

## Document Profile

| Attribute | Value |
|-----------|-------|
| Year | 2013 |
| Type | W3C Recommendation (Web Standard) |
| Status | Stable, widely adopted |
| AIWG Relevance | **Medium** - Provides standard provenance vocabulary for research artifact tracking |

## Executive Summary

W3C PROV is a family of specifications for expressing provenance: the record of who created or modified data, when, and how. It defines three core types (Entity, Activity, Agent) and their relationships. For AIWG, PROV provides the standard vocabulary for tracking research artifact provenance.

### Key Insight

> "Use of W3C PROV has been previously demonstrated as a means to increase reproducibility and trust of computer-generated outputs."

**AIWG Implication**: Rather than inventing a provenance schema, AIWG should use PROV's proven vocabulary for tracking research operations.

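As a rough illustration of the three core types, the sketch below models them as plain Python dataclasses and links them with PROV relation names. The class names, field names, and the `describe` helper are hypothetical and chosen for this example only; they are not part of any PROV library or of AIWG itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A thing being tracked (hypothetical model, not a PROV library type)."""
    id: str
    type: str


@dataclass(frozen=True)
class Agent:
    """Something that bears responsibility for an activity."""
    id: str
    type: str


@dataclass
class Activity:
    """Something that occurs over time and links entities to agents."""
    id: str
    type: str
    used: list[str] = field(default_factory=list)                 # entity ids consumed
    generated: list[str] = field(default_factory=list)            # entity ids produced
    was_associated_with: list[str] = field(default_factory=list)  # responsible agent ids


def describe(activity: Activity) -> dict:
    """Summarize what was produced, by which activity, and who was responsible."""
    return {
        "what": list(activity.generated),
        "how": activity.id,
        "who": list(activity.was_associated_with),
    }


# Illustrative identifiers only.
paper = Entity("ref:REF-056", "research-paper")
summary = Entity("ref:REF-056-summary", "summary")
agent = Agent("agent:research-acquisition", "software-agent")
documentation = Activity(
    "act:documentation-001",
    "documentation",
    used=[paper.id],
    generated=[summary.id],
    was_associated_with=[agent.id],
)

print(describe(documentation))
```

Keeping the relations on the activity is only one possible layout; the same information could equally be stored on the entities or as separate relation records.
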

---

## Core Concepts

### Three Core Types

| Type | Definition | AIWG Examples |
|------|------------|---------------|
| **Entity** | Physical, digital, or conceptual thing | PDF, REF-XXX.md, citable claim |
| **Activity** | Something that occurs over time | Acquisition, documentation, integration |
| **Agent** | Something that bears responsibility | Research agent, human reviewer |

### Relationship Model

```
┌─────────────┐
│   Entity    │◄────────── Things (data, documents, artifacts)
└─────────────┘
       │
       │ wasGeneratedBy / used
       │
┌─────────────┐
│  Activity   │◄────────── Actions (processes, transformations)
└─────────────┘
       │
       │ wasAssociatedWith
       │
┌─────────────┐
│    Agent    │◄────────── Actors (people, software, organizations)
└─────────────┘
```

---

## Core Relations

### Entity Relations

| Relation | Meaning | AIWG Example |
|----------|---------|--------------|
| `wasDerivedFrom` | Entity created from another | Summary derived from PDF |
| `wasGeneratedBy` | Entity produced by activity | REF-XXX.md generated by documentation |
| `wasAttributedTo` | Entity attributed to agent | Document attributed to research-agent |
| `alternateOf` | Different aspects of same thing | PDF vs. markdown versions |
| `specializationOf` | More specific version | AIWG analysis vs. generic summary |

### Activity Relations

| Relation | Meaning | AIWG Example |
|----------|---------|--------------|
| `used` | Activity used an entity | Documentation used PDF |
| `wasInformedBy` | Activity used output of another | Integration informed by documentation |
| `wasAssociatedWith` | Activity associated with agent | Acquisition associated with research-agent |

### Agent Relations

| Relation | Meaning | AIWG Example |
|----------|---------|--------------|
| `actedOnBehalfOf` | Agent delegated by another | Research-agent acts on behalf of user |

---

## Key Findings for AIWG

### 1. Standard Vocabulary Enables Interoperability

PROV provides a standard vocabulary that tools can understand.
Custom provenance schemas create translation problems.

**AIWG Implication**: Use PROV relation names even if not using full PROV serialization.

### 2. Entity-Activity-Agent Triangle

Every provenance question maps to this pattern:

- What was created? (Entity)
- How was it created? (Activity)
- Who created it? (Agent)

**AIWG Implication**: Every provenance record should answer all three questions.

### 3. Derivation Chains

PROV supports multi-step derivations: A → B → C → D.

**AIWG Implication**: Track the full chain from original paper to citable claim, not just direct relationships.

---

## AIWG Implementation Mapping

| PROV Concept | AIWG Implementation | Rationale |
|--------------|---------------------|-----------|
| **Entity** | REF-XXX documents, PDFs, claims | Things we track |
| **Activity** | Acquisition, documentation, integration | Operations we perform |
| **Agent** | research-agent, human reviewers | Who performs operations |
| `wasDerivedFrom` | Track summary → PDF relationship | Enables verification |
| `wasGeneratedBy` | Track document → activity link | Enables audit |
| `wasAssociatedWith` | Track activity → agent link | Enables attribution |
| `used` | Track activity → input entities | Enables reproducibility |
| **PROV-N notation** | Human-readable provenance logs | Debugging and audit |
| **PROV-JSON** | Machine-readable provenance export | Interoperability |

---

## Specific AIWG Design Decisions Informed by PROV

### 1. Provenance Record Format

**Decision**: Use PROV vocabulary in provenance records:

```yaml
# .aiwg/research/provenance/op-2026-01-25-001.yaml
provenance:
  entities:
    - id: "ref:REF-056"
      type: research-paper
    - id: "ref:REF-056-summary"
      type: summary
      wasDerivedFrom: "ref:REF-056"

  activities:
    - id: "act:documentation-001"
      type: documentation
      used: ["ref:REF-056"]
      generated: ["ref:REF-056-summary"]

  agents:
    - id: "agent:research-acquisition"
      type: software-agent
      wasAssociatedWith: ["act:documentation-001"]
```

### 2. Human-Readable Provenance Log

**Decision**: Maintain `.aiwg/research/provenance/operations.log` in PROV-N style:

```
2026-01-25T10:00:00Z entity(ref:REF-056) wasAttributedTo(agent:acquisition)
2026-01-25T10:05:00Z activity(act:documentation) used(ref:REF-056)
2026-01-25T10:05:30Z entity(ref:REF-056-summary) wasGeneratedBy(act:documentation)
2026-01-25T10:05:30Z entity(ref:REF-056-summary) wasDerivedFrom(ref:REF-056)
```

**PROV Justification**: PROV-N is designed to be human-readable while remaining machine-parseable.

### 3. Derivation Chain Tracking

**Decision**: Track full derivation chains, not just immediate relationships.

Example chain:

```
Original Paper (REF-056)
  → PDF (via download)
  → REF-XXX.md summary (via documentation)
  → Citable claim (via integration)
  → Project document (via citation)
```

**PROV Justification**: Chains of `wasDerivedFrom` links can be followed step by step to answer multi-step derivation queries. PROV does not define derivation as transitive, so the chain is traversed explicitly rather than inferred.

### 4. Agent Attribution

**Decision**: Every operation records the responsible agent, including:

- Agent type (software-agent, human)
- Agent version (for software)
- Agent identity (for humans)

**PROV Justification**: `wasAssociatedWith` and `actedOnBehalfOf` enable full attribution.

### 5. Activity Timestamps

**Decision**: All activities record start and end timestamps.

**PROV Justification**: PROV activities have duration (startedAtTime, endedAtTime).

---

## Research Framework Application

### Provenance Query Examples

PROV enables questions like:

| Question | PROV Query Pattern |
|----------|-------------------|
| "What was this summary derived from?" | Find entity `wasDerivedFrom` |
| "Who generated this artifact?" | Find entity `wasGeneratedBy` activity `wasAssociatedWith` agent |
| "What activities used this paper?" | Find all activities where entity appears in `used` |
| "Show the complete transformation chain" | Follow `wasDerivedFrom` transitively |

### Example: Research Acquisition Provenance

```yaml
# Complete provenance for REF-056 acquisition
entities:
  - id: "ref:wilkinson-2016-pdf"
    type: pdf
    location: "pdfs/full/REF-056-wilkinson-2016-fair.pdf"

  - id: "ref:REF-056-doc"
    type: reference-document
    location: "docs/references/REF-056-fair-guiding-principles.md"
    wasDerivedFrom: "ref:wilkinson-2016-pdf"

activities:
  - id: "act:acquisition-2026-01-25"
    type: paper-acquisition
    startedAtTime: "2026-01-25T10:00:00Z"
    endedAtTime: "2026-01-25T10:02:00Z"
    used:
      - source_url: "https://www.nature.com/articles/sdata201618.pdf"
    generated:
      - "ref:wilkinson-2016-pdf"

  - id: "act:documentation-2026-01-25"
    type: reference-documentation
    startedAtTime: "2026-01-25T10:02:00Z"
    endedAtTime: "2026-01-25T10:15:00Z"
    wasInformedBy: "act:acquisition-2026-01-25"
    used:
      - "ref:wilkinson-2016-pdf"
    generated:
      - "ref:REF-056-doc"

agents:
  - id: "agent:research-acquisition"
    type: software-agent
    version: "1.0.0"
    wasAssociatedWith:
      - "act:acquisition-2026-01-25"
      - "act:documentation-2026-01-25"
```

---

## PROV Serializations

| Format | Use Case | AIWG Usage |
|--------|----------|------------|
| **PROV-N** | Human documentation | Operations log |
| **PROV-JSON** | API exchange | Export/import |
| **PROV-O** | Semantic web | Not planned |
| **PROV-XML** | Enterprise systems | Not planned |

AIWG focuses on PROV-N (readable) and PROV-JSON (machine-processable).

---

## Key Quotes

### On reproducibility:

> "Use of W3C PROV has been previously demonstrated as a means to increase reproducibility and trust of computer-generated outputs."

### On the data model:

> "PROV-DM is the conceptual data model that forms a basis for the W3C provenance family of specifications."

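The query patterns listed under Provenance Query Examples can be sketched in a few lines of Python. This is a minimal illustration, assuming the REF-056 acquisition record has already been loaded into a plain dictionary; the `record` literal is a trimmed copy of that example (the `source_url` input, timestamps, and locations are omitted), and the function names are hypothetical rather than part of AIWG or any PROV library.

```python
# Trimmed, in-memory copy of the REF-056 acquisition record (assumed shape).
record = {
    "entities": [
        {"id": "ref:wilkinson-2016-pdf", "type": "pdf"},
        {
            "id": "ref:REF-056-doc",
            "type": "reference-document",
            "wasDerivedFrom": "ref:wilkinson-2016-pdf",
        },
    ],
    "activities": [
        {
            "id": "act:acquisition-2026-01-25",
            "used": [],
            "generated": ["ref:wilkinson-2016-pdf"],
        },
        {
            "id": "act:documentation-2026-01-25",
            "used": ["ref:wilkinson-2016-pdf"],
            "generated": ["ref:REF-056-doc"],
        },
    ],
    "agents": [
        {
            "id": "agent:research-acquisition",
            "wasAssociatedWith": [
                "act:acquisition-2026-01-25",
                "act:documentation-2026-01-25",
            ],
        },
    ],
}


def derivation_chain(record, entity_id):
    """Follow wasDerivedFrom links from entity_id back to its original source."""
    parents = {e["id"]: e.get("wasDerivedFrom") for e in record["entities"]}
    chain, seen = [entity_id], {entity_id}
    current = parents.get(entity_id)
    # The 'seen' set guards against accidental cycles in hand-written records.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


def activities_using(record, entity_id):
    """Return ids of all activities that list entity_id under 'used'."""
    return [a["id"] for a in record["activities"] if entity_id in a.get("used", [])]


def agents_for(record, entity_id):
    """Return ids of agents associated with the activities that generated entity_id."""
    generating = {
        a["id"] for a in record["activities"] if entity_id in a.get("generated", [])
    }
    return [
        ag["id"]
        for ag in record["agents"]
        if generating & set(ag.get("wasAssociatedWith", []))
    ]


print(derivation_chain(record, "ref:REF-056-doc"))
print(activities_using(record, "ref:wilkinson-2016-pdf"))
print(agents_for(record, "ref:REF-056-doc"))
```

Each entity here carries at most one `wasDerivedFrom` value, matching the example record; an entity derived from several sources would need a graph traversal instead of a single chain.
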

---

## Cross-References

| Paper | Relationship |
|-------|-------------|
| **REF-056** | FAIR R1.2 requires provenance; PROV provides implementation |
| **REF-061** | OAIS PDI-Provenance category; PROV provides vocabulary |
| **REF-058** | R-LAM recommends PROV for workflow reproducibility |

---

## Revision History

| Date | Author | Changes |
|------|--------|---------|
| 2026-01-25 | Research Acquisition | Initial AIWG-specific analysis document |