Table of contents
- Introduction
- What is Provenance in C2PA?
- Design Goals of the C2PA Specification
- How it works
- How is Trust Established in Digital Assets Exactly
- Tech used for the C2PA
- C2PA Cryptography
- How can the C2PA specifically address the use of AI/ML for content creation artificially?
- Usecases of the C2PA
- The Caveats
- Sources
The C2PA, short for the Coalition for Content Provenance and Authenticity, publishes a standard that is gaining popularity. The standard is a set of technical specifications that can be used to trace the source of images and other online content. It works mainly through metadata, which is information that travels with the content. Companies that provide text-to-image, text-to-video, and similar generation software are adding this metadata to show that content has been generated using AI.
This standard could play a major role in protecting end users from fake and AI-generated content, which is being produced and published online at a rapid pace, and it may shape how content is trusted on the Internet in the future.
The specification is already in use in several Western countries and could soon be adopted in Eastern countries as well.
Introduction
The C2PA is a Joint Development Foundation project under the Linux Foundation. Its main aim is to produce technical specifications and rules for establishing provenance and authenticity at scale, so that creators, publishers, and consumers of digital media can trace content back to its origin. Its members include companies and organizations such as Sony, Microsoft, Adobe, Intel, and many more, who work together to develop these specifications. It is important to note that the C2PA specification does not itself detect whether a piece of content is fake or AI-generated; it only lets tools record and verify provenance information that has been attached to the content.
The C2PA establishes guidelines that companies can use to verify and trace the authenticity of content spreading online, including deepfakes. It addresses the challenge of trusting online media that is often manipulated or generated artificially.
The C2PA gives users details about the authenticity of the digital media they are viewing and a way to learn its source, which can help them distinguish AI-generated content from real content.
What is Provenance in C2PA?
Provenance refers to the facts about the history of a digital asset such as an image, video, or audio recording, including the data recorded when tools are used to generate or modify it. The specification lets authors securely bind statements of provenance, called assertions, to their content. Assertions include information about who created the asset and its history of edits throughout its lifetime.
The provenance data is cryptographically bound to the asset, so any alteration to the asset or to the provenance data can be detected (the data is tamper-evident). It can be used to establish trust, transparency, and authenticity by providing a verifiable record of the content's origin, authorship, and creation process.
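To make the idea of cryptographic binding concrete, here is a minimal sketch that ties a provenance record to an asset through a SHA-256 digest. The function names `bind` and `still_matches` and the record layout are assumptions made for illustration; an actual C2PA manifest hashes specific byte ranges of the file and is also signed, which this sketch leaves out.

```python
import hashlib


def bind(asset_bytes: bytes, provenance: dict) -> dict:
    """Return a copy of the provenance record carrying the asset's SHA-256 digest."""
    record = dict(provenance)
    record["asset_sha256"] = hashlib.sha256(asset_bytes).hexdigest()
    return record


def still_matches(asset_bytes: bytes, record: dict) -> bool:
    """True if the asset is byte-for-byte what the record was bound to."""
    return hashlib.sha256(asset_bytes).hexdigest() == record["asset_sha256"]


original = b"raw image bytes"  # placeholder for real file contents
record = bind(original, {"author": "Example Photographer"})  # hypothetical author

print(still_matches(original, record))         # asset left unchanged
print(still_matches(original + b"!", record))  # asset edited after binding
```

The second check fails because a single changed byte produces a different digest, which is what makes the binding tamper-evident rather than tamper-proof.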
Design Goals of the C2PA Specification
Several essential design goals guided the development of this standard, which may help in identifying a lot of the fake content generated online through AI and other means. Some of these goals are:
- Preserve the metadata about the asset across the multiple tools used for creation and modification.
- Support all standard formats of digital content, such as images, videos, audio, documents, and 3D.
- Create only the minimum of novel technology by relying on well-established techniques.
- Do not require any form of cloud storage.
- Allow flexibility in how the metadata about a piece of content is shared.
- Allow the metadata to be redacted by the author.
How it works
The C2PA provides a label and a mechanism that help consumers discern whether the content they are seeing has been generated using AI. Techniques such as watermarking, cryptography, and above all metadata (which is used heavily for images) are employed to verify authenticity and to obtain details about where the content was created or generated.
C2PA gives you a way to view the history of changes that content has undergone, including changes made by an AI system, and it can carry a label that states whether a piece of content has been AI-generated.
How is Trust Established in Digital Assets Exactly
In this specification, trust decisions are made by the consumer of the asset, based on the content and on the metadata made public by the creator of the software that generated the content. The C2PA emphasizes how important it is for every application capable of producing digital images to attach metadata stating that an image has been created artificially. The signer of that metadata is vouched for by a Certification Authority (the same kind of authority behind the lock icon that popular browsers show when a website uses HTTPS).
The use of a certification authority (CA) is important for ensuring the authenticity and provenance of digital assets. In simple terms, a CA acts like a trusted third party that verifies the identity of individuals or organizations on the internet. It does this by issuing digital certificates that confirm the ownership or control of a specific domain or digital resource. For example, before a secure website can be set up, the CA checks that the person or organization requesting the website's certificate owns or controls the site's domain name. This helps prevent unauthorized or fake websites from obtaining a secure certificate, thus enhancing online security and trust.
Tech used for the C2PA
The C2PA does not depend on advanced technology such as cloud or blockchain software. Instead, it relies mainly on a trust model: a way of accessing cryptographically verifiable, tamper-evident information that can be used to establish the 'trustworthiness' of a piece of content online.
Now, we will discuss the core components of the C2PA.
The C2PA defines assertions, claims, credentials, and signatures, which together form a verifiable unit known as the Manifest. A Claim Generator produces the Manifest, which holds the metadata describing where the content came from along with the cryptographic Claim Signature.
The C2PA Manifest generally includes the following information:
- Basic Media Information: File name, size, and format.
- Creation and Edit History: A log of the software used, the edit actions performed, and a brief history of all the ways the content has been modified.
- Cryptographic Signatures: The C2PA uses cryptographic signing keys, which form the backbone of its trust model.
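The relationship between assertions, the claim, and the manifest can be sketched as a small data structure. This is a simplified model and not the specification's actual layout: the class and field names are assumptions, JSON stands in for the CBOR encoding that C2PA uses, and the signature is left empty. The labels `c2pa.actions` and `c2pa.created` are real C2PA names; `ExampleEditor/1.0` is a made-up tool.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Assertion:
    label: str  # e.g. "c2pa.actions"
    data: dict  # the statement itself

    def digest(self) -> str:
        # Sorted keys so the same data always serializes and hashes identically.
        payload = json.dumps(self.data, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


@dataclass
class Claim:
    claim_generator: str  # the tool that produced the manifest
    assertion_hashes: dict = field(default_factory=dict)


@dataclass
class Manifest:
    assertions: list
    claim: Claim
    signature: bytes = b""  # would be filled in by a signer (not shown here)


def build_manifest(generator: str, assertions: list) -> Manifest:
    """Collect assertions and record a hash of each one in the claim."""
    claim = Claim(generator, {a.label: a.digest() for a in assertions})
    return Manifest(assertions, claim)


manifest = build_manifest(
    "ExampleEditor/1.0",  # hypothetical claim generator
    [Assertion("c2pa.actions", {"actions": [{"action": "c2pa.created"}]})],
)
print(manifest.claim.assertion_hashes)
```

Because the claim lists a hash for every assertion, signing the claim alone is enough to cover all of the assertions it references.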
C2PA Cryptography
The C2PA uses cryptographic signing keys to sign the claim in the active manifest, the main component storing all of the metadata about the artificially generated content.
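The sign-then-verify flow can be illustrated with the standard library alone. Note the substitution: C2PA signatures use asymmetric keys with certificates, whereas this sketch uses an HMAC (a shared-secret tag) purely as a stand-in, because Python's standard library has no asymmetric signing. The key, the claim fields, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-only-key"  # hypothetical; a real signer holds a private key


def sign_claim(claim: dict, key: bytes) -> str:
    """Produce a tag over the serialized claim (HMAC stand-in for a signature)."""
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_claim(claim: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_claim(claim, key), tag)


claim = {"claim_generator": "ExampleEditor/1.0", "assertions": ["c2pa.actions"]}
tag = sign_claim(claim, SECRET_KEY)

# Changing any field after signing invalidates the tag.
tampered = dict(claim, claim_generator="SomeOtherTool/9.9")

print(verify_claim(claim, tag, SECRET_KEY))     # untouched claim
print(verify_claim(tampered, tag, SECRET_KEY))  # modified claim
```

With a real asymmetric signature, anyone holding the signer's public certificate could run the verification step without being able to produce new signatures, which an HMAC cannot offer.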
The signer is not necessarily a human. The identity presented can be completely anonymous, or it can belong to a hardware device, a service, or a trusted hardware component.
C2PA Manifests can be validated as many times as required, regardless of whether the cryptographic credentials used to sign them are still valid or have since expired.
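One way to reason about validation after expiry is to check the credential against the time of signing and not against the current time. The sketch below assumes the manifest carries a trusted time-stamp recording when it was signed; the function name and all dates are made up for illustration.

```python
from datetime import datetime, timezone


def credential_valid_at_signing(signed_at, not_before, not_after) -> bool:
    """True if the signing time falls inside the credential's validity window."""
    return not_before <= signed_at <= not_after


utc = timezone.utc
not_before = datetime(2023, 1, 1, tzinfo=utc)  # credential validity start
not_after = datetime(2024, 1, 1, tzinfo=utc)   # credential validity end
signed_at = datetime(2023, 6, 15, tzinfo=utc)  # taken from a trusted time-stamp
checked_at = datetime(2030, 1, 1, tzinfo=utc)  # validation happens years later

print(checked_at > not_after)  # the credential has expired by validation time
print(credential_valid_at_signing(signed_at, not_before, not_after))
```

The manifest still validates in this example because what matters is that the credential was valid at the moment the claim was signed.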
C2PA Manifests are stored in JUMBF (ISO/IEC 19566-5) containers, which can be embedded in common image formats and many video file formats. This allows multiple manifests to be stored, individual elements to be referenced through URIs, the hashable parts of elements to be identified, and content in different data formats to be stored.
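JUMBF follows the box layout familiar from ISO base media files: each box starts with a 4-byte big-endian length and a 4-byte type, and a superbox contains further boxes. The sketch below builds and walks a toy superbox. It is deliberately simplified: the helper names are assumptions, the description payload is a placeholder that does not follow the real description box layout, and the special length values 0 and 1 (box runs to end of data, extended length) are rejected and not handled.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: 4-byte big-endian length, 4-byte type, then payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(buf: bytes):
    """Yield (type, payload) for each box laid end to end in buf."""
    offset = 0
    while offset + 8 <= len(buf):
        length, box_type = struct.unpack_from(">I4s", buf, offset)
        if length < 8 or offset + length > len(buf):
            raise ValueError("malformed or unsupported box length")
        yield box_type, buf[offset + 8 : offset + length]
        offset += length


# A toy superbox: a description box followed by one content box.
inner = make_box(b"jumd", b"description") + make_box(b"json", b'{"k": 1}')
superbox = make_box(b"jumb", inner)

for box_type, payload in iter_boxes(superbox):
    # Print the outer box type and the types of the boxes nested inside it.
    print(box_type, [t for t, _ in iter_boxes(payload)])
```

Nesting boxes this way is what lets a single container hold several manifests, each with its own assertions in different data formats.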
How can the C2PA specifically address the use of AI/ML for content creation artificially?
Every action performed by an actor that is an AI/ML model is recorded in the asset's Content Credentials. This recording can be done through manual human intervention or by an ML system. The content is then clearly labeled through a digitalSourceType field, which Google has recommended for labeling AI-generated content; it carries metadata about the source and additional information about the synthetically produced data. The C2PA can also be used to identify specific parts of text content that have been generated using Generative AI, with clear watermarking around those parts.
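As an illustration of the digital source type label, here is a sketch of an actions assertion and a check for the AI-generated marker. The URI is the IPTC digital source type term for media created by a trained algorithm. The surrounding dictionary layout is a simplified assumption, and the tool name and the helper `is_labeled_ai_generated` are hypothetical.

```python
AI_SOURCE_TYPE = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)

# Simplified picture of an actions assertion for a generated image.
actions_assertion = {
    "label": "c2pa.actions",
    "data": {
        "actions": [
            {
                "action": "c2pa.created",
                "softwareAgent": "ExampleImageGenerator 1.0",  # hypothetical tool
                "digitalSourceType": AI_SOURCE_TYPE,
            }
        ]
    },
}


def is_labeled_ai_generated(assertion: dict) -> bool:
    """True if any recorded action declares the AI-generated source type."""
    actions = assertion.get("data", {}).get("actions", [])
    return any(a.get("digitalSourceType") == AI_SOURCE_TYPE for a in actions)


print(is_labeled_ai_generated(actions_assertion))
```

A missing label proves nothing on its own: content with no such field may simply come from a tool that does not write Content Credentials.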
Usecases of the C2PA
The C2PA has a variety of use cases, including:
- Helping consumers verify the authenticity of the media they are consuming
For example, suppose user X shares a video with user Y containing alarming and potentially fake rumors that could prove dangerous if spread rapidly. Y can check the video's Content Credentials, which are made available by any software that has committed to the C2PA specification, to find out whether the content is legitimate.
- Enhancing Clarity Around Provenance and Edits for Journalistic Work
A photojournalist could use a special type of camera during a newsworthy event. The photos and videos taken can then be brought into a special editing program, and after editing, they are sent to a photo editor who makes more changes using the same program. The final photos and videos are moved into the news organization's system before being posted on social media. Each of these steps logs essential details about the content, such as the number of edits performed and the source of each edit. A Content Credentials enabled system can then provide all of these details as metadata about the edited image.
- Allowing Publishers to Boost their Brand Value
The C2PA specification can be a boon for news publishers looking to establish trust with their consumers. Using this specification, news outlets can show their readers where the content they publish came from and give assurance that it has not been tampered with by AI tools. This is very important in an age where rumors can spread across the Internet like wildfire and deepfaked news content can be extremely detrimental and dangerous. The specification can help safeguard news publishers from this.
- Providing Quality Data for Data Content Decisions
A news video posted to social media with Content Credentials attached gives the platform a way to verify whether the content is legitimate, making it easier for the platform to take down fake news and to preserve the integrity of the content posted.
- Preventing Body Shaming and Other Stereotypes through Images Published Artificially Online
Content Credentials can also be used to track the number of edits that have been made to content published online. In recent times, deepfakes have been used for body shaming, attacking famous people by morphing and changing their body features. Content Credentials can help platforms and law enforcement organizations take down fake content that is being used to propagate attacks or hate speech against a person or a particular community.
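The photojournalism workflow described in the use cases above can be pictured as a chain of manifests, where each step points back to the manifest of the asset it was derived from. The sketch below is a toy model: the `Manifest` class, its fields, and the tool and action names are assumptions, and real manifests reference earlier versions through ingredient assertions and not through a simple parent link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Manifest:
    tool: str    # who produced this step (hypothetical names below)
    action: str  # what was done at this step
    parent: Optional["Manifest"] = None  # manifest of the asset this was derived from


def history(manifest: Manifest) -> list:
    """Walk the parent links and return the steps oldest-first."""
    steps = []
    node = manifest
    while node is not None:
        steps.append(f"{node.tool}: {node.action}")
        node = node.parent
    return list(reversed(steps))


capture = Manifest("Camera", "captured")
first_edit = Manifest("Editing program", "cropped", parent=capture)
second_edit = Manifest("Editing program", "colour corrected", parent=first_edit)
published = Manifest("Newsroom system", "published", parent=second_edit)

for step in history(published):
    print(step)
```

Starting from the published asset, a viewer can walk the chain back to the original capture and count or inspect the edits along the way.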
The Caveats
While the C2PA standard should prove to be a boon in an age of fake content, whose circulation has only accelerated with AI-based software that can power content farms generating content en masse, it is a step toward transparency and not a solution to every problem. Companies still need to adopt the metadata and watermarking defined by the standard in their software. The C2PA counts some major organizations among its partners, and they hope to make the Internet a safer place by safeguarding you from inauthentic content.
Right now, the C2PA is used mainly for images and videos, but in the future it may be adopted for text-based content as well.