Like Bitcoin, IPFS introduces few genuinely new technologies; most of its building blocks are existing technologies and ideas. That does not mean there is no innovation: the innovation lies in how these pieces are combined.
DHT
Key ideas: in Kademlia-based DHTs, node addressing and content addressing are isomorphic, since node IDs and content keys live in the same key space. On top of Kademlia's XOR-based logical distance, Coral DHT adds physical distance metrics such as latency, and S/Kademlia adds some resistance to Sybil attacks by requiring a proof-of-work-style difficulty computation when generating a node's public/private key pair.
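A toy sketch of the XOR distance idea (simplified byte-slice IDs, not the actual go-libp2p-kad-dht types, which appear later in the Kad Addressing section): the distance between two IDs is their bytewise XOR, and the number of leading zero bits of that XOR is the length of their common prefix, which determines the routing bucket.

```go
package main

import (
	"fmt"
	"math/bits"
)

// xorDistance returns the bytewise XOR of two equal-length IDs.
func xorDistance(a, b []byte) []byte {
	d := make([]byte, len(a))
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// commonPrefixLen counts the leading zero bits of the XOR distance,
// i.e. the length of the shared prefix of the two IDs.
func commonPrefixLen(a, b []byte) int {
	total := 0
	for _, x := range xorDistance(a, b) {
		lz := bits.LeadingZeros8(x)
		total += lz
		if lz != 8 {
			break
		}
	}
	return total
}

func main() {
	a := []byte{0b1010_0000, 0x00}
	b := []byte{0b1011_0000, 0x00}
	fmt.Println(commonPrefixLen(a, b)) // 3: the IDs agree on their first three bits
}
```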
Git
The underlying data structures and organization of IPFS are strongly influenced by Git. Git is a content-addressed file system with four object types: blob, tree, commit, and tag. A blob stores file content of any type; when a blob is added, its hash is computed with SHA-1 and the object is stored under .git/objects, with the first two hex characters of the hash as the directory name and the rest as the file name.
A tree can point to other trees or blobs, and the links form a DAG, so there is no way to create a cycle. Objects are immutable: modifying content and adding it again produces a new object with a new hash, while the original version remains addressable by its old hash.
```bash
# Verify: add a blob
$ echo "hello,world" > 1.txt
$ git hash-object -w 1.txt
2d832d9044c698081e59c322d5a2a459da546469
$ echo "hello,world." > 1.txt
$ git hash-object -w 1.txt
ec03970d236871dd491c61683f3074d3038b4f6e
$ find .git/objects -type f    # view the stored objects
.git/objects/2d/832d9044c698081e59c322d5a2a459da546469
.git/objects/ec/03970d236871dd491c61683f3074d3038b4f6e

# Verify: add a tree
$ git update-index --add --cacheinfo 100644 ec03970d236871dd491c61683f3074d3038b4f6e test
$ git write-tree
b0de348e9cb3a01b47154a4331678787e29afdff

# Add a new file
$ echo "new file" > new.txt
$ git hash-object -w new.txt
fa49b077972391ad58037050f2a75f74e3671e92
$ git update-index --add --cacheinfo 100644 fa49b077972391ad58037050f2a75f74e3671e92 test1
$ git write-tree
7b452990f69f7751ba805a291549c32dbfeb8716

# Compare the contents of the two trees
$ git cat-file -p b0de348e9cb3a01b47154a4331678787e29afdff
100644 blob ec03970d236871dd491c61683f3074d3038b4f6e    test
$ git cat-file -p 7b452990f69f7751ba805a291549c32dbfeb8716
100644 blob ec03970d236871dd491c61683f3074d3038b4f6e    test
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    test1
```
IPFS defines corresponding object types (blob, list, tree, commit), which are likewise organized as a DAG.
SFS (Self-Certifying File System)
To publish dynamic content, a mapping mechanism is needed that maps a stable name to a changing IPFS hash. In the HTTP world, this trust is anchored in PKI, where certificates are issued by trusted authorities. Because IPFS aims to be decentralized, it does not rely on a trusted authority; instead it adopts the SFS scheme: the name itself is derived from a public key, and the record that maps the name to content carries the public key and a signature made with the corresponding private key. The mapping can therefore be verified directly, without a separate key-distribution step.
IPFS designed IPNS based on this idea.
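A minimal sketch of the self-certifying idea, with hypothetical helper names and SHA-256/Ed25519 standing in for IPFS's multihash and key machinery: the name is a hash of the public key, and a record is accepted only if the embedded public key matches the name and the signature over the value verifies.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// Record maps a self-certifying name to a value (e.g. an /ipfs/... path).
type Record struct {
	PubKey    ed25519.PublicKey
	Value     []byte
	Signature []byte
}

// nameFromPubKey derives the name from the public key itself,
// so whoever knows the name can verify the key without a CA.
func nameFromPubKey(pk ed25519.PublicKey) []byte {
	h := sha256.Sum256(pk)
	return h[:]
}

// verify checks that the record's key matches the name and that the
// signature over the value is valid.
func verify(name []byte, r Record) bool {
	if !bytes.Equal(name, nameFromPubKey(r.PubKey)) {
		return false
	}
	return ed25519.Verify(r.PubKey, r.Value, r.Signature)
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(nil)
	name := nameFromPubKey(pub)

	value := []byte("/ipfs/QmSomeContentHash") // hypothetical target path
	rec := Record{PubKey: pub, Value: value, Signature: ed25519.Sign(priv, value)}

	fmt.Println(verify(name, rec)) // true
}
```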
Block
Block is one of the most basic abstractions in IPFS: a chunk of raw bytes addressed by its CID.
```go
// ipfs/go-block-format/blocks.go

// Block provides abstraction for blocks implementations.
type Block interface {
	RawData() []byte
	Cid() cid.Cid
	String() string
	Loggable() map[string]interface{}
}

// A BasicBlock is a singular block of data in ipfs. It implements the Block interface.
type BasicBlock struct {
	cid  cid.Cid
	data []byte
}

// ipfs/go-ipfs/core/coreapi/block.go

// Put hashes the incoming data to get a CID, wraps it in a block,
// stores it, and optionally pins it.
func (api *BlockAPI) Put(ctx context.Context, src io.Reader, opts ...caopts.BlockPutOption) (coreiface.BlockStat, error) {
	settings, pref, err := caopts.BlockPutOptions(opts...)
	if err != nil {
		return nil, err
	}

	data, err := ioutil.ReadAll(src)
	if err != nil {
		return nil, err
	}

	bcid, err := pref.Sum(data)
	if err != nil {
		return nil, err
	}
	b, err := blocks.NewBlockWithCid(data, bcid)
	if err != nil {
		return nil, err
	}

	if settings.Pin {
		defer api.blockstore.PinLock().Unlock()
	}

	err = api.blocks.AddBlock(b)
	if err != nil {
		return nil, err
	}

	if settings.Pin {
		api.pinning.PinWithMode(b.Cid(), pin.Recursive)
		if err := api.pinning.Flush(); err != nil {
			return nil, err
		}
	}

	return &BlockStat{path: path.IpldPath(b.Cid()), size: len(data)}, nil
}
```
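To make the content-addressing contract concrete, here is a toy in-memory block store (hypothetical, not the real go-ipfs blockstore API, and SHA-256 standing in for CIDs): a block's key is the hash of its bytes, so a reader can always re-hash what it gets back and detect tampering.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// toyBlockstore maps sha256(data) -> data; a stand-in for IPFS's CID-keyed blockstore.
type toyBlockstore struct {
	blocks map[[32]byte][]byte
}

func newToyBlockstore() *toyBlockstore {
	return &toyBlockstore{blocks: make(map[[32]byte][]byte)}
}

// Put stores the data under the hash of its content and returns that hash as the "CID".
func (bs *toyBlockstore) Put(data []byte) [32]byte {
	key := sha256.Sum256(data)
	bs.blocks[key] = data
	return key
}

// Get returns the data and verifies that it still matches its key.
func (bs *toyBlockstore) Get(key [32]byte) ([]byte, error) {
	data, ok := bs.blocks[key]
	if !ok {
		return nil, errors.New("not found")
	}
	if sha256.Sum256(data) != key {
		return nil, errors.New("corrupted block")
	}
	return data, nil
}

func main() {
	bs := newToyBlockstore()
	key := bs.Put([]byte("hello,world"))
	data, _ := bs.Get(key)
	fmt.Printf("%x -> %s\n", key[:4], data)
}
```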
Object
```go
type Link struct {
	Name, Hash string
	Size       uint64
}

type Node struct {
	Links []Link
	Data  string
}
```
```bash
# data is a folder containing two subfolders
$ ipfs add data -r
added QmXkiQUQ5xkC471K29MsL3tKHb9dsFgXernV9hbr4vDP3C data/1/testobject
added QmREfzcCsuHpzUvWUEzgiqHSNpiKnc18s4RfQXqvW669E8 data/2/1.txt
added QmZFYyMVQMXbis5spJkifRtoPoPz9pNo5NyiVSo9UhWTLa data/2/11.pdf
added QmcN88vyVJwfLRreojfT7qdVezShxF1JMCLQBxd3TwL3uz data/1
added QmV14YGVhYS1WyCvmmTnnFzJF4zC587LMCnufjox5r7zHd data/2
added QmdMwuSVU3NKQUy4YgxKWcAm9z6fWXYb6ampbWC442nBgE data
 8.99 MiB / 9.00 MiB [====================================================] 99.96%

# Looking at the links, you can see the root object points to the two subfolders
$ ipfs object links QmdMwuSVU3NKQUy4YgxKWcAm9z6fWXYb6ampbWC442nBgE
QmcN88vyVJwfLRreojfT7qdVezShxF1JMCLQBxd3TwL3uz 82      1
QmV14YGVhYS1WyCvmmTnnFzJF4zC587LMCnufjox5r7zHd 9432929 2
```
DAG
As mentioned above, IPFS has object types such as blob, list, tree, and commit. Take the folder added above as an example.
```bash
$ ipfs dag get QmdMwuSVU3NKQUy4YgxKWcAm9z6fWXYb6ampbWC442nBgE
{"data":"CAE=",
 "links": [
   {"Cid":{"/":"QmcN88vyVJwfLRreojfT7qdVezShxF1JMCLQBxd3TwL3uz"},"Name":"1","Size":82},
   {"Cid":{"/":"QmV14YGVhYS1WyCvmmTnnFzJF4zC587LMCnufjox5r7zHd"},"Name":"2","Size":9432929}
 ]}

$ ipfs dag get QmcN88vyVJwfLRreojfT7qdVezShxF1JMCLQBxd3TwL3uz
{"data":"CAE=",
 "links": [
   {"Cid":{"/":"QmXkiQUQ5xkC471K29MsL3tKHb9dsFgXernV9hbr4vDP3C"},"Name":"testobject","Size":26}
 ]}
```
Code
```go
// Link represents an IPFS Merkle DAG Link between Nodes.
type Link struct {
	// utf string name. should be unique per object
	Name string // utf8

	// cumulative size of target object
	Size uint64

	// multihash of the target object
	Cid cid.Cid
}
```
The DAG is built bottom-up: each node, once generated, becomes a link in the node above it, recursively, until a single root remains.
IPFS supports two DAG layouts: a balanced DAG optimized for random access, and a trickle DAG optimized for sequential access.
```go
var nd ipld.Node
if adder.Trickle {
	nd, err = trickle.Layout(db)
} else {
	nd, err = balanced.Layout(db)
}
```
For example, the balanced layout splits the data recursively into the nodes of the DAG; in the snippet below, the outer loop grows the tree one level at a time, and fillNodeRec fills each new root recursively.
```go
// Each time a DAG of a certain `depth` is filled (because it
// has reached its maximum capacity of `db.Maxlinks()` per node)
// extend it by making it a sub-DAG of a bigger DAG with `depth+1`.
for depth := 1; !db.Done(); depth++ {

	// Add the old `root` as a child of the `newRoot`.
	newRoot := db.NewFSNodeOverDag(ft.TFile)
	newRoot.AddChild(root, fileSize, db)

	// Fill the `newRoot` (that has the old `root` already as child)
	// and make it the current `root` for the next iteration (when
	// it will become "old").
	root, fileSize, err = fillNodeRec(db, newRoot, depth)
	if err != nil {
		return nil, err
	}
}
```
IPNS
```bash
$ ipfs add helloworld
added QmVtZPoeiqpREqkpTTNMzXkUt74SgQA4JYMG8zPjMVULby helloworld
 12 B / ? [-----------------------------------------------------]
$ ipfs name publish QmVtZPoeiqpREqkpTTNMzXkUt74SgQA4JYMG8zPjMVULby
Published to QmdcFVhY4m3NyYfiMbosr3ZbW9EHCkcBk4xySt4uQozXb6: /ipfs/QmVtZPoeiqpREqkpTTNMzXkUt74SgQA4JYMG8zPjMVULby

# Resolve the IPNS name
$ ipfs name resolve QmdcFVhY4m3NyYfiMbosr3ZbW9EHCkcBk4xySt4uQozXb6
/ipfs/QmVtZPoeiqpREqkpTTNMzXkUt74SgQA4JYMG8zPjMVULby

# View the content behind the IPNS name
$ ipfs cat /ipns/QmdcFVhY4m3NyYfiMbosr3ZbW9EHCkcBk4xySt4uQozXb6
hello,world

# Modify the file content and republish
$ echo "ipfs" >> helloworld
$ ipfs add helloworld
added Qmb8sAnXHQYpuAwCTFmHvgz2P5AGJH4y5uVosSdKzcHSbN helloworld
 17 B / 17 B [=====================================================] 100.00%
$ ipfs name publish Qmb8sAnXHQYpuAwCTFmHvgz2P5AGJH4y5uVosSdKzcHSbN
Published to QmdcFVhY4m3NyYfiMbosr3ZbW9EHCkcBk4xySt4uQozXb6: /ipfs/Qmb8sAnXHQYpuAwCTFmHvgz2P5AGJH4y5uVosSdKzcHSbN

# The IPNS name is unchanged but now resolves to the new content
$ ipfs cat /ipns/QmdcFVhY4m3NyYfiMbosr3ZbW9EHCkcBk4xySt4uQozXb6
hello,world
ipfs

# The IPNS name is actually the node's peer ID
$ ipfs id -f="<id>\n"
QmdcFVhY4m3NyYfiMbosr3ZbW9EHCkcBk4xySt4uQozXb6
```
The code implementation is fairly easy to follow:
```go
// Create creates a new IPNS entry and signs it with the given private key.
//
// This function does not embed the public key. If you want to do that, use
// `EmbedPublicKey`.
func Create(sk ic.PrivKey, val []byte, seq uint64, eol time.Time) (*pb.IpnsEntry, error) {
	entry := new(pb.IpnsEntry)

	entry.Value = val
	typ := pb.IpnsEntry_EOL
	entry.ValidityType = &typ
	entry.Sequence = &seq
	entry.Validity = []byte(u.FormatRFC3339(eol))

	// Sign the entry with the private key.
	sig, err := sk.Sign(ipnsEntryDataForSig(entry))
	if err != nil {
		return nil, err
	}
	entry.Signature = sig

	return entry, nil
}

// Validate validates the given IPNS entry against the given public key.
func Validate(pk ic.PubKey, entry *pb.IpnsEntry) error {
	// Check the ipns record signature with the public key
	if ok, err := pk.Verify(ipnsEntryDataForSig(entry), entry.GetSignature()); err != nil || !ok {
		return ErrSignature
	}

	eol, err := GetEOL(entry)
	if err != nil {
		return err
	}
	if time.Now().After(eol) {
		return ErrExpiredRecord
	}
	return nil
}

// EmbedPublicKey embeds the given public key in the given ipns entry. While not
// strictly required, some nodes (e.g., DHT servers) may reject IPNS entries
// that don't embed their public keys as they may not be able to validate them
// efficiently.
func EmbedPublicKey(pk ic.PubKey, entry *pb.IpnsEntry) error {
	// Try extracting the public key from the ID. If we can, *don't* embed it.
	id, err := peer.IDFromPublicKey(pk)
	if err != nil {
		return err
	}
	if _, err := id.ExtractPublicKey(); err != peer.ErrNoPublicKey {
		// Either a *real* error or nil.
		return err
	}

	// We failed to extract the public key from the peer ID, embed it in the record.
	pkBytes, err := pk.Bytes()
	if err != nil {
		return err
	}
	entry.PubKey = pkBytes
	return nil
}
```
Kad Addressing
The ideas behind Kademlia were discussed earlier, so let's look at the core code.
```go
// CommonPrefixLen is the distance function: the length of the common prefix of the XOR.
func CommonPrefixLen(a, b ID) int {
	return ks.ZeroPrefixLen(u.XOR(a, b))
}

// NearestPeers returns a list of the 'count' closest peers to the given ID,
// found by looking in the appropriate bucket.
func (rt *RoutingTable) NearestPeers(id ID, count int) []peer.ID {
	cpl := CommonPrefixLen(id, rt.local)

	// It's assumed that this also protects the buckets.
	rt.tabLock.RLock()

	// Get bucket at cpl index or last bucket
	var bucket *Bucket
	if cpl >= len(rt.Buckets) {
		cpl = len(rt.Buckets) - 1
	}
	bucket = rt.Buckets[cpl]

	pds := peerDistanceSorter{
		peers:  make([]peerDistance, 0, 3*rt.bucketsize),
		target: id,
	}
	pds.appendPeersFromList(bucket.list)
	if pds.Len() < count {
		// In the case of an unusual split, one bucket may be short or empty.
		// if this happens, search both surrounding buckets for nearby peers
		if cpl > 0 {
			pds.appendPeersFromList(rt.Buckets[cpl-1].list)
		}
		if cpl < len(rt.Buckets)-1 {
			pds.appendPeersFromList(rt.Buckets[cpl+1].list)
		}
	}
	rt.tabLock.RUnlock()

	// Sort by distance to local peer
	pds.sort()

	if count < pds.Len() {
		pds.peers = pds.peers[:count]
	}

	out := make([]peer.ID, 0, pds.Len())
	for _, p := range pds.peers {
		out = append(out, p.p)
	}
	return out
}

// Update adds or moves the given peer to the front of its respective bucket
func (rt *RoutingTable) Update(p peer.ID) (evicted peer.ID, err error) {
	peerID := ConvertPeerID(p)
	cpl := CommonPrefixLen(peerID, rt.local)

	rt.tabLock.Lock()
	defer rt.tabLock.Unlock()

	bucketID := cpl
	if bucketID >= len(rt.Buckets) {
		bucketID = len(rt.Buckets) - 1
	}

	bucket := rt.Buckets[bucketID]
	if bucket.Has(p) {
		// If the peer is already in the table, move it to the front.
		// This signifies that it is "more active" and the less active nodes
		// will as a result tend towards the back of the list.
		bucket.MoveToFront(p)
		return "", nil
	}

	// Enforce the latency requirement.
	if rt.metrics.LatencyEWMA(p) > rt.maxLatency {
		// Connection doesn't meet requirements, skip!
		return "", ErrPeerRejectedHighLatency
	}

	// We have enough space in the bucket (whether spawned or grouped).
	if bucket.Len() < rt.bucketsize {
		bucket.PushFront(p)
		rt.PeerAdded(p)
		return "", nil
	}

	if bucketID == len(rt.Buckets)-1 {
		// if the bucket is too large and this is the last bucket (i.e. wildcard), unfold it.
		rt.nextBucket()
		// the structure of the table has changed, so let's recheck if the peer now has a dedicated bucket.
		bucketID = cpl
		if bucketID >= len(rt.Buckets) {
			bucketID = len(rt.Buckets) - 1
		}
		bucket = rt.Buckets[bucketID]
		if bucket.Len() >= rt.bucketsize {
			// if after all the unfolding, we're unable to find room for this peer, scrap it.
			return "", ErrPeerRejectedNoCapacity
		}
		bucket.PushFront(p)
		rt.PeerAdded(p)
		return "", nil
	}

	return "", ErrPeerRejectedNoCapacity
}
```
Bitswap
Since IPFS is in effect one global torrent swarm, it needs strategies to keep peers from free-riding too heavily. That is what Bitswap does: each node maintains a ledger per peer recording what has been sent and received (together with want lists and have lists), and a decision engine (the engine package under bitswap) decides whom to serve.
The code is fairly complex, with a lot of asynchronous logic, so only the overall flow is described here rather than pasting the code; we can walk through the code separately.
The logic is roughly as follows: when a node wants a set of blocks, it first checks its local store. If the blocks are not there, it creates a Bitswap session and subscribes to those blocks in Bitswap's pub/sub mechanism (which delivers blocks over a channel). The CIDs of the blocks are added to the wantlist and Bitswap's workers start running; much of what follows happens asynchronously. FindProvidersAsync on the libp2p content-routing layer is called periodically to discover nodes that have the blocks; Bitswap keeps pulling provider IDs from that lookup and sends them requests. When a block comes back, its integrity is checked against its hash, the block is published on the pub/sub channel, and the upper-layer logic waiting on the other end of the channel receives the result.
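Since the real code is not pasted here, the sketch below illustrates just the ledger idea with hypothetical types (not the go-bitswap API): each peer's ledger tracks bytes sent and received, and the resulting debt ratio can be used to decide whose wants to serve first.

```go
package main

import "fmt"

// ledger tracks how much data we have exchanged with one peer.
type ledger struct {
	peer      string
	bytesSent uint64
	bytesRecv uint64
	wantlist  map[string]bool // CIDs this peer has asked us for
}

// debtRatio grows when we have sent a peer much more than it has sent us;
// a decision engine can deprioritize peers with a high ratio.
func (l *ledger) debtRatio() float64 {
	return float64(l.bytesSent) / float64(l.bytesRecv+1)
}

// nextPeerToServe picks, among peers that still want something,
// the one with the lowest debt ratio (the most "deserving" peer).
func nextPeerToServe(ledgers []*ledger) *ledger {
	var best *ledger
	for _, l := range ledgers {
		if len(l.wantlist) == 0 {
			continue
		}
		if best == nil || l.debtRatio() < best.debtRatio() {
			best = l
		}
	}
	return best
}

func main() {
	a := &ledger{peer: "peerA", bytesSent: 10 << 20, bytesRecv: 1 << 20, wantlist: map[string]bool{"cidA": true}}
	b := &ledger{peer: "peerB", bytesSent: 1 << 20, bytesRecv: 8 << 20, wantlist: map[string]bool{"cidB": true}}
	fmt.Println(nextPeerToServe([]*ledger{a, b}).peer) // peerB: it has given us more than it took
}
```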
Beyond IPFS: Blockchain Storage
There is a problem IPFS alone cannot solve: a small number of nodes contribute resources purely out of goodwill while most nodes free-ride. Without economic incentives, the Bitswap mechanism by itself is not enough.
In the pre-blockchain era, P2P storage could only be used for file sharing, plus the CDNs some video companies quietly built into their clients (note that such video sites consume your upload bandwidth without paying you for it, while you still pay to watch).
Thus Protocol Labs proposed the Filecoin project, which aims to let all participants profit from contributing storage space and bandwidth, and to provide users with safer services at lower prices.
Blockchain storage is one of the easiest blockchain applications to land, because what is stored and distributed is itself digital and can be verified mathematically and cryptographically, whereas blockchain projects that must anchor to the real economy struggle with that interaction.
| | IPFS | Filecoin |
|---|---|---|
| Category | Decentralized file system (no incentive layer) | Decentralized storage network |
| Function | Content-addressed distributed storage infrastructure | Incentive layer on top of the IPFS network, providing a free market for cloud storage |
| Storage permissions | Only on IPFS nodes you own | 1. On IPFS nodes you own 2. Also on storage providers' nodes, by paying them |
| Read permissions | Everyone (as long as they know the CID) | Everyone (as long as they know the CID) |
Two Markets
Storage Market
Essentially a decentralized cloud storage: users pay miners with Filecoin tokens to rent storage space, miners keep submitting storage proofs, and storage space is what is bought and sold in this market.
Retrieval Market
It can be understood as a decentralized CDN: users retrieve data from miners and pay with tokens, so what is bought and sold here is bandwidth and traffic.
Storage Proofs
How do you prove that a miner stores the data?
How do you prove that a miner stores redundant (replicated) copies of the data?
How do you prove that a miner keeps storing the data over time?
How is blockchain consensus reached on top of this?
Basic storage proof schemes (see PPT)
1. Store a hash of each data slice locally.
2. Store only the Merkle tree root locally, and hand the file slices together with the Merkle tree to the storage provider (see the sketch after this list).
3. Use a homomorphic hash function to generate a homomorphic hash of each data slice and send it to the storage provider.
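A minimal sketch of option 2, assuming a toy power-of-two Merkle tree over SHA-256 and hypothetical helper names (not Filecoin's actual proof format): the user keeps only the root, the storage provider answers a challenge for slice i with the slice plus its sibling path, and the user recomputes the root to check it.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func hashPair(l, r []byte) []byte {
	h := sha256.Sum256(append(append([]byte{}, l...), r...))
	return h[:]
}

// buildTree returns all levels of a Merkle tree; the number of slices must be a power of two.
func buildTree(slices [][]byte) [][][]byte {
	level := make([][]byte, len(slices))
	for i, s := range slices {
		h := sha256.Sum256(s)
		level[i] = h[:]
	}
	levels := [][][]byte{level}
	for len(level) > 1 {
		next := make([][]byte, len(level)/2)
		for i := 0; i < len(level); i += 2 {
			next[i/2] = hashPair(level[i], level[i+1])
		}
		levels = append(levels, next)
		level = next
	}
	return levels
}

// proveSlice returns the sibling hashes on the path from leaf i to the root.
func proveSlice(levels [][][]byte, i int) [][]byte {
	var proof [][]byte
	for _, level := range levels[:len(levels)-1] {
		proof = append(proof, level[i^1]) // sibling index
		i /= 2
	}
	return proof
}

// verifySlice recomputes the root from the challenged slice and its proof.
func verifySlice(root, slice []byte, i int, proof [][]byte) bool {
	h := sha256.Sum256(slice)
	cur := h[:]
	for _, sib := range proof {
		if i%2 == 0 {
			cur = hashPair(cur, sib)
		} else {
			cur = hashPair(sib, cur)
		}
		i /= 2
	}
	return bytes.Equal(cur, root)
}

func main() {
	slices := [][]byte{[]byte("s0"), []byte("s1"), []byte("s2"), []byte("s3")}
	levels := buildTree(slices)
	root := levels[len(levels)-1][0]

	// The user challenges slice 2; the provider answers with the slice and its path.
	proof := proveSlice(levels, 2)
	fmt.Println(verifySlice(root, []byte("s2"), 2, proof)) // true
}
```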
Why don't the three schemes above work for Filecoin?
Consider several dimensions: communication complexity, computation complexity, and whether the scheme resists the three attacks (Sybil attacks, generation attacks, and outsourcing attacks). Also consider whether the user and the storage provider can collude: with homomorphic hashes, for example, a storage provider could fabricate data at will and generate valid proofs in order to grab block rewards.
Filecoin instead has the storage provider encode data sector by sector (a sector is the unit of storage space). Using a zigzag depth-robust graph, the encoding of each block within a sector depends on several other blocks at the same time; a verifiable delay encoding (VDE) is then applied layer by layer, alternating direction (left to right, then right to left), which is where the name zigzag comes from. If a provider deletes one block, regenerating it requires many other blocks, so the cost of a generation attack becomes high. Finally, zk-SNARKs are used to produce a succinct zero-knowledge proof of the replication.
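Purely to illustrate the layering idea, here is a toy sketch (not Filecoin's actual construction; in particular the real encoding is invertible so the data can be decoded back, whereas a hash is used here only to make the dependency structure visible): each layer re-encodes every block as a slow function of its previous value and a parent block, and the direction of the parent alternates between layers, so regenerating a deleted block means re-running long dependency chains.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// slowEncode stands in for a verifiable delay encoding: deliberately sequential,
// so a cheating prover cannot regenerate deleted blocks on the fly for free.
func slowEncode(parts ...[]byte) []byte {
	var buf []byte
	for _, p := range parts {
		buf = append(buf, p...)
	}
	h := sha256.Sum256(buf)
	for i := 0; i < 1000; i++ { // iterate to simulate the "delay"
		h = sha256.Sum256(h[:])
	}
	return h[:]
}

// encodeLayer produces the next layer. In even layers each block depends on a
// block to its left, in odd layers on a block to its right ("zigzag"), so every
// block ends up entangled with many others across layers.
func encodeLayer(prev [][]byte, layer int) [][]byte {
	n := len(prev)
	next := make([][]byte, n)
	for i := 0; i < n; i++ {
		parent := i - 1
		if layer%2 == 1 {
			parent = i + 1
		}
		if parent < 0 || parent >= n {
			next[i] = slowEncode(prev[i])
			continue
		}
		next[i] = slowEncode(prev[i], prev[parent])
	}
	return next
}

func main() {
	sector := [][]byte{[]byte("b0"), []byte("b1"), []byte("b2"), []byte("b3")}
	for layer := 0; layer < 4; layer++ {
		sector = encodeLayer(sector, layer)
	}
	fmt.Printf("final encoding of block 0: %x...\n", sector[0][:4])
}
```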
Repeating the proof of replication over time yields the proof of spacetime, so proof-of-spacetime and proof-of-replication work together.
However, a proof of spacetime only proves storage; it does not by itself decide who produces the next block, so Filecoin uses a VRF-style consensus algorithm for leader election (the original white paper proposed PoW).
The consensus uses a VRF-style secret leader election: a miner computes a hash over an input and checks whether it is below a target, but the target differs per storage miner and is proportional to the miner's share of the network's total storage power. The more storage proofs a miner has generated, the more likely it is to win the election. In the early days, when demand for Filecoin storage was low, miners could therefore stuff their own data to increase their chance of mining a block.
$$ \frac{H(\langle t \,\|\, \mathrm{rand}(t) \rangle_{M_i})}{2^L} \leq \frac{p_i^t}{\sum_j p_j^t} $$
H: a hash function.
rand(t): a random number at time t; a sequence of consecutive random numbers can be generated with a VDF.
$\langle \cdot \rangle_{M_i}$: the expression signed by miner $M_i$.
$2^L$: normalizes the hash so the left-hand side falls between 0 and 1.
The right-hand side is the miner's storage power as a fraction of the total network power: the larger the share, the higher the chance of winning, which makes this a form of proof of stake.
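A minimal sketch of that inequality (a hypothetical helper, not the real IsProofAWinner from the Filecoin spec): treat the hash of the signed ticket as a fraction of its maximum value and compare it against the miner's power share, using integer cross-multiplication to avoid floating point.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// isWinner checks H(ticket)/2^L <= minerPower/totalPower by cross-multiplying:
// H(ticket) * totalPower <= minerPower * 2^L.
func isWinner(signedTicket []byte, minerPower, totalPower uint64) bool {
	h := sha256.Sum256(signedTicket)
	hv := new(big.Int).SetBytes(h[:])

	lhs := new(big.Int).Mul(hv, new(big.Int).SetUint64(totalPower))

	maxH := new(big.Int).Lsh(big.NewInt(1), 256) // 2^L with L = 256 for SHA-256
	rhs := new(big.Int).Mul(maxH, new(big.Int).SetUint64(minerPower))

	return lhs.Cmp(rhs) <= 0
}

func main() {
	ticket := []byte("signed election ticket for round t") // stands in for <t || rand(t)> signed by the miner
	// A miner holding 30% of the network's storage power wins roughly 30% of the rounds.
	fmt.Println(isWinner(ticket, 30, 100))
}
```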
With the details abstracted away, the mining loop looks like this:
```go
func Mine(minerKey PrivateKey) {
	for r := range rounds { // for each round
		bestTipset, tickets := ChainTipsMgr.GetBestTipsetAtRound(r - 1)

		ticket := GenerateTicket(minerKey, bestTipset)
		tickets.Append(ticket)

		// Generate an ElectionProof and check if we win
		// Note: even if we don't win, we want to make a ticket
		// in case we need to mine a null round
		win, proof := CheckIfWinnerAtRound(minerKey, r, bestTipset)
		if win {
			GenerateAndBroadcastBlock(bestTipset, tickets, proof)
		} else {
			// Even if we don't win, add our ticket to the tracker in
			// case we need to mine on it later.
			ChainTipsMgr.AddFailedTicket(bestTipset, tickets)
		}
	}
}
```
Checking whether you are elected in a given round:
```go
const RandomnessLookback = 1 // Also referred to as "K" in many places

func CheckIfWinnerAtRound(key PrivateKey, n Integer, parentTipset Tipset) (bool, ElectionProof) {
	lbt := ChainTipsMgr.TicketFromRound(parentTipset, n-RandomnessLookback)

	eproof := ComputeElectionProof(lbt, key)

	minerPower := GetPower(miner)
	totalPower := state.GetTotalPower()

	if IsProofAWinner(eproof, minerPower, totalPower) {
		return true, eproof
	}
	return false, nil
}
```
Question: because the leader election is secret, this is an "expected consensus": in a given round there may be no winner at all, or several winners (in which case multiple blocks are packed into a tipset). The design is still not perfect.