Comments on research!rsc: Backups, heal thyself
Comments feed for the post by rsc, last updated 2008-04-15T00:20:43-04:00.

jricher, 2008-04-15 00:20 (-04:00):
I understand the birthday problem, but it could easily be solved by moving from SHA-1 to SHA-256. This would be less space-efficient when storing the trees, but the trees are not going to be the majority of your storage cost anyway. Doing this would make the chance of a collision insignificant for even the largest datasets.

As far as the overhead of using cryptographically strong hashes is concerned, I'm a lot more concerned about data integrity than about a slightly slower backup, and it's always possible to move the hash calculation into hardware if that becomes worthwhile.

Karthik, 2008-04-14 21:51 (-04:00):
EMC's Centera uses the same concept, though with MD5 hashes.

andrew cooke, 2008-04-14 17:50 (-04:00):
The 2^-160 figure is a bit misleading. As the first paper points out (search for "birthday"), you're going to start getting collisions when you have around 2^80 blocks. That's still a lot of blocks, but *significantly* fewer than the 2^160 you might infer from what you wrote.

francois.beausoleil, 2008-04-14 15:34 (-04:00):
The Git version control system also stores content as blobs, indexed by SHA-1 hashes.

Craig, 2008-04-14 14:16 (-04:00):
That's also how the StarTeam Native-II file repository works. It stores full images of each version, compressed and indexed by MD5 hashes, so there is no reverse-delta penalty for diffing back to earlier versions of files.
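Andrew Cooke's 2^80 figure comes from the birthday approximation p ≈ 1 − exp(−n(n−1)/2^(b+1)) for n blocks and a b-bit hash. The short Python sketch below works through that arithmetic, and the same numbers show why jricher's switch to SHA-256 helps. The function names are illustrative. The sketch assumes hash outputs behave like uniformly random values, so it says nothing about collisions that someone constructs deliberately.

```python
import math

def collision_probability(n_blocks: float, hash_bits: int) -> float:
    """Birthday approximation: chance that any two of n_blocks uniformly
    random hash_bits-bit values are equal, p ~= 1 - exp(-n(n-1) / 2^(b+1))."""
    exponent = -n_blocks * (n_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny (e.g. 256-bit hashes)
    return -math.expm1(exponent)

def blocks_for_probability(p: float, hash_bits: int) -> float:
    """Inverse of the approximation above: n ~= sqrt(2^(b+1) * ln(1/(1-p)))."""
    return math.sqrt(2.0 ** (hash_bits + 1) * -math.log1p(-p))

# Compare a SHA-1-sized (160-bit) hash with a SHA-256-sized one at 2^80 blocks.
for bits in (160, 256):
    p = collision_probability(2**80, bits)
    print(f"{bits}-bit hash, 2^80 blocks: p = {p:.3g}")

# Block count at which a 160-bit hash reaches even odds of a collision.
print(f"blocks for a 50% chance with 160 bits: {blocks_for_probability(0.5, 160):.3g}")
```

With 2^80 blocks, a 160-bit hash gives roughly a 39% chance of at least one collision. A 256-bit hash at the same block count gives about 6 × 10^-30. Even odds for 160 bits arrive at about 1.18 × 2^80 blocks.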
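The content-addressed idea that francois.beausoleil and Craig describe can be made concrete. Git names a blob by the SHA-1 of a short header followed by the file's bytes. The header is the word "blob", a space, the size in decimal and a NUL byte. The sketch below computes that ID with Python's hashlib. It then wraps the ID in a store that keeps each distinct content only once, in compressed form. On every read, the store re-hashes the content so that a corrupted object no longer matches its name. `BlobStore` is a hypothetical, in-memory illustration and not Git's storage format. For example, Git's loose objects compress the header together with the content.

```python
import hashlib
import zlib

def git_blob_id(content: bytes) -> str:
    """Git-style blob ID: SHA-1 over b'blob <size>\\0' followed by the raw bytes."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

class BlobStore:
    """Hypothetical in-memory content-addressed store (illustration only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}  # blob ID -> zlib-compressed content

    def put(self, content: bytes) -> str:
        key = git_blob_id(content)
        # Identical content always maps to the same key, so it is stored once.
        self._objects.setdefault(key, zlib.compress(content))
        return key

    def get(self, key: str) -> bytes:
        content = zlib.decompress(self._objects[key])
        # Re-hash on read: damaged data no longer matches the name it was stored under.
        if git_blob_id(content) != key:
            raise ValueError(f"object {key} is corrupt")
        return content

    def __len__(self) -> int:
        return len(self._objects)

store = BlobStore()
key = store.put(b"hello\n")
store.put(b"hello\n")  # duplicate content: no new object is stored
print(key, len(store))
```

For the bytes "hello" plus a newline, the ID is ce013625030ba8dba906f756967f9e9ca394464a. That matches what `git hash-object` reports for a file with that content.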