‽
|
So I'm running a Wiki, and a lot of the information-collection obviously relies on quoting other websites. But websites can go down, and the Wayback Machine, Google's cache and such aren't particularly reliable or thorough. So what I'm looking for is a way to create such a mirror myself, of a particular section of a webpage relevant to an assertion on the Wiki. However, this poses a few problems of its own:
1) Copyright. I'm a bit at a loss on this. For example, DuggMirror merely says "Mirrored by [..] at [..]", with links to the original story and such. Similarly, Google's cache just says "This is G o o g l e's cache of [..] as retrieved on 17 Oct 2006 09:48:00 GMT. [..]'s cache is the snapshot that we took of the page as we crawled the web. The page may have changed since that time. Google is neither affiliated with the authors of this page nor responsible for its content." And Coral CDN prefixes absolutely nothing. Is there no need? Does mirroring something verbatim, leaving copyright notices and such intact, constitute no such problem at all?
2) Format. WebKit's preferred archive format is .webarchive, which is a binary property list concatenating all related files, which is sorta neat but incompatible with virtually every non-WebKit browser out there, so it's not very useful. Likewise, IE has a similar format, .mht (a faux e-mail using MIME multi-part concatenation; clever, but still unsupported by most other browsers). Gecko uses a bunch of folders and such, which is probably the best solution, aside from requiring multiple files. There are also, of course, third-party tools that make downloading such a page easier. But what I'd kind of prefer is some web tool that handles this, essentially offering the user, say, a zip archive of the result.
Any such thing? Better ideas?
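A minimal sketch of the "zip archive of the result" idea, in Python. It assumes the page and its assets have already been downloaded by some other tool; the function name `bundle_snapshot`, the file names inside the archive, and the wording of the notice are all hypothetical, loosely modelled on the Google cache banner quoted above. It is not how any of the tools mentioned in this post work.

```python
import io
import zipfile
from datetime import datetime, timezone


def bundle_snapshot(page_url, html, assets, retrieved_at):
    """Return the bytes of a zip archive holding one page snapshot.

    page_url     -- address the page was fetched from (recorded in the notice)
    html         -- the page markup, as str or bytes
    assets       -- dict mapping a file name to its bytes (images, CSS, ...)
    retrieved_at -- datetime of the fetch, assumed to be in UTC
    """
    # Hypothetical attribution notice, in the spirit of a cache banner.
    notice = (
        f"Mirror of {page_url}\n"
        f"Retrieved on {retrieved_at.strftime('%d %b %Y %H:%M:%S')} GMT\n"
        "The page may have changed since that time.\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("MIRROR-NOTICE.txt", notice)
        archive.writestr("index.html", html)
        for name, data in assets.items():
            archive.writestr(f"assets/{name}", data)
    return buffer.getvalue()


snapshot = bundle_snapshot(
    "http://example.com/article",
    "<html><body>Quoted section</body></html>",
    {"style.css": b"body { margin: 0; }"},
    datetime(2006, 10, 17, 9, 48, 0, tzinfo=timezone.utc),
)
```

Keeping everything in one zip avoids the browser-specific formats, at the cost of the reader having to unpack it before viewing.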
‽
|
Hi, chucker, I found something!
Oh, cool, what is it?
It's called hanzo:web (I have no idea what the colon is doing there either), and it uses Heritrix, the same tool also used by the Wayback Machine, with the differences that its archiving is on-demand, and can be enhanced with tags and such.
Hm, that sounds good enough. Any responses about it on the web?
Well, there's this jackass who apparently thinks that the point of putting something on the web is to keep it for yourself.
Uh, that doesn't make any sense. Anyone else?
Well, Wikipedia lists three different solutions in their link rot article, and this seems to be the only fitting one.
Huh. So there isn't all that much feedback?
Curiously enough, no. It doesn't even have its own Wikipedia page.
And no known cases of Wiki-esque sites using it, either?
Nope, nothing.
Time to give it a try, then?
Yup!
Magnificent Basturd™
Join Date: May 2004
Location: Atlanta
|
I love Chucksterpiece Theatre.
As far as citation, Google's seems the most CYA of the bunch. As far as that guy being a Jackass, I see his concern for someone scraping and reusing his content without his permission. I think we're all guilty of doing that with funny pictures we find on the web.... but for someone to use YOUR content on THEIR page and to make money from it is galling. The idea of preserving a "snapshot" of the web is a good one - the trick is how you achieve it without stealing intellectual property for-profit. Interesting topic. I wonder what Chucker has to say about it?
‽
|
Quote:
Not that anyone has an obligation to answer. But I'd really appreciate even the tiniest bits of help. Yeah, but they don't keep snapshots, and they occasionally break a page's layout to the point where it becomes unusable. (Google's markup is extraordinarily horrifying, too.)
Quote:
As for whether hanzo:web honors the no-archive meta tag, I'm unsure about that; their FAQ needs some work. The blogger addresses that hanzo:web ignores robots.txt; indeed it does. As the terms state: Quote:
WebCite's FAQ has a section on this intellectual property matter.
Last edited by chucker : 2006-10-21 at 08:43.
Magnificent Basturd™
Join Date: May 2004
Location: Atlanta
|
Quote:
Dig? Edit: just saw the link to WebCite.... reading now. |
‽
|
Quote:
(I guess you could argue that it's a question of trust, but a company that appears at an O'Reilly conference is certainly very unlikely to risk such unethical practices. They'd be flamed by geeks around the world like there's no tomorrow.) |
‽
|
I suppose the big flaw, at least right now, is the backlog they appear to have, as they admit on their news section.
I hope it doesn't take all too long. |
New Member
Join Date: Oct 2006
|
Quote:
What part of copyright eludes you? Whether it's reposted for profit or not, it's MINE, and has NOARCHIVE and NOCACHE on every page and only 5 search engines are even allowed to crawl the site and they aren't allowed to cache the pages either, nor is the Internet Archive. Just because I put something online publicly doesn't give anyone entitlement to violate my copyright and copy my material to any other website without permission, plain and simple. It's posted for people to read, and I assume a file might be saved or printed for personal use, but reposting on another website without permission will earn you a DMCA notice. Besides, since when was asking permission for using something so hard? I've been known to grant permission when asked and I've been known to send a lawyer when taken without asking a few times as well. If a website says their content is free to use with a GPL like the Wiki or the article farms, then feel free to do with as you want. If the website says "Copyright (c), All Rights Reserved", then expect trouble if you violate it. That's simple now isn't it? |
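For readers wondering how an archiving tool could notice the NOARCHIVE tag mentioned above, here is a rough Python sketch. It only looks for a `noarchive` directive in a `<meta name="robots">` tag; the class and function names are made up for illustration, and nothing here describes what hanzo:web or any other crawler actually does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute names arrive lowercased; values keep their original case.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def may_archive(html):
    """Return False when the page carries a robots meta tag with 'noarchive'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


tagged_page = '<html><head><meta name="ROBOTS" content="NOARCHIVE, NOINDEX"></head></html>'
plain_page = "<html><head><title>No directives</title></head></html>"
print(may_archive(tagged_page))  # False
print(may_archive(plain_page))   # True
```

Whether a given tool runs a check like this before storing a copy is exactly the open question in this thread; the tag is only a request.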
‽
|
You registered here just so you could respond to some obscure forum post?
Quote:
The whole point of running a website (or really any other kind of publication) is to share information, i.e. to partially surrender the notion of "it's MINE". Quote:
What are you going to do if I save a webpage of yours, burn it on CD and distribute it at a parking lot? How about, um, be happy that I've found your information so valuable that I went through the hurdles of distributing it? Quote:
Quote:
Formerly Roboman, still
Join Date: Jul 2004
Location: on twitter! @werejack
|
Methinks IncrediBILL should probably focus on having something to say worth stealing before he worries about people stealing it.
New Member
Join Date: Oct 2006
|
When did I say GPL'ing relinquished copyright? I said if the website offers unlimited usage freely for any purpose, then go for it. If they don't, you're just playing with fire.
Regardless of the location of the host, many of the search engines indexing that content ARE located in the US, such as Google, and they have to comply with a DMCA request. Furthermore, most countries have reciprocal copyright agreements with the US and will comply with requests to remove unauthorized material. Quote:
Try taking a book or magazine, which shares information, from a bookstore without paying for it and see if you don't end up in jail for shoplifting. Try plagiarizing a published book and slapping your own name on it and republishing it as your own and see what happens, which is nothing different than lifting content from a website and republishing it. The point I made which you overlooked is PERMISSION, asking PERMISSION can be an amazing thing and keep people out of court. Last edited by IncrediBILL : 2006-10-22 at 02:15. |
Not a tame lion...
|
How very interesting...
Obviously the question of attribution is a no-brainer: you can't take someone else's work and call it your own. But what about properly attributed reproduction without permission? Technically, the information is re-copied every time it hits a router or switch on its way to the consumer, and I doubt the owners of that hardware have personally asked anyone's permission before reproducing their protected material.
EDIT: Thanks for that link to the Google case, Chucker; it answered a couple of questions.
Formerly Roboman, still
Join Date: Jul 2004
Location: on twitter! @werejack
|
Wouldn't suing people who mirrored your website to make it more accessible defeat the purpose of putting it on the internet in the first place?
Formerly Roboman, still
Join Date: Jul 2004
Location: on twitter! @werejack
|
I know. But I guess I'm arguing less of a "Can you sue mirror-ers?" debate and more of a "Why the hell would you want to?" one.
If any of my sites had so much traffic that I needed people to mirror it, I would thank my lucky stars. And I'd thank the mirror-ers, too. I wouldn't sue them.
cue the lights and dim the stars
BANNED
I am worthless beyond hope. Join Date: Jul 2004
Location: Washington, DC
|
I'm pretty sure it's been said already, but nobody here has advocated claiming the mirror as their own website.
is the next Chiquita
Join Date: Feb 2005
|
OT
Observe inverse relationship between a perception of greatness in one's screenname and one's post quality. /OT |
BANNED
I am worthless beyond hope. Join Date: Jul 2004
Location: Washington, DC
|
Okay. Comparing this to purchasing books is not a valid comparison. Purchasing books means that someone pays money. No money changes hands to view IncrediBILL's website, so no money is being lost if it were hosted somewhere else.
This is more like handing out free pamphlets, then somebody coming along and publishing the exact same pamphlet, credits and all, and also distributing it for free, all for the purpose of preserving its information and spreading the word. I think it would at least be common courtesy to ask for permission. What if the original author can't be reached, though? |
BANNED
I am worthless beyond hope. Join Date: Jul 2004
Location: Washington, DC
|
Banner ads? I forgot that they even existed...
I take it that those ads trace their clicks to the server that's hosting the site? |
Member
Join Date: Sep 2006
Location: Washington, D.C.
|
Quote:
I understand there are exceptions for caches, search engines, etc., but I'm responding more to the assertion that people give away their exclusive distribution rights altogether when they put something up for free on the web. It's just not true. |
‽
|
Quote:
Fine, call it common courtesy, but if I were to ask someone for permission if I could help their campaign's cause by distributing the free pamphlets, using my own resources (by printing them and handing them out, which costs money and time), how could they possibly have any other answer than "um, yeah? Go ahead, duh?" Quote:
Quote:
‽
|
Quote:
1) ads being presented 2) users actually seeing (not filtering) the ads 3) users actually responding (clicking) the ads (because, otherwise, the ads couldn't possibly be a sustainable model!), then the site can't possibly survive in this day and age anyway. Even the most mainstream of browsers has a popup filter, thereby effectively removing a vast majority of ads. For virtually every browser, there are ad filters. Not to mention browser-unrelated filters, as well as those provided through channels such as the ISP or the router. If a site is that complex, one should use a subscription-based model, with guaranteed and honest (the users actually care to give the site money, rather than merely tolerating and trying hard to ignore the ads) money coming in. Others' finances aren't my business, of course, but ads have never, ever been something that can single-handedly be relied upon. People ignore, people filter, people are annoyed. There are no people that actually think "hey, an ad! cool, I will support the site by clicking on it, and actually be interested in the advertised product as well!". It doesn't happen. At best, an ad is funny and intriguing and thus worth watching, but that doesn't necessitate interest in the actual product.
Quote:
New Member
Join Date: Oct 2006
|
Quote:
Quote:
User-agent: *
Disallow: /
That means I don't even need to know Hanzo exists as I've already stated that my content is off limits.
Quote:
Quote:
Archiving has caused all sorts of issues where people have been sued to remove trademarks from their website after getting a demand letter from a corporation, yet still being sued after complying because the trademarks still existed in their content on the Internet Archive and Google cache as you were technically still infringing although removing the material in a timely manner was beyond your ability to control.
The bottom line is still the fact that other than my website, nobody has the right to distribute the material without consent unless the website states otherwise and limited access to some companies is stated in robots.txt. Companies or anyone else that deploys automated tools that ignore robots.txt deserve to get whatever pain may come their way.
Beyond copyright violation and blatant disregard for robots.txt, they are also violating my site usage license which clearly states that automation may not be used to collect and redistribute the information so I don't even need to prove copyright infringement. I can just produce a log file showing a few thousand pages were pulled down in a few minutes, something a human at a browser can't do, to prove that the site license was violated, a simple violation of terms and conditions and general purpose contract law to the rescue.
Last edited by IncrediBILL : 2006-10-22 at 20:23.
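The robots.txt rule quoted in this post (`User-agent: *` with `Disallow: /`) can be checked mechanically. A small sketch using Python's standard `urllib.robotparser` follows; the helper name `may_crawl`, the user-agent string and the example URL are invented for illustration, and the sketch says nothing about how any particular archiver behaves.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The blanket rule quoted in the post: every crawler, every path.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# For contrast: an empty Disallow value places no restriction at all.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

print(may_crawl(BLOCK_ALL, "ExampleArchiver", "http://example.com/page.html"))  # False
print(may_crawl(ALLOW_ALL, "ExampleArchiver", "http://example.com/page.html"))  # True
```

A crawler that honors robots.txt would run a check like this before each fetch; one that ignores the file simply never asks, which is the behavior being complained about here.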
Veteran Member
Join Date: May 2004
Location: Tejas
|
So, IncrediBILL, why don't you want anyone archiving your site? What do you gain from not allowing that?
‽
|
Quote:
Quote:
Quote:
Anyway, let me know when you can give me a good case where you have a website that you want publicly available, yet not archived. For there isn't one. It's completely absurd. |
New Member
Join Date: Oct 2006
|
Quote:
What do they hope to gain? Actually I think I addressed one part, the legal ramifications, in my last post as people have sued over content that was corrected on their website yet remained visible in the Wayback Machine and Google cache. If you want to archive it on your personal computer for your personal use, I can deal with that. If you publish that archive or make it otherwise publicly accessible, then we have issues as once you lose control over your copyright, it's harder to defend it down the road. Try some of your arguments out archiving some website like Corbis or Getty image banks and then make it publicly accessible. Sit back and see how long it takes before a SWAT team of lawyers owns whoever does it. Basically, it's my intellectual property to do with as I please and making it freely available on the internet doesn't give anyone else the authority to reproduce it, even in a free archive or mirror site. Is that so hard to comprehend? Perhaps those of you that haven't invested many years developing an online property don't get it but you can rest assured many others that invested time on their intellectual property feel the same. If I want the site publicly archived and mirrored, I'll personally submit it to be archived. If I don't want it publicly archived, then respect my decision as it's not anyone else's decision to make but mine. How simple is that?
New Member
Join Date: Oct 2006
|
Quote:
You can say whatever you want but it doesn't make your point valid as there has never been any law that makes intellectual property theft valid, for any reason, ever, and thinking otherwise is absurd. FWIW, Hanzo web was blocked in my firewall the minute I figured out they ignored robots.txt. Problem solved. |
Mother Father Gentleman
Join Date: Oct 2005
Location: Xenia, Ohio
|
Did you overdose on Viagra or something? Because you seem to be a big dick.