AppleNova Forums » General Discussion »

Mirroring websites at risk of dying


chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-20, 09:46

So I'm running a Wiki, and a lot of the information-collection obviously relies on quoting other websites. But websites can go down, and WayBackMachine, Google's cache and such aren't particularly reliable or thorough. So what I'm looking at is a way to create such a mirror myself, of a particular section of a webpage relevant to an assertion on the Wiki. However, this poses a few problems on its own:

1) Copyright. I'm a bit at a loss on this. For example, DuggMirror merely says "Mirrored by [..] at [..]", with links to the original story and such. Similarly, Google's cache just says "This is G o o g l e's cache of [..] as retrieved on 17 Oct 2006 09:48:00 GMT.
[..]'s cache is the snapshot that we took of the page as we crawled the web. The page may have changed since that time. Google is neither affiliated with the authors of this page nor responsible for its content." And Coral CDN prefixes absolutely nothing. Is there no need? Does mirroring something verbatim, leaving copyright notices and such intact, constitute no such problem at all?

2) Format. WebKit's preferred archive format is .webarchive, which is a binary property list concatenating all related files, which is sorta neat but incompatible with virtually every non-WebKit browser out there, so it's not very useful. Likewise, IE has a similar format, .mht (a faux e-mail using MIME multi-part concatenation; clever, but still unsupported by most other browsers). Gecko saves an HTML file plus a folder of related files, which is probably the best solution, aside from requiring multiple files. There are also, of course, third-party tools that make downloading such a page easier. But what I'd kind of prefer is some web tool that handles this, essentially offering the user, say, a zip archive of the result. Any such thing? Better ideas?
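For what it's worth, the "zip archive of the result" idea can be sketched in a few lines of Python. This is only a rough illustration, not an existing tool: `ResourceCollector`, `snapshot_to_zip` and the `fetch` callable are hypothetical names, and the sketch only follows images, scripts and stylesheets without rewriting the links inside the saved HTML.

```python
import io
import zipfile
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of images, scripts and stylesheets on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script"):
            self._add(attrs.get("src"))
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            self._add(attrs.get("href"))

    def _add(self, ref):
        if ref:
            url = urljoin(self.base_url, ref)
            if url not in self.urls:  # skip duplicates
                self.urls.append(url)


def snapshot_to_zip(page_url, fetch):
    """Return zip bytes holding the page plus the resources it references.

    `fetch` is any callable mapping a URL to bytes (hypothetical; for example
    a thin wrapper around urllib.request.urlopen).
    """
    html = fetch(page_url)
    collector = ResourceCollector(page_url)
    collector.feed(html.decode("utf-8", errors="replace"))

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.html", html)
        for url in collector.urls:
            # Store each resource under host/path so names do not collide.
            parsed = urlparse(url)
            zf.writestr(parsed.netloc + parsed.path, fetch(url))
    return buf.getvalue()
```

Passing the fetcher in as a parameter keeps the bundling logic separate from the network code, so it can be tried out against a plain dictionary of canned pages.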
chucker is offline   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-21, 07:10

Hi, chucker, I found something!

Oh, cool, what is it?

It's called hanzo:web (I have no idea what the colon is doing there either), and it uses Heritrix, the same tool also used by the Wayback Machine, with the differences that its archiving is on-demand, and can be enhanced with tags and such.

Hm, that sounds good enough. Any responses about it on the web?

Well, there's this jackass who apparently thinks that the point of putting something on the web is to keep it for yourself.

Uh, that doesn't make any sense. Anyone else?

Well, Wikipedia lists three different solutions in their link rot article, and this seems to be the only fitting one.

Huh. So there isn't all that much feedback?

Curiously enough, no. It doesn't even have its own Wikipedia page.

And no known cases of Wiki-esque sites using it, either?

Nope, nothing.

Time to give it a try, then?

Yup!
chucker is offline   quote
drewprops
Magnificent Basturd™
 
Join Date: May 2004
Location: Atlanta
 
Old 2006-10-21, 07:55

I love Chucksterpiece Theatre.




As far as citation, Google's seems the most CYA of the bunch.

As far as that guy being a Jackass, I see his concern for someone scraping and reusing his content without his permission. I think we're all guilty of doing that with funny pictures we find on the web.... but for someone to use YOUR content on THEIR page and to make money from it is galling.

The idea of preserving a "snapshot" of the web is a good one - the trick is how you achieve it without stealing intellectual property for-profit.

Interesting topic.
I wonder what Chucker has to say about it?


Steve Jobs ate my cat's watermelon.
Captain Drew on Twitter
drewprops is online now   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-21, 08:37

Quote:
Originally Posted by drewprops View Post
I love Chucksterpiece Theatre.



[..]

Interesting topic.
I wonder what Chucker has to say about it?

Heh. Well, I have an extremely low response ratio to my questions here, and that can be rather frustrating. Perhaps they're just too specialized or something; I don't know.

Not that anyone has an obligation to answer. But I'd really appreciate even the tiniest bits of help.

Quote:
Originally Posted by drewprops View Post
As far as citation, Google's seems the most CYA of the bunch.
Yeah, but they don't keep snapshots, and they occasionally break a page's layout to the point where it becomes unusable. (Google's markup is extraordinarily horrifying, too.)

Quote:
Originally Posted by drewprops View Post
As far as that guy being a Jackass, I see his concern for someone scraping and reusing his content without his permission. I think we're all guilty of doing that with funny pictures we find on the web.... but for someone to use YOUR content on THEIR page and to make money from it is galling.
But they're not making money from it. Visitors are being presented with a static snapshot. This is no different than Google's cache, Wayback Machine's archive, etc. I don't really understand the notion of putting something publicly online, without any restrictions (no need for a subscription, an account, a password, etc.), yet wanting to control how people access it. It seems self-contradictory to me.

As for whether hanzo:web honors the no-archive meta tag, I'm unsure about that; their FAQ needs some work.
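For reference, the no-archive meta tag is a `<meta name="robots" content="noarchive">` element in the page itself. A minimal sketch of how an archiver could check for it follows; the class and function names are made up for illustration, and nothing here is claimed about what hanzo:web actually does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gathers directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def allows_archiving(html):
    """Return False if the page asks robots not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives
```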

The blogger points out that hanzo:web ignores robots.txt; indeed it does. As the terms state:
Quote:
If the original publisher provided restrictions on crawling, either through robots.txt directives or other detectable methods, then the archive will be private to you and not automatically sharable.
Sounds reasonable to me.

WebCite's FAQ has a section on this intellectual property matter.

Last edited by chucker : 2006-10-21 at 08:43.
chucker is offline   quote
drewprops
Magnificent Basturd™
 
Join Date: May 2004
Location: Atlanta
 
Old 2006-10-21, 08:47

Quote:
Originally Posted by chucker View Post
But they're not making money from it. Visitors are being presented with a static snapshot. This is no different than Google's cache, Wayback Machine's archive, etc. I don't really understand the notion of putting something publicly online, without any restrictions (no need for a subscription, an account, a password, etc.), yet wanting to control how people access it. It seems self-contradictory to me.
I believe that he speaks toward the practice of scraping without naming that concern specifically. You can see the problem he'd be having if his content was scraped, stuck on somebody else's website and peppered with AdSense ads that allowed the scraper to profit from his original content. That's his concern, he just didn't do a very good job of voicing that concern.

Dig?


Edit: just saw the link to WebCite.... reading now.

Steve Jobs ate my cat's watermelon.
Captain Drew on Twitter
drewprops is online now   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-21, 08:59

Quote:
Originally Posted by drewprops View Post
I believe that he speaks toward the practice of scraping without naming that concern specifically. You can see the problem he'd be having if his content was scraped, stuck on somebody else's website and peppered with AdSense ads that allowed the scraper to profit from his original content. That's his concern, he just didn't do a very good job of voicing that concern.
Right, but if he had done a few minutes of looking up hanzo:web, he would have figured out that that is not their goal. I think it's quite obvious from their website that their business model is in high archiving quotas, not in making money off the archive's contents.

(I guess you could argue that it's a question of trust, but a company that appears at an O'Reilly conference is certainly very unlikely to risk such unethical practices. They'd be flamed by geeks around the world like there's no tomorrow.)
chucker is offline   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-21, 09:18

I suppose the big flaw, at least right now, is the backlog they appear to have, as they admit in their news section.

I hope it doesn't take all too long.
chucker is offline   quote
IncrediBILL
New Member
 
Join Date: Oct 2006
 
Old 2006-10-21, 23:30

Quote:
Originally Posted by chucker View Post
Hello, Jackass here.

What part of copyright eludes you?

Whether it's reposted for profit or not, it's MINE, and has NOARCHIVE and NOCACHE on every page and only 5 search engines are even allowed to crawl the site and they aren't allowed to cache the pages either, nor is the Internet Archive.

Just because I put something online publicly doesn't give anyone entitlement to violate my copyright and copy my material to any other website without permission, plain and simple.

It's posted for people to read, and I assume a file might be saved or printed for personal use, but reposting on another website without permission will earn you a DMCA notice.

Besides, since when was asking permission for using something so hard? I've been known to grant permission when asked and I've been known to send a lawyer when taken without asking a few times as well.

If a website says their content is free to use with a GPL like the Wiki or the article farms, then feel free to do with as you want.

If the website says "Copyright (c), All Rights Reserved", then expect trouble if you violate it.

That's simple now isn't it?
IncrediBILL is offline   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-22, 00:11

Quote:
Originally Posted by IncrediBILL View Post
Hello, Jackass here.
You registered here just so you could respond to some obscure forum post?

Quote:
What part of copyright eludes you?

Whether it's reposted for profit or not, it's MINE
Go ahead, shut down your website and keep your preciousness.

The whole point of running a website (or really any other kind of publication) is to share information, i.e. to partially surrender the notion of "it's MINE".

Quote:
Just because I put something online publicly doesn't give anyone entitlement to violate my copyright and copy my material to any other website without permission, plain and simple.
They aren't violating your copyright, nor are they "copying" material; they are merely archiving it.

What are you going to do if I save a webpage of yours, burn it on CD and distribute it at a parking lot?

How about, um, be happy that I've found your information so valuable that I went through the hurdles of distributing it?

Quote:
It's posted for people to read, and I assume a file might be saved or printed for personal use, but reposting on another website without permission will earn you a DMCA notice.
Doubtful, as Hanzo is not situated in the US.

Quote:
If a website says their content is free to use with a GPL like the Wiki or the article farms, then feel free to do with as you want.

If the website says "Copyright (c), All Rights Reserved", then expect trouble if you violate it.

That's simple now isn't it?
GPL'ing doesn't relinquish copyright, so you're muddling two unrelated issues (copyright and licensing) together.
chucker is offline   quote
Robo
Formerly Roboman, still
awesome
 
Join Date: Jul 2004
Location: on twitter! @werejack
 
Old 2006-10-22, 00:40

Methinks IncrediBILL should probably focus on having something to say worth stealing before he worries about people stealing it.
Robo is offline   quote
IncrediBILL
New Member
 
Join Date: Oct 2006
 
Old 2006-10-22, 02:09

When did I say GPL'ing relinquished copyright? I said if the website offers unlimited usage freely for any purpose, then go for it. If they don't, you're just playing with fire.

Regardless of the location of the host, many of the search engines indexing that content ARE located in the US, such as Google, and they have to comply with a DMCA request. Furthermore, most countries have reciprocal copyright agreements with the US and will comply with requests to remove unauthorized material.


Quote:
The whole point of running a website (or really any other kind of publication) is to share information, i.e. to partially surrender the notion of "it's MINE".
Sharing information doesn't mean allowing it to be stolen and republished without permission unless it falls under 'fair use', which has its limits. The copyright law allows up to $150K in statutory damages, assuming you file a legal copyright on your material, for serious infringement. I've already collected a few hefty fees from people trying to avoid court that had your attitude about 'sharing information' and those websites no longer exist as part of the agreement to stop proceedings.

Try taking a book or magazine, which shares information, from a bookstore without paying for it and see if you don't end up in jail for shoplifting.

Try plagiarizing a published book and slapping your own name on it and republishing it as your own and see what happens, which is nothing different than lifting content from a website and republishing it.

The point I made which you overlooked is PERMISSION, asking PERMISSION can be an amazing thing and keep people out of court.

Last edited by IncrediBILL : 2006-10-22 at 02:15.
IncrediBILL is offline   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-22, 02:20

Quote:
Originally Posted by IncrediBILL View Post
Regardless of the location of the host, many of the search engines indexing that content ARE located in the US, such as Google, and they have to comply with a DMCA request.
Really? (Hint: Google won.)

Quote:
Sharing information doesn't mean allowing it to be stolen
It isn't stolen.

Quote:
and republished without permission unless it falls under 'fair use', which has its limits.
You could argue it's republished, and you could also argue it goes beyond fair use, but fair use laws are for an entirely different context anyway.

Quote:
I've already collected a few hefty fees from people trying to avoid court that had your attitude about 'sharing information' and those websites no longer exist as part of the agreement to stop proceedings.
Good for you.

Quote:
Try taking a book or magazine, which shares information, from a bookstore without paying for it and see if you don't end up in jail for shoplifting.
As you say yourself, I would go to jail for shoplifting. I.e., for theft. Not for copyright infringement. Completely different situation.

Quote:
Try plagiarizing a published book and slapping your own name on it and republishing it as your own and see what happens, which is nothing different than lifting content from a website and republishing it.
Hugely different. hanzo:web doesn't slap their own name, doesn't republish as their own, doesn't pretend in any way it's theirs. Therefore, it is not plagiarism either.

Quote:
The point I made which you overlooked is PERMISSION, asking PERMISSION can be an amazing thing and keep people out of court.
And why would I ask permission to access something that I can already access anyway?
chucker is offline   quote
AsLan^
Not a tame lion...
 
Join Date: May 2004
Location: Narnia
Old 2006-10-22, 02:29

How very interesting...

Obviously the question of attribution is a no-brainer: you can't take someone else's work and call it your own. But what about properly attributed reproduction without permission?

Technically, the information is re-copied every time it hits a router or switch on its way to the consumer, and I doubt the owners of that hardware have personally asked anyone's permission before reproducing their protected material.

EDIT: Thanks for that link to the Google case Chucker, answered a couple of questions.
AsLan^ is offline   quote
Robo
Formerly Roboman, still
awesome
 
Join Date: Jul 2004
Location: on twitter! @werejack
 
Old 2006-10-22, 03:28

Wouldn't suing people who mirrored your website to make it more accessible defeat the purpose of putting it on the internet in the first place?
Robo is offline   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-22, 03:30

Quote:
Originally Posted by Roboman View Post
Wouldn't suing people who mirrored your website to make it more accessible defeat the purpose of putting it on the internet in the first place?
That would be what I've been trying to argue all along.
chucker is offline   quote
Robo
Formerly Roboman, still
awesome
 
Join Date: Jul 2004
Location: on twitter! @werejack
 
Old 2006-10-22, 03:35

Quote:
Originally Posted by chucker View Post
That would be what I've been trying to argue all along.
I know. But I guess I'm arguing less of a "Can you sue mirror-ers?" debate and more of a "Why the hell would you want to?" one.

If any of my sites had so much traffic that I needed people to mirror it, I would thank my lucky stars. And I'd thank the mirror-ers, too. I wouldn't sue them.

cue the lights and dim the stars
Robo is offline   quote
BarracksSi
BANNED
I am worthless beyond hope.
 
Join Date: Jul 2004
Location: Washington, DC
 
Old 2006-10-22, 08:55

Quote:
Originally Posted by IncrediBILL View Post
Try plagiarizing a published book and slapping your own name on it and republishing it as your own and see what happens, which is nothing different than lifting content from a website and republishing it.
I'm pretty sure it's been said already, but nobody here has advocated claiming the mirror as their own website.
BarracksSi is offline   quote
Banana
is the next Chiquita
 
Join Date: Feb 2005
 
Old 2006-10-22, 09:01

OT

Observe inverse relationship between a perception of greatness in one's screenname and one's post quality.

/OT
Banana is offline   quote
BarracksSi
BANNED
I am worthless beyond hope.
 
Join Date: Jul 2004
Location: Washington, DC
 
Old 2006-10-22, 09:02

Okay. Comparing this to purchasing books is not a valid comparison. Purchasing books means that someone pays money. No money changes hands to view IncrediBILL's website, so no money is being lost if it were hosted somewhere else.

This is more like handing out free pamphlets, then somebody coming along and publishing the exact same pamphlet, credits and all, and also distributing it for free, all for the purpose of preserving its information and spreading the word.

I think it would at least be common courtesy to ask for permission. What if the original author can't be reached, though?
BarracksSi is offline   quote
Shades of Blue
Member
 
Join Date: Sep 2006
Location: Washington, D.C.
 
Old 2006-10-22, 09:15

Quote:
Originally Posted by BarracksSi View Post
Okay. Comparing this to purchasing books is not a valid comparison. Purchasing books means that someone pays money. No money changes hands to view IncrediBILL's website, so no money is being lost if it were hosted somewhere else.

This is more like handing out free pamphlets, then somebody coming along and publishing the exact same pamphlet, credits and all, and also distributing it for free, all for the purpose of preserving its information and spreading the word.
What if money is changing hands between IncrediBILL and the owners of his ad banners? Now people are getting the benefit of his content without the financial benefits going to him. Just because you give something away for free doesn't sacrifice your distribution rights... you could have an event where you gave away 50,000 copies of your new book, but that doesn't give the recipients the legal right to start xeroxing it and giving it to all their friends.
Shades of Blue is offline   quote
BarracksSi
BANNED
I am worthless beyond hope.
 
Join Date: Jul 2004
Location: Washington, DC
 
Old 2006-10-22, 09:43

Banner ads? I forgot that they even existed...

I take it that those ads trace their clicks to the server that's hosting the site?
BarracksSi is offline   quote
Shades of Blue
Member
 
Join Date: Sep 2006
Location: Washington, D.C.
 
Old 2006-10-22, 09:51

Quote:
Originally Posted by BarracksSi View Post
Banner ads? I forgot that they even existed...

I take it that those ads trace their clicks to the server that's hosting the site?
Or maybe they're generated using server-side code, something that wouldn't be replicated on the mirror. My point is that he's not necessarily giving away anything free; there is some money involved, and even if there wasn't, giving it away free doesn't sacrifice his future right to stop giving it away free.

I understand there are exceptions for caches, search engines, etc., but I'm responding more to the assertion that people give away their exclusive distribution rights altogether when they put something up for free on the web. It's just not true.
Shades of Blue is offline   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-22, 10:24

Quote:
Originally Posted by BarracksSi View Post
This is more like handing out free pamphlets, then somebody coming along and publishing the exact same pamphlet, credits and all, and also distributing it for free, all for the purpose of preserving its information and spreading the word.

I think it would at least be common courtesy to ask for permission.
But why? The pamphlet's message isn't altered at all. Nor is any of its metadata; the attribution is left intact.

Fine, call it common courtesy, but if I were to ask someone for permission if I could help their campaign's cause by distributing the free pamphlets, using my own resources (by printing them and handing them out, which costs money and time), how could they possibly have any other answer than "um, yeah? Go ahead, duh?"

Quote:
What if the original author can't be reached, though?
Indeed, this is also stated as undisputed fact(!) in the Google case:

Quote:
Given the breadth of the Internet, it is not possible for Google (or other search engines) to personally contact every Web site owner to determine whether the owner wants the pages in its site listed in search results or accessible through “Cached” links. See Brougher Decl. ¶18; see also Levine Report ¶25.
The exact same applies for hanzo. They couldn't possibly manually contact webmasters before actually fulfilling on-demand requests. They have enough of a backlog as it is, due to technical resource limitations. Imagine having human resource limitations added to that.
chucker is offline   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-22, 10:30

Quote:
Originally Posted by Shades of Blue View Post
Or maybe they're generated using server-side code, something that wouldn't be replicated on the mirror. My point is that he's not necessarily giving away anything free; there is some money involved, and even if there wasn't, giving it away free doesn't sacrifice his future right to stop giving it away free.
If sources of income are so tight that they rely completely on
1) ads being presented
2) users actually seeing (not filtering) the ads
3) users actually responding (clicking) the ads (because, otherwise, the ads couldn't possibly be a sustainable model!),
then the site can't possibly survive in this day and age anyway. Even the most mainstream of browsers has a popup filter, thereby effectively removing a vast majority of ads. For virtually every browser, there are ad filters. Not to mention browser-unrelated filters, as well as those provided through channels such as the ISP or the router.

If a site is that complex, one should use a subscription-based model, with guaranteed and honest (the users actually care to give the site money, rather than merely tolerating and trying hard to ignore the ads) money coming in.

Others' finances aren't my business, of course, but ads have never, ever been something that can single-handedly be relied upon. People ignore, people filter, people are annoyed. There are no people that actually think "hey, an ad! cool, I will support the site by clicking on it, and actually be interested in the advertised product as well!". It doesn't happen. At best, an ad is funny and intriguing and thus worth watching, but that doesn't necessitate interest in the actual product.

Quote:
I understand there are exceptions for caches, search engines, etc., but I'm responding more to the assertion that people give away their exclusive distribution rights altogether when they put something up for free on the web. It's just not true.
Not altogether, but virtually completely in the context of that medium (here, the web).
chucker is offline   quote
IncrediBILL
New Member
 
Join Date: Oct 2006
 
Old 2006-10-22, 19:54

Quote:
Technically, the information is re-copied every time it hits a router or switch on its way to the consumer, and I doubt the owners of that hardware have personally asked anyone's permission before reproducing their protected material.
Good example, but it's only temporary, it doesn't appear on someone else's website beyond the control of the author. Assuming the service honors the no-caching requirements of my server, it's never even cached in AOL.
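As a side note on what "no-caching requirements" could look like in practice: the post doesn't say which headers the server sends, so the following is only an assumed illustration using the standard HTTP `Cache-Control` header. The helper name `shared_cache_may_store` is hypothetical. It treats `no-store` and `private` as forbidding storage by a shared cache (such as an ISP proxy), while `no-cache` on its own still permits storage but requires revalidation before reuse.

```python
def shared_cache_may_store(cache_control):
    """Return False if the header forbids storage by a shared (proxy) cache."""
    directives = {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    # no-store: nobody may keep a copy; private: only the end user's browser may.
    return not ({"no-store", "private"} & directives)
```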

Quote:
The exact same applies for hanzo. They couldn't possibly manually contact webmasters before actually fulfilling on-demand requests.
Wrong, same applies to Google, robots.txt dictates whether inclusion is allowed or not and Google does honor this but at the time I reviewed Hanzo it did not. No manual contact is necessary as my robots.txt has a list of allowed agents and beyond that list says quite plainly:

User-agent: *
Disallow: /

That means I don't even need to know Hanzo exists as I've already stated that my content is off limits.
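To make the allow-list idea concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt contents are an assumed example (the real file isn't quoted in full in the thread), Googlebot stands in for "an allowed agent", and `ExampleArchiver` is a made-up user-agent name for a crawler that isn't on the list.

```python
from urllib.robotparser import RobotFileParser

# Assumed example: one named crawler is allowed, everyone else is shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def may_crawl(robots_txt, agent, url):
    """Return True if robots.txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(may_crawl(ROBOTS_TXT, "Googlebot", "http://example.org/page.html"))
print(may_crawl(ROBOTS_TXT, "ExampleArchiver", "http://example.org/page.html"))
```

The empty `Disallow:` line for the named agent means "nothing is off limits", while the catch-all entry blocks every path for any agent without an entry of its own. Note that robots.txt is only a request; it works only when the crawler chooses to check it.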

Quote:
ads have never, ever been something that can single-handedly be relied upon.
I'm sure TV, radio, newspapers and magazines would completely disagree.

Quote:
What if money is changing hands between IncrediBILL and the owners of his ad banners? Now people are getting the benefit of his content without the financial benefits going to him. Just because you give something away for free doesn't sacrifice your distribution rights... you could have an event where you gave away 50,000 copies of your new book, but that doesn't give the recipients the legal right to start xeroxing it and giving it to all their friends.
Money isn't even the issue, like you rightly pointed out, just because the medium I choose to publish on is the internet, instead of pen and ink, doesn't give others the right to decide how to redistribute my content other than getting it either distributed via my website or what I personally determine to distribute via an RSS feed for general purpose distribution.

Archiving has caused all sorts of issues where people have been sued to remove trademarks from their website after getting a demand letter from a corporation, yet were still sued after complying because the trademarks still existed in their content on the Internet Archive and Google cache. They were technically still infringing, although removing the material in a timely manner was beyond their ability to control.

The bottom line is still the fact that other than my website, nobody has the right to distribute the material without consent unless the website states otherwise and limited access to some companies is stated in robots.txt. Companies or anyone else that deploys automated tools that ignore robots.txt deserve to get whatever pain may come their way.

Beyond copyright violation and blatant disregard for robots.txt, they are also violating my site usage license which clearly states that automation may not be used to collect and redistribute the information so I don't even need to prove copyright infringement. I can just produce a log file showing a few thousand pages were pulled down in a few minutes, something a human at a browser can't do, to prove that the site license was violated, a simple violation of terms and conditions and general purpose contract law to the rescue
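The log-file argument boils down to counting requests per client inside a sliding time window. A rough sketch of that check is below; the function name, the threshold of 1000 pages and the five-minute window are illustrative assumptions rather than figures from the post, and the input is assumed to be already parsed into (client, timestamp) pairs sorted by time.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_bulk_fetchers(requests, max_pages=1000, window=timedelta(minutes=5)):
    """Return clients that requested more than `max_pages` within `window`.

    `requests` is an iterable of (client, timestamp) pairs sorted by time.
    """
    recent = defaultdict(deque)  # client -> timestamps still inside the window
    flagged = set()
    for client, ts in requests:
        q = recent[client]
        q.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while ts - q[0] > window:
            q.popleft()
        if len(q) > max_pages:
            flagged.add(client)
    return flagged
```

A client fetching ten pages a second trips the threshold within a couple of minutes, while someone clicking through a page a minute never has more than a handful of requests in the window.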

Last edited by IncrediBILL : 2006-10-22 at 20:23.
IncrediBILL is offline   quote
Ryan
Veteran Member
 
Join Date: May 2004
Location: Tejas
 
Old 2006-10-22, 20:34

So, IncrediBILL, why don't you want anyone archiving your site? What do you gain from not allowing that?
Ryan is offline   quote
chucker
 
Join Date: May 2004
Location: near Bremen, Germany
Old 2006-10-22, 20:49

Quote:
Originally Posted by IncrediBILL View Post
Wrong, same applies to Google, robots.txt dictates whether inclusion is allowed or not and Google does honor this but at the time I reviewed Hanzo it did not. No manual contact is necessary as my robots.txt has a list of allowed agents and beyond that list says quite plainly:

User-agent: *
Disallow: /

That means I don't even need to know Hanzo exists as I've already stated that my content is off limits.
I have already cited above why Hanzo archives content even when it's disallowed in robots.txt. I don't need to explain it twice. Suffice it to say it does honor robots.txt by not making such content publicly available.

Quote:
I'm sure TV, radio, newspapers and magazines would completely disagree.
I suppose you've never heard of newspapers and magazines that you pay a subscription fee for despite having ads.

Quote:
Beyond copyright violation and blatant disregard for robots.txt, they are also violating my site usage license which clearly states that automation may not be used to collect and redistribute the information so I don't even need to prove copyright infringement. I can just produce a log file showing a few thousand pages were pulled down in a few minutes, something a human at a browser can't do, to prove that the site license was violated, a simple violation of terms and conditions and general purpose contract law to the rescue
You are, of course, assuming that "general purpose contract law" applies. That's what people who write EULAs assume as well. How wrong they can be.

Anyway, let me know when you can give me a good case where you have a website that you want publicly available, yet not archived. For there isn't one. It's completely absurd.
chucker is offline   quote
IncrediBILL
New Member
 
Join Date: Oct 2006
 
Old 2006-10-22, 20:57

Quote:
Originally Posted by Ryan View Post
So, IncrediBILL, why don't you want anyone archiving your site? What do you gain from not allowing that?
Why don't publishers want Google scanning their books?

What do they hope to gain?

Actually I think I addressed one part, the legal ramifications, in my last post as people have sued over content that was corrected on their website yet remained visible in the Wayback Machine and Google cache.

If you want to archive it on your personal computer for your personal use, I can deal with that.

If you publish that archive or make it otherwise publicly accessible, then we have issues as once you lose control over your copyright, it's harder to defend it down the road.

Try some of your arguments out archiving some website like Corbis or Getty image banks and then make it publicly accessible. Sit back and see how long it takes before a SWAT team of lawyers owns whoever does it.

Basically, it's my intellectual property to do with as I please and making it freely available on the internet doesn't give anyone else the authority to reproduce it, even in a free archive or mirror site. Is that so hard to comprehend?

Perhaps those of you that haven't invested many years developing an online property don't get it but you can rest assured many others that invested time on their intellectual property feel the same.

If I want the site publicly archived and mirrored, I'll personally submit it to be archived.

If I don't want it publicly archived, then respect my decision as it's not anyone else's decision to make but mine.

How simple is that?
IncrediBILL is offline   quote
IncrediBILL
New Member
 
Join Date: Oct 2006
 
Old 2006-10-22, 21:01

Quote:
Anyway, let me know when you can give me a good case where you have a website that you want publicly available, yet not archived. For there isn't one. It's completely absurd.
It's not absurd at all as it's my property and I'm allowed to share without allowing it to be distributed by other means.

You can say whatever you want but it doesn't make your point valid as there has never been any law that makes intellectual property theft valid, for any reason, ever, and thinking otherwise is absurd.

FWIW, Hanzo web was blocked in my firewall the minute I figured out they ignored robots.txt. Problem solved.
IncrediBILL is offline   quote
PKIDelirium
Mother Father Gentleman
 
Join Date: Oct 2005
Location: Xenia, Ohio
 
Old 2006-10-22, 21:06

Did you overdose on Viagra or something? Because you seem to be a big dick.
PKIDelirium is offline   quote