
"before swearing at me."

Followed by "shitty" and "web crawlers" in all caps ^^ Someone ate a clown for breakfast, I see.

"No, we are not going to ban all the links from GitHub"

What? Slow down there -- why would you care about invalid links? Did you just say that you can't possibly allow users to post links unless you know they work for automatic crawlers, not just for human visitors? And someone else chimed in claiming you gave arguments? Heh.

Well, you make an attempt at one, with "we are not crawling in the sense of", and then refute it yourself with the bit you quoted: "web crawlers and other web robots". It's not called webscraper.txt; it's robots.txt, period.
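The point being argued here is that robots.txt addresses all automated agents, not only recursive crawlers. As a sketch of what honoring it looks like (the robots.txt rules and the "LinkCheckerBot" agent name below are made up for illustration), Python's standard library can parse a robots.txt and answer per-agent, per-path questions:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content -- not from any real site.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Any automated agent -- crawler or single-URL checker -- is expected
# to consult these rules before fetching.
print(rp.can_fetch("LinkCheckerBot", "https://example.com/private/page"))  # False
print(rp.can_fetch("LinkCheckerBot", "https://example.com/public/page"))   # True
```

Nothing in the file distinguishes "scrapers" from "crawlers"; the rules apply to whichever user agent asks.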

So how then should a website identify a rogue user agent? If you dress up like the slimy guys, you get the banhammer -- what do you expect? If you care so much about Facebook and Twitter "content" that it's worth being indistinguishable from attackers, then just cope with it. But don't pout at me; just eat what you ordered.

And what delusions about the internet? You just beat around the bush and then finish with that strawman? And what is an "imaginary delusion", by the way? The one you imagine I have? Now that's a Freudian slip if I ever saw one ^^

"Completely evil, wrong and backwards"... so... You're entitled to know the validity of links posted on your site, but website owners aren't allowed to care about their resources and who they offer them to? Who's deluded?



"Slow down there -- why would you care about invalid links?"

Because Stack Overflow is a site whose purpose is to answer questions. People may provide links when asking or answering a question, and those links may be important in understanding either the question or the answer. So invalid links degrade the value of the site.

What they're doing is fundamentally different from web crawling. Web crawlers are about discovering content: they start at a root and crawl outward to see what they can find, and one URL can spawn many more URLs to look at. Stack Overflow's link check, by contrast, starts with a known URL and simply sees whether that URL can be visited. It has one URL, and it visits only that one URL.
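The distinction described above -- one request to a known URL, no link extraction, no recursion -- can be sketched in a few lines. This is a hypothetical helper, not Stack Overflow's actual implementation; the `LinkCheckerBot/1.0` user-agent string is also made up:

```python
import urllib.request


def link_is_valid(url: str, timeout: float = 10.0) -> bool:
    """Check a single known URL. One HEAD request; no crawling, no
    discovery of further URLs. (Illustrative sketch only.)"""
    req = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "LinkCheckerBot/1.0"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # Treat 2xx/3xx as "link works"; 4xx/5xx raise or fail.
            return 200 <= resp.status < 400
    except OSError:
        # Covers URLError, connection refused, timeouts, DNS failures.
        return False
```

A crawler would instead parse the response body for more links and enqueue them; this function never reads the body at all.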





