Well, I will start a new programming project, and it will be a search engine. Any ideas on how to code a web crawler/spider? And not just the regular way: I would like to manually add websites to crawl, and index content even if the forum is protected for registered users only.
Help would be appreciated.
Search engine coding
- floodhound2
- ∑lectronic counselor
- Posts: 2117
- Joined: 03 Sep 2006, 16:00
- 17
- Location: 127.0.0.1
- Contact:
I can imagine you could go about this in a few different ways. The most logical choice, in my opinion, would be to run a server with a program stored on it. This program (which you write) would have a link placed in its textbox. It would then begin searching that link, looking for other links, sites and data.
In VB you would use the "Inet" control, as an example.
Your project is a nice one, but you'll need to specify a programming language. I could supply you some code, but you should write it on your own. First pick a language, then ask more about where and how you are having difficulties.
As a heads-up, this would be a string-heavy project. Depending on the "level" of the programming language, it will be either a nightmare or a slight torture if you are not familiar with strings, data types, etc.
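The loop described above (take a manually added link, fetch the page, pull out the other links it contains) can be sketched in Python using only the standard library. This is a minimal, hedged sketch of the link-extraction step, not a full crawler; the class and function names are my own, and it assumes public pages.

```python
# Minimal link-extraction sketch for a crawler: parse a fetched HTML
# page and collect the absolute URLs of every <a href="..."> it contains.
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects raw href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url, html):
    """Return absolute URLs for every link found in the page's HTML."""
    parser = LinkParser()
    parser.feed(html)
    # urljoin resolves relative hrefs like "/about" against the page URL
    return [urljoin(base_url, href) for href in parser.links]
```

A crawler would then push each extracted URL it has not seen before onto a queue and repeat, which is where the string handling mentioned above starts to add up.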
Well, I saw some open-source web spiders written in C and others in PHP, and that's why I'm kind of confused.
How do I code a web crawler/spider that can bypass the registration limits to index files? And I want to manually add specific sites.
Note: I don't think data types, strings and arrays are much of a problem.
- floodhound2
- ∑lectronic counselor
- Posts: 2117
- Joined: 03 Sep 2006, 16:00
- 17
- Location: 127.0.0.1
- Contact:
I don't think, and I could be wrong, but web crawlers can't bypass registration limits unless given access in advance. I base this on the fact that we here at suck-o won't allow Google to access the threads.

3XTORTION wrote:
Well i saw some open source web spiders written in C and other in PHP and thats why im kind of confused.
How to code a web crawler/spider that can bypass the registration limits to index files and i want to manually add specific sites.
Note: i dont think data types, strings and arrays are much of a prob
As far as it being written in C, PHP or any other language: it's all up to the programmer, and any one should do fine. Some will have benefits that others won't, e.g. C will work on just about any computer (iPhone, etc.) where VB won't.
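To expand on the point about access: a crawler cannot "bypass" registration, but it can be given access in advance by logging in like a regular user and carrying the session cookie on later requests. Below is a hedged Python sketch of that idea using only the standard library; the login URL and the "username"/"password" form field names are hypothetical and would have to match the real forum's login form.

```python
# Sketch of an authenticated crawl session: a cookie-aware opener logs
# in once, the server sets a session cookie in the jar, and every later
# request through the same opener sends that cookie back.
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

def build_session():
    """Return (opener, cookie_jar); the opener keeps cookies across requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def login(opener, login_url, username, password):
    """POST credentials to a (hypothetical) login form.

    The field names "username" and "password" are an assumption; inspect
    the target forum's login page to find the real ones.
    """
    data = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode()
    return opener.open(login_url, data)  # server's Set-Cookie lands in the jar
```

After a successful login, fetching member-only pages is just `opener.open(url)` with the same opener. Note this only works for sites where you legitimately have an account; it is access granted in advance, exactly as described above.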
- floodhound2
- ∑lectronic counselor
- Posts: 2117
- Joined: 03 Sep 2006, 16:00
- 17
- Location: 127.0.0.1
- Contact:
Did you search the wiki for bots? They have some open-source code already for web crawlers. The good thing is it's written in C.
http://en.wikipedia.org/wiki/Web_crawler
- bad_brain
- Site Owner
- Posts: 11636
- Joined: 06 Apr 2005, 16:00
- 19
- Location: In your eye floaters.
- Contact:
You can also have a look at Nutch; checking the source will surely help you build your own:
http://lucene.apache.org/nutch/