Prototype Bot-Hunter List
Moderator: Moderators
- KitWiz4687
- Merchant
- Posts: 194
- Joined: Wed Sep 17, 2008 3:27 am
I'm linking to this thread from a reply elsewhere, so that those interested can follow it here without spamming that topic thread.
TrueBots List
(List has been regenerated on Page 2)
MaybeBots List
(List has been regenerated on Page 2)
It seems the program needs a little tweaking, with the first name on the list being not-a-bot...
o.o
But still, it is impressive...what kinda parameters were used? Post count? Keywords?
Now, if only there was a program that would delete all of the bots...I hate to think of all the hours it would take to manually delete all of those accounts...
o.x
- KitWiz4687
- Merchant
- Posts: 194
- Joined: Wed Sep 17, 2008 3:27 am
Edit*
It retrieves information from member profiles, such as email, MSN Messenger, ICQ, posts, location, occupation, and interests. I found a pattern in the occupation and interests of many of the bots, starting from around 2005. There are a few other patterns that I have noticed, but I haven't had the time to code in the logic needed to catch them yet. The case with LadyWarrior is that she posted a website in her Location field, and the Prototype currently assumes such placement to be the behavior of a bot. It'll take time to refine it as I go along.
These are just the results from the prototype. It isn't going to be perfect right away because it is still being worked on, and feedback will help reduce the chance of errors. There are still more filter methods that can be added to catch more of the actual bot accounts and to reduce how often it mistakes a real member's profile for a bot profile.
Any ideas for making it more accurate are appreciated.
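The Location rule described above could be sketched roughly like this. It is only a guess at the idea, not the Prototype's actual code: the function name, the regular expression, and the sample profiles are all made up for illustration.

```python
import re

# Hypothetical pattern: anything that looks like a web address counts as a website.
URL_PATTERN = re.compile(
    r"(https?://|www\.)\S+|\b[\w-]+\.(com|net|org|info|biz)\b",
    re.IGNORECASE,
)

def location_looks_like_website(location: str) -> bool:
    """Return True when the Location field seems to contain a web address."""
    return bool(URL_PATTERN.search(location or ""))

# Invented sample profiles, not real members.
profiles = {
    "ExampleBot": {"location": "www.cheap-pills.example.com"},
    "ExampleMember": {"location": "Melbourne, Australia"},
}

flagged = [
    name for name, profile in profiles.items()
    if location_looks_like_website(profile["location"])
]
print(flagged)
```

A real member who puts their own site in Location would trip this check too, so on its own it can only mark a profile as a "maybe".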
- KitWiz4687
- Merchant
- Posts: 194
- Joined: Wed Sep 17, 2008 3:27 am
That's one of the ideas that I've had but I wasn't sure exactly if a time-based method would be appropriate or not since someone said before that the inactive members aren't the problem. But what about members with no posts or any other information besides perhaps an email? Should they be included after so many months?
Some of our most active members provide little if any info in their profiles, so that would be casting your net a bit too widely, so to speak...
If your program has the ability to cross-reference, then that would be a good element to include, though, as long as it is qualified by something else...
:3
- KitWiz4687
- Merchant
- Posts: 194
- Joined: Wed Sep 17, 2008 3:27 am
- FoobyKamikaze
- Foodophile
- Posts: 6092
- Joined: Tue Jul 15, 2008 6:37 am
Considering that (as far as I know) you're limited to information available on profiles, the fact that you've generated this list is pretty amazing.
I would check for the presence of an avatar. At least it's shown on the user profile page.
By publicly available, I refer to the fact that the database stores such nice stuff as the last time the user visited, the time the user registered, etc. (and in epoch time, so you can easily do a delete where $last_visit-$registered < 3600).
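A rough sketch of that database check, using an in-memory SQLite table so it runs on its own. The table and column names (users, registered, last_visit) are assumptions for illustration, not the forum's real schema, and it selects rather than deletes so the result can be looked over first.

```python
import sqlite3

# Throwaway in-memory database with an assumed, simplified users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, registered INTEGER, last_visit INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("drive_by", 1_220_000_000, 1_220_000_600),  # last seen 10 minutes after registering
        ("regular", 1_220_000_000, 1_230_000_000),   # came back months later
    ],
)

# Accounts never seen again after the first hour (3600 seconds) following registration.
rows = conn.execute(
    "SELECT name FROM users WHERE last_visit - registered < 3600 ORDER BY name"
).fetchall()
one_hour_accounts = [name for (name,) in rows]
print(one_hour_accounts)
```

Because both timestamps are epoch seconds, the comparison is plain integer arithmetic with no date parsing needed.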
y̸̶o͏͏ų̕ sh̡o̸̵u̶̕l̴d̵̡n̵͠'̵́͠t͜͢ ̀͜͝h̶̡àv̸e͡ ̛d̷̨͡o͏̀ne ̶͠͡t҉́h̕a̧͞t̨҉́.̵̧͞.͠͞.͟avwolf wrote:"No dating dog-girls, young man, your father is terribly allergic!"
Not if you combine it with other criteria...
With a quick look, I can only see a random occupation and 2 interests so far, so I'm not going to attempt to work out his criteria, mainly because I'm supposed to be studying right now for upcoming exams. I would love to see the code used, though. (And then it becomes valid studying, because I take Com Sci as a subject.)
Personally, I think that a user who hasn't been seen for 1 year+ (this is the tricky one that requires direct db access) or hasn't been seen since registration+1 hour, and who doesn't have any posts, registered at least a week ago (or longer), and matches whatever criteria KitWiz has managed to find, should be considered a candidate for deletion.
In pseudo-code, for anyone who wants it:
Code:
// All times are epoch seconds: 31536000 = 1 year, 3600 = 1 hour, 604800 = 1 week.
// Check that the user hasn't visited for the last year OR never came back after the first hour following registration, AND that the account is more than a week old.
if (((time_now - lastvisit > 31536000) || (lastvisit - registration < 3600)) && (time_now - registration > 604800)) {
    // And check that the account has no posts
    if (posts == 0) {
        // Delete the bot
    }
    else {
        // Probably real, there's an actual post from this account
    }
}
else {
    // Probably real: they've visited in the past year and came back after their first hour, or the account is less than a week old (to allow for the newbs).
}
Pattern matching would definitely be necessary, though how is it implemented? Parsing out the interests (for example) by finding the line in the HTML source, reading until <br>, then counting the number of commas? (Based on the 2 interests theme that I see so far...)
I'm also for adding reCAPTCHA to the registration page in an effort to stop the bots from getting in in the first place.
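The comma-counting idea could look something like this. The HTML line format here is an assumption (the real profile page markup may well differ), and count_interests is a hypothetical helper, not anything from KitWiz's program.

```python
import re
from typing import Optional

def count_interests(profile_html: str) -> Optional[int]:
    """Count comma-separated entries on the Interests line, or None if it is absent."""
    # Assumed layout: the label, then the value, terminated by a <br> tag.
    match = re.search(r"Interests:\s*(.*?)<br", profile_html, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    # Strip any leftover tags, then split on commas and ignore empty pieces.
    value = re.sub(r"<[^>]+>", "", match.group(1))
    return len([item for item in value.split(",") if item.strip()])

# Invented sample fragment of a profile page.
sample = "Occupation: Builder<br>Interests: <b>Fishing, Chess</b><br>"
print(count_interests(sample))
```

Counting the split pieces rather than the commas themselves avoids an off-by-one when the field is empty or ends with a stray comma.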
- KitWiz4687
- Merchant
- Posts: 194
- Joined: Wed Sep 17, 2008 3:27 am
Back from work. I've also noticed that I can access the last post time (if applicable) as well as review the content of the posts made by that member. Oh, I've also cleaned up and somewhat refined the logic for detecting bots and possible annoyances.
I'll be adding the new results in a bit once they get finished. ^_^
- KitWiz4687
- Merchant
- Posts: 194
- Joined: Wed Sep 17, 2008 3:27 am
Well, I've added a few things and removed a few, so it's now mistaking people for bots far less often than it did, but there is still room for even more improvement. Post here to let me know of any false positives you find so that I can improve the person-exclusion parameters.
I've also added a snippet of code so that the lists show which bot parameter selected each included name, to make it easier to debug false positives. That information is given as the 3-digit number after each name. Any additional ideas for new parameters are also welcome.
*Lists have been regenerated
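For anyone curious how a list with a parameter code after each name might be produced, here is a minimal sketch. The rules, the 3-digit codes, and the sample profiles are invented for illustration; the Prototype's real parameters and numbering aren't shown in this thread.

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, str]

# Hypothetical rules: each pairs a made-up 3-digit code with a check.
RULES: List[Tuple[str, Callable[[Profile], bool]]] = [
    ("001", lambda p: "http" in p.get("location", "").lower()),
    ("002", lambda p: p.get("interests", "").count(",") == 1 and p.get("posts", "0") == "0"),
]

def first_matching_code(profile: Profile) -> str:
    """Return the code of the first rule that flags the profile, or '' if none do."""
    for code, check in RULES:
        if check(profile):
            return code
    return ""

def build_list(profiles: Dict[str, Profile]) -> List[str]:
    """Format flagged names as 'Name 001', skipping profiles no rule matched."""
    lines = []
    for name, profile in profiles.items():
        code = first_matching_code(profile)
        if code:
            lines.append(f"{name} {code}")
    return lines

# Invented sample profiles, not real members.
profiles = {
    "ExampleBotA": {"location": "http://spam.example", "interests": "", "posts": "0"},
    "ExampleBotB": {"location": "", "interests": "Fishing, Chess", "posts": "0"},
    "ExampleMember": {"location": "Somewhere", "interests": "Comics, Art, Games", "posts": "57"},
}
bot_list = build_list(profiles)
print("\n".join(bot_list))
```

When someone reports a false positive, the code next to the name points straight at the rule that needs loosening.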
- FoobyKamikaze
- Foodophile
- Posts: 6092
- Joined: Tue Jul 15, 2008 6:37 am