TwoKinds [of] data


Technic[Bot] · #1 Post by **Technic[Bot]** » Thu Jul 05, 2018 3:11 am

Probably you just clicked on this thread thinking: "What is... even...?"
Well, let me explain. In my line of work, one of the things I do is analyze data, mostly simple measurements like distances and whatnot. But I also enjoy working with any kind of data, for fun and sometimes profit.
So after being here for a bit over 5 months, I decided to do some simple analysis of any TwoKinds-related data I could find. At first I just wanted to satisfy my own curiosity and, as I said, I find this fun. But after a few days of work I decided to share some of what I found with everyone on the forum! Hopefully someone will find it interesting and we'll all learn something new.

A fair warning, though: if it's not obvious by now, this post will be number heavy, so if that makes you uncomfortable... well, I did warn you. In any case, I plan to stuff all my math-heavy stuff in this thread so you can easily ignore it, if that makes you happy.
I will organize this in spoiler tags, so as not to make a massive wall of text and to give some sort of order to this little project of mine.

Also, feedback is welcome: if any graph or part of this is not clear, or if you know some part is wrong, don't hesitate to tell me. And if you have any questions, I am all ears. Finally, if for some reason this is not considered kosher for the forum, just shoot me a message and I will remove it.

And without any further ado, let's cut to the meat of the business. I hope you find some of this insightful:

Comic Schedule

Spoiler!

Twokinds popularity

Spoiler!

Twokinds fandom

Spoiler!

FAQ

Spoiler!

NuclearBird · #2 Post by **NuclearBird** » Thu Jul 05, 2018 6:49 am

Hmmmm, statistics. Always so nice.
Tbh, the searches coming from Hungary surprise me, all things considered.

Vintage · #3 Post by **Vintage** » Mon Jul 09, 2018 5:31 am

This kind of data is stuff I absolutely love to see!

Interesting to see that the end of the Basitin Isles arc appears to have incredibly high interest relatively speaking.

I also think it's pretty cool to know that on average, new pages are published around 4 days after the last one. Always feels like a looooot more

Ddraig · #4 Post by **Ddraig** » Mon Jul 09, 2018 9:28 pm

Vintage wrote: ↑Mon Jul 09, 2018 5:31 am This kind of data is stuff I absolutely love to see!

Interesting to see that the end of the Basitin Isles arc appears to have incredibly high interest relatively speaking.

I also think it's pretty cool to know that on average, new pages are published around 4 days after the last one. Always feels like a looooot more

There does seem to be a bit of an explosion of interest there, doesn't there?

Vintage · #5 Post by **Vintage** » Tue Jul 10, 2018 12:28 am

Ddraig wrote: ↑Mon Jul 09, 2018 9:28 pm There does seem to be a bit of an explosion of interest there, doesn't there?

Makes me wonder if this data can be cross-referenced with the data that Google's indexed for the forum here. We still have the mystery of why we had our peak activity in August 2013.

Technic[Bot] · #6 Post by **Technic[Bot]** » Tue Jul 10, 2018 2:30 am

Vintage wrote: ↑Mon Jul 09, 2018 5:31 am This kind of data is stuff I absolutely love to see!

Interesting to see that the end of the Basitin Isles arc appears to have incredibly high interest relatively speaking.

I also think it's pretty cool to know that on average, new pages are published around 4 days after the last one. Always feels like a looooot more

Thanks! It was also fun to put together.
The mean is actually 5.2 days; it's the green dashed line on top of that mess of a graph.
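For anyone curious how a number like that is computed: a minimal sketch, assuming you have the publication date of every page in a list. The dates below are made up for illustration and are not real TwoKinds data.

```python
from datetime import date
from statistics import mean

def mean_gap_days(dates):
    """Average number of days between consecutive publication dates (needs at least two dates)."""
    ordered = sorted(dates)
    # Difference between each page and the one published before it, in whole days
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps)

# Hypothetical publication dates, for illustration only
pages = [date(2018, 6, 1), date(2018, 6, 5), date(2018, 6, 12), date(2018, 6, 15)]
print(mean_gap_days(pages))
```

Because the gaps telescope, this is the same as (last date minus first date) divided by (number of pages minus 1), so a few very long breaks pull the mean up even if most pages come out closer together.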

Vintage wrote: ↑Tue Jul 10, 2018 12:28 am
Ddraig wrote: ↑Mon Jul 09, 2018 9:28 pm There does seem to be a bit of an explosion of interest there, doesn't there?
Makes me wonder if this data can be cross-referenced with the data that Google's indexed for the forum here. We still have the mystery of why we had our peak activity in August 2013.

As useful and interesting as it is, Google Trends data is a bit superficial in the end. Really nuanced data, like who queried a term, when, and why, is sold indirectly by Google to third parties through ads. What I am trying to say is that information about the specifics of this forum is unlikely to be public.
That being said, if I could get some info about post frequency on this site, cross-referencing the information would not be that hard. I could, theoretically, scrape all the necessary information from the forum, since doing it by hand would be impossible. But that would look painfully close to a DDoS, so I would rather not do it without asking the administration for clearance.
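Once both series exist, the cross-referencing itself is mostly bookkeeping: line the two series up by month and measure how closely they move together. A minimal sketch using the Pearson correlation coefficient, assuming monthly forum post counts and a monthly Google Trends index are available; the numbers below are invented placeholders, not real forum or Trends data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)

# Hypothetical monthly values (NOT real data)
posts_per_month = {"2018-01": 120, "2018-02": 180, "2018-03": 400, "2018-04": 210}
trends_index = {"2018-01": 40, "2018-02": 55, "2018-03": 100, "2018-04": 60}

# Compare only the months present in both series, in chronological order
months = sorted(posts_per_month.keys() & trends_index.keys())
r = pearson([posts_per_month[m] for m in months], [trends_index[m] for m in months])
print(f"r = {r:.2f}")
```

A value near 1 only says the two curves rise and fall together; it says nothing about why, so it would point at interesting months rather than explain them.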

In any case, here's my opinion about why the Basitin Isles arc was so popular:

Spoiler!

Technic[Bot] · #7 Post by **Technic[Bot]** » Sat Jul 14, 2018 4:09 am

When you look at this comic you can see at least two different kinds of data (pun intended). First we have the metadata, such as publishing dates, interest curves, geolocation, etcetera. Most of the previous post was based on that metadata. But there is more information in the comic, namely the dialogue and the pages themselves. The pages, or images, are high-dimensional data, so it is pretty hard to get something out of them. Thankfully, we have the full transcript of the comic in plain text (sort of; more on that later), so we can crunch some numbers on the transcript, or, as I like to call it, the "Twokinds dialogue/play", simply because that is how I stored it on my computer: as a theater play.
So now this post is going to deal mainly with the Twokinds transcript: how many lines and words, how many characters and all that good stuff. Again, it's spoilered so the post looks nice and clean:
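Storing the transcript as a play makes "who speaks the most?" a simple counting job. A minimal sketch, assuming each line of the file looks like "Speaker: text" (the exact format of the stored play isn't shown here, so this layout is an assumption); the sample lines are placeholders, not actual comic dialogue.

```python
from collections import Counter

def speaker_stats(transcript):
    """Count lines and words per speaker in a play-style transcript.

    Assumes one line of dialogue per text line, formatted as "Speaker: text".
    Lines without a colon (stage directions, blank lines) are skipped.
    """
    lines_per_speaker = Counter()
    words_per_speaker = Counter()
    for raw in transcript.splitlines():
        # Split on the first colon only, so colons inside the dialogue are kept
        speaker, sep, text = raw.partition(":")
        if not sep or not speaker.strip():
            continue  # not a dialogue line
        speaker = speaker.strip()
        lines_per_speaker[speaker] += 1
        words_per_speaker[speaker] += len(text.split())
    return lines_per_speaker, words_per_speaker

# Hypothetical sample in the assumed format (not actual comic dialogue)
sample = """Trace: This is an example line.
Flora: Another example.
[Stage direction without a speaker]
Trace: And one more."""

lines, words = speaker_stats(sample)
print(lines.most_common())  # speakers ranked by number of lines
print(words.most_common())  # speakers ranked by number of words
```

A line such as a narrator's "Note: ..." would be counted as a speaker called "Note", so a real transcript would likely need a short list of names to exclude.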

Something Meta first

Spoiler!

Now to the topic at hand

Spoiler!

Who speaks the most?

Spoiler!

A better visualization

Spoiler!

Random Curio

Spoiler!

And that is all for the moment; these things take more time to make than I expected. There are a few more things I was planning to do with the data I have, so you can expect them sometime in the future.
Also, if you have any questions about the data, or something you want to know about the comic, feel free to leave a suggestion. One problem with datasets is that sometimes you have no idea what to do with them.

Also, sorry for the double post, but enough time has passed, right?

Dadrobit · #8 Post by **Dadrobit** » Sat Jul 14, 2018 5:59 am

Keep it up, this stuff is fun! I did a bit of posting-history sleuthing once or twice myself, so it's cool to see all of this laid out like this.

Technic[Bot] · #9 Post by **Technic[Bot]** » Thu Jul 19, 2018 1:04 am

Now to something a bit different. This will be a bit of a shorter post. More on that at the end.
In the last post we dealt with the Twokinds transcript, but we looked mostly at the question: who said what? Or, as it turned out, how much did each character speak? This is interesting and all, but it dealt mostly with information we can take from the structure of the text, in this case a theater play.
That gave us a lot of insights, but it does not directly deal with the content of the dialogue, which I find a bit more interesting. So what can we do with that?
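Looking at the content usually starts with word frequencies, which is exactly what a word cloud draws: the bigger the count, the bigger the word. A minimal sketch, assuming the dialogue is already split into lines of plain text; the stop-word list is a small hypothetical one, and the post does not say which tool or filtering was actually used.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would use a fuller one
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "i", "you", "it", "is", "that", "in"}

def word_frequencies(dialogue_lines, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all dialogue lines."""
    counts = Counter()
    for line in dialogue_lines:
        # Lowercase, then treat runs of letters and apostrophes as words
        for word in re.findall(r"[a-z']+", line.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts

# Hypothetical dialogue lines, for illustration only
sample_lines = ["The magic is strong here.", "Magic? I don't trust magic."]
print(word_frequencies(sample_lines).most_common(3))
```

Without the stop-word filter, words like "the" and "you" would dominate any cloud built from conversational text.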
A kind cloud

Spoiler!

That is all she wrote

Spoiler!

Credits

Spoiler!

Neptune · #10 Post by **Neptune** » Fri Aug 10, 2018 12:13 am

Trivia: If all of the lines of dialogue were pasted on A4 paper with 600 characters per page, the corpus would consist of an impressive 946 pages.

Now, imagine if TwoKinds were instead a novel (or an epic, which means a novel of over 100,000 words). If we are 30-50% of the way through, the finished work would be an amazing ~1850-3200 pages! And that's without accounting for literary devices besides dialogue, and with it all compressed into one volume instead of several (under the 300-page splitting rule, it would be around 6 to 11 volumes).
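The arithmetic behind these estimates is easy to sketch. The 600 characters per page, the 946 pages and the 300 pages per volume come from the post above; rounding to the nearest page and volume is an assumption made here.

```python
import math

CHARS_PER_PAGE = 600    # A4 page, as assumed in the post
PAGES_PER_VOLUME = 300  # the "300-page splitting rule"

def pages_for(characters, chars_per_page=CHARS_PER_PAGE):
    """Pages needed to print a given number of characters (a partial page counts as a page)."""
    return math.ceil(characters / chars_per_page)

def extrapolate(current_pages, fraction_done):
    """Estimated finished length if current_pages is fraction_done of the whole story."""
    total = current_pages / fraction_done
    return round(total), round(total / PAGES_PER_VOLUME)

for done in (0.5, 0.3):
    total, volumes = extrapolate(946, done)
    print(f"{done:.0%} done: ~{total} pages, ~{volumes} volumes")
```

This gives roughly 1890 to 3150 pages and 6 to 11 volumes, the same ballpark as the figures quoted.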

If TwoKinds was actually a book, then Tom would actually be done with it lol

Technic[Bot] · #11 Post by **Technic[Bot]** » Thu Sep 06, 2018 6:09 am

Deep Neural Keidran: Part 1
Ok, this is something a bit different from what I did before.
In my line of work you get to play with all sorts of programs and machines that do all sorts of crazy and interesting stuff, both really old (~1960s) and really new. This decade, deep learning and neural networks are all the rage.
And since most of this stuff can run, somewhat, on consumer-grade computers, I could not resist grabbing some of the popular architectures and running them on the Twokinds dataset, or as we know it, the comic!
Fair warning, though: this will be mostly me fawning over machine learning and whatnot, and it might get a bit math heavy. So feel free to skip it if this is not your cup of tea, but I do promise nice pictures and some insight into modern artificial intelligence. Hopefully you all have at least half the fun I had making this!
First, some legal information, so Tom won't sue me to high heaven and back:

Spoiler!

With all the formalities covered, let's start!

YOLO

Spoiler!

Well that was something...
Oh, before I forget, there is a part 2! This thing turned out to be longer and harder than expected, so I decided to split it in two. Next time we will be looking at a different architecture and a different problem. So stay tuned for more Machine learned Keidran!!!

Arcus_Deer · #12 Post by **Arcus_Deer** » Thu Sep 06, 2018 8:09 pm

This is incredibly interesting! Though I do not have any experience with this sort of thing, would it be possible to theoretically (I understand it would be a [censored] load of work to actually do) replace the COCO Dataset with a custom one made up of Twokinds characters? Would the program be smart enough to, say, identify the characters in Tom's colored Patreon art if the database was built from all of the comic work?

Also very interesting that for some reason it cannot identify Trace. Can it really identify Karen better? I mean, she has unusual hair and ears, which I assumed would look less human than Trace!

I can't wait for the next bit, thank you very much for sharing!! If I find any pictures I think would be interesting to test, I'll DM you. Thanks again!

aitaituo · #13 Post by **aitaituo** » Fri Sep 07, 2018 2:33 am

Maeve is a clock. It is known.

Technic[Bot] · #14 Post by **Technic[Bot]** » Fri Sep 07, 2018 5:15 am

You ask and I answer

Spoiler!

Anyhow, without further ado:

Machine Learned Keidran: Part 2
As I said, this was supposed to be one single post, but it was a bit more complicated than I expected, so I ended up splitting it in two.
First, some more legal disclaimers!!!!!
For Tom:

Spoiler!

For CMU

Spoiler!

Hopefully that will get any lawyer off my back!

OpenPose

Spoiler!

What I can't show you

Spoiler!

So that would be the end of the machine learned Keidran arc of my little thread. Hopefully you all had some fun.

So what did we learn today:
If the machine apocalypse ever happens, going around naked or dressing yourself as a giant cat might increase your chances of survival.
See you around!

Ddraig · #15 Post by **Ddraig** » Fri Sep 07, 2018 9:26 pm

Technic[Bot] wrote: ↑Fri Sep 07, 2018 5:15 am You ask and I answer
Spoiler!

Arcus_Deer wrote: ↑Thu Sep 06, 2018 8:09 pm This is incredibly interesting! Though I do not have any experience with this sort of thing, would it be possible to theoretically (I understand it would be a [censored] load of work to actually do) replace the COCO Dataset with a custom one made up of Twokinds characters? Would the program be smart enough to, say, identify the characters in Tom's colored Patreon art if the database was built from all of the comic work?

Also very interesting that for some reason it cannot identify Trace. Can it really identify Karen better? I mean, she has unusual hair and ears, which I assumed would look less human than Trace!

I can't wait for the next bit, thank you very much for sharing!! If I find any pictures I think would be interesting to test, I'll DM you. Thanks again!
It is indeed completely plausible. A lot of people need object detection for specific tasks where COCO is not particularly useful, so they use their own datasets.
But it is a lot of work. You need to manually draw a box around every object you want to detect in your image set and then set up some configuration files. You do that for every image, or until you go insane.
I have done it once myself, for cutlery detection, and it took me around 3 hours to label 500 images. The recommended number of examples per class is around a thousand, so you can imagine this taking an incredibly large amount of time.
But if you manage that, I suppose it could recognize almost every character with over 80% accuracy. The problem is that the comic is fairly small, only a thousand images or so, which might not be enough to train the system.
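To make the "draw a box around every object" step concrete: each hand-drawn box ends up as one line in a label file. A minimal sketch, assuming the Darknet-style YOLO label format (class index, then box center, width and height, all normalized to 0..1 by the image size); the post does not say which YOLO implementation was used, and the class list and page size below are hypothetical.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a Darknet-style YOLO label line.

    The line holds: class_id x_center y_center width height,
    with the last four values divided by the image width/height.
    """
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive size")
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical class list and one hand-drawn box on an 800x1200 comic page
CLASSES = ["trace", "flora", "keith"]
label = to_yolo_label(CLASSES.index("flora"), (100, 300, 300, 900), 800, 1200)
print(label)
```

At the pace mentioned above (3 hours for 500 images, about 22 seconds per image), a thousand examples per class works out to roughly 6 hours of labeling per character, assuming one labeled character per image.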

As for Trace, the problem is his hair and long robe. As the system is trained only on real images of people, it has a hard time recognizing his head as such; normal people don't have blue hair! And since he wears robes most of the time, his hands and legs are obscured, and that information would help the system.

I wonder if the blue triangle has anything to do with it

Twokinds Forums
