TwoKinds [of] data

The comic stuff here.

Moderators: Tom, avwolf

Master
Posts: 227
Joined: Sat Jan 27, 2018 9:48 pm
Location: México

TwoKinds [of] data

#1 Post by Technic[Bot] » Thu Jul 05, 2018 3:11 am

You probably just clicked on this thread thinking: "What is... even...?"
Well, let me explain. In my line of work, one of the things I do is analyze data, mostly simple measurements like distances and whatnot. But I also enjoy working with any kind of data, for fun and sometimes profit.
So after being here for a bit over 5 months, I decided to do some simple analysis of any TwoKinds-related data that I could find. At first I just wanted to satisfy my own curiosity and, as I said, I find this fun. But after a few days of work I decided to share some of what I found with everyone in the forum! Hopefully someone will find it interesting and we'll all learn something new.

A fair warning, though: if it is not obvious by now, this post will be number-heavy, so if that makes you uncomfortable, well, I did warn you. In any case, I plan to keep all my math-heavy stuff in this thread so you can easily ignore it, if that makes you happy.
I will organize this in spoiler tags, so as not to make a massive wall of text and to give some sort of order to this little project of mine.

Also, feedback is welcome: if any graph or part of this is not clear, or if you know some part is wrong, don't hesitate to tell me. If you have any questions, I am all ears. Finally, if for some reason this is not considered kosher for the forum, just shoot me a message and I will remove it.

And without further ado, let's cut to the meat of the business here. I hope you find some of this insightful:

Comic Schedule
Spoiler!
One of the main questions we get in the forum, and probably in any other TK-related place, is when the next page is going to be uploaded. Predicting the future is way above my pay grade, though sometimes I try.
In any case, we fortunately have the upload date for every page since the prologue, so we know how long Tom took to make every page.
Image
This graph shows how long Tom took, in days, to make and upload each consecutive page: he took 2 days between pages 7 and 8, 11 days between pages 1031 and 1032, and so on and so forth. I am surprised that the longest the comic has gone without updating is a little over a month. Tom is dedicated, that much is obvious.
As for hard numbers, the average time between pages is 5.2 days, and if you look closely you can see two modes: in the early days he posted every two days, but after the spike in the middle of the graph he went to a 7-day schedule.
However, we see relatively high variance, meaning he sometimes takes more or less time to publish.
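For anyone who wants to reproduce this kind of number, here is a minimal sketch of how the gaps between consecutive pages, their average and their spread could be computed. The dates are made up for illustration (they are not the real upload dates), the function name `gaps_in_days` is hypothetical, and it uses only the Python standard library, which may not be how the graph above was actually produced.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical upload dates, NOT the real TwoKinds publication dates.
upload_dates = [
    date(2003, 10, 22),
    date(2003, 10, 24),
    date(2003, 10, 26),
    date(2003, 11, 2),
    date(2003, 11, 9),
]


def gaps_in_days(dates):
    """Return the number of days between each pair of consecutive dates."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


gaps = gaps_in_days(upload_dates)
print(gaps)          # days between consecutive pages
print(mean(gaps))    # average gap in days
print(pstdev(gaps))  # how much the gaps spread around that average
```

With a mix of 2-day and 7-day gaps like this toy sample, the average lands between the two schedules, which is the same effect the two modes have on the real 5.2-day figure.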
This graph is fine and dandy, but it is a bit cluttered. Sorry about that, it is a lot of information in a small form factor. However, we can use the power of a different visualization, a histogram:
Image
Here we can see how many pages were published n days apart. For example, over 200 consecutive pages were posted only 2 days apart, and only once did he take more than a month.
As we can see, Tom is quite consistent, publishing mostly every 7 days as he says he will. But it is interesting that he had a much more intense schedule way back before page 400. Honestly, his tenacity is admirable.
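The histogram idea can be sketched in a few lines: count how many times each gap length occurs. The gap values below are invented, and `gap_histogram` is a hypothetical helper name, so treat this as an illustration of the counting step rather than the exact procedure behind the image.

```python
from collections import Counter

# Hypothetical gaps in days between consecutive pages (made-up values).
gaps = [2, 2, 2, 7, 7, 7, 7, 11, 32]


def gap_histogram(gaps):
    """Count how many times each gap length occurs, sorted by gap length."""
    return dict(sorted(Counter(gaps).items()))


histogram = gap_histogram(gaps)
for days, pages in histogram.items():
    # One '#' per page published that many days after the previous one.
    print(f"{days:>2} days: {'#' * pages} ({pages})")
```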
Twokinds popularity
Spoiler!
One of my favorite sites on the whole Internet is Google Trends. It lets you track the volume of Google searches for a term over the past 10 years or so. You can compare terms, filter by region, by date, etcetera.
I encourage you to go there, type in your favorite movie/book/comic and see how it has fared over time.
In the case of Twokinds we see this:
Image
A warning, though: Google does not tell you the number of searches explicitly. Instead:
Google wrote: Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means there was not enough data for this term.
Meaning that the numbers are normalized with respect to the highest value in the graph.
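To make that normalization concrete, here is a small sketch of scaling a series so its peak becomes 100. The monthly search counts are invented (Google never exposes the raw counts), `normalize_to_peak` is a hypothetical name, and this only illustrates the "relative to the highest point" part of the quote, not Google's full method.

```python
def normalize_to_peak(values):
    """Scale values so that the largest one becomes 100."""
    peak = max(values)
    if peak == 0:
        # Nothing to scale: every value is zero.
        return [0 for _ in values]
    return [round(100 * value / peak) for value in values]


# Hypothetical raw monthly search counts (the real counts are not public).
monthly_searches = [1200, 3000, 6000, 1500]
interest = normalize_to_peak(monthly_searches)
print(interest)
```

A value of 50 in the output therefore means "half as many searches as in the best month", whatever the real count was.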
This is all well and good, but unless you are a real die-hard fan who has memorized when Tom posted each page, it does not tell you much. Instead, this next graph swaps the x-axis for something more informative: the page number.
Image
If you pay attention you will see that this graph starts at page 28: Google's records go back only to 2004 and the comic started in September 2003, so I have missing data there. Also, Google does not compile data until the end of the month, so July is missing.
As we can see, the peak in Google searches is around June 2011, or the pages in the 620s. That is around the time when the group left the Basitin Isles. Then it took a sharp drop near page 870, a little before the Edinmire incident.
Twokinds fandom
Spoiler!
Precise information about who visits the comic, and when, is something only Tom would know about; he does have Google Analytics turned on on the landing site, after all. But Google Trends does give us general geographic information about anyone who searches for TwoKinds on its platform. It likes to display that as a (heat)map, which is nice and all, but it is hard to read, so I compiled it into a nice bar chart:
Image
However, I think it is a bit skewed: according to this, the Philippines is where most searches originate. Since we have seen a few attempted scammers on the forum recently, I assume this is just bots looking for easy prey on a relatively small anthro comic.
So I dropped the Philippines from the list and renormalized the search interest:
Image
Here we can see Tom has a lot of searches coming from Canada and the US, nothing unexpected, but the most searches actually come from Finland and northern Europe.
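The drop-and-renormalize step described above can be sketched like this. The scores per country are invented for illustration and `drop_and_renormalize` is a hypothetical name; the point is only that once the old peak is removed, the next highest country becomes the new 100.

```python
# Hypothetical regional interest scores, NOT the real Google Trends values.
interest_by_country = {
    "Philippines": 100,
    "Finland": 40,
    "Canada": 30,
    "United States": 28,
}


def drop_and_renormalize(scores, excluded):
    """Remove the excluded countries and rescale so the new peak is 100."""
    kept = {country: score for country, score in scores.items() if country not in excluded}
    peak = max(kept.values())
    return {country: round(100 * score / peak) for country, score in kept.items()}


renormalized = drop_and_renormalize(interest_by_country, {"Philippines"})
print(renormalized)
```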
On a semi-related note, if you want to order Tom's merch from anywhere outside the US and Canada, it costs a whole lot of money. Considering there are a lot of European fans, maybe next time Tom should look for a shipping company that also works outside North America.
In terms of the view from south of the Rio Bravo, something I am an expert on, Mexico is tied with Chile. Chile is the biggest consumer of any sort of comic in South America, whereas my motherland is the closest to the US. So the comic does not seem to appeal much in these latitudes.
FAQ
Spoiler!
Some questions that you might be typing right now:
  • Just What.. how... why?
    Because I find this FUN!
  • You must be fun at parties
    Actually I am :mrgrin: !
  • Where did you get this information?
    As I said, Google Trends. It is public, you should check it out.
    If you are asking about the dates, I recently re-read the comic and simply wrote down the relevant information, which is publicly available on each page.
  • How accurate is this?
    I get my dates from the TwoKinds pages themselves. As for the Google information, I assume it is pretty good. They manage to cross-reference your search with what you actually click afterwards, so they have a separate category for the TwoKinds comic, distinct from everyone who misspells Two Kinds by Amy Tan.
You can motivate people with three things: money, fear and love.
Link to my ramblings:
Twokinds [of] data

Master
Posts: 239
Joined: Tue Aug 08, 2017 9:56 am

Re: TwoKinds [of] data

#2 Post by NuclearBird » Thu Jul 05, 2018 6:49 am

Hmmmm, statistics. Always so nice.
Tbh, the searches coming from Hungary surprise me, all things considered.
If the universe is infinite, does that mean that there is a version of me out there who's thinking the exact same thing?

While we're on the topic of alternate universes, is there one where I'm a lawyer? If yes, then I may be more evil than I thought.

Certified Fool
Posts: 1206
Joined: Mon May 26, 2014 3:32 pm
Location: Planet Zambodia

Re: TwoKinds [of] data

#3 Post by Vintage » Mon Jul 09, 2018 5:31 am

This kind of data is stuff I absolutely love to see!

Interesting to see that the end of the Basitin Isles arc appears to have incredibly high interest relatively speaking.

I also think it's pretty cool to know that on average, new pages are published around 4 days after the last one. Always feels like a looooot more :grin:
Image Image
*pssst* Want'a see what happens when I attempt art? (Avatar made by WoofSenpai & NowandLater)

Master
Posts: 302
Joined: Sat Dec 26, 2015 11:06 pm

Re: TwoKinds [of] data

#4 Post by Ddraig » Mon Jul 09, 2018 9:28 pm

Vintage wrote:
Mon Jul 09, 2018 5:31 am
This kind of data is stuff I absolutely love to see!

Interesting to see that the end of the Basitin Isles arc appears to have incredibly high interest relatively speaking.

I also think it's pretty cool to know that on average, new pages are published around 4 days after the last one. Always feels like a looooot more :grin:
There does seem to be a bit of an explosion of interest there, doesn't there?
"Light thinks it travels faster than anything, but it's wrong. No matter how fast light travels, it always finds that darkness has gotten there first, and is waiting for it."

Certified Fool
Posts: 1206
Joined: Mon May 26, 2014 3:32 pm
Location: Planet Zambodia

Re: TwoKinds [of] data

#5 Post by Vintage » Tue Jul 10, 2018 12:28 am

Ddraig wrote:
Mon Jul 09, 2018 9:28 pm
There does seem to be a bit of an explosion of interest there, doesn't there?
Makes me wonder if this data can be cross-referenced with the data that Google's indexed for the forum here. We still have the mystery of why we had our peak activity in August 2013.

Master
Posts: 227
Joined: Sat Jan 27, 2018 9:48 pm
Location: México

Re: TwoKinds [of] data

#6 Post by Technic[Bot] » Tue Jul 10, 2018 2:30 am

Vintage wrote:
Mon Jul 09, 2018 5:31 am
This kind of data is stuff I absolutely love to see!

Interesting to see that the end of the Basitin Isles arc appears to have incredibly high interest relatively speaking.

I also think it's pretty cool to know that on average, new pages are published around 4 days after the last one. Always feels like a looooot more :grin:
Thanks! It was also fun to put together.
The mean is actually 5.2 days; it is the green dashed line on top of that mess of a graph.
Vintage wrote:
Tue Jul 10, 2018 12:28 am
Ddraig wrote:
Mon Jul 09, 2018 9:28 pm
There does seem to be a bit of an explosion of interest there, doesn't there?
Makes me wonder if this data can be cross-referenced with the data that Google's indexed for the forum here. We still have the mystery of why we had our peak activity in August 2013.
As useful and interesting as it is, Google Trends data is a bit superficial in the end. Really nuanced data, like who queried a term, when and why, is indirectly sold by Google to third parties via ads. What I am trying to say is that information regarding the specifics of this forum is unlikely to be public.
That being said, if I could get some info about the post frequency on this site, cross-referencing the information would not be that hard. I could, theoretically, scrape all the necessary information from the forum, since doing it by hand would be impossible. But that would look painfully close to a DDoS, so I would rather not do it without asking the administration for clearance.

In any case, my opinion about why the Basitin Isles arc was so popular:
Spoiler!
Most of the time when you write something you have to choose to either advance the plot or work on character progression. As a general rule of thumb, Tom is not very good at moving the plot forward, but he is pretty good when he decides to develop his characters: their wishes, desires, dreams and relationships.
The Basitin arc was unusual in that he managed to do both things at once and get a pretty good end product. The subplot moved forward, with high stakes, action and comedy. But we also got to see more into the minds of fan favorites Natani and Keith (OTP) and got character development for pretty much everyone.
Well, kind of. The Basitin arc is like that side mission in games that you are expected to do halfway into the story but that does not really have much impact on the main plot.
In any case, it is not that other points were not as good, but in my opinion we have not seen such a good blend of plot and character development in other parts of the comic.

Master
Posts: 227
Joined: Sat Jan 27, 2018 9:48 pm
Location: México

Re: TwoKinds [of] data

#7 Post by Technic[Bot] » Sat Jul 14, 2018 4:09 am

When you look at this comic you can see at least two different kinds of data (pun intended). First we have the metadata, such as publishing dates, interest curves, geolocation, etcetera. Most of the previous post was based on that metadata. But there is more information in the comic, namely the dialogue and the pages themselves. The pages, or images, are high-dimensional data, so it is pretty hard to get something out of them. But thankfully we have the full transcript of the comic in plain text (sort of, more on that later), so we can crunch some data about the transcript or, as I like to call it, the "Twokinds dialogue/play", simply because that is how I stored it on my computer: as a theater play.
So this post is going to deal mainly with the Twokinds transcript: how many lines and words, how many characters and all that good stuff. Again, spoilered so the post looks nice and clean:

Something Meta first
Spoiler!
I know I just said that this post is all about the Twokinds dialogue, but I actually forgot to talk about why I decided to start this. Besides, you know, being pretty damn fun in itself.
Originally, just after finishing reading the whole archive I, like many before me, was left aching for more and waiting for update day every week, and as we have seen in the previous graphs, Tom sometimes takes a bit longer than usual. So I wondered:
What is the probability that Tom publishes a new page today? Or, more generally:
What is the probability that Tom publishes a new page on any given weekday?
Since, again, the publish date is freely available on the comic site, this is not particularly complicated to do. So after some processing we got this:
Image
This graph shows how many times a comic page was published on each weekday. As we can see, Tom is most likely to publish on Wednesday, but Fridays and Mondays are still good days to hope for a new page!
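The "some processing" behind a chart like this boils down to counting weekdays and dividing by the total. Below is a minimal sketch with a handful of made-up dates (not the real upload history); `weekday_probabilities` is a hypothetical name, and the weekday names are hard-coded so the result does not depend on the system locale.

```python
from collections import Counter
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical upload dates, chosen only to illustrate the counting.
upload_dates = [
    date(2018, 6, 20),
    date(2018, 6, 27),
    date(2018, 7, 2),
    date(2018, 7, 4),
    date(2018, 7, 13),
]


def weekday_probabilities(dates):
    """Estimate the chance that a page lands on each weekday from past uploads."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in dates)
    total = sum(counts.values())
    return {day: count / total for day, count in counts.items()}


probabilities = weekday_probabilities(upload_dates)
for day, probability in probabilities.items():
    print(f"{day}: {probability:.0%}")
```

Strictly speaking this is the share of past pages that fell on each weekday, which is only a rough estimate of the chance of a page on a given day.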
Now to the topic at hand
Spoiler!
OK, back to today's topic: the dialogue. First, to get them out of the way, some numbers!
  • The total number of lines of dialogue in the comic is 10,381 as of page 1033
  • The total number of speaking characters is 141, not including Mrs Nibbly.
    Funnily enough, there is a butterfly with one single line of dialogue: "!"
  • The comic is composed of 104,688 words; for comparison, Harry Potter and the Prisoner of Azkaban is 107,253 words long. It seems like a lot, although it is by no means large: really big books start at 500K words
  • The comic is written in 567,426 characters! More on this later.
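Totals like the ones above can be sketched as follows, assuming the transcript is stored play-style with one `Speaker: dialogue` entry per line. That file layout, the sample lines and the function names are all assumptions made for illustration; the real transcript format and counting rules (for instance, whether a lone "!" counts as a word) may differ.

```python
# Hypothetical play-style transcript: one "Speaker: dialogue" entry per line.
transcript = """\
Trace: Where am I?
Flora: You're awake!
Butterfly: !
Trace: Who are you?
"""


def parse_line(raw):
    """Split 'Speaker: text' into a (speaker, text) pair."""
    speaker, _, text = raw.partition(":")
    return speaker.strip(), text.strip()


def transcript_stats(text):
    """Count dialogue lines, distinct speakers, words and characters."""
    lines = [parse_line(raw) for raw in text.splitlines() if ":" in raw]
    speakers = {speaker for speaker, _ in lines}
    # Words are whitespace-separated tokens, so a lone "!" counts as one word.
    words = sum(len(spoken.split()) for _, spoken in lines)
    characters = sum(len(spoken) for _, spoken in lines)
    return {
        "lines": len(lines),
        "speakers": len(speakers),
        "words": words,
        "characters": characters,
    }


stats = transcript_stats(transcript)
print(stats)
```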
Who speaks the most?
Spoiler!
"That was boring", I can almost hear you say. And well, it is a bit, but that is surface-level information. Something I find more interesting is: which characters do most of the talking? Tom has over 15 "main" characters and a lot more "secondary" and background characters, so who is doing all the talking?
Spoilers: not many. Considering only the characters mentioned on the Characters page, we can see how many "speech bubbles" or lines each one has had:
Image
But let's be honest here, there are at least twice as many characters that are of some significance in the story, so how about them? Well, considering characters who have had more than 75 lines of dialogue in the comic, we have this nice graph:
Image
As we can see, the characters that speak the most are Trace, Flora, Keith and Natani, by a really large margin. Surprisingly enough, in 5th place comes our "favorite pervert" Eric with a bit less than 400 lines, followed closely by everyone's favorite shape-shifter, Raine.
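A ranking like this comes from counting lines per speaker and keeping only those above a cutoff (75 lines in the graph above). Here is a toy sketch with a made-up list of speakers and a much smaller cutoff; `top_speakers` is a hypothetical name.

```python
from collections import Counter

# Hypothetical speaker of each dialogue line, in reading order.
speakers = ["Trace", "Flora", "Trace", "Keith", "Flora", "Trace", "Eric"]


def top_speakers(speakers, min_lines):
    """Return (name, line count) pairs with more than min_lines lines, busiest first."""
    counts = Counter(speakers)
    return [(name, count) for name, count in counts.most_common() if count > min_lines]


ranking = top_speakers(speakers, min_lines=1)
print(ranking)
```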
A better visualization
Spoiler!
"So in proportion to the whole corpus of the comic, how does this stack up?" I imagine you say. Well, there is a different graph style for that: the pie chart!
In this chart all characters with under 1% of the comic's lines got grouped into the Others category:
Image
But just how much of the plot is driven by our 4 main heroes? About half of it:
Image
In this chart any character with less than 5% of the total number of lines got grouped into the Others category. Numerically, 50.11% of all lines belong to either Flora, Natani, Trace or Keith.
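The "Others" grouping used for these pie charts can be sketched as below. The line counts are invented so that the shares are easy to check by hand, and `group_small_shares` is a hypothetical name; the threshold is passed as a fraction (0.05 for 5%).

```python
# Hypothetical line counts per character (they add up to 100 for easy checking).
line_counts = {"Trace": 40, "Flora": 30, "Keith": 20, "Natani": 6, "Eric": 3, "Butterfly": 1}


def group_small_shares(line_counts, min_share):
    """Merge every character below min_share of all lines into 'Others'."""
    total = sum(line_counts.values())
    grouped = {}
    others = 0
    for name, count in line_counts.items():
        if count / total < min_share:
            others += count
        else:
            grouped[name] = count
    if others:
        grouped["Others"] = others
    return grouped


pie = group_small_shares(line_counts, min_share=0.05)
print(pie)
```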
Honestly, I am surprised by Natani: he has been in the comic for significantly less time than the other 3, and not only that, but in the story canon she can only speak Keidran. Despite this, a little more than 10% of the comic is Natani. This dude has a lot to say!
Random Curio
Spoiler!
Just some trivia I found while doing this. Minor fun details about the Twokinds transcript:
1.- Not quite plain text: you see, ellipses (...) and apostrophes (') were not plain ASCII characters but rather Unicode punctuation encoded as UTF-8, which makes processing the text a bit more complicated.
2.- Some names are changed: despite his first name rarely appearing in the comic, where he is mostly referred to by his last name, Alaric, his lines are always marked as Nikolai.
2.1.- In a similar fashion, Kathrin's name was written as Kat before the Basitin Isles arc, changing to Kathrin afterwards.
2.2.- Young Natani has been referred to both as "Young Natani" and as "Youngtani", the popular fanon abbreviation for her.
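Both quirks can be smoothed out before counting anything. The sketch below maps typographic punctuation back to ASCII and folds name variants into one canonical name. Which exact Unicode characters appear in the transcript is an assumption (horizontal ellipsis and curly single quotes are the usual suspects), and the table and function names are hypothetical.

```python
# Assumed typographic characters; the transcript may use others.
PUNCTUATION = {
    "\u2026": "...",  # horizontal ellipsis
    "\u2019": "'",    # right single quotation mark, often used as an apostrophe
    "\u2018": "'",    # left single quotation mark
}

# Name variants mentioned above, mapped to one canonical spelling.
ALIASES = {"Kat": "Kathrin", "Youngtani": "Young Natani"}


def to_ascii_punctuation(text):
    """Replace typographic ellipses and apostrophes with ASCII equivalents."""
    return text.translate(str.maketrans(PUNCTUATION))


def canonical_name(name):
    """Return the canonical speaker name, leaving unknown names unchanged."""
    return ALIASES.get(name, name)


print(to_ascii_punctuation("Wait\u2026 it\u2019s you"))
print(canonical_name("Kat"))
```

Without the alias step, Kat and Kathrin would be counted as two different speakers and both would look quieter than they really are.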
And that is all for the moment; these things take more time to make than I expected. There are a few more things I was planning to do with the data I have, so you can expect them sometime in the future.
Also, if you have any query about the data, or something you want to know about the comic, feel free to leave a suggestion. One problem with datasets is that sometimes you have no idea what to do with them.

Also, sorry for the double post, but enough time has passed, right?...

Grand Templar
Posts: 1055
Joined: Mon Aug 22, 2011 5:46 am
Location: Sunny Arizona

Re: TwoKinds [of] data

#8 Post by Dadrobit » Sat Jul 14, 2018 5:59 am

Keep it up, this stuff is fun! Did a bit of posting history sleuthing once or twice myself, so it's cool to see all of this laid out like this. :mrgrin:
Image

Master
Posts: 227
Joined: Sat Jan 27, 2018 9:48 pm
Location: México

Re: TwoKinds [of] data

#9 Post by Technic[Bot] » Thu Jul 19, 2018 1:04 am

Now to something a bit different. This will be a bit of a shorter post. More on that at the end.
In the last post we dealt with the Twokinds transcript, but we looked mostly at the question: who said what? Or, as it turned out, how much did each character speak? This is interesting and all, but it dealt mostly with information we can take from the structure of the text, in this case a theater play.
It gave us a lot of insights, but it does not deal directly with the content of the dialogue, which I find a bit more interesting. So what can we do with that?
A kind cloud
Spoiler!
A word cloud:
In this case the word size is directly proportional to the frequency of the term in the corpus. That is, the more times a word appears in the transcript, the larger it appears in the word cloud.
This is not a very clean or "formal" way to present data, as all the information is implicit in the word size, but oh boy is it information-dense and ridiculously intuitive.
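The numbers behind a word cloud are just word counts scaled to font sizes. Here is a minimal sketch: the sample sentence, the tiny stopword list and the function names are all hypothetical, and dropping common filler words is an assumption about the preprocessing, not something stated above.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "i", "you", "is", "it", "to", "and", "of"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase words, skipping common filler words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


def font_sizes(frequencies, max_size=72):
    """Make each word's size directly proportional to its frequency."""
    peak = max(frequencies.values())
    return {word: max_size * count / peak for word, count in frequencies.items()}


sample = "I know you, Trace. I think I know the Keidran. Well, Trace?"
frequencies = word_frequencies(sample)
sizes = font_sizes(frequencies)
print(frequencies.most_common(2))
print(sizes)
```

A word that appears half as often as the most frequent one gets half the font size, which is exactly the "directly proportional" rule described above.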
Image
Here we can see the most important words in the whole comic:
  • Characters: Keith, Natani and of course Trace and Flora
  • The races: Keidran, Basitins and humans
  • Important terms: know, think and well, this last one being an interjection
It is interesting: this is a comic heavily focused on characters and their relationships, so the appearance of names is not unexpected, though I was kinda hoping for the word love to appear.
I think terms like "know" and "think" relate to the amnesiac protagonist and to how scarce information about basic world-building is in the comic.
But by all means look at the image and draw your own conclusions.
That is all she wrote
Spoiler!
Well, at least for now. This is all I had planned to do with the data I have available for now. Not that I do not have enough data, but I think this is enough for the moment. Call it a season finale, or something. Hopefully I will get to make a "season two" some time in the future, once I get more ideas about what to do with all the data, or where I can get more tidbits of information.
Again, any questions, comments or anything else are welcome. :mrgreen:
Credits
Spoiler!
A few shout-outs to all the people who made this little project of mine possible:
  • All those who spent their time transcribing the comic
  • Mr Tom for keeping all this data available and free, and for writing the comic of course
  • The teams that develop and maintain Python, NumPy, pandas, Matplotlib and NLTK
