nnozomi: (Default)
nnozomi ([personal profile] nnozomi) wrote in [community profile] sid_guardian 2020-03-22 04:29 pm

Guardian-based Chinese study (???)

Edited 3/23 to note: thank you very much, people who commented with helpful information/offers/encouragement of all kinds! I do want to make this a thing, as a way to get through the way things are right now and because it would be FUN. Give me a little while to think through some logistics and I will post again shortly with some more specific ideas (if that's okay). Please continue to comment, anybody who has an opinion.

With a good number of Chinese learners around here, trobadora and I were saying it would be fun to have a Guardian-based Chinese study app, and wondering if the fandom could make one ourselves. Because I very badly need distraction right now, I got to working on it, and came to the conclusions that a) it should be possible, and b) it would require a lot of time, which I might have, and a good deal of technical expertise which I don’t.

Is this something people would be interested in using, should it come to exist? If so, would anyone also be interested in working on it, as a long-term project in this time of all times?


1. A full Chinese transcript, not just the Shen Wei/Zhao Yunlan cut available. (Am I right in thinking that because the Chinese subtitles are hardsubs, it wouldn’t be possible to use undeadrobins’ magic thing to create a transcript from the subs? Is it possible in some other way?)
2. Somebody respectably fluent in Chinese to check the content
3. Somebody able to do the actual programming/coding part (on the scale of technical ability I fall a lot closer to Shen Wei than Lin Jing, I don’t even know what exactly would be required here)
4. A lot of slow fiddly work with corpora and transcripts and language learning best practices (I know how to do this part, at least theoretically)
5. As optional daydreamy extras that people interested might take on, a lot more fiddly work with things like sound files for listening practice, images (for rewards?), base languages other than English (why shouldn’t there be a Chinese-Finnish or Chinese-bhs Malay app, say?) and so on.
6. Assurance that the above would all actually be legal? My assumption is that, as with fic/vids/etc., using Guardian-related material wouldn’t be a problem as long as it fell under the heading of labor-of-love rather than monetized, but I am not sure.

To test the idea out, I made a quick sample based on Episode 1 from the SW/ZYL transcript, available here. (Let me know if the link doesn’t work, and please note there are THREE sheets.) This is a very rough sketch indeed: it’s just an Excel file with a couple of formulas plopped into it, the learning content is a haphazard selection of prepositions, conjunctions and similar, the method is ultra-straightforward cloze exercises, and it all relies on my tenuous Chinese, but it ought to show sort of what I have in mind.
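For anyone who wants to see the cloze idea outside of Excel, here is a minimal Python sketch of the same thing: take a line, blank out one target word, keep the answer. The function name `make_cloze` and the sample line are made up for illustration (the sample is not a quotation from the transcript), so treat this as a rough outline rather than how the spreadsheet actually works.

```python
def make_cloze(sentence: str, target: str, blank: str = "＿＿") -> tuple[str, str]:
    """Return (prompt, answer), blanking the first occurrence of target.

    Hypothetical helper for illustration only.
    """
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in {sentence!r}")
    # Replace only the first occurrence so repeated words stay visible.
    return sentence.replace(target, blank, 1), target


# Placeholder sentence, not taken from the Guardian transcript.
prompt, answer = make_cloze("我在家里等你", "在")
print(prompt)   # the line with the target word blanked out
print(answer)   # the word the learner has to supply
```

Only the first occurrence is blanked, so a line that repeats the target word still gives the learner some context.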

Honestly I want to do this partly so we could call it Dixing Power of Learning, The App…



・Zhao Yunlan ends practically all his sentences in 啊. Big headache for the translator trying to convey that particular speech rhythm…
・Shen Wei has variable pronouns! As the Envoy, he uses (I think) 本使 instead of 我. (Also, when he calls Zhao Yunlan on Li Qian’s phone, he just says 是我, it’s me, expecting Zhao Yunlan to know who it is, which I find kind of hot.)
・What exactly does 同学 convey? Shen Wei says it to Guo Changcheng, who also, many episodes later, says it to Li Qian. I know it’s literally “classmate,” but are they just using it as “person associated with academic endeavor, not a professor,” or is it supposed to imply that they’re all Dragon City University graduates alike, or what?
undeadrobins: (Guardian: Chu Shuzhi)

[personal profile] undeadrobins 2020-03-22 08:26 am (UTC)(link)
As someone who is in the very early stages of learning Chinese... Yes! Please! I want this!

I'm no help with any of the technical stuff, but will help out in any way I can!
naye: A cartoon of a woman with red hair and glasses in front of a progressive pride flag. (guardian - weilan)

[personal profile] naye 2020-03-22 11:48 am (UTC)(link)
Am I right in thinking that because the Chinese subtitles are hardsubs, it wouldn’t be possible to use undeadrobins’ magic thing to create a transcript from the subs?
Sadly yes! Unless there is a DVD/Blu-ray release that includes subs so you can switch from simplified to traditional? But I have absolutely no idea if such a release exists, or how one would get it if so.

Shen Wei has variable pronouns! As the Envoy, he uses (I think) 本使 instead of 我.
I hadn't realized that! Very cool detail. (I'm so curious what the Japanese audio drama and DVD subs will do with pronouns.)

As for the language stuff, it would be super cool to have, but I personally wouldn't find much value in having more than the Shen Wei/Zhao Yunlan part of the dialogue. There's just so much of it, for my memory to hold on to things I think it'd have to be something said by either of them (or maybe Ye Zun or the SID) rather than a random extra or a background exchange.

Apologies for not having more to add - I lack all of the skills required, but wish you the best of luck!
naye: Shen Wei & Zhao Yunlan from Guardian kind of embracing (weilan)

[personal profile] naye 2020-03-24 08:13 pm (UTC)(link)
Among the things I love about this fandom is the number of people who think pronouns are exciting, because I sure do.
Pronouns are fascinating! Like - Shen Wei's going to be changing pronouns, right? Not just between Ghost Slayer and Professor Shen, but there's also the Little Ghost King...! (Does he use "Wei" like a child would?) And of course Kunlun could either use something majestic and old-fashioned - or be very down-to-earth? Something like a boku or an oira...I haven't read the novel in so long I didn't get much of his dialogue, and then I don't have a sense of the formality level in Chinese so I'm curious how that will translate!

Honestly, the only thing I can say for absolute sure is that Zhao Yunlan is 100% an ore and occasional ore-sama.

Yeah, I was mainly thinking that it's not entirely necessary to transcribe the dialogue between people like the mirror girl and her fiance? For me the things that stick are ones with high emotional impact - so tying into the characters' lives and the overall plot rather than the nitty gritty. (So I'd be less interested in Lin Jing's long exposition talk about the online horror novelist and more interested in his "If you're getting this message I'm dead" speech!)
trobadora: (Black-Cloaked Envoy)

[personal profile] trobadora 2020-03-22 01:23 pm (UTC)(link)
I know I've said this before, but I would love this SO much. This looks like it would be a lot of fun!

Am I right in thinking that because the Chinese subtitles are hardsubs, it wouldn’t be possible to use undeadrobins’ magic thing to create a transcript from the subs?

Yes, unfortunately. A quick google suggests there are ways to extract hardsubs via OCR, but it seems like that would still be terribly work-intensive, and I have no idea if it would even work with Chinese characters. Really not worth the effort, I think.

Shen Wei has variable pronouns! As the Envoy, he uses (I think) 本使 instead of 我.

I noticed him using that a few times! So cool.

(Also, when he calls Zhao Yunlan on Li Qian’s phone, he just says 是我, it’s me, expecting Zhao Yunlan to know who it is, which I find kind of hot.)

Right??? :D

Honestly I want to do this partly so we could call it Dixing Power of Learning, The App…

Yes! Perfect! :D
solo: First Weilan collab (GD Collab)

[personal profile] solo 2020-03-22 02:03 pm (UTC)(link)
I think it's a great idea but beyond being happy to help with 'fiddly work', I'm afraid I lack the skills required.
Edited (me no spel gud) 2020-03-22 14:03 (UTC)

[personal profile] circumference_pie 2020-03-22 06:33 pm (UTC)(link)
I get the feeling you're envisioning a standalone mobile app. Can I ask why?

Why can't it be, say, a Pleco deck, or a series of Pleco decks? (And Anki supports Cloze-type cards, if Anki is your cup of tea).

Even if the goal was something an existing app can't support, like terms or constructs being hyperlinked back to their occurrences in the transcripts or something, you can implement this with a (pretty simple, I think) website. Without having to worry about things like choosing whether to support iOS or Android (my understanding is that supporting both is pretty labor-intensive) or app store licensing.

Three random suggestions:
First, with optical character recognition tools like Copyfish, even Chinese-language newbies can contribute to the transcription effort! [personal profile] fangirlishness is an expert on this.

Second, Guardian is now on Viki for people in the US and for those of us who can VPN into the US (and maybe some other countries, I didn't check). Viki's Learn Mode is a fantastic tool (please google it, I can't link). It basically makes the native subs interactive, as if you have Zhongwen for subs. The catch is that someone actually has to put the data there -- that is, you still have to transcribe it. However, Viki has tools to make transcribing and editing easier for a team of people. But I get that we have a very international community, and regional licensing restrictions might make a group effort for Learn Mode difficult. And there's the danger that the license might get pulled...and then the data would be inaccessible unless someone personally backed it up.

Third, if you're looking to get started quickly, without the overhead of compiling transcripts, I can suggest using the novel, which is already online in text format. It's then a piece of cake to pull out the most frequently used characters, as a whole or by chapters. It's a bit more work to pull out most frequently used words, but it's possible and I've done it. However, I 100% support crowdsourcing the TV transcripts anyway, and it might get done before the technical parts, depending on how ambitious those are. :)
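To give a sense of how little code the character count needs, here is a minimal Python sketch. It assumes the text is already loaded as a string, and it only counts characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which is a simplification. The function name `top_characters` and the sample string are invented for illustration.

```python
from collections import Counter


def top_characters(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent Chinese characters in text.

    Assumption: only the basic CJK Unified Ideographs block is counted,
    so punctuation, Latin letters and rarer extension characters are skipped.
    """
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    return counts.most_common(n)


# Tiny stand-in for a chapter of text; a real run would read a file instead.
sample = "是我。我不想听。"
top = top_characters(sample, 3)
print(top)
```

Words are harder than characters because Chinese text has no spaces between words, so a word count needs a segmentation step before the counting.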
Edited 2020-03-22 18:49 (UTC)

[personal profile] circumference_pie 2020-03-23 04:11 pm (UTC)(link)
I have honestly never even heard of Pleco decks, for instance.
Oh! In that case, I very much recommend this post by [community profile] disgracetoscholars!

Pleco is a Chinese-English dictionary app, but it has a fantastic flashcard add-on (not free, unfortunately, but 100000% worth the 5 or 10 USD or whatever it was). The great thing about Pleco's flashcards is that they're automatically created from dictionary entries so you just hit the + button on the dictionary entry, and then you have a flashcard.

You can also create decks just by listing the words that you want in a text file and using the Import option, which is how I'd make a Guardian-themed deck. I think there might be fancier config options by putting more stuff in the text file, but I've never explored that. Oh...a quick poke at Pleco suggests you can also import XML files.
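As a rough sketch of the text-file route: the Python below writes a de-duplicated word list, one word per line, in UTF-8. The one-word-per-line layout is my assumption about the simplest file Pleco will accept, so check it against Pleco's own import screen before relying on it. The function name and the file name are hypothetical; the words are ones that came up in this thread.

```python
import tempfile
from pathlib import Path


def write_word_list(words: list[str], path: str) -> int:
    """Write unique, non-empty words to path, one per line, as UTF-8.

    Assumed layout: one headword per line (not verified against Pleco).
    Returns the number of words written.
    """
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


# Hypothetical output location; words taken from the discussion above.
out = Path(tempfile.gettempdir()) / "guardian_words.txt"
count = write_word_list(["同学", "本使", "一点儿", "同学"], str(out))
print(count, "words written to", out)
```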

I will come back and query you about this at some point, if that's okay.
Sure, if I can help let me know.

enviropony: (ye zun)

[personal profile] enviropony 2020-03-22 07:27 pm (UTC)(link)
Ooh, I really like circumference_pie's idea of a Pleco deck!

And I second the idea of a mobile-friendly website. That would be both more accessible and easier to code than a standalone app. Getting something to work on both Android and iOS is time-consuming even for professional developers. I know of some great apps that aren't cross-platform due to this issue. And I feel like there should be existing templates for WordPress, Joomla or some such that can be adapted as a learning app, if they aren't purpose-made for it already.

Of course a dedicated website will have hosting fees, so that's a consideration, too. And the templates usually aren't free, but they're rarely outrageously expensive.

I would be interested in being part of this team, if it takes off, since it'll keep me motivated and active in learning Chinese. :-) I'm very early in the learning stage of the language but I do have some web hosting, editing and QC experience.

ETA: If this gets off the ground and isn't a massive un-fun headache, expanding it to MDZS and other fandoms would be neat.
Edited 2020-03-22 19:31 (UTC)
enviropony: (ye zun)

[personal profile] enviropony 2020-03-25 04:26 am (UTC)(link)
So it looks like there are a handful of people who are up for poking at this further. Yay! Were you thinking of making a DW community or some other place for us to plan and throw around ideas? Definitely keep me in the loop, this is super cool! (Sorry if I'm rushing you, but I may not be able to check in for the rest of the week so just wanted to confirm that I was interested).

Per tinny's suggestion, I'm going to explore how to use anki. So far this sounds like the best bet, platform- and cost-wise.

(Anonymous) 2020-03-23 11:31 am (UTC)(link)
Hey!! it's Fan [personal profile] fandoestrans on twitter, i'm just here to say that i would 100000% love to help out with the Chinese language aspect. I'm absolutely more than hopeless with technology or coding etc., but i CAN help with teaching the language, and if you look at my recent tweets, I've been contemplating how to teach Chinese online VERY FREQUENTLY the past few days! I've been thinking about how to teach mandarin to this fandom, esp using Guardian as a mechanism and tool, so i would love to help!

just answering some of your questions and replying to some of the remarks above so those learning chinese might benefit from this extra knowledge:

Zhao Yunlan ends practically all his sentences in 啊. Big headache for the translator trying to convey that particular speech rhythm…
SOOO TRUE!! It annoyed me while translating the guardian novel since novel!yunlan does it too. it's basically just a way of conveying casual-ness. 啊 can be tagged onto the end of a sentence to either mean that they are asking a question, or to be like "ah," in a very casual, friendly and informal way.

Shen Wei has variable pronouns! As the Envoy, he uses (I think) 本使 instead of 我
OKAY SO. in chinese, there's a DIFFERENT way of using 3rd person pronouns, and that's when it's done referring to YOURSELF. usually this pronoun that you use to refer to yourself is derived from your title, role, or position.

EXAMPLES:
a mother might say: "妈妈给你买." = "mum will buy it for you"

an emperor refers to himself as: 朕 (zhen). it's a first-person pronoun ('I') reserved for the emperor
e.g. "朕不想听" = "I (zhen) don't want to hear it."

so in the case of shen wei, "本使" is used to refer to himself because that's derived from his own title.
本 (ben) = this
使 (shi) = envoy, as in hei pao shi.
so he basically says "this envoy here," when referring to himself.

maybe i will make a google doc explaining this in further detail actually, because there's just SO MUCH to talk about for this phenomenon, so look out for that!!!

What exactly does 同学 convey? Shen Wei says it to Guo Changcheng, who also, many episodes later, says it to Li Qian
同学 translates to: classmate, student
when used in the context of "我的同学" it means "my classmate" but when it's used to call someone, it means "student" as a title(?) kinda?

e.g. 嘿,同学-- = hey, student--
这位同学... = this student...

it's used to refer to anyone who studies at an institute, or a graduate from the place. not used to refer to staff, teachers, or any other person aside from past and present students.

hope that helps everyone!
trobadora: (Black-Cloaked Envoy)

[personal profile] trobadora 2020-03-23 07:30 pm (UTC)(link)
Ooooh, thank you so much for all the info, and the offer to help! I hope something comes of this project because it really would be awesome. ♥♥♥
fandoestrans: (Default)

[personal profile] fandoestrans 2020-03-24 10:25 am (UTC)(link)
HELLO I HAVE SINCE BEEN PEER PRESSURED (in the bestest way ofc ;)) INTO MAKING A DW ACCOUNT

you're very welcome!!! I would be super super happy to help, it's the least I can do!!! And yeah I would really love to see where this project goes too, I think heaps of people would benefit hugely from it, and it would make learning super fun for everyone!
trobadora: (Black-Cloaked Envoy)

[personal profile] trobadora 2020-03-24 11:30 am (UTC)(link)
Hee! Welcome to DW! :D

It would be SO helpful. I've been saying I'm Not Learning Chinese for over a year now but this would do so much for my motivation and ability/willingness to make time for learning even when I'm busy. *g*
fandoestrans: (Default)

[personal profile] fandoestrans 2020-03-24 11:36 am (UTC)(link)
ahhhhhhh thank you!!!! i was...very strongly encouraged to join DW!

I agree, I reckon it would help boost other people's motivation as well. I've always been doing little lessons (at first all on Twitter, then I've been writing notes up as Google docs and sharing them, and NOW DW!!) but an app would make the process so much easier, flowing and handle-able for so many people~
fandoestrans: (Default)

[personal profile] fandoestrans 2020-03-28 05:38 am (UTC)(link)
THANK YOU!!!

Your nicknames-and-more posts are wonderful, and are giving me a lot of ideas, not to mention teaching me things I didn't know

omg i'm so glad to hear that!! i'm happy that they're received well and people have generally told me that they're helpful, which is all i wanted to achieve!!

let's see what we can make happen :)

yay i'm keen!

(Anonymous) 2020-03-23 12:02 pm (UTC)(link)
Heyy it's Fan, and i just wanted to mention that if you wanted a hand with that incredibly detailed and thorough spreadsheet, please contact me or let me know! i can help with editing things if you so desire :')

may I suggest a little tweak?
点 is more like 'bit'.
一点 = 'a bit'/'one bit'/'one part'
the 儿 tagging on the end is just a verbal quirk (and a thing used in beijing dialect) and you can basically disregard that
so really, "more~" would actually be "~多一点儿" and not JUST 点儿

thank you for making such a comprehensive file, it looks SO good so far!
954bunzat: (Default)

[personal profile] 954bunzat 2020-03-24 01:50 am (UTC)(link)
(Hello I'm Kei from [personal profile] 954bunzat on twitter!)

This sounds like a wonderful idea, especially since fans of Guardian who want to learn Chinese will be able to learn much easier with more motivation. I would really love to contribute to this. I'm not exactly fluent in Chinese but I do have a lot of years of experience learning it as I was raised and live in a Chinese city-- though my skill has dwindled often ahhhh. However, if it comes to explaining grammar points and their structures or just simple-intermediate phrases or words, I can definitely help with that!

Also, I wanted to ask what structure you're aiming for this app? As in, will you be sourcing the subtitles from all episodes and then categorizing which fit into (eg.) "Level 1" (so like Lv 1 to Lv 2 to Lv 3 and so on...) of Chinese or will the app be structured in a way that the user will be learning by episode? Moreover, will it be more of an app for those who already have a head start/foundation on Chinese or could it also be for those learning from the very start?

I feel like it'd be better to learn by episode as that would aid the app user's memory more since they'll be attaching certain phrases to specific situations in guardian, since the best way to learn a language is to fully understand the meaning/context behind certain words/patterns!
(deleted comment)
tinny: Shen Wei (Guardian) touching his heart with the text "my heart going boom boom boom" (guardian_shenwei heart going boom)

[personal profile] tinny 2020-04-27 12:37 pm (UTC)(link)
I had to delete the above comment, but I'm readding the important info back here:

We could create card decks for anki, for example. (I prefer anki, because pleco is only available on mobile, and anki is mobile/desktop/web.)

You can make those decks in a way that they contain instructions/comments (i.e. grammar hints), and include sound bites (if we find someone who wants to either snip the drama or speak the words themselves).

The examples you made are perfect for that. We should just agree on a card layout, and then we can start.
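To make the card layout question concrete, here is a minimal Python sketch of one possible layout: a tab-separated file with two columns, the sentence with Anki's `{{c1::...}}` cloze markup and a hint. It assumes the file would be imported into a Cloze note type with a text field and an extra field. The two-column layout and the function names are only a suggestion, and the sample sentence is the 朕 example from Fan's comment.

```python
import csv
import io


def cloze_rows(items: list[tuple[str, str, str]]) -> list[list[str]]:
    """Turn (sentence, target, hint) triples into [cloze_text, hint] rows."""
    rows = []
    for sentence, target, hint in items:
        if target not in sentence:
            raise ValueError(f"{target!r} does not occur in {sentence!r}")
        # Wrap the first occurrence of the target in Anki cloze markup.
        marked = sentence.replace(target, "{{c1::" + target + "}}", 1)
        rows.append([marked, hint])
    return rows


def to_tsv(items: list[tuple[str, str, str]]) -> str:
    """Serialize the rows as tab-separated text, one card per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(cloze_rows(items))
    return buf.getvalue()


# Sample card built from an example sentence earlier in this thread.
tsv = to_tsv([("朕不想听", "不", "negation: don't")])
print(tsv, end="")
```

Keeping the hint in its own column means grammar notes can be edited later without touching the sentences.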

[personal profile] circumference_pie 2020-03-29 05:44 am (UTC)(link)
About OCR -- I don't have anything useful to say, but I'm curious: all the existing open-source libraries can't do anything then? What about the Google Cloud API? What about a 2-step auto-OCR, then human correction, process?

FWIW I'm not seriously suggesting investing in OCR as a solution, just curious why you think it's absolutely a lost cause. Another person has been doing OCR+correction (using the Copyfish browser extension) to compile soft subs for another c-drama's Learn Mode, and it seems to do fine, though it can't be used in bulk. But that makes me think the underlying libraries are okay.

I can't promise to explore and support this avenue on any kind of timeline, but I am vaguely interested now. Maybe at some point in the ambiguous future, I'll try some different methods and report back.

P.S. Okay, I looked into the software backing Copyfish, and it's a tiered service. You could spend about 200 USD for 40 episodes if you didn't want to think about efficiency, but could probably lessen that...It might be possible to do an episode a day with the free tier. If the quality is okay.
Edited 2020-03-29 05:47 (UTC)
tinny: Bright yellow Zhu Yilong looking into the camera seductively, Chinese text: "kiss" (guardian_zhuyilong kiss)

[personal profile] tinny 2020-03-29 07:37 am (UTC)(link)
The usual OCR programs that you can let loose on local files only work with small alphabets. What they do is they parse each "frame" and each character in it, and then you have to correct that until it has learned every character. This is doable for alphabets with, say 80 characters (uppercase + lowercase + odd-looking combinations with commas etc.). I tried it once with Chinese and it is completely hopeless, because you have to correct each character at least once, and I despaired after a minute of screentime or so.

Copyfish is... the best I've found. It is not ideal, but it is pretty good. I actually have no idea how long it would take to do an entire episode with the free version by hand, but I think 8h is too low. It recognizes between 80% and 100% of any given line of subs (let's say 8 characters on average), and the rest must be corrected by hand, and there is no learning/teaching.

A native speaker typing them all up would very likely be faster.

I can't promise to explore and support this avenue on any kind of timeline, but I am vaguely interested now. Maybe at some point in the ambiguous future, I'll try some different methods and report back.

Yeah, do that! Better ways than what we have now are always appreciated! Thank you!