
that first paragraph says everything.

tbh I’ve shared in many places, and people weren’t jerks about it; in the Latin American community (Argentina, for example) they were ok with it.

this forum was literally dead for Japanese content for a long time, and i was posting Japanese dictionaries that i didn’t even have any obligation to post, because i could literally keep them private, but i opted to post them because i somehow wanted to contribute;

and this is what i get? lmao
it’s not hard to understand, but @hua acts like a child or smth. i was expecting a more serious response from a forum admin.

go ahead and do whatever the hell you want with my account. I don’t need this forum, i was posting because there are a few people who want dictionaries and not “bytes”

Only anonymization is allowed. Maybe I should not close this topic. I want to watch your show.

this is called bias.
unilateral moderation. go ahead, if you’ve got the guts, and delete my account or something like that.
you don’t have the guts to do it?

He’s Portuguese if memory serves me.

1 like

Who’s to blame? Blame the Chinese for not using honorifics.

1 like

Japanese people use honorifics.

1 like

have you said thank you once?

3 likes

lmao :rofl:

I wrote a rough Python script to process the unpacked txt:


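# Deduplicate an unpacked MDX text dump: entries whose HTML body is identical
# are collapsed into one entry, and the other headwords become @@@LINK redirects.
with open('旺文社漢字典 第四版.mdx.txt', 'r', encoding='utf-8') as f:
    first_hw = {}   # HTML body -> first headword seen with that body
    links = {}      # redirect records, used as an insertion-ordered set
    hw = None       # most recent headword line
    count = 0
    for raw in f:
        line = raw.strip()
        if not line:
            continue  # skip blank lines so they are not taken as headwords
        if not line.startswith('<'):
            # Any line that is not markup is treated as a headword.
            hw = line
        elif line.startswith('<html>') and hw is not None:
            if line not in first_hw:
                first_hw[line] = hw
            elif hw != first_hw[line]:
                # Same body under a different headword: redirect instead of copying.
                links[f'{hw}\n@@@LINK={first_hw[line]}\n</>\n'] = None
            count += 1
            if count % 3000 == 0:
                print(count, end='\r')  # progress indicator
with open('output.txt', 'w', encoding='utf-8') as f:
    f.writelines(f'{hw}\n{body}\n</>\n' for body, hw in first_hw.items())
    print(len(first_hw), 'entries ...')
    f.writelines(links)
    print(len(links), '@@@LINKs ...')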


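This yields 52,656 entries and 10,444 redirect links (@@@LINK).

Original txt file: 340 MB; after processing ➔ 85 MB. mdx.txt.zip (8.0 MB)
Original mdx file: 23.2 MB; after processing ➔ 10.1 MB. output.mdx (10.1 MB)

Could someone check whether there is anything wrong with the way this script processes the data?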
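
One way to sanity-check this kind of dedup is to run the same logic on a tiny in-memory sample and confirm that no (headword, body) pair from the input becomes unreachable. The following is only a minimal sketch, not the script above: it assumes the unpacked txt is a sequence of headword line, one-line `<html>` body, and `</>` separator, and the helper names `parse_pairs`, `dedupe`, and `lost_pairs` are hypothetical.

```python
def parse_pairs(lines):
    """Yield (headword, body) pairs from an unpacked-txt style line sequence."""
    hw = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not line.startswith('<'):
            hw = line                       # non-markup line: a headword
        elif line.startswith('<html>') and hw is not None:
            yield hw, line                  # body belongs to the last headword
        # other markup lines such as '</>' are ignored


def dedupe(pairs):
    """Keep each distinct body once; other headwords become (alias, target) links."""
    first_hw = {}                           # body -> first headword seen
    links = {}                              # (alias, target), insertion-ordered set
    for hw, body in pairs:
        target = first_hw.setdefault(body, hw)
        if target != hw:
            links[(hw, target)] = None
    entries = [(hw, body) for body, hw in first_hw.items()]
    return entries, list(links)


def lost_pairs(pairs, entries, links):
    """Return input pairs that are neither kept directly nor reachable via a link."""
    owner = {body: hw for hw, body in entries}
    linked = set(links)
    return [(hw, body) for hw, body in pairs
            if owner.get(body) != hw and (hw, owner.get(body)) not in linked]


# Made-up sample data, for illustration only.
sample = [
    "亜", "<html>A</html>", "</>",
    "亞", "<html>A</html>", "</>",
    "愛", "<html>B</html>", "</>",
    "亞", "<html>A</html>", "</>",
]
pairs = list(parse_pairs(sample))
entries, links = dedupe(pairs)
print(entries)
print(links)
print(lost_pairs(pairs, entries, links))
```

An empty list from `lost_pairs` means every original pair is either kept as an entry or covered by a redirect. Note that under this parsing rule a pre-existing `@@@LINK=...` line in the input would be read as a headword, because it does not start with `<`; the thread does not say whether the source file contains any.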

1 like

giving feedback = not liking someone’s work and being a jerk about it :star_struck:

Looks fine to me.

1 like

As you wish.

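What a character. When someone publishes a dictionary file with duplicated content and others point out the problem and give feedback, the publisher should be grateful, because that helps improve quality and benefits everyone. Yet the OP behaves as though this kind of helpful suggestion were an insult, and his way of defending himself is to gloss over his mistakes and answer beside the point, treating the people on this forum like fools. The same drama already played out in another thread where I interacted with him.

When you talk with people on a forum, having someone point out obvious errors and problems in a kind and friendly way is not offensive, rude, or disrespectful in any culture. Although many Chinese speakers are fenced in behind the wall, not all of them are bumpkins; plenty of them read large international communities such as twitter, reddit, and facebook every day, and they understand the etiquette of communication perfectly well.

This is not a problem with the forum, nor a matter of cultural differences; it is simply his own personal problem. That being the case, he should not be interacting with people on a public forum.

1 like

Actually, it is a cultural difference, and a matter of social etiquette too. If I were the one describing this problem, I would reply like this: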

Hi OP,

Really appreciate you sharing this!

While looking through the .mdx file, I noticed what appears to be some duplicate content in a few places. Is this an issue with the raw data, or did something go wrong during processing?

Thanks again for your work!

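The thanks at the beginning and the end are a common template in open-source communities when asking an author a question; many people have probably seen it on Github & Reddit. At first I could not understand why those replies had to be so humble either. Later I realized it is not only about social etiquette: some authors are used to seeing things from the position of a benefactor. Your logic, that when others point out a problem the author should be grateful, does not hold for them; on the contrary, that behavior goes against their expectations. You have to place yourself squarely in the position of the one receiving charity before you have any chance of communicating with them normally.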

I have to say, a lot of issues on github are written this way... Huh, you actually have a point.


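I had already closed my browser, but I wanted to come back and add one more thing: English-speaking communities really do love adding a “Thanks for your work!” at the beginning or the end.

1 like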

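I have noticed what you describe too: in English-speaking communities, some people open a reply with a few polite words of thanks and then tactfully point out the problem. Of course, this makes the whole exchange more graceful and smooth, and gives the original author more room to maneuver, but it is not required. People who sincerely want to benefit others and contribute to the community focus their discussions on the core problem, the issue, rather than on trivial matters of phrasing and so-called etiquette, especially when the question or doubt raised hits the bullseye and is critical, even fatal.

But that is not the main point. What is most baffling about the OP’s behavior is his self-deceiving, ostrich-like way of defending himself: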

string comparison != looking by eye

that’s the wrong page tho, smart ass.
the code will count the cover of the pdf as page 1.
Oubunsha oukogo 旺文社 古語辞典 第十版

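It is such an obvious error that everyone on the forum can see it, yet he treats everyone like fools. Not only that, he refuses to admit the mistake and launches a string of (completely unfounded) accusations and attacks against the person who pointed it out. It is utterly unreasonable, and I think it would be hard to accept in any culture.

1 like

Haha, a lot of people like to repeat “foreigners speak directly” and “Chinese people beat around the bush and expect you to read between the lines,” but that is really not how it works; reality is far more complicated than a one-sentence generalization.

1 like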