Wikipedia talk:WikiProject Punctuation
From Wikipedia, the free encyclopedia
- Old talk is archived at subpages: Round 1
Round 2
There is a new dump out, so I am going to try to get Round 2 going. The new dump is in XML format, however. Does anybody know how I can import the XML dump of just the cur table into mysql without installing mediawiki? — brighterorange (talk) 02:14, 7 September 2005 (UTC)
- Will Navicat work? --Viriditas | Talk 07:52, 16 September 2005 (UTC)
- I don't know, but I already got it to work (after a long ordeal) using the mediawiki tool called "mwdumper". Thanks for the suggestion, though! — brighterorange (talk) 14:12, 16 September 2005 (UTC)
- (just a comment)... The 16th! One day past the deadline of the 15th! Just kidding. -- WB 07:36, 16 September 2005 (UTC)
- I know. ;) It took like ten times longer than I thought it would to import the database. It's up now, though. — brighterorange (talk) 14:12, 16 September 2005 (UTC)
- Seems like most of the entries found now are flagged because of a lack of "list-izing," not because of a missing period, lol. My work is now mostly "list-izing" rather than "adding periods," ha. -- WB 01:09, 18 September 2005 (UTC)
- I think this is because many of the articles with low IDs were inserted en masse by computer, and contain lots of badly-formatted lists of facts. That phenomenon seems to die out with later dumps, although that may just be wishful thinking. Anyway, wikifying lists is still a valuable cleanup task, and will prevent them from showing up in the future. Thanks! — brighterorange (talk) 13:11, 18 September 2005 (UTC)
- You're welcome. I'm glad I'm doing this. There seem to be some very badly done articles with user comments... Some are just sad to edit, but I can't really do anything about "List of rulers in some unknown country I never heard of"... There should be something like "WikiProject: Get rid of old pages that don't make any sense" lol -- WB 05:14, 20 September 2005 (UTC)
- I think this is also because when we went through all the articles the first time, many of the participants just left out the ones that were not "period++". Hopefully, when we go through them this time, we will correct them so they don't get caught in the next scan. -- WB 03:43, 21 September 2005 (UTC)
UTF-8 encoding problems
For some reason, Periodbot does not insert a DOCTYPE declaration, so some characters are garbled. It should check which encoding the wiki uses and then use a DOCTYPE corresponding to that encoding (I think all wikis are now UTF-8 encoded). Andrew pmk | Talk 21:12, 26 September 2005 (UTC)
- Will that fix it? I noticed this problem too (although it seems to work okay for links at least on Windows), and figured it was a result of the new xml dump format, which the wikitech folks claim "may have character set issues." I'll try the doctype thing; I can insert them manually in the dumps if it helps. — brighterorange (talk) 21:26, 26 September 2005 (UTC)
- Thanks, adding the "meta http-equiv" tag does seem to do it. It'll definitely be in there for the next run, and I'll see if I can batch-insert it in the existing dumps, since it's kind of a serious issue on Linux and maybe other platforms. — brighterorange (talk) 21:31, 26 September 2005 (UTC)
- I added the tag to all dumps. Let me know if you still have charset problems. — brighterorange (talk) 21:24, 27 September 2005 (UTC)
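The fix described in this thread amounts to adding a charset declaration to each generated HTML dump page. Below is a minimal Python sketch of that batch insertion, assuming the dumps are plain `.html` files containing a `<head>` element and are UTF-8 encoded. The names `add_charset_meta` and `fix_dumps` and the file layout are hypothetical; this is not Periodbot's actual code.

```python
import re
from pathlib import Path

# The tag discussed above; assumes the dump pages are UTF-8 encoded.
CHARSET_META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

# Matches <head> or <head ...>, but not <header>.
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
HAS_CHARSET = re.compile(r"<meta[^>]+charset=", re.IGNORECASE)


def add_charset_meta(html: str) -> str:
    """Insert the charset meta tag right after <head>, unless one is already present."""
    if HAS_CHARSET.search(html):
        return html  # already declares an encoding; running twice is harmless
    match = HEAD_OPEN.search(html)
    if match is None:
        # No <head> element: prepend the tag so browsers still see it early.
        return CHARSET_META + "\n" + html
    end = match.end()
    return html[:end] + "\n" + CHARSET_META + html[end:]


def fix_dumps(directory: str) -> None:
    """Hypothetical batch step: rewrite every *.html dump file in place."""
    for path in Path(directory).glob("*.html"):
        text = path.read_text(encoding="utf-8")
        path.write_text(add_charset_meta(text), encoding="utf-8")
```

Because the function checks for an existing charset declaration first, it can be rerun over dumps that were already fixed without duplicating the tag.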
Latest dumps
Just moaning--the dump file items that I'm working through are almost all in need of more significant work than just inserting periods, so it's going to take a while to work through one set (for me anyway). Copyvio tags needed, probable insignificant article tags needed, cleanup tags needed, titles misspelled or mislabeled, stubs without appropriate stub labels... I'm hardly inserting any periods at all! Argh. Elf | Talk 00:30, 1 October 2005 (UTC)
- Yes, I can't guarantee that all the articles you see will be good except for missing periods; just that they will have missing periods as the least of their problems. ;) But fixing such errorful articles is at least as valuable as fixing punctuation, even if you are just inserting {{cleanup-date|October 2005}} and letting others deal with it... — brighterorange (talk) 02:24, 1 October 2005 (UTC)
Well, yup, that's what I've been doing in most cases. ...Oh, yeah, according to WP it's October now! (writing at 9:20 pm Sept 30...) Elf | Talk 04:25, 1 October 2005 (UTC)
- Don't forget to add speedy delete candidates to that list... lol! --Celestianpower hablamé 14:11, 1 October 2005 (UTC)
new idea?
Is it possible to detect unnecessary spaces? Every single day, I find pages that have two or three spaces at the top, or extra space somewhere else, because someone thought it would be necessary. Nothing urgent, but annoying (at least to me). For example:
Wikipedia is an encyclopedia.
instead of:
Wikipedia is an encyclopedia.
It is an encyclopedia.
I can explain a bit more if I'm vague on this one. -- WB 03:42, 10 October 2005 (UTC)
- Yes, this would actually be a lot easier than detecting missing periods. I see this a lot, too. Do you think it's worth searching for? I expect we would get lots of hits. — brighterorange (talk) 14:15, 10 October 2005 (UTC)
- I think we can ask a bot to do this task though. There aren't many reasons why there should be two or more spaces... It does improve Wikipedia though. Think of a book that has random spaces between paragraphs. We wouldn't want that. Anyway, my thoughts. -- WB 17:24, 10 October 2005 (UTC)
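Detecting runs of blank lines is indeed simpler than detecting missing periods. Here is a minimal Python sketch, assuming "spaces" here means extra blank lines between paragraphs (as the book analogy suggests) and that a single blank line between paragraphs is normal. `find_extra_blank_lines` and its `max_blank` parameter are hypothetical names.

```python
def find_extra_blank_lines(wikitext: str, max_blank: int = 1) -> list[int]:
    """Return 1-based line numbers where an unwanted run of blank lines begins.

    A run is reported if it sits at the very top of the page, or if it is
    longer than `max_blank` lines. Trailing blank lines at the end of the
    text are ignored.
    """
    hits = []
    run_start = 0  # line number where the current blank run began
    run_len = 0    # number of consecutive blank (or whitespace-only) lines
    for number, line in enumerate(wikitext.split("\n"), start=1):
        if line.strip() == "":
            if run_len == 0:
                run_start = number
            run_len += 1
        else:
            # A non-blank line closes the run; decide whether it was too long.
            if run_len > 0 and (run_start == 1 or run_len > max_blank):
                hits.append(run_start)
            run_len = 0
    return hits
```

A bot could report these line numbers for a human to review rather than deleting the lines outright, since blank lines may be deliberate inside `<pre>` blocks or some templates.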
Observation
I have found, and I don't know if anyone else has, that the periodbot frequently picks up on alternate spellings, pronunciations, and synonyms as incomplete sentences. Almost 20% of my previous data file consisted of them.--Adun
- Often, many of the same type of false positive are clustered together, perhaps because all of those articles were added en masse and so they are near each other in the database. Can you elaborate on the pattern you saw? It may be pretty easy to filter out. I don't think I've ever seen it before. Brighterorange 15:40, 16 December 2005 (UTC)
- Sure thing. The part in the dump would simply be the part (which I assume was at the top) where it would say "Alternate: Moor, Mour" (I'm just making this up). I think the PB picked it up because it didn't have a period at the end, which it doesn't need.--Adun
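If these false positives share a shape like "Alternate: Moor, Mour", they could be filtered out before a line is reported. This is a hypothetical Python sketch that assumes such lines consist of a short capitalized label, a colon, and a comma-separated list of one- or two-word items. The pattern is a guess based on the single example above, not a rule Periodbot is known to use.

```python
import re

# Hypothetical pattern: a capitalized label of up to ~30 characters, a colon,
# then a comma-separated list of one- or two-word items, with nothing after
# the last item (in particular, no terminal punctuation).
LABELLED_LIST = re.compile(
    r"^\s*[A-Z][\w ]{0,30}:\s*"
    r"[\w'-]+(?: [\w'-]+)?"            # first item
    r"(?:,\s*[\w'-]+(?: [\w'-]+)?)*"   # further ", item" entries
    r"\s*$"
)


def is_alternate_names_line(line: str) -> bool:
    """True if the line looks like 'Label: item, item' rather than a sentence."""
    return LABELLED_LIST.match(line) is not None
```

A pattern this loose will also match some genuine fragments (for example "Born: John Smith"), so it may be safer to use it to tag candidates for a human rather than to drop them silently.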
Advice on "fixing" lists
Obviously we've all had lists come up in our dumps, but there doesn't seem to be anything on the project page describing how to deal with them (unless I need glasses!). Personally I just bullet point them (with *), mainly for style reasons - are there any other ways of dealing with lists that are formatted with overuse of the enter button? --Lox (t,c) 20:26, 12 January 2006 (UTC)
- Hey, I've noticed that sometimes album track listings are being changed to have a period added to the end of the list (e.g. Auf der Maur). I personally feel that this shouldn't happen: they're lists of titles, named and punctuated as the artist intended. What do you guys think? Satan's Rubber Duck 08:12, 18 March 2006 (UTC)
- When I edited that, I had thought that the writer accidentally left out the period. I suppose I'm wrong in assuming that? NapoleonB 01:45, 30 March 2006 (UTC)
- Nothing that can't be fixed :) It's probably not wrong grammatically, but tracklists seem to have their own styles. Satan's Rubber Duck 11:27, 30 March 2006 (UTC)
- Good point. I'll be more careful editing track names in the future. :D NapoleonB 16:40, 30 March 2006 (UTC)
- I think bullet-pointing the lists is a good idea, and it will prevent them from being identified by Periodbot in the future. But don't get stressed out over things that are not explicitly part of this project if you don't want to! — brighterorange (talk) 18:53, 30 March 2006 (UTC)
Project pages
Could project pages be omitted from the dumps? File #285 has quite a few project pages such as Wikipedia:Naming conventions (Slovenian vs Slovene)/Archive 1. I'm guessing those don't need fixing. Gimboid13 22:15, 4 February 2006 (UTC)
- That's really weird; anything from the Wikipedia namespace shouldn't be considered at all, since we're only looking at the article namespace. It's most likely a problem with the database dumps (?). — brighterorange (talk) 14:00, 18 April 2006 (UTC)
Checking grammar
I know that parsing English, or any natural language, is hard, but there are a few simple grammar checks that can be done. Collecting information on common mistakes is also useful for writers of checkers (mine are here).
One common mistake is for the same word to appear twice twice. This is not always a bug, but it is often unintended.
If anybody is interested in running a grammar checker over the English Wikipedia articles there are some links to useful tools and data here.
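The doubled-word check is one of the easier ones to automate. Here is a minimal Python sketch that uses a regular expression with a backreference. `find_doubled_words` is a hypothetical name, and because some doublings are legitimate ("had had", "that that"), any hits would still need a human to review them.

```python
import re

# A whole word, some whitespace, then the same word again. With IGNORECASE,
# the backreference \1 also matches case-insensitively, so "The the" is caught.
# The trailing \b stops "the theory" from matching.
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_doubled_words(text: str) -> list[str]:
    """Return the first occurrence of each doubled word, in order of appearance."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]
```

For example, running it on the sentence "One common mistake is for the same word to appear twice twice." reports `["twice"]`.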
Random Thought
Just found this project and am very glad to be able to help out on wikipedia without knowing a ton about some random area of knowledge! As I've just gotten really into those userboxes, I think it would be fun if someone with more know-how than I could whip up one of those "this user participates in the punctuation wikiproject."--Lowfatsourcreme 17:36, 3 April 2006 (UTC)
I was just thinking that!!
Reedy Boy 06:39, 18 April 2006 (UTC)
- I created one. You can add {{User project punctuation}} to your userpage, which will produce:
This user participates in Project Punctuation.
Enjoy! — brighterorange (talk) 14:16, 18 April 2006 (UTC)
Yay, Boxes
Maybe an option to show the number done... Or maybe not. lol
Thanks!
Reedy Boy 16:24, 18 April 2006 (UTC)
Taking over Dump Files Started by Other People
Are we allowed to do this?
I've done a few tonight, and I noticed some are from February; they really need completing.
If we are allowed, can I just complete one and then delete it?
Reedy Boy 19:25, 10 April 2006 (UTC)
- If it's more than a week or so old, go for it! We're almost done! — brighterorange (talk) 14:02, 18 April 2006 (UTC)
And we're done!!!
Yay!!
Reedy Boy 07:36, 6 May 2006 (UTC)
- I really hope that the use of 'were' is a joke on your part, seeing as this is the 'Project Punctuation' page. Berry 11:34, 6 May 2006 (UTC)
- Yup
LOL
Reedy Boy 20:54, 6 May 2006 (UTC)
- Well done, everyone! We'll take a break for a few months. I think that perhaps the next round will be a different (punctuation) analysis. Maybe the proper use of en dashes and em dashes? I'd also like to make the process somewhat more automated through the use of client-side scripting. If anyone wants to help out on the development side of this project (and has some expertise), let me know! — brighterorange (talk) 14:47, 9 May 2006 (UTC)
- It would be very good if you could get the server to reduce the number of items created, such as ." not being included and so on...?
What is it written in? Reedy Boy 06:59, 10 May 2006 (UTC)
- The analysis code is in Standard ML. It does already filter out punctuation at the end of a quotation, though some things do confuse it. Do you have a specific rule in mind? — brighterorange (talk) 14:06, 10 May 2006 (UTC)
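The actual analysis code is in Standard ML and isn't shown on this page, so the following is only a hypothetical Python sketch of the kind of rule being discussed. A paragraph counts as finished if it ends in `.`, `!`, or `?`, optionally followed by closing quotes, brackets, or wiki markup such as `''` or `]]`, which means `."` is not flagged. The list of skipped line prefixes (headings, lists, indents, templates, tables) is an assumption, chosen to be consistent with bullet-pointed lists not being picked up.

```python
import re

# Terminal punctuation, then any run of closing quotes (straight or curly),
# parentheses, or square brackets, then optional trailing whitespace.
# Wiki italics ('') and link closers (]]) are covered by the repeated class.
ENDS_SENTENCE = re.compile(r"""[.!?]["'\u201d\u2019)\]]*\s*$""")

# Hypothetical: lines starting with these are not ordinary prose paragraphs
# (headings, bullet/numbered lists, indents, definition lists, templates, tables).
NON_PROSE_PREFIXES = ("=", "*", "#", ":", ";", "{", "|")


def lacks_final_period(paragraph: str) -> bool:
    """Return True if a prose paragraph does not end in terminal punctuation."""
    text = paragraph.strip()
    if not text or text.startswith(NON_PROSE_PREFIXES):
        return False  # blank lines and non-prose lines are never flagged
    return ENDS_SENTENCE.search(text) is None
```

With this rule, `He called the result "a mess."` passes, while `It is located in [[Paris]]` is flagged. Filtering out more cases would mean adding more prefixes or allowed closers.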
Project directory
Hello. The WikiProject Council has recently updated the Wikipedia:WikiProject Council/Directory. This new directory includes a variety of categories and subcategories which will, with luck, potentially draw new members to the projects who are interested in those specific subjects. Please review the directory and make any changes to the entries for your project that you see fit. There is also a directory of portals, at User:B2T2/Portal, listing all the existing portals. Feel free to add any of them to the portals or comments section of your entries in the directory. The three columns regarding assessment, peer review, and collaboration are included in the directory for both the use of the projects themselves and for that of others. Having such departments will allow a project to more quickly and easily identify its most important articles and its articles in greatest need of improvement. If you have not already done so, please consider whether your project would benefit from having departments which deal in these matters. It is my hope that all the changes to the directory can be finished by the first of next month. Please feel free to make any changes you see fit to the entries for your project before then. If you should have any questions regarding this matter, please do not hesitate to contact me. Thank you. B2T2 14:15, 26 October 2006 (UTC)
Wikipedia Day Awards
Hello, all. It was initially my hope to try to have this done as part of Esperanza's proposal for an appreciation week to end on Wikipedia Day, January 15. However, several people have once again proposed the entirety of Esperanza for deletion, so that might not work. It was the intention of the Appreciation Week proposal to set aside a given time when the various individuals who have made significant, valuable contributions to the encyclopedia would be recognized and honored. I believe that, with some effort, this could still be done. My proposal is to, with luck, try to organize the various WikiProjects and other entities of wikipedia to take part in a larger celebration of its contributors to take place in January, probably beginning January 15, 2007. I have created yet another new subpage for myself (a weakness of mine, I'm afraid) at User talk:Badbilltucker/Appreciation Week where I would greatly appreciate any indications from the members of this project as to whether and how they might be willing and/or able to assist in recognizing the contributions of our editors. Thank you for your attention. Badbilltucker 19:28, 30 December 2006 (UTC)
Come back project punctuation!
When will the next dump be out? I liked helping out with this project! J. Finkelstein 06:39, 25 April 2007 (UTC)
- Well, I don't have any immediate plans to run another round (unfortunately the size of the database dumps makes it rather a large effort for me and harder each time), but I have been working on ideas for the next iteration. Particularly, I've been writing a client-side script that automatically corrects punctuation errors for any page. You can take a look at User:Brighterorange/punctuation.js and User:Brighterorange/punctuationtest if you're interested in what I've done so far. I've been using it, but I'm not sure it's ready for others yet. — brighterorange (talk) 00:35, 26 April 2007 (UTC)