Talk:Thread (computer science)

Thanks for fleshing out the stub, Lee. I knew this stuff, but it wasn't on the tip of my tongue (one step further buried in the memory banks). User:Ed Poor


Just a note: I renamed the article to thread (computer science) because threading is not necessarily limited to software engineering. -- Taku 16:14, Mar 28, 2004 (UTC)

From the article:

"Systems like Windows NT, OS/2 and Linux (2.5 kernel or higher) are said to have "cheap" threads and "expensive" processes, while in other systems there is not so big a difference."

Eh? I thought one major advantage of Linux was its relatively low process creation cost, and the new NPTL implementation of Linux threads is a 1:1 implementation where there is little difference between a thread and a process at the lowest level. -- The Anome 22:33, 22 Oct 2004 (UTC)

If you take a look at NetBSD 2.x or DragonFly BSD you will notice that they don't use 1:1, but instead N:M, which is a lot more efficient than Linux 1:1. I've added a link about NetBSD SA (Scheduler Activations) in this page that I advise you to look at. -- RuiPaulo 23:54, 24 Oct 2004 (UTC)

Someone please address the issue of multithreaded programming on dual vs. quad+ cores. Nowhere does the article say whether it is more difficult for the programmer to write a program that takes advantage of >2 processors. We understand from the article that you must write the application with multiple threads in mind, BUT nowhere does it discuss the programming difficulty, or lack thereof, for >2 processors.

Basically: Will a multithreaded program of today have to be modified for a computer with 4, 8, 32 cpus?
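One way to frame the question above is a sketch in which the number of workers is a parameter rather than a constant, so the same code runs unchanged on 2, 4, or 32 CPUs. Everything here (`count_primes`, `parallel_count`, the chunking scheme) is hypothetical and not taken from the article. Note also the assumption that matters most: in CPython, threads do not speed up CPU-bound work like this because of the global interpreter lock, so the sketch shows only the program structure, not a measured speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo, hi):
    """Count primes in [lo, hi) by trial division (deliberately simple)."""
    total = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += 1
    return total


def parallel_count(limit, workers=None):
    """Split [0, limit) into roughly one chunk per worker and sum the results."""
    # Assumption: default to the number of CPUs the OS reports.
    workers = workers or os.cpu_count() or 1
    step = max(1, -(-limit // workers))  # ceiling division, at least 1
    chunks = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: count_primes(*c), chunks))
```

Calling `parallel_count(100, workers=1)` and `parallel_count(100, workers=32)` gives the same answer; only the division of work changes. Whether a given program scales this easily depends on how independent its chunks of work really are.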


Authors

The section titled "Processes, Threads, and Fibers" and its subsections were started by Daniel Barbalace.

Website: http://www.clearthought.info

Thanks for the excellent work, Daniel. -- The Anome 14:59, 24 Oct 2004 (UTC)

System call for a context switch?

The article says "Typically fibers are implemented entirely in userspace. As a result, context switching between fibers in a process is extremely efficient: because the kernel is oblivious to the existence of fibers, a context switch does not require a system call." Why would a context switch require a system call in any situation? Is it talking about non-preemptive multitasking? If so, wouldn't it be better if the article mentioned it? If the article doesn't mean to say that a context switch requires a system call, then that should be cleared up; as written, it's misleading. That section would benefit from being clearer in any case.

Yeah, it would be clearer to say "a context switch does not require a kernel entry" or some such. I'm not sure it matters whether the thread implementation is preemptively scheduled: in a cooperatively scheduled implementation of kernel threads, there is still some kernel state associated with the currently-running thread, and so a kernel entry of some kind is required to switch that state, right? The point is somewhat academic, anyway, as cooperatively-scheduled kernel thread implementations are rare AFAIK. Neilc 04:23, 20 Jun 2005 (UTC)
I made that change. I was trying to figure out how to get something about pre-emptive scheduling of threads involving the kernel programming a hardware interrupt in order to stop the current thread, but couldn't figure out how to word that decently. :) Dianne Hackborn 05:51, 22 Jun 2005 (UTC)
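To illustrate the point discussed in this section, here is a minimal sketch of cooperative, user-space task switching using Python generators as stand-ins for fibers. The names `fiber` and `run_round_robin` are hypothetical, and this is not how any particular OS or library implements fibers; it only shows that the "switch" is an ordinary transfer of control inside the program, with no kernel entry needed for the switch itself.

```python
from collections import deque


def fiber(name, steps, log):
    """A cooperative task: runs one step, then gives up control voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # switch point: control returns to the scheduler in user space


def run_round_robin(fibers):
    """Minimal user-space scheduler: resume each task in turn until all finish."""
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)       # resume the task up to its next yield
        except StopIteration:
            continue            # task finished; do not requeue it
        ready.append(current)   # still alive; put it at the back of the queue


log = []
run_round_robin([fiber("a", 2, log), fiber("b", 3, log)])
# log is now ["a:0", "b:0", "a:1", "b:1", "b:2"]
```

A task that never reaches `yield` would starve the others, which is the usual trade-off of cooperative scheduling. Preempting it would need a timer interrupt or signal, and that is where the kernel comes back in.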

Multiprocess vs. Multithreaded vs. Fibers

Would anyone mind if I did some cleanup of this table? It seems to me that all the long discussion about particular systems (especially the big one on AmigaOS) hides the comparison it is trying to make. What about moving discussion about specific operating systems to a section below?

Also, I kind-of disagree with the definition of multiprocess that is being implied here -- that it means something about more than one user-level "application" running. I think it should be much more tied to the idea of memory protection: that is, an OS with memory protection has multiple processes. From this perspective, for example, the traditional AmigaOS (pre-PPC) would be better classified as multi-threaded only, and would serve as a good example to illustrate the difference between that and processes.

Anyway, I don't want to step on anyone's toes. :)

Dianne Hackborn

Sounds good to me. Be bold! :) Neilc 05:00, 20 Jun 2005 (UTC)
Also, I want to say that the text in the table is really hard to read; not everyone has a big display. -- Taku 05:59, Jun 20, 2005 (UTC)

table changes

Personally I like the old table format more -- the new format is too compact and is difficult to read. Neilc 00:48, 23 Jun 2005 (UTC)

Okay, I made another attempt. What do you think? Dianne Hackborn 02:59, 23 Jun 2005 (UTC)
I noticed that "modern" operating systems (second-to-last row in the chart) are marked as supporting multiprocessing, multithreading but not fibers (Y Y N), but then it says that nearly all operating systems after 1995 support all three (Y Y Y). This seems to be self-contradictory; for example, Mac OS X is a modern operating system, and also (if I remember correctly) came out after 1995. Does it fall in the second-to-last category (supporting multiprocessing, multithreading but not fibers), or does it fall in the last category (supporting all three)? 68.50.203.109 04:14, 9 July 2006 (UTC)
Why is the fibers column even in this table? By definition, fibers can be implemented in a user program and thus don't rely on the existence of either threads or processes (so one could, say, implement them in DOS). I personally think the fibers column should be removed from the table entirely or changed to all Y letters. --NotQuiteEXPComplete 12:40, 14 July 2006 (UTC)
Indeed, some programming languages (those with coroutines or continuations) support "fibers" out of the box, with little effort. In other languages (like C), it requires some systems-level wizardry, but such can be hidden behind a library. About the only use of the fibers column is documenting which OS's provide API calls in the standard library; I can't think of any modern systems which do not. --EngineerScotty 23:30, 18 July 2006 (UTC)
Why is Microsoft Windows listed as "Y Y N"? There are native functions to create and schedule fibers (CreateFiber, ConvertThreadToFiber, SwitchToFiber), contained within kernel32.dll, and supported since Win9x. Goffrie 16:54, 20 July 2006 (UTC)

threads vs processes

Processes are not always provided only by the operating system. The Erlang programming language supports processes at the language level: they are completely isolated from one another (while running within a single OS process). Pavel Vozenilek 22:02, 29 July 2005 (UTC)

The question is, are these truly "processes" if not provided by the operating system? For instance .NET provides AppDomains that are very "process-like", but are not true processes. That is to say that the AppDomains provide memory isolation, context, and fault tolerance (in that it can be unloaded or ab-end without affecting other appdomains in the system). These are still not considered processes, however, because they still run in the same (although isolated) address space as the rest of the process. Is this the same thing with Erlang? Being that I've never worked with it at all, I couldn't make a valid argument one way or another. - Sleepnomore 15:53, August 26, 2005 (UTC)

Books

I've added three .NET threading resources; two of these are admittedly my own published works. I've added them only because there were no other .NET programming resources listed. Feel free to remove them if you feel they are inappropriate. Both of these books are out of print, but you can still find them quite often at book stores and second-hand book stores. I have a fourth published book that is still on the market, which I didn't list because it covers two other areas as well as threading. The title is Pro .NET 1.1 Remoting, Reflection, and Threading and the ISBN is 1-590-59452-5. While it is more "current", the fact that it covers two other topics doesn't seem to make it appropriate for this article. Once again, feel free to add it if you feel otherwise. - Sleepnomore 15:46, August 26, 2005 (UTC)

Fiber, Fiber, Fiber?

Huh? Well, I'm no thread expert, but I've never heard this term in this context before. Furthermore, it's spread all over the article, so one would think it's something widely known and used. However, when I search the web, all I find is that it seems to be a fairly unpopular concept used by Windows NT. It seems to be what I'd call user-level (or userland) threads, e.g. GNU Pth, or maybe it refers to anything thread-like as used in programs with a main loop that delegates/schedules short-lived tasks using state machines? So is "fiber" common terminology or just a marketing or product-specific term? --82.141.49.144 05:01, 4 December 2005 (UTC)

Threads are also used by web servers

Many web servers use threads somehow, sure. However, I find those two paragraphs not very useful. In my opinion, they give the wrong idea that either multi-threading or multi-processing is necessary for web servers and similar kinds of servers in general. That's far from the truth. At least on single-CPU machines, an approach using select, poll, kevent, epoll etc. is usually far more efficient and is also fairly popular for all kinds of servers, at least in C/C++. Even in Java, using tons of threads does not seem to scale very well, and using non-blocking I/O instead is actually better, even if the naive approach of one thread per connection (or in C/C++ even one process per connection or per client) is easier to code.

The part about Apache 1.3 using multi-processing almost qualifies as FUD. At the very least a lot of context or explanations are missing and most of those "dangers" apply to multi-threading equivalently. In short, I think the article was better without those two paragraphs. --82.141.49.144 05:14, 4 December 2005 (UTC)
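For readers unfamiliar with the select/poll style mentioned above, here is a minimal sketch of one thread serving several connections by reacting to readiness events. It uses Python's standard `selectors` module and `socket.socketpair()` in place of real network clients so that it is self-contained. The function name `serve_until_idle`, the upper-casing "protocol", and the 0.1 s idle timeout are all assumptions made for the illustration, not something from the article.

```python
import selectors
import socket


def serve_until_idle(sel):
    """Handle ready sockets from one thread until a poll finds nothing to do."""
    handled = 0
    while True:
        events = sel.select(timeout=0.1)
        if not events:
            return handled
        for key, _ in events:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                # Small reply; assumed to fit in the socket's send buffer.
                conn.sendall(data.upper())
                handled += 1
            else:
                sel.unregister(conn)  # peer closed the connection
                conn.close()


sel = selectors.DefaultSelector()
clients = []
for i in range(3):
    client, server_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)
    client.sendall(f"hello {i}".encode())
    clients.append(client)

handled = serve_until_idle(sel)
replies = [c.recv(4096).decode() for c in clients]
for c in clients:
    c.close()
sel.close()
```

No locks are needed because only one thread ever touches the connection state. That is the main attraction of this model, and it is also why a single slow handler stalls every other client.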

You should get more informed before calling this FUD. Having extensive server-side programming experience myself, I can say that, depending on the application, one of the following is the best approach: 1) a single thread in a single process, using non-blocking I/O (e.g. an IRCd, where a lot of memory needs to be shared among clients without the overhead of locks, and with no need for CPU-intensive tasks or expensive system calls other than non-blocking I/O); 2) multiple processes using a pool of processes (ideal when you need very solid production quality: containing memory leaks in system libraries, preventing one client from crashing the whole application, using processes with various credentials, and avoiding thread-related issues when the application is non-trivial and a fair number of syscalls and a fair amount of processing are needed to serve the clients); 3) a multithreaded approach, which is prone to some of the problems of (1), but is more powerful and allows more processing to be done to serve a client, though without some of the obvious security protections of (2) (at least if not also using multiple processes). I have had to program applications using all of these models. For every situation, one model is clearly superior to the others; these are not merely issues of programming style or paradigm. Of course, this doesn't mean that you can't appropriately use both multiple processes and threads in an application. You just can't sanely avoid using multiple processes for some applications.
It should also be noted that the launching time of processes or threads is a less important factor (because pools can be used, and because process creation is usually light enough on modern OSs using Copy-on-write) than a) the stability/security implications, b) frequency of shared resources access and c) type of processing needed, which will generally define which of the models an application should ideally use. Also note that systems with Symmetric multiprocessing permit both multiple processes and multiple threads to utilize more than a single processor.
Back on topic with Apache and PHP: it took a fair amount of time for PHP to run decently under Apache 2 in multithreaded mode, and depending on the PHP extensions and OS you're using, you can still experience issues. This is partly caused by the fact that many libraries expect to run under a normal process and use blocking system calls or calls with process-wide effects, and most PHP extensions just map to those library calls. As for the OS specifics, despite the POSIX thread standard, which defines a widely used API, implementations vary widely. There shouldn't be many issues if, on your particular operating system, every system call is thread-friendly and the C library uses all the necessary hacks to mutex-wrap thread-unfriendly functions (other than the security/stability/memory leak issues solved by multiple processes). The PHP wrappers can also limit problems by lock-wrapping calls made to a non-reentrant C function (losing the concurrency advantage of threads when using this function, of course). Also, whether using threads provides greater concurrency than processes depends on your OS and its thread implementation.
A few real-world examples of implementation specifics: some of my own multithreaded applications couldn't run properly under LinuxThreads (while they ran fine under Solaris and NetBSD) because LinuxThreads wasn't up to the POSIX standard, other than providing a compatible API. These now run fine under Linux 2.6 with the Native POSIX Thread Library. Still today, however, a number of people decide to keep 2.4 on their production servers for stability considerations. Another example: several years back, on NetBSD it was only possible to use user-space threads (what some call fibers), through the GNU Pth, unproven-threads or unreal-threads libraries. Under these libraries, some system calls had to be avoided altogether, and heavy processing loops had to be done in a separate process or had to explicitly yield control back to the thread scheduler frequently, because these particular implementations did not use a preemptive scheduler (not worthwhile to provide without special kernel support). Another aspect is that POSIX did not properly define a relation between threads and the standard I/O subsystems of Unix systems using file descriptors. For instance, while a thread waits in select(2) or poll(2), it cannot at the same time use a more efficient mode of inter-thread messaging based on shared memory queues and notification through condition variables, without some hackery, like dedicating a thread to file descriptor polling and having that thread communicate in a thread-efficient manner with the other threads of the process. Therefore, if your application relies heavily on interprocess communication because of a need like privilege separation, is using threads still worth it considering the hassle it adds? (An implementation of a test project I wrote to fuse inter-thread messages and file descriptor polling can be found at [1]; it does not work with LinuxThreads or Pth.)
I am certain that there are many other examples and that many programmers can relate. As an ending note, interestingly, the Apache 1.3 branch isn't about to die, and is still being actively maintained and used worldwide. The very powerful PostgreSQL database also benefits from multiple processes, and there have been heated discussions about the process vs. thread models between its community and that of MySQL. MySQL uses a mysqld_safe process whose role is to restart the main multithreaded MySQL process if it crashes (not that it crashes often in my personal experience, though :)), but it is an example where an extra process was required.
This was a lengthy reply, but these points are worth considering. I'll leave it to others to relate and, if necessary, update the article if they consider it useful. It's probably sufficient to keep these notes on this discussion page, though. --66.11.179.30 03:36, 3 March 2006 (UTC)
I think all your questions are answered in the Process, Thread, Fiber Table that was deleted. Why the hell was that table deleted? It was the most useful part of the article.
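The workaround described in the long reply above (dedicating one thread to file descriptor polling and having it pass messages to the other threads) can be sketched as follows. This is not the test project linked at [1], which is not reproduced here; it is a hypothetical minimal version in Python. The name `fd_poller` and the `("kind", payload)` message tuples are assumptions for the illustration. Python's `queue.Queue` is used as the inter-thread channel; in CPython it is built on a lock and condition variables.

```python
import queue
import select
import socket
import threading


def fd_poller(sock, inbox):
    """Dedicated thread: block in select() and forward socket data to the queue."""
    while True:
        readable, _, _ = select.select([sock], [], [])
        if readable:
            data = sock.recv(4096)
            if not data:                   # peer closed: notify consumer, stop
                inbox.put(("eof", None))
                return
            inbox.put(("fd", data))


inbox = queue.Queue()
reader, writer = socket.socketpair()
threading.Thread(target=fd_poller, args=(reader, inbox), daemon=True).start()

inbox.put(("thread", b"from another thread"))  # ordinary inter-thread message
writer.sendall(b"from a descriptor")           # message arriving via a descriptor
writer.close()

events = []
while True:
    kind, payload = inbox.get()  # single wait point for both kinds of message
    if kind == "eof":
        break
    events.append((kind, payload))
reader.close()
```

The consumer never calls `select()` itself, so it can wait on the queue alone. The cost is one extra thread and one extra hop for every message that arrives on a descriptor.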

Example

A while ago I changed the example to a C# example that I believed showed the pure concept of multi-threading more clearly than our current example, which would be more suited to Algorithms For Prime Number Generation ;) I just feel that the example should be very closely tied to threading as opposed to having an exterior purpose. Here is the reverted example I proposed before: [2] Any thoughts? Martin Hinks 17:43, 29 January 2006 (UTC)

The commentary pertaining to the example says, "of course, this problem [the race condition] is easily corrected using standard programming techniques", which raises the question: how? 68.50.203.109 04:17, 9 July 2006 (UTC)
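One standard technique for the "how?" asked above is mutual exclusion: guard the shared data with a lock so that only one thread at a time performs the read-modify-write. The sketch below is a generic shared-counter illustration in Python, not the article's example (which is not reproduced on this page). The names `Counter` and `run` are hypothetical.

```python
import threading


class Counter:
    """Shared counter whose increment is protected by a mutex."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute the read-modify-write below.
        with self._lock:
            self.value += 1


def run(n_threads, n_increments):
    """Start n_threads workers that each increment a shared counter."""
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run(4, 10000)` returns 40000 on every run. Without it, updates from different threads can overwrite one another, and the total may come up short. Other common remedies are atomic operations or giving each thread its own data and combining the results at the end.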

Getting It Str8

Ey people, i'm kind of a newbie to CPU's. The terminology used in this article (and most wikipedia articles) is kind of complex. So, i'm wondering if someone can tell me a little about it in plain english like;

1) What exactly are Threads, what do they do, and how do they work?

2) What is the difference between a thread, process, and the other thing?

3) What exactly does a kernel do? Is it embedded in an Operating System? Why 'Linux Kernel' and not just 'Linux'?

4) Tell me some other things i should know about this stuff please.

Any help would be greatly appreciated and if you do help me, ill make it worth a lot of other people's time by editing this main article into simpler words. Thanks. KittenKiller



It seems to me that the terminology used in most computer science wikipedia articles is kind of complex. However, the same cannot be said for most wikipedia articles. Care to comment? 68.50.203.109 04:21, 9 July 2006 (UTC)

Ambiguity

"""while in other operating systems there is not so big a difference"""
What exactly are the other operating systems? Linux, BeOS. It's nice to be specific.
Probably most other commonly used operating systems, those that I know well being all of the BSDs, Linux, Solaris, MacOS (I did not use BeOS, but it well might be using COW (Copy-on-write) in fork(2) to avoid actually creating all pages of a newly created process too). I would not be surprised if IRIX, HPUX, Tru64 also do, but someone who knows should confirm this. --66.11.179.30 09:54, 3 March 2006 (UTC)

process--thread--fiber interrelation

The concept of a process, thread, and fiber are interrelated by a sense of "ownership" and of containment. .... A fiber can be scheduled to run in any thread in the same process.

There seems to be a contradiction. If a fiber can be run in any thread, then it's not contained within any specific thread, nor owned by any specific thread. --tyomitch 15:25, 8 May 2006 (UTC)


To enrich the context, it would be nice to add some history of threads (OS-related, but it also makes sense in this context). --User:faragon 12:52, 7 Jun 2006 (GMT+1)

Fibers and NPOV

Yeah, this whole fiber stuff needs rewriting; the term is not used much outside of the NT-centric world. Also curious is the table. What the hell is the last entry? It says most OSs since 1995 use a process/thread/fiber model. But above it, it says that OS X, Win 2k, etc. (major OSes released AFTER 1995) use a process/thread model. What came in 1995? Win95?? What about NT (released '93) or Macs, which were stuck with classic Mac OS (System 7, I believe) until OS X? Seems a little Windows-centric. So much for NPOV!

OK...?

I don't see around here (The referenced talk page) who added the {{contradict}} tag, and more to the point, why? 68.39.174.238 03:40, 18 July 2006 (UTC)

The tag appears to have been added in this edit by an anon user. If there's no obvious reason for it (and I don't see one) then I suggest we just remove the tag. --Allan McInnes (talk) 05:11, 18 July 2006 (UTC)
They are right though. The table is, in some cases, blatantly wrong. Since fibers seem to be just a Windows name for userspace threads from what I can gather, it does appear to be wrong, since one should be able to implement them on all of these systems in userspace or in one's own program. --NotQuiteEXPComplete 17:39, 18 July 2006 (UTC)
Ok. But wrong isn't the same as self-contradictory, which is, I think, what caused some confusion. Nor did the person who did the tagging explain their reasons here on the talk page, as they were supposed to. Was there a reason for the tagging, or was it just idle vandalism? If the table's wrong, then by all means fix it (personally I'd be in favor of cutting it completely, since I don't think it adds much value). --Allan McInnes (talk) 18:40, 18 July 2006 (UTC)
I think the contradiction is pointed out under the "Fibers and NPOV" header on the talk page. The last row of the table says "Almost all operating systems after 1995 fall into this category." but many operating systems released after '95 are mentioned in previous rows. Maybe not completely self-contradictory, but at least confusing. And I agree with the people who think we should get rid of all this fiber stuff, which makes no sense outside NT-land. --Lost Goblin 08:28, 19 July 2006 (UTC)
Yeah, I'm going to go ahead and kill the table. --NotQuiteEXPComplete 19:17, 26 July 2006 (UTC)

Article restructuring

Forgive me if this is poorly stated ... I'm rather tired. I think a good restructuring would go a long way toward improving this article. The section headers are also inappropriate in some places where the discussion strays from what the section header talks about. In general, I think there should be sections comparing (or at least saying a small amount before linking to another article) the entire spectrum of: (1) implementation: kernel-space vs. user-space, (2) for kernel space, the model for interfacing kernel threads and user threads: many-to-one, one-to-one, one-to-many, (3) cooperative scheduling (i.e. by the process) vs. preemptive scheduling (i.e. by the kernel), and showing how the desirable properties of threads, namely continuing to execute the process in the face of blocking calls and being able to run on multiprocessor machines, pretty much follow directly from these three things. The article mentions most of this stuff, but the flow and cohesiveness are pretty poor. --NotQuiteEXPComplete 06:46, 9 August 2006 (UTC)

Green threads

I don't think "green threads" came from SunOS. Instead, I think they came from the Green project:

http://today.java.net/jag/old/green

http://www.jguru.com/faq/view.jsp?EID=416246

Calling user-level threads "green threads" probably resulted from the early Java JDK using that name for the thread implementation inherited from the Green project.

Booskunk 06:53, 23 October 2006 (UTC)