How to read a file line-by-line into a list?

How do I read every line of a file in Python and store each line as an element in a list?

I want to read the file line by line and append each line to the end of the list.

Total Answers: 28


Popular Answers:

  1. This is more explicit than necessary, but does what you want.

    with open("file.txt") as file_in:
        lines = []
        for line in file_in:
            lines.append(line)
  2. According to Python’s Methods of File Objects, the simplest way to convert a text file into a list is:

    with open('file.txt') as f:
        my_list = list(f)
        # my_list = [x.rstrip() for x in f] # remove line breaks
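
    A quick way to see what the commented-out line changes is a small sketch, assuming a throwaway file in a temporary directory (the name demo.txt and its contents are made up). It uses rstrip("\n") instead of rstrip(), so only the line break is removed and trailing spaces survive:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical sample file
    path.write_text("first\nsecond  \nthird\n", encoding="utf-8")

    with open(path, encoding="utf-8") as f:
        with_newlines = list(f)  # every element keeps its trailing "\n"

    with open(path, encoding="utf-8") as f:
        # rstrip("\n") removes only the line break; plain rstrip() would
        # also drop the two trailing spaces of "second  "
        without_newlines = [x.rstrip("\n") for x in f]

print(with_newlines)     # ['first\n', 'second  \n', 'third\n']
print(without_newlines)  # ['first', 'second  ', 'third']
```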

    If you just need to iterate over the text file lines, you can use:

    with open('file.txt') as f:
        for line in f:
            ...

    Old answer:

    Using with and readlines() :

    with open('file.txt') as f:
        lines = f.readlines()

    If you don’t care about closing the file, this one-liner will work:

    lines = open('file.txt').readlines() 

    The traditional way:

    f = open('file.txt')           # Open file in read mode
    lines = f.read().splitlines()  # List with stripped line breaks
    f.close()                      # Close file
  3. If you want the \n included:

    with open(fname) as f:
        content = f.readlines()

    If you do not want \n included:

    with open(fname) as f:
        content = f.read().splitlines()
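
    One difference worth knowing, shown in a minimal sketch (the file name and contents are hypothetical): readlines() keeps "\n" on every line except a final line that has none, while read().splitlines() returns uniform strings either way:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    fname = Path(tmp) / "no_final_newline.txt"  # hypothetical file
    fname.write_text("alpha\nbeta", encoding="utf-8")  # no "\n" after "beta"

    with open(fname, encoding="utf-8") as f:
        content_with = f.readlines()

    with open(fname, encoding="utf-8") as f:
        content_without = f.read().splitlines()

print(content_with)     # ['alpha\n', 'beta']  (the last line has no "\n")
print(content_without)  # ['alpha', 'beta']
```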
  4. You could simply do the following, as has been suggested:

    with open('/your/path/file') as f:
        my_lines = f.readlines()

    Note that this approach has 2 downsides:

    1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it’s not large, it is simply a waste of memory.

    2) This does not allow processing of each line as you read them. So if you process your lines after this, it is not efficient (requires two passes rather than one).

    A better approach for the general case would be the following:

    with open('/your/path/file') as f:
        for line in f:
            process(line)

    Where you define your process function any way you want. For example:

    def process(line):
        if 'save the world' in line.lower():
            superman.save_the_world()

    (The implementation of the Superman class is left as an exercise for you).

    This will work nicely for any file size and you go through your file in just 1 pass. This is typically how generic parsers will work.
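
    A minimal sketch of the one-pass pattern, with a hypothetical process() that collects matching lines instead of calling the Superman class (the log.txt contents are made up):

```python
import tempfile
from pathlib import Path

def process(line, matches):
    # Hypothetical stand-in for the answer's process(): record matching lines
    if "save the world" in line.lower():
        matches.append(line.rstrip("\n"))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "log.txt"  # hypothetical input file
    path.write_text("nothing here\nPlease SAVE THE WORLD\nbye\n", encoding="utf-8")

    matches = []
    with open(path, encoding="utf-8") as f:
        for line in f:  # only the current line is held in memory
            process(line, matches)

print(matches)  # ['Please SAVE THE WORLD']
```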

  5. Given a text file with this content:

    line 1
    line 2
    line 3

    We can use this Python script in the same directory as the text file above:

    >>> with open("myfile.txt", encoding="utf-8") as file:
    ...     x = [l.rstrip("\n") for l in file]
    >>> x
    ['line 1', 'line 2', 'line 3']

    Using append:

    x = []
    with open("myfile.txt") as file:
        for l in file:
            x.append(l.strip())

    Or:

    >>> x = open("myfile.txt").read().splitlines()
    >>> x
    ['line 1', 'line 2', 'line 3']

    Or:

    >>> x = open("myfile.txt").readlines()
    >>> x
    ['line 1\n', 'line 2\n', 'line 3\n']

    Or:

    def print_output(lines_in_textfile):
        print("lines_in_textfile =", lines_in_textfile)

    y = [x.rstrip() for x in open("001.txt")]
    print_output(y)

    with open('001.txt', 'r', encoding='utf-8') as file:
        file = file.read().splitlines()
    print_output(file)

    with open('001.txt', 'r', encoding='utf-8') as file:
        file = [x.rstrip("\n") for x in file]
    print_output(file)

    output:

    lines_in_textfile = ['line 1', 'line 2', 'line 3']
    lines_in_textfile = ['line 1', 'line 2', 'line 3']
    lines_in_textfile = ['line 1', 'line 2', 'line 3']
  6. To read a file into a list you need to do three things:

    • Open the file
    • Read the file
    • Store the contents as list

    Fortunately Python makes it very easy to do these things so the shortest way to read a file into a list is:

    lst = list(open(filename)) 

    However I’ll add some more explanation.

    Opening the file

    I assume that you want to open a specific file and you don’t deal directly with a file-handle (or a file-like-handle). The most commonly used function to open a file in Python is open; in Python 2.7 it takes one mandatory argument and two optional ones:

    • Filename
    • Mode
    • Buffering (I’ll ignore this argument in this answer)

    The filename should be a string that represents the path to the file. For example:

    open('afile')                 # opens the file named afile in the current working directory
    open('adir/afile')            # relative path (relative to the current working directory)
    open('C:/users/aname/afile')  # absolute path (windows)
    open('/usr/local/afile')      # absolute path (linux)

    Note that the file extension needs to be specified. This is especially important for Windows users because file extensions like .txt or .doc, etc. are hidden by default when viewed in the explorer.

    The second argument is the mode; it’s r by default, which means “read-only”. That’s exactly what you need in your case.

    But in case you actually want to create a file and/or write to a file you’ll need a different argument here. There is an excellent answer if you want an overview.

    For reading a file you can omit the mode or pass it in explicitly:

    open(filename)
    open(filename, 'r')

    Both will open the file in read-only mode. In case you want to read in a binary file on Windows you need to use the mode rb:

    open(filename, 'rb') 

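    In Python 2, on other platforms the 'b' (binary mode) is simply ignored. In Python 3 it matters on every platform: 'rb' gives you bytes instead of str and skips newline translation.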
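
    To illustrate the Python 3 behaviour, a small sketch (with a made-up file written into a temporary directory) showing that readlines() on a file opened with 'rb' yields bytes objects:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    filename = Path(tmp) / "sample.bin"  # hypothetical file name
    filename.write_bytes(b"one\ntwo\n")

    with open(filename, "rb") as f:
        raw_lines = f.readlines()  # a list of bytes objects, not str

print(raw_lines)           # [b'one\n', b'two\n']
print(type(raw_lines[0]))  # <class 'bytes'>
```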


    Now that I’ve shown how to open the file, let’s talk about the fact that you always need to close it again. Otherwise it will keep an open file-handle to the file until the process exits (or until Python garbage-collects the file object).

    While you could use:

    f = open(filename)
    # ... do stuff with f
    f.close()

    That will fail to close the file when something between open and close throws an exception. You could avoid that by using a try and finally:

    f = open(filename)
    # nothing in between!
    try:
        ...  # do stuff with f
    finally:
        f.close()

    However Python provides context managers that have a prettier syntax (but for open it’s almost identical to the try and finally above):

    with open(filename) as f:
        ...  # do stuff with f
    # The file is always closed after the with-scope ends.

    The last approach is the recommended approach to open a file in Python!
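
    A small sketch of why the with statement is preferred, assuming a throwaway file in a temporary directory: even when an exception is raised inside the block, the file ends up closed:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    filename = Path(tmp) / "data.txt"  # hypothetical throwaway file
    filename.write_text("x\n", encoding="utf-8")

    try:
        with open(filename, encoding="utf-8") as f:
            first = f.readline()
            raise RuntimeError("simulated failure while processing the file")
    except RuntimeError:
        pass  # swallowed only so the sketch can inspect f afterwards

    closed_after_error = f.closed  # the with block closed the file anyway

print(repr(first), closed_after_error)  # 'x\n' True
```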

    Reading the file

    Okay, you’ve opened the file, now how to read it?

    The open function returns a file object, and it supports Python’s iteration protocol. Each iteration will give you a line:

    with open(filename) as f:
        for line in f:
            print(line)

    This will print each line of the file. Note however that each line will contain a newline character \n at the end (in Python 3, text mode always translates line endings to \n; with Python 2 you might want to check if your Python is built with universal newlines support – otherwise you could also have \r\n on Windows or \r on Mac as newlines). If you don’t want that, you could simply remove the last character (or the last two characters on Windows):

    with open(filename) as f:
        for line in f:
            print(line[:-1])

    But the last line doesn’t necessarily have a trailing newline, so one shouldn’t use that. One could check if it ends with a trailing newline and if so remove it:

    with open(filename) as f:
        for line in f:
            if line.endswith('\n'):
                line = line[:-1]
            print(line)

    But you could simply remove all whitespace (including the \n character) from the end of the string. This will also remove all other trailing whitespace, so you have to be careful if it is important:

    with open(filename) as f:
        for line in f:
            print(line.rstrip())

    If the lines end with \r\n (Windows “newlines”), .rstrip() will also take care of the \r.
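
    In Python 3 you rarely see the \r at all, because text mode translates line endings by default. The sketch below (the file contents are made up) compares the default with newline="", which keeps the original endings, and shows rstrip() removing the \r\n together with other trailing whitespace:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    filename = Path(tmp) / "windows.txt"  # hypothetical sample file
    filename.write_bytes(b"a\r\nb  \r\n")  # Windows endings; "b" has trailing spaces

    with open(filename, encoding="ascii") as f:
        translated = list(f)  # default newline=None: "\r\n" becomes "\n"

    with open(filename, encoding="ascii", newline="") as f:
        untranslated = list(f)  # newline="": endings are returned unchanged

stripped = [line.rstrip() for line in untranslated]  # drops "\r\n" and the spaces

print(translated)    # ['a\n', 'b  \n']
print(untranslated)  # ['a\r\n', 'b  \r\n']
print(stripped)      # ['a', 'b']
```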

    Store the contents as list

    Now that you know how to open the file and read it, it’s time to store the contents in a list. The simplest option would be to use the list function:

    with open(filename) as f:
        lst = list(f)

    In case you want to strip the trailing newlines you could use a list comprehension instead:

    with open(filename) as f:
        lst = [line.rstrip() for line in f]

    Or even simpler: The .readlines() method of the file object by default returns a list of the lines:

    with open(filename) as f:
        lst = f.readlines()

    This will also include the trailing newline characters; if you don’t want them, I would recommend the [line.rstrip() for line in f] approach because it avoids keeping two lists containing all the lines in memory.

    There’s an additional option to get the desired output, however it’s rather “suboptimal”: read the complete file in a string and then split on newlines:

    with open(filename) as f:
        lst = f.read().split('\n')

    or:

    with open(filename) as f:
        lst = f.read().splitlines()

    These take care of the trailing newlines automatically because the split character isn’t included (although split('\n') leaves an extra empty string at the end when the file ends with a newline). However, they are not ideal because you keep the file both as a string and as a list of lines in memory!
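
    The difference between the two splitting variants shows up with a file that ends in a newline; a minimal sketch using a string in place of f.read():

```python
text = "foo\nbar\n"  # stand-in for f.read() on a file whose last line ends with "\n"

by_split = text.split("\n")        # keeps an empty string after the final "\n"
by_splitlines = text.splitlines()  # no trailing empty element

print(by_split)       # ['foo', 'bar', '']
print(by_splitlines)  # ['foo', 'bar']
```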

    Summary

    • Use with open(...) as f when opening files because you don’t need to take care of closing the file yourself and it closes the file even if some exception happens.
    • file objects support the iteration protocol so reading a file line-by-line is as simple as for line in the_file_object:.
    • Always browse the documentation for the available functions/classes. Most of the time there’s a perfect match for the task or at least one or two good ones. The obvious choice in this case would be readlines() but if you want to process the lines before storing them in the list I would recommend a simple list-comprehension.
  7. Clean and Pythonic Way of Reading the Lines of a File Into a List


    First and foremost, you should focus on opening your file and reading its contents in an efficient and pythonic way. Here is an example of the way I personally DO NOT prefer:

    infile = open('my_file.txt', 'r')  # Open the file for reading.
    data = infile.read()               # Read the contents of the file.
    infile.close()                     # Close the file since we're done using it.

    Instead, I prefer the below method of opening files for both reading and writing as it is very clean, and does not require an extra step of closing the file once you are done using it. In the statement below, we’re opening the file for reading, and assigning it to the variable ‘infile.’ Once the code within this statement has finished running, the file will be automatically closed.

    # Open the file for reading.
    with open('my_file.txt', 'r') as infile:
        data = infile.read()  # Read the contents of the file into memory.

    Now we need to focus on bringing this data into a Python List because they are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows:

    # Return a list of the lines, breaking at line boundaries.
    my_list = data.splitlines()

    The Final Product:

    # Open the file for reading.
    with open('my_file.txt', 'r') as infile:
        data = infile.read()  # Read the contents of the file into memory.

    # Return a list of the lines, breaking at line boundaries.
    my_list = data.splitlines()

    Testing Our Code:

    • Contents of the text file:
     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.
    • Print statements for testing purposes (Python 2 syntax):
     print my_list  # Print the list.
     # Print each line in the list.
     for line in my_list:
         print line
     # Print the fourth element in this list.
     print my_list[3]
    • Output (in Python 2 the printed list shows the UTF-8 bytes of the non-ASCII characters as escape sequences):
     ['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,', 'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea frumoas\xc3\xa3 fat\xc3\xa3.']
     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.
     O prea frumoasã fatã.
  8. Introduced in Python 3.4, pathlib has a really convenient method for reading in text from files, as follows:

    from pathlib import Path
    p = Path('my_text_file')
    lines = p.read_text().splitlines()

    (The splitlines call is what turns it from a string containing the whole contents of the file to a list of lines in the file).

    pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don’t have to worry about opening and closing the file. If all you need to do with the file is read it all in in one go, it’s a good choice.
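
    A self-contained version of the same idea, assuming a temporary directory for the sample file (its name and contents are made up). Passing encoding explicitly is optional but avoids depending on the platform's default encoding:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "my_text_file"  # hypothetical sample file
    p.write_text("alpha\nbeta\n", encoding="utf-8")

    # read_text() opens, reads and closes the file in one call
    lines = p.read_text(encoding="utf-8").splitlines()

print(lines)  # ['alpha', 'beta']
```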

  9. Here’s one more option by using list comprehensions on files:

    lines = [line.rstrip() for line in open('file.txt')] 

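    This should be somewhat more efficient than an explicit loop with list.append, because the comprehension does not have to look up and call the append method for every line. Note that the file is never closed explicitly; CPython closes it once the file object is garbage-collected.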

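  10. Read all the lines at once with readlines():

    f = open("your_file.txt", 'r')
    out = f.readlines()  # returns a list with one string per line

    Now the variable out is a list of the lines you want. You could either do:

    for line in out:
        print(line)

    Or iterate over the file object itself:

    f.seek(0)  # rewind: readlines() above already read the file to the end
    for line in f:
        print(line)

    You’ll get the same results, but only because of the seek(0): without it the second loop prints nothing, since the file position is already at the end after readlines().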
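
    A minimal sketch of how the file position affects iteration: after readlines() the file is at its end, so iterating over f yields nothing until you call f.seek(0) (the file is a hypothetical throwaway in a temporary directory):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "your_file.txt"  # hypothetical sample file
    path.write_text("one\ntwo\n", encoding="utf-8")

    with open(path, encoding="utf-8") as f:
        out = f.readlines()
        leftover = list(f)  # the position is already at end of file
        f.seek(0)           # rewind to the beginning
        again = list(f)

print(out)       # ['one\n', 'two\n']
print(leftover)  # []
print(again)     # ['one\n', 'two\n']
```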

  11. Another option is numpy.genfromtxt, for example:

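    import numpy as np
    data = np.genfromtxt("yourfile.dat", delimiter="\n")

    This will make data a NumPy array with as many rows as there are lines in your file. Note that genfromtxt parses values as floats by default, so lines that are not numbers become nan unless you pass dtype=str.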

  12. Read and write text files with Python 2 and Python 3; it works with Unicode

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-

    # Define data
    lines = [' A first string ', 'A Unicode sample: €', 'German: äöüß']

    # Write text file
    with open('file.txt', 'w') as fp:
        fp.write('\n'.join(lines))

    # Read text file
    with open('file.txt', 'r') as fp:
        read_lines = fp.readlines()
        read_lines = [line.rstrip('\n') for line in read_lines]

    print(lines == read_lines)

    Things to notice:

    • with is a so-called context manager. It makes sure that the opened file is closed again.
    • All solutions here which simply call .strip() or .rstrip() will fail to reproduce the lines, as they also strip whitespace that belongs to the line itself (such as the spaces around ' A first string ').
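
    A quick illustration of the second point, using an in-memory string as a stand-in for the file written above (an assumption to keep the sketch short; splitlines(keepends=True) plays the role of readlines() here):

```python
lines = [" A first string ", "A Unicode sample: €", "German: äöüß"]
file_text = "\n".join(lines)                      # what the answer writes to file.txt
read_lines = file_text.splitlines(keepends=True)  # stand-in for fp.readlines()

exact = [line.rstrip("\n") for line in read_lines]  # strips only the line break
lossy = [line.rstrip() for line in read_lines]      # also strips the trailing space

print(exact == lines)  # True
print(lossy == lines)  # False
```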

    Common file endings

    .txt

    More advanced file writing/reading

    For your application, the following might be important:

    • Support by other programming languages
    • Reading/writing performance
    • Compactness (file size)

    See also: Comparison of data serialization formats

    In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.

  13. If you’d like to read a file from the command line or from stdin, you can also use the fileinput module:

    # reader.py
    import fileinput

    content = []
    for line in fileinput.input():
        content.append(line.strip())
    fileinput.close()

    Pass files to it like so:

    $ python reader.py textfile.txt 

    Read more here: http://docs.python.org/2/library/fileinput.html
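
    fileinput also accepts an explicit list of file names instead of reading sys.argv[1:]. A minimal sketch with two made-up files in a temporary directory:

```python
import fileinput
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.txt"   # hypothetical input files
    second = Path(tmp) / "b.txt"
    first.write_text("a1\na2\n")
    second.write_text("b1\n")

    content = []
    # files= takes a sequence of file names; the lines are yielded
    # one file after the other, as if they were a single stream
    with fileinput.input(files=[str(first), str(second)]) as lines:
        for line in lines:
            content.append(line.rstrip("\n"))

print(content)  # ['a1', 'a2', 'b1']
```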

  14. The simplest way to do it

    A simple way is to:

    1. Read the whole file as a string
    2. Split the string line by line

    In one line, that would give:

    lines = open('C:/path/file.txt').read().splitlines() 

    However, this is a rather inefficient approach, as it will store 2 versions of the content in memory (probably not a big issue for small files, but still). [Thanks Mark Amery].

    There are 2 other ways:

    1. Using the file as an iterator:

    lines = list(open('C:/path/file.txt'))
    # ... or if you want to have a list without EOL characters
    lines = [l.rstrip() for l in open('C:/path/file.txt')]

    2. If you are using Python 3.4 or above, better use pathlib to create a path for your file that you could use for other operations in your program:

    from pathlib import Path
    file_path = Path("C:/path/file.txt")
    lines = file_path.read_text().splitlines()
    # ... or ...
    lines = [l.rstrip() for l in file_path.open()]
  15. Read the whole file into a string and split it into lines:

    inp = "file.txt"
    data = open(inp)
    dat = data.read()
    data.close()
    lst = dat.splitlines()
    print(lst)
  16. If you are faced with a very large / huge file and want to read faster (imagine you are in a Topcoder/Hackerrank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at one time, rather than just iterate line by line at file level.

    buffersize = 2**16
    with open(path) as f:
        while True:
            lines_buffer = f.readlines(buffersize)
            if not lines_buffer:
                break
            for line in lines_buffer:
                process(line)
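
    A self-contained sketch of the same loop, assuming a generated sample file and a hypothetical process() that just collects the lines; the hint is deliberately small so the file is read in several batches:

```python
import tempfile
from pathlib import Path

def process(line, seen):
    # Hypothetical per-line handler: here it only collects stripped lines
    seen.append(line.rstrip("\n"))

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "big.txt"  # hypothetical generated file
    path.write_text("".join(f"row {i}\n" for i in range(1000)))

    buffersize = 2**10  # small hint for the demo
    seen, batches = [], 0
    with open(path) as f:
        while True:
            # whole lines, stopping once their total size exceeds the hint
            lines_buffer = f.readlines(buffersize)
            if not lines_buffer:
                break
            batches += 1
            for line in lines_buffer:
                process(line, seen)

print(len(seen), batches > 1)  # 1000 True
```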
  18. To read the lines and drop the empty ones in one step:

    with open(myFile, "r") as f:
        excludeFileContent = list(filter(None, f.read().splitlines()))
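
    Note that filter(None, ...) removes only empty strings; lines that contain just spaces survive. A minimal sketch (a string stands in for f.read()) comparing it with a strip-based test:

```python
text = "first\n\n   \nlast\n"  # stand-in for f.read()

non_empty = list(filter(None, text.splitlines()))                 # drops only "" entries
non_blank = [line for line in text.splitlines() if line.strip()]  # also drops whitespace-only lines

print(non_empty)  # ['first', '   ', 'last']
print(non_blank)  # ['first', 'last']
```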
  19. Use this:

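    import pandas as pd
    data = pd.read_csv(filename)  # You can also add parameters such as header, sep, etc.
    array = data.values

    data is a DataFrame, and its values attribute gives an ndarray. You can also get a list by using array.tolist(). Note that read_csv treats the first line as the column header by default; pass header=None if the first line is data too.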

  20. Outline and Summary

    With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:

    • list(fileinput.input(filename))
    • using with path.open() as f, call f.readlines()
    • list(f)
    • path.read_text().splitlines()
    • path.read_text().splitlines(keepends=True)
    • iterate over fileinput.input or f and list.append each line one at a time
    • pass f to a bound list.extend method
    • use f in a list comprehension

    I explain the use-case for each below.

    In Python, how do I read a file line-by-line?

    This is an excellent question. First, let’s create some sample data:

    from pathlib import Path
    Path('filename').write_text('foo\nbar\nbaz')

    File objects are lazy iterators, so just iterate over them.

    filename = 'filename'
    with open(filename) as f:
        for line in f:
            line  # do something with the line

    Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. With just one file:

    import fileinput
    for line in fileinput.input(filename):
        line  # process the line

    or for multiple files, pass it a list of filenames:

    for line in fileinput.input([filename]*2):
        line  # process the line

    Again, f and fileinput.input above both are/return lazy iterators. You can only use an iterator one time, so to provide functional code while avoiding verbosity I’ll use the slightly more terse fileinput.input(filename) where apropos from here.

    In Python, how do I read a file line-by-line into a list?

    Ah but you want it in a list for some reason? I’d avoid that if possible. But if you insist… just pass the result of fileinput.input(filename) to list:

    list(fileinput.input(filename)) 

    Another direct answer is to call f.readlines, which returns the lines of the file as a list. It accepts an optional hint: reading stops once the lines read so far exceed hint characters, so you could break the file up into multiple lists that way.

    You can get to this file object two ways. One way is to pass the filename to the open builtin:

    filename = 'filename'
    with open(filename) as f:
        f.readlines()

    or using the new Path object from the pathlib module (which I have become quite fond of, and will use from here on):

    from pathlib import Path
    path = Path(filename)
    with path.open() as f:
        f.readlines()

    list will also consume the file iterator and return a list – a quite direct method as well:

    with path.open() as f:
        list(f)

    If you don’t mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines:

    path.read_text().splitlines() 

    If you want to keep the newlines, pass keepends=True:

    path.read_text().splitlines(keepends=True) 
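
    For example, with the sample data written at the start of this answer (recreated here in a temporary directory so the sketch runs on its own):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "filename"
    path.write_text("foo\nbar\nbaz")  # same sample data as above

    without_ends = path.read_text().splitlines()
    with_ends = path.read_text().splitlines(keepends=True)

print(without_ends)  # ['foo', 'bar', 'baz']
print(with_ends)     # ['foo\n', 'bar\n', 'baz']
```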

    I want to read the file line by line and append each line to the end of the list.

    Now this is a bit silly to ask for, given that we’ve demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let’s humor this request.

    Using list.append would allow you to filter or operate on each line before you append it:

    line_list = []
    for line in fileinput.input(filename):
        line_list.append(line)

    line_list

    Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list:

    line_list = []
    line_list.extend(fileinput.input(filename))
    line_list

    Or more idiomatically, we could instead use a list comprehension, and map and filter inside it if desirable:

    [line for line in fileinput.input(filename)] 
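
    A sketch of mapping and filtering inside the comprehension, assuming the same foo/bar/baz sample plus a blank line (the transformation, upper-casing the non-blank lines, is just an illustration):

```python
import fileinput
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    filename = str(Path(tmp) / "filename")
    Path(filename).write_text("foo\n\nbar\nbaz\n")

    with fileinput.input(filename) as lines:
        # filter: skip blank lines; map: strip the newline and upper-case
        result = [line.rstrip("\n").upper() for line in lines if line.strip()]

print(result)  # ['FOO', 'BAR', 'BAZ']
```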

    Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:

    list(fileinput.input(filename)) 

    Conclusion

    You’ve seen many ways to get lines from a file into a list, but I’d recommend you avoid materializing large quantities of data into a list and instead use Python’s lazy iteration to process the data if possible.

    That is, prefer fileinput.input or with path.open() as f.

  21. I like to use the following, which reads all the lines immediately:

    contents = []
    for line in open(filepath, 'r').readlines():
        contents.append(line.strip())

    Or using list comprehension:

    contents = [line.strip() for line in open(filepath, 'r').readlines()] 
  22. You could also use the loadtxt command in NumPy. This checks for fewer conditions than genfromtxt, so it may be faster.

    import numpy
    data = numpy.loadtxt(filename, delimiter="\n")
  23. I would try one of the below mentioned methods. The example file that I use has the name dummy.txt. You can find the file here. I presume that the file is in the same directory as the code (you can change fpath to include the proper file name and folder path).

    In both the below mentioned examples, the list that you want is given by lst.

    1.> First method:

    fpath = 'dummy.txt'
    with open(fpath, "r") as f:
        lst = [line.rstrip('\n \t') for line in f]
    print(lst)
    # ['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

    2.> In the second method, one can use csv.reader from the csv module of the Python Standard Library. It splits every line on the delimiter, so the fields of each row are joined back together:

    import csv

    fpath = 'dummy.txt'
    with open(fpath) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=' ')
        lst = [' '.join(row) for row in csv_reader]
    print(lst)
    # ['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

    You can use either of the two methods. Time taken for the creation of lst is almost equal in the two methods.
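
    Keep in mind that csv.reader splits each line on the delimiter, so with delimiter=' ' every row is a list of words and row[0] holds only the first word; joining the fields restores the line. A small sketch, with io.StringIO standing in for dummy.txt (its two lines are made up to match the output above):

```python
import csv
import io

# io.StringIO stands in for the opened dummy.txt (contents made up)
sample = io.StringIO("THIS IS LINE1.\nTHIS IS LINE2.\n")

rows = list(csv.reader(sample, delimiter=" "))  # each row is a list of words
joined = [" ".join(row) for row in rows]        # put the words back together

print(rows)    # [['THIS', 'IS', 'LINE1.'], ['THIS', 'IS', 'LINE2.']]
print(joined)  # ['THIS IS LINE1.', 'THIS IS LINE2.']
```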

  24. Here is a Python(3) helper library class that I use to simplify file I/O:

    import os

    # Handle files using a callback method; prevents repetition.
    # Inside the FileIO class body the name __file_handler is mangled to
    # _FileIO__file_handler, which is why the helper is defined under that name.
    def _FileIO__file_handler(file_path, mode, callback=lambda f: None):
        f = open(file_path, mode)
        try:
            return callback(f)
        except Exception as e:
            action = "read from" if mode.lower() in ("r", "rb", "r+") else "write to"
            raise IOError("Failed to %s file" % action) from e
        finally:
            f.close()

    class FileIO:
        # Return the contents of a file
        @staticmethod
        def read(file_path, mode="r"):
            return __file_handler(file_path, mode, lambda rf: rf.read())

        # Get the lines of a file (the default filter_fn drops empty lines)
        @staticmethod
        def lines(file_path, mode="r", filter_fn=lambda line: len(line) > 0):
            return [line for line in FileIO.read(file_path, mode).strip().split("\n") if filter_fn(line)]

        # Create or update a file (NOTE: can also be used to replace a file's original content)
        @staticmethod
        def write(file_path, new_content, mode="w"):
            return __file_handler(file_path, mode, lambda wf: wf.write(new_content))

        # Delete a file (if it exists)
        @staticmethod
        def delete(file_path):
            return os.remove(file_path) if os.path.isfile(file_path) else None

    You would then use the FileIO.lines function, like this:

    file_ext_lines = FileIO.lines("./path/to/file.ext")
    for i, line in enumerate(file_ext_lines):
        print("Line {}: {}".format(i + 1, line))

    Remember that the mode ("r" by default) and filter_fn (checks for empty lines by default) parameters are optional.

    You could even remove the read, write and delete methods and just leave the FileIO.lines, or even turn it into a separate method called read_lines.

  25. Command line version

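    #!/usr/bin/env python3
    import os
    import sys

    abspath = os.path.abspath(__file__)
    dname = os.path.dirname(abspath)
    # Look the file up next to this script; os.path.join adds the missing separator
    filename = os.path.join(dname, sys.argv[1])
    with open(filename) as f:
        arr = f.read().split("\n")
    print(arr)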

    Run with:

    python3 somefile.py input_file_name.txt 

Tags: python, string